1.
IEEE Trans Image Process ; 33: 2627-2638, 2024.
Article in English | MEDLINE | ID: mdl-38536683

ABSTRACT

Visual intention understanding is a challenging task that explores the hidden intention behind images that publishers post on social media. Visual intention conveys implicit semantics, and its ambiguous definition inevitably leads to label shifting and label blemish. The former means that the same image delivers discrepant intentions under different data augmentations, while the latter means that intention labels are susceptible to errors or omissions during annotation. This paper proposes a novel method, called Label-aware Calibration and Relation-preserving (LabCR), to alleviate these two problems from both intra-sample and inter-sample views. First, we disentangle multiple intentions into single intentions for explicit distribution calibration at both the overall and the individual level. Calibrating the class probability distributions of augmented instance pairs yields consistent inferred intentions and thus addresses label shifting. Second, we use intention similarity to establish correlations among samples, which offers additional supervision signals that enforce correlation alignment across instance pairs. This strategy alleviates the effect of label blemish. Extensive experiments validate the superiority of the proposed LabCR in visual intention understanding and pedestrian attribute recognition. Code is available at https://github.com/ShiQingHongYa/LabCR.
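A minimal, hypothetical PyTorch sketch of the two loss terms as this abstract describes them: an intra-sample calibration term that makes the class distributions of two augmented views agree, and an inter-sample relation-preserving term that aligns pairwise sample similarities across views. The function names, the symmetric KL form, and the cosine-similarity relation matrix are assumptions for illustration, not the authors' exact formulation; see the official code at https://github.com/ShiQingHongYa/LabCR for the real implementation.

```python
# Hypothetical sketch of the two LabCR-style loss terms described above.
# logits_a / logits_b and feat_a / feat_b come from two augmentations of
# the same batch; the divergence and similarity measures are assumptions.
import torch
import torch.nn.functional as F

def calibration_loss(logits_a, logits_b):
    """Intra-sample view: make the inferred intention distributions of the
    two augmented views agree (addresses label shifting)."""
    log_p_a = F.log_softmax(logits_a, dim=1)
    log_p_b = F.log_softmax(logits_b, dim=1)
    # Symmetric KL divergence between the two class distributions.
    kl_ab = F.kl_div(log_p_a, log_p_b.exp(), reduction="batchmean")
    kl_ba = F.kl_div(log_p_b, log_p_a.exp(), reduction="batchmean")
    return 0.5 * (kl_ab + kl_ba)

def relation_preserving_loss(feat_a, feat_b):
    """Inter-sample view: keep the sample-to-sample similarity structure
    consistent across the two views (extra supervision against label blemish)."""
    feat_a = F.normalize(feat_a, dim=1)
    feat_b = F.normalize(feat_b, dim=1)
    rel_a = feat_a @ feat_a.t()   # pairwise cosine similarities, view A
    rel_b = feat_b @ feat_b.t()   # pairwise cosine similarities, view B
    return F.mse_loss(rel_a, rel_b)

if __name__ == "__main__":
    # Random tensors stand in for a real encoder and classifier head.
    batch, num_classes, dim = 8, 28, 256
    logits_a, logits_b = torch.randn(batch, num_classes), torch.randn(batch, num_classes)
    feat_a, feat_b = torch.randn(batch, dim), torch.randn(batch, dim)
    loss = calibration_loss(logits_a, logits_b) + relation_preserving_loss(feat_a, feat_b)
    print(loss.item())
```

In this reading, the calibration term supplies the consistency constraint between augmented instance pairs, while the relation term supervises the batch-level similarity structure; how the two are weighted is not specified in the abstract.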

2.
IEEE Trans Image Process ; 32: 2190-2201, 2023.
Article in English | MEDLINE | ID: mdl-37018096

ABSTRACT

Visual intention understanding is the task of exploring the potential, underlying meaning expressed in images. Simply modeling the objects or backgrounds in the image content leads to unavoidable comprehension bias. To alleviate this problem, this paper proposes Cross-modality Pyramid Alignment with Dynamic optimization (CPAD), which enhances the global understanding of visual intention through hierarchical modeling. The core idea is to exploit the hierarchical relationship between visual content and textual intention labels. For the visual hierarchy, we formulate visual intention understanding as a hierarchical classification problem, capturing multi-granular features in different layers that correspond to hierarchical intention labels. For the textual hierarchy, we directly extract semantic representations from intention labels at different levels, which supplements the visual content modeling without extra manual annotation. Moreover, to further narrow the domain gap between modalities, a cross-modality pyramid alignment module is designed to dynamically optimize visual intention understanding in a joint learning manner. Comprehensive experiments demonstrate the superiority of the proposed method, which outperforms existing visual intention understanding methods.
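A hypothetical PyTorch sketch of the per-level cross-modality alignment idea described in this abstract: visual features at several granularities are projected into a joint space and pulled toward text embeddings of the intention labels at the matching hierarchy level. The layer count, embedding sizes, and cosine-similarity alignment loss are illustrative assumptions, not the authors' exact design.

```python
# Hypothetical per-level cross-modality alignment, loosely following the
# pyramid structure described above. Dimensions and the loss are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidAlignment(nn.Module):
    def __init__(self, visual_dims=(256, 512, 1024), text_dim=300, joint_dim=128):
        super().__init__()
        # One projection per hierarchy level for each modality.
        self.visual_proj = nn.ModuleList([nn.Linear(d, joint_dim) for d in visual_dims])
        self.text_proj = nn.ModuleList([nn.Linear(text_dim, joint_dim) for _ in visual_dims])

    def forward(self, visual_feats, text_feats):
        """visual_feats: list of per-level image features (coarse to fine).
        text_feats: list of per-level intention-label embeddings."""
        loss = 0.0
        for v, t, vp, tp in zip(visual_feats, text_feats, self.visual_proj, self.text_proj):
            v_emb = F.normalize(vp(v), dim=1)
            t_emb = F.normalize(tp(t), dim=1)
            # Pull each image toward its level-specific label embedding.
            loss = loss + (1.0 - (v_emb * t_emb).sum(dim=1)).mean()
        return loss / len(self.visual_proj)

if __name__ == "__main__":
    # Random tensors stand in for a visual backbone and a label-text encoder.
    batch = 4
    visual = [torch.randn(batch, d) for d in (256, 512, 1024)]
    text = [torch.randn(batch, 300) for _ in range(3)]
    print(PyramidAlignment()(visual, text).item())
```

The "dynamic optimization" the abstract mentions presumably reweights these per-level alignment terms during joint training; the abstract does not specify the weighting scheme, so it is omitted here.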
