Results 1 - 4 of 4
1.
IEEE Trans Pattern Anal Mach Intell; 45(11): 13117-13133, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37390000

ABSTRACT

Our goal in this work is to study a more realistic setting for weakly supervised multi-modal instance-level product retrieval over fine-grained product categories. We first contribute the Product1M dataset and define two practical instance-level retrieval tasks that enable evaluation on price comparison and personalized recommendation. For both instance-level tasks, accurately identifying the intended product mentioned in the visual-linguistic data and mitigating the influence of irrelevant content are challenging. To address this, we devise a more effective cross-modal pretraining model that adaptively incorporates key concept information from multi-modal data. This is accomplished with an entity graph, in which nodes represent entities and edges denote the similarity relations between them. Specifically, we propose a novel Entity-Graph Enhanced Cross-Modal Pretraining (EGE-CMP) model for instance-level commodity retrieval, which explicitly injects entity knowledge, in both node-based and subgraph-based ways, into the multi-modal networks via a self-supervised hybrid-stream transformer. This reduces confusion between different object contents and effectively guides the network to focus on entities with real semantics. Experimental results verify the efficacy and generalizability of EGE-CMP, which outperforms several state-of-the-art cross-modal baselines such as CLIP (Radford et al., 2021), UNITER (Chen et al., 2020), and CAPTURE (Zhan et al., 2021).
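
A rough Python sketch of the entity-graph structure described above, assuming entities are represented by embedding vectors and edge weights are cosine similarities; the sizes, the similarity threshold, and the random embeddings are illustrative assumptions, not details from the EGE-CMP paper:

    # Minimal sketch: nodes are entities, edges carry cosine-similarity weights.
    import numpy as np

    def build_entity_graph(embeddings, threshold=0.5):
        """Adjacency matrix with cosine-similarity edge weights; links below
        `threshold` are dropped (an assumed cut-off, not the paper's)."""
        norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
        unit = embeddings / np.clip(norms, 1e-12, None)
        sim = unit @ unit.T                      # pairwise cosine similarity
        np.fill_diagonal(sim, 0.0)               # no self-loops
        return np.where(sim >= threshold, sim, 0.0)

    # Hypothetical entities parsed from a product caption (e.g. "lipstick",
    # "perfume", "gift box"), each mapped here to a random 16-D embedding.
    embeddings = np.random.default_rng(0).normal(size=(3, 16))
    print(build_entity_graph(embeddings))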

2.
IEEE Trans Image Process; 28(8): 3703-3713, 2019 Aug.
Article in English | MEDLINE | ID: mdl-30835222

ABSTRACT

Recognizing a camera wearer's actions from videos captured by an egocentric camera is a challenging task. In this paper, we employ a two-stream deep neural network, composed of an appearance-based stream and a motion-based stream, to recognize egocentric actions. Based on the insight that human action and gaze behavior are highly coordinated in object-manipulation tasks, we propose a spatial attention network that predicts human gaze in the form of an attention map. The attention map helps each of the two streams focus on the most relevant spatial regions of the video frames when predicting actions. To better model the temporal structure of the videos, we further propose a temporal network that incorporates a bi-directional long short-term memory to capture long-range dependencies for egocentric action recognition. Experimental results demonstrate that our method predicts attention maps consistent with human attention and achieves action recognition performance competitive with state-of-the-art methods on the GTEA Gaze and GTEA Gaze+ datasets.


Subject(s)
Attention/physiology; Image Processing, Computer-Assisted/methods; Neural Networks, Computer; Pattern Recognition, Automated/methods; Databases, Factual; Human Activities/classification; Humans; Video Recording
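
A hedged PyTorch sketch of the general pattern described in the abstract above: a spatial attention map reweights per-frame convolutional features, and a bi-directional LSTM aggregates the frames over time. Channel sizes, layer choices, and the classifier head are assumptions for illustration, not the paper's actual architecture:

    # Attention-weighted frame features followed by a bi-directional LSTM.
    import torch
    import torch.nn as nn

    class SpatialAttention(nn.Module):
        def __init__(self, channels):
            super().__init__()
            self.conv = nn.Conv2d(channels, 1, kernel_size=1)  # attention logits

        def forward(self, feat):                  # feat: (N, C, H, W)
            logits = self.conv(feat)
            attn = torch.softmax(logits.flatten(2), dim=-1).view_as(logits)
            return (feat * attn).sum(dim=(2, 3)), attn         # pooled feature, map

    class TemporalActionNet(nn.Module):
        def __init__(self, channels=512, hidden=256, num_classes=10):
            super().__init__()
            self.attn = SpatialAttention(channels)
            self.lstm = nn.LSTM(channels, hidden, batch_first=True, bidirectional=True)
            self.fc = nn.Linear(2 * hidden, num_classes)

        def forward(self, feats):                 # feats: (B, T, C, H, W)
            B, T, C, H, W = feats.shape
            pooled, _ = self.attn(feats.view(B * T, C, H, W))
            seq, _ = self.lstm(pooled.view(B, T, C))
            return self.fc(seq.mean(dim=1))       # average over time, then classify

    x = torch.randn(2, 8, 512, 7, 7)              # 2 clips, 8 frames of CNN features
    print(TemporalActionNet()(x).shape)           # -> torch.Size([2, 10])
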
3.
IEEE Trans Image Process; 24(11): 3257-3265, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26054068

ABSTRACT

In this paper, we investigate how a recently emerged photography technology, the light field, can benefit depth map estimation, a challenging computer vision problem. A novel framework is proposed to reconstruct continuous depth maps from light field data. Unlike many traditional methods for the stereo matching problem, the proposed method does not need to quantize the depth range. By exploiting the structural information among the densely sampled views in light field data, we obtain dense and relatively reliable local estimates. Starting from these initial estimates, we propose an optimization method that iteratively solves a sparse linear system with the conjugate gradient method. Two different affinity matrices for the linear system are employed to balance the efficiency and quality of the optimization, and a depth-assisted segmentation method is introduced so that different segments can employ different affinity matrices. Experimental results on both synthetic and real light fields demonstrate that our continuous results are more accurate and efficient and preserve more detail than those of discrete approaches.
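
A minimal SciPy sketch of the optimization step described above, i.e., iteratively solving a sparse linear system with the conjugate gradient method; the 4-neighbour graph Laplacian stands in for the paper's actual affinity matrices, and the grid size and data-fidelity weight are assumed values:

    # Refine a noisy initial depth map by solving (lambda*I + L) d = lambda*d0
    # with conjugate gradients, L being a 4-neighbour Laplacian (illustrative).
    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import cg

    H, W, lam = 64, 64, 0.2                       # grid size and data weight (assumed)
    n = H * W
    d0 = np.random.default_rng(0).random(n)       # noisy initial depth estimates

    # Build a sparse 4-neighbour Laplacian L = D - A on the pixel grid.
    idx = np.arange(n).reshape(H, W)
    rows = np.concatenate([idx[:, :-1].ravel(), idx[:-1, :].ravel()])
    cols = np.concatenate([idx[:, 1:].ravel(), idx[1:, :].ravel()])
    A = sp.coo_matrix((np.ones(rows.size), (rows, cols)), shape=(n, n))
    A = A + A.T
    L = sp.diags(np.asarray(A.sum(axis=1)).ravel()) - A

    # Solve the sparse linear system iteratively with conjugate gradients.
    system = lam * sp.eye(n) + L
    depth, info = cg(system.tocsr(), lam * d0)
    print("converged" if info == 0 else "not converged", depth.shape)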

4.
IEEE Trans Cybern; 45(9): 1864-1875, 2015 Sep.
Article in English | MEDLINE | ID: mdl-25423662

ABSTRACT

Gait, a promising biometric for recognizing human identities, can be captured nonintrusively as a series of acceleration signals using wearable or portable smart devices, and can be used for access control. Most existing methods for accelerometer-based gait recognition require explicit step-cycle detection and therefore suffer from cycle-detection failures and inter-cycle phase misalignment. We propose a novel algorithm that avoids both problems. It makes use of a type of salient point termed the signature point (SP) and has three components: 1) a multiscale SP extraction method, including localization and SP descriptors; 2) a sparse representation scheme that encodes newly emerged SPs with known ones in terms of their descriptors, where the phase propinquity of the SPs in a cluster is leveraged to ensure the physical meaningfulness of the codes; and 3) a classifier for the sparse-code collections associated with the SPs of an acceleration series. Experimental results on our publicly available dataset of 175 subjects show that our algorithm outperforms existing methods, even when the step cycles are perfectly detected for them. When accelerometers at five different body locations are used together, it achieves a rank-1 accuracy of 95.8% for identification and an equal error rate of 2.2% for verification.


Subject(s)
Accelerometry/methods; Biometric Identification/methods; Gait/physiology; Algorithms; Cluster Analysis; Humans
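
A small scikit-learn sketch of the sparse-representation step described in the abstract above, encoding newly observed signature-point descriptors over a dictionary of known ones; the descriptor dimensionality, dictionary size, OMP sparsity, and pooling step are illustrative assumptions rather than the paper's exact pipeline:

    # Sparse coding of signature-point (SP) descriptors over a dictionary of
    # known descriptors; all sizes and the sparsity level are assumed.
    import numpy as np
    from sklearn.decomposition import SparseCoder

    rng = np.random.default_rng(0)
    dictionary = rng.normal(size=(128, 32))       # 128 known SP descriptors, 32-D
    dictionary /= np.linalg.norm(dictionary, axis=1, keepdims=True)

    new_sps = rng.normal(size=(10, 32))           # descriptors of newly observed SPs

    coder = SparseCoder(dictionary=dictionary,
                        transform_algorithm="omp",
                        transform_n_nonzero_coefs=5)
    codes = coder.transform(new_sps)              # (10, 128) sparse codes

    # A simple pooled representation of the series for a downstream classifier.
    pooled = np.abs(codes).max(axis=0)
    print(codes.shape, pooled.shape)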