1.
Article in English | MEDLINE | ID: mdl-38153822

ABSTRACT

Video scene graph generation (VidSGG) aims to identify objects in visual scenes and infer their relationships for a given video. It requires not only a comprehensive understanding of each object across the whole scene but also a close analysis of their temporal motions and interactions. Inherently, object pairs and their relationships exhibit spatial co-occurrence correlations within each image and temporal consistency/transition correlations across different images, which can serve as prior knowledge to facilitate VidSGG model learning and inference. In this work, we propose a spatial-temporal knowledge-embedded transformer (STKET) that incorporates the prior spatial-temporal knowledge into the multi-head cross-attention mechanism to learn more representative relationship representations. Specifically, we first learn spatial co-occurrence and temporal transition correlations in a statistical manner. Then, we design spatial and temporal knowledge-embedded layers that introduce the multi-head cross-attention mechanism to fully explore the interaction between visual representation and the knowledge to generate spatial- and temporal-embedded representations, respectively. Finally, we aggregate these representations for each subject-object pair to predict the final semantic labels and their relationships. Extensive experiments show that STKET outperforms current competing algorithms by a large margin, e.g., improving the mR@50 by 8.1%, 4.7%, and 2.1% on different settings over current algorithms.
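The "statistical manner" of learning spatial co-occurrence and temporal transition correlations can be illustrated with simple frequency counts. The following is only a minimal sketch, not the STKET implementation: the triple-based data layout, the function names, and the simplification of one relation per subject-object pair per frame are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def spatial_cooccurrence(frames):
    """Estimate P(relation | subject class, object class) within frames.

    `frames` is a list of frames; each frame is a list of
    (subject, relation, object) triples. This layout is hypothetical.
    """
    counts = defaultdict(Counter)
    for frame in frames:
        for subj, rel, obj in frame:
            counts[(subj, obj)][rel] += 1
    return {
        pair: {rel: n / sum(c.values()) for rel, n in c.items()}
        for pair, c in counts.items()
    }


def temporal_transition(frames):
    """Estimate P(relation at t+1 | relation at t, subject, object).

    Assumes at most one relation per (subject, object) pair per frame.
    """
    counts = defaultdict(Counter)
    for prev, curr in zip(frames, frames[1:]):
        prev_rel = {(s, o): r for s, r, o in prev}
        for subj, rel, obj in curr:
            if (subj, obj) in prev_rel:
                counts[(subj, obj, prev_rel[(subj, obj)])][rel] += 1
    return {
        key: {rel: n / sum(c.values()) for rel, n in c.items()}
        for key, c in counts.items()
    }


# Toy annotated video with three frames (invented data).
video = [
    [("person", "holding", "cup")],
    [("person", "holding", "cup")],
    [("person", "drinking_from", "cup")],
]
spatial = spatial_cooccurrence(video)
temporal = temporal_transition(video)
```

In a knowledge-embedded model, tables like `spatial` and `temporal` would presumably be turned into embeddings that the cross-attention layers attend over; that step is not shown here.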

2.
IEEE Trans Pattern Anal Mach Intell ; 44(12): 9887-9903, 2022 12.
Article in English | MEDLINE | ID: mdl-34847019

ABSTRACT

Facial expression recognition (FER) has received significant attention and seen clear progress over the past decade, but data inconsistencies among FER datasets greatly hinder the ability of models learned on one dataset to generalize to another. Recently, a series of cross-domain FER algorithms (CD-FERs) have been extensively developed to address this issue. Although each claims superior performance, comprehensive and fair comparisons are lacking due to inconsistent choices of the source/target datasets and feature extractors. In this work, we first propose to construct a unified CD-FER evaluation benchmark, in which we re-implement the well-performing CD-FER and recently published general domain adaptation algorithms and ensure that all these algorithms adopt the same source/target datasets and feature extractors for fair CD-FER evaluations. Based on the analysis, we find that most of the current state-of-the-art algorithms use adversarial learning mechanisms that aim to learn holistic domain-invariant features to mitigate domain shifts. However, these algorithms ignore local features, which are more transferable across different datasets and carry more detailed content for fine-grained adaptation. Therefore, we develop a novel adversarial graph representation adaptation (AGRA) framework that integrates graph representation propagation with adversarial learning to realize effective cross-domain holistic-local feature co-adaptation. Specifically, our framework first builds two graphs to correlate holistic and local regions within each domain and across different domains, respectively. Then, it extracts holistic-local features from the input image and uses learnable per-class statistical distributions to initialize the corresponding graph nodes. Finally, two stacked graph convolution networks (GCNs) are adopted to propagate holistic-local features within each domain to explore their interaction and across different domains for holistic-local feature co-adaptation. In this way, the AGRA framework can adaptively learn fine-grained domain-invariant features and thus facilitate cross-domain expression recognition. We conduct extensive and fair comparisons on the unified evaluation benchmark and show that the proposed AGRA framework outperforms previous state-of-the-art methods.


Subject(s)
Algorithms , Facial Recognition , Benchmarking , Learning
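The graph propagation described in the AGRA abstract above can be illustrated with a single graph-convolution step over one holistic node and two local-region nodes. This is a hedged sketch rather than the AGRA architecture: the mean-aggregation (row-normalized) form, the tiny graph, and the identity weight matrix are assumptions chosen to keep the arithmetic easy to follow.

```python
def gcn_layer(adj, feats, weight):
    """One propagation step: ReLU(D^-1 (A + I) H W), row-normalized.

    adj:    n x n adjacency matrix (list of lists)
    feats:  n x d node features H
    weight: d x m projection matrix W
    """
    n = len(adj)
    # Add self-loops so each node keeps its own feature.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    out = []
    for i in range(n):
        deg = sum(a_hat[i])
        # Average the features of node i and its neighbours.
        agg = [sum(a_hat[i][j] * feats[j][k] for j in range(n)) / deg
               for k in range(len(feats[0]))]
        # Linear projection followed by ReLU.
        out.append([max(0.0, sum(agg[k] * weight[k][m] for k in range(len(agg))))
                    for m in range(len(weight[0]))])
    return out


# Node 0: holistic face feature; nodes 1-2: local regions (hypothetical).
adj = [[0, 1, 1],
       [1, 0, 0],
       [1, 0, 0]]
feats = [[1.0, 0.0], [0.0, 2.0], [4.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
hidden = gcn_layer(adj, feats, identity)
```

Stacking two such layers, one over an intra-domain graph and one over an inter-domain graph, would mirror the two-stage propagation the abstract describes; the adversarial training objective is not covered by this sketch.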
3.
IEEE Trans Pattern Anal Mach Intell ; 44(3): 1371-1384, 2022 03.
Article in English | MEDLINE | ID: mdl-32986543

ABSTRACT

Recognizing multiple labels of an image is a practical yet challenging task, and remarkable progress has been achieved by searching for semantic regions and exploiting label dependencies. However, current works utilize RNN/LSTM to implicitly capture sequential region/label dependencies, which cannot fully explore mutual interactions among the semantic regions/labels and do not explicitly integrate label co-occurrences. In addition, these works require large amounts of training samples for each category, and they are unable to generalize to novel categories with limited samples. To address these issues, we propose a knowledge-guided graph routing (KGGR) framework, which unifies prior knowledge of statistical label correlations with deep neural networks. The framework exploits prior knowledge to guide adaptive information propagation among different categories to facilitate multi-label analysis and reduce the dependency on training samples. Specifically, it first builds a structured knowledge graph to correlate different labels based on statistical label co-occurrence. Then, it introduces the label semantics to guide the learning of semantic-specific features to initialize the graph, and it exploits a graph propagation network to explore graph node interactions, enabling the learning of contextualized image feature representations. Moreover, we initialize each graph node with the classifier weights for the corresponding label and apply another propagation network to transfer node messages through the graph. In this way, it can exploit the information of correlated labels to help train better classifiers, especially for labels with limited training samples. We conduct extensive experiments on the traditional multi-label image recognition (MLR) and multi-label few-shot learning (ML-FSL) tasks and show that our KGGR framework outperforms the current state-of-the-art methods by sizable margins on the public benchmarks.


Subject(s)
Algorithms , Neural Networks, Computer , Benchmarking , Machine Learning , Semantics
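The knowledge graph built from statistical label co-occurrence in the KGGR abstract above can be approximated by conditional frequencies over training annotations. The snippet below is a minimal sketch under assumptions: the edge weight is taken to be P(b present | a present), and the function name and toy annotations are invented for illustration.

```python
from collections import Counter
from itertools import permutations


def label_cooccurrence(annotations):
    """Return P(b present | a present) for every ordered label pair (a, b)."""
    single = Counter()
    pair = Counter()
    for labels in annotations:
        labels = set(labels)
        single.update(labels)                  # how often each label appears
        pair.update(permutations(labels, 2))   # ordered co-occurring pairs
    return {(a, b): n / single[a] for (a, b), n in pair.items()}


# Toy multi-label annotations, one label set per image (invented data).
images = [
    {"person", "dog"},
    {"person", "bicycle"},
    {"person", "dog", "frisbee"},
    {"dog"},
]
graph = label_cooccurrence(images)
```

The resulting weights are asymmetric: in this toy data a frisbee always appears with a dog, while a dog appears with a frisbee only a third of the time, which is why a directed graph is a natural fit.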
4.
Nanotechnology ; 32(10): 105402, 2021 Mar 05.
Article in English | MEDLINE | ID: mdl-33242845

ABSTRACT

Transition metal oxides with high theoretical capacities are widely investigated as potential anodes for alkali-metal ion batteries. However, their intrinsically low conductivity and large volume changes during cycling result in poor cycling stability and low rate capabilities. Graphene has been widely used to support metal oxides for enhanced performance, but the cycling life is limited by the aggregation/collapse of active materials on the graphene surface. Herein, we significantly improve the battery performance of a graphene-metal oxide composite via pore engineering and surface protection. In this architecture, the mesoporous NiFe₂O₄ is designed for fast ion diffusion and volume accommodation, and the outer graphene protection can further enhance the electrical conductivity and prevent aggregation during cycling. Thus, the as-prepared G@p-NiFe₂O₄@G composite for lithium storage delivers high capacity (1244 mA h g⁻¹ after 300 cycles at 0.2 A g⁻¹), excellent rate performance (563 mA h g⁻¹ at 4 A g⁻¹), and outstanding cycling life up to 1200 cycles at 1.5 A g⁻¹. For sodium storage, it also displays good cycling stability and superior rate performance. Moreover, the effects of various microstructures on the battery performance, the reaction kinetics of various electrodes, and the reaction mechanism of NiFe₂O₄ have been systematically investigated in this work.
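For context on the "high theoretical capacities" mentioned above, the theoretical gravimetric capacity of a conversion-type anode follows from Faraday's law, Q = nF / (3.6 M). The sketch below assumes a full eight-electron conversion reaction for NiFe2O4 (an assumption not stated in the abstract) and uses standard atomic masses; the function name is hypothetical.

```python
FARADAY = 96485.33212  # Faraday constant, C/mol

# Standard atomic masses in g/mol.
ATOMIC_MASS = {"Ni": 58.6934, "Fe": 55.845, "O": 15.999}


def theoretical_capacity(composition, electrons):
    """Theoretical capacity in mA h per gram: n * F / (3.6 * M)."""
    molar_mass = sum(ATOMIC_MASS[el] * n for el, n in composition.items())
    # Dividing by 3.6 converts coulombs per gram to mA h per gram.
    return electrons * FARADAY / (3.6 * molar_mass)


# Assumed reaction: NiFe2O4 + 8 Li+ + 8 e- -> Ni + 2 Fe + 4 Li2O
nife2o4 = theoretical_capacity({"Ni": 1, "Fe": 2, "O": 4}, electrons=8)
```

Under these assumptions the estimate comes out at roughly 915 mA h per gram, which gives a reference point for the capacities reported in the abstract.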

5.
Nanotechnology ; 30(46): 465402, 2019 Nov 15.
Article in English | MEDLINE | ID: mdl-31426037

ABSTRACT

In this work, we report a high-performance anode material created by rationally encapsulating multi-walled carbon nanotubes (MWNTs) within hollow Fe₃O₄ nanotubes followed by applying a carbon coating. When tested for lithium storage, the as-prepared MWNT@hollow Fe₃O₄@C coaxial nanotubes present high specific capacity, superior rate performance, and outstanding cycling stability. They deliver high capacities of 758 mA h g⁻¹ at the 500th cycle at 0.2 A g⁻¹, and 409 mA h g⁻¹ after 1000 cycles at a high rate of 1.5 A g⁻¹. This excellent performance can be attributed to the unique architecture, which provides high electrical conductivity, offers enough void space for volume accommodation, and mitigates the pulverization of Fe₃O₄ during cycling.
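To put the two reported rates in perspective, the time for one discharge can be estimated from the specific capacity and the specific current. This is a back-of-the-envelope sketch that assumes constant-current discharge delivering exactly the reported capacity; the function name is hypothetical.

```python
def discharge_minutes(capacity_mah_per_g, current_a_per_g):
    """Minutes for one constant-current discharge at the given rate."""
    hours = capacity_mah_per_g / (current_a_per_g * 1000.0)  # mA h / mA
    return hours * 60.0


slow = discharge_minutes(758, 0.2)   # reported capacity at 0.2 A per gram
fast = discharge_minutes(409, 1.5)   # reported capacity at 1.5 A per gram
```

Under these assumptions, the low-rate discharge takes a little under four hours, while the high-rate discharge completes in roughly a quarter of an hour.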

6.
IEEE Trans Cybern ; 45(1): 89-102, 2015 Jan.
Article in English | MEDLINE | ID: mdl-24860044

ABSTRACT

In this paper, we study a novel hierarchical background model for intelligent video surveillance with the pan-tilt-zoom (PTZ) camera, and develop an integrated system consisting of three key components: background modeling, observed frame registration, and object tracking. First, we build the hierarchical background model by separating the full range of continuous focal lengths of a PTZ camera into several discrete levels and then partitioning the wide scene at each level into many partial fixed scenes. In this way, the wide scenes captured by a PTZ camera through rotation and zoom are represented by a hierarchical collection of partial fixed scenes. A new robust feature is presented for background modeling of each partial scene. Second, we locate the partial scenes corresponding to the observed frame in the hierarchical background model. Frame registration is then achieved by feature descriptor matching via fast approximate nearest neighbor search. Afterwards, foreground objects can be detected using background subtraction. Last, we configure the hierarchical background model into a framework to facilitate existing object tracking algorithms under the PTZ camera. Foreground extraction is used to assist in tracking an object of interest. The tracking outputs are fed back to the PTZ controller to adjust the camera so that the tracked object stays in the image plane. We apply our system to several challenging scenarios and achieve promising results.
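The hierarchical organization described above, discrete focal-length levels each partitioned into partial fixed scenes, can be sketched as a lookup keyed by camera state. Everything concrete here is an assumption for illustration: the focal-length boundaries, the tile counts, the pan and tilt ranges, and the plain intensity-difference threshold that stands in for the paper's robust feature. Frame registration by descriptor matching is not shown.

```python
import bisect


class HierarchicalBackground:
    """Index partial fixed scenes by (zoom level, pan tile, tilt tile)."""

    def __init__(self, focal_edges, tiles_per_level,
                 pan_range=(-180.0, 180.0), tilt_range=(0.0, 90.0)):
        # focal_edges split the focal range into len(focal_edges) + 1 levels.
        assert len(tiles_per_level) == len(focal_edges) + 1
        self.focal_edges = focal_edges
        self.tiles_per_level = tiles_per_level
        self.pan_range = pan_range
        self.tilt_range = tilt_range
        self.models = {}  # (level, pan tile, tilt tile) -> background image

    @staticmethod
    def _tile(value, bounds, n_tiles):
        lo, hi = bounds
        index = int((value - lo) / (hi - lo) * n_tiles)
        return min(max(index, 0), n_tiles - 1)  # clamp to a valid tile

    def key(self, pan, tilt, focal_length):
        """Locate the partial scene that covers the given camera state."""
        level = bisect.bisect_right(self.focal_edges, focal_length)
        n_pan, n_tilt = self.tiles_per_level[level]
        return (level,
                self._tile(pan, self.pan_range, n_pan),
                self._tile(tilt, self.tilt_range, n_tilt))


def foreground_mask(frame, background, threshold=25):
    """Simple background subtraction: flag pixels that differ strongly."""
    return [[abs(f - b) > threshold for f, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]


# Three zoom levels; finer tiling at longer focal lengths (invented values).
model = HierarchicalBackground(focal_edges=[10.0, 30.0],
                               tiles_per_level=[(4, 2), (8, 4), (16, 8)])
wide_key = model.key(pan=10.0, tilt=20.0, focal_length=5.0)
zoom_key = model.key(pan=10.0, tilt=20.0, focal_length=50.0)
model.models[wide_key] = [[10, 10], [10, 10]]
mask = foreground_mask([[10, 60], [12, 10]], model.models[wide_key])
```

Using more tiles at higher zoom levels reflects the narrower field of view at long focal lengths, so each stored partial scene covers a comparable image area.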
