Your browser doesn't support javascript.
loading
Show: 20 | 50 | 100
Results 1 - 4 de 4
Filter
Add more filters










Database
Language
Publication year range
1.
Article in English | MEDLINE | ID: mdl-38607710

ABSTRACT

Audio-visual video recognition (AVVR) aims to integrate audio and visual clues to categorize videos accurately. While existing methods train AVVR models using provided datasets and achieve satisfactory results, they struggle to retain historical class knowledge when confronted with new classes in real-world situations. Currently, there are no dedicated methods for addressing this problem, so this paper concentrates on exploring Class Incremental Audio-Visual Video Recognition (CIAVVR). For CIAVVR, since both stored data and learned model of past classes contain historical knowledge, the core challenge is how to capture past data knowledge and past model knowledge to prevent catastrophic forgetting. As audio-visual data and model inherently contain hierarchical structures, i.e., model embodies low-level and high-level semantic information, and data comprises snippet-level, video-level, and distribution-level spatial information, it is essential to fully exploit the hierarchical data structure for data knowledge preservation and hierarchical model structure for model knowledge preservation. However, current image class incremental learning methods do not explicitly consider these hierarchical structures in model and data. Consequently, we introduce Hierarchical Augmentation and Distillation (HAD), which comprises the Hierarchical Augmentation Module (HAM) and Hierarchical Distillation Module (HDM) to efficiently utilize the hierarchical structure of data and models, respectively. Specifically, HAM implements a novel augmentation strategy, segmental feature augmentation, to preserve hierarchical model knowledge. Meanwhile, HDM introduces newly designed hierarchical (video-distribution) logical distillation and hierarchical (snippet-video) correlative distillation to capture and maintain the hierarchical intra-sample knowledge of each data and the hierarchical inter-sample knowledge between data, respectively. Evaluations on four benchmarks (AVE, AVK-100, AVK-200, and AVK-400) demonstrate that the proposed HAD effectively captures hierarchical information in both data and models, resulting in better preservation of historical class knowledge and improved performance. Furthermore, we provide a theoretical analysis to support the necessity of the segmental feature augmentation strategy.

2.
IEEE Trans Image Process ; 33: 1070-1079, 2024.
Article in English | MEDLINE | ID: mdl-38285573

ABSTRACT

Text field labelling plays a key role in Key Information Extraction (KIE) from structured document images. However, existing methods ignore the field drift and outlier problems, which limit their performance and make them less robust. This paper casts the text field labelling problem into a partial graph matching problem and proposes an end-to-end trainable framework called Deep Partial Graph Matching (dPGM) for the one-shot KIE task. It represents each document as a graph and estimates the correspondence between text fields from different documents by maximizing the graph similarity of different documents. Our framework obtains a strict one-to-one correspondence by adopting a combinatorial solver module with an extra one-to-(at most)-one mapping constraint to do the exact graph matching, which leads to the robustness of the field drift problem and the outlier problem. Finally, a large one-shot KIE dataset named DKIE is collected and annotated to promote research of the KIE task. This dataset will be released to the research and industry communities. Extensive experiments on both the public and our new DKIE datasets show that our method can achieve state-of-the-art performance and is more robust than existing methods.

3.
IEEE Trans Image Process ; 26(9): 4182-4192, 2017 Sep.
Article in English | MEDLINE | ID: mdl-28541200

ABSTRACT

In the literature, most existing graph-based semi-supervised learning methods only use the label information of observed samples in the label propagation stage, while ignoring such valuable information when learning the graph. In this paper, we argue that it is beneficial to consider the label information in the graph learning stage. Specifically, by enforcing the weight of edges between labeled samples of different classes to be zero, we explicitly incorporate the label information into the state-of-the-art graph learning methods, such as the low-rank representation (LRR), and propose a novel semi-supervised graph learning method called semi-supervised low-rank representation. This results in a convex optimization problem with linear constraints, which can be solved by the linearized alternating direction method. Though we take LRR as an example, our proposed method is in fact very general and can be applied to any self-representation graph learning methods. Experiment results on both synthetic and real data sets demonstrate that the proposed graph learning method can better capture the global geometric structure of the data, and therefore is more effective for semi-supervised learning tasks.

4.
IEEE Trans Image Process ; 24(11): 3717-28, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26057712

ABSTRACT

This paper aims at constructing a good graph to discover the intrinsic data structures under a semisupervised learning setting. First, we propose to build a nonnegative low-rank and sparse (referred to as NNLRS) graph for the given data representation. In particular, the weights of edges in the graph are obtained by seeking a nonnegative low-rank and sparse reconstruction coefficients matrix that represents each data sample as a linear combination of others. The so-obtained NNLRS-graph captures both the global mixture of subspaces structure (by the low-rankness) and the locally linear structure (by the sparseness) of the data, hence it is both generative and discriminative. Second, as good features are extremely important for constructing a good graph, we propose to learn the data embedding matrix and construct the graph simultaneously within one framework, which is termed as NNLRS with embedded features (referred to as NNLRS-EF). Extensive NNLRS experiments on three publicly available data sets demonstrate that the proposed method outperforms the state-of-the-art graph construction method by a large margin for both semisupervised classification and discriminative analysis, which verifies the effectiveness of our proposed method.

SELECTION OF CITATIONS
SEARCH DETAIL
...