Search | BVS Regional Portal

Learning Graph Embeddings for Open World Compositional Zero-Shot Learning.

Mancini, Massimiliano; Naeem, Muhammad Ferjad; Xian, Yongqin; Akata, Zeynep.

IEEE Trans Pattern Anal Mach Intell ; PP, 2022 Mar 30.

Article in English | MEDLINE | ID: mdl-35353693

ABSTRACT

Compositional Zero-Shot Learning (CZSL) aims to recognize unseen compositions of state and object visual primitives seen during training. A problem with standard CZSL is the assumption of knowing which unseen compositions will be available at test time. In this work, we overcome this assumption by operating in the open world setting, where no limit is imposed on the compositional space at test time, and the search space contains a large number of unseen compositions. To address this problem, we propose a new approach, Compositional Cosine Graph Embedding (Co-CGE), based on two principles. First, Co-CGE models the dependency between states, objects, and their compositions through a graph convolutional neural network. The graph propagates information from seen to unseen concepts, improving their representations. Second, since not all unseen compositions are equally feasible, and less feasible ones may damage the learned representations, Co-CGE estimates a feasibility score for each unseen composition, using the scores as margins in a cosine similarity-based loss and as weights in the adjacency matrix of the graphs. Experiments show that our approach achieves state-of-the-art performance in standard CZSL while outperforming previous methods in the open world scenario.
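The two principles in this abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names (`cosine`, `propagate`, `margin_cosine_loss`), the temperature value, the parameter-free averaging step that stands in for a learned graph convolution, and the sign convention for the margin (subtracted from the score of unseen compositions) are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def propagate(adjacency, features):
    """One weighted-averaging step over the graph (no learned weights).

    adjacency[i][j] is the weight of the edge from node j into node i.
    Assumption: a feasibility score would serve as the weight of edges
    that touch an unseen composition. Every row needs a nonzero sum,
    for example through a self-loop.
    """
    dim = len(features[0])
    out = []
    for row in adjacency:
        total = sum(row)
        out.append([
            sum(w * f[d] for w, f in zip(row, features)) / total
            for d in range(dim)
        ])
    return out


def margin_cosine_loss(image_feat, comp_embeds, target, margins, temperature=0.05):
    """Cross-entropy over cosine logits with a per-composition margin.

    margins[i] is subtracted from the score of composition i.
    Assumption: 0 for seen compositions, a scaled feasibility score
    for unseen ones.
    """
    logits = [
        (cosine(image_feat, e) - m) / temperature
        for e, m in zip(comp_embeds, margins)
    ]
    top = max(logits)  # subtract the max for numerical stability
    log_norm = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_norm - logits[target]
```

Under this convention, a larger margin on an unseen composition lowers its logit, so the loss pushes it away from the training image less strongly than it would without the margin.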

Generalized Few-Shot Video Classification With Video Retrieval and Feature Generation.

Xian, Yongqin; Korbar, Bruno; Douze, Matthijs; Torresani, Lorenzo; Schiele, Bernt; Akata, Zeynep.

IEEE Trans Pattern Anal Mach Intell ; 44(12): 8949-8961, 2022 Dec.

Article in English | MEDLINE | ID: mdl-34652997

ABSTRACT

Few-shot learning aims to recognize novel classes from a few examples. Although significant progress has been made in the image domain, few-shot video classification is relatively unexplored. We argue that previous methods underestimate the importance of video feature learning and propose to learn spatiotemporal features using a 3D CNN. Proposing a two-stage approach that learns video features on base classes followed by fine-tuning the classifiers on novel classes, we show that this simple baseline approach outperforms prior few-shot video classification methods by over 20 points on existing benchmarks. To circumvent the need for labeled examples, we present two novel approaches that yield further improvement. First, we leverage tag-labeled videos from a large dataset using tag retrieval followed by selecting the best clips with visual similarities. Second, we learn generative adversarial networks that generate video features of novel classes from their semantic embeddings. Moreover, we find existing benchmarks are limited because they only focus on 5 novel classes in each testing episode and introduce more realistic benchmarks by involving more novel classes, i.e., few-shot learning, as well as a mixture of novel and base classes, i.e., generalized few-shot learning. The experimental results show that our retrieval and feature generation approaches significantly outperform the baseline approach on the new benchmarks.
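The clip-selection step of the retrieval approach can be sketched as follows. The abstract does not say which similarity measure is used or how it is aggregated, so this sketch assumes cosine similarity to the mean feature of the few labeled examples; `select_best_clips` and its parameters are hypothetical names, and the feature vectors stand in for the output of a video backbone.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def select_best_clips(candidate_feats, support_feats, k):
    """Return the indices of the k candidate clips closest to the support set.

    candidate_feats: features of clips retrieved by tag (hypothetical input).
    support_feats:   features of the few labeled examples of the novel class.
    Assumption: similarity is measured against the mean support feature.
    """
    dim = len(support_feats[0])
    mean = [sum(f[d] for f in support_feats) / len(support_feats) for d in range(dim)]
    ranked = sorted(
        range(len(candidate_feats)),
        key=lambda i: cosine(candidate_feats[i], mean),
        reverse=True,
    )
    return ranked[:k]
```

The selected clips would then be added to the training set of the novel class before the classifier is fine-tuned.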

Zero-Shot Learning-A Comprehensive Evaluation of the Good, the Bad and the Ugly.

Xian, Yongqin; Lampert, Christoph H; Schiele, Bernt; Akata, Zeynep.

IEEE Trans Pattern Anal Mach Intell ; 41(9): 2251-2265, 2019 Sep.

Article in English | MEDLINE | ID: mdl-30028691

ABSTRACT

Due to the importance of zero-shot learning, i.e., classifying images where there is a lack of labeled training data, the number of proposed approaches has recently increased steadily. We argue that it is time to take a step back and to analyze the status quo of the area. The purpose of this paper is three-fold. First, given that there is no agreed-upon zero-shot learning benchmark, we define a new benchmark by unifying both the evaluation protocols and data splits of publicly available datasets used for this task. This is an important contribution as published results are often not comparable and sometimes even flawed due to, e.g., pre-training on zero-shot test classes. Moreover, we propose a new zero-shot learning dataset, the Animals with Attributes 2 (AWA2) dataset, which we make publicly available both in terms of image features and the images themselves. Second, we compare and analyze a significant number of the state-of-the-art methods in depth, both in the classic zero-shot setting and in the more realistic generalized zero-shot setting. Finally, we discuss in detail the limitations of the current status of the area, which can be taken as a basis for advancing it.
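The abstract does not name the metric behind the unified evaluation protocol. A common choice in generalized zero-shot evaluation is per-class averaged top-1 accuracy, computed separately on seen and unseen classes and summarized by their harmonic mean. The sketch below illustrates that choice under this assumption; the function names are hypothetical.

```python
from collections import defaultdict


def per_class_accuracy(y_true, y_pred, classes):
    """Average of per-class top-1 accuracies over the given classes.

    Averaging per class keeps frequent classes from dominating the score.
    Classes with no test samples are skipped.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    accs = [correct[c] / total[c] for c in classes if total[c] > 0]
    if not accs:
        raise ValueError("none of the given classes appear in y_true")
    return sum(accs) / len(accs)


def harmonic_mean(acc_seen, acc_unseen):
    """Harmonic mean of seen-class and unseen-class accuracy."""
    if acc_seen + acc_unseen == 0:
        return 0.0
    return 2 * acc_seen * acc_unseen / (acc_seen + acc_unseen)
```

The harmonic mean is high only when both accuracies are high, so a model that predicts only seen classes scores close to zero in the generalized setting.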
