Results 1 - 4 of 4
1.
IEEE Trans Pattern Anal Mach Intell ; 45(10): 12167-12178, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37339038

ABSTRACT

In zero-shot learning (ZSL), the task of recognizing categories for which no training data are available, state-of-the-art methods generate visual features from semantic auxiliary information (e.g., attributes). In this work, we propose a simpler yet better-scoring alternative for the very same task. We observe that, if the first- and second-order statistics of the classes to be recognized were known, sampling from Gaussian distributions would synthesize visual features that are, for classification purposes, almost indistinguishable from real ones. We propose a novel mathematical framework to estimate these statistics, even for unseen classes: it builds upon prior compatibility functions for ZSL and requires no additional training. Endowed with such statistics, we solve the feature generation stage by sampling from a pool of class-specific Gaussian distributions. We use an ensemble mechanism to aggregate a pool of softmax classifiers, each trained in a one-seen-class-out fashion, to better balance performance over seen and unseen classes. Neural distillation finally fuses the ensemble into a single architecture that performs inference in one forward pass. Our method, termed Distilled Ensemble of Gaussian Generators, compares favorably with the state of the art.
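
As a loose illustration of the feature generation stage described above, the Python sketch below samples synthetic visual features from class-specific Gaussians and fits a softmax classifier on them. The per-class statistics are simulated here (the paper estimates them from ZSL compatibility functions), and the one-seen-class-out ensemble and distillation steps are omitted.

# Minimal sketch: sample visual features from class-specific Gaussians,
# then train a softmax (multinomial logistic) classifier on them.
# Class means/covariances are simulated, NOT estimated as in the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_classes, feat_dim, n_samples = 5, 64, 200

# Hypothetical first- and second-order statistics per (unseen) class.
means = rng.normal(size=(n_classes, feat_dim))
covs = [np.eye(feat_dim) * rng.uniform(0.5, 1.5) for _ in range(n_classes)]

# Feature generation stage: sample synthetic visual features per class.
X = np.vstack([rng.multivariate_normal(means[c], covs[c], n_samples)
               for c in range(n_classes)])
y = np.repeat(np.arange(n_classes), n_samples)

# Train a softmax classifier on the sampled features.
clf = LogisticRegression(max_iter=1000).fit(X, y)

# At test time, real visual features would be classified directly.
test = rng.multivariate_normal(means[2], covs[2], 10)
print(clf.predict(test))  # ideally mostly class 2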

2.
IEEE Trans Pattern Anal Mach Intell ; 45(10): 12038-12049, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37134033

ABSTRACT

We propose an end-to-end solution to the problem of object localisation in partial scenes, where the goal is to estimate the position of an object in an unknown area given only a partial 3D scan of the scene. We introduce a novel scene representation to facilitate geometric reasoning, the Directed Spatial Commonsense Graph (D-SCG): a spatial scene graph enriched with additional concept nodes from a commonsense knowledge base. Specifically, the nodes of the D-SCG represent the scene objects and the edges encode their relative positions, and each object node is further connected, via commonsense relationships, to a set of concept nodes. With this graph-based scene representation, we estimate the unknown position of the target object using a Graph Neural Network that implements a sparse attentional message passing mechanism. The network first predicts the relative position between the target object and each visible object, learning a rich object representation by aggregating both the object nodes and the concept nodes of the D-SCG. These relative positions are then merged to obtain the final position. We evaluate our method on Partial ScanNet, improving the state of the art by 5.9% in localisation accuracy while training 8x faster.
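
The toy sketch below illustrates one round of sparse attentional message passing over a small directed graph mixing object and concept nodes, in the spirit of the D-SCG. The node features, edge list, dimensions, and single-head attention are invented for illustration and do not reproduce the paper's architecture.

# Toy sketch: one round of attentional message passing over a graph
# whose nodes mix scene objects and commonsense concepts.
import numpy as np

rng = np.random.default_rng(1)
d = 16
# Nodes 0-2: visible objects; node 3: target object; nodes 4-5: concepts.
h = rng.normal(size=(6, d))
edges = [(0, 3), (1, 3), (2, 3), (4, 0), (4, 3), (5, 1)]  # directed src -> dst

W = rng.normal(size=(d, d)) * 0.1    # shared message transform
a = rng.normal(size=(2 * d,)) * 0.1  # attention parameters

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Each destination node aggregates transformed neighbour features,
# weighted by attention scores over (destination, message) pairs.
h_new = h.copy()
for dst in range(len(h)):
    srcs = [s for s, t in edges if t == dst]
    if not srcs:
        continue
    msgs = np.stack([h[s] @ W for s in srcs])
    scores = np.array([a @ np.concatenate([h[dst], m]) for m in msgs])
    h_new[dst] = np.tanh((softmax(scores)[:, None] * msgs).sum(axis=0))

# The target's updated embedding would feed a head predicting relative
# positions to each visible object, later merged into one final position.
print(h_new[3][:4])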

3.
IEEE Trans Image Process ; 31: 7102-7115, 2022.
Article in English | MEDLINE | ID: mdl-36346862

ABSTRACT

Acoustic images are an emergent data modality for multimodal scene understanding. Such images have the peculiarity of distinguishing the spectral signature of sound coming from different directions in space, thus providing richer information than that derived from single or binaural microphones. However, acoustic images are typically generated by cumbersome and costly microphone arrays, which are far less widespread than ordinary microphones. This paper shows that it is still possible to generate acoustic images from off-the-shelf cameras equipped with only a single microphone, and that these images can be exploited for audio-visual scene understanding. We propose three architectures, inspired by Variational Autoencoders, U-Net, and adversarial models, and assess their advantages and drawbacks. The models are trained to generate spatialized audio by conditioning on the associated video sequence and its corresponding monaural audio track, using data collected by a microphone array as ground truth; they thus learn to mimic the output of a microphone array under the very same conditions. We assess the quality of the generated acoustic images with standard generation metrics and on different downstream tasks (classification, cross-modal retrieval, and sound localization). We also evaluate our models on multimodal datasets containing acoustic images, as well as on datasets containing only monaural audio signals and RGB video frames. In all of the addressed downstream tasks, the generated acoustic data yield notable performance compared to the state of the art and to the results obtained using real acoustic images as input. A sketch of the conditioning setup follows the subject headings below.


Subjects
Acoustics, Sound Localization
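
The sketch below shows one plausible shape of the conditioning setup: a small PyTorch encoder-decoder takes a video frame plus a monaural spectrogram and regresses a multi-channel spatialized acoustic map. All layer choices, channel counts, and tensor sizes are placeholders, not the VAE/U-Net/adversarial architectures evaluated in the paper.

# Sketch: condition on a video frame and a mono spectrogram, regress a
# spatialized acoustic map. Shapes and layers are illustrative only.
import torch
import torch.nn as nn

class Mono2AcousticImage(nn.Module):
    def __init__(self, map_ch=36):
        super().__init__()
        self.video_enc = nn.Sequential(  # encodes an RGB frame
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.audio_enc = nn.Sequential(  # encodes a mono spectrogram
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(    # fuses and decodes to the map
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, map_ch, 4, stride=2, padding=1))

    def forward(self, frame, spec):
        v = self.video_enc(frame)                      # (B, 32, H/4, W/4)
        a = self.audio_enc(spec)
        a = nn.functional.interpolate(a, size=v.shape[-2:])
        return self.decoder(torch.cat([v, a], dim=1))  # spatialized channels

frame = torch.randn(2, 3, 64, 64)  # video frames
spec = torch.randn(2, 1, 64, 48)   # mono spectrograms
out = Mono2AcousticImage()(frame, spec)
print(out.shape)  # torch.Size([2, 36, 64, 64])
# Training would minimise a reconstruction loss (e.g., MSE) against
# acoustic images captured by the microphone array as ground truth.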
4.
Data Brief ; 44: 108557, 2022 Oct.
Article in English | MEDLINE | ID: mdl-36111283

ABSTRACT

This data article presents an online yarn spinning dataset for the evaluation and benchmarking of image processing algorithms and computer vision models for imaging-based testing of textile yarn quality. The dataset comprises continuous yarn spinning videos of 59.05 tex, 29.5 tex, and 14.76 tex cotton yarns, recorded during yarn production on a ring spinning frame using a customised image acquisition system. Three videos, each covering 250 meters of yarn, were recorded for each of the three yarn varieties. Each yarn spinning video is 29.26 gigabytes in size and contains 20,200 image frames. After image acquisition, each yarn sample was physically tested on an industrial yarn quality tester to generate ground-truth labels for various yarn quality parameters. The dataset was recently used to validate computer vision models for the online detection of nep-like defects in the yarn spinning process, by comparing defect counts with the ground-truth labels [1]. Similarly, this dataset can be used in the future to evaluate the performance of a variety of other imaging-based online and offline yarn quality testing and defect detection systems.
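
As a hypothetical usage example, the snippet below iterates frames from one of the yarn spinning videos with OpenCV and hands each frame to a defect-detection hook. The file name and the detect_neps stub are placeholders, not part of the dataset release or the validated models of [1].

# Hypothetical consumption sketch for the yarn spinning videos.
import cv2

def iter_frames(path):
    cap = cv2.VideoCapture(path)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            yield frame
    finally:
        cap.release()

def detect_neps(frame):
    # Placeholder: a real detector would segment the yarn and flag
    # thickness anomalies, then compare counts to ground-truth labels.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return int(gray.mean() > 128)

# "yarn_29_5_tex_run1.avi" is an invented file name for illustration.
count = sum(detect_neps(f) for f in iter_frames("yarn_29_5_tex_run1.avi"))
print("flagged frames:", count)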
