Results 1 - 3 of 3
1.
Neural Netw ; 176: 106352, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38713968

ABSTRACT

Template matching pose estimation methods based on deep learning have made significant advances via metric learning or reconstruction learning. Existing approaches primarily build a distinct template representation library (codebook) from rendered images for each object, which complicates the training process and increases memory cost in multi-object tasks. They also struggle to handle discrepancies between the distributions of the training and test sets, particularly for occluded objects, resulting in suboptimal matching accuracy. In this study, we propose a shared template representation learning method with augmented semantic features to address these issues. Our method learns representations concurrently, using metric and reconstruction learning as similarity constraints, and augments the network's response to objects through semantic feature constraints for better generalization. Furthermore, rotation matrices serve as the templates for codebook construction, yielding excellent matching accuracy compared to rendered images. Notably, this effectively decouples object categories from templates, so that only a single shared codebook needs to be maintained in multi-object pose estimation tasks. Extensive experiments on the Linemod, Linemod-Occluded, and TLESS datasets demonstrate that the proposed method with shared templates achieves superior matching accuracy. Moreover, the proposed method is robust on a collected aircraft dataset, further validating its efficacy.
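
The abstract gives no implementation details; purely as an illustration of the shared-codebook idea, the following minimal PyTorch sketch (all names, dimensions, and the cosine-similarity metric are assumptions, not the authors' code) matches a query embedding against a single codebook of rotation-matrix templates shared by all objects:

    import torch
    import torch.nn.functional as F

    # Hypothetical shared codebook: one entry per template rotation, reused
    # for every object category instead of a per-object library.
    num_templates, feat_dim = 1024, 128
    rotations = torch.linalg.qr(torch.randn(num_templates, 3, 3)).Q        # random rotations as stand-in templates
    embeddings = F.normalize(torch.randn(num_templates, feat_dim), dim=1)  # placeholder for learned template embeddings

    def match_pose(query_embedding: torch.Tensor) -> torch.Tensor:
        """Return the rotation of the best-matching template under cosine similarity."""
        q = F.normalize(query_embedding, dim=0)
        scores = embeddings @ q            # (num_templates,) similarity to each template
        return rotations[scores.argmax()]  # (3, 3) estimated rotation matrix

    # Usage: embed an image crop with any backbone, then look up its pose.
    rotation_estimate = match_pose(torch.randn(feat_dim))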


Subjects
Deep Learning; Neural Networks, Computer; Humans; Image Processing, Computer-Assisted/methods; Semantics; Algorithms
2.
IEEE Trans Neural Netw Learn Syst ; 33(1): 257-269, 2022 Jan.
Article in English | MEDLINE | ID: mdl-33074828

ABSTRACT

To harvest small networks with high accuracy, most existing methods use compression techniques such as low-rank decomposition and pruning to compress a trained large model into a small network, or transfer knowledge from a powerful large model (teacher) to a small network (student). Despite their success in producing small, high-performing models, the dependence on accompanying assistive models complicates the training process and increases memory and time costs. In this article, we propose an elegant self-distillation (SD) mechanism that obtains high-accuracy models directly, without going through an assistive model. Inspired by invariant recognition in the human visual system, we assume that different distorted instances of the same input should possess similar high-level representations, so the network can learn representation invariance across distorted versions of the same sample. Specifically, in our SD-based learning algorithm, a single network uses the maximum mean discrepancy metric to enforce global feature consistency and the Kullback-Leibler divergence to constrain posterior class probability consistency across the different distorted branches. Extensive experiments on the MNIST, CIFAR-10/100, and ImageNet datasets demonstrate that the proposed method effectively reduces the generalization error of various network architectures, such as AlexNet, VGGNet, ResNet, Wide ResNet, and DenseNet, and outperforms existing model distillation methods with little extra training effort.
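
As a rough sketch of the loss described above (assuming a model that returns both features and logits, and substituting a linear-kernel MMD for whatever kernel the paper actually uses), the training objective for two distorted views of the same batch might look like:

    import torch
    import torch.nn.functional as F

    def mmd_linear(f1: torch.Tensor, f2: torch.Tensor) -> torch.Tensor:
        """Maximum mean discrepancy with a linear kernel (a simplification)."""
        return (f1.mean(dim=0) - f2.mean(dim=0)).pow(2).sum()

    def self_distillation_loss(model, x1, x2, labels, alpha=0.1, beta=1.0):
        """x1 and x2 are two distorted versions of the same input batch."""
        feats1, logits1 = model(x1)   # model assumed to return (features, logits)
        feats2, logits2 = model(x2)
        ce = F.cross_entropy(logits1, labels) + F.cross_entropy(logits2, labels)
        feat_consistency = mmd_linear(feats1, feats2)        # global feature consistency
        prob_consistency = F.kl_div(F.log_softmax(logits1, dim=1),
                                    F.softmax(logits2, dim=1),
                                    reduction="batchmean")   # posterior class consistency
        return ce + alpha * feat_consistency + beta * prob_consistency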

3.
IEEE Trans Neural Netw Learn Syst ; 32(2): 799-813, 2021 Feb.
Article in English | MEDLINE | ID: mdl-32275616

ABSTRACT

Channel pruning is an effective technique that has been widely applied to deep neural network compression. However, many existing methods prune from a pretrained model, resulting in repetitious pruning and fine-tuning processes. In this article, we propose a dynamic channel pruning method that prunes unimportant channels at an early stage of training. Rather than relying on indirect criteria (e.g., weight norm, absolute weight sum, or reconstruction error) to guide connection or channel pruning, we design criteria directly related to the final accuracy of the network to evaluate the importance of each channel. Specifically, a channelwise gate randomly enables or disables each channel so that the conditional accuracy change (CAC) can be estimated when that channel is disabled. In practice, we construct two effective and efficient criteria to dynamically estimate the CAC at each training iteration, so unimportant channels can be gradually pruned as training proceeds. Finally, extensive experiments on multiple datasets (ImageNet, CIFAR, and MNIST) with various networks (ResNet, VGG, and MLP) demonstrate that the proposed method effectively reduces the parameters and computations of the baseline network while yielding higher or competitive accuracy. Interestingly, if we Double the initial Channels and then Prune Half (DCPH) of them to match the baseline's channel count, the network enjoys a remarkable performance improvement by settling into a more desirable structure.
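
A minimal sketch of such a channelwise gate follows (PyTorch; the gate probability, the importance buffer, and the update rule are assumptions, since the abstract does not specify the paper's two CAC criteria):

    import torch
    import torch.nn as nn

    class ChannelGate(nn.Module):
        """Randomly disables channels during training so the training loop can
        observe how the loss changes with each channel off (a stand-in for the
        paper's conditional-accuracy-change criteria)."""

        def __init__(self, num_channels: int, drop_prob: float = 0.1):
            super().__init__()
            self.drop_prob = drop_prob
            # Running per-channel score; channels whose score stays near zero
            # are pruning candidates (assumed bookkeeping, not the paper's rule).
            self.register_buffer("importance", torch.zeros(num_channels))

        def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (N, C, H, W)
            if not self.training:
                return x
            gate = (torch.rand(x.size(1), device=x.device) > self.drop_prob).float()
            self.last_gate = gate                 # kept so the training loop can
            return x * gate.view(1, -1, 1, 1)     # credit or blame disabled channels

    # The training loop (not shown) would compare losses with a channel enabled
    # versus disabled, accumulate the difference into `importance`, and prune
    # channels whose estimated contribution to accuracy remains negligible.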


Subjects
Deep Learning; Neural Networks, Computer; Algorithms; Artificial Intelligence; Computer Systems; Data Compression; Databases, Factual; Humans; Image Processing, Computer-Assisted/methods; Pattern Recognition, Automated