Search | BVS Regional Portal

Purely Attention Based Local Feature Integration for Video Classification.

Long, Xiang; de Melo, Gerard; He, Dongliang; Li, Fu; Chi, Zhizhen; Wen, Shilei; Gan, Chuang.

IEEE Trans Pattern Anal Mach Intell ; 44(4): 2140-2154, 2022 Apr.

Article in English | MEDLINE | ID: mdl-33026984

ABSTRACT

Recently, substantial research effort has focused on how to apply CNNs or RNNs to better capture temporal patterns in videos, so as to improve the accuracy of video classification. In this paper, we investigate the potential of purely attention-based local feature integration. Accounting for the characteristics of such features in video classification, we first propose Basic Attention Clusters (BAC), which concatenates the outputs of multiple attention units applied in parallel, and introduce a shifting operation to capture more diverse signals. Experiments show that BAC can achieve excellent results on multiple datasets. However, BAC treats all feature channels as an indivisible whole, which is suboptimal for achieving a finer-grained local feature integration over the channel dimension. Additionally, it treats the entire local feature sequence as an unordered set, thus ignoring the sequential relationships. To improve over BAC, we further propose the channel pyramid attention schema, which splits features into sub-features at multiple scales for coarse-to-fine sub-feature interaction modeling, and the temporal pyramid attention schema, which divides the feature sequences into ordered sub-sequences of multiple lengths to account for the sequential order. Our final model, pyramid×pyramid attention clusters (PPAC), combines both channel pyramid attention and temporal pyramid attention to focus on the most important sub-features, while also preserving the temporal information of the video. We demonstrate the effectiveness of PPAC on seven real-world video classification datasets. Our model achieves competitive results across all of these, showing that our proposed framework can consistently outperform the existing local feature integration methods across a range of different scenarios.
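The BAC idea described in the abstract (several attention units applied in parallel over the local features, each followed by a shifting operation, with the outputs concatenated) can be illustrated with a minimal NumPy sketch. The abstract does not give formulas, so everything here is an assumption made for illustration: the linear scoring vector `w`, the form of the shift (a scalar scale `alpha` and offset `beta` followed by L2 normalization), and the names `attention_unit` and `attention_cluster` are hypothetical and are not taken from the paper.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()


def attention_unit(X, w, alpha=1.0, beta=0.0):
    """One attention unit over T local features X of shape (T, D).

    Assumed form: linear scores, softmax weights, weighted sum, then a
    scalar scale/offset "shift" and L2 normalization.
    """
    weights = softmax(X @ w)         # (T,) attention weights over the features
    pooled = weights @ X             # (D,) attention-weighted sum of features
    shifted = alpha * pooled + beta  # assumed shifting operation
    norm = np.linalg.norm(shifted)
    return shifted / norm if norm > 0 else shifted


def attention_cluster(X, W, alphas, betas):
    """Concatenate the outputs of N parallel attention units (W is N x D)."""
    units = [attention_unit(X, w, a, b) for w, a, b in zip(W, alphas, betas)]
    return np.concatenate(units)     # (N * D,)


# Toy data: 8 local features of dimension 16, 4 attention units.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 16))
W = rng.normal(size=(4, 16))
alphas = np.ones(4)
betas = np.linspace(-0.5, 0.5, 4)

out = attention_cluster(X, W, alphas, betas)

# Shuffling the feature order leaves the output unchanged.
out_shuffled = attention_cluster(X[rng.permutation(len(X))], W, alphas, betas)
```

Under these assumptions the representation does not change when the local features are reordered, which matches the abstract's remark that BAC treats the feature sequence as an unordered set; the temporal pyramid schema is described as the remedy for this.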

Subjects

Algorithms; Neural Networks, Computer

Employing Shadows for Multi-Person Tracking Based on a Single RGB-D Camera.

Gai, Wei; Qi, Meng; Ma, Mingcong; Wang, Lu; Yang, Chenglei; Liu, Juan; Bian, Yulong; de Melo, Gerard; Liu, Shijun; Meng, Xiangxu.

Sensors (Basel) ; 20(4), 2020 Feb 15.

Article in English | MEDLINE | ID: mdl-32075274

ABSTRACT

Although there are many algorithms to track people that are walking, existing methods mostly fail to cope with occluded bodies in the setting of multi-person tracking with one camera. In this paper, we propose a method to use people's shadows as a clue to track them instead of treating shadows as mere noise. We introduce a novel method to track multiple people by fusing shadow data from the RGB image with skeleton data, both of which are captured by a single RGB Depth (RGB-D) camera. Skeletal tracking provides the positions of people that can be captured directly, while their shadows are used to track them when they are no longer visible. Our experiments confirm that this method can efficiently handle full occlusions. It thus has substantial value in resolving the occlusion problem in multi-person tracking, even with other kinds of cameras.

Subjects

Pattern Recognition, Automated; Photography/instrumentation; Algorithms; Humans; Motion; Time Factors
