A Robust Visual Tracking Method Based on Reconstruction Patch Transformer Tracking.
Sensors (Basel); 22(17), 2022 Aug 31.
Article in English | MEDLINE | ID: mdl-36081017
Recently, the transformer model has been extended from visual classification to target tracking, where it is mainly used to replace the cross-correlation operation of Siamese trackers. The backbone of the network is still a convolutional neural network (CNN). However, existing transformer-based trackers simply reshape the features extracted by the CNN into patches and feed them into the transformer encoder. Each patch contains a single spatial element of the extracted feature map, and the patches are passed to the transformer, which uses cross-attention in place of cross-correlation. This paper proposes a reconstruction patch strategy that combines multiple spatial elements of the extracted features into a new patch. The reconstruction operation has the following advantages: (1) adjacent elements are combined so that the correlation between them is retained, and the features extracted by the CNN remain usable for classification and regression; (2) using the Performer operation reduces the network computation and the dimension of the patch sent to the transformer, thereby sharply reducing the number of network parameters and improving tracking speed.
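To make the contrast between single-element patches and reconstructed patches concrete, here is a minimal NumPy sketch. The abstract does not state the exact regrouping rule, so this assumes non-overlapping k × k blocks of neighbouring spatial elements (a space-to-depth style regrouping); the function names, `k`, and the feature-map sizes are hypothetical and for illustration only, not the authors' code.

```python
import numpy as np


def flatten_to_tokens(feat: np.ndarray) -> np.ndarray:
    """Baseline described in the abstract: every spatial element of a
    (C, H, W) feature map becomes its own token of length C."""
    c, h, w = feat.shape
    return feat.reshape(c, h * w).T  # shape (H*W, C)


def reconstruct_patches(feat: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical reconstruction (assumption): merge each non-overlapping
    k x k block of neighbouring spatial elements into one token of length k*k*C."""
    c, h, w = feat.shape
    if h % k or w % k:
        raise ValueError("H and W must be divisible by k")
    # Split H and W into (block index, offset within block):
    # feat[ch, i*k + a, j*k + b] -> blocks[ch, i, a, j, b]
    blocks = feat.reshape(c, h // k, k, w // k, k)
    # Reorder to (i, j, a, b, ch) so each block's values are contiguous.
    blocks = blocks.transpose(1, 3, 2, 4, 0)
    return blocks.reshape((h // k) * (w // k), k * k * c)


def attention_score_entries(tokens: np.ndarray) -> int:
    """Number of entries in the N x N attention-score matrix for N tokens."""
    n = tokens.shape[0]
    return n * n


# Illustrative sizes only (not taken from the paper).
rng = np.random.default_rng(0)
feat = rng.standard_normal((256, 16, 16)).astype(np.float32)

base = flatten_to_tokens(feat)          # (256, 256)
recon = reconstruct_patches(feat, k=2)  # (64, 1024)
print(base.shape, attention_score_entries(base))    # (256, 256) 65536
print(recon.shape, attention_score_entries(recon))  # (64, 1024) 4096
```

Grouping 2 × 2 neighbours gives four times fewer tokens, so the attention-score matrix shrinks sixteenfold, while each token grows to k·k·C values. The abstract attributes the reduction of that patch dimension to the Performer operation, which this sketch does not model; an overlapping grouping (stride smaller than k) would be an equally plausible reading.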
Database: MEDLINE
Main subject: Electric Power Supplies / Neural Networks, Computer
Language: English
Journal: Sensors (Basel)
Year of publication: 2022
Document type: Article
Country of affiliation: China
Country of publication: Switzerland