Results 1 - 5 of 5
1.
Article in English | MEDLINE | ID: mdl-37917525

ABSTRACT

Universal adversarial patch attacks, which are easy to implement, have been shown to fool real-world deep convolutional neural networks (CNNs), posing a serious threat to practical CNN-based computer vision systems. Unfortunately, current defenses remain understudied and face the following problems. Patch-detection-based methods suffer dramatic performance drops against white-box or adaptive attacks because they rely heavily on empirical clues. Methods based on adversarial training or certified defense are difficult to scale up to large datasets or complex practical networks due to prohibitively high computational overhead or overly strong assumptions on the network structure. In this article, we focus on two widely adopted universal adversarial patch attacks: the universal targeted attack on image classifiers and the universal vanishing attack on object detectors. We find that, for popular CNNs, the success of the adversarial patch relies on the feature vectors centered at the patch location having a large norm in classifiers and a large channel-aware norm (CA-Norm) in detectors, and we present a mathematical explanation for this phenomenon. Based on this, we propose a simple but effective defense using a feature norm suppressing (FNS) layer, which renormalizes the feature norm through nonincreasing functions. As a differentiable module, FNS can be adaptively inserted into various CNN architectures to achieve multistage suppression of large-norm feature vectors. Moreover, FNS is efficient, with no trainable parameters and very low computational overhead. We evaluate the proposed defense across multiple CNN architectures and datasets against strong adaptive white-box attacks in both visual classification and detection tasks. In both tasks, FNS significantly outperforms previous defenses in adversarial robustness while having relatively little impact on benign-image performance. Code is available at https://github.com/jschenthu/FNS.
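The abstract only names the mechanism, so the following is a minimal PyTorch sketch of what a norm-suppressing layer could look like: the rescaling factor is a nonincreasing function of the per-location feature norm, and the module has no trainable parameters. The clamp threshold `tau` and the insertion point are illustrative assumptions, not the paper's configuration.

```python
# Hypothetical sketch of a feature-norm-suppressing (FNS) layer, based only on
# the abstract: renormalize per-location feature norms with a nonincreasing
# function, with no trainable parameters. The suppression function and the
# threshold `tau` are assumptions.
import torch
import torch.nn as nn

class FNSLayer(nn.Module):
    """Rescales each spatial feature vector so its L2 norm is passed through
    a saturating function, suppressing abnormally large norms."""

    def __init__(self, tau: float = 10.0, eps: float = 1e-6):
        super().__init__()
        self.tau = tau   # norm threshold (assumed hyperparameter)
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, height, width)
        norm = x.norm(p=2, dim=1, keepdim=True)        # per-location L2 norm
        # The scaling factor min(n, tau) / n is nonincreasing in n, so
        # large-norm feature vectors are clamped while small ones pass through.
        suppressed = torch.clamp(norm, max=self.tau)
        return x * (suppressed / (norm + self.eps))

# Illustrative insertion after a convolutional stage:
backbone = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1),
    nn.ReLU(),
    FNSLayer(tau=10.0),
)
```

Because the module is differentiable and parameter-free, copies of it could in principle be dropped in after several stages of an existing backbone without retraining from scratch.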

2.
IEEE Trans Image Process; 31: 4278-4291, 2022.
Article in English | MEDLINE | ID: mdl-35709111

ABSTRACT

Monocular 3D human pose estimation is challenging due to depth ambiguity. Convolution-based and graph-convolution-based methods have been developed to extract 3D information from temporal cues in motion videos. Among lifting-based methods, most recent works adopt the transformer to model the temporal relationships in 2D keypoint sequences. These works usually treat all the joints of a skeleton as a whole and calculate temporal attention based on the overall characteristics of the skeleton. However, the human skeleton exhibits obvious part-wise inconsistency in motion patterns, so it is more appropriate to consider each part's temporal behavior separately. To handle such part-wise motion inconsistency, we propose the Part Aware Temporal Attention module, which extracts the temporal dependency of each part separately. Moreover, the conventional attention mechanism in 3D pose estimation usually calculates attention within a short time interval, so only correlations within the local temporal context are considered. Yet we find that the part-wise structure of the human skeleton repeats across different periods, actions, and even subjects, so part-wise correlations at a distance can be exploited to further boost 3D pose estimation. We thus propose the Part Aware Dictionary Attention module, which calculates attention for the part-wise features of the input against a dictionary containing multiple 3D skeletons sampled from the training set. Extensive experimental results show that our proposed part-aware attention mechanism helps a transformer-based model achieve state-of-the-art 3D pose estimation performance on two widely used public datasets. The code and trained models are released at https://github.com/thuxyz19/3D-HPE-PAA.
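As a rough illustration of the part-wise idea, the sketch below computes temporal self-attention separately for each body part of a 2D keypoint sequence. The five-part grouping of COCO-style joints and all dimensions are assumptions for demonstration; the paper's actual module and its dictionary attention are not reproduced here.

```python
# A minimal sketch of part-wise temporal attention: attention over time is
# computed separately per body part rather than for the whole skeleton.
# The part grouping and sizes are illustrative assumptions.
import torch
import torch.nn as nn

# Assumed grouping of 17 COCO-style joints into five parts.
PARTS = {
    "torso":     [0, 1, 2, 3, 4],
    "left_arm":  [5, 7, 9],
    "right_arm": [6, 8, 10],
    "left_leg":  [11, 13, 15],
    "right_leg": [12, 14, 16],
}

class PartTemporalAttention(nn.Module):
    def __init__(self, dim_per_joint: int = 2, embed_dim: int = 32, heads: int = 4):
        super().__init__()
        # One projection and one temporal attention module per part.
        self.proj = nn.ModuleDict({
            name: nn.Linear(len(idx) * dim_per_joint, embed_dim)
            for name, idx in PARTS.items()
        })
        self.attn = nn.ModuleDict({
            name: nn.MultiheadAttention(embed_dim, heads, batch_first=True)
            for name in PARTS
        })

    def forward(self, keypoints: torch.Tensor) -> torch.Tensor:
        # keypoints: (batch, frames, 17, 2) 2D pose sequence
        outputs = []
        for name, idx in PARTS.items():
            part = keypoints[:, :, idx, :].flatten(2)   # (B, T, joints * 2)
            tokens = self.proj[name](part)              # (B, T, embed_dim)
            out, _ = self.attn[name](tokens, tokens, tokens)
            outputs.append(out)
        return torch.cat(outputs, dim=-1)               # concatenated part features

# Example: a clip of 81 frames of 2D keypoints for a batch of 8 sequences.
seq = torch.randn(8, 81, 17, 2)
features = PartTemporalAttention()(seq)                 # (8, 81, 5 * 32)
```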


Subjects
Imaging, Three-Dimensional; Skeleton; Humans; Imaging, Three-Dimensional/methods; Motion (Physics)
3.
Article in English | MEDLINE | ID: mdl-33635791

ABSTRACT

Parkinson's disease (PD) is a common neurodegenerative disease that affects millions of people around the world. In clinical practice, freezing of gait (FoG) is a typical symptom used to assess the condition of PD patients. Currently, FoG is usually assessed through live observation or video analysis by doctors. Given aging societies, such a manual-inspection-based approach may place a serious burden on healthcare systems. In this study, we propose a purely video-based method to automatically detect the shuffling step, the most indistinguishable type of FoG. First, RGB silhouettes containing only the legs and feet are fed into a feature extraction module to obtain multi-level features, with 3D convolutions aggregating both temporal and spatial information. The multi-level features are then aggregated by a feature fusion module: skip connections preserve high-resolution information, and period-wise horizontal pyramid pooling fuses global context with local features. To validate the efficacy of our method, we built a dataset containing 268 normal gait samples and 362 shuffling step samples, on which our method achieves an average detection accuracy of 90.8%. Beyond shuffling step detection, we demonstrate that our method can also assess the severity of walking abnormality. Our proposal enables more frequent assessment of FoG with less manpower and lower cost, leading to more accurate monitoring of patients' condition.
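A hedged sketch of the described pipeline follows, with assumed layer sizes: silhouette clips pass through 3D convolutions, globally pooled features from both stages are concatenated as a crude stand-in for the paper's skip connections and period-wise horizontal pyramid pooling, and a binary head separates normal gait from shuffling steps.

```python
# Illustrative sketch only: a two-stage 3D CNN over leg/foot silhouette clips
# with multi-level feature fusion and a binary shuffling-step head. All layer
# sizes are assumptions for demonstration.
import torch
import torch.nn as nn

class ShufflingStepNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Two stages of spatiotemporal (3D) convolution.
        self.stage1 = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),
        )
        self.stage2 = nn.Sequential(
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(kernel_size=(2, 2, 2)),
        )
        # Fused multi-level features -> {normal gait, shuffling step}.
        self.head = nn.Linear(16 + 32, 2)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, 3, frames, height, width) RGB silhouettes of legs/feet
        f1 = self.stage1(clip)
        f2 = self.stage2(f1)
        # Fuse both levels by global average pooling and concatenation, a
        # simplified stand-in for the paper's skip connections and pooling.
        g1 = f1.mean(dim=(2, 3, 4))
        g2 = f2.mean(dim=(2, 3, 4))
        return self.head(torch.cat([g1, g2], dim=1))

logits = ShufflingStepNet()(torch.randn(2, 3, 16, 64, 64))  # (2, 2)
```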


Subjects
Gait Disorders, Neurologic; Neurodegenerative Diseases; Parkinson Disease; Gait; Gait Disorders, Neurologic/diagnosis; Humans; Parkinson Disease/diagnosis; Walking
4.
Article in English | MEDLINE | ID: mdl-32012014

ABSTRACT

Semantic attention has been shown to be effective in improving the performance of image captioning. The core of semantic-attention-based methods is to drive the model to attend to semantically important words, or attributes. In previous works, the attribute detector and the captioning network are usually independent, leading to insufficient use of the semantic information. Moreover, all the detected attributes are attended to throughout the caption generation process, whether or not they suit the linguistic context at the current step, which can mislead the captioning model into attending to incorrect visual concepts. To solve these problems, we introduce two end-to-end trainable modules that closely couple attribute detection with image captioning and promote the effective use of attributes by predicting appropriate attributes at each time step. The multimodal attribute detector (MAD) module improves attribute detection accuracy by using not only the image features but also the word embeddings of attributes, which already exist in most captioning models. MAD models the similarity between the semantics of attributes and the image object features to facilitate accurate detection. The subsequent attribute predictor (SAP) module dynamically predicts a concise attribute subset at each time step to cope with the diversity of image attributes. Compared to previous attribute-based methods, our approach improves the explainability of how attributes affect the generated words and achieves state-of-the-art single-model performance of 128.8 CIDEr-D on the MSCOCO dataset. Extensive experiments on MSCOCO show that our proposal improves performance in both image captioning and attribute detection simultaneously. The code is available at https://github.com/RubickH/Image-Captioning-with-MAD-and-SAP.
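As an illustration of the MAD idea, the sketch below scores each attribute by the similarity between its word embedding and the image's object-region features. The max-pooling over regions and the sigmoid scoring are assumptions, not the paper's exact formulation.

```python
# Hedged sketch of a multimodal attribute detector: attribute word embeddings
# are compared against object-region features, and each attribute is scored
# by its best-matching region. Pooling and scoring choices are assumptions.
import torch
import torch.nn as nn

class MultimodalAttributeDetector(nn.Module):
    def __init__(self, num_attributes: int = 1000,
                 embed_dim: int = 512, feat_dim: int = 2048):
        super().__init__()
        # Attribute embeddings; in the paper these come from the captioner's
        # existing word embeddings rather than a separate table.
        self.attr_embed = nn.Embedding(num_attributes, embed_dim)
        # Map region features into the same space as the word embeddings.
        self.img_proj = nn.Linear(feat_dim, embed_dim)

    def forward(self, region_feats: torch.Tensor) -> torch.Tensor:
        # region_feats: (batch, regions, feat_dim), e.g. detector object features
        img = self.img_proj(region_feats)        # (B, R, D)
        attrs = self.attr_embed.weight           # (A, D)
        sim = img @ attrs.t()                    # (B, R, A) region-attribute similarity
        scores = sim.max(dim=1).values           # best-matching region per attribute
        return torch.sigmoid(scores)             # (B, A) attribute probabilities

probs = MultimodalAttributeDetector()(torch.randn(4, 36, 2048))  # (4, 1000)
```

A predictor in the spirit of SAP would then select a small, step-dependent subset of these attribute probabilities at each decoding step instead of attending to all of them throughout generation.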

5.
IEEE Trans Neural Syst Rehabil Eng; 27(10): 1952-1961, 2019 Oct.
Article in English | MEDLINE | ID: mdl-31502982

ABSTRACT

Bradykinesia, the non-volitional slowing or discontinuation of motion, is a common motor symptom among patients with Parkinson's disease (PD). Evaluating bradykinesia severity is an important part of clinical examinations of PD patients in both the diagnosis and monitoring phases, yet subjective evaluations from different clinicians often show low consistency. Existing research on objective quantification of bradykinesia is mostly based on highly integrated sensors. Although these sensor-based methods perform well, they are difficult to promote for wide use because the special devices they require are far from commonplace in daily life. In this paper, we take advantage of computer vision and machine learning to propose a vision-based method that automatically and objectively quantifies bradykinesia severity. Three bradykinesia-related items are investigated in our study: finger tapping, hand clasping, and hand pronation/supination. Our method uses human pose estimation to extract kinematic characteristics and supervised-learning-based classifiers to generate score ratings. A clinical experiment on 60 patients shows that the scoring accuracy of our method over 360 examination videos is 89.7%, which is competitive with related work. The only devices our method requires are a camera for recording and a laptop for data processing. It can therefore produce reliable assessments of Parkinsonian bradykinesia with minimal equipment, showing great potential for long-term remote monitoring of patients' condition.
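To make the pipeline concrete, here is a minimal sketch under stated assumptions: hand-crafted kinematic features are computed from fingertip keypoints produced by some pose estimator, and a generic supervised classifier maps them to severity scores. The feature set and the random-forest classifier are illustrative choices the abstract does not specify, and the training data below is random placeholder data, not clinical data.

```python
# Illustrative sketch: kinematic features from pose-estimation keypoints feed
# a supervised classifier that outputs a severity rating. Features, classifier,
# and data are assumptions for demonstration only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def tapping_features(thumb_xy: np.ndarray, index_xy: np.ndarray,
                     fps: float) -> np.ndarray:
    """Summarize a finger-tapping clip from per-frame thumb and index
    fingertip coordinates, each of shape (frames, 2)."""
    dist = np.linalg.norm(thumb_xy - index_xy, axis=1)   # tap aperture per frame
    speed = np.abs(np.diff(dist)) * fps                  # opening/closing speed
    return np.array([
        dist.mean(), dist.std(),                         # amplitude statistics
        speed.mean(), speed.max(),                       # speed statistics
        (np.diff(np.sign(np.diff(dist))) < 0).sum(),     # rough tap count (local maxima)
    ])

# Placeholder training set: one feature vector per examination video, with
# clinician ratings (e.g. 0-4 severity scores) as labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(360, 5))
y = rng.integers(0, 5, size=360)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict(X[:3]))
```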


Subjects
Hypokinesia/diagnosis; Image Interpretation, Computer-Assisted/methods; Parkinson Disease/diagnosis; Aged; Biomechanical Phenomena; Biosensing Techniques; Female; Fingers; Gait; Hand; Hand Strength; Humans; Hypokinesia/etiology; Machine Learning; Male; Middle Aged; Parkinson Disease/complications; Posture