Search | VHL Regional Portal

1.

GradMDM: Adversarial Attack on Dynamic Networks.

Pan, Jianhong; Foo, Lin Geng; Zheng, Qichen; Fan, Zhipeng; Rahmani, Hossein; Ke, Qiuhong; Liu, Jun.

IEEE Trans Pattern Anal Mach Intell ; 45(9): 11374-11381, 2023 Sep.

Article in English | MEDLINE | ID: mdl-37015128

ABSTRACT

Dynamic neural networks can greatly reduce computation redundancy without compromising accuracy by adapting their structures based on the input. In this paper, we explore the robustness of dynamic neural networks against energy-oriented attacks targeted at reducing their efficiency. Specifically, we attack dynamic models with our novel algorithm GradMDM. GradMDM is a technique that adjusts the direction and the magnitude of the gradients to effectively find a small perturbation for each input, that will activate more computational units of dynamic models during inference. We evaluate GradMDM on multiple datasets and dynamic models, where it outperforms previous energy-oriented attack techniques, significantly increasing computation complexity while reducing the perceptibility of the perturbations https://github.com/lingengfoo/GradMDM.

2.

Human Action Recognition From Various Data Modalities: A Review.

Sun, Zehua; Ke, Qiuhong; Rahmani, Hossein; Bennamoun, Mohammed; Wang, Gang; Liu, Jun.

IEEE Trans Pattern Anal Mach Intell ; 45(3): 3200-3225, 2023 03.

Article in English | MEDLINE | ID: mdl-35700242

ABSTRACT

Human Action Recognition (HAR) aims to understand human behavior and assign a label to each action. It has a wide range of applications, and therefore has been attracting increasing attention in the field of computer vision. Human actions can be represented using various data modalities, such as RGB, skeleton, depth, infrared, point cloud, event stream, audio, acceleration, radar, and WiFi signal, which encode different sources of useful yet distinct information and have various advantages depending on the application scenarios. Consequently, lots of existing works have attempted to investigate different types of approaches for HAR using various modalities. In this article, we present a comprehensive survey of recent progress in deep learning methods for HAR based on the type of input data modality. Specifically, we review the current mainstream deep learning methods for single data modalities and multiple data modalities, including the fusion-based and the co-learning-based frameworks. We also present comparative results on several benchmark datasets for HAR, together with insightful observations and inspiring future research directions.

Subject(s)

Algorithms , Pattern Recognition, Automated , Humans , Pattern Recognition, Automated/methods , Acceleration , Human Activities

3.

Video Joint Modelling Based on Hierarchical Transformer for Co-Summarization.

Li, Haopeng; Ke, Qiuhong; Gong, Mingming; Zhang, Rui.

IEEE Trans Pattern Anal Mach Intell ; 45(3): 3904-3917, 2023 Mar.

Article in English | MEDLINE | ID: mdl-35759594

ABSTRACT

Video summarization aims to automatically generate a summary (storyboard or video skim) of a video, which can facilitate large-scale video retrieval and browsing. Most of the existing methods perform video summarization on individual videos, which neglects the correlations among similar videos. Such correlations, however, are also informative for video understanding and video summarization. To address this limitation, we propose Video Joint Modelling based on Hierarchical Transformer (VJMHT) for co-summarization, which takes into consideration the semantic dependencies across videos. Specifically, VJMHT consists of two layers of Transformer: the first layer extracts semantic representation from individual shots of similar videos, while the second layer performs shot-level video joint modelling to aggregate cross-video semantic information. By this means, complete cross-video high-level patterns are explicitly modelled and learned for the summarization of individual videos. Moreover, Transformer-based video representation reconstruction is introduced to maximize the high-level similarity between the summary and the original video. Extensive experiments are conducted to verify the effectiveness of the proposed modules and the superiority of VJMHT in terms of F-measure and rank-based evaluation.

4.

Hierarchical Paired Channel Fusion Network for Street Scene Change Detection.

Lei, Yinjie; Peng, Duo; Zhang, Pingping; Ke, Qiuhong; Li, Haifeng.

IEEE Trans Image Process ; 30: 55-67, 2021.

Article in English | MEDLINE | ID: mdl-33125327

ABSTRACT

Street Scene Change Detection (SSCD) aims to locate the changed regions between a given street-view image pair captured at different times, which is an important yet challenging task in the computer vision community. The intuitive way to solve the SSCD task is to fuse the extracted image feature pairs, and then directly measure the dissimilarity parts for producing a change map. Therefore, the key for the SSCD task is to design an effective feature fusion method that can improve the accuracy of the corresponding change maps. To this end, we present a novel Hierarchical Paired Channel Fusion Network (HPCFNet), which utilizes the adaptive fusion of paired feature channels. Specifically, the features of a given image pair are jointly extracted by a Siamese Convolutional Neural Network (SCNN) and hierarchically combined by exploring the fusion of channel pairs at multiple feature levels. In addition, based on the observation that the distribution of scene changes is diverse, we further propose a Multi-Part Feature Learning (MPFL) strategy to detect diverse changes. Based on the MPFL strategy, our framework achieves a novel approach to adapt to the scale and location diversities of the scene change regions. Extensive experiments on three public datasets (i.e., PCD, VL-CMU-CD and CDnet2014) demonstrate that the proposed framework achieves superior performance which outperforms other state-of-the-art methods with a considerable margin.

Subject(s)

Deep Learning , Image Processing, Computer-Assisted/methods , Databases, Factual , Neural Networks, Computer , Video Recording

5.

Learning Latent Global Network for Skeleton-based Action Prediction.

Ke, Qiuhong; Bennamoun, Mohammed; Rahmani, Hossein; An, Senjian; Sohel, Ferdous; Boussaid, Farid.

IEEE Trans Image Process ; 2019 Sep 02.

Article in English | MEDLINE | ID: mdl-31484121

ABSTRACT

Human actions represented with 3D skeleton sequences are robust to clustered backgrounds and illumination changes. In this paper, we investigate skeleton-based action prediction, which aims to recognize an action from a partial skeleton sequence that contains incomplete action information. We propose a new Latent Global Network based on adversarial learning for action prediction. We demonstrate that the proposed network provides latent long-term global information that is complementary to the local action information of the partial sequences and helps improve action prediction. We show that action prediction can be improved by combining the latent global information with the local action information. We test the proposed method on three challenging skeleton datasets and report state-of-the-art performance.

6.

Learning Clip Representations for Skeleton-Based 3D Action Recognition.

Ke, Qiuhong; Bennamoun, Mohammed; An, Senjian; Sohel, Ferdous; Boussaid, Farid.

IEEE Trans Image Process ; 27(6): 2842-2855, 2018 Jun.

Article in English | MEDLINE | ID: mdl-29570086

ABSTRACT

This paper presents a new representation of skeleton sequences for 3D action recognition. Existing methods based on hand-crafted features or recurrent neural networks cannot adequately capture the complex spatial structures and the long-term temporal dynamics of the skeleton sequences, which are very important to recognize the actions. In this paper, we propose to transform each channel of the 3D coordinates of a skeleton sequence into a clip. Each frame of the generated clip represents the temporal information of the entire skeleton sequence and one particular spatial relationship between the skeleton joints. The entire clip incorporates multiple frames with different spatial relationships, which provide useful spatial structural information of the human skeleton. We also propose a multitask convolutional neural network (MTCNN) to learn the generated clips for action recognition. The proposed MTCNN processes all the frames of the generated clips in parallel to explore the spatial and temporal information of the skeleton sequences. The proposed method has been extensively tested on six challenging benchmark datasets. Experimental results consistently demonstrate the superiority of the proposed clip representation and the feature learning method for 3D action recognition compared to the existing techniques.

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL