Results 1 - 4 of 4
1.
IEEE Trans Pattern Anal Mach Intell; 44(6): 3239-3259, 2022 Jun.
Article in English | MEDLINE | ID: mdl-33434124

ABSTRACT

As an essential problem in computer vision, salient object detection (SOD) has attracted an increasing amount of research attention over the years. Recent advances in SOD are predominantly led by deep learning-based solutions (named deep SOD). To enable an in-depth understanding of deep SOD, in this paper we provide a comprehensive survey covering various aspects, ranging from algorithm taxonomy to unsolved issues. In particular, we first review deep SOD algorithms from different perspectives, including network architecture, level of supervision, learning paradigm, and object-/instance-level detection. Following that, we summarize and analyze existing SOD datasets and evaluation metrics. Then, we benchmark a large group of representative SOD models and provide detailed analyses of the comparison results. Moreover, we study the performance of SOD algorithms under different attribute settings, which has not been thoroughly explored previously, by constructing a novel SOD dataset with rich attribute annotations covering various salient object types, challenging factors, and scene categories. We further analyze, for the first time in the field, the robustness of SOD models to random input perturbations and adversarial attacks. We also look into the generalization and difficulty of existing SOD datasets. Finally, we discuss several open issues of SOD and outline future research directions. All the saliency prediction maps, our constructed dataset with annotations, and code for evaluation are publicly available at https://github.com/wenguanwang/SODsurvey.
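The survey above benchmarks deep SOD models with standard evaluation metrics. As an illustration only, the sketch below computes two metrics that are widely used in the SOD literature, mean absolute error (MAE) and the F-measure; the beta² = 0.3 weighting and the adaptive threshold of twice the mean saliency follow common practice and are not necessarily the exact protocol used in the survey.

```python
import numpy as np

def mae(saliency, gt):
    """Mean absolute error between a predicted saliency map and a
    ground-truth mask, both assumed to be normalized to [0, 1]."""
    return np.abs(saliency.astype(np.float64) - gt.astype(np.float64)).mean()

def f_measure(saliency, gt, beta2=0.3, threshold=None):
    """F-measure at a single threshold. beta2 = 0.3 is the weighting
    commonly used in SOD papers; an adaptive threshold of twice the
    mean saliency value is a frequent default."""
    if threshold is None:
        threshold = min(2.0 * saliency.mean(), 1.0)
    pred = saliency >= threshold
    gt = gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    precision = tp / (pred.sum() + 1e-8)
    recall = tp / (gt.sum() + 1e-8)
    return (1 + beta2) * precision * recall / (beta2 * precision + recall + 1e-8)

# Toy usage with random stand-ins for a real prediction and mask:
pred = np.random.rand(240, 320)
mask = (np.random.rand(240, 320) > 0.7).astype(np.float64)
print(mae(pred, mask), f_measure(pred, mask))
```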


Subject(s)
Deep Learning, Algorithms, Attention, Benchmarking, Superoxide Dismutase
2.
Adv Exp Med Biol; 1101: 91-122, 2019.
Article in English | MEDLINE | ID: mdl-31729673

ABSTRACT

The peripheral nervous system, distributed throughout the body, is an important bridge for the transmission of neural signals. Signals from the central nervous system (brain and spinal cord) are transmitted to different parts of the body by the peripheral nerves, which also relay various kinds of sensory information back along the way. A certain level of information integration and processing also occurs within the system. It has been shown that neural signals can be extracted from the distal end of the stump, indicating that this bridge remains effective after limb damage or amputation, which is the neurophysiological basis for the research and development of peripheral nerve interfaces for prosthetic systems.


Subject(s)
Peripheral Nerves, Signal Transduction, Central Nervous System, Humans, Nerve Regeneration, Peripheral Nerves/physiology, Prostheses and Implants, Spinal Cord
3.
Article in English | MEDLINE | ID: mdl-31494548

ABSTRACT

We present a method for synopsizing multiple videos captured by a set of surveillance cameras with partially overlapping fields of view. Object-based approaches that directly shift objects along the time axis can already compute compact synopsis results for multiple surveillance videos; the challenge is how to present these synopsis results in a more compact and understandable way. Previous approaches show them side by side on the screen, which is difficult for users to comprehend. In this paper, we solve the problem by joint object-shifting and camera view-switching. First, we synchronize the input videos and group the instances of the same object across different videos. Then we shift the groups of objects along the time axis to obtain multiple synopsis videos. Instead of showing them simultaneously, we display only one of them at a time and allow switching among the views of the different synopsis videos. In this way, we obtain a single synopsis result consisting of content from all the input videos, which is much easier for users to follow and understand. To obtain the best synopsis result, we formulate object-shifting and view-switching in a single joint optimization framework rather than solving them separately, and we solve the unified optimization with a strategy that combines graph cuts and dynamic programming. Experiments demonstrate that our single synopsis video generated from multiple input videos is compact, complete, and easy to understand.
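As a rough illustration of the view-switching idea described above (not the authors' implementation), the sketch below picks one camera view per frame by dynamic programming, assuming a hypothetical per-frame cost for showing each camera and a fixed penalty for every switch; the paper's joint objective and its graph-cut object-shifting step are not reproduced here.

```python
import numpy as np

def choose_views(frame_cost, switch_penalty=1.0):
    """Pick one camera view per frame so that the sum of per-frame costs
    plus a fixed penalty per view switch is minimized (dynamic programming).

    frame_cost: (T, C) array; frame_cost[t, c] is the (assumed) cost of
                showing camera c at frame t, e.g. low when that view best
                shows the shifted objects. Returns a length-T array of
                camera indices.
    """
    T, C = frame_cost.shape
    dp = frame_cost[0].copy()            # best total cost ending in view c
    back = np.zeros((T, C), dtype=int)   # back-pointers for reconstruction
    cams = np.arange(C)
    for t in range(1, T):
        # trans[c, p] = cost so far if view p at t-1 and view c at t
        trans = dp[None, :] + switch_penalty * (cams[:, None] != cams[None, :])
        back[t] = trans.argmin(axis=1)
        dp = frame_cost[t] + trans.min(axis=1)
    # backtrack the optimal sequence of views
    views = np.empty(T, dtype=int)
    views[-1] = dp.argmin()
    for t in range(T - 1, 0, -1):
        views[t - 1] = back[t, views[t]]
    return views

# Toy example: 3 cameras, 6 frames; camera 1 is cheapest in the middle.
costs = np.ones((6, 3))
costs[2:4, 1] = 0.1
print(choose_views(costs, switch_penalty=0.5))
```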

4.
Article in English | MEDLINE | ID: mdl-31449021

ABSTRACT

This paper proposes a novel residual attentive learning network architecture for predicting dynamic eye-fixation maps. The proposed model addresses two essential issues, i.e., effective spatiotemporal feature integration and multi-scale saliency learning. For the first, the appearance and motion streams are tightly coupled via dense residual cross connections, which integrate appearance information with multi-layer, comprehensive motion features in a residual and dense way. Unlike traditional two-stream models that learn appearance and motion features separately, this design allows early, multi-path information exchange between the two domains, leading to a unified and powerful spatiotemporal learning architecture. For the second, we propose a composite attention mechanism that learns multi-scale local attentions and global attention priors end-to-end; it enhances the fused spatiotemporal features by emphasizing important features at multiple scales. A lightweight convolutional gated recurrent unit (convGRU), which is well suited to small training sets, is used to model long-term temporal characteristics. Extensive experiments over four benchmark datasets clearly demonstrate the advantage of the proposed video saliency model over its competitors and the effectiveness of each component of our network. Our code and all results will be available at https://github.com/ashleylqx/STRA-Net.
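For readers unfamiliar with convolutional GRUs, the sketch below shows a minimal convGRU cell in PyTorch; the channel counts, kernel size, and gate convention are illustrative assumptions and are not taken from the STRA-Net code.

```python
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    """Minimal convolutional GRU cell: the fully connected gates of a
    standard GRU are replaced by 2D convolutions so that the hidden
    state remains a spatial feature map."""

    def __init__(self, in_ch, hid_ch, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        # update (z) and reset (r) gates, computed jointly from [input, hidden]
        self.gates = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, kernel_size, padding=pad)
        # candidate hidden state
        self.cand = nn.Conv2d(in_ch + hid_ch, hid_ch, kernel_size, padding=pad)
        self.hid_ch = hid_ch

    def forward(self, x, h=None):
        if h is None:
            h = x.new_zeros(x.size(0), self.hid_ch, x.size(2), x.size(3))
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], dim=1))).chunk(2, dim=1)
        h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
        return (1 - z) * h + z * h_tilde

# Example: run a short sequence of (stand-in) fused spatiotemporal feature maps.
cell = ConvGRUCell(in_ch=64, hid_ch=64)
h = None
for t in range(4):
    x = torch.randn(1, 64, 28, 28)  # placeholder feature map at time step t
    h = cell(x, h)
print(h.shape)  # torch.Size([1, 64, 28, 28])
```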
