1.
Article in English | MEDLINE | ID: mdl-38194388

ABSTRACT

Capsule networks (CapsNets) are known to be difficult to extend to deeper architectures, which are desirable for high performance in the deep learning era, because of their complex capsule routing algorithms. In this article, we present a simple yet effective capsule routing algorithm based on residual pose routing. Specifically, the higher-layer capsule pose is obtained by an identity mapping of the adjacent lower-layer capsule pose. This simple residual pose routing has two advantages: 1) it reduces routing computation complexity and 2) it avoids vanishing gradients thanks to its residual learning framework. On top of that, we explicitly reformulate the capsule layers by building a residual pose block. Stacking multiple such blocks results in a deep residual CapsNet (ResCaps) with a ResNet-like architecture. Results on MNIST, AffNIST, SmallNORB, and CIFAR-10/100 show the effectiveness of ResCaps for image classification. Furthermore, we successfully extend our residual pose routing to large-scale real-world applications, including 3-D object reconstruction and classification and 2-D saliency dense prediction. The source code has been released at https://github.com/liuyi1989/ResCaps.
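A minimal sketch of the residual pose idea in PyTorch, assuming a simplified capsule layer (the tensor shapes, module name, and learned residual branch are illustrative stand-ins, not the authors' implementation):

import torch
import torch.nn as nn

class ResidualPoseBlock(nn.Module):
    # Toy residual pose routing: the higher-layer pose is the lower-layer
    # pose (identity mapping) plus a cheap learned residual, so no
    # iterative routing-by-agreement is required.
    def __init__(self, pose_dim):
        super().__init__()
        self.transform = nn.Linear(pose_dim, pose_dim)  # residual branch

    def forward(self, pose):                # pose: (batch, num_caps, pose_dim)
        return pose + self.transform(pose)  # identity shortcut eases gradient flow

# stacking such blocks yields a ResNet-like deep CapsNet
poses = torch.randn(2, 32, 16)
block = ResidualPoseBlock(pose_dim=16)
print(block(poses).shape)  # torch.Size([2, 32, 16])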

2.
Comput Biol Med ; 166: 107567, 2023 Oct 13.
Article in English | MEDLINE | ID: mdl-37852109

ABSTRACT

Medical image segmentation is crucial for accurate diagnosis and treatment. In recent years, convolutional neural networks (CNNs) and Transformers have been frequently adopted as network architectures for medical image segmentation. The convolution operation is limited in modeling long-range dependencies because it can only extract local information through its limited receptive field. In comparison, Transformers demonstrate excellent capability in modeling long-range dependencies but are less effective at capturing local information. Hence, effectively modeling long-range dependencies while preserving local information is essential for accurate medical image segmentation. In this paper, we propose a four-axis fusion framework called FAFuse, which exploits the advantages of both CNNs and Transformers. As the core component of FAFuse, a Four-Axis Fusion module (FAF) is proposed to efficiently fuse global and local information. FAF combines Four-Axis attention (height, width, main-diagonal, and counter-diagonal axial attention), a multi-scale convolution, and a residual structure with a depth-separable convolution and a Hadamard product. Furthermore, we introduce deep supervision to enhance gradient flow and improve overall performance. Our approach achieves state-of-the-art segmentation accuracy on three publicly available medical image segmentation datasets. The code is available at https://github.com/cczu-xiao/FAFuse.
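A minimal sketch of axial attention along two of the four axes, assuming standard multi-head attention (the diagonal axes, multi-scale convolution, and Hadamard-product residual from the paper are omitted; the helper and shapes are illustrative):

import torch
import torch.nn as nn

def axial_attention(x, axis, attn):
    # Self-attention along one spatial axis of a (B, C, H, W) feature map.
    # For the diagonal axes, the map could first be sheared so a diagonal
    # becomes a row; that step is omitted here for brevity.
    b, c, h, w = x.shape
    if axis == "height":
        seq = x.permute(0, 3, 2, 1).reshape(b * w, h, c)  # one sequence per column
    else:
        seq = x.permute(0, 2, 3, 1).reshape(b * h, w, c)  # one sequence per row
    out, _ = attn(seq, seq, seq)
    if axis == "height":
        return out.reshape(b, w, h, c).permute(0, 3, 2, 1)
    return out.reshape(b, h, w, c).permute(0, 3, 1, 2)

attn = nn.MultiheadAttention(embed_dim=32, num_heads=4, batch_first=True)
x = torch.randn(2, 32, 16, 16)
y = axial_attention(axial_attention(x, "height", attn), "width", attn)
print(y.shape)  # torch.Size([2, 32, 16, 16])

Attending along single axes keeps the cost linear in each spatial dimension, rather than quadratic in the number of pixels as in full 2-D self-attention.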

3.
IEEE Trans Image Process ; 31: 6719-6732, 2022.
Article in English | MEDLINE | ID: mdl-36282823

ABSTRACT

Recently, Part-Object Relational (POR) saliency underpinned by the Capsule Network (CapsNet) has been demonstrated to be an effective modeling mechanism for improving saliency detection accuracy. However, it is widely known that current capsule routing operations have huge computational complexity, which seriously limits the usability of POR saliency models in real-time applications. To this end, this paper takes an early step towards fast POR saliency inference by proposing a novel disentangled part-object relational network (DPORTNet). Concretely, we disentangle horizontal routing and vertical routing from the original omnidirectional capsule routing, generating Disentangled Capsule Routing (DCR). This mechanism enjoys two advantages. On one hand, DCR, which disentangles routing into orthogonal 1D (i.e., vertical and horizontal) passes, greatly reduces parameters and routing complexity, resulting in much faster inference than the omnidirectional 2D routing adopted by existing CapsNets. On the other hand, thanks to the lightweight POR cues explored by DCR, we can conveniently integrate the part-object routing process into different feature layers of a CNN, rather than applying it only to the small-scale layer as in previous works. This helps to increase saliency inference accuracy. Compared to previous POR saliency detectors, DPORTNet infers visual saliency (5∼9)× faster and is more accurate. DPORTNet is available under an open-source license at https://github.com/liuyi1989/DCR.
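A minimal sketch of the disentanglement idea, assuming a toy dynamic-routing step (the agreement update, shapes, and two-pass composition below are illustrative, not the released DCR code):

import torch

def route_1d(u, iters=3):
    # Toy dynamic routing over a 1D sequence of capsule votes.
    # u: (batch, seq, n_out, d) prediction vectors -> (batch, n_out, d).
    b, s, n, d = u.shape
    logits = torch.zeros(b, s, n)
    for _ in range(iters):
        c = torch.softmax(logits, dim=-1)               # coupling coefficients
        v = (c.unsqueeze(-1) * u).sum(1)                # weighted vote sum
        v = v / (1.0 + v.norm(dim=-1, keepdim=True))    # cheap squash
        logits = logits + (u * v.unsqueeze(1)).sum(-1)  # agreement update
    return v

def disentangled_routing(u_grid):
    # u_grid: (batch, H, W, n_out, d). A vertical 1D pass collapses H per
    # column, then a horizontal 1D pass collapses W -- two cheap 1D routings
    # in place of one omnidirectional 2D routing.
    b, h, w, n, d = u_grid.shape
    cols = route_1d(u_grid.permute(0, 2, 1, 3, 4).reshape(b * w, h, n, d))
    return route_1d(cols.reshape(b, w, n, d))           # (batch, n_out, d)

u = torch.randn(2, 8, 8, 4, 16)
print(disentangled_routing(u).shape)  # torch.Size([2, 4, 16])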

4.
Front Hum Neurosci ; 16: 960784, 2022.
Article in English | MEDLINE | ID: mdl-36034109

ABSTRACT

Background: The neural activity and functional networks of emotion-based cognitive reappraisal have been widely investigated using electroencephalography (EEG) and functional magnetic resonance imaging (fMRI). However, single-mode neuroimaging techniques are limited in exploring the regulation process with both high temporal and high spatial resolution. Objectives: We proposed a source localization method with multimodal integration of EEG and fMRI and tested it in source-level functional network analysis of emotional cognitive reappraisal. Methods: EEG and fMRI data were recorded simultaneously while 15 subjects performed an emotional cognitive reappraisal task. Fused priori weighted minimum norm estimation (FWMNE) with sliding windows was proposed to trace the dynamics of EEG source activities, and the phase lag index (PLI) was used to construct the functional brain network associated with the process of downregulating negative affect via the reappraisal strategy. Results: The functional networks were constructed with the PLI measure, indicating the important regions. In the gamma-band source-level network analysis, the cuneus, the lateral orbitofrontal cortex, the superior parietal cortex, the postcentral gyrus, and the pars opercularis were identified as important regions in reappraisal owing to their high betweenness centrality. Conclusion: The proposed multimodal integration method for source localization identified the key cortices involved in emotion regulation, and the network analysis demonstrated the important brain regions involved in the cognitive control of reappraisal. The approach shows promise for clinical utility in affective disorders.
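A minimal sketch of the PLI computation used to build such functional networks, assuming band-filtered source signals (the 40 Hz toy oscillations, lag, and sampling rate are illustrative):

import numpy as np
from scipy.signal import hilbert

def phase_lag_index(x, y):
    # PLI = |mean(sign(delta_phi))|: 0 means no consistent phase lead/lag,
    # 1 means one signal consistently leads the other. Instantaneous phase
    # comes from the Hilbert analytic signal; sin() handles phase wrapping.
    dphi = np.angle(hilbert(x)) - np.angle(hilbert(y))
    return np.abs(np.mean(np.sign(np.sin(dphi))))

t = np.arange(0, 2, 1 / 250.0)  # 2 s at 250 Hz
x = np.sin(2 * np.pi * 40 * t) + 0.5 * np.random.randn(t.size)    # gamma band
y = np.sin(2 * np.pi * 40 * t - 0.8) + 0.5 * np.random.randn(t.size)
print(phase_lag_index(x, y))    # near 1 for a stable lag

Because PLI discards zero-lag phase differences, it is less sensitive to volume conduction than plain phase coherence, which makes it a common choice for EEG source-space networks.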

5.
IEEE Trans Image Process ; 31: 1285-1297, 2022.
Article in English | MEDLINE | ID: mdl-35015637

ABSTRACT

How to extract useful information from depth is key to the success of RGB-D saliency detection methods. Because the RGB and depth images come from different domains, the modality gap leads to unsatisfactory results when features are simply concatenated. Towards better performance, most methods focus on bridging this gap by designing different cross-modal feature fusion modules, while neglecting to explicitly extract the useful consistent information shared between the modalities. To overcome this problem, we develop a simple yet effective RGB-D saliency detection method that learns discriminative cross-modality features with a deep neural network. The proposed method first learns modality-specific features for the RGB and depth inputs. We then separately calculate the correlation of every pixel pair in a cross-modality-consistent way, i.e., the distribution ranges are consistent for correlations computed from RGB features (RGB correlation) and from depth features (depth correlation). Although computed from different perspectives, color or spatial, the RGB and depth correlations arrive at the same point: depicting how tightly each pixel pair is related. Next, to gather RGB and depth information complementarily, we propose a novel correlation fusion that merges the RGB and depth correlations into a cross-modality correlation. Finally, the features are refined with both long-range cross-modality correlations and local depth correlations to predict saliency maps. Here, the long-range cross-modality correlation provides context information for accurate localization, while the local depth correlation preserves subtle structures for fine segmentation. In addition, a lightweight DepthNet is designed for efficient depth feature extraction. The proposed network is trained in an end-to-end manner. Both quantitative and qualitative experimental results demonstrate that the proposed algorithm achieves favorable performance against state-of-the-art methods.
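A minimal sketch of range-consistent pixel-pair correlations, assuming cosine similarity as the correlation measure (the element-wise product below is only an illustrative stand-in for the paper's learned correlation-fusion module):

import torch
import torch.nn.functional as F

def pixel_pair_correlation(feat):
    # Cosine similarity between every pixel pair of a (B, C, H, W) map.
    # L2-normalizing per pixel first bounds the correlations to [-1, 1],
    # so RGB and depth correlations share a consistent distribution range.
    f = F.normalize(feat.flatten(2), dim=1)   # (B, C, H*W), unit-norm pixels
    return torch.bmm(f.transpose(1, 2), f)    # (B, H*W, H*W)

rgb = torch.randn(2, 64, 16, 16)              # modality-specific RGB features
depth = torch.randn(2, 64, 16, 16)            # modality-specific depth features
corr = pixel_pair_correlation(rgb) * pixel_pair_correlation(depth)  # toy fusion
print(corr.shape)  # torch.Size([2, 256, 256])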
