FDR-TransUNet: A novel encoder-decoder architecture with vision transformer for improved medical image segmentation.

Chaoyang, Zhang; Shibao, Sun; Wenmao, Hu; Pengcheng, Zhao.

Comput Biol Med ; 169: 107858, 2024 Feb.

Article in English | MEDLINE | ID: mdl-38113680

ABSTRACT

U-shaped and Transformer architectures have achieved exceptional performance in medical image segmentation and natural language processing, respectively. Their combination has also produced remarkable results, but it still suffers from substantial loss of image features during downsampling and from difficulty recovering spatial information during upsampling. In this paper, we propose a novel encoder-decoder architecture for medical image segmentation that has a flexibly adjustable hybrid encoder and a decoder with two expanding paths. The hybrid encoder combines the feature double reuse (FDR) block with the encoder of the Vision Transformer (ViT); it extracts local and global pixel localization information and effectively alleviates image feature loss. We also retain the original class-token sequence in the Vision Transformer and develop an additional expanding path for it. The class-token sequence and the abstract image features are processed by two independent expanding paths under a deep-supervision strategy, which better recovers image spatial information and accelerates model convergence. To further mitigate feature loss and improve spatial information recovery, we introduce successive residual connections throughout the entire network. We evaluated our model on COVID-19 lung segmentation and infection area segmentation tasks. The mIoU increased by 1.5 points and 3.9 points compared with other models, demonstrating a performance improvement.
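The abstract reports results as mIoU (mean intersection over union) but does not say how the metric was averaged. As a rough illustration only, the sketch below computes mIoU for a single pair of flattened label maps in plain Python. The function name `mean_iou`, the toy labels, and the choice to skip classes absent from both maps are assumptions made for this example, not details taken from the paper.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean intersection over union across the classes that occur.

    `pred` and `target` are flattened label maps of equal length.
    Classes that appear in neither map are skipped (an assumed convention).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of pixels")

    ious = []
    for c in range(num_classes):
        # Pixels labelled c in both maps.
        intersection = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels labelled c in at least one map.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union == 0:
            continue  # class absent from both maps
        ious.append(intersection / union)

    if not ious:
        raise ValueError("no class in range(num_classes) appears in either map")
    return sum(ious) / len(ious)


# Hypothetical 2x3 label map, flattened: 0 = background, 1 = lung.
target = [0, 0, 1, 1, 1, 0]
pred = [0, 1, 1, 1, 0, 0]
score = mean_iou(pred, target, num_classes=2)
```

In this toy case each class has 2 overlapping pixels out of a union of 4, so `score` is 0.5. Published results are often averaged over a whole test set rather than per image, which can give different numbers from this per-image version.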

Subject(s)

COVID-19; Humans; Natural Language Processing; Image Processing, Computer-Assisted
