Marrying Convolution and Transformer for COVID-19 Diagnosis Based on CT Scans
2022 International Joint Conference on Neural Networks, IJCNN 2022; 2022-July, 2022.
Article in English | Scopus | ID: covidwho-2097612
ABSTRACT
To date, coronavirus disease 2019 (COVID-19) has infected more than 200 million people worldwide and killed more than 4 million. In addition to reverse transcription polymerase chain reaction (RT-PCR), the main detection method, deep learning-based diagnosis from X-ray or CT scans has become a promising alternative. In recent years, the convolutional neural network (CNN) was the method of choice for medical imaging until the emergence of the Vision Transformer (ViT) changed this situation. Transformers now increasingly dominate computer vision, but they lack the inductive biases of the convolution operation, require large amounts of data to outperform CNNs, and incur a heavy computational cost on high-resolution inputs. Since Transformers and CNNs can complement each other, many studies have explored combining them. However, there is little research on hybrid models for medical image diagnosis, and especially for COVID-19 image classification. To address this, we explore ways of marrying CNNs and Transformers and propose a hybrid model, which we call DenseTransformer. Experiments on our COVID-19 CT scan dataset show that a hybrid model that properly combines a CNN and a Transformer outperforms both a pure CNN and a pure Transformer on the COVID-19 image classification task, and that performance improves further with self-supervised learning. © 2022 IEEE.
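This record does not specify the DenseTransformer architecture, so the following PyTorch sketch is only an illustration of the general hybrid pattern the abstract describes: a convolutional stem that contributes locality inductive bias and downsamples the image, followed by a Transformer encoder over the resulting feature tokens. All class names, layer sizes, and hyperparameters below are assumptions, not the authors' design.

import torch
import torch.nn as nn

class HybridCNNTransformer(nn.Module):
    """Illustrative CNN -> Transformer hybrid for binary CT-slice
    classification. Not the authors' DenseTransformer; all sizes
    are assumptions for demonstration."""
    def __init__(self, num_classes=2, embed_dim=256, depth=4, num_heads=8):
        super().__init__()
        # Convolutional stem: supplies the locality/translation inductive
        # bias Transformers lack and downsamples before attention.
        self.stem = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),   # 224 -> 112
            nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.Conv2d(64, embed_dim, kernel_size=3, stride=2, padding=1),  # 112 -> 56
            nn.BatchNorm2d(embed_dim), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                                         # 56 -> 28
        )
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, 28 * 28 + 1, embed_dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, dim_feedforward=embed_dim * 4,
            batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):                 # x: (B, 1, 224, 224) CT slices
        feats = self.stem(x)              # (B, embed_dim, 28, 28)
        tokens = feats.flatten(2).transpose(1, 2)        # (B, 784, embed_dim)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        tokens = torch.cat([cls, tokens], dim=1) + self.pos_embed
        out = self.encoder(tokens)
        return self.head(out[:, 0])       # classify from the [CLS] token

model = HybridCNNTransformer()
logits = model(torch.randn(2, 1, 224, 224))  # -> shape (2, 2)

In this sketch the stem reduces a 224x224 slice to a 28x28 feature map before attention, keeping the token count at 784 and thus bounding the quadratic attention cost the abstract identifies as a problem for high-resolution inputs.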
Full text:
Available
Collection:
Databases of international organizations
Database:
Scopus
Language:
English
Journal:
2022 International Joint Conference on Neural Networks, IJCNN 2022
Year:
2022
Document Type:
Article
Similar
MEDLINE
...
LILACS
LIS