Data augmentation based semi-supervised method to improve COVID-19 CT classification.

Chen, Xiangtao; Bai, Yuting; Wang, Peng; Luo, Jiawei

Chen X; College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, Hunan, China.
Bai Y; College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, Hunan, China.
Wang P; College of Computer Science and Engineering, Hunan Institute of Technology, Hengyang 421002, China.
Luo J; College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, Hunan, China.

Math Biosci Eng ; 20(4): 6838-6852, 2023 02 06.

Article in English | MEDLINE | ID: covidwho-2254646

ABSTRACT

The coronavirus disease (COVID-19) outbreak that began in December 2019 has become a serious threat to people around the world, creating a health crisis that has infected millions of people and severely damaged the global economy. Early detection and diagnosis are essential to prevent further transmission. Detecting COVID-19 in computed tomography (CT) images is one of the important approaches to rapid diagnosis. Many branches of deep learning have played an important role in this area, including transfer learning, contrastive learning and ensemble strategies. However, these works require a large number of samples with expensive manual labels, so to reduce costs, researchers have adopted semi-supervised learning, which uses only a few labels to classify COVID-19 CT images. Nevertheless, existing semi-supervised methods focus primarily on class imbalance and pseudo-label filtering rather than on pseudo-label generation. Accordingly, in this paper we propose a semi-supervised classification framework based on data augmentation to classify COVID-19 CT images. We revise the classic teacher-student framework and introduce the popular data augmentation method Mixup, which widens the high-confidence distribution to improve the accuracy of the selected pseudo-labels and ultimately yields a model with better performance. On the COVID-CT dataset, our method achieves precision, F1 score, accuracy and specificity that are 21.04%, 12.95%, 17.13% and 38.29% higher, respectively, than the average values of other methods. On the SARS-COV-2 dataset, these increases are 8.40%, 7.59%, 9.35% and 12.80%, respectively. On the Harvard Dataverse dataset, they are 17.64%, 18.89%, 19.81% and 20.20%, respectively. The code is available at https://github.com/YutingBai99/COVID-19-SSL.
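
The two ingredients named in the abstract, Mixup and confidence-based pseudo-label selection, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see their repository for that): the function names `mixup` and `select_pseudo_labels`, the Beta parameter `alpha = 0.75` and the confidence threshold `0.95` are all assumptions chosen for illustration, and the teacher's predicted probabilities are hard-coded stand-ins rather than real model output.

```python
import numpy as np


def mixup(x_a, y_a, x_b, y_b, lam):
    """Blend two batches of inputs and their one-hot (or soft) labels.

    lam in [0, 1] is the mixing weight given to the first batch.
    """
    x = lam * x_a + (1.0 - lam) * x_b
    y = lam * y_a + (1.0 - lam) * y_b
    return x, y


def select_pseudo_labels(probs, threshold=0.95):
    """Keep the samples whose top class probability reaches the threshold.

    Returns the indices of the kept samples and their predicted classes.
    The default threshold is an assumed value, not taken from the paper.
    """
    confidence = probs.max(axis=1)
    labels = probs.argmax(axis=1)
    keep = confidence >= threshold
    return np.flatnonzero(keep), labels[keep]


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # Stand-in "images": two batches of 3 flattened 4-pixel inputs.
    x_a, x_b = rng.random((3, 4)), rng.random((3, 4))
    y_a = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
    y_b = np.array([[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])

    # Mixing weight drawn from Beta(alpha, alpha); alpha is assumed.
    alpha = 0.75
    lam = rng.beta(alpha, alpha)
    x_mix, y_mix = mixup(x_a, y_a, x_b, y_b, lam)

    # Stand-in teacher predictions on unlabeled samples.
    teacher_probs = np.array([[0.97, 0.03], [0.60, 0.40], [0.02, 0.98]])
    idx, pseudo = select_pseudo_labels(teacher_probs)
    print("mixed labels:\n", y_mix)
    print("kept indices:", idx, "pseudo-labels:", pseudo)
```

In a teacher-student setup of this general kind, the kept samples and their pseudo-labels would be added to the student's training data; how the paper combines Mixup with that step is described in the full text, not in this sketch.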

Subject(s)

COVID-19; Humans; COVID-19/diagnostic imaging; COVID-19/epidemiology; SARS-CoV-2; Databases, Factual; Disease Outbreaks; Tomography, X-Ray Computed

Keywords

COVID-19; Mixup; pseudo-labels; semi-supervised; teacher-student framework

Full text: Available | Collection: International databases | Database: MEDLINE | Main subject: COVID-19 | Type of study: Diagnostic study / Observational study / Prognostic study | Limits: Humans | Language: English | Journal: Math Biosci Eng | Year: 2023 | Document Type: Article | Affiliation country: Mbe.2023294
