Enhancing thoracic disease detection using chest X-rays from PubMed Central Open Access.
Lin, Mingquan; Hou, Bojian; Mishra, Swati; Yao, Tianyuan; Huo, Yuankai; Yang, Qian; Wang, Fei; Shih, George; Peng, Yifan.
  • Lin M; Department of Population Health Sciences, Weill Cornell Medicine, New York, USA.
  • Hou B; Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, USA.
  • Mishra S; Department of Information Science, Cornell University, New York, USA.
  • Yao T; Department of Computer Science, Vanderbilt University, Nashville, TN, USA.
  • Huo Y; Department of Computer Science, Vanderbilt University, Nashville, TN, USA.
  • Yang Q; Department of Information Science, Cornell University, New York, USA.
  • Wang F; Department of Population Health Sciences, Weill Cornell Medicine, New York, USA.
  • Shih G; Department of Radiology, Weill Cornell Medicine, New York, USA.
  • Peng Y; Department of Population Health Sciences, Weill Cornell Medicine, New York, USA. Electronic address: yip4002@med.cornell.edu.
Comput Biol Med; 159: 106962, 2023 06.
Article in English | MEDLINE | ID: covidwho-2316623
ABSTRACT
Large chest X-ray (CXR) datasets have been collected to train deep learning models to detect thorax pathology on CXR. However, most CXR datasets are from single-center studies and the collected pathologies are often imbalanced. The aim of this study was to automatically construct a public, weakly-labeled CXR database from articles in PubMed Central Open Access (PMC-OA) and to assess model performance on CXR pathology classification by using this database as additional training data. Our framework includes text extraction, CXR pathology verification, subfigure separation, and image modality classification. We have extensively validated the utility of the automatically generated image database on thoracic disease detection tasks, including Hernia, Lung Lesion, Pneumonia, and Pneumothorax. We chose these diseases because of their historically poor performance in existing datasets: the NIH-CXR dataset (112,120 CXR) and the MIMIC-CXR dataset (243,324 CXR). We find that classifiers fine-tuned with additional PMC-CXR extracted by the proposed framework achieved consistently and significantly better performance than those without (e.g., Hernia 0.9335 vs. 0.9154; Lung Lesion 0.7394 vs. 0.7207; Pneumonia 0.7074 vs. 0.6709; Pneumothorax 0.8185 vs. 0.7517, all in AUC with p < 0.0001) for CXR pathology detection. In contrast to previous approaches that rely on manual submission of medical images to a repository, our framework can automatically collect figures and their accompanying figure legends. Compared to previous studies, the proposed framework improves subfigure segmentation and incorporates our advanced self-developed NLP technique for CXR pathology verification. We hope it complements existing resources and improves our ability to make biomedical image data findable, accessible, interoperable, and reusable.
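The reported AUC pairs can be read as absolute gains per pathology. The following is a minimal sketch that only does arithmetic on the four AUC pairs quoted in the abstract; the names `reported_auc` and `auc_gains` are hypothetical and are not part of the authors' framework.

```python
# AUC values quoted in the abstract, as (with PMC-CXR, without PMC-CXR).
# The variable and function names are illustrative assumptions.
reported_auc = {
    "Hernia": (0.9335, 0.9154),
    "Lung Lesion": (0.7394, 0.7207),
    "Pneumonia": (0.7074, 0.6709),
    "Pneumothorax": (0.8185, 0.7517),
}


def auc_gains(results):
    """Return the absolute AUC gain per pathology, rounded to 4 decimals."""
    return {
        name: round(with_pmc - without_pmc, 4)
        for name, (with_pmc, without_pmc) in results.items()
    }


gains = auc_gains(reported_auc)

# List pathologies from largest to smallest gain.
for name, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: +{gain:.4f}")
```

On these figures the largest absolute gain is for Pneumothorax and the smallest is for Hernia, which already had the highest baseline AUC.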

Full text: Available
Collection: International databases
Database: MEDLINE
Main subject: Pneumonia / Pneumothorax / Thoracic Diseases
Type of study: Prognostic study / Reviews
Limits: Humans
Language: English
Journal: Comput Biol Med
Year: 2023
Document Type: Article
Affiliation country: J.compbiomed.2023.106962

