Search | VHL Regional Portal

Predicting 30-day all-cause hospital readmission using multimodal spatiotemporal graph neural networks.

Tang, Siyi; Tariq, Amara; Dunnmon, Jared A; Sharma, Umesh; Elugunti, Praneetha; Rubin, Daniel L; Patel, Bhavik N; Banerjee, Imon.

IEEE J Biomed Health Inform ; PP2023 Jan 13.

Article in English | MEDLINE | ID: mdl-37018684

ABSTRACT

Reduction in 30-day readmission rate is an important quality factor for hospitals as it can reduce the overall cost of care and improve patient post-discharge outcomes. While deep-learning-based studies have shown promising empirical results, several limitations exist in prior models for hospital readmission prediction, such as: (a) only patients with certain conditions are considered, (b) do not leverage data temporality, (c) individual admissions are assumed independent of each other, which ignores patient similarity, (d) limited to single modality or single center data. In this study, we propose a multimodal, spatiotemporal graph neural network (MM-STGNN) for prediction of 30-day all-cause hospital readmission, which fuses in-patient multimodal, longitudinal data and models patient similarity using a graph. Using longitudinal chest radiographs and electronic health records from two independent centers, we show that MM-STGNN achieved an area under the receiver operating characteristic curve (AUROC) of 0.79 on both datasets. Furthermore, MM-STGNN significantly outperformed the current clinical reference standard, LACE+ (AUROC=0.61), on the internal dataset. For subset populations of patients with heart disease, our model significantly outperformed baselines, such as gradient-boosting and Long Short-Term Memory models (e.g., AUROC improved by 3.7 points in patients with heart disease). Qualitative interpretability analysis indicated that while patients' primary diagnoses were not explicitly used to train the model, features crucial for model prediction may reflect patients' diagnoses. Our model could be utilized as an additional clinical decision aid during discharge disposition and triaging high-risk patients for closer post-discharge follow-up for potential preventive measures.

Impact of Upstream Medical Image Processing on Downstream Performance of a Head CT Triage Neural Network.

Hooper, Sarah M; Dunnmon, Jared A; Lungren, Matthew P; Mastrodicasa, Domenico; Rubin, Daniel L; Ré, Christopher; Wang, Adam; Patel, Bhavik N.

Radiol Artif Intell ; 3(4): e200229, 2021 Jul.

Article in English | MEDLINE | ID: mdl-34350412

ABSTRACT

PURPOSE: To develop a convolutional neural network (CNN) to triage head CT (HCT) studies and investigate the effect of upstream medical image processing on the CNN's performance. MATERIALS AND METHODS: A total of 9776 HCT studies were retrospectively collected from 2001 through 2014, and a CNN was trained to triage them as normal or abnormal. CNN performance was evaluated on a held-out test set, assessing triage performance and sensitivity to 20 disorders to assess differential model performance, with 7856 CT studies in the training set, 936 in the validation set, and 984 in the test set. This CNN was used to understand how the upstream imaging chain affects CNN performance by evaluating performance after altering three variables: image acquisition by reducing the number of x-ray projections, image reconstruction by inputting sinogram data into the CNN, and image preprocessing. To evaluate performance, the DeLong test was used to assess differences in the area under the receiver operating characteristic curve (AUROC), and the McNemar test was used to compare sensitivities. RESULTS: The CNN achieved a mean AUROC of 0.84 (95% CI: 0.83, 0.84) in discriminating normal and abnormal HCT studies. The number of x-ray projections could be reduced by 16 times and the raw sensor data could be input into the CNN with no statistically significant difference in classification performance. Additionally, CT windowing consistently improved CNN performance, increasing the mean triage AUROC by 0.07 points. CONCLUSION: A CNN was developed to triage HCT studies, which may help streamline image evaluation, and the means by which upstream image acquisition, reconstruction, and preprocessing affect downstream CNN performance was investigated, bringing focus to this important part of the imaging chain.Keywords Head CT, Automated Triage, Deep Learning, Sinogram, DatasetSupplemental material is available for this article.© RSNA, 2021.

Data valuation for medical imaging using Shapley value and application to a large-scale chest X-ray dataset.

Tang, Siyi; Ghorbani, Amirata; Yamashita, Rikiya; Rehman, Sameer; Dunnmon, Jared A; Zou, James; Rubin, Daniel L.

Sci Rep ; 11(1): 8366, 2021 04 16.

Article in English | MEDLINE | ID: mdl-33863957

ABSTRACT

The reliability of machine learning models can be compromised when trained on low quality data. Many large-scale medical imaging datasets contain low quality labels extracted from sources such as medical reports. Moreover, images within a dataset may have heterogeneous quality due to artifacts and biases arising from equipment or measurement errors. Therefore, algorithms that can automatically identify low quality data are highly desired. In this study, we used data Shapley, a data valuation metric, to quantify the value of training data to the performance of a pneumonia detection algorithm in a large chest X-ray dataset. We characterized the effectiveness of data Shapley in identifying low quality versus valuable data for pneumonia detection. We found that removing training data with high Shapley values decreased the pneumonia detection performance, whereas removing data with low Shapley values improved the model performance. Furthermore, there were more mislabeled examples in low Shapley value data and more true pneumonia cases in high Shapley value data. Our results suggest that low Shapley value indicates mislabeled or poor quality images, whereas high Shapley value indicates data that are valuable for pneumonia detection. Our method can serve as a framework for using data Shapley to denoise large-scale medical imaging datasets.

Subject(s)

Algorithms , Diagnostic Imaging/methods , Image Processing, Computer-Assisted/methods , Machine Learning , Neural Networks, Computer , Pneumonia/diagnosis , Radiography, Thoracic/methods , Datasets as Topic , Humans

Comparison of segmentation-free and segmentation-dependent computer-aided diagnosis of breast masses on a public mammography dataset.

Sawyer Lee, Rebecca; Dunnmon, Jared A; He, Ann; Tang, Siyi; Ré, Christopher; Rubin, Daniel L.

J Biomed Inform ; 113: 103656, 2021 01.

Article in English | MEDLINE | ID: mdl-33309994

ABSTRACT

PURPOSE: To compare machine learning methods for classifying mass lesions on mammography images that use predefined image features computed over lesion segmentations to those that leverage segmentation-free representation learning on a standard, public evaluation dataset. METHODS: We apply several classification algorithms to the public Curated Breast Imaging Subset of the Digital Database for Screening Mammography (CBIS-DDSM), in which each image contains a mass lesion. Segmentation-free representation learning techniques for classifying lesions as benign or malignant include both a Bag-of-Visual-Words (BoVW) method and a Convolutional Neural Network (CNN). We compare classification performance of these techniques to that obtained using two different segmentation-dependent approaches from the literature that rely on specific combinations of end classifiers (e.g. linear discriminant analysis, neural networks) and predefined features computed over the lesion segmentation (e.g. spiculation measure, morphological characteristics, intensity metrics). RESULTS: We report area under the receiver operating characteristic curve (AZ) values for malignancy classification on CBIS-DDSM for each technique. We find average AZ values of 0.73 for a segmentation-free BoVW method, 0.86 for a segmentation-free CNN method, 0.75 for a segmentation-dependent linear discriminant analysis of Rubber-Band Straightening Transform features, and 0.58 for a hybrid rule-based neural network classification using a small number of hand-designed features. CONCLUSIONS: We find that malignancy classification performance on the CBIS-DDSM dataset using segmentation-free BoVW features is comparable to that of the best segmentation-dependent methods we study, but also observe that a common segmentation-free CNN model substantially and significantly outperforms each of these (p < 0.05). These results reinforce recent findings suggesting that representation learning techniques such as BoVW and CNNs are advantageous for mammogram analysis because they do not require lesion segmentation, the quality and specific characteristics of which can vary substantially across datasets. We further observe that segmentation-dependent methods achieve performance levels on CBIS-DDSM inferior to those achieved on the original evaluation datasets reported in the literature. Each of these findings reinforces the need for standardization of datasets, segmentation techniques, and model implementations in performance assessments of automated classifiers for medical imaging.

Subject(s)

Breast Neoplasms , Mammography , Breast/diagnostic imaging , Breast Neoplasms/diagnostic imaging , Computers , Early Detection of Cancer , Female , Humans

Cross-Modal Data Programming Enables Rapid Medical Machine Learning.

Dunnmon, Jared A; Ratner, Alexander J; Saab, Khaled; Khandwala, Nishith; Markert, Matthew; Sagreiya, Hersh; Goldman, Roger; Lee-Messer, Christopher; Lungren, Matthew P; Rubin, Daniel L; Ré, Christopher.

Patterns (N Y) ; 1(2)2020 May 08.

Article in English | MEDLINE | ID: mdl-32776018

ABSTRACT

A major bottleneck in developing clinically impactful machine learning models is a lack of labeled training data for model supervision. Thus, medical researchers increasingly turn to weaker, noisier sources of supervision, such as leveraging extractions from unstructured text reports to supervise image classification. A key challenge in weak supervision is combining sources of information that may differ in quality and have correlated errors. Recently, a statistical theory of weak supervision called data programming has shown promise in addressing this challenge. Data programming now underpins many deployed machine-learning systems in the technology industry, even for critical applications. We propose a new technique for applying data programming to the problem of cross-modal weak supervision in medicine, wherein weak labels derived from an auxiliary modality (e.g., text) are used to train models over a different target modality (e.g., images). We evaluate our approach on diverse clinical tasks via direct comparison to institution-scale, hand-labeled datasets. We find that our supervision technique increases model performance by up to 6 points area under the receiver operating characteristic curve (ROC-AUC) over baseline methods by improving both coverage and quality of the weak labels. Our approach yields models that on average perform within 1.75 points ROC-AUC of those supervised with physician-years of hand labeling and outperform those supervised with physician-months of hand labeling by 10.25 points ROC-AUC, while using only person-days of developer time and clinician work-a time saving of 96%. Our results suggest that modern weak supervision techniques such as data programming may enable more rapid development and deployment of clinically useful machine-learning models.

Assessment of Convolutional Neural Networks for Automated Classification of Chest Radiographs.

Dunnmon, Jared A; Yi, Darvin; Langlotz, Curtis P; Ré, Christopher; Rubin, Daniel L; Lungren, Matthew P.

Radiology ; 290(2): 537-544, 2019 02.

Article in English | MEDLINE | ID: mdl-30422093

ABSTRACT

Purpose To assess the ability of convolutional neural networks (CNNs) to enable high-performance automated binary classification of chest radiographs. Materials and Methods In a retrospective study, 216 431 frontal chest radiographs obtained between 1998 and 2012 were procured, along with associated text reports and a prospective label from the attending radiologist. This data set was used to train CNNs to classify chest radiographs as normal or abnormal before evaluation on a held-out set of 533 images hand-labeled by expert radiologists. The effects of development set size, training set size, initialization strategy, and network architecture on end performance were assessed by using standard binary classification metrics; detailed error analysis, including visualization of CNN activations, was also performed. Results Average area under the receiver operating characteristic curve (AUC) was 0.96 for a CNN trained with 200 000 images. This AUC value was greater than that observed when the same model was trained with 2000 images (AUC = 0.84, P < .005) but was not significantly different from that observed when the model was trained with 20 000 images (AUC = 0.95, P > .05). Averaging the CNN output score with the binary prospective label yielded the best-performing classifier, with an AUC of 0.98 (P < .005). Analysis of specific radiographs revealed that the model was heavily influenced by clinically relevant spatial regions but did not reliably generalize beyond thoracic disease. Conclusion CNNs trained with a modestly sized collection of prospectively labeled chest radiographs achieved high diagnostic performance in the classification of chest radiographs as normal or abnormal; this function may be useful for automated prioritization of abnormal chest radiographs. © RSNA, 2018 Online supplemental material is available for this article. See also the editorial by van Ginneken in this issue.

Subject(s)

Neural Networks, Computer , Radiographic Image Interpretation, Computer-Assisted/methods , Radiography, Thoracic/methods , Female , Humans , Lung/diagnostic imaging , Male , ROC Curve , Radiologists , Retrospective Studies

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL