Results 1 - 7 of 7
1.
NPJ Digit Med ; 7(1): 117, 2024 May 07.
Article in English | MEDLINE | ID: mdl-38714751

ABSTRACT

Through technological innovations, patient cohorts can be examined from multiple views with high-dimensional, multiscale biomedical data to classify clinical phenotypes and predict outcomes. Here, we present our approach for analyzing multimodal data with unsupervised and supervised sparse linear methods in a COVID-19 patient cohort. This prospective cohort study of 149 adult patients was conducted in a tertiary care academic center. First, we used sparse canonical correlation analysis (CCA) to identify and quantify relationships across data modalities, including viral genome sequencing, imaging, clinical data, and laboratory results. Then, we used cooperative learning to predict the clinical outcome of COVID-19 patients: intensive care unit admission. We show that serum biomarkers representing severe disease and the acute phase response correlate with original and wavelet radiomics features in the LLL frequency channel (corr(Xu1, Zv1) = 0.596, p-value < 0.001). Among radiomics features, histogram-based first-order features reporting skewness, kurtosis, and uniformity have the most negative coefficients, whereas entropy-related features have the most positive. Moreover, unsupervised analysis of clinical data and laboratory results yields insights into distinct clinical phenotypes. Leveraging the availability of global viral genome databases, we demonstrate that the Word2Vec natural language processing model can be used for viral genome encoding: it not only separates major SARS-CoV-2 variants but also preserves the phylogenetic relationships among them. Our quadruple model using Word2Vec encoding achieves better prediction results in the supervised task, yielding area under the curve (AUC) and accuracy values of 0.87 and 0.77, respectively. Our study illustrates that sparse CCA and cooperative learning are powerful techniques for handling high-dimensional, multimodal data and investigating multivariate associations in unsupervised and supervised tasks.
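
As a rough illustration of the genome-encoding step described above, the sketch below trains Word2Vec on overlapping k-mer "words" and pools the k-mer vectors into a per-genome embedding. The k-mer length, skip-gram setting, and mean pooling are assumptions for illustration, not details taken from the paper.

```python
# Hypothetical sketch: Word2Vec genome encoding via overlapping k-mers
# (k, skip-gram, and mean pooling are assumptions, not the paper's settings).
from gensim.models import Word2Vec
import numpy as np

def kmerize(seq: str, k: int = 6) -> list[str]:
    """Split a genome sequence into overlapping k-mer 'words'."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

# Toy sequences standing in for SARS-CoV-2 genomes from a global database.
genomes = ["ATGGCGTTACCA", "ATGGCGTTGCCA", "ATGGTGTAACCA"]
corpus = [kmerize(g) for g in genomes]

# Train a skip-gram Word2Vec model on the k-mer corpus.
model = Word2Vec(corpus, vector_size=32, window=5, min_count=1, sg=1, seed=0)

def encode(seq: str) -> np.ndarray:
    """Encode a genome as the mean of its k-mer embeddings."""
    vecs = [model.wv[km] for km in kmerize(seq) if km in model.wv]
    return np.mean(vecs, axis=0)

X_genome = np.vstack([encode(g) for g in genomes])  # one row per patient genome
```

The resulting embedding matrix can then serve as one data view alongside imaging, clinical, and laboratory features in the downstream supervised model.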

2.
Res Sq ; 2023 Nov 20.
Article in English | MEDLINE | ID: mdl-38045288

ABSTRACT

Through technological innovations, patient cohorts can be examined from multiple views with high-dimensional, multiscale biomedical data to classify clinical phenotypes and predict outcomes. Here, we present our approach for analyzing multimodal data with unsupervised and supervised sparse linear methods in a COVID-19 patient cohort. This prospective cohort study of 149 adult patients was conducted in a tertiary care academic center. First, we used sparse canonical correlation analysis (CCA) to identify and quantify relationships across data modalities, including viral genome sequencing, imaging, clinical data, and laboratory results. Then, we used cooperative learning to predict the clinical outcome of COVID-19 patients. We show that serum biomarkers representing severe disease and the acute phase response correlate with original and wavelet radiomics features in the LLL frequency channel (corr(Xu1, Zv1) = 0.596, p-value < 0.001). Among radiomics features, histogram-based first-order features reporting skewness, kurtosis, and uniformity have the most negative coefficients, whereas entropy-related features have the most positive. Moreover, unsupervised analysis of clinical data and laboratory results yields insights into distinct clinical phenotypes. Leveraging the availability of global viral genome databases, we demonstrate that the Word2Vec natural language processing model can be used for viral genome encoding: it not only separates major SARS-CoV-2 variants but also preserves the phylogenetic relationships among them. Our quadruple model using Word2Vec encoding achieves better prediction results in the supervised task, yielding area under the curve (AUC) and accuracy values of 0.87 and 0.77, respectively. Our study illustrates that sparse CCA and cooperative learning are powerful techniques for handling high-dimensional, multimodal data and investigating multivariate associations in unsupervised and supervised tasks.
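
The sparse CCA step can be sketched with alternating soft-thresholded power iterations in the spirit of penalized matrix decomposition. The fixed penalty levels and the identity approximation to the within-view covariances below are simplifying assumptions, not the study's settings.

```python
# Simplified sparse CCA sketch: alternating soft-thresholded power iteration
# on the cross-covariance, treating within-view covariances as identity
# (a common simplification; penalty levels here are illustrative only).
import numpy as np

def soft_threshold(a, lam):
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def sparse_cca(X, Z, lam_u=0.1, lam_v=0.1, n_iter=100):
    """First sparse canonical pair for column-standardized X (n x p), Z (n x q)."""
    rng = np.random.default_rng(0)
    v = rng.standard_normal(Z.shape[1])
    v /= np.linalg.norm(v)
    C = X.T @ Z  # cross-covariance between the two views (up to scaling)
    for _ in range(n_iter):
        u = soft_threshold(C @ v, lam_u)
        u /= np.linalg.norm(u) + 1e-12
        v = soft_threshold(C.T @ u, lam_v)
        v /= np.linalg.norm(v) + 1e-12
    return u, v

# Toy views standing in for, e.g., laboratory results and radiomics features.
X = np.random.default_rng(1).standard_normal((149, 50))
Z = np.random.default_rng(2).standard_normal((149, 200))
u, v = sparse_cca(X, Z)
r = np.corrcoef(X @ u, Z @ v)[0, 1]  # estimated canonical correlation corr(Xu, Zv)
```

The soft-thresholding zeroes out weakly contributing features, which is what makes the canonical weight vectors u and v interpretable across modalities.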

4.
Cancer Cell ; 40(12): 1521-1536.e7, 2022 Dec 12.
Article in English | MEDLINE | ID: mdl-36400020

ABSTRACT

Ductal carcinoma in situ (DCIS) is the most common precursor of invasive breast cancer (IBC), with variable propensity for progression. We perform multiscale, integrated molecular profiling of DCIS with clinical outcomes by analyzing 774 DCIS samples from 542 patients, with a median follow-up of 7.3 years, from the Translational Breast Cancer Research Consortium 038 study and the Resource of Archival Breast Tissue cohorts. We identify 812 genes associated with ipsilateral recurrence within 5 years of treatment and develop a classifier that predicts DCIS or IBC recurrence in both cohorts. Pathways associated with recurrence include proliferation, immune response, and metabolism. Distinct stromal expression patterns and immune cell compositions are identified. Our multiscale approach employs in situ methods to generate a spatially resolved atlas of breast precancers, in which complementary modalities can be directly compared and correlated with conventional pathology findings, disease states, and clinical outcome.
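
Purely as a hypothetical illustration of how a recurrence classifier might be trained from an expression matrix over the 812 recurrence-associated genes, the sketch below fits an elastic-net logistic regression on synthetic data; the paper's actual classifier construction is not reproduced here.

```python
# Hypothetical sketch of a gene-expression recurrence classifier
# (synthetic data; elastic-net logistic regression is an assumption,
# not the paper's method).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.standard_normal((542, 812))   # expression of 812 recurrence-associated genes
y = rng.integers(0, 2, size=542)      # 5-year ipsilateral recurrence label (toy)

clf = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="elasticnet", solver="saga",
                       l1_ratio=0.5, C=1.0, max_iter=5000),
)
auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
```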


Subject(s)
Breast Neoplasms; Carcinoma, Ductal, Breast; Carcinoma, Intraductal, Noninfiltrating; Humans; Female; Carcinoma, Intraductal, Noninfiltrating/genetics; Carcinoma, Intraductal, Noninfiltrating/metabolism; Carcinoma, Intraductal, Noninfiltrating/pathology; Carcinoma, Ductal, Breast/genetics; Carcinoma, Ductal, Breast/metabolism; Carcinoma, Ductal, Breast/pathology; Disease Progression; Breast Neoplasms/pathology; Biomarkers; Biomarkers, Tumor/genetics; Biomarkers, Tumor/analysis
5.
Proc Natl Acad Sci U S A ; 119(38): e2202113119, 2022 Sep 20.
Article in English | MEDLINE | ID: mdl-36095183

ABSTRACT

We propose a method for supervised learning with multiple sets of features ("views"). The multiview problem is especially important in biology and medicine, where "-omics" data, such as genomics, proteomics, and radiomics, are measured on a common set of samples. "Cooperative learning" combines the usual squared-error loss of predictions with an "agreement" penalty to encourage the predictions from different data views to agree. By varying the weight of the agreement penalty, we get a continuum of solutions that include the well-known early and late fusion approaches. Cooperative learning chooses the degree of agreement (or fusion) in an adaptive manner, using a validation set or cross-validation to estimate test set prediction error. One version of our fitting procedure is modular, where one can choose different fitting mechanisms (e.g., lasso, random forests, boosting, or neural networks) appropriate for different data views. In the setting of cooperative regularized linear regression, the method combines the lasso penalty with the agreement penalty, yielding feature sparsity. The method can be especially powerful when the different data views share some underlying relationship in their signals that can be exploited to boost the signals. We show that cooperative learning achieves higher predictive accuracy on simulated data and real multiomics examples of labor-onset prediction. By leveraging aligned signals and allowing flexible fitting mechanisms for different modalities, cooperative learning offers a powerful approach to multiomics data fusion.
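
The two-view linear case lends itself to a compact sketch: the cooperative objective with squared-error loss, agreement penalty, and lasso penalty can be rewritten as an ordinary lasso on an augmented data matrix, consistent with the regression setting the abstract describes. The sketch below follows that reformulation (sklearn's Lasso scales the loss by a constant factor, so the penalty level is illustrative).

```python
# Sketch of cooperative regularized linear regression for two views X and Z,
# via the augmented-lasso reformulation of
#   min 1/2||y - X bx - Z bz||^2 + rho/2||X bx - Z bz||^2 + lam*(|bx|_1 + |bz|_1).
import numpy as np
from sklearn.linear_model import Lasso

def cooperative_lasso(X, Z, y, rho=0.5, alpha=0.1):
    n = len(y)
    # Stack the original rows with agreement rows: [-sqrt(rho) X, sqrt(rho) Z],
    # whose residual against 0 reproduces the agreement penalty.
    X_aug = np.vstack([
        np.hstack([X, Z]),
        np.hstack([-np.sqrt(rho) * X, np.sqrt(rho) * Z]),
    ])
    y_aug = np.concatenate([y, np.zeros(n)])
    fit = Lasso(alpha=alpha, fit_intercept=False).fit(X_aug, y_aug)
    return fit.coef_[:X.shape[1]], fit.coef_[X.shape[1]:]

# Toy usage with synthetic views.
rng = np.random.default_rng(0)
X, Z = rng.standard_normal((100, 20)), rng.standard_normal((100, 30))
y = rng.standard_normal(100)
beta_x, beta_z = cooperative_lasso(X, Z, y, rho=0.5)
```

Setting rho = 0 recovers early fusion (an ordinary lasso on the concatenated views), while larger rho pushes the per-view predictions X @ beta_x and Z @ beta_z to agree, tracing out the continuum of solutions the abstract describes.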


Subject(s)
Genomics; Neural Networks, Computer; Supervised Machine Learning; Genomics/methods
6.
Pac Symp Biocomput ; 24: 18-29, 2019.
Article in English | MEDLINE | ID: mdl-30864307

ABSTRACT

Electronic phenotyping is the task of ascertaining whether an individual has a medical condition of interest by analyzing their medical record and is foundational in clinical informatics. Increasingly, electronic phenotyping is performed via supervised learning. We investigate the effectiveness of multitask learning for phenotyping using electronic health records (EHR) data. Multitask learning aims to improve model performance on a target task by jointly learning additional auxiliary tasks and has been used in disparate areas of machine learning. However, its utility when applied to EHR data has not been established, and prior work suggests that its benefits are inconsistent. We present experiments that elucidate when multitask learning with neural nets improves performance for phenotyping using EHR data relative to neural nets trained for a single phenotype and to well-tuned baselines. We find that multitask neural nets consistently outperform single-task neural nets for rare phenotypes but underperform for relatively more common phenotypes. The effect size increases as more auxiliary tasks are added. Moreover, multitask learning reduces the sensitivity of neural nets to hyperparameter settings for rare phenotypes. Last, we quantify phenotype complexity and find that neural nets trained with or without multitask learning do not improve on simple baselines unless the phenotypes are sufficiently complex.
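
A minimal sketch of the multitask setup described above: a trunk shared across phenotypes feeding one binary head per task, so the auxiliary phenotypes shape the shared representation. The layer sizes and feature count are illustrative assumptions.

```python
# Minimal multitask phenotyping net: shared trunk, one sigmoid head per
# phenotype task (sizes are illustrative assumptions, not the paper's).
import torch
import torch.nn as nn

class MultitaskPhenotyper(nn.Module):
    def __init__(self, n_features: int, n_tasks: int, hidden: int = 256):
        super().__init__()
        self.trunk = nn.Sequential(          # representation shared by all tasks
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.heads = nn.ModuleList(          # one logit per phenotype
            [nn.Linear(hidden, 1) for _ in range(n_tasks)]
        )

    def forward(self, x):
        h = self.trunk(x)
        return torch.cat([head(h) for head in self.heads], dim=1)

model = MultitaskPhenotyper(n_features=1000, n_tasks=10)
x = torch.randn(32, 1000)                    # batch of EHR feature vectors
logits = model(x)                            # shape (32, 10): one column per task
loss = nn.BCEWithLogitsLoss()(logits, torch.randint(0, 2, (32, 10)).float())
```

Dropping all but one head recovers the single-task baseline, which is the comparison the experiments above draw for rare versus common phenotypes.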


Subject(s)
Electronic Health Records/statistics & numerical data; Machine Learning; Algorithms; Computational Biology; Databases, Factual; Deep Learning; Humans; Logistic Models; Medical Informatics; Neural Networks, Computer; Phenotype
7.
PLoS Med ; 15(11): e1002686, 2018 Nov.
Article in English | MEDLINE | ID: mdl-30457988

ABSTRACT

BACKGROUND: Chest radiograph interpretation is critical for the detection of thoracic diseases, including tuberculosis and lung cancer, which affect millions of people worldwide each year. This time-consuming task typically requires expert radiologists to read the images, leading to fatigue-based diagnostic error and lack of diagnostic expertise in areas of the world where radiologists are not available. Recently, deep learning approaches have been able to achieve expert-level performance in medical image interpretation tasks, powered by large network architectures and fueled by the emergence of large labeled datasets. The purpose of this study is to investigate the performance of a deep learning algorithm on the detection of pathologies in chest radiographs compared with practicing radiologists. METHODS AND FINDINGS: We developed CheXNeXt, a convolutional neural network to concurrently detect the presence of 14 different pathologies, including pneumonia, pleural effusion, pulmonary masses, and nodules in frontal-view chest radiographs. CheXNeXt was trained and internally validated on the ChestX-ray8 dataset, with a held-out validation set consisting of 420 images, sampled to contain at least 50 cases of each of the original pathology labels. On this validation set, the majority vote of a panel of 3 board-certified cardiothoracic specialist radiologists served as reference standard. We compared CheXNeXt's discriminative performance on the validation set to the performance of 9 radiologists using the area under the receiver operating characteristic curve (AUC). The radiologists included 6 board-certified radiologists (average experience 12 years, range 4-28 years) and 3 senior radiology residents, from 3 academic institutions. We found that CheXNeXt achieved radiologist-level performance on 11 pathologies and did not achieve radiologist-level performance on 3 pathologies. The radiologists achieved statistically significantly higher AUC performance on cardiomegaly, emphysema, and hiatal hernia, with AUCs of 0.888 (95% confidence interval [CI] 0.863-0.910), 0.911 (95% CI 0.866-0.947), and 0.985 (95% CI 0.974-0.991), respectively, whereas CheXNeXt's AUCs were 0.831 (95% CI 0.790-0.870), 0.704 (95% CI 0.567-0.833), and 0.851 (95% CI 0.785-0.909), respectively. CheXNeXt performed better than radiologists in detecting atelectasis, with an AUC of 0.862 (95% CI 0.825-0.895), statistically significantly higher than radiologists' AUC of 0.808 (95% CI 0.777-0.838); there were no statistically significant differences in AUCs for the other 10 pathologies. The average time to interpret the 420 images in the validation set was substantially longer for the radiologists (240 minutes) than for CheXNeXt (1.5 minutes). The main limitations of our study are that neither CheXNeXt nor the radiologists were permitted to use patient history or review prior examinations and that evaluation was limited to a dataset from a single institution. CONCLUSIONS: In this study, we developed and validated a deep learning algorithm that classified clinically important abnormalities in chest radiographs at a performance level comparable to practicing radiologists. Once tested prospectively in clinical settings, the algorithm could have the potential to expand patient access to chest radiograph diagnostics.
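
As a hedged sketch of the general pattern, a multi-label classifier for 14 pathologies can be built from a DenseNet-121 backbone with independent sigmoid outputs; CheXNeXt's exact architecture, ensembling, and training protocol are not reproduced here.

```python
# Sketch of a 14-pathology multi-label chest radiograph classifier
# (DenseNet-121 backbone is a common choice for this task, assumed here;
# not a reproduction of CheXNeXt's training setup).
import torch
import torch.nn as nn
from torchvision import models

N_PATHOLOGIES = 14
net = models.densenet121(weights=None)  # pretrained weights omitted in this sketch
net.classifier = nn.Linear(net.classifier.in_features, N_PATHOLOGIES)

images = torch.randn(8, 3, 224, 224)           # batch of frontal-view radiographs
labels = torch.randint(0, 2, (8, N_PATHOLOGIES)).float()
logits = net(images)
loss = nn.BCEWithLogitsLoss()(logits, labels)  # pathologies scored independently
# At inference, torch.sigmoid(logits) yields one probability per pathology,
# so a single image can carry several positive findings at once.
```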


Subject(s)
Clinical Competence; Deep Learning; Diagnosis, Computer-Assisted/methods; Pneumonia/diagnostic imaging; Radiographic Image Interpretation, Computer-Assisted/methods; Radiography, Thoracic/methods; Radiologists; Humans; Predictive Value of Tests; Reproducibility of Results; Retrospective Studies