Supervised Pretraining through Contrastive Categorical Positive Samplings to Improve COVID-19 Mortality Prediction.

Wanyan, Tingyi; Lin, Mingquan; Klang, Eyal; Menon, Kartikeya M; Gulamali, Faris F; Azad, Ariful; Zhang, Yiye; Ding, Ying; Wang, Zhangyang; Wang, Fei; Glicksberg, Benjamin; Peng, Yifan

Wanyan T; Population Health Sciences, Weill Cornell Medicine, New York, NY, USA.
Lin M; Population Health Sciences, Weill Cornell Medicine, New York, NY, USA.
Klang E; Icahn School of Medicine at Mount Sinai, New York, NY, USA.
Menon KM; Icahn School of Medicine at Mount Sinai, New York, NY, USA.
Gulamali FF; Icahn School of Medicine at Mount Sinai, New York, NY, USA.
Azad A; Intelligent Systems Engineering, Indiana University, Bloomington, Bloomington, IN, USA.
Zhang Y; Population Health Sciences, Weill Cornell Medicine, New York, NY, USA.
Ding Y; School of Information, University of Texas at Austin, Austin, TX, USA.
Wang Z; Electrical and Computer Engineering, University of Texas at Austin, Austin, TX, USA.
Wang F; Population Health Sciences, Weill Cornell Medicine, New York, NY, USA.
Glicksberg B; Icahn School of Medicine at Mount Sinai, New York, NY, USA.
Peng Y; Population Health Sciences, Weill Cornell Medicine, New York, NY, USA.

ACM BCB ; 2022 Aug.

Article in English | MEDLINE | ID: covidwho-1993099

ABSTRACT

Clinical EHR data are naturally heterogeneous and contain abundant sub-phenotypes. This diversity creates challenges for outcome prediction with machine learning models because it leads to high intra-class variance. To address this issue, we propose a supervised pre-training model with a unique embedded k-nearest-neighbor positive sampling strategy. We demonstrate the performance benefit of this framework theoretically and show that it yields highly competitive experimental results in predicting patient mortality on real-world COVID-19 EHR data from over 7,000 patients admitted to a large, urban health system. Our method achieves an AUROC of 0.872, outperforming the alternative pre-training models and traditional machine learning methods. Additionally, our method performs much better than these baselines when the training data size is small (345 training instances).
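
The abstract does not spell out how the k-nearest-neighbor positive sampling works, so the following is only a rough sketch of one plausible reading, not the authors' implementation. It assumes that positives for each anchor are the k same-label samples closest to it in embedding space, and that they feed a standard supervised contrastive (InfoNCE-style) loss. The function names, the Euclidean distance, the cosine similarity, and the `temperature` value are all illustrative assumptions.

```python
import math


def knn_positive_indices(embeddings, labels, anchor, k):
    """Indices of the k same-label samples nearest to the anchor (Euclidean)."""
    candidates = [
        (math.dist(embeddings[anchor], embeddings[j]), j)
        for j in range(len(embeddings))
        if j != anchor and labels[j] == labels[anchor]
    ]
    candidates.sort()
    return [j for _, j in candidates[:k]]


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def knn_contrastive_loss(embeddings, labels, k=2, temperature=0.5):
    """Mean contrastive loss where positives are limited to k nearest same-label samples."""
    total, count = 0.0, 0
    n = len(embeddings)
    for i in range(n):
        positives = knn_positive_indices(embeddings, labels, i, k)
        if not positives:
            continue  # anchor has no same-label partner in this batch
        # The denominator contrasts the anchor against every other sample.
        denom = sum(
            math.exp(cosine(embeddings[i], embeddings[j]) / temperature)
            for j in range(n)
            if j != i
        )
        for p in positives:
            numer = math.exp(cosine(embeddings[i], embeddings[p]) / temperature)
            total += -math.log(numer / denom)
            count += 1
    return total / count if count else 0.0


# Toy batch (made-up values): label 1 has two visible sub-groups.
embeddings = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9], [1.0, 0.9]]
labels = [1, 1, 0, 0, 1]
nearest = knn_positive_indices(embeddings, labels, anchor=0, k=1)
loss = knn_contrastive_loss(embeddings, labels, k=1)
```

Under this reading, restricting positives to near neighbors means an anchor is pulled toward same-label samples that already resemble it, rather than toward every sample sharing its label. That is one way to limit the effect of the high intra-class variance the abstract describes.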

Keywords

Intra-class variance; Pre-training; Self-supervised Learning; Sub-phenotype; Supervised Contrastive Learning; Mortality prediction

Full text: Available | Collection: International databases | Database: MEDLINE | Type of study: Prognostic study | Language: English | Year: 2022 | Document Type: Article | Affiliation country: 3535508.3545541
