Search | VHL Regional Portal

Predicting diabetes-related hospitalizations based on electronic health records.

Brisimi, Theodora S; Xu, Tingting; Wang, Taiyao; Dai, Wuyang; Paschalidis, Ioannis Ch.

Stat Methods Med Res ; 28(12): 3667-3682, 2019 Dec.

Article in English | MEDLINE | ID: mdl-30474497

ABSTRACT

Objective: To derive a predictive model to identify patients likely to be hospitalized during the following year due to complications attributed to Type II diabetes. Methods: A variety of supervised machine learning classification methods were tested and a new method was developed that discovers hidden patient clusters in the positive class (hospitalized) while, at the same time, deriving sparse linear support vector machine classifiers to separate positive samples from the negative ones (non-hospitalized). The convergence of the new method was established and theoretical guarantees were proved on how the classifiers it produces generalize to a test set not seen during training. Results: The methods were tested on a large set of patients from the Boston Medical Center, the largest safety net hospital in New England. We find that our new joint clustering/classification method achieves an accuracy of 89% (measured in terms of area under the ROC curve) and yields informative clusters which can help interpret the classification results, thus increasing physicians' trust in the algorithmic output and providing some guidance towards preventive measures. While it is possible to increase accuracy to 92% with other methods, this comes with increased computational cost and lack of interpretability. The analysis shows that even a modest probability of preventive actions being effective (more than 19%) suffices to generate significant hospital care savings. Conclusions: Predictive models are proposed that can help avert hospitalizations, improve health outcomes and drastically reduce hospital expenditures. The scope for savings is significant as it has been estimated that in the USA alone, about $5.8 billion is spent each year on diabetes-related hospitalizations that could be prevented.
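The abstract reports accuracy as area under the ROC curve (AUC). As a rough illustration of that metric only, the sketch below computes AUC as the fraction of (positive, negative) pairs in which the positive sample receives the higher score, counting ties as one half. The function name `roc_auc` and the toy labels and scores are assumptions made for this example; they are not data or code from the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative (ties count as 0.5). Quadratic in sample size; fine for a demo."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = hospitalized, 0 = not hospitalized.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the 89% and 92% figures above are stated.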

Subject(s)

Diabetes Mellitus, Type 2 , Electronic Health Records , Hospitalization/trends , Boston , Cluster Analysis , Cost-Benefit Analysis , Forecasting , Humans

Predicting Chronic Disease Hospitalizations from Electronic Health Records: An Interpretable Classification Approach.

Brisimi, Theodora S; Xu, Tingting; Wang, Taiyao; Dai, Wuyang; Adams, William G; Paschalidis, Ioannis Ch.

Proc IEEE Inst Electr Electron Eng ; 106(4): 690-707, 2018 Apr.

Article in English | MEDLINE | ID: mdl-30886441

ABSTRACT

Urban living in modern large cities has significant adverse effects on health, increasing the risk of several chronic diseases. We focus on the two leading clusters of chronic disease, heart disease and diabetes, and develop data-driven methods to predict hospitalizations due to these conditions. We base these predictions on the patients' medical history, recent and more distant, as described in their Electronic Health Records (EHR). We formulate the prediction problem as a binary classification problem and consider a variety of machine learning methods, including kernelized and sparse Support Vector Machines (SVM), sparse logistic regression, and random forests. To strike a balance between accuracy and interpretability of the prediction, which is important in a medical setting, we propose two novel methods: K-LRT, a likelihood ratio test-based method, and a Joint Clustering and Classification (JCC) method which identifies hidden patient clusters and adapts classifiers to each cluster. We develop theoretical out-of-sample guarantees for the latter method. We validate our algorithms on large datasets from the Boston Medical Center, the largest safety-net hospital system in New England.
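The abstract names a likelihood ratio test-based method but does not spell it out. The sketch below shows only a plain, naive-Bayes-style log-likelihood ratio over binary features, under the assumption that features are conditionally independent given the class. It is not the K-LRT method itself, and the function names, the Laplace smoothing, and the toy data are all hypothetical.

```python
import math


def fit_rates(rows, labels, smoothing=1.0):
    """Estimate P(x_j = 1 | class) per binary feature, with Laplace smoothing."""
    n_features = len(rows[0])
    rates = {}
    for cls in (0, 1):
        members = [r for r, y in zip(rows, labels) if y == cls]
        rates[cls] = [
            (sum(r[j] for r in members) + smoothing)
            / (len(members) + 2 * smoothing)
            for j in range(n_features)
        ]
    return rates


def log_likelihood_ratio(row, rates):
    """Sum of per-feature log ratios; positive values favor class 1."""
    total = 0.0
    for j, x in enumerate(row):
        p1, p0 = rates[1][j], rates[0][j]
        if x:
            total += math.log(p1 / p0)
        else:
            total += math.log((1 - p1) / (1 - p0))
    return total


# Hypothetical toy data: two binary features per patient.
rows = [[1, 1], [1, 0], [0, 0], [0, 1]]
labels = [1, 1, 0, 0]
rates = fit_rates(rows, labels)
llr_high = log_likelihood_ratio([1, 0], rates)
llr_low = log_likelihood_ratio([0, 1], rates)
print(llr_high, llr_low)
```

A patient would be flagged when the ratio exceeds a chosen threshold. Because each feature contributes one additive term, the score can be read feature by feature, which is one reason such tests are considered interpretable.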

Prediction of hospitalization due to heart diseases by supervised learning methods.

Dai, Wuyang; Brisimi, Theodora S; Adams, William G; Mela, Theofanie; Saligrama, Venkatesh; Paschalidis, Ioannis Ch.

Int J Med Inform ; 84(3): 189-97, 2015 Mar.

Article in English | MEDLINE | ID: mdl-25497295

ABSTRACT

BACKGROUND: In 2008, the United States spent $2.2 trillion on healthcare, which was 15.5% of its GDP. 31% of this expenditure is attributed to hospital care. Evidently, even modest reductions in hospital care costs matter. A 2009 study showed that nearly $30.8 billion in hospital care costs during 2006 was potentially preventable, with heart diseases being responsible for about 31% of that amount. METHODS: Our goal is to accurately and efficiently predict heart-related hospitalizations based on the available patient-specific medical history. To the best of our knowledge, the approaches we introduce are novel for this problem. The prediction of hospitalization is formulated as a supervised classification problem. We use de-identified Electronic Health Record (EHR) data from a large urban hospital in Boston to identify patients with heart diseases. Patients are labeled and randomly partitioned into a training and a test set. We apply five machine learning algorithms, namely Support Vector Machines (SVM), AdaBoost using trees as the weak learner, logistic regression, a naïve Bayes event classifier, and a variation of a Likelihood Ratio Test adapted to the specific problem. Each model is trained on the training set and then tested on the test set. RESULTS: All five models show consistent results, which could, to some extent, indicate the limit of the achievable prediction accuracy. Our results show that with under 30% false alarm rate, the detection rate could be as high as 82%. These accuracy rates translate to a considerable amount of potential savings, if used in practice.
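The results above are stated as a detection rate at a given false alarm rate. To make those two quantities concrete, the sketch below computes them from scores and labels at a single decision threshold. The function name, the threshold, and the toy data are assumptions for illustration and do not come from the study.

```python
def detection_and_false_alarm(labels, scores, threshold):
    """Return (detection rate, false alarm rate) when score >= threshold
    is treated as a predicted hospitalization."""
    tp = fp = fn = tn = 0
    for y, s in zip(labels, scores):
        predicted = s >= threshold
        if y == 1 and predicted:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one positive and one negative sample")
    detection_rate = tp / (tp + fn)     # true positive rate
    false_alarm_rate = fp / (fp + tn)   # false positive rate
    return detection_rate, false_alarm_rate


# Hypothetical toy data: 4 hospitalized and 6 non-hospitalized patients.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.6, 0.3, 0.2, 0.1, 0.5, 0.05]
detection, false_alarm = detection_and_false_alarm(labels, scores, 0.5)
print(detection, false_alarm)
```

Lowering the threshold raises both rates, so a figure such as "82% detection at under 30% false alarm" describes one operating point on the ROC curve rather than a property of the classifier as a whole.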

Subject(s)

Artificial Intelligence , Heart Diseases , Hospitalization , Risk Assessment/methods , Algorithms , Bayes Theorem , Boston , Electronic Health Records , Humans , Likelihood Functions , Logistic Models , ROC Curve

Practical conditions for effectiveness of the Universum learning.

Cherkassky, Vladimir; Dhar, Sauptik; Dai, Wuyang.

IEEE Trans Neural Netw ; 22(8): 1241-55, 2011 Aug.

Article in English | MEDLINE | ID: mdl-21724504

ABSTRACT

Many applications of machine learning involve analysis of sparse high-dimensional data, in which the number of input features is larger than the number of data samples. Standard inductive learning methods may not be sufficient for such data, and this provides motivation for nonstandard learning settings. This paper investigates a new learning methodology called learning through contradictions or Universum support vector machine (U-SVM). U-SVM incorporates a priori knowledge about application data, in the form of additional Universum samples, into the learning process. This paper investigates possible advantages of U-SVM versus standard SVM, and describes the practical conditions necessary for the effectiveness of the U-SVM. These conditions are based on the analysis of the univariate histograms of projections of training samples onto the normal direction vector of (standard) SVM decision boundary. Several empirical comparisons are presented to illustrate the practical utility of the proposed approach.
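The conditions mentioned above rest on a univariate histogram of training-sample projections onto the normal direction of the SVM decision boundary. The sketch below shows only that mechanical step: computing f(x) = w·x + b for each sample and binning the values. The weight vector `w`, bias `b`, samples, and bin settings are hypothetical, and the code does not implement U-SVM or evaluate the paper's conditions.

```python
def project(samples, w, b):
    """Signed projection f(x) = w . x + b of each sample onto the SVM normal."""
    return [sum(wi * xi for wi, xi in zip(w, x)) + b for x in samples]


def histogram(values, lo, hi, bins):
    """Count values falling in `bins` equal-width bins over [lo, hi]."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if lo <= v <= hi:
            # Clamp so that v == hi lands in the last bin.
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
    return counts


# Hypothetical linear SVM parameters and training samples.
w = [1.0, -1.0]
b = 0.0
samples = [[2.0, 0.0], [1.5, 0.5], [0.0, 1.0], [0.0, 2.0]]

projections = project(samples, w, b)
counts = histogram(projections, lo=-2.0, hi=2.0, bins=4)
print(projections, counts)
```

With a standard SVM scaling, the margin borders sit at f(x) = -1 and f(x) = +1, so the shape of this histogram relative to those two values is what one would inspect.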

Subject(s)

Artificial Intelligence , Support Vector Machine , Algorithms , Pattern Recognition, Automated/methods , Random Allocation
