Results 1 - 5 of 5
1.
Comput Methods Programs Biomed ; 197: 105765, 2020 Dec.
Article in English | MEDLINE | ID: mdl-33011665

ABSTRACT

BACKGROUND AND OBJECTIVE: Alzheimer's disease (AD) is the most common type of dementia and can seriously affect a person's ability to perform daily activities. Estimates indicate that AD may rank third as a cause of death for older people, after heart disease and cancer. Identification of individuals at risk for developing AD is imperative for testing therapeutic interventions. The objective of the study was to determine whether diagnosis of AD from EMR data alone (without relying on diagnostic imaging) could be significantly improved by applying clinical domain knowledge in data preprocessing and positive dataset selection rather than by setting naïve filters. METHODS: Data were extracted from a repository of heterogeneous ambulatory EMR data collected from primary care medical offices all over the U.S. Medical domain knowledge was applied to build a positive dataset from data relevant to AD. Selected Clinically Relevant Positive (SCRP) datasets were used as inputs to a Long Short-Term Memory (LSTM) Recurrent Neural Network (RNN) deep learning model to predict whether a patient would develop AD. RESULTS: Risk score prediction of AD using drug-domain information in an SCRP AD dataset of 2,324 patients achieved a high out-of-sample score of 0.98-0.99 Area Under the Precision-Recall Curve (AUPRC) when 90% of the SCRP dataset was used for training. AUPRC dropped to 0.89 when the model was trained on fewer than 1,500 cases from the SCRP dataset, yet the model remained significantly better than with naïve dataset selection. CONCLUSION: The LSTM RNN method that used data relevant to AD performed significantly better when learning from the SCRP dataset than from naïvely selected datasets. The integration of qualitative medical knowledge for dataset selection with deep learning technology provided a mechanism for significant improvement in AD prediction.
Accurate and early prediction of AD is important for identifying patients for clinical trials, which may lead to the discovery of new drugs for the treatment of AD. In addition, the proposed AD predictions contribute to better selection of patients who need imaging diagnostics for differential diagnosis of AD from other degenerative brain disorders.
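The record names the model family (LSTM RNN) but not its implementation. As a minimal sketch of the LSTM recurrence such a risk model is built on (toy scalar dimensions and hypothetical weight names; not the paper's actual architecture or weights):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, W):
    """One LSTM step with scalar input/state (toy dimensions).

    W holds illustrative weights for the input, forget, and output gates
    and the candidate cell value; each gate sees the current input x and
    the previous hidden state h_prev."""
    i = sigmoid(W["wi"] * x + W["ui"] * h_prev + W["bi"])    # input gate
    f = sigmoid(W["wf"] * x + W["uf"] * h_prev + W["bf"])    # forget gate
    o = sigmoid(W["wo"] * x + W["uo"] * h_prev + W["bo"])    # output gate
    g = math.tanh(W["wg"] * x + W["ug"] * h_prev + W["bg"])  # candidate
    c = f * c_prev + i * g       # new cell state
    h = o * math.tanh(c)         # new hidden state
    return h, c

def risk_score(sequence, W):
    """Fold a sequence of (already numeric) visit features into a
    risk-like score in (0, 1)."""
    h, c = 0.0, 0.0
    for x in sequence:
        h, c = lstm_step(x, h, c, W)
    return sigmoid(h)
```

In a real model, x, h, and c would be vectors and the products matrix multiplications; the scalar version above only shows the gating structure.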


Subject(s)
Alzheimer Disease , Deep Learning , Aged , Aged, 80 and over , Alzheimer Disease/diagnosis , Area Under Curve , Humans , Neural Networks, Computer
2.
J Am Med Inform Assoc ; 27(9): 1343-1351, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32869093

ABSTRACT

OBJECTIVE: We sought to predict whether patients with type 2 diabetes mellitus (DM2) would develop 10 selected complications. Accurate prediction of complications could support more targeted measures to prevent or slow their development. MATERIALS AND METHODS: Experiments were conducted on the Healthcare Cost and Utilization Project State Inpatient Databases of California for the period 2003 to 2011. Recurrent neural network (RNN) long short-term memory (LSTM) and RNN gated recurrent unit (GRU) deep learning methods were designed and compared with random forest and multilayer perceptron traditional models. Prediction accuracy for the selected complications was compared across 3 settings corresponding to the minimum number of hospitalizations between the diabetes diagnosis and the diagnosis of complications. RESULTS: The diagnosis domain was used for the experiments. The best results were achieved with the RNN GRU model, followed by the RNN LSTM model. The prediction accuracy achieved with the RNN GRU model was between 73% (myocardial infarction) and 83% (chronic ischemic heart disease), while the accuracy of the traditional models was between 66% and 76%. DISCUSSION: The number of hospitalizations was an important factor for prediction accuracy. Experiments with 4 hospitalizations achieved significantly better accuracy than those with 2 hospitalizations. To achieve improved accuracy, the deep learning models required training on at least 1,000 patients, and accuracy dropped significantly when training datasets contained only 500 patients. The prediction accuracy of complications decreased over longer time periods. Considering individual complications, the best accuracy was achieved for depressive disorder and chronic ischemic heart disease. CONCLUSIONS: Based on the achieved results, the RNN GRU model was the best choice for electronic medical record data.
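The GRU, which this study found best, differs from the LSTM by dropping the separate cell state and merging input/forget gating into a single update gate. As a minimal sketch of that recurrence (toy scalar dimensions and hypothetical weight names; not the paper's model):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, h_prev, W):
    """One GRU step with scalar input/state (toy dimensions).

    Unlike the LSTM, the GRU keeps only a hidden state h: the update
    gate z interpolates between the previous state and a candidate n,
    and the reset gate r controls how much history the candidate sees."""
    z = sigmoid(W["wz"] * x + W["uz"] * h_prev + W["bz"])              # update gate
    r = sigmoid(W["wr"] * x + W["ur"] * h_prev + W["br"])              # reset gate
    n = math.tanh(W["wn"] * x + W["un"] * (r * h_prev) + W["bn"])      # candidate
    return (1.0 - z) * n + z * h_prev

def complication_score(visits, W):
    """Fold a sequence of (already numeric) hospitalization features
    into a probability-like score for one complication."""
    h = 0.0
    for x in visits:
        h = gru_step(x, h, W)
    return sigmoid(h)
```

The fewer gates (two versus the LSTM's three plus cell state) mean fewer parameters per unit, one plausible reason GRUs can train well on moderately sized EMR cohorts.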


Subject(s)
Algorithms , Deep Learning , Diabetes Complications , Diabetes Mellitus, Type 2/complications , Risk Assessment/methods , Decision Trees , Humans , Neural Networks, Computer , Prognosis
3.
J Am Med Inform Assoc ; 26(11): 1195-1202, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31188432

ABSTRACT

OBJECTIVE: Clinical trials, prospective research studies on human participants carried out by a distributed team of clinical investigators, play a crucial role in the development of new treatments in health care. This is a complex and expensive process in which investigators aim to enroll volunteers with predetermined characteristics, administer treatment(s), and collect safety and efficacy data. Choosing top-enrolling investigators is therefore essential for efficient clinical trial execution and is 1 of the primary drivers of drug development cost. MATERIALS AND METHODS: To facilitate clinical trial optimization, we propose DeepMatch (DM), a novel approach that builds on advances in deep learning. DM is designed to learn from both investigator- and trial-related heterogeneous data sources and to rank investigators based on their expected enrollment performance on new clinical trials. RESULTS: A large-scale evaluation conducted on 2618 studies provides evidence that the proposed ranking-based framework improves on the current state of the art by up to 19% on ranking investigators and up to 10% on detecting top/bottom performers when recruiting investigators for new clinical trials. DISCUSSION: The extensive experimental section suggests that DM can provide substantial improvement over current industry standards in several regards: (1) the enrollment potential of the investigator list, (2) the time it takes to generate the list, and (3) data-informed decisions about new investigators. CONCLUSION: Given the great significance of the problem at hand, related research efforts are set to shift the paradigm of how investigators are chosen for clinical trials, thereby optimizing and automating trial execution and reducing the cost of new therapies.
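The abstract reports gains on "detecting top/bottom performers" without specifying the metric. A small hypothetical sketch of how such detection is often scored, precision at k over predicted versus true enrollment rankings (the function and data are illustrative, not DeepMatch's actual evaluation):

```python
def precision_at_k(scores, enrollments, k):
    """Fraction of the k highest-scored investigators that are truly
    among the k top enrollers.

    scores      -- model-predicted ranking scores, one per investigator
    enrollments -- realized enrollment counts, same order
    (Both lists are hypothetical toy inputs.)"""
    pred_top = set(sorted(range(len(scores)), key=lambda i: -scores[i])[:k])
    true_top = set(sorted(range(len(enrollments)), key=lambda i: -enrollments[i])[:k])
    return len(pred_top & true_top) / k
```

Scoring bottom-performer detection would work the same way with both sort orders reversed.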


Subject(s)
Clinical Trials as Topic/methods , Data Mining/methods , Deep Learning , Patient Selection , Research Personnel , Databases, Factual , Electronic Health Records , Humans , Insurance Claim Reporting
4.
J Biomed Inform ; 93: 103161, 2019 05.
Article in English | MEDLINE | ID: mdl-30940598

ABSTRACT

INTRODUCTION: The objective of this study was to improve understanding of the spatial spread of complicated influenza cases that required hospitalization by creating heatmaps and social networks. These allow identification of critical hubs and routes of influenza spread in specific geographic locations, in order to contain infections and prevent complications that require hospitalization. MATERIAL AND METHODS: Data were downloaded from the Healthcare Cost and Utilization Project (HCUP) State Inpatient Databases (SID) for New York State. Patients hospitalized with flu complications between 2003 and 2012 were included in the research (30,380 cases). A novel approach was designed that constructs heatmaps for specific geographic regions in New York State and power-law networks in order to analyze the distribution of hospitalized flu cases. RESULTS: Heatmaps revealed that the distribution of patients follows urban areas and major roads, indicating that flu spreads along the routes people use to travel. A scale-free network created from correlations among zip codes showed that the most highly populated zip codes did not have the largest numbers of patients with flu complications. Among the top five most affected zip codes, four were in the Bronx. Demographics of the top affected zip codes are presented in the results. Numbers of cases normalized by population revealed that no zip codes from the Bronx were in the top 20. All zip codes with the highest node degrees were in the New York City area. DISCUSSION: Heatmaps identified the geographic distribution of hospitalized flu patients, and network analysis identified hubs of the infection. Our results will enable better estimation of resources for the prevention and treatment of hospitalized patients with complications of influenza.
CONCLUSION: Analyses of the geographic distribution of hospitalized influenza patients and the demographic characteristics of affected populations support better planning and management of resources for influenza patients who require hospitalization. The obtained results could potentially help save many lives and improve population health.
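The abstract describes a network built from correlations among zip codes, with hubs identified by node degree, but gives no construction details. One plausible sketch under stated assumptions (Pearson correlation of per-zip case-count series, an edge when correlation exceeds a threshold; the threshold, data, and zip codes are illustrative, not the paper's):

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def degree_hubs(series_by_zip, threshold=0.8):
    """Connect zip codes whose case-count series correlate at or above
    the threshold, then rank nodes by degree (hubs first)."""
    zips = list(series_by_zip)
    degree = {z: 0 for z in zips}
    for i in range(len(zips)):
        for j in range(i + 1, len(zips)):
            if pearson(series_by_zip[zips[i]], series_by_zip[zips[j]]) >= threshold:
                degree[zips[i]] += 1
                degree[zips[j]] += 1
    return sorted(degree.items(), key=lambda kv: -kv[1])
```

High-degree nodes in such a network are the candidate infection hubs the study targets for resource planning.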


Subject(s)
Influenza, Human/epidemiology , Social Networking , Hospitalization , Humans , New York/epidemiology , Travel
5.
Sci Rep ; 7: 44649, 2017 03 17.
Article in English | MEDLINE | ID: mdl-28304378

ABSTRACT

Ensemble generation is a natural and convenient way of achieving better generalization performance of learning algorithms by gathering their predictive capabilities. Here, we nurture the idea of ensemble-based learning by combining bagging and boosting for the purpose of binary classification. Since the former improves stability through variance reduction, while the latter ameliorates overfitting, a multi-model that combines both strives toward a comprehensive balancing of the bias-variance trade-off. To improve this further, we alter the bagged-boosting scheme by introducing collaboration between the multi-model's constituent learners at various levels. This novel stability-guided classification scheme is delivered in two flavours: collaboration during or after the boosting process. Applied across a collection of Gentle Boost ensembles, the generalization ability of the two suggested algorithms is inspected by comparing them against Subbagging and Gentle Boost on various real-world datasets. In both cases, our models obtained a 40% decrease in generalization error. Their true ability to capture detail in the data was revealed through their application to protein detection in texture analysis of gel electrophoresis images, where they achieved an AUROC of approximately 0.9773, compared with 0.9574 for an SVM based on recursive feature elimination.
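As a toy sketch of the bagged-boosting idea the abstract combines: subsample without replacement (as in the Subbagging baseline), boost decision stumps on each subsample, and majority-vote the boosted ensembles. The AdaBoost-style weight update below is an illustrative stand-in for the paper's Gentle Boost, and it omits the paper's collaboration scheme entirely; all names and data are hypothetical.

```python
import math
import random

def stump_predict(threshold, polarity, x):
    """Classify x as +polarity when x >= threshold, else -polarity."""
    return polarity if x >= threshold else -polarity

def fit_stump(xs, ys, weights):
    """Exhaustively pick the (threshold, polarity) with minimum weighted error."""
    best = None
    for t in set(xs):
        for pol in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(t, pol, x) != y)
            if best is None or err < best[0]:
                best = (err, t, pol)
    return best

def boost(xs, ys, rounds=5):
    """AdaBoost-style boosting of stumps: upweight misclassified points."""
    n = len(xs)
    weights = [1.0 / n] * n
    learners = []
    for _ in range(rounds):
        err, t, pol = fit_stump(xs, ys, weights)
        err = min(max(err, 1e-9), 1 - 1e-9)   # clamp away from 0 and 1
        if err >= 0.5:
            break
        alpha = 0.5 * math.log((1 - err) / err)
        learners.append((alpha, t, pol))
        weights = [w * math.exp(-alpha * y * stump_predict(t, pol, x))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return learners

def predict_boosted(learners, x):
    score = sum(a * stump_predict(t, p, x) for a, t, p in learners)
    return 1 if score >= 0 else -1

def bagged_boosting(xs, ys, bags=5, frac=0.7, seed=0):
    """Subbag (sample without replacement), boost each subsample, and
    combine the boosted ensembles by majority vote."""
    rng = random.Random(seed)
    n = max(1, int(frac * len(xs)))
    models = []
    for _ in range(bags):
        idx = rng.sample(range(len(xs)), n)
        models.append(boost([xs[i] for i in idx], [ys[i] for i in idx]))
    def predict(x):
        vote = sum(predict_boosted(m, x) for m in models)
        return 1 if vote >= 0 else -1
    return predict
```

Boosting drives down bias within each bag; voting across independently subsampled bags then reduces the variance of the combined predictor, which is the trade-off the abstract describes.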


Subject(s)
Algorithms , Machine Learning , Models, Theoretical , Area Under Curve , Numerical Analysis, Computer-Assisted , ROC Curve