1.
Comput Methods Programs Biomed ; 197: 105765, 2020 Dec.
Article in English | MEDLINE | ID: mdl-33011665

ABSTRACT

BACKGROUND AND OBJECTIVE: Alzheimer's disease (AD) is the most common type of dementia and can seriously affect a person's ability to perform daily activities. Estimates indicate that AD may rank third as a cause of death for older people, after heart disease and cancer. Identification of individuals at risk of developing AD is imperative for testing therapeutic interventions. The objective of the study was to determine whether diagnosis of AD from EMR data alone (without relying on diagnostic imaging) could be significantly improved by applying clinical domain knowledge in data preprocessing and positive dataset selection rather than setting naïve filters. METHODS: Data were extracted from a repository of heterogeneous ambulatory EMR data, collected from primary care medical offices across the U.S. Medical domain knowledge was applied to build a positive dataset from data relevant to AD. Selected Clinically Relevant Positive (SCRP) datasets were used as inputs to a Long Short-Term Memory (LSTM) Recurrent Neural Network (RNN) deep learning model to predict whether a patient would develop AD. RESULTS: Risk score prediction of AD using drug domain information in an SCRP AD dataset of 2,324 patients achieved a high out-of-sample score of 0.98-0.99 Area Under the Precision-Recall Curve (AUPRC) when using 90% of the SCRP dataset for training. AUPRC dropped to 0.89 when training the model with fewer than 1,500 cases from the SCRP dataset, yet the model was still significantly better than with naïve dataset selection. CONCLUSION: The LSTM RNN method that used data relevant to AD performed significantly better when learning from the SCRP dataset than when datasets were selected naïvely. Integrating qualitative medical knowledge into dataset selection together with deep learning technology provided a mechanism for significant improvement of AD prediction.
Accurate and early prediction of AD is significant for identifying patients for clinical trials, which may lead to the discovery of new drugs for AD treatment. In addition, the proposed AD prediction contributes to better selection of patients who need imaging diagnostics for differential diagnosis of AD from other degenerative brain disorders.
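As an illustration of the domain-knowledge-driven dataset selection the abstract describes, the sketch below filters raw EMR-style records into an SCRP-like positive set using a curated list of AD-relevant drugs instead of a naïve diagnosis-code filter. The record layout, the event threshold, and the function name are illustrative assumptions, not the paper's actual schema.

```python
# Hypothetical sketch: build a Selected Clinically Relevant Positive (SCRP)
# dataset from EMR-style medication records. Drug names are real AD drugs,
# but the record format and threshold are illustrative assumptions.

AD_RELEVANT_DRUGS = {"donepezil", "memantine", "rivastigmine", "galantamine"}

def build_scrp_dataset(records, min_relevant_events=2):
    """Keep patients whose medication history contains at least
    `min_relevant_events` AD-relevant prescriptions, ordered by date,
    yielding (patient_id, drug_sequence) pairs suitable as inputs to
    a sequence model such as an LSTM."""
    dataset = []
    for patient_id, events in records.items():
        # keep only AD-relevant prescriptions, sorted chronologically
        relevant = sorted(
            (date, drug) for date, drug in events
            if drug.lower() in AD_RELEVANT_DRUGS
        )
        if len(relevant) >= min_relevant_events:
            dataset.append((patient_id, [drug for _, drug in relevant]))
    return dataset
```

A naïve filter would instead keep every patient matching a broad dementia code; the point of the SCRP construction is that clinically curated inclusion criteria yield a cleaner positive class for training.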


Subject(s)
Alzheimer Disease , Deep Learning , Aged , Aged, 80 and over , Alzheimer Disease/diagnosis , Area Under Curve , Humans , Neural Networks, Computer
2.
J Biomed Inform ; 100: 103326, 2019 12.
Article in English | MEDLINE | ID: mdl-31678589

ABSTRACT

The primary goal of a time-to-event estimation model is to accurately infer the occurrence time of a target event. Most existing studies focus on developing new models to effectively utilize the information in the censored observations. In this paper, we propose a model that tackles the time-to-event estimation problem from a completely different perspective. Our model relaxes a fundamental constraint that the target variable, time, is a univariate number satisfying a partial order. Instead, the proposed model interprets each event occurrence time as a time concept with a vector representation. We hypothesize that the model will be more accurate and interpretable by capturing (1) the relationships between features and time concept vectors and (2) the relationships among time concept vectors. We also propose a scalable framework to simultaneously learn the model parameters and time concept vectors. Rigorous experiments and analysis have been conducted on a medical event prediction task using seven gene expression datasets. The results demonstrate the efficiency and effectiveness of the proposed model. Furthermore, similarity information among time concept vectors helped identify time regimes, leading to a potential knowledge discovery related to the human cancers considered in our experiments.
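A minimal sketch of the time-concept idea: each event time is given a vector representation, a linear map from features to that vector space is fit by least squares, and prediction picks the nearest time-concept vector. The embedding choice and the fitting procedure here are illustrative stand-ins for the paper's jointly learned framework, not its actual method.

```python
import numpy as np

def fit_time_concept_model(X, t, embed):
    """Fit a linear map W from features X to the time-concept vectors
    of the observed times t (embed: time -> vector). A least-squares
    stand-in for the paper's joint learning framework."""
    V = np.array([embed[ti] for ti in t])
    W, *_ = np.linalg.lstsq(X, V, rcond=None)
    return W

def predict_time(X, W, embed):
    """Map features into the time-concept space, then decode each row
    to the time whose concept vector is nearest."""
    V_pred = X @ W
    times = list(embed)
    E = np.array([embed[ti] for ti in times])
    idx = np.argmin(((V_pred[:, None, :] - E[None, :, :]) ** 2).sum(-1), axis=1)
    return [times[i] for i in idx]
```

Because times live in a vector space rather than on a number line, similarity between time-concept vectors can reveal regimes (groups of times that behave alike), which is the interpretability benefit the abstract points to.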


Subject(s)
Models, Theoretical , Time and Motion Studies , Algorithms
3.
Sci Rep ; 6: 35996, 2016 11 02.
Article in English | MEDLINE | ID: mdl-27805045

ABSTRACT

The cost of developing a new drug has increased sharply in recent years. To ensure a reasonable return on investment, it is useful for drug discovery researchers in both industry and academia to identify all the possible indications for early pipeline molecules. For the first time, we propose the term computational "drug candidate positioning", or "drug positioning", to describe the above process. It is distinct from drug repositioning, which identifies new uses for existing drugs and maximizes their value. Since many therapeutic effects are mediated by unexpected drug-protein interactions, it is reasonable to analyze chemical-protein interactome (CPI) profiles to predict indications. Here we introduce the server DPDR-CPI, which can make real-time predictions based only on the structure of the small molecule. When a user submits a molecule, the server will dock it across 611 human proteins, generating a CPI profile of features that can be used for predictions. It can suggest the likelihood of relevance of the input molecule towards ~1,000 human diseases with top predictions listed. DPDR-CPI achieved an overall AUROC of 0.78 during 10-fold cross-validations and an AUROC of 0.76 for the independent validation. The server is freely accessible via http://cpi.bio-x.cn/dpdr/.
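The prediction step can be pictured as matching a molecule's CPI profile (its vector of docking scores across the 611 proteins) against per-disease reference profiles. The cosine-similarity ranking below is an illustrative stand-in for DPDR-CPI's actual classifier, with made-up profile data.

```python
import numpy as np

def rank_diseases(query_profile, disease_profiles):
    """Rank candidate diseases by cosine similarity between a
    molecule's docking-score (CPI) profile and per-disease reference
    profiles. A hypothetical stand-in for the server's classifier."""
    q = query_profile / (np.linalg.norm(query_profile) + 1e-12)
    scores = {}
    for disease, profile in disease_profiles.items():
        p = profile / (np.linalg.norm(profile) + 1e-12)
        scores[disease] = float(q @ p)
    # highest-similarity diseases first, as in the server's top-predictions list
    return sorted(scores, key=scores.get, reverse=True)
```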


Subject(s)
Drug Repositioning , Pharmaceutical Preparations/metabolism , Proteins/metabolism , User-Computer Interface , Area Under Curve , Humans , Internet , Molecular Docking Simulation , Pharmaceutical Preparations/chemistry , Protein Binding , Proteins/chemistry , Pyridazines/chemistry , Pyridazines/metabolism , ROC Curve , Rosiglitazone , Thiazolidinediones/chemistry , Thiazolidinediones/metabolism
4.
BMC Bioinformatics ; 17(1): 359, 2016 Sep 09.
Article in English | MEDLINE | ID: mdl-27612635

ABSTRACT

BACKGROUND: Machine learning models have been adopted in biomedical research and practice for knowledge discovery and decision support. While mainstream biomedical informatics research focuses on developing more accurate models, the importance of data preprocessing draws less attention. We propose the Generalized Logistic (GL) algorithm, which scales data uniformly to an appropriate interval by learning a generalized logistic function to fit the empirical cumulative distribution function of the data. The GL algorithm is simple yet effective; it is intrinsically robust to outliers, making it particularly suitable for diagnostic/classification models in clinical/medical applications where the number of samples is usually small; and it scales the data in a nonlinear fashion, which leads to potential improvement in accuracy. RESULTS: To evaluate the effectiveness of the proposed algorithm, we conducted experiments on 16 binary classification tasks with different variable types that cover a wide range of applications. The resulting performance in terms of area under the receiver operating characteristic curve (AUROC) and percentage of correct classification showed that models learned from data scaled by the GL algorithm outperform those using data scaled by the Min-max and Z-score algorithms, which are the most commonly used data scaling algorithms. CONCLUSION: The proposed GL algorithm is simple and effective. It is robust to outliers, so no additional denoising or outlier detection step is needed in data preprocessing. Empirical results also show that models learned from data scaled by the GL algorithm have higher accuracy than those using the commonly used data scaling algorithms.
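A minimal sketch of GL-style scaling: fit a logistic curve to the empirical CDF of the data (here by a crude grid search rather than the paper's generalized-logistic fitting procedure) and map each value through it, yielding bounded scores in (0, 1). Unlike Min-max scaling, an extreme outlier only saturates the curve near 1 instead of compressing all other values.

```python
import numpy as np

def fit_logistic_to_ecdf(x):
    """Fit a plain logistic 1/(1+exp(-k(x-x0))) to the empirical CDF
    by grid search. An illustrative simplification of the GL algorithm,
    which fits a generalized logistic function."""
    xs = np.sort(x)
    ecdf = (np.arange(1, len(xs) + 1) - 0.5) / len(xs)
    scale = xs.std() + 1e-12
    best, best_err = (np.median(xs), 1.0 / scale), np.inf
    for x0 in np.linspace(xs.min(), xs.max(), 50):
        for k in np.linspace(0.1, 10.0, 50) / scale:
            pred = 1.0 / (1.0 + np.exp(-k * (xs - x0)))
            err = np.mean((pred - ecdf) ** 2)
            if err < best_err:
                best_err, best = err, (x0, k)
    return best

def gl_scale(x):
    """Scale data nonlinearly into (0, 1) via the fitted logistic CDF."""
    x0, k = fit_logistic_to_ecdf(x)
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))
```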


Subject(s)
Algorithms , Biomedical Research , Models, Theoretical , Databases as Topic , Humans , ROC Curve
5.
BMC Bioinformatics ; 17: 158, 2016 Apr 08.
Article in English | MEDLINE | ID: mdl-27059502

ABSTRACT

BACKGROUND: Existing feature selection methods typically do not consider prior knowledge in the form of structural relationships among features. In this study, features are structured into groups based on prior knowledge. The problem addressed in this article is how to select one representative feature from each group such that the selected features jointly discriminate the classes. The problem is formulated as a binary constrained optimization, and the combinatorial optimization is relaxed as a convex-concave problem, which is then transformed into a sequence of convex optimization problems that can be solved by any standard optimization algorithm. Moreover, a block coordinate gradient descent optimization algorithm is proposed for high-dimensional feature selection, which in our experiments was four times faster than a standard optimization algorithm. RESULTS: To test the effectiveness of the proposed formulation, we used microarray analysis as a case study, where genes with similar expression or similar molecular functions were grouped together. In particular, the proposed block coordinate gradient descent feature selection method was evaluated on five benchmark microarray gene expression datasets, and evidence is provided that it gives more accurate results than state-of-the-art gene selection methods. Out of 25 experiments, the proposed method achieved the highest average AUC in 13, while no other method achieved a higher average AUC in more than 6. CONCLUSION: A method is developed to select one feature from each group. When features are grouped based on similarity in gene expression, we showed that the proposed algorithm is more accurate than state-of-the-art gene selection methods specifically developed to select highly discriminative and less redundant genes.
In addition, the proposed method can exploit any grouping structure among features, while alternative methods are restricted to similarity-based grouping.
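As a simplified stand-in for the convex-relaxation formulation, the sketch below greedily picks one representative feature per group using a univariate class-separation score. The real method optimizes the selection jointly across groups; this greedy version only illustrates the one-feature-per-group constraint.

```python
import numpy as np

def select_group_representatives(X, y, groups):
    """Pick one feature per group by a simple class-separation score
    (absolute difference of class means over the feature's std).
    A greedy, per-group stand-in for the paper's joint convex-concave
    optimization; `groups` maps group name -> list of column indices."""
    y = np.asarray(y)
    selected = {}
    for group, idxs in groups.items():
        scores = []
        for j in idxs:
            f = X[:, j]
            mean_gap = abs(f[y == 1].mean() - f[y == 0].mean())
            scores.append(mean_gap / (f.std() + 1e-12))
        selected[group] = idxs[int(np.argmax(scores))]
    return selected
```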


Subject(s)
Algorithms , Models, Theoretical , Corneal Neovascularization/diagnosis , Corneal Neovascularization/genetics , Databases, Genetic , Gene Expression Regulation , Gene Ontology , Genetic Variation , HIV Infections/diagnosis , HIV Infections/genetics , Hemoglobinuria/diagnosis , Hemoglobinuria/genetics , Humans , Melanoma/diagnosis , Melanoma/genetics , Microarray Analysis , Multiple Myeloma/diagnosis , Multiple Myeloma/genetics , Neuroendocrine Tumors/diagnosis , Neuroendocrine Tumors/genetics , Nevus/diagnosis , Nevus/genetics , Stress, Physiological/genetics , Virus Diseases/diagnosis , Virus Diseases/genetics
6.
Sci Rep ; 6: 24719, 2016 Apr 21.
Article in English | MEDLINE | ID: mdl-27097769

ABSTRACT

Sepsis is a serious, life-threatening condition and a growing problem in medicine, yet there is still no satisfactory treatment for it. Several blood cleansing approaches have recently gained attention as promising interventions that target the main site of disease development: the blood. The focus of this study is an evaluation of the theoretical effectiveness of hemoadsorption therapy and pathogen reduction therapy. These are evaluated using a mathematical model of murine sepsis, and the results of over 2,200 configurations of single and multiple intervention therapies, simulated on 5,000 virtual subjects, suggest the advantage of pathogen reduction over hemoadsorption therapy. However, a combination of the two approaches takes advantage of their complementary effects and outperforms either therapy alone. The conducted computational experiments provide unprecedented evidence that combining the two therapies synergistically enhances the positive effects beyond a simple superposition of the benefits of the two approaches. Such a characteristic could have a profound influence on the way sepsis treatment is conducted.
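A toy illustration of why the two therapies can complement each other: pathogen reduction lowers the pathogen source term, while hemoadsorption increases mediator clearance, so combining them reduces cumulative mediator exposure more than either alone. The equations and parameters below are illustrative only, not the paper's murine sepsis model.

```python
def simulate(pathogen_reduction, hemoadsorption, T=40.0, dt=0.01):
    """Toy Euler integration of a two-variable sketch: P = pathogen
    load (logistic growth), M = inflammatory mediators driven by P.
    The two flags toggle the therapies. Returns cumulative mediator
    exposure as a crude damage proxy. Parameters are made up."""
    r, K, kp = 1.0, 1.0, 0.6   # pathogen growth rate, carrying capacity, removal
    a, d, kh = 1.0, 0.1, 0.8   # mediator production, decay, adsorption
    P, M, damage = 0.01, 0.0, 0.0
    for _ in range(int(T / dt)):
        dP = r * P * (1 - P / K) - kp * pathogen_reduction * P
        dM = a * P - (d + kh * hemoadsorption) * M
        P += dt * dP
        M += dt * dM
        damage += dt * M  # accumulate mediator exposure over time
    return damage
```

In this toy setting the combined therapy attacks both the source (P) and the sink (M) of mediators, so its cumulative exposure is below either single therapy; whether the real model's synergy exceeds simple superposition is the paper's quantitative claim, which this sketch does not attempt to reproduce.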


Subject(s)
Models, Biological , Sepsis/blood , Sepsis/therapy , Animals , Computer Simulation , Disease Models, Animal , Mice , Models, Theoretical , Rats , Sepsis/diagnosis , Sepsis/etiology , Severity of Illness Index , Time Factors , Treatment Outcome