Search | VHL Regional Portal

1.

A Knowledge Distillation Ensemble Framework for Predicting Short- and Long-Term Hospitalization Outcomes From Electronic Health Records Data.

Ibrahim, Zina M; Bean, Daniel; Searle, Thomas; Qian, Linglong; Wu, Honghan; Shek, Anthony; Kraljevic, Zeljko; Galloway, James; Norton, Sam; Teo, James T; Dobson, Richard Jb.

IEEE J Biomed Health Inform ; 26(1): 423-435, 2022 01.

Article in English | MEDLINE | ID: mdl-34129509

ABSTRACT

The ability to perform accurate prognosis is crucial for proactive clinical decision making, informed resource management and personalised care. Existing outcome prediction models suffer from a low recall of infrequent positive outcomes. We present a highly scalable and robust machine learning framework to automatically predict adversity, represented by mortality and ICU admission and readmission, from time-series of vital signs and laboratory results obtained within the first 24 hours of hospital admission. The stacked ensemble platform comprises two components: a) an unsupervised LSTM Autoencoder that learns an optimal representation of the time-series, using it to differentiate the less frequent patterns which conclude with an adverse event from the majority patterns that do not, and b) a gradient boosting model, which relies on the constructed representation to refine prediction by incorporating static features. The model is used to assess a patient's risk of adversity and provides visual justifications of its prediction. Results of three case studies show that the model outperforms existing platforms in ICU and general ward settings, achieving average Precision-Recall Areas Under the Curve (PR-AUCs) of 0.891 (95% CI: 0.878-0.939) for mortality and 0.908 (95% CI: 0.870-0.935) in predicting ICU admission and readmission.

Subject(s)

Electronic Health Records , Machine Learning , Hospitalization , Humans , Length of Stay , ROC Curve , Retrospective Studies
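The PR-AUC figures in the abstract above summarise precision and recall across all decision thresholds. As a hedged illustration only (not the authors' evaluation code; the labels and scores are made up), one common PR-AUC estimator, step-wise average precision AP = sum over thresholds of (R_n - R_(n-1)) * P_n, can be sketched as:

```python
def average_precision(labels, scores):
    """Step-wise PR-AUC estimate: sum of (recall gain) * precision at each threshold.

    labels: iterable of 0/1 ground-truth outcomes (1 = adverse event).
    scores: predicted risk scores; higher means more likely positive.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("average precision is undefined without positive labels")
    tp = fp = 0
    prev_recall = 0.0
    ap = 0.0
    i = 0
    while i < len(pairs):
        # Treat all predictions sharing the same score as a single threshold.
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / total_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap
```

For example, `average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])` returns 5/6 (about 0.833), while a perfect ranking returns 1.0. Unlike ROC-AUC, this quantity is sensitive to how well rare positives are ranked, which is why it suits infrequent outcomes such as mortality.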

2.

The side effect profile of Clozapine in real world data of three large mental health hospitals.

Iqbal, Ehtesham; Govind, Risha; Romero, Alvin; Dzahini, Olubanke; Broadbent, Matthew; Stewart, Robert; Smith, Tanya; Kim, Chi-Hun; Werbeloff, Nomi; MacCabe, James H; Dobson, Richard J B; Ibrahim, Zina M.

PLoS One ; 15(12): e0243437, 2020.

Article in English | MEDLINE | ID: mdl-33290433

ABSTRACT

OBJECTIVE: Mining the data contained within Electronic Health Records (EHRs) can potentially generate a greater understanding of medication effects in the real world, complementing what we know from randomised controlled trials (RCTs). We propose a text mining approach to detect adverse events and medication episodes from clinical text to enhance our understanding of adverse effects related to Clozapine, the most effective antipsychotic drug for the management of treatment-resistant schizophrenia, but one that is underutilised due to concerns over its side effects. MATERIAL AND METHODS: We used data from de-identified EHRs of three mental health trusts in the UK (>50 million documents, over 500,000 patients, 2835 of whom were prescribed Clozapine). We explored the prevalence of 33 adverse effects by age, gender, ethnicity, smoking status and admission type three months before and after the patients started Clozapine treatment. Where possible, we compared the prevalence of adverse effects with those reported in the Side Effects Resource (SIDER). RESULTS: Sedation, fatigue, agitation, dizziness, hypersalivation, weight gain, tachycardia, headache, constipation and confusion were amongst the most frequently recorded Clozapine adverse effects in the three months following the start of treatment. Higher percentages of all adverse effects were found in the first month of Clozapine therapy. Using a significance level of p < 0.05, our chi-square tests show a significant association between most of the adverse drug reactions (ADRs) and smoking status and hospital admission, and between some ADRs and gender, ethnicity and age group, in the hospitals of all three trusts. We then combined the data from the three trusts to estimate the average effect of ADRs in each monthly interval. For gender and ethnicity, the results show a significant association in 7 out of 33 ADRs; smoking status shows a significant association in 21 out of 33 ADRs, and hospital admission in 30 out of 33 ADRs.
CONCLUSION: A better understanding of how drugs work in the real world can complement clinical trials.

Subject(s)

Antipsychotic Agents/adverse effects , Clozapine/adverse effects , Schizophrenia/drug therapy , Weight Gain/drug effects , Adult , Benzodiazepines/administration & dosage , Benzodiazepines/adverse effects , Clozapine/administration & dosage , Databases, Factual , Female , Hospitals, Psychiatric , Humans , Infant , Male , Middle Aged , Olanzapine/administration & dosage , Olanzapine/adverse effects , Piperazines/administration & dosage , Piperazines/adverse effects , Risperidone/administration & dosage , Risperidone/adverse effects , Schizophrenia/complications , Schizophrenia/physiopathology , Thiazoles/administration & dosage , Thiazoles/adverse effects
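The chi-square tests described above check whether recording an adverse effect is independent of a patient attribute such as smoking status. A minimal sketch for a single 2x2 table, assuming hypothetical counts (not study data) and using the Pearson statistic without Yates' continuity correction (note that some libraries apply that correction to 2x2 tables by default):

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table.

    table: [[a, b], [c, d]] of observed counts, e.g. rows = smoker / non-smoker,
    columns = adverse effect recorded / not recorded (hypothetical layout).
    Returns (statistic, p_value) with 1 degree of freedom, no continuity correction.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: 30/100 smokers vs 15/100 non-smokers with the effect recorded.
stat, p = chi_square_2x2([[30, 70], [15, 85]])
```

With these made-up counts the statistic is about 6.45 and the p-value about 0.011, so the association would count as significant at the 0.05 level; a table with identical rows gives a statistic of 0 and a p-value of 1.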

3.

On classifying sepsis heterogeneity in the ICU: insight using machine learning.

Ibrahim, Zina M; Wu, Honghan; Hamoud, Ahmed; Stappen, Lukas; Dobson, Richard J B; Agarossi, Andrea.

J Am Med Inform Assoc ; 27(3): 437-443, 2020 03 01.

Article in English | MEDLINE | ID: mdl-31951005

ABSTRACT

OBJECTIVES: Current machine learning models aiming to predict sepsis from electronic health records (EHR) do not account for the heterogeneity of the condition despite its emerging importance in prognosis and treatment. This work demonstrates the added value of stratifying the types of organ dysfunction observed in patients who develop sepsis in the intensive care unit (ICU) in improving the ability to recognize patients at risk of sepsis from their EHR data. MATERIALS AND METHODS: Using an ICU dataset of 13 728 records, we identify clinically significant sepsis subpopulations with distinct organ dysfunction patterns. We perform classification experiments with random forest, gradient boost trees, and support vector machines, using the identified subpopulations to distinguish patients who develop sepsis in the ICU from those who do not. RESULTS: The classification results show that features selected using sepsis subpopulations as background knowledge yield a superior performance in distinguishing septic from non-septic patients regardless of the classification model used. The improved performance is especially pronounced in specificity, which is a current bottleneck in sepsis prediction machine learning models. CONCLUSION: Our findings can steer machine learning efforts toward more personalized models for complex conditions including sepsis.

Subject(s)

Machine Learning , Sepsis/diagnosis , Diagnosis, Differential , Electronic Health Records , Humans , Intensive Care Units , Organ Dysfunction Scores , Sensitivity and Specificity , Sepsis/classification

4.

Efficient Reuse of Natural Language Processing Models for Phenotype-Mention Identification in Free-text Electronic Medical Records: A Phenotype Embedding Approach.

Wu, Honghan; Hodgson, Karen; Dyson, Sue; Morley, Katherine I; Ibrahim, Zina M; Iqbal, Ehtesham; Stewart, Robert; Dobson, Richard Jb; Sudlow, Cathie.

JMIR Med Inform ; 7(4): e14782, 2019 Dec 17.

Article in English | MEDLINE | ID: mdl-31845899

ABSTRACT

BACKGROUND: Much effort has been put into the use of automated approaches, such as natural language processing (NLP), to mine or extract data from free-text medical records in order to construct comprehensive patient profiles for delivering better health care. Reusing NLP models in new settings, however, remains cumbersome, as it requires iterative validation and retraining on new data to achieve convergent results. OBJECTIVE: The aim of this work is to minimize the effort involved in reusing NLP models on free-text medical records. METHODS: We formally define and analyze the model adaptation problem in phenotype-mention identification tasks. We identify "duplicate waste" and "imbalance waste," which collectively impede efficient model reuse. We propose a phenotype embedding-based approach to minimize these sources of waste without the need for labelled data from new settings. RESULTS: We conduct experiments on data from a large mental health registry to reuse NLP models in four phenotype-mention identification tasks. The proposed approach can choose the best model for a new task, identifying up to 76% waste (duplicate waste), that is, phenotype mentions that need neither validation nor model retraining, with very good performance (93%-97% accuracy). It can also provide guidance for validating and retraining the selected model for novel language patterns in new tasks, saving around 80% waste (imbalance waste), that is, the effort required in "blind" model-adaptation approaches. CONCLUSIONS: Adapting pretrained NLP models for new tasks can be more efficient and effective if the language pattern landscapes of old settings and new settings can be made explicit and comparable. Our experiments show that the phenotype-mention embedding approach is an effective way to model language patterns for phenotype-mention identification tasks and that its use can guide efficient NLP model reuse.

5.

Author Correction: Knowledge graph prediction of unknown adverse drug reactions and validation in electronic health records.

Bean, Daniel M; Wu, Honghan; Iqbal, Ehtesham; Dzahini, Olubanke; Ibrahim, Zina M; Broadbent, Matthew; Stewart, Robert; Dobson, Richard J B.

Sci Rep ; 8(1): 4284, 2018 Mar 06.

Article in English | MEDLINE | ID: mdl-29511265

ABSTRACT

A correction to this article has been published and is linked from the HTML and PDF versions of this paper. The error has been fixed in the paper.

6.

SemEHR: A general-purpose semantic search system to surface semantic data from clinical notes for tailored care, trial recruitment, and clinical research.

Wu, Honghan; Toti, Giulia; Morley, Katherine I; Ibrahim, Zina M; Folarin, Amos; Jackson, Richard; Kartoglu, Ismail; Agrawal, Asha; Stringer, Clive; Gale, Darren; Gorrell, Genevieve; Roberts, Angus; Broadbent, Matthew; Stewart, Robert; Dobson, Richard J B.

J Am Med Inform Assoc ; 25(5): 530-537, 2018 05 01.

Article in English | MEDLINE | ID: mdl-29361077

ABSTRACT

Objective: Unlocking the data contained within both structured and unstructured components of electronic health records (EHRs) has the potential to provide a step change in data available for secondary research use, generation of actionable medical insights, hospital management, and trial recruitment. To achieve this, we implemented SemEHR, an open source semantic search and analytics tool for EHRs. Methods: SemEHR implements a generic information extraction (IE) and retrieval infrastructure by identifying contextualized mentions of a wide range of biomedical concepts within EHRs. Natural language processing annotations are further assembled at the patient level and extended with EHR-specific knowledge to generate a timeline for each patient. The semantic data are serviced via ontology-based search and analytics interfaces. Results: SemEHR has been deployed at a number of UK hospitals, including the Clinical Record Interactive Search, an anonymized replica of the EHR of the UK South London and Maudsley National Health Service Foundation Trust, one of Europe's largest providers of mental health services. In two Clinical Record Interactive Search-based studies, SemEHR achieved F-measures of 93% (hepatitis C) and 99% (HIV) in identifying true positive patients. At King's College Hospital in London, as part of the CogStack program (github.com/cogstack), SemEHR is being used to recruit patients into the UK Department of Health 100 000 Genomes Project (genomicsengland.co.uk). The validation study suggests that the tool can validate previously recruited cases and is very fast at searching phenotypes; time for recruitment criteria checking was reduced from days to minutes. When validated on open intensive care EHR data (Medical Information Mart for Intensive Care III), the vital signs extracted by SemEHR reached around 97% accuracy.
Conclusion: Results from the multiple case studies demonstrate SemEHR's efficiency: weeks or months of work can be done within hours or minutes in some cases. SemEHR provides a more comprehensive view of patients, bringing in more and unexpected insight compared to study-oriented bespoke IE systems. SemEHR is open source, available at https://github.com/CogStack/SemEHR.

Subject(s)

Electronic Health Records , Information Storage and Retrieval/methods , Natural Language Processing , Semantics , Clinical Trials as Topic , Humans , Patient Selection , State Medicine , United Kingdom

7.

Knowledge graph prediction of unknown adverse drug reactions and validation in electronic health records.

Bean, Daniel M; Wu, Honghan; Iqbal, Ehtesham; Dzahini, Olubanke; Ibrahim, Zina M; Broadbent, Matthew; Stewart, Robert; Dobson, Richard J B.

Sci Rep ; 7(1): 16416, 2017 11 27.

Article in English | MEDLINE | ID: mdl-29180758

ABSTRACT

Unknown adverse reactions to drugs available on the market present a significant health risk and limit accurate judgement of the cost/benefit trade-off for medications. Machine learning has the potential to predict unknown adverse reactions from current knowledge. We constructed a knowledge graph containing four types of node: drugs, protein targets, indications and adverse reactions. Using this graph, we developed a machine learning algorithm based on a simple enrichment test and first demonstrated that this method performs extremely well at classifying known causes of adverse reactions (AUC 0.92). A cross-validation scheme in which 10% of drug-adverse reaction edges were systematically deleted per fold showed that the method correctly predicts 68% of the deleted edges on average. Next, a subset of adverse reactions that could be reliably detected in anonymised electronic health records from South London and Maudsley NHS Foundation Trust was used to validate predictions from the model that are not currently known in public databases. High-confidence predictions were validated in electronic records significantly more frequently than random models, and outperformed standard methods (logistic regression, decision trees and support vector machines). This approach has the potential to improve patient safety by predicting adverse reactions that were not observed during randomised trials.

Subject(s)

Drug-Related Side Effects and Adverse Reactions/epidemiology , Electronic Health Records , Knowledge Bases , Algorithms , Databases, Factual , Humans , Machine Learning , Prognosis , Public Health Surveillance , Reproducibility of Results
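The abstract above does not specify the form of its enrichment test. Purely as an assumption for illustration, a common choice is a one-sided hypergeometric test: given N drugs in the graph, K of which are known to cause a given adverse reaction, and n drugs sharing some feature (for example a protein target), how surprising is it that k of those n cause the reaction? The function name and all numbers below are hypothetical; this is not the authors' algorithm.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided enrichment p-value: P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: total drugs in the (hypothetical) knowledge graph.
    K: drugs known to cause the adverse reaction of interest.
    n: drugs sharing the feature being tested (e.g. a protein target).
    k: drugs that both share the feature and cause the reaction.
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the probabilities of observing k or more reaction-causing drugs
    # among the n feature-sharing drugs (comb returns 0 for impossible draws).
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Hypothetical graph: 100 drugs, 10 cause reaction R, 8 share target T, 4 of those cause R.
p = hypergeom_enrichment_p(k=4, n=8, K=10, N=100)
```

Here only about 0.8 of the 8 target-sharing drugs would be expected to cause the reaction by chance, so observing 4 gives a small tail probability (well under 0.05), which would flag target T as enriched for reaction R. A real pipeline would also need to correct for testing many features.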

8.

ADEPt, a semantically-enriched pipeline for extracting adverse drug events from free-text electronic health records.

Iqbal, Ehtesham; Mallah, Robbie; Rhodes, Daniel; Wu, Honghan; Romero, Alvin; Chang, Nynn; Dzahini, Olubanke; Pandey, Chandra; Broadbent, Matthew; Stewart, Robert; Dobson, Richard J B; Ibrahim, Zina M.

PLoS One ; 12(11): e0187121, 2017.

Article in English | MEDLINE | ID: mdl-29121053

ABSTRACT

Adverse drug events (ADEs) are unintended responses to medical treatment. They can greatly affect a patient's quality of life and present a substantial burden on healthcare. Although electronic health records (EHRs) document a wealth of information relating to ADEs, this information is frequently stored in unstructured or semi-structured free-text narrative, requiring Natural Language Processing (NLP) techniques to mine it. Here we present a rule-based ADE detection and classification pipeline built and tested on a large psychiatric corpus comprising 264k patients, using the de-identified EHRs of four UK-based psychiatric hospitals. The pipeline uses characteristics specific to psychiatric EHRs to guide the annotation process, and distinguishes: a) the temporal value associated with the ADE mention (whether it is historical or present), b) the categorical value of the ADE (whether it is assertive, hypothetical, retrospective or a general discussion) and c) the implicit contextual value, where the status of the ADE is deduced from surrounding indicators rather than explicitly stated. We manually created the rulebase in collaboration with clinicians and pharmacists by studying ADE mentions in various types of clinical notes. We evaluated the open-source Adverse Drug Event annotation Pipeline (ADEPt) using 19 ADEs specific to antipsychotic and antidepressant medications. The ADEs chosen vary in severity, regularity and persistence. The average F-measure and accuracy achieved by our tool across all tested ADEs were both 0.83. In addition to its annotation power, the ADEPt pipeline improves on the state-of-the-art context-discerning algorithm, ConText.

Subject(s)

Drug-Related Side Effects and Adverse Reactions/pathology , Electronic Health Records , Semantics , Algorithms , Antidepressive Agents/pharmacology , Antipsychotic Agents/pharmacology , Natural Language Processing , ROC Curve

9.

Identification of Adverse Drug Events from Free Text Electronic Patient Records and Information in a Large Mental Health Case Register.

Iqbal, Ehtesham; Mallah, Robbie; Jackson, Richard George; Ball, Michael; Ibrahim, Zina M; Broadbent, Matthew; Dzahini, Olubanke; Stewart, Robert; Johnston, Caroline; Dobson, Richard J B.

PLoS One ; 10(8): e0134208, 2015.

Article in English | MEDLINE | ID: mdl-26273830

ABSTRACT

OBJECTIVES: Electronic healthcare records (EHRs) are a rich source of information, with huge potential for secondary research use. The aim of this study was to develop an application to identify instances of Adverse Drug Events (ADEs) from free-text psychiatric EHRs. METHODS: We used the GATE Natural Language Processing (NLP) software to mine instances of ADEs from free-text content within the Clinical Record Interactive Search (CRIS) system, a de-identified psychiatric case register developed at the South London and Maudsley NHS Foundation Trust, UK. The tool was built around a set of four movement disorders (extrapyramidal side effects [EPSEs]) related to antipsychotic therapy, and its rules were then generalised so that the tool could be applied to additional ADEs. We report the frequencies of recorded EPSEs in patients diagnosed with a Severe Mental Illness (SMI) and then report performance in identifying eight other unrelated ADEs. RESULTS: The tool identified EPSEs with >0.85 precision and >0.86 recall during testing. Akathisia was found to be the most prevalent EPSE overall and occurred in the Asian ethnic group with a frequency of 8.13%. The tool performed well when applied to most of the non-EPSEs but least well when applied to rare conditions such as myocarditis, a condition that appears frequently in the text as a side effect warning to patients. CONCLUSIONS: The developed tool allows us to accurately identify instances of a potential ADE from psychiatric EHRs. As such, we were able to study the prevalence of ADEs within subgroups of patients stratified by SMI diagnosis, gender, age and ethnicity. In addition, we demonstrated the generalisability of the application to other ADE types by achieving high precision on a set of documents containing non-EPSE ADEs. AVAILABILITY: The application can be found at http://git.brc.iop.kcl.ac.uk/rmallah/dystoniaml.

Subject(s)

Antipsychotic Agents/adverse effects , Drug-Related Side Effects and Adverse Reactions/epidemiology , Electronic Health Records , Antipsychotic Agents/therapeutic use , Data Mining/methods , Humans , Mental Disorders/drug therapy , Registries , Software

10.

The relative vertex clustering value--a new criterion for the fast discovery of functional modules in protein interaction networks.

Ibrahim, Zina M; Ngom, Alioune.

BMC Bioinformatics ; 16 Suppl 4: S3, 2015.

Article in English | MEDLINE | ID: mdl-25734691

ABSTRACT

BACKGROUND: Cellular processes are known to be modular and are realized by groups of proteins implicated in common biological functions. Such groups of proteins are called functional modules, and many community detection methods have been devised for their discovery from protein interaction network (PIN) data. In current agglomerative clustering approaches, vertices with very few neighbors are often classified as separate clusters, which does not make sense biologically. Also, a major limitation of agglomerative techniques is that their computational efficiency does not scale well to large PINs. Finally, PIN data obtained from large-scale experiments generally contain many false positives, which makes it hard for agglomerative clustering methods to find the correct clusters, since they are known to be sensitive to noisy data. RESULTS: We propose a local similarity premetric, the relative vertex clustering value, as a new criterion for deciding when a node can be added to a given node's cluster, which addresses the above three issues. Based on this criterion, we introduce a novel and very fast agglomerative clustering technique, FAC-PIN, for discovering functional modules and protein complexes from PIN data. CONCLUSIONS: Our proposed FAC-PIN algorithm is applied to nine PIN datasets from eight different species, including the yeast PIN, and the identified functional modules are validated using Gene Ontology (GO) annotations from DAVID Bioinformatics Resources. Identified protein complexes are also validated using experimentally verified complexes. Computational results show that FAC-PIN can discover functional modules or protein complexes from PINs more accurately and more efficiently than HC-PIN and CNM, the current state-of-the-art approaches for clustering PINs in an agglomerative manner.

Subject(s)

Algorithms , Computational Biology/methods , Protein Interaction Mapping/methods , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/metabolism , Cluster Analysis , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae Proteins/genetics , Signal Transduction , Vocabulary, Controlled

11.

Using qualitative probability in reverse-engineering gene regulatory networks.

Ibrahim, Zina M; Ngom, Alioune; Tawfik, Ahmed Y.

IEEE/ACM Trans Comput Biol Bioinform ; 8(2): 326-34, 2011.

Article in English | MEDLINE | ID: mdl-20876933

ABSTRACT

This paper demonstrates the use of qualitative probabilistic networks (QPNs) to aid Dynamic Bayesian Networks (DBNs) in the process of learning the structure of gene regulatory networks from microarray gene expression data. We present a study which shows that QPNs define monotonic relations that are capable of identifying regulatory interactions in a manner that is less susceptible to the many sources of uncertainty that surround gene expression data. Moreover, we construct a model that maps the regulatory interactions of genetic networks to QPN constructs and show its capability in providing a set of candidate regulators for target genes, which is subsequently used to establish a prior structure that the DBN learning algorithm can use and which 1) distinguishes spurious correlations from true regulations, 2) enables the discovery of sets of coregulators of target genes, and 3) results in a more efficient construction of gene regulatory networks. The model is compared to the existing literature using the known gene regulatory interactions of Drosophila melanogaster.

Subject(s)

Gene Regulatory Networks/genetics , Models, Statistical , Algorithms , Animals , Bayes Theorem , Drosophila melanogaster/genetics , Gene Expression Profiling/methods , Gene Expression Regulation
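QPNs replace numerical conditional probabilities with qualitative signs of influence: + (monotonic increase), - (monotonic decrease), 0 (no influence) and ? (ambiguous). The standard QPN sign algebra, multiplying signs to chain influences along a path and adding signs to combine parallel paths, can be sketched as below. The gene interactions in the example are hypothetical, and this is not the authors' mapping from regulatory interactions to QPN constructs.

```python
from functools import reduce

# Qualitative influence signs: '+', '-', '0' (no influence), '?' (ambiguous).


def sign_product(a, b):
    """Chain two influences along a path (sign multiplication)."""
    if a == '0' or b == '0':
        return '0'
    if a == '?' or b == '?':
        return '?'
    return '+' if a == b else '-'


def sign_sum(a, b):
    """Combine influences arriving via parallel paths (sign addition)."""
    if a == '0':
        return b
    if b == '0':
        return a
    # Equal signs reinforce; conflicting or ambiguous signs give '?'.
    return a if a == b else '?'


def net_influence(paths):
    """Net qualitative influence of a source on a target.

    paths: list of paths, each given as the list of edge signs along it.
    '+' is the identity for sign_product and '0' the identity for sign_sum.
    """
    path_signs = [reduce(sign_product, path, '+') for path in paths]
    return reduce(sign_sum, path_signs, '0')


# Hypothetical: gene A activates B (+), B represses C (-), and A also activates C directly (+).
via_b = net_influence([['+', '-']])
combined = net_influence([['+', '-'], ['+']])
```

Here the chained influence of A on C through B is '-', but adding the direct activating edge produces conflicting paths, so the net influence is '?'. Ambiguous results of this kind are the points where quantitative learning, such as the DBN step described above, would be needed to resolve the relationship.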
