Results 1 - 9 of 9
1.
Int J Med Inform ; 173: 104930, 2023 05.
Article in English | MEDLINE | ID: covidwho-2277481

ABSTRACT

BACKGROUND: Data drift can negatively impact the performance of machine learning algorithms (MLAs) that were trained on historical data. As such, MLAs should be continuously monitored and tuned to overcome the systematic changes that occur in the distribution of data. In this paper, we study the extent of data drift and provide insights about its characteristics for sepsis onset prediction. This study will help elucidate the nature of data drift for prediction of sepsis and similar diseases. This may aid with the development of more effective patient monitoring systems that can stratify risk for dynamic disease states in hospitals. METHODS: We devise a series of simulations that measure the effects of data drift in patients with sepsis, using electronic health records (EHR). We simulate multiple scenarios in which data drift may occur, namely the change in the distribution of the predictor variables (covariate shift), the change in the statistical relationship between the predictors and the target (concept shift), and the occurrence of a major healthcare event (major event) such as the COVID-19 pandemic. We measure the impact of data drift on model performances, identify the circumstances that necessitate model retraining, and compare the effects of different retraining methodologies and model architecture on the outcomes. We present the results for two different MLAs, eXtreme Gradient Boosting (XGB) and Recurrent Neural Network (RNN). RESULTS: Our results show that the properly retrained XGB models outperform the baseline models in all simulation scenarios, hence signifying the existence of data drift. In the major event scenario, the area under the receiver operating characteristic curve (AUROC) at the end of the simulation period is 0.811 for the baseline XGB model and 0.868 for the retrained XGB model. In the covariate shift scenario, the AUROC at the end of the simulation period for the baseline and retrained XGB models is 0.853 and 0.874 respectively. In the concept shift scenario and under the mixed labeling method, the retrained XGB models perform worse than the baseline model for most simulation steps. However, under the full relabeling method, the AUROC at the end of the simulation period for the baseline and retrained XGB models is 0.852 and 0.877 respectively. The results for the RNN models were mixed, suggesting that retraining based on a fixed network architecture may be inadequate for an RNN. We also present the results in the form of other performance metrics such as the ratio of observed to expected probabilities (calibration) and the normalized rate of positive predictive values (PPV) by prevalence, referred to as lift, at a sensitivity of 0.8. CONCLUSION: Our simulations reveal that retraining periods of a couple of months or using several thousand patients are likely to be adequate to monitor machine learning models that predict sepsis. This indicates that a machine learning system for sepsis prediction will probably need less infrastructure for performance monitoring and retraining compared to other applications in which data drift is more frequent and continuous. Our results also show that in the event of a concept shift, a full overhaul of the sepsis prediction model may be necessary because it indicates a discrete change in the definition of sepsis labels, and mixing the labels for the sake of incremental training may not produce the desired results.
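
The abstract above defines lift as PPV normalized by prevalence, reported at a sensitivity of 0.8. A minimal sketch of that calculation follows. It assumes binary labels and per-patient risk scores; the function name `lift_at_sensitivity` and the threshold sweep are illustrative choices, not the authors' implementation.

```python
def lift_at_sensitivity(y_true, scores, target_sensitivity=0.8):
    """Return PPV / prevalence at the highest score threshold whose
    sensitivity reaches the target (hypothetical helper, not from the paper)."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    prevalence = n_pos / len(y_true)

    # Sweep candidate thresholds from the highest score downward, so the
    # first threshold that reaches the target sensitivity is the strictest.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, y_true) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, y_true) if s >= threshold and y == 0)
        if tp / n_pos >= target_sensitivity:
            ppv = tp / (tp + fp)
            return ppv / prevalence

    # The lowest threshold flags every patient (sensitivity 1.0), so this
    # point is only reached if target_sensitivity is above 1.
    raise ValueError("target sensitivity cannot be reached")


# Toy data (made up): 5 positives, 5 negatives.
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
risk = [0.9, 0.8, 0.7, 0.6, 0.2, 0.5, 0.4, 0.3, 0.1, 0.05]
toy_lift = lift_at_sensitivity(labels, risk)
```

A lift of 1.0 means the flagged group is no richer in positives than the whole population; larger values mean the alerts concentrate true cases.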


Subject(s)
COVID-19 , Communicable Diseases , Sepsis , Humans , Pandemics , COVID-19/diagnosis , Sepsis/diagnosis , Machine Learning
2.
JMIR Med Inform ; 10(6): e36202, 2022 Jun 15.
Article in English | MEDLINE | ID: covidwho-1892524

ABSTRACT

BACKGROUND: Acute respiratory distress syndrome (ARDS) is a condition that is often considered to have broad and subjective diagnostic criteria and is associated with significant mortality and morbidity. Early and accurate prediction of ARDS and related conditions such as hypoxemia and sepsis could allow timely administration of therapies, leading to improved patient outcomes. OBJECTIVE: The aim of this study is to perform an exploration of how multilabel classification in the clinical setting can take advantage of the underlying dependencies between ARDS and related conditions to improve early prediction of ARDS in patients. METHODS: The electronic health record data set included 40,703 patient encounters from 7 hospitals from April 20, 2018, to March 17, 2021. A recurrent neural network (RNN) was trained using data from 5 hospitals, and external validation was conducted on data from 2 hospitals. In addition to ARDS, 12 target labels for related conditions such as sepsis, hypoxemia, and COVID-19 were used to train the model to classify a total of 13 outputs. As a comparator, XGBoost models were developed for each of the 13 target labels. Model performance was assessed using the area under the receiver operating characteristic curve. Heat maps to visualize attention scores were generated to provide interpretability to the neural networks. Finally, cluster analysis was performed to identify potential phenotypic subgroups of patients with ARDS. RESULTS: The single RNN model trained to classify 13 outputs outperformed the individual XGBoost models for ARDS prediction, achieving an area under the receiver operating characteristic curve of 0.842 on the external test sets. Models trained on an increasing number of tasks resulted in improved performance. Earlier prediction of ARDS nearly doubled the rate of in-hospital survival. Cluster analysis revealed distinct ARDS subgroups, some of which had similar mortality rates but different clinical presentations. CONCLUSIONS: The RNN model presented in this paper can be used as an early warning system to stratify patients who are at risk of developing one of the multiple risk outcomes, hence providing practitioners with the means to take early action.

3.
PLoS One ; 16(3): e0248128, 2021.
Article in English | MEDLINE | ID: covidwho-1575679

ABSTRACT

BACKGROUND: The COVID-19 pandemic remains a significant global threat. However, despite urgent need, there remains uncertainty surrounding best practices for pharmaceutical interventions to treat COVID-19. In particular, conflicting evidence has emerged surrounding the use of hydroxychloroquine and azithromycin, alone or in combination, for COVID-19. The COVID-19 Evidence Accelerator convened by the Reagan-Udall Foundation for the FDA, in collaboration with Friends of Cancer Research, assembled experts from health systems research, regulatory science, data science, and epidemiology to participate in a large parallel analysis of different data sets to further explore the effectiveness of these treatments. METHODS: Electronic health record (EHR) and claims data were extracted from seven separate databases. Parallel analyses were undertaken on data extracted from each source. Each analysis examined time to mortality in hospitalized patients treated with hydroxychloroquine, azithromycin, and the two in combination as compared to patients not treated with either drug. Cox proportional hazards models were used, and propensity score methods were undertaken to adjust for confounding. Frequencies of adverse events in each treatment group were also examined. RESULTS: Neither hydroxychloroquine nor azithromycin, alone or in combination, was significantly associated with time to mortality among hospitalized COVID-19 patients. No treatment groups appeared to have an elevated risk of adverse events. CONCLUSION: Administration of hydroxychloroquine, azithromycin, and their combination appeared to have no effect on time to mortality in hospitalized COVID-19 patients. Continued research is needed to clarify best practices surrounding treatment of COVID-19.


Subject(s)
Antiviral Agents/therapeutic use , Azithromycin/therapeutic use , COVID-19 Drug Treatment , Hydroxychloroquine/therapeutic use , Pandemics/prevention & control , Data Management/methods , Drug Therapy, Combination/methods , Female , Hospitalization , Humans , Male , SARS-CoV-2/drug effects
4.
JMIR Form Res ; 5(9): e28028, 2021 Sep 14.
Article in English | MEDLINE | ID: covidwho-1438390

ABSTRACT

BACKGROUND: A high number of patients who are hospitalized with COVID-19 develop acute respiratory distress syndrome (ARDS). OBJECTIVE: In response to the need for clinical decision support tools to help manage the next pandemic during the early stages (ie, when limited labeled data are present), we developed machine learning algorithms that use semisupervised learning (SSL) techniques to predict ARDS development in general and COVID-19 populations based on limited labeled data. METHODS: SSL techniques were applied to 29,127 encounters with patients who were admitted to 7 US hospitals from May 1, 2019, to May 1, 2021. A recurrent neural network that used a time series of electronic health record data was applied to data that were collected when a patient's peripheral oxygen saturation level fell below the normal range (<97%) to predict the subsequent development of ARDS during the remaining duration of patients' hospital stay. Model performance was assessed with the area under the receiver operating characteristic curve and area under the precision recall curve of an external hold-out test set. RESULTS: For the whole data set, the median time between the first peripheral oxygen saturation measurement of <97% and subsequent respiratory failure was 21 hours. The area under the receiver operating characteristic curve for predicting subsequent ARDS development was 0.73 when the model was trained on a labeled data set of 6930 patients, 0.78 when the model was trained on the labeled data set that had been augmented with the unlabeled data set of 16,173 patients by using SSL techniques, and 0.84 when the model was trained on the entire training set of 23,103 labeled patients. CONCLUSIONS: In the context of using time-series inpatient data and a careful model training design, unlabeled data can be used to improve the performance of machine learning models when labeled data for predicting ARDS development are scarce or expensive.

5.
Health Policy Technol ; 10(3): 100554, 2021 Sep.
Article in English | MEDLINE | ID: covidwho-1340667

ABSTRACT

Objective: In the wake of COVID-19, the United States (U.S.) developed a three-stage plan to outline the parameters to determine when states may reopen businesses and ease travel restrictions. The guidelines also identify subpopulations of Americans deemed to be at high risk for severe disease should they contract COVID-19. These guidelines were based on population-level demographics, rather than individual-level risk factors. As such, they may misidentify individuals at high risk for severe illness, and may therefore be of limited use in decisions surrounding resource allocation to vulnerable populations. The objective of this study was to evaluate a machine learning algorithm for prediction of serious illness due to COVID-19 using inpatient data collected from electronic health records. Methods: The algorithm was trained to identify patients for whom a diagnosis of COVID-19 was likely to result in hospitalization, and compared against four U.S. policy-based criteria: age over 65; having a serious underlying health condition; age over 65 or having a serious underlying health condition; and age over 65 and having a serious underlying health condition. Results: This algorithm identified 80% of patients at risk for hospitalization due to COVID-19, versus 62% identified by government guidelines. The algorithm also achieved a high specificity of 95%, outperforming government guidelines. Conclusions: This algorithm may identify individuals likely to require hospitalization should they contract COVID-19. This information may be useful to guide vaccine distribution, anticipate hospital resource needs, and assist health care policymakers to make care decisions in a more principled manner.
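
The four policy-based comparator criteria listed in the abstract above are simple rules, so their sensitivity and specificity can be computed directly. The sketch below illustrates this on made-up records; the field names `age`, `serious_condition`, and `hospitalized` are assumptions for illustration and do not come from the study's data.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for boolean labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    return tp / (tp + fn), tn / (tn + fp)


# The four rule-based criteria named in the abstract, as predicates over a
# patient record (hypothetical dict layout).
CRITERIA = {
    "age_over_65": lambda p: p["age"] > 65,
    "serious_condition": lambda p: p["serious_condition"],
    "age_or_condition": lambda p: p["age"] > 65 or p["serious_condition"],
    "age_and_condition": lambda p: p["age"] > 65 and p["serious_condition"],
}

# Made-up toy records, not study data.
patients = [
    {"age": 70, "serious_condition": True, "hospitalized": True},
    {"age": 72, "serious_condition": False, "hospitalized": False},
    {"age": 40, "serious_condition": True, "hospitalized": True},
    {"age": 30, "serious_condition": False, "hospitalized": False},
]

outcomes = [p["hospitalized"] for p in patients]
results = {
    name: sensitivity_specificity(outcomes, [rule(p) for p in patients])
    for name, rule in CRITERIA.items()
}
```

A model's thresholded predictions can be passed through the same `sensitivity_specificity` function, which puts the algorithm and the rule-based criteria on the same footing.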

6.
Clin Ther ; 43(5): 871-885, 2021 05.
Article in English | MEDLINE | ID: covidwho-1188425

ABSTRACT

PURPOSE: Coronavirus disease-2019 (COVID-19) continues to be a global threat and remains a significant cause of hospitalizations. Recent clinical guidelines have supported the use of corticosteroids or remdesivir in the treatment of COVID-19. However, uncertainty remains about which patients are most likely to benefit from treatment with either drug; such knowledge is crucial for avoiding preventable adverse effects, minimizing costs, and effectively allocating resources. This study presents a machine-learning system with the capacity to identify patients in whom treatment with a corticosteroid or remdesivir is associated with improved survival time. METHODS: Gradient-boosted decision-tree models used for predicting treatment benefit were trained and tested on data from electronic health records dated between December 18, 2019, and October 18, 2020, from adult patients (age ≥18 years) with COVID-19 in 10 US hospitals. Models were evaluated for performance in identifying patients with longer survival times when treated with a corticosteroid versus remdesivir. Fine and Gray proportional-hazards models were used for identifying significant findings in treated and nontreated patients, in a subset of patients who received supplemental oxygen, and in patients identified by the algorithm. Inverse probability-of-treatment weights were used to adjust for confounding. Models were trained and tested separately for each treatment. FINDINGS: Data from 2364 patients were included, with men comprising slightly more than 50% of the sample; 893 patients were treated with remdesivir, and 1471 were treated with a corticosteroid. After adjustment for confounding, neither corticosteroids nor remdesivir use was associated with increased survival time in the overall population or in the subpopulation that received supplemental oxygen. However, in the populations identified by the algorithms, both corticosteroids and remdesivir were significantly associated with an increase in survival time, with hazard ratios of 0.56 and 0.40, respectively (both, P = 0.04). IMPLICATIONS: Machine-learning methods have the capacity to identify hospitalized patients with COVID-19 in whom treatment with a corticosteroid or remdesivir is associated with an increase in survival time. These methods may help to improve patient outcomes and allocate resources during the COVID-19 crisis.


Subject(s)
Adenosine Monophosphate/analogs & derivatives , Adrenal Cortex Hormones , Alanine/analogs & derivatives , Antiviral Agents , COVID-19 Drug Treatment , Machine Learning , Adenosine Monophosphate/therapeutic use , Adolescent , Adrenal Cortex Hormones/therapeutic use , Adult , Aged , Aged, 80 and over , Alanine/therapeutic use , Antiviral Agents/therapeutic use , Female , Humans , Male , Middle Aged , Young Adult
7.
J Clin Med ; 9(12)2020 Nov 26.
Article in English | MEDLINE | ID: covidwho-945860

ABSTRACT

Therapeutic agents for the novel coronavirus disease 2019 (COVID-19) have been proposed, but evidence supporting their use is limited. A machine learning algorithm was developed in order to identify a subpopulation of COVID-19 patients for whom hydroxychloroquine was associated with improved survival; this population might be relevant for study in a clinical trial. A pragmatic trial was conducted at six United States hospitals. We enrolled COVID-19 patients who were admitted between 10 March and 4 June 2020. Treatment was not randomized. The study endpoint was mortality; discharge was a competing event. Hazard ratios were obtained on the entire population, and on the subpopulation indicated by the algorithm as suitable for treatment. A total of 290 patients were enrolled. In the subpopulation that was identified by the algorithm, hydroxychloroquine was associated with a statistically significant (p = 0.011) increase in survival (adjusted hazard ratio 0.29, 95% confidence interval (CI) 0.11-0.75). Adjusted survival among the algorithm-indicated patients was 82.6% in the treated arm and 51.2% in the arm not treated. No association between treatment and mortality was observed in the general population. A 31% increase in survival at the end of the study was observed in a population of COVID-19 patients who were identified by a machine learning algorithm as having a better outcome with hydroxychloroquine treatment. Precision medicine approaches may be useful in identifying a subpopulation of COVID-19 patients more likely to be proven to benefit from hydroxychloroquine treatment in a clinical trial.

8.
Ann Med Surg (Lond) ; 59: 207-216, 2020 Nov.
Article in English | MEDLINE | ID: covidwho-813448

ABSTRACT

RATIONALE: Prediction of patients at risk for mortality can help triage patients and assist in resource allocation. OBJECTIVES: Develop and evaluate a machine learning-based algorithm which accurately predicts mortality in COVID-19, pneumonia, and mechanically ventilated patients. METHODS: Retrospective study of 53,001 total ICU patients, including 9166 patients with pneumonia and 25,895 mechanically ventilated patients, performed on the MIMIC dataset. An additional retrospective analysis was performed on a community hospital dataset containing 114 patients positive for SARS-CoV-2 by PCR test. The outcome of interest was in-hospital patient mortality. RESULTS: When trained and tested on the MIMIC dataset, the XGBoost predictor obtained area under the receiver operating characteristic curve (AUROC) values of 0.82, 0.81, 0.77, and 0.75 for mortality prediction on mechanically ventilated patients at 12-, 24-, 48-, and 72-hour windows, respectively, and AUROCs of 0.87, 0.78, 0.77, and 0.734 for mortality prediction on pneumonia patients at 12-, 24-, 48-, and 72-hour windows, respectively. The predictor outperformed the qSOFA, MEWS and CURB-65 risk scores at all prediction windows. When tested on the community hospital dataset, the predictor obtained AUROCs of 0.91, 0.90, 0.86, and 0.87 for mortality prediction on COVID-19 patients at 12-, 24-, 48-, and 72-hour windows, respectively, outperforming the qSOFA, MEWS and CURB-65 risk scores at all prediction windows. CONCLUSIONS: This machine learning-based algorithm is a useful predictive tool for anticipating patient mortality at clinically useful timepoints, and is capable of accurate mortality prediction for mechanically ventilated patients as well as those diagnosed with pneumonia and COVID-19.
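
AUROC, the metric reported in the abstract above, can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it by direct pairwise comparison. It is a small illustration of the definition on made-up data, not the evaluation code used in the study, and the quadratic loop is only suitable for small samples.

```python
def auroc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positive and
    negative scores (equivalent to the Mann-Whitney U statistic, normalized)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Made-up toy example: one of the four positive/negative pairs is misordered.
toy_auroc = auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading the per-window values in the abstract.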

9.
Comput Biol Med ; 124: 103949, 2020 09.
Article in English | MEDLINE | ID: covidwho-695377

ABSTRACT

BACKGROUND: Currently, physicians are limited in their ability to provide an accurate prognosis for COVID-19 positive patients. Existing scoring systems have been ineffective for identifying patient decompensation. Machine learning (ML) may offer an alternative strategy. A prospectively validated method to predict the need for ventilation in COVID-19 patients is essential to help triage patients, allocate resources, and prevent emergency intubations and their associated risks. METHODS: In a multicenter clinical trial, we evaluated the performance of a machine learning algorithm for prediction of invasive mechanical ventilation of COVID-19 patients within 24 h of an initial encounter. We enrolled patients with a COVID-19 diagnosis who were admitted to five United States health systems between March 24 and May 4, 2020. RESULTS: 197 patients were enrolled in the REspirAtory Decompensation and model for the triage of covid-19 patients: a prospective studY (READY) clinical trial. The algorithm had a higher diagnostic odds ratio (DOR, 12.58) for predicting ventilation than a comparator early warning system, the Modified Early Warning Score (MEWS). The algorithm also achieved significantly higher sensitivity (0.90) than MEWS, which achieved a sensitivity of 0.78, while maintaining a higher specificity (p < 0.05). CONCLUSIONS: In the first clinical trial of a machine learning algorithm for ventilation needs among COVID-19 patients, the algorithm demonstrated accurate prediction of the need for mechanical ventilation within 24 h. This algorithm may help care teams effectively triage patients and allocate resources. Further, the algorithm is capable of accurately identifying 16% more patients than a widely used scoring system while minimizing false positive results.
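
The diagnostic odds ratio (DOR) cited in the abstract above is conventionally defined from a 2x2 confusion matrix as (TP x TN) / (FP x FN). The sketch below applies that standard definition to made-up counts. The abstract does not report the underlying confusion matrix, so the numbers here are purely illustrative and do not reproduce the reported value of 12.58.

```python
def diagnostic_odds_ratio(tp, fp, fn, tn):
    """DOR = (TP * TN) / (FP * FN): the odds of a positive prediction among
    true cases divided by the odds of a positive prediction among non-cases."""
    if fp == 0 or fn == 0:
        raise ValueError("DOR is undefined when FP or FN is zero")
    return (tp * tn) / (fp * fn)


def sensitivity(tp, fn):
    """Fraction of true cases that were flagged."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of non-cases that were correctly not flagged."""
    return tn / (tn + fp)


# Made-up counts for illustration only.
toy_dor = diagnostic_odds_ratio(tp=9, fp=2, fn=1, tn=8)
toy_sens = sensitivity(tp=9, fn=1)
toy_spec = specificity(tn=8, fp=2)
```

A DOR of 1 means the test does not discriminate at all; higher values indicate better discrimination. Because it combines sensitivity and specificity into one number, it is commonly reported alongside both.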


Subject(s)
Betacoronavirus , Clinical Laboratory Techniques/methods , Coronavirus Infections/diagnosis , Coronavirus Infections/physiopathology , Machine Learning , Pneumonia, Viral/diagnosis , Pneumonia, Viral/physiopathology , Respiratory Insufficiency/diagnosis , Respiratory Insufficiency/physiopathology , Adult , Aged , Aged, 80 and over , Algorithms , COVID-19 , COVID-19 Testing , Clinical Laboratory Techniques/statistics & numerical data , Computational Biology , Coronavirus Infections/drug therapy , Coronavirus Infections/therapy , Female , Humans , Male , Middle Aged , Pandemics , Pneumonia, Viral/therapy , Prognosis , Prospective Studies , Respiration, Artificial , Respiratory Insufficiency/therapy , SARS-CoV-2 , Sensitivity and Specificity , Triage/methods , Triage/statistics & numerical data , United States/epidemiology , COVID-19 Drug Treatment