Results 1 - 20 of 599

1.
BMC Med Inform Decis Mak ; 22(1): 2, 2022 01 04.
Article in English | MEDLINE | ID: covidwho-1606711

ABSTRACT

BACKGROUND: Patients hospitalized with coronavirus disease 2019 (COVID-19) are at risk of death, and machine learning (ML) algorithms are a potential means of predicting their mortality. Our study therefore aimed to compare several ML algorithms for predicting COVID-19 mortality from patient data recorded at first admission and to select the best-performing algorithm as a predictive tool for decision-making. METHODS: After feature selection based on the confirmed predictors, data on 1500 eligible patients (1386 survivors and 144 deaths) were extracted from the registry of Ayatollah Taleghani Hospital, Abadan city, Iran. Several ML algorithms were then trained to predict COVID-19 mortality, and the models' performance was assessed with metrics derived from the confusion matrix. RESULTS: Of the 1500 participants, men outnumbered women (836 vs. 664), and the median age was 57.25 years (interquartile 18-100). After feature selection, dyspnea, ICU admission, and oxygen therapy were the top three predictors out of 38 features, whereas smoking, alanine aminotransferase, and platelet count were the three weakest predictors of COVID-19 mortality. Random forest (RF) outperformed the other ML algorithms, with an accuracy, sensitivity, precision, specificity, and area under the receiver operating characteristic curve (ROC) of 95.03%, 90.70%, 94.23%, 95.10%, and 99.02%, respectively. CONCLUSION: ML enables a reasonable level of accuracy in predicting COVID-19 mortality. ML-based predictive models, particularly the RF algorithm, may therefore help identify patients at high risk of mortality and inform appropriate interventions by clinicians.
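The four confusion-matrix metrics named in this abstract can be sketched in Python as follows. The counts are hypothetical and were chosen only to make the arithmetic easy to follow. They are not the study's data, and the function name confusion_metrics is an illustrative assumption.

```python
# Minimal sketch of confusion-matrix metrics; counts below are hypothetical, not from the study.

def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Return accuracy, sensitivity, precision and specificity as fractions."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,    # share of all predictions that are correct
        "sensitivity": tp / (tp + fn),    # deaths correctly flagged (recall)
        "precision": tp / (tp + fp),      # flagged patients who actually died
        "specificity": tn / (tn + fp),    # survivors correctly cleared
    }


if __name__ == "__main__":
    for name, value in confusion_metrics(tp=40, fp=5, tn=140, fn=15).items():
        print(f"{name}: {value:.2%}")
```

Sensitivity and specificity are computed within the true-death and true-survivor groups, so they do not depend on how many deaths the cohort contains. Precision does depend on that mix.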


Subject(s)
COVID-19 , Algorithms , Female , Humans , Machine Learning , Male , Middle Aged , ROC Curve , Retrospective Studies , SARS-CoV-2
2.
PLoS One ; 17(1): e0262193, 2022.
Article in English | MEDLINE | ID: covidwho-1606289

ABSTRACT

OBJECTIVE: To prospectively evaluate a logistic regression-based machine learning (ML) prognostic algorithm implemented in real time as a clinical decision support (CDS) system for symptomatic persons under investigation (PUI) for coronavirus disease 2019 (COVID-19) in the emergency department (ED). METHODS: We developed the model in a 12-hospital system through training and validation, followed by a real-time assessment. LASSO-guided feature selection covered demographics, comorbidities, home medications, and vital signs. We constructed a logistic regression-based ML algorithm to predict "severe" COVID-19, defined as intensive care unit (ICU) admission, invasive mechanical ventilation, or death in or out of hospital. Training data included 1,469 adult patients who tested positive for Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) within 14 days of acute care. We performed: 1) temporal validation in 414 SARS-CoV-2-positive patients, 2) validation in a PUI set of 13,271 patients with a symptomatic SARS-CoV-2 test during an acute care visit, and 3) real-time validation in 2,174 ED patients with a PUI test or a positive SARS-CoV-2 result. Subgroup analysis was conducted across race and gender to ensure equitable performance. RESULTS: The algorithm performed well in pre-implementation validations for predicting COVID-19 severity: 1) the temporal validation had an area under the receiver operating characteristic curve (AUROC) of 0.87 (95%-CI: 0.83, 0.91); 2) validation in the PUI population had an AUROC of 0.82 (95%-CI: 0.81, 0.83). The ED CDS system performed well in real time, with an AUROC of 0.85 (95%-CI: 0.83, 0.87). Zero patients in the lowest risk quintile developed "severe" COVID-19, whereas 33.2% of patients in the highest quintile did. The models performed without significant differences between genders or among races/ethnicities (all p-values > 0.05).
CONCLUSION: A logistic regression model-based ML-enabled CDS can be developed, validated, and implemented with high performance across multiple hospitals while being equitable and maintaining performance in real-time validation.
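The AUROC and the quintile results above can be illustrated with a short Python sketch. It assumes the rank (Mann-Whitney) formulation of AUROC, which is the probability that a randomly chosen severe case receives a higher score than a randomly chosen non-severe case, with ties counted as one half. It also groups patients into quintiles of predicted risk. The abstract does not state how the authors computed AUROC or its confidence intervals. The function names and toy scores are hypothetical.

```python
# Sketch only: rank-based AUROC and event rate per predicted-risk quintile (toy data, hypothetical names).

def auroc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def event_rate_by_quintile(scores, labels):
    """Sort by predicted risk, split into 5 groups, return the observed event rate in each."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0])
    n = len(ranked)  # needs at least 5 patients so every quintile is non-empty
    rates = []
    for q in range(5):
        chunk = ranked[q * n // 5:(q + 1) * n // 5]
        rates.append(sum(y for _, y in chunk) / len(chunk))
    return rates


if __name__ == "__main__":
    scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
    labels = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1]
    print(auroc(scores, labels))
    print(event_rate_by_quintile(scores, labels))
```

The quadratic pair loop is fine for a sketch. A production version would sort once and use ranks.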


Subject(s)
COVID-19/diagnosis , Decision Support Systems, Clinical , Logistic Models , Machine Learning , Triage/methods , COVID-19/physiopathology , Emergency Service, Hospital , Humans , ROC Curve , Severity of Illness Index
3.
AAPS J ; 24(1): 19, 2022 01 04.
Article in English | MEDLINE | ID: covidwho-1605878

ABSTRACT

Over the past decade, artificial intelligence (AI) and machine learning (ML) have become the breakthrough technologies most anticipated to have a transformative effect on pharmaceutical research and development (R&D). This is partly driven by revolutionary advances in computational technology and the parallel easing of previous constraints on collecting and processing large volumes of data. Meanwhile, bringing new drugs to market and to patients has become prohibitively expensive. Given these headwinds, AI/ML techniques appeal to the pharmaceutical industry because of their automated nature, their predictive capabilities, and the consequent expected increase in efficiency. ML approaches have been used in drug discovery over the past 15-20 years with increasing sophistication. The most recent area of drug development in which AI/ML is beginning to cause positive disruption is clinical trial design, conduct, and analysis. The COVID-19 pandemic may further accelerate the use of AI/ML in clinical trials because of an increased reliance on digital technology in trial conduct. As AI/ML becomes increasingly integrated into R&D, it is critical to get past the related buzzwords and noise. It is equally important to recognize that the scientific method is not obsolete when making inferences from data. Doing so will help separate hope from hype and lead to informed decisions on the optimal use of AI/ML in drug development. This manuscript aims to demystify key concepts, present use cases, and offer insights and a balanced view on the optimal use of AI/ML methods in R&D.


Subject(s)
Artificial Intelligence , Clinical Trials as Topic , Computational Biology , Drug Development , Machine Learning , Pharmaceutical Research , Research Design , Animals , Artificial Intelligence/trends , Computational Biology/trends , Diffusion of Innovation , Drug Development/trends , Forecasting , Humans , Machine Learning/trends , Pharmaceutical Research/trends , Research Design/trends
4.
BMC Public Health ; 22(1): 10, 2022 01 05.
Article in English | MEDLINE | ID: covidwho-1604673

ABSTRACT

BACKGROUND: Narrowing a large set of features to a smaller one can improve our understanding of the main risk factors for in-hospital mortality in patients with COVID-19. This study aimed to derive a parsimonious model for predicting overall survival (OS) among re-infected COVID-19 patients using machine-learning algorithms. METHODS: The retrospective data of 283 re-infected COVID-19 patients admitted to twenty-six medical centers (affiliated with Shiraz University of Medical Sciences) from 10 June to 26 December 2020 were reviewed and analyzed. An elastic-net regularized Cox proportional hazards (PH) regression and model approximation via backward elimination were used to optimize a predictive model of time to in-hospital death. The model was further reduced to its core features to maximize simplicity and generalizability. RESULTS: The empirical in-hospital mortality rate among the re-infected COVID-19 patients was 9.5%, and the mortality rate among intubated patients was 83.5%. Using the Kaplan-Meier approach, the OS (95% CI) rates for days 7, 14, and 21 were 87.5% (81.6-91.6%), 78.3% (65.0-87.0%), and 52.2% (20.3-76.7%), respectively. The elastic-net Cox PH regression retained 8 of 35 candidate predictors of death. Transfer by Emergency Medical Services (EMS) (HR=3.90, 95% CI: 1.63-9.48), SpO2≤85% (HR=8.10, 95% CI: 2.97-22.00), increased serum creatinine (HR=1.85, 95% CI: 1.48-2.30), and increased white blood cell (WBC) count (HR=1.10, 95% CI: 1.03-1.15) were associated with higher in-hospital mortality in the re-infected COVID-19 patients. CONCLUSION: The machine-learning analysis showed that transfer by EMS, profound hypoxemia (SpO2≤85%), increased serum creatinine (more than 1.6 mg/dL), and increased WBC count (more than 8.5 × 10⁹ cells/L) reduced the OS of the re-infected COVID-19 patients.
We recommend that future machine-learning studies further investigate these relationships and the associated factors in these patients to better predict OS.
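The day-7, day-14, and day-21 OS rates above come from the Kaplan-Meier product-limit estimator. This estimator multiplies, at each distinct death time t_i, the conditional survival factor (1 - d_i / n_i), where d_i is the number of deaths and n_i is the number still at risk. The Python sketch below uses hypothetical follow-up data, not the study's records, and omits the confidence intervals the authors report.

```python
# Sketch of the Kaplan-Meier product-limit estimator (toy data; no confidence intervals).

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct death time.

    times:  follow-up time per patient (e.g. days)
    events: 1 if death was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients sharing this time; those censored at t still count as at risk at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


if __name__ == "__main__":
    for t, s in kaplan_meier([3, 5, 5, 7, 10, 12], [1, 1, 0, 1, 0, 1]):
        print(f"day {t}: S = {s:.3f}")
```

Censored patients reduce the at-risk count without causing a drop in S(t). This is why the estimate can fall steeply late in follow-up, when few patients remain at risk, as the wide day-21 interval above suggests.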


Subject(s)
COVID-19 , Algorithms , Hospital Mortality , Humans , Machine Learning , Proportional Hazards Models , Retrospective Studies , Risk Factors , SARS-CoV-2
5.
J Med Internet Res ; 23(12): e30753, 2021 12 22.
Article in English | MEDLINE | ID: covidwho-1593102

ABSTRACT

BACKGROUND: Expanding access to and use of medication for opioid use disorder (MOUD) is a key component of overdose prevention. An important barrier to the uptake of MOUD is exposure to inaccurate and potentially harmful health misinformation on social media or web-based forums where individuals commonly seek information. There is a significant need to devise computational techniques to describe the prevalence of web-based health misinformation related to MOUD to facilitate mitigation efforts. OBJECTIVE: By adopting a multidisciplinary, mixed methods strategy, this paper aims to present machine learning and natural language analysis approaches to identify the characteristics and prevalence of web-based misinformation related to MOUD to inform future prevention, treatment, and response efforts. METHODS: The team harnessed public social media posts and comments in the English language from Twitter (6,365,245 posts), YouTube (99,386 posts), Reddit (13,483,419 posts), and Drugs-Forum (5549 posts). Leveraging public health expert annotations on a sample of 2400 of these social media posts that were found to be semantically most similar to a variety of prevailing opioid use disorder-related myths based on representational learning, the team developed a supervised machine learning classifier. This classifier identified whether a post's language promoted one of the leading myths challenging addiction treatment: that the use of agonist therapy for MOUD is simply replacing one drug with another. Platform-level prevalence was calculated thereafter by machine labeling all unannotated posts with the classifier and noting the proportion of myth-indicative posts over all posts. 
RESULTS: Our results demonstrate promise in identifying social media postings that center on treatment myths about opioid use disorder with an accuracy of 91% and an area under the curve of 0.9, including how these discussions vary across platforms in terms of prevalence and linguistic characteristics, with the lowest prevalence on web-based health communities such as Reddit and Drugs-Forum and the highest on Twitter. Specifically, the prevalence of the stated MOUD myth ranged from 0.4% on web-based health communities to 0.9% on Twitter. CONCLUSIONS: This work provides one of the first large-scale assessments of a key MOUD-related myth across multiple social media platforms and highlights the feasibility and importance of ongoing assessment of health misinformation related to addiction treatment.
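The platform-level prevalence step in the METHODS above labels every post with the classifier and divides the number of myth-indicative posts by the total number of posts per platform. The Python sketch below assumes that procedure. The keyword rule in toy_classifier is a hypothetical stand-in for the study's supervised classifier, and the sample posts are invented. Unlike the study, which machine-labeled only the unannotated posts, the sketch runs the classifier on every post for simplicity.

```python
# Sketch: per-platform prevalence = myth-indicative posts / all posts (classifier is a hypothetical stand-in).
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple


def platform_prevalence(posts: Iterable[Tuple[str, str]],
                        classify: Callable[[str], bool]) -> Dict[str, float]:
    counts = defaultdict(lambda: [0, 0])  # platform -> [flagged, total]
    for platform, text in posts:
        counts[platform][1] += 1
        if classify(text):
            counts[platform][0] += 1
    return {platform: flagged / total for platform, (flagged, total) in counts.items()}


def toy_classifier(text: str) -> bool:
    # Hypothetical keyword rule standing in for the trained supervised model.
    return "replacing one drug with another" in text.lower()


if __name__ == "__main__":
    posts = [
        ("Twitter", "MOUD is just replacing one drug with another"),
        ("Twitter", "treatment works"),
        ("Reddit", "methadone helped me"),
        ("Reddit", "buprenorphine saved my life"),
    ]
    print(platform_prevalence(posts, toy_classifier))
```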


Subject(s)
Opioid-Related Disorders , Social Media , Communication , Humans , Machine Learning , Opioid-Related Disorders/drug therapy , Opioid-Related Disorders/epidemiology , Prevalence
6.
Crit Care ; 25(1): 328, 2021 09 08.
Article in English | MEDLINE | ID: covidwho-1582035

ABSTRACT

BACKGROUND: The coronavirus disease 2019 (COVID-19) pandemic caused by the SARS-CoV-2 virus has become the greatest and most controversial health issue for nations worldwide. It is associated with diverse clinical manifestations and a high mortality rate. Predicting mortality and identifying outcome predictors are crucial for critically ill COVID-19 patients. Multivariate and machine learning methods may be used to develop prediction models and to reduce the complexity of clinical phenotypes. METHODS: Multivariate predictive analysis was applied to 108 of the 250 clinical features, comorbidities, and blood markers captured at admission in a hospitalized cohort of patients with COVID-19 (N = 250). A model based on the statistically inspired modification of partial least squares (SIMPLS) was developed to predict hospital mortality. To assess prediction accuracy, patients were randomly assigned to training and validation sets. Predictive partition analysis was performed to obtain cutoff values for continuous and categorical variables. Latent class analysis (LCA) was carried out to cluster the patients with COVID-19 into low- and high-risk groups. Principal component analysis and LCA were used to identify a subgroup of survivors at high risk of death. RESULTS: The SIMPLS-based model predicted hospital mortality in patients with COVID-19 with moderate predictive power (Q² = 0.24) and high accuracy (AUC > 0.85) in separating non-survivors from survivors in the training and validation sets. The model comprised 18 clinical and comorbidity predictors and 3 blood biochemical markers. Coronary artery disease, diabetes, altered mental status, age > 65, and dementia were the top differentiating mortality predictors. CRP, prothrombin, and lactate were the most differentiating biochemical markers in the mortality prediction model. Clustering analysis identified high- and low-risk patients among COVID-19 survivors.
CONCLUSIONS: An accurate COVID-19 mortality prediction model for hospitalized patients, based on clinical features and comorbidities, may help improve the management of patients with COVID-19 in the clinical setting. This study demonstrated the use of machine-learning-based approaches to predict hospital mortality in patients with COVID-19, to identify the most important predictors among clinical, comorbidity, and blood biochemical variables, and to recognize high- and low-risk COVID-19 survivors.
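The "moderate predictive power (Q² = 0.24)" reported above is a cross-validated statistic commonly used for PLS-type models such as SIMPLS. A usual definition is Q² = 1 - PRESS / TSS, where PRESS sums the squared errors of predictions made on held-out data and TSS is the total sum of squares around the mean outcome. The abstract does not describe the authors' exact cross-validation scheme, so the Python sketch below assumes this textbook definition and uses hypothetical cross-validated predictions.

```python
# Sketch: Q^2 = 1 - PRESS / TSS from cross-validated predictions (textbook definition; toy values).

def q_squared(y_true, y_pred_cv):
    """y_pred_cv must come from models that did not see the corresponding y_true values."""
    mean_y = sum(y_true) / len(y_true)
    press = sum((y - p) ** 2 for y, p in zip(y_true, y_pred_cv))  # predictive residual sum of squares
    tss = sum((y - mean_y) ** 2 for y in y_true)                  # total sum of squares
    return 1 - press / tss


if __name__ == "__main__":
    # 0 = survivor, 1 = non-survivor; predictions are hypothetical held-out outputs.
    print(q_squared([0, 0, 1, 1], [0.2, 0.4, 0.6, 0.8]))
```

Unlike R² computed on training data, Q² can be low or even negative when a model fits its training data well but predicts new patients poorly. This is one reason a Q² of 0.24 can coexist with an AUC above 0.85: Q² scores the continuous predictions, while AUC scores only their ranking.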


Subject(s)
COVID-19/mortality , Hospital Mortality/trends , Machine Learning/standards , Severity of Illness Index , COVID-19/epidemiology , Cohort Studies , Female , Humans , Male , Prognosis , Respiration, Artificial/statistics & numerical data , Risk Assessment/methods , Risk Factors
7.
Sci Rep ; 11(1): 24224, 2021 12 20.
Article in English | MEDLINE | ID: covidwho-1585790

ABSTRACT

Since 2019, a large number of people worldwide have been infected with severe acute respiratory syndrome coronavirus 2. Of those infected, a limited number develop severe coronavirus disease 2019 (COVID-19), which generally has an acute onset. Treating patients with severe COVID-19 is challenging. To optimize disease prognosis and use medical resources effectively, proactive measures must be adopted for patients at risk of developing severe COVID-19. We analyzed data from COVID-19 patients at seven medical institutions in Tokyo and used mathematical modeling of their blood test results to quantify and compare the ability of multiple prognostic indicators to predict the development of severe COVID-19. A machine learning logistic regression model was used to analyze the blood test results of 300 patients. Because the data set was limited, the size of the training group was adjusted repeatedly to ensure that the machine learning results were valid (e.g., a recognition rate of disease severity > 80%). Lymphocyte count, hemoglobin, and ferritin levels were the best prognostic indicators of severe COVID-19. The mathematical model developed in this study enables the prediction and classification of COVID-19 severity.


Subject(s)
COVID-19/pathology , Models, Theoretical , Adolescent , Adult , Aged , C-Reactive Protein/analysis , COVID-19/virology , Female , Ferritins/analysis , Hemoglobins/analysis , Humans , Lymphocyte Count , Machine Learning , Male , Middle Aged , Prognosis , Retrospective Studies , Risk Factors , SARS-CoV-2/isolation & purification , Severity of Illness Index , Young Adult
8.
Sci Rep ; 11(1): 24439, 2021 12 24.
Article in English | MEDLINE | ID: covidwho-1585782

ABSTRACT

Acute kidney injury (AKI) is frequently associated with COVID-19 and is considered an indicator of disease severity. This study aimed to develop a prognostic score for predicting in-hospital mortality in COVID-19 patients with AKI (AKI-COV score). This was a cross-sectional multicentre prospective cohort study in the Latin America AKI COVID-19 Registry. A total of 870 COVID-19 patients with AKI, defined according to the KDIGO criteria, were included between 1 May 2020 and 31 December 2020. We evaluated four categories of predictor variables available at the time of AKI diagnosis: (1) demographic data; (2) comorbidities and conditions at admission; (3) laboratory exams within 24 h; and (4) characteristics and causes of AKI. We used a machine learning approach to fit models in the training set with tenfold cross-validation and assessed accuracy using the area under the receiver operating characteristic curve (AUC-ROC). The coefficients of the best model (Elastic Net) were used to build the predictive AKI-COV score. The AKI-COV score had an AUC-ROC of 0.823 (95% CI 0.761-0.885) in the validation cohort. The AKI-COV score may help healthcare workers identify hospitalized COVID-19 patients with AKI who may require more intensive monitoring, and it can be used for resource allocation.
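An Elastic Net model, whose coefficients underlie the AKI-COV score, blends lasso and ridge penalties. The block below uses the glmnet parameterization, which is an assumption because the abstract names neither the software nor the exact objective. In this parameterization, ℓ is the per-patient log-likelihood of in-hospital death, λ ≥ 0 sets the overall penalty strength, α ∈ [0, 1] moves from ridge (α = 0) to lasso (α = 1), and both would typically be tuned by the tenfold cross-validation mentioned above. The linear predictor η is the quantity a coefficient-based score is usually derived from. How the authors rounded or rescaled coefficients into points is not stated.

```latex
\begin{aligned}
(\hat\beta_0, \hat\beta) &= \arg\min_{\beta_0,\,\beta}\;
  -\frac{1}{n}\sum_{i=1}^{n} \ell\!\left(y_i,\; \beta_0 + x_i^{\top}\beta\right)
  + \lambda\left[\alpha\lVert\beta\rVert_1 + \frac{1-\alpha}{2}\lVert\beta\rVert_2^2\right],\\[4pt]
\eta_i &= \hat\beta_0 + x_i^{\top}\hat\beta .
\end{aligned}
```

The L1 term can set coefficients exactly to zero, which keeps the score short. The L2 term stabilizes the estimates when predictors such as related laboratory values are correlated.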


Subject(s)
Acute Kidney Injury/complications , COVID-19/pathology , Hospital Mortality , Machine Learning , Aged , Area Under Curve , COVID-19/complications , COVID-19/mortality , COVID-19/virology , Comorbidity , Female , Humans , Male , Middle Aged , Prospective Studies , ROC Curve , Registries , Risk Factors , SARS-CoV-2/isolation & purification
9.
Comput Intell Neurosci ; 2021: 1916690, 2021.
Article in English | MEDLINE | ID: covidwho-1582894

ABSTRACT

Background: From Ebola and Zika to the latest COVID-19 pandemic, outbreaks of highly infectious diseases continue to reveal the severe consequences of social and health inequalities. People with low socioeconomic status, limited education, or low health literacy tend to be particularly affected by the uncertainty, complexity, volatility, and progressiveness of public health crises and emergencies. A key lesson that governments have taken from the ongoing coronavirus pandemic is the importance of developing and disseminating highly accessible, actionable, inclusive, and coherent public health advice, which is a critical tool for helping people with diverse cultural and educational backgrounds and varying abilities to implement health policies effectively at the grassroots level. Objective: We aimed to translate the best practices of accessible, inclusive public health advice on COVID-19 (purposefully designed for people with low socioeconomic and educational backgrounds, low health literacy, limited English proficiency, and cognitive/functional impairments) from health authorities in English-speaking multicultural countries (USA, Australia, and UK) into adaptive tools for evaluating the accessibility of public health advice in other languages. Methods: We developed an optimised Bayesian classifier to produce probabilistic predictions of the accessibility of official health advice among vulnerable people, including migrants and foreigners living in China. We also developed an adaptive statistical formula for the rapid evaluation of the accessibility of health advice among vulnerable people in China. Results: Our study provides much-needed research tools to fill a persistent gap in Chinese public health research on accessible, inclusive communication about the prevention and management of infectious diseases.
For the probabilistic prediction, the optimised Bayesian machine learning classifier (GNB) yielded its largest positive likelihood ratio (LR+), 16.685 (95% confidence interval: 4.35, 64.04), when the probability threshold was set at 0.2 (sensitivity: 0.98; specificity: 0.94). Conclusion: Effective communication of health risks through accessible, inclusive, actionable public advice is a powerful tool for reducing health inequalities amid health crises and emergencies. Our study translated the best-practice public health advice developed during the pandemic into intuitive machine learning classifiers that health authorities can use to develop evidence-based guidelines for accessible health advice. In addition, we developed adaptive statistical tools for frontline health professionals to assess the accessibility of public health advice for people from non-English-speaking backgrounds.
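The positive likelihood ratio reported above is defined as LR+ = sensitivity / (1 - specificity). It expresses how much more likely a positive classifier output is for truly accessible advice than for inaccessible advice. The minimal Python sketch below applies that definition to the rounded values shown in the abstract.

```python
# Sketch: positive likelihood ratio from sensitivity and specificity.

def positive_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """LR+ = sensitivity / (1 - specificity); undefined when specificity is 1."""
    if not 0.0 <= specificity < 1.0:
        raise ValueError("specificity must be in [0, 1)")
    return sensitivity / (1.0 - specificity)


if __name__ == "__main__":
    print(round(positive_likelihood_ratio(0.98, 0.94), 3))
```

The rounded inputs give about 16.33 rather than the reported 16.685. The difference is presumably because the sensitivity and specificity shown in the abstract are rounded to two decimals: LR+ is very sensitive to small changes in specificity when 1 - specificity is as small as 0.06.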


Subject(s)
COVID-19 , Communicable Diseases , Zika Virus Infection , Zika Virus , Bayes Theorem , Communicable Diseases/epidemiology , Humans , Machine Learning , Pandemics , Public Health , SARS-CoV-2
10.
Crit Care ; 25(1): 448, 2021 12 27.
Article in English | MEDLINE | ID: covidwho-1582028

ABSTRACT

INTRODUCTION: Determining the optimal timing for extubation can be challenging in intensive care. In this study, we aimed to identify predictors of extubation failure in critically ill patients with COVID-19. METHODS: We used highly granular data from 3464 adult critically ill COVID-19 patients in the multicenter Dutch Data Warehouse, including demographics, clinical observations, medications, fluid balance, laboratory values, vital signs, and data from life support devices. All intubated patients with at least one extubation attempt were eligible for analysis. Transferred patients, patients admitted for less than 24 h, and patients still admitted at the time of data extraction were excluded. Potential predictors were selected by a team of intensive care physicians. The primary and secondary outcomes were extubation without reintubation or death within the next 7 days and within 48 h, respectively. We trained and validated multiple machine learning algorithms using fivefold nested cross-validation. Predictor importance was estimated using Shapley additive explanations, while cutoff values for the relative probability of failed extubation were estimated through partial dependence plots. RESULTS: A total of 883 patients were included in the model derivation. The reintubation rate was 13.4% within 48 h and 18.9% at day 7, with mortality rates of 0.6% and 1.0%, respectively. The gradient-boosting model performed best (area under the curve of 0.70) and was used to calculate predictor importance. Ventilatory characteristics and settings were the most important predictors. More specifically, a controlled-mode duration longer than 4 days, a last fraction of inspired oxygen higher than 35%, a mean tidal volume per kg ideal body weight above 8 ml/kg on the day before extubation, and a shorter duration in assisted mode (< 2 days), relative to their median values, were associated with extubation failure.
Additionally, a higher C-reactive protein and leukocyte count, a lower thrombocyte count, a lower Glasgow coma scale and a lower body mass index compared to their medians were associated with extubation failure. CONCLUSION: The most important predictors for extubation failure in critically ill COVID-19 patients include ventilatory settings, inflammatory parameters, neurological status, and body mass index. These predictors should therefore be routinely captured in electronic health records.
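Fivefold nested cross-validation, as used above, keeps hyperparameter selection (inner loop) separate from performance estimation (outer loop). As a result, each outer test fold is never seen while a model is being tuned. The Python sketch below shows only this splitting logic. The toy threshold "model", the candidate values, and all function names are hypothetical. The sketch does not reproduce the study's gradient-boosting models or its Shapley analysis.

```python
# Sketch of nested k-fold cross-validation with a hypothetical toy model.
import random
from statistics import mean
from typing import Callable, List, Sequence


def k_fold_indices(n: int, k: int, seed: int) -> List[List[int]]:
    """Shuffle 0..n-1 and deal the indices into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def nested_cv(X: Sequence, y: Sequence[int], candidates: Sequence, fit: Callable,
              score: Callable, k_outer: int = 5, k_inner: int = 5, seed: int = 0) -> List[float]:
    outer_scores = []
    for test_idx in k_fold_indices(len(y), k_outer, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        best_param, best_inner = None, float("-inf")
        # Inner loop: tune using only the outer-training split.
        for param in candidates:
            inner_scores = []
            for val_pos in k_fold_indices(len(train_idx), k_inner, seed + 1):
                val_set = set(val_pos)
                fit_idx = [train_idx[j] for j in range(len(train_idx)) if j not in val_set]
                val_idx = [train_idx[j] for j in val_pos]
                model = fit([X[i] for i in fit_idx], [y[i] for i in fit_idx], param)
                inner_scores.append(score(model, [X[i] for i in val_idx], [y[i] for i in val_idx]))
            avg = mean(inner_scores)
            if avg > best_inner:  # strict '>' keeps the earliest candidate on ties
                best_param, best_inner = param, avg
        # Refit on the whole outer-training split, then evaluate once on the untouched test fold.
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx], best_param)
        outer_scores.append(score(model, [X[i] for i in test_idx], [y[i] for i in test_idx]))
    return outer_scores


def fit_threshold(X_train, y_train, threshold):
    # Hypothetical toy model: its only "parameter" is the decision threshold itself.
    return threshold


def accuracy(threshold, X_eval, y_eval):
    preds = [1 if x > threshold else 0 for x in X_eval]
    return sum(p == t for p, t in zip(preds, y_eval)) / len(y_eval)


if __name__ == "__main__":
    X = list(range(20))
    y = [1 if x >= 10 else 0 for x in X]
    print(nested_cv(X, y, candidates=[9.5, 5, 15], fit=fit_threshold, score=accuracy))
```

The outer scores estimate how the whole tune-then-fit procedure performs on new patients. They do not score any single final model.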


Subject(s)
Airway Extubation , COVID-19 , Treatment Failure , Adult , COVID-19/therapy , Critical Illness , Humans , Machine Learning
11.
PLoS One ; 16(12): e0255757, 2021.
Article in English | MEDLINE | ID: covidwho-1581885

ABSTRACT

As many U.S. states implemented stay-at-home orders beginning in March 2020, anecdotes reported a surge in alcohol sales, raising concerns about increased alcohol use and associated ills. The surveillance report from the National Institute on Alcohol Abuse and Alcoholism provides monthly U.S. alcohol sales data from a subset of states, allowing an investigation of this potential increase in alcohol use. Meanwhile, anonymized human mobility data released by companies such as SafeGraph enables an examination of the visiting behavior of people to various alcohol outlets such as bars and liquor stores. This study examines changes to alcohol sales and alcohol outlet visits during COVID-19 and their geographic differences across states. We find major increases in the sales of spirits and wine since March 2020, while the sales of beer decreased. We also find moderate increases in people's visits to liquor stores, while their visits to bars and pubs substantially decreased. Noticing a significant correlation between alcohol sales and outlet visits, we use machine learning models to examine their relationship and find evidence in some states for likely panic buying of spirits and wine. Large geographic differences exist across states, with both major increases and decreases in alcohol sales and alcohol outlet visits.
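The abstract above reports a significant correlation between alcohol sales and outlet visits but does not say which correlation measure was used. The Python sketch below assumes the Pearson coefficient, which is the covariance of the two series divided by the product of their standard deviations. Any monthly series passed to it would be hypothetical.

```python
# Sketch: Pearson correlation between two equally long series (e.g. monthly sales vs. visits).
from math import sqrt


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```

A high r says only that the two series move together. Distinguishing explanations such as panic buying, as the authors do, requires modeling beyond the coefficient itself.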


Subject(s)
Alcohol Drinking/epidemiology , Alcoholic Beverages/economics , COVID-19/epidemiology , Commerce/statistics & numerical data , Consumer Behavior/statistics & numerical data , Humans , Machine Learning , United States
12.
Sensors (Basel) ; 21(24)2021 Dec 20.
Article in English | MEDLINE | ID: covidwho-1580509

ABSTRACT

The coronavirus disease 2019 (COVID-19) pandemic has affected hundreds of millions of individuals and caused millions of deaths worldwide. Predicting the clinical course of the disease is of pivotal importance for patient management. Several studies have found hematochemical alterations, such as changes in inflammatory markers, in COVID-19 patients. We retrospectively analyzed the anamnestic data and laboratory parameters of 303 patients diagnosed with COVID-19 who were admitted to the Polyclinic Hospital of Bari during the first phase of the COVID-19 global pandemic. After pre-processing, we performed a survival analysis with Kaplan-Meier curves and Cox regression to identify the most unfavorable predictors. The target outcomes were mortality or admission to the intensive care unit (ICU). Different machine learning models were also compared to build a robust classifier that relies on a small number of strongly significant factors to estimate the risk of death or ICU admission. The survival analysis showed that the most significant laboratory parameter for both outcomes was C-reactive protein min: HR=17.963 (95% CI 6.548-49.277, p < 0.001) for death and HR=1.789 (95% CI 1.000-3.200, p = 0.050) for ICU admission. The second most important parameter was Erythrocytes max: HR=1.765 (95% CI 1.141-2.729, p < 0.05) for death and HR=1.481 (95% CI 0.895-2.452, p = 0.127) for ICU admission. The best model for predicting the risk of death was the decision tree, with a ROC-AUC of 89.66%, whereas the best model for predicting ICU admission was the support vector machine, with a ROC-AUC of 95.07%. The hematochemical predictors identified in this study can be used as a strong prognostic signature to characterize disease severity in COVID-19 patients.


Subject(s)
COVID-19 , Hospital Mortality , Humans , Machine Learning , Prognosis , Retrospective Studies , SARS-CoV-2 , Survival Analysis
13.
J Med Internet Res ; 23(2): e23026, 2021 02 22.
Article in English | MEDLINE | ID: covidwho-1575588

ABSTRACT

BACKGROUND: For the clinical care of patients with well-established diseases, randomized trials, literature, and research are supplemented with clinical judgment to understand disease prognosis and inform treatment choices. In the void created by a lack of clinical experience with COVID-19, artificial intelligence (AI) may be an important tool to bolster clinical judgment and decision making. However, a lack of clinical data restricts the design and development of such AI tools, particularly in preparation for an impending crisis or pandemic. OBJECTIVE: This study aimed to develop and test the feasibility of a "patients-like-me" framework to predict the deterioration of patients with COVID-19 using a retrospective cohort of patients with similar respiratory diseases. METHODS: Our framework used COVID-19-like cohorts to design and train AI models that were then validated on the COVID-19 population. The COVID-19-like cohorts included patients diagnosed with bacterial pneumonia, viral pneumonia, unspecified pneumonia, influenza, and acute respiratory distress syndrome (ARDS) at an academic medical center from 2008 to 2019. In total, 15 training cohorts were created using different combinations of the COVID-19-like cohorts with the ARDS cohort for exploratory purposes. In this study, two machine learning models were developed: one to predict invasive mechanical ventilation (IMV) within 48 hours for each hospitalized day, and one to predict all-cause mortality at the time of admission. Model performance was assessed using the area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, positive predictive value, and negative predictive value. We established model interpretability by calculating SHapley Additive exPlanations (SHAP) scores to identify important features. 
RESULTS: Compared with the COVID-19-like cohorts (n=16,509), the patients hospitalized with COVID-19 (n=159) were significantly younger, with a higher proportion of patients of Hispanic ethnicity, a lower proportion of patients with a smoking history, and fewer patients with comorbidities (P<.001). Patients with COVID-19 had a lower IMV rate (15.1% versus 23.2%, P=.02) and a shorter time to IMV (2.9 versus 4.1 days, P<.001) than the COVID-19-like patients. In the COVID-19-like training data, the top models achieved excellent performance (AUROC>0.90). In validation on the COVID-19 cohort, the top-performing model for predicting IMV was the XGBoost model (AUROC=0.826) trained on the viral pneumonia cohort. Similarly, the XGBoost model trained on all 4 COVID-19-like cohorts without ARDS achieved the best performance (AUROC=0.928) in predicting mortality. Important predictors included demographic information (age), vital signs (oxygen saturation), and laboratory values (white blood cell count, cardiac troponin, albumin, etc). Class imbalance in the data resulted in high negative predictive values and low positive predictive values for our models. CONCLUSIONS: We provided a feasible framework for modeling patient deterioration using existing data and AI technology to address data limitations during the onset of a novel, rapidly changing pandemic.
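The link between class imbalance and the high negative and low positive predictive values noted above follows from Bayes' rule. For fixed sensitivity and specificity, a rare outcome yields many false positives relative to true positives. The Python sketch below illustrates this effect. The sensitivity and specificity of 0.8 are hypothetical, and the prevalence uses the COVID-19 cohort's reported IMV rate of 15.1, read as a percentage.

```python
# Sketch: PPV and NPV from sensitivity, specificity and outcome prevalence (Bayes' rule).

def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    tp = sensitivity * prevalence              # expected true-positive fraction
    fn = (1 - sensitivity) * prevalence        # expected false-negative fraction
    tn = specificity * (1 - prevalence)        # expected true-negative fraction
    fp = (1 - specificity) * (1 - prevalence)  # expected false-positive fraction
    return tp / (tp + fp), tn / (tn + fn)      # (PPV, NPV)


if __name__ == "__main__":
    for prev in (0.151, 0.5):  # imbalanced vs. balanced outcome
        ppv, npv = predictive_values(0.8, 0.8, prev)
        print(f"prevalence {prev:.3f}: PPV {ppv:.3f}, NPV {npv:.3f}")
```

With identical sensitivity and specificity, PPV falls from 0.8 at 50% prevalence to roughly 0.42 at 15.1%, while NPV rises to roughly 0.96.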


Subject(s)
COVID-19/diagnosis , COVID-19/mortality , Machine Learning , Pneumonia, Viral/diagnosis , Aged , Area Under Curve , Cohort Studies , Comorbidity , Female , Hospitalization/statistics & numerical data , Humans , Male , Middle Aged , Pandemics , Pneumonia, Viral/mortality , Predictive Value of Tests , Prognosis , ROC Curve , Respiration, Artificial/statistics & numerical data , Retrospective Studies , SARS-CoV-2 , Treatment Outcome
14.
J Med Internet Res ; 23(2): e23458, 2021 02 26.
Article in English | MEDLINE | ID: covidwho-1574596

ABSTRACT

BACKGROUND: During a pandemic, it is important for clinicians to stratify patients and decide who receives limited medical resources. Machine learning models have been proposed to accurately predict COVID-19 disease severity. Previous studies have typically tested only one machine learning algorithm and limited performance evaluation to area under the curve analysis. To obtain the best results possible, it may be important to test different machine learning algorithms to find the best prediction model. OBJECTIVE: In this study, we aimed to use automated machine learning (autoML) to train various machine learning algorithms. We selected the model that best predicted patients' chances of surviving a SARS-CoV-2 infection. In addition, we identified which variables (ie, vital signs, biomarkers, comorbidities, etc) were the most influential in generating an accurate model. METHODS: Data were retrospectively collected from all patients who tested positive for COVID-19 at our institution between March 1 and July 3, 2020. We collected 48 variables from each patient within 36 hours before or after the index time (ie, real-time polymerase chain reaction positivity). Patients were followed for 30 days or until death. Patients' data were used to build 20 machine learning models with various algorithms via autoML. The performance of the machine learning models was measured by analyzing the area under the precision-recall curve (AUPRC). Subsequently, we established model interpretability via Shapley additive explanation and partial dependence plots to identify and rank the variables that drove model predictions. Afterward, we conducted dimensionality reduction to extract the 10 most influential variables. AutoML models were retrained using only these 10 variables, and the output models were evaluated against the model that used 48 variables. RESULTS: Data from 4313 patients were used to develop the models.
The best model generated by autoML with 48 variables was the stacked ensemble model (AUPRC=0.807). The two best independent models were the gradient boosting machine and extreme gradient boosting models, with AUPRCs of 0.803 and 0.793, respectively. The deep learning model (AUPRC=0.73) was substantially inferior to the other models. The 10 most influential variables for generating high-performing models were systolic and diastolic blood pressure, age, pulse oximetry level, blood urea nitrogen level, lactate dehydrogenase level, D-dimer level, troponin level, respiratory rate, and Charlson comorbidity score. After the autoML models were retrained with these 10 variables, the stacked ensemble model still had the best performance (AUPRC=0.791). CONCLUSIONS: We used autoML to develop high-performing models that predicted the survival of patients with COVID-19. In addition, we identified important variables that correlated with mortality. This is a proof of concept that autoML is an efficient, effective, and informative method for generating machine learning-based clinical decision support tools.


Subject(s)
COVID-19/mortality , Machine Learning , COVID-19/virology , Female , Humans , Male , Middle Aged , Models, Statistical , Pandemics , Retrospective Studies , SARS-CoV-2/isolation & purification , Survival Analysis
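The abstract above ranks models by the area under the precision-recall curve (AUPRC), which suits a mortality outcome where deaths are the minority class. The following is a minimal sketch, not the autoML platform's implementation, that estimates AUPRC as average precision: patients are sorted by predicted risk, and the precision at each true death is averaged. The function name and toy data are hypothetical, and tied scores are simply broken by sort order rather than grouped into a single threshold.

```python
def average_precision(labels, scores):
    """Estimate AUPRC as average precision (step-wise, no interpolation).

    labels: 1 for the positive outcome (e.g. death), 0 otherwise.
    scores: predicted risk; higher means death is judged more likely.
    Tied scores are broken by sort order rather than grouped.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("AUPRC is undefined when there are no positive cases")
    # Rank patients from highest to lowest predicted risk.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_pos += 1
            # Recall rises by 1/total_pos here; weight that step by precision at this cut-off.
            precision_sum += true_pos / rank
    return precision_sum / total_pos


if __name__ == "__main__":
    # Hypothetical toy data: two deaths among four patients.
    labels = [1, 0, 1, 0]
    scores = [0.9, 0.8, 0.7, 0.1]
    print(round(average_precision(labels, scores), 4))  # 0.8333
```

A classifier that ranks patients at random has an AUPRC close to the outcome prevalence, so AUPRC values are best read against the cohort's death rate rather than against 0.5 as with AUROC.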
15.
Ann Med ; 53(1): 257-266, 2021 12.
Article in English | MEDLINE | ID: covidwho-1574445

ABSTRACT

OBJECTIVES: To appraise effective predictors of COVID-19 mortality in a retrospective cohort study. METHODS: A total of 1270 COVID-19 patients were included in this study: 984 admitted to the Sino-French New City Branch (randomly split 7:3 into training and internal validation sets) and 286 admitted to the Optical Valley Branch (external validation set) of Wuhan Tongji Hospital. Forty-eight clinical and laboratory features were screened with the LASSO method. A multi-tree extreme gradient boosting (XGBoost) model was then used to rank the importance of the features selected by LASSO, and a simple-tree XGBoost model was subsequently constructed to predict death risk. Model performance was evaluated by AUC, prediction accuracy, precision, and F1 score. RESULTS: Six features, namely disease severity, age, and levels of high-sensitivity C-reactive protein (hs-CRP), lactate dehydrogenase (LDH), ferritin, and interleukin-10 (IL-10), were selected as predictors of COVID-19 mortality. A simple-tree XGBoost model built on these features predicted death risk accurately, with >90% precision, >85% sensitivity, and F1 scores >0.90 in the training and validation sets. CONCLUSION: We propose disease severity, age, and serum levels of hs-CRP, LDH, ferritin, and IL-10 as significant predictors of death risk in COVID-19, which may help to identify high-risk COVID-19 cases. KEY MESSAGES: A machine learning method is used to build a death risk model for COVID-19 patients. Disease severity, age, hs-CRP, LDH, ferritin, and IL-10 are death risk factors. These findings may help to identify high-risk COVID-19 cases.


Subject(s)
COVID-19/mortality , Clinical Decision Rules , Hospitalization , Machine Learning , Adult , Aged , Aged, 80 and over , C-Reactive Protein/metabolism , COVID-19/epidemiology , COVID-19/metabolism , COVID-19/physiopathology , Cardiovascular Diseases/epidemiology , China/epidemiology , Cohort Studies , Comorbidity , Diabetes Mellitus/epidemiology , Female , Ferritins/metabolism , Humans , Hypertension/epidemiology , Interleukin-10/metabolism , L-Lactate Dehydrogenase/metabolism , Male , Middle Aged , Prognosis , Reproducibility of Results , Retrospective Studies , SARS-CoV-2 , Severity of Illness Index
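The study reports precision, sensitivity, and F1 alongside AUC, after randomly splitting the Sino-French New City Branch patients 7:3 into training and internal validation sets. The sketch below shows both steps with the Python standard library only; the function names, the fixed seed, and the rounding of the 70% cut-off are assumptions for illustration, not details taken from the paper.

```python
import random


def split_70_30(records, seed=0):
    """Randomly split records into about 70% training and 30% validation."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * 0.7)
    return shuffled[:cut], shuffled[cut:]


def classification_metrics(y_true, y_pred):
    """Confusion-matrix metrics for binary labels (1 = died, 0 = survived)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # also called recall
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "sensitivity": sensitivity,
            "f1": f1, "accuracy": accuracy}


if __name__ == "__main__":
    train, val = split_70_30(range(984))
    print(len(train), len(val))  # 689 295
    # Hypothetical labels and predictions for five patients.
    print(classification_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))
```

F1 is the harmonic mean of precision and sensitivity, so it stays low unless both are high; that is why it is a useful single summary when deaths are much rarer than survivals.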
16.
J Med Internet Res ; 23(2): e23390, 2021 02 22.
Article in English | MEDLINE | ID: covidwho-1574113

ABSTRACT

BACKGROUND: The initial symptoms of patients with COVID-19 closely resemble those of patients with community-acquired pneumonia (CAP), so it is difficult to distinguish COVID-19 from CAP using clinical symptoms and imaging examination. OBJECTIVE: The objective of our study was to construct an effective model for the early identification of COVID-19 that would also distinguish it from CAP. METHODS: The clinical laboratory indicators (CLIs) of 61 COVID-19 patients and 60 CAP patients were analyzed retrospectively. Random combinations of various CLIs (ie, CLI combinations) were used to establish COVID-19 versus CAP classifiers with machine learning algorithms, including the random forest classifier (RFC), logistic regression classifier, and gradient boosting classifier (GBC). The performance of the classifiers was assessed by calculating the area under the receiver operating characteristic curve (AUROC) and the recall rate for COVID-19 prediction on the test data set. RESULTS: Classifiers constructed with the three algorithms from 43 CLI combinations showed high performance (recall rate >0.9 and AUROC >0.85) in COVID-19 prediction on the test data set. Among the high-performance classifiers, several CLIs showed a high usage rate; these included procalcitonin (PCT), mean corpuscular hemoglobin concentration (MCHC), uric acid, albumin, albumin to globulin ratio (AGR), neutrophil count, red blood cell (RBC) count, monocyte count, basophil count, and white blood cell (WBC) count. All of these except basophil count also had high feature importance. The feature combination (FC) of PCT, AGR, uric acid, WBC count, neutrophil count, basophil count, RBC count, and MCHC was representative of the nine FCs that yielded classifiers with an AUROC of 1.0 when using the RFC or GBC algorithms. Replacing any CLI in these FCs would lead to a significant reduction in the performance of the classifiers built with them. 
CONCLUSIONS: The classifiers constructed with only a few specific CLIs could efficiently distinguish COVID-19 from CAP, which could help clinicians perform early isolation and centralized management of COVID-19 patients.


Subject(s)
COVID-19/diagnosis , Community-Acquired Infections/diagnosis , Machine Learning , Pneumonia/diagnosis , SARS-CoV-2/pathogenicity , Area Under Curve , COVID-19/blood , COVID-19/virology , Community-Acquired Infections/blood , Female , Humans , Laboratories , Leukocyte Count , Logistic Models , Male , Middle Aged , Pneumonia/blood , Procalcitonin/blood , ROC Curve , Retrospective Studies
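The classifiers above were built from random combinations of CLIs and judged by AUROC and recall rate on the test set. As a hedged illustration, the sketch below draws distinct random CLI combinations and computes AUROC with the Mann-Whitney formulation (the probability that a randomly chosen COVID-19 patient is scored higher than a randomly chosen CAP patient, counting ties as one half), plus recall at an assumed 0.5 threshold. The CLI names come from the abstract; the function names, combination size, seed, and threshold are hypothetical.

```python
import math
import random


def random_cli_combinations(clis, size, n_combos, seed=0):
    """Draw n_combos distinct, unordered combinations of `size` CLIs."""
    if n_combos > math.comb(len(clis), size):
        raise ValueError("not enough distinct combinations available")
    rng = random.Random(seed)
    combos = set()
    while len(combos) < n_combos:
        combos.add(frozenset(rng.sample(clis, size)))
    return combos


def auroc(labels, scores):
    """AUROC as the Mann-Whitney probability P(score_pos > score_neg)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs both COVID-19 (1) and CAP (0) cases")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def recall(labels, scores, threshold=0.5):
    """Share of true COVID-19 cases (label 1) scored at or above threshold."""
    pos_scores = [s for s, y in zip(scores, labels) if y == 1]
    return sum(s >= threshold for s in pos_scores) / len(pos_scores)


if __name__ == "__main__":
    clis = ["PCT", "MCHC", "uric acid", "albumin", "AGR", "neutrophil count",
            "RBC count", "monocyte count", "basophil count", "WBC count"]
    for combo in random_cli_combinations(clis, size=8, n_combos=3):
        print(sorted(combo))
    # Hypothetical classifier scores: 1 = COVID-19, 0 = CAP.
    labels = [1, 1, 0, 0]
    scores = [0.9, 0.4, 0.6, 0.1]
    print(auroc(labels, scores), recall(labels, scores))  # 0.75 0.5
```

The pairwise loop is quadratic in the number of patients, which is negligible for a cohort of about 120; larger data sets would normally use a rank-based formula instead.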
17.
J Med Internet Res ; 23(2): e24246, 2021 02 10.
Article in English | MEDLINE | ID: covidwho-1573886

ABSTRACT

BACKGROUND: Predicting early respiratory failure due to COVID-19 can help triage patients to higher levels of care, allocate scarce resources, and reduce morbidity and mortality by appropriately monitoring and treating the patients at greatest risk for deterioration. Given the complexity of COVID-19, machine learning approaches may support clinical decision making for patients with this disease. OBJECTIVE: Our objective is to derive a machine learning model that predicts respiratory failure within 48 hours of admission based on data from the emergency department. METHODS: Data were collected from patients with COVID-19 who were admitted to Northwell Health acute care hospitals and were discharged, died, or spent a minimum of 48 hours in the hospital between March 1 and May 11, 2020. Of 11,525 patients, 933 (8.1%) were placed on invasive mechanical ventilation within 48 hours of admission. Variables used by the models included clinical and laboratory data commonly collected in the emergency department. We trained and validated three predictive models (two based on XGBoost and one that used logistic regression) using cross-hospital validation. We compared the performance of all three models with one another and with an established early warning score (Modified Early Warning Score) using receiver operating characteristic curves, precision-recall curves, and other metrics. RESULTS: The XGBoost model had the highest mean accuracy (0.919; area under the curve=0.77), outperforming the other two models as well as the Modified Early Warning Score. Important predictor variables included the type of oxygen delivery used in the emergency department, patient age, Emergency Severity Index level, respiratory rate, serum lactate, and demographic characteristics. CONCLUSIONS: The XGBoost model had high predictive accuracy, outperforming the other models and the Modified Early Warning Score. 
The clinical plausibility and predictive ability of XGBoost suggest that the model could be used to predict 48-hour respiratory failure in admitted patients with COVID-19.


Subject(s)
COVID-19/physiopathology , Hospitalization , Intubation, Intratracheal/statistics & numerical data , Machine Learning , Respiration, Artificial/statistics & numerical data , Respiratory Insufficiency/epidemiology , Aged , COVID-19/complications , Clinical Decision Rules , Early Warning Score , Emergency Service, Hospital , Female , Hospitals , Humans , Logistic Models , Male , Middle Aged , Patient Admission , ROC Curve , Respiratory Insufficiency/etiology , Retrospective Studies , SARS-CoV-2 , Triage
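The abstract states that the three models were trained and validated "using cross-hospital validation" without detailing the scheme. One common reading, assumed here rather than confirmed by the paper, is leave-one-hospital-out: each hospital's patients serve once as the held-out test set while the model is trained on all other hospitals. The record layout, the `hospital` key, and the outcome field below are hypothetical.

```python
from collections import defaultdict


def leave_one_hospital_out(records, hospital_key="hospital"):
    """Yield (held_out_hospital, train_records, test_records) folds.

    Every patient from the held-out hospital goes to the test set, and no
    patient from that hospital appears in the training set of the same fold.
    """
    by_site = defaultdict(list)
    for record in records:
        by_site[record[hospital_key]].append(record)
    for held_out in sorted(by_site):
        train = [r for site, rs in by_site.items() if site != held_out for r in rs]
        yield held_out, train, by_site[held_out]


if __name__ == "__main__":
    # Hypothetical patients; "ventilated_48h" marks the outcome.
    patients = [
        {"hospital": "A", "ventilated_48h": 0},
        {"hospital": "A", "ventilated_48h": 1},
        {"hospital": "B", "ventilated_48h": 0},
        {"hospital": "C", "ventilated_48h": 0},
    ]
    for site, train, test in leave_one_hospital_out(patients):
        print(site, len(train), len(test))  # A 2 2, then B 3 1, then C 3 1
```

Compared with a random patient-level split, grouping by hospital keeps all patients from one site on the same side of the split, which gives a more conservative estimate of how a model transfers to a hospital it has not seen.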
18.
J Med Internet Res ; 23(2): e20545, 2021 02 19.
Article in English | MEDLINE | ID: covidwho-1573803

ABSTRACT

COVID-19 cases are increasing exponentially worldwide; however, the disease's clinical phenotype remains unclear. Natural language processing (NLP) and machine learning approaches may provide key methods to rapidly identify individuals at high risk of COVID-19 and to understand key symptoms upon clinical manifestation and presentation. Data on such symptoms may not be accurately synthesized into patient records owing to the pressing need to treat patients in overburdened health care settings. In this scenario, clinicians may focus on documenting widely reported symptoms that indicate a confirmed diagnosis of COVID-19, at the expense of infrequently reported symptoms. While NLP solutions can play a key role in generating clinical phenotypes of COVID-19, they are constrained by the resulting gaps in electronic health record (EHR) data. A comprehensive record of clinic visits is required, and audio recordings may be the answer: a recording of a clinic visit captures patient-reported symptoms more completely. If done at scale, a combination of EHR data and clinic visit recordings can be used to power NLP and machine learning models, thus rapidly generating a clinical phenotype of COVID-19. We propose a pipeline that starts from audio or video recordings of clinic visits and builds a model that factors in clinical symptoms to predict COVID-19 incidence. With vast amounts of available data, we believe that a prediction model can be rapidly developed to promote the accurate screening of individuals at high risk of COVID-19 and to identify patient characteristics that predict a greater risk of a more severe infection. If clinical encounters are recorded and our NLP model is adequately refined, benchtop virologic findings would be better informed. While clinic visit recordings are not a panacea for this pandemic, they are a low-cost option with many potential benefits that have only recently begun to be explored.


Subject(s)
Ambulatory Care/standards , COVID-19/genetics , Communications Media/standards , Electronic Health Records/standards , Machine Learning/standards , Natural Language Processing , Humans , Phenotype , SARS-CoV-2
19.
J Med Internet Res ; 23(2): e26302, 2021 02 22.
Article in English | MEDLINE | ID: covidwho-1575865

ABSTRACT

BACKGROUND: The emergence of SARS-CoV-2, the virus that causes COVID-19, has given rise to a global pandemic affecting 215 countries and over 40 million people as of October 2020. Meanwhile, we are also experiencing an infodemic induced by an overabundance of information, some accurate and some inaccurate, spreading rapidly across social media platforms. Social media has arguably made information acquisition and dissemination considerably more interactive for a large population of internet users. OBJECTIVE: This study aimed to investigate COVID-19-related health beliefs on one of the mainstream social media platforms, Twitter, as well as potential factors associated with fluctuations in health beliefs on social media. METHODS: We used COVID-19-related posts on Twitter to monitor health beliefs. A total of 92,687,660 tweets from 8,967,986 unique users, posted from January 6 to June 21, 2020, were retrieved. To quantify health beliefs, we employed the health belief model (HBM) with four core constructs: perceived susceptibility, perceived severity, perceived benefits, and perceived barriers. We used natural language processing and machine learning techniques to automatically judge whether each tweet conformed to each of the four HBM constructs. A total of 5000 tweets were manually annotated to train the machine learning architectures. RESULTS: The machine learning classifiers yielded areas under the receiver operating characteristic curve above 0.86 for all four HBM constructs. Our analyses revealed a basic reproduction number R0 of 7.62 for trends in the number of Twitter users posting health belief-related content over the study period. Fluctuations in the number of health belief-related tweets could reflect dynamics in case and death statistics, systematic interventions, and public events. 
Specifically, a Kruskal-Wallis test showed that scientific events, such as scientific publications, and nonscientific events, such as politicians' speeches, were comparable in their ability to influence health belief trends on social media (P=.78 and P=.92 for perceived benefits and perceived barriers, respectively). CONCLUSIONS: By analogy with the classic epidemiological model, in which an infection is considered to be spreading in a population when R0 is greater than 1, we found that the number of users tweeting about COVID-19 health beliefs was amplifying in an epidemic manner and could partially intensify the infodemic. It is "unhealthy" that scientific and nonscientific events do not differ in their impact on health belief trends on Twitter, since nonscientific events, such as politicians' speeches, might not be supported by substantial evidence and could sometimes be misleading.


Subject(s)
COVID-19/psychology , Data Analysis , Health Education/statistics & numerical data , Machine Learning , Natural Language Processing , Public Opinion , Social Media/statistics & numerical data , COVID-19/epidemiology , Humans , Pandemics
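The abstract compares scientific and nonscientific events with a Kruskal-Wallis test, a rank-based test that does not assume normally distributed data. The sketch below computes the tie-corrected H statistic from scratch and, for the two-group case reported here, converts it to a P value using the chi-square distribution with 1 degree of freedom, whose upper tail equals erfc(sqrt(H/2)). What each observation represents is not stated in the abstract, so the inputs and function names are hypothetical.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks of values, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Tie-corrected Kruskal-Wallis H statistic for two or more groups."""
    pooled = [x for group in groups for x in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_term = 0.0
    start = 0
    for group in groups:
        group_ranks = ranks[start:start + len(group)]
        rank_term += sum(group_ranks) ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    # Standard tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    tie_sum = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - tie_sum / (n ** 3 - n)
    if correction > 0:
        h /= correction
    return max(h, 0.0)  # guard against tiny negative rounding error


def p_value_two_groups(h):
    """Upper tail of chi-square with 1 df: P(X >= h) = erfc(sqrt(h / 2))."""
    return math.erfc(math.sqrt(h / 2))


if __name__ == "__main__":
    scientific = [1, 2, 3]      # hypothetical per-event measurements
    nonscientific = [4, 5, 6]
    h = kruskal_wallis_h(scientific, nonscientific)  # 27/7, about 3.857
    print(round(h, 3), round(p_value_two_groups(h), 4))
```

With more than two groups, the P value would come from a chi-square distribution with (number of groups - 1) degrees of freedom, which the simple erfc shortcut above does not cover.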
20.
Proc Natl Acad Sci U S A ; 118(51)2021 12 21.
Article in English | MEDLINE | ID: covidwho-1569348

ABSTRACT

Simultaneously tracking the global impact of COVID-19 is challenging because of regional variation in resources and reporting. Leveraging self-reported survey outcomes via an existing international social media network has the potential to provide standardized data streams to support monitoring and decision-making worldwide, in real time, and with limited local resources. The University of Maryland Global COVID-19 Trends and Impact Survey (UMD-CTIS), in partnership with Facebook, has invited daily cross-sectional samples of the social media platform's active users to participate in the survey since its launch on April 23, 2020. We analyzed UMD-CTIS survey data through December 20, 2020, from 31,142,582 responses representing 114 countries/territories, weighted for nonresponse and adjusted to basic demographics. We show consistent respondent demographics over time for many countries/territories. Machine learning models trained on national and pooled global data verified known symptom indicators. COVID-like illness (CLI) signals were correlated with government benchmark data. Importantly, the best benchmarked UMD-CTIS signal uses a single survey item whereby respondents report on CLI in their local community. In regions with strained health infrastructure but active social media users, we show it is possible to define COVID-19 impact trajectories using a remote platform independent of local government resources. This syndromic surveillance public health tool is the largest global health survey to date and, with brief participant engagement, can provide meaningful, timely insights into the global COVID-19 pandemic at a local scale.


Subject(s)
COVID-19/epidemiology , Public Health Surveillance/methods , Social Media , COVID-19/diagnosis , COVID-19 Testing , Cross-Sectional Studies , Epidemiologic Methods , Humans , Internationality , Machine Learning , Pandemics/statistics & numerical data
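The survey responses were "weighted for nonresponse and adjusted to basic demographics", but the abstract does not give the procedure. Simple post-stratification is one standard way to do the demographic adjustment and is shown here only as an assumption, not as the UMD-CTIS method: each respondent in a demographic cell receives the weight (population share of the cell) / (sample share of the cell), and a weighted proportion then estimates, for example, the share of respondents reporting CLI in their local community. Cell labels, shares, and function names are hypothetical.

```python
from collections import Counter


def poststratification_weights(sample_cells, population_shares):
    """Weight respondents so each demographic cell matches its population share.

    sample_cells: one cell label per respondent (e.g. an age group).
    population_shares: cell label -> share of the target population (sums to 1).
    """
    n = len(sample_cells)
    cell_counts = Counter(sample_cells)
    return [population_shares[cell] / (cell_counts[cell] / n) for cell in sample_cells]


def weighted_proportion(flags, weights):
    """Weighted share of respondents whose flag is True."""
    return sum(w for flag, w in zip(flags, weights) if flag) / sum(weights)


if __name__ == "__main__":
    # Hypothetical sample that over-represents younger respondents 3:1.
    cells = ["18-44", "18-44", "18-44", "45+"]
    shares = {"18-44": 0.5, "45+": 0.5}
    reports_cli_in_community = [True, False, False, True]
    weights = poststratification_weights(cells, shares)
    # About 0.667 weighted, versus 0.5 unweighted.
    print(weighted_proportion(reports_cli_in_community, weights))
```

Post-stratification only corrects for the demographic variables used to define the cells; any nonresponse bias within a cell is left untouched, which is why surveys typically combine it with a separate nonresponse adjustment.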