Results 1 - 4 of 4
EuropePMC; 2021.
Preprint in English | EuropePMC | ID: ppcovidwho-294818


ABSTRACT OBJECTIVE: Before new machine learning (ML) algorithms are integrated into clinical practice, they must undergo validation. Validation studies require sample size estimates. Unlike hypothesis testing studies seeking a p-value, the goal of validating predictive models is obtaining estimates of model performance. Our aim was to provide a standardized, data distribution- and model-agnostic approach to sample size calculations for validation studies of predictive ML models. MATERIALS AND METHODS: Sample Size Analysis for Machine Learning (SSAML) was tested in three previously published models: brain age to predict mortality (Cox Proportional Hazard), COVID hospitalization risk prediction (ordinal regression), and seizure risk forecasting (deep learning). The SSAML steps are: 1) Specify performance metrics for model discrimination and calibration. For discrimination, we use area under the receiver operating characteristic curve (AUC) for classification and Harrell’s C-statistic for survival models. For calibration, we employ calibration slope and calibration-in-the-large. 2) Specify the required precision and accuracy (≤0.5 normalized confidence interval width and ±5% accuracy). 3) Specify the required coverage probability (95%). 4) For increasing sample sizes, calculate the expected precision and bias that are achievable. 5) Choose the minimum sample size that meets all requirements. RESULTS: Minimum sample sizes were obtained in each dataset using standardized criteria. DISCUSSION: SSAML provides a formal expectation of precision and accuracy at a desired confidence level. CONCLUSION: SSAML is open-source and agnostic to data type and ML model. It can be used for clinical validation studies of ML models.
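
The five steps above can be illustrated with a minimal sketch. It is not the SSAML implementation: it covers only the AUC precision criterion (no bias or calibration check), it assumes the "normalized" interval width means the bootstrap 95% interval width divided by the median bootstrap AUC, and all function names and the simulated data are hypothetical.

```python
import random


def auc(scores, labels):
    """Rank-based (Mann-Whitney) AUC; tied scores share an average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        return None  # AUC is undefined when one class is missing
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def simulate_pool(n, rng):
    """Hypothetical toy data: (score, label) pairs, not taken from any study."""
    pool = []
    for _ in range(n):
        label = 1 if rng.random() < 0.5 else 0
        pool.append((rng.gauss(1.0 if label else 0.0, 1.0), label))
    return pool


def normalized_ci_width(pool, n, n_boot, rng):
    """Bootstrap 95% interval width of the AUC at size n, divided by the median AUC."""
    values = []
    for _ in range(n_boot):
        sample = rng.choices(pool, k=n)  # resample n cases with replacement
        value = auc([s for s, _ in sample], [y for _, y in sample])
        if value is not None:
            values.append(value)
    values.sort()
    low = values[int(0.025 * (len(values) - 1))]
    high = values[int(0.975 * (len(values) - 1))]
    return (high - low) / values[len(values) // 2]


def minimum_sample_size(pool, sizes, max_width, n_boot=200, seed=0):
    """Smallest candidate size whose normalized interval width is <= max_width."""
    rng = random.Random(seed)
    for n in sorted(sizes):
        if normalized_ci_width(pool, n, n_boot, rng) <= max_width:
            return n
    return None  # no candidate size met the precision requirement
```

A call such as `minimum_sample_size(simulate_pool(4000, random.Random(0)), [50, 200, 1000], max_width=0.12)` walks the candidate sizes in increasing order and stops at the first one that meets the precision target.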

Front Neurol ; 12: 642912, 2021.
Article in English | MEDLINE | ID: covidwho-1202073


Objectives: Patients with comorbidities are at increased risk for poor outcomes in COVID-19, yet data on patients with prior neurological disease remain limited. Our objective was to determine the odds of critical illness and duration of mechanical ventilation in patients with prior cerebrovascular disease and COVID-19. Methods: An observational study of 1,128 consecutive adult patients admitted to an academic center in Boston, Massachusetts, and diagnosed with laboratory-confirmed COVID-19. We tested the association between prior cerebrovascular disease and critical illness, defined as mechanical ventilation (MV) or death by day 28, using logistic regression with inverse probability weighting of the propensity score. Among intubated patients, we estimated the cumulative incidence of successful extubation without death over 45 days using competing risk analysis. Results: Of the 1,128 adults with COVID-19, 350 (36%) were critically ill by day 28. The median age of patients was 59 years (SD: 18 years) and 640 (57%) were men. As of June 2nd, 2020, 127 (11%) patients had died. A total of 177 patients (16%) had a prior cerebrovascular disease. Prior cerebrovascular disease was significantly associated with critical illness (OR = 1.54, 95% CI = 1.14-2.07), lower rate of successful extubation (cause-specific HR = 0.57, 95% CI = 0.33-0.98), and increased duration of intubation (restricted mean time difference = 4.02 days, 95% CI = 0.34-10.92) compared to patients without cerebrovascular disease. Interpretation: Prior cerebrovascular disease adversely affects COVID-19 outcomes in hospitalized patients. Further study is required to determine if this subpopulation requires closer monitoring for disease progression during COVID-19.
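
As a rough illustration of inverse probability weighting, the sketch below computes a weighted odds ratio from already-estimated propensity scores. It is not the study's analysis: the propensity model itself is omitted, the function name and record layout are hypothetical, and the example relies on the fact that a weighted logistic regression with exposure as its only covariate reproduces the weighted 2x2 odds ratio.

```python
def ipw_odds_ratio(records):
    """Weighted odds ratio of the outcome for exposed vs. unexposed patients.

    records: iterable of (exposed, outcome, propensity) with exposed and
    outcome coded 0/1 and propensity = estimated P(exposed | covariates).
    """
    weighted = {(e, o): 0.0 for e in (0, 1) for o in (0, 1)}
    for exposed, outcome, propensity in records:
        # Exposed patients get weight 1/p, unexposed patients 1/(1 - p).
        weight = 1.0 / propensity if exposed else 1.0 / (1.0 - propensity)
        weighted[(exposed, outcome)] += weight
    # Raises ZeroDivisionError if a weighted cell is empty.
    odds_exposed = weighted[(1, 1)] / weighted[(1, 0)]
    odds_unexposed = weighted[(0, 1)] / weighted[(0, 0)]
    return odds_exposed / odds_unexposed
```

When every patient has the same propensity score, all weights within an exposure group are equal and the result reduces to the crude odds ratio, which makes a convenient sanity check.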

J Infect Dis ; 223(1): 38-46, 2021 01 04.
Article in English | MEDLINE | ID: covidwho-1066343


BACKGROUND: We sought to develop an automatable score to predict hospitalization, critical illness, or death for patients at risk for coronavirus disease 2019 (COVID-19) presenting for urgent care. METHODS: We developed the COVID-19 Acuity Score (CoVA) based on a single-center study of adult outpatients seen in respiratory illness clinics or the emergency department. Data were extracted from the Partners Enterprise Data Warehouse, and split into development (n = 9381, 7 March-2 May) and prospective (n = 2205, 3-14 May) cohorts. Outcomes were hospitalization, critical illness (intensive care unit or ventilation), or death within 7 days. Calibration was assessed using the expected-to-observed event ratio (E/O). Discrimination was assessed by area under the receiver operating characteristic curve (AUC). RESULTS: In the prospective cohort, 26.1%, 6.3%, and 0.5% of patients experienced hospitalization, critical illness, or death, respectively. CoVA showed excellent performance in prospective validation for hospitalization (expected-to-observed ratio [E/O]: 1.01; AUC: 0.76), for critical illness (E/O: 1.03; AUC: 0.79), and for death (E/O: 1.63; AUC: 0.93). Among 30 predictors, the top 5 were age, diastolic blood pressure, blood oxygen saturation, COVID-19 testing status, and respiratory rate. CONCLUSIONS: CoVA is a prospectively validated automatable score for the outpatient setting to predict adverse events related to COVID-19 infection.
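
The expected-to-observed ratio used for calibration can be sketched as below, assuming it is computed as the sum of predicted probabilities divided by the number of observed events. The function name and inputs are hypothetical and this is not the CoVA code.

```python
def expected_to_observed(predicted_probs, outcomes):
    """E/O ratio: expected events (sum of predicted risks) over observed events."""
    expected = sum(predicted_probs)
    observed = sum(outcomes)  # outcomes are coded 0/1
    if observed == 0:
        raise ValueError("E/O is undefined when no events were observed")
    return expected / observed
```

Under this definition a value near 1 indicates good calibration-in-the-large, and a value above 1, such as the 1.63 reported for death, means more events were predicted than observed.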

COVID-19/diagnosis; Severity of Illness Index; Adult; Aged; Critical Illness; Female; Hospitalization; Humans; Intensive Care Units; Male; Middle Aged; Models, Theoretical; Outpatients; Predictive Value of Tests; Prognosis; Prospective Studies; ROC Curve; Sensitivity and Specificity

JMIR Med Inform ; 9(2): e25457, 2021 Feb 10.
Article in English | MEDLINE | ID: covidwho-1032549


BACKGROUND: Medical notes are a rich source of patient data; however, the nature of unstructured text has largely precluded the use of these data for large retrospective analyses. Transforming clinical text into structured data can enable large-scale research studies with electronic health records (EHR) data. Natural language processing (NLP) can be used for text information retrieval, reducing the need for labor-intensive chart review. Here we present an application of NLP to large-scale analysis of medical records at 2 large hospitals for patients hospitalized with COVID-19. OBJECTIVE: Our study goal was to develop an NLP pipeline to classify the discharge disposition (home, inpatient rehabilitation, skilled nursing inpatient facility [SNIF], and death) of patients hospitalized with COVID-19 based on hospital discharge summary notes. METHODS: Text mining and feature engineering were applied to unstructured text from hospital discharge summaries. The study included patients with COVID-19 discharged from 2 hospitals in the Boston, Massachusetts area (Massachusetts General Hospital and Brigham and Women's Hospital) between March 10, 2020, and June 30, 2020. The data were divided into a training set (70%) and hold-out test set (30%). Discharge summaries were represented as bags-of-words consisting of single words (unigrams), bigrams, and trigrams. The number of features was reduced during training by excluding n-grams that occurred in fewer than 10% of discharge summaries, and further reduced using least absolute shrinkage and selection operator (LASSO) regularization while training a multiclass logistic regression model. Model performance was evaluated using the hold-out test set. RESULTS: The study cohort included 1737 adult patients (median age 61 [SD 18] years; 55% men; 45% White and 16% Black; 14% nonsurvivors and 61% discharged home). 
The model selected 179 features from a vocabulary of 1056 engineered features, consisting of combinations of unigrams, bigrams, and trigrams. The top features contributing most to the classification by the model (for each outcome) were the following: "appointments specialty," "home health," and "home care" (home); "intubate" and "ARDS" (inpatient rehabilitation); "service" (SNIF); "brief assessment" and "covid" (death). The model achieved a micro-average area under the receiver operating characteristic curve value of 0.98 (95% CI 0.97-0.98) and average precision of 0.81 (95% CI 0.75-0.84) in the testing set for prediction of discharge disposition. CONCLUSIONS: A supervised learning-based NLP approach is able to classify the discharge disposition of patients hospitalized with COVID-19. This approach has the potential to accelerate and increase the scale of research on patients' discharge disposition that is possible with EHR data.
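
The bag-of-n-grams representation and the document-frequency cutoff described in the methods can be sketched as follows. This is an illustration only: tokenization is plain lowercased whitespace splitting, the function names are hypothetical, and the LASSO-regularized multiclass logistic regression step is omitted.

```python
from collections import Counter


def ngrams(tokens, max_n=3):
    """All unigrams, bigrams, and trigrams of a token list, as space-joined strings."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


def build_vocabulary(documents, min_doc_fraction=0.10):
    """Keep n-grams that occur in at least min_doc_fraction of the documents."""
    doc_freq = Counter()
    for text in documents:
        # A set ensures each document counts an n-gram at most once.
        doc_freq.update(set(ngrams(text.lower().split())))
    min_docs = min_doc_fraction * len(documents)
    return sorted(g for g, count in doc_freq.items() if count >= min_docs)


def vectorize(text, vocabulary):
    """Count vector of one document over the retained vocabulary."""
    counts = Counter(ngrams(text.lower().split()))
    return [counts[g] for g in vocabulary]
```

The resulting count vectors would then serve as inputs to a regularized classifier, with the vocabulary built on the training set only so that the hold-out set does not influence feature selection.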