Results 1 - 12 of 12
2.
JAMIA Open ; 7(1): ooae015, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38414534

ABSTRACT

Objectives: In the United States, end-stage kidney disease (ESKD) is responsible for high mortality and significant healthcare costs, with the number of cases sharply increasing in the past 2 decades. In this study, we aimed to reduce these impacts by developing an ESKD model for predicting its occurrence in a 2-year period. Materials and Methods: We developed a machine learning (ML) pipeline to test different models for the prediction of ESKD. The electronic health record was used to capture several kidney disease-related variables. Various imputation methods, feature selection, and sampling approaches were tested. We compared the performance of multiple ML models using area under the ROC curve (AUCROC), area under the Precision-Recall curve (PR-AUC), and Brier scores for discrimination, precision, and calibration, respectively. Explainability methods were applied to the final model. Results: Our best model was a gradient-boosting machine with feature selection and imputation methods as additional components. The model exhibited an AUCROC of 0.97, a PR-AUC of 0.33, and a Brier score of 0.002 on a holdout test set. A chart review analysis by expert physicians indicated clinical utility. Discussion and Conclusion: An ESKD prediction model can identify individuals at risk for ESKD and has been successfully deployed within our health system.
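The three metrics named in this abstract can be illustrated with a minimal, standard-library sketch. The toy labels and scores below are invented for illustration and are not the study's data; average precision is used here as one common estimator of PR-AUC, which may differ from the estimator the authors used, and tied scores are not given special handling in it.

```python
# Toy data (hypothetical): 1 = event, 0 = no event, with predicted probabilities.
y_true = [0, 0, 1, 0, 1, 0, 0, 0]
y_prob = [0.1, 0.2, 0.8, 0.3, 0.4, 0.6, 0.05, 0.15]


def brier_score(labels, probs):
    """Calibration: mean squared gap between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


def roc_auc(labels, probs):
    """Discrimination: chance that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, probs):
    """Step-wise area under the precision-recall curve.

    Averages the precision observed at the rank of each true positive.
    """
    ranked = sorted(zip(probs, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / sum(labels)


print(roc_auc(y_true, y_prob), average_precision(y_true, y_prob), brier_score(y_true, y_prob))
```

With rare outcomes, a high AUCROC can coexist with a modest PR-AUC, because precision is sensitive to class prevalence while the ROC curve is not.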

3.
J Biomed Inform ; 149: 104551, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38000765

ABSTRACT

The development and deployment of machine learning (ML) models for biomedical research and healthcare currently lack standard methodologies. Although tools for model replication are numerous, without a unifying blueprint it remains difficult to scientifically reproduce predictive ML models for any number of reasons (e.g., assumptions regarding data distributions and preprocessing, unclear test metrics, etc.) and ultimately, questions around generalizability and transportability are not readily answered. To facilitate scientific reproducibility, we built upon the Predictive Model Markup Language (PMML) to capture essential information. As a key component of the PREdictive Model Index and Exchange REpository (PREMIERE) platform, we present the Automated Metadata Pipeline (AMP) for conversion of a given predictive ML model into an extended PMML file that autocompletes an ML-based checklist, assessing model elements for interoperability and reproducibility. We demonstrate this pipeline on multiple test cases with three different ML algorithms and health-related datasets, providing a foundation for future predictive model reproducibility, sharing, and comparison.


Subject(s)
Biomedical Research , Reproducibility of Results , Algorithms , Records , Metadata
4.
Res Sq ; 2023 Nov 14.
Article in English | MEDLINE | ID: mdl-38014280

ABSTRACT

Continuous renal replacement therapy (CRRT) is a form of dialysis prescribed to severely ill patients who cannot tolerate regular hemodialysis. However, as the patients are typically very ill to begin with, there is always uncertainty as to whether they will survive during or after CRRT treatment. Because of outcome uncertainty, a large percentage of patients treated with CRRT do not survive, utilizing scarce resources and raising false hope in patients and their families. To address these issues, we present a machine-learning-based algorithm to predict if patients will survive after being treated with CRRT. We use information extracted from electronic health records from patients who were placed on CRRT at multiple institutions to train a model that predicts CRRT survival outcome; on a held-out test set, the model achieved an area under the receiver operating characteristic curve of 0.929 (CI=0.917-0.942). Feature importance, error, and subgroup analyses consistently identified mean corpuscular volume as a driving feature for model predictions. Overall, we demonstrate the potential for predictive machine-learning models to assist clinicians in alleviating the uncertainty of CRRT patient survival outcomes, with opportunities for future improvement through further data collection and advanced modeling.

5.
Annu Int Conf IEEE Eng Med Biol Soc ; 2021: 2303-2309, 2021 11.
Article in English | MEDLINE | ID: mdl-34891747

ABSTRACT

The adoption of electronic health records (EHRs) has made patient data increasingly accessible, precipitating the development of various clinical decision support systems and data-driven models to help physicians. However, missing data are common in EHR-derived datasets, which can introduce significant uncertainty, if not invalidate the use of a predictive model. Machine learning (ML)-based imputation methods have shown promise in various domains for the task of estimating values and reducing uncertainty to the point that a predictive model can be employed. We introduce Autopopulus, a novel framework that enables the design and evaluation of various autoencoder architectures for efficient imputation on large datasets. Autopopulus implements existing autoencoder methods as well as a new technique that outputs a range of estimated values (rather than point estimates), and demonstrates a workflow that helps users make an informed decision on an appropriate imputation method. To further illustrate Autopopulus' utility, we use it to identify not only which imputation methods can most accurately impute on a large clinical dataset, but also which imputation methods enable downstream predictive models to achieve the best performance for prediction of chronic kidney disease (CKD) progression.


Subject(s)
Electronic Health Records , Research Design , Datasets as Topic , Disease Progression , Humans , Renal Insufficiency, Chronic/diagnosis , Software , Uncertainty
6.
Front Big Data ; 4: 693869, 2021.
Article in English | MEDLINE | ID: mdl-34604740

ABSTRACT

We present a novel approach for imputing missing data that incorporates temporal information into bipartite graphs through an extension of graph representation learning. Missing data is abundant in several domains, particularly when observations are made over time. Most imputation methods make strong assumptions about the distribution of the data. While novel methods may relax some assumptions, they may not consider temporality. Moreover, when such methods are extended to handle time, they may not generalize without retraining. We propose using a joint bipartite graph approach to incorporate temporal sequence information. Specifically, the observation nodes and edges with temporal information are used in message passing to learn node and edge embeddings and to inform the imputation task. Our proposed method, temporal setting imputation using graph neural networks (TSI-GNN), captures sequence information that can then be used within an aggregation function of a graph neural network. To the best of our knowledge, this is the first effort to use a joint bipartite graph approach that captures sequence information to handle missing data. We use several benchmark datasets to test the performance of our method under a variety of conditions, comparing it to both classic and contemporary methods. We further provide insight into managing the size of the generated TSI-GNN model. Through our analysis, we show that incorporating temporal information into a bipartite graph improves the representation at the 30% and 60% missing rates, specifically when using a nonlinear model for downstream prediction tasks in regularly sampled datasets, and is competitive with existing temporal methods under different scenarios.

7.
Artif Intell Health (2018) ; 11326: 213-227, 2019.
Article in English | MEDLINE | ID: mdl-31363717

ABSTRACT

Cancer screening can benefit from individualized decision-making tools that decrease overdiagnosis. The heterogeneity of cancer screening participants underscores the need for more personalized methods. Partially observable Markov decision processes (POMDPs), when defined with an appropriate reward function, can be used to suggest optimal, individualized screening policies. However, determining an appropriate reward function can be challenging. Here, we propose the use of inverse reinforcement learning (IRL) to form reward functions for lung and breast cancer screening POMDPs. Using experts' (physicians') retrospective screening decisions for lung and breast cancer screening, we developed two POMDP models with corresponding reward functions. Specifically, the maximum entropy (MaxEnt) IRL algorithm with an adaptive step size was employed to learn rewards more efficiently, and was combined with a multiplicative model to learn state-action pair rewards for a POMDP. The POMDP screening models were evaluated based on their ability to recommend appropriate screening decisions before the diagnosis of cancer. The reward functions learned with the MaxEnt IRL algorithm, when combined with POMDP models in lung and breast cancer screening, demonstrate performance comparable to experts. The Cohen's Kappa score of agreement between the POMDPs and physicians' predictions was high in breast cancer and had a decreasing trend in lung cancer.
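The agreement statistic mentioned at the end of this abstract, Cohen's kappa, can be sketched as follows. The decision labels are invented for illustration and do not come from the study; the function assumes two equal-length lists of categorical decisions and is undefined when chance agreement equals 1.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for agreement expected by chance."""
    n = len(rater_a)
    # Observed agreement: fraction of cases where both raters chose the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    return (observed - expected) / (1.0 - expected)


# Hypothetical screening recommendations for six cases.
model_decisions = ["screen", "screen", "wait", "wait", "screen", "wait"]
physician_decisions = ["screen", "wait", "wait", "wait", "screen", "wait"]
print(cohens_kappa(model_decisions, physician_decisions))
```

A kappa of 1 indicates perfect agreement and 0 indicates agreement no better than chance, which is why it is often preferred over raw percent agreement when one decision dominates.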

8.
Nephrol Dial Transplant ; 34(10): 1780-1788, 2019 10 01.
Article in English | MEDLINE | ID: mdl-30844074

ABSTRACT

BACKGROUND: Complement factor H-related protein 5 (CFHR5) nephropathy is an inherited renal disease characterized by microscopic and synpharyngitic macroscopic haematuria, C3 glomerulonephritis and renal failure. It is caused by an internal duplication of exons 2-3 within the CFHR5 gene resulting in dysregulation of the alternative complement pathway. The clinical characteristics and outcomes of transplanted patients with this rare familial nephropathy remain unknown. METHODS: This is a retrospective case series study of 17 kidney transplant patients with the established founder mutation, followed-up over a span of 30 years. RESULTS: The mean (±SD) age of patients at the time of the study and at transplantation was 58.6 ± 9.9 and 46.7 ± 8.8 years, respectively. The 10- and 15-year patient survival rates were 100 and 77.8%, respectively. Proteinuria was present in 33.3% and microscopic haematuria in 58.3% of patients with a functional graft. Serum complement levels were normal in all. 'Confirmed' and 'likely' recurrence of CFHR5 nephropathy were 16.6 and 52.9%, respectively; however, 76.5% of patients had a functional graft after a median of 120 months post-transplantation. Total recurrence was not associated with graft loss (P = 0.171), but was associated with the presence of microscopic haematuria (P = 0.001) and proteinuria (P = 0.018). Graft loss was associated with the presence of proteinuria (P = 0.025). CONCLUSIONS: We describe for the first time the clinical characteristics and outcome of patients with CFHR5 nephropathy post-transplantation. Despite the recurrence of CFHR5 nephropathy, we provide evidence for a long-term favourable outcome and support the continued provision of kidney transplantation as a renal replacement option in patients with CFHR5 nephropathy.


Subject(s)
Complement System Proteins/genetics , Glomerulonephritis/mortality , Kidney Diseases/complications , Kidney Transplantation/mortality , Mutation , Adult , Aged , Female , Glomerulonephritis/etiology , Glomerulonephritis/surgery , Humans , Kidney Diseases/genetics , Male , Middle Aged , Prognosis , Retrospective Studies , Survival Rate
9.
IEEE Access ; 7: 119403-119419, 2019.
Article in English | MEDLINE | ID: mdl-32754420

ABSTRACT

Globally, lung cancer is responsible for nearly one in five cancer deaths. The National Lung Screening Trial (NLST) demonstrated the efficacy of low-dose computed tomography (LDCT) to identify early-stage disease, setting the basis for widespread implementation of lung cancer screening programs. However, the specificity of LDCT lung cancer screening is suboptimal, with a significant false positive rate. Representing this imaging-based screening process as a sequential decision-making problem, we combined multiple machine learning-based methods to learn a partially observable Markov decision process that simultaneously optimizes lung cancer detection while enhancing test specificity. Using NLST data, we trained a dynamic Bayesian network as an observational model and used inverse reinforcement learning to discover a reward function based on experts' decisions. Our resultant predictive model decreased the false positive rate while maintaining a high true positive rate at a level comparable to human experts. Our model also detected a number of lung cancers earlier.

10.
AMIA Annu Symp Proc ; 2018: 1461-1470, 2018.
Article in English | MEDLINE | ID: mdl-30815191

ABSTRACT

Risk prediction models are crucial for assessing the pretest probability of cancer and are applied to stratify patient management strategies. These models are frequently based on multivariate regression analysis, requiring that all risk factors be specified, and do not convey the confidence in their predictions. We present a framework for uncertainty analysis that accounts for variability in input values. Uncertain or missing values are replaced with a range of plausible values. These ranges are used to compute individualized risk confidence intervals. We demonstrate our approach using the Gail model to evaluate the impact of uncertainty on management decisions. Up to 13% of cases (uncertain) had a risk interval that falls within the decision threshold (e.g., 1.67% 5-year absolute risk). A small number of cases changed from low- to high-risk when missing values were present. Our analysis underscores the need for better communication of input assumptions that influence the resulting predictions.
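The idea of replacing a missing input with a range of plausible values and reporting a risk interval can be sketched as below. This is a minimal illustration under loud assumptions: the logistic model, its coefficients, and the input names are hypothetical and are NOT the Gail model; only the 1.67% threshold is taken from the abstract. Evaluating only the endpoints of each range is valid here because a logistic model with a linear predictor is monotone in each input.

```python
import math
from itertools import product

# Hypothetical coefficients for illustration only; this is NOT the Gail model.
INTERCEPT = -5.0
COEFS = {"age_decades": 0.15, "relatives": 0.5, "biopsies": 0.3}


def risk(inputs):
    """Risk from a toy logistic model over fully specified inputs."""
    z = INTERCEPT + sum(COEFS[name] * value for name, value in inputs.items())
    return 1.0 / (1.0 + math.exp(-z))


def risk_interval(inputs, plausible):
    """Return (low, high) risk when inputs marked None take a plausible range.

    inputs: name -> value, or None if missing or uncertain.
    plausible: name -> (low, high) range used for each missing input.
    """
    names = list(COEFS)
    choices = [
        [inputs[name]] if inputs[name] is not None else list(plausible[name])
        for name in names
    ]
    risks = [risk(dict(zip(names, combo))) for combo in product(*choices)]
    return min(risks), max(risks)


def is_uncertain(interval, threshold=0.0167):
    """True when the decision threshold falls inside the risk interval."""
    low, high = interval
    return low < threshold <= high


# A case with an unknown number of affected relatives (assumed to lie between 0 and 2).
case = {"age_decades": 5.0, "relatives": None, "biopsies": 0}
interval = risk_interval(case, {"relatives": (0, 2)})
print(interval, is_uncertain(interval))
```

A case is flagged as uncertain when its interval straddles the threshold, since the management decision would then depend on the unknown input.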


Subject(s)
Algorithms , Breast Neoplasms/diagnosis , Models, Theoretical , Risk Assessment/methods , Uncertainty , Decision Making , Early Detection of Cancer , Female , Humans , Risk Factors
11.
Comput Biol Med ; 81: 111-120, 2017 02 01.
Article in English | MEDLINE | ID: mdl-28038345

ABSTRACT

A growing number of individuals who are considered at high risk of cancer are now routinely undergoing population screening. However, noted harms such as radiation exposure, overdiagnosis, and overtreatment underscore the need for better temporal models that predict who should be screened and at what frequency. The mean sojourn time (MST), the average duration of the period when a tumor can be detected by imaging but causes no observable clinical symptoms, is a critical variable for formulating screening policy. Estimation of MST has long been studied using a continuous Markov model (CMM) with maximum likelihood estimation (MLE). However, many traditional methods assume no observation error in the imaging data, which is unlikely and can bias the estimation of the MST. In addition, the MLE may not be stably estimated when data are sparse. Addressing these shortcomings, we present a probabilistic modeling approach for periodic cancer screening data. We first model the cancer state transition using a three-state CMM, while simultaneously considering observation error. We then jointly estimate the MST and observation error within a Bayesian framework. We also consider the inclusion of covariates to estimate individualized rates of disease progression. Our approach is demonstrated on participants who underwent chest x-ray screening in the National Lung Screening Trial (NLST) and validated using posterior predictive p-values and Pearson's chi-square test. Our model demonstrates more accurate and sensible estimates of MST in comparison to MLE.
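The relationship between a three-state continuous-time Markov model and the MST can be sketched as follows. This is an illustration under stated assumptions, not the authors' model: it assumes a strictly progressive chain (disease-free, then preclinical, then clinical) with constant and unequal rates, and it treats observation error as a fixed screening sensitivity and specificity. All function and parameter names are hypothetical.

```python
import math


def mean_sojourn_time(rate_progress):
    """Mean time spent in the preclinical state.

    With a constant exit rate the sojourn time is exponential, so its mean is 1/rate.
    """
    return 1.0 / rate_progress


def transition_probs(rate_onset, rate_progress, t):
    """State probabilities after time t for someone disease-free at time 0.

    Returns (P disease-free, P preclinical, P clinical). Assumes the rates differ.
    """
    p_free = math.exp(-rate_onset * t)
    p_pre = (rate_onset / (rate_progress - rate_onset)) * (
        math.exp(-rate_onset * t) - math.exp(-rate_progress * t)
    )
    return p_free, p_pre, 1.0 - p_free - p_pre


def prob_asymptomatic_positive(rate_onset, rate_progress, t, sensitivity, specificity):
    """Probability of being asymptomatic at time t AND screening positive.

    True positives come from the preclinical state, false positives from disease-free.
    """
    p_free, p_pre, _ = transition_probs(rate_onset, rate_progress, t)
    return sensitivity * p_pre + (1.0 - specificity) * p_free


# Hypothetical rates per year, for illustration only.
print(mean_sojourn_time(0.5))
print(transition_probs(0.01, 0.5, 3.0))
```

Ignoring observation error amounts to setting sensitivity and specificity to 1, which attributes every missed or spurious finding to the disease process itself; this is the source of bias the abstract describes.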


Subject(s)
Bayes Theorem , Disease Progression , Early Detection of Cancer/methods , Lung Neoplasms/diagnostic imaging , Models, Statistical , Radiographic Image Interpretation, Computer-Assisted/methods , Severity of Illness Index , Aged , Algorithms , Computer Simulation , Female , Humans , Male , Markov Chains , Middle Aged , Reproducibility of Results , Sensitivity and Specificity
12.
Artif Intell Med ; 72: 42-55, 2016 09.
Article in English | MEDLINE | ID: mdl-27664507

ABSTRACT

INTRODUCTION: Identifying high-risk lung cancer individuals at an early disease stage is the most effective way of improving survival. The landmark National Lung Screening Trial (NLST) demonstrated the utility of low-dose computed tomography (LDCT) imaging to reduce mortality (relative to X-ray screening). As a result of the NLST and other studies, imaging-based lung cancer screening programs are now being implemented. However, LDCT interpretation results in a high number of false positives. A set of dynamic Bayesian networks (DBNs) was designed and evaluated to provide insight into how longitudinal data can be used to help inform lung cancer screening decisions. METHODS: The LDCT arm of the NLST dataset was used to build and explore five DBNs for high-risk individuals. Three of these DBNs were built using a backward construction process, and two using structure learning methods. All models employ demographics, smoking status, cancer history, family lung cancer history, exposure risk factors, comorbidities related to lung cancer, and LDCT screening outcome information. Given the uncertainty arising from lung cancer screening, a cancer state-space model based on lung cancer staging was utilized to characterize the cancer status of an individual over time. The models were evaluated on balanced training and test sets of cancer and non-cancer cases to deal with data imbalance and overfitting. RESULTS: Results were comparable to expert decisions. The average area under the curve (AUC) of the receiver operating characteristic (ROC) for the three intervention points of the NLST was higher than 0.75 for all models. Evaluation of the models on the complete LDCT arm of the NLST dataset (N=25,486) demonstrated satisfactory generalization. Consensus of predictions over similar cases is reported in concordance statistics between the models' and the physicians' predictions. The models' predictive ability with respect to missing data was also evaluated with the sample of cases that missed the second screening exam of the trial (N=417). The DBNs outperformed comparison models such as logistic regression and naïve Bayes. CONCLUSION: The lung cancer screening DBNs demonstrated high discrimination and predictive power with the majority of cancer and non-cancer cases.


Subject(s)
Bayes Theorem , Early Detection of Cancer , Lung Neoplasms/diagnosis , Tomography, X-Ray Computed , Clinical Trials as Topic , Humans , Incidence , Lung Neoplasms/epidemiology , Mass Screening , Models, Theoretical , Neoplasm Staging , ROC Curve , Smoking