1.
Comput Biol Med ; 170: 108014, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38301515

ABSTRACT

BACKGROUND: Across medicine, prognostic models are used to estimate patient risk of certain future health outcomes (e.g., cardiovascular or mortality risk). To develop (or train) prognostic models, historic patient-level training data is needed containing both the predictive factors (i.e., features) and the relevant health outcomes (i.e., labels). Sometimes, when the health outcomes are not recorded in structured data, these are first extracted from textual notes using text mining techniques. Because there exist many studies utilizing text mining to obtain outcome data for prognostic model development, our aim is to study the impact of the text mining quality on downstream prognostic model performance. METHODS: We conducted a simulation study charting the relationship between text mining quality and prognostic model performance using an illustrative case study about in-hospital mortality prediction in intensive care unit patients. We repeatedly developed and evaluated a prognostic model for in-hospital mortality, using outcome data extracted by multiple text mining models of varying quality. RESULTS: Interestingly, we found in our case study that a relatively low-quality text mining model (F1 score ≈ 0.50) could already be used to train a prognostic model with quite good discrimination (area under the receiver operating characteristic curve of around 0.80). The calibration of the risks estimated by the prognostic model seemed unreliable across the majority of settings, even when text mining models were of relatively high quality (F1 ≈ 0.80). DISCUSSION: Developing prognostic models on text-extracted outcomes using imperfect text mining models seems promising. However, it is likely that prognostic models developed using this approach may not produce well-calibrated risk estimates, and require recalibration in (possibly a smaller amount of) manually extracted outcome data.
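
The two performance measures named in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names and the toy data are assumptions made here. The F1 score compares text-mined outcome labels with manually extracted ones, and the AUC is computed as the share of (event, non-event) pairs in which the event received the higher predicted risk.

```python
def f1_score(y_true, y_pred):
    """F1 of binary labels (e.g. text-mined vs. manually extracted outcomes)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores):
    """AUC as the fraction of (event, non-event) pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, for illustration only.
manual_labels = [1, 1, 0, 0, 0, 1, 0, 0]   # manually extracted outcomes
mined_labels = [1, 0, 0, 1, 0, 1, 0, 0]    # outcomes from a text mining model
risks = [0.9, 0.45, 0.2, 0.4, 0.1, 0.7, 0.3, 0.5]  # prognostic model output

text_mining_f1 = f1_score(manual_labels, mined_labels)
model_auc = auc(manual_labels, risks)
```

Neither measure says anything about calibration, which is consistent with the abstract's point that discrimination can look good while the estimated risks remain unreliable.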


Subjects
Critical Care, Data Mining, Humans, Prognosis, Computer Simulation, Outcome Assessment, Health Care
2.
J Clin Epidemiol ; 167: 111258, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38219811

ABSTRACT

OBJECTIVES: Natural language processing (NLP) of clinical notes in electronic medical records is increasingly used to extract otherwise sparsely available patient characteristics, to assess their association with relevant health outcomes. Manual data curation is resource intensive and NLP methods make these studies more feasible. However, the methodology of using NLP methods reliably in clinical research is understudied. The objective of this study is to investigate how NLP models could be used to extract study variables (specifically exposures) to reliably conduct exposure-outcome association studies. STUDY DESIGN AND SETTING: In a convenience sample of patients admitted to the intensive care unit of a US academic health system, multiple association studies were conducted, comparing the association estimates based on NLP-extracted vs. manually extracted exposure variables. The association studies varied in NLP model architecture (Bidirectional Encoder Representations from Transformers, Long Short-Term Memory), training paradigm (training a new model, fine-tuning an existing external model), extracted exposures (employment status, living status, and substance use), health outcomes (having a do-not-resuscitate/intubate code, length of stay, and in-hospital mortality), missing data handling (multiple imputation vs. complete case analysis), and the application of measurement error correction (via regression calibration). RESULTS: The study was conducted on 1,174 participants (median [interquartile range] age, 61 [50, 73] years; 60.6% male). Additionally, up to 500 discharge reports of participants from the same health system and 2,528 reports of participants from an external health system were used to train the NLP models. Substantial differences were found between the associations based on NLP-extracted and manually extracted exposures under all settings. The error in association was only weakly correlated with the overall F1 score of the NLP models.
CONCLUSION: Associations estimated using NLP-extracted exposures should be interpreted with caution. Further research is needed to set conditions for reliable use of NLP in medical association studies.


Subjects
Intensive Care Units, Natural Language Processing, Humans, Male, Middle Aged, Female, Electronic Health Records
3.
PLoS One ; 18(12): e0294557, 2023.
Article in English | MEDLINE | ID: mdl-38091283

ABSTRACT

BACKGROUND: General practitioners (GPs) often assess patients with acute infections. It is challenging for GPs to recognize patients needing immediate hospital referral for sepsis while avoiding unnecessary referrals. This study aimed to predict adverse sepsis-related outcomes from telephone triage information of patients presenting to out-of-hours GP cooperatives. METHODS: A retrospective cohort study using linked routine care databases from out-of-hours GP cooperatives, general practices, hospitals and mortality registration. We included adult patients with complaints possibly related to an acute infection, who were assessed (clinic consultation or home visit) by a GP from a GP cooperative between 2017 and 2019. We used telephone triage information to derive a risk prediction model for sepsis-related adverse outcome (infection-related ICU admission within seven days or infection-related death within 30 days) using logistic regression, random forest, and neural network machine learning techniques. Data from 2017 and 2018 were used for derivation and from 2019 for validation. RESULTS: We included 155,486 patients (median age of 51 years; 59% female) in the analyses. The strongest predictors for sepsis-related adverse outcome were age, type of contact (home visit or clinic consultation), patients considered ABCD unstable during triage, and the entry complaints "general malaise", "shortness of breath" and "fever". The multivariable logistic regression model resulted in a C-statistic of 0.89 (95% CI 0.88-0.90) with good calibration. Machine learning models performed similarly to the logistic regression model. A "sepsis alert" based on a predicted probability >1% resulted in a sensitivity of 82% and a positive predictive value of 4.5%. However, most events occurred in patients receiving home visits, and model performance was substantially worse in this subgroup (C-statistic 0.70).
CONCLUSION: Several patient characteristics identified during telephone triage of patients presenting to out-of-hours GP cooperatives were associated with sepsis-related adverse outcomes. Still, on a patient level, predictions were not sufficiently accurate for clinical purposes.
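
The sensitivity and positive predictive value of a threshold-based alert, as reported in this abstract, can be computed as sketched below. The function name and the toy data are assumptions made for illustration. Only the 1% threshold comes from the abstract.

```python
def alert_metrics(outcomes, predicted_probs, threshold=0.01):
    """Sensitivity and positive predictive value of an alert that fires
    when the predicted probability exceeds `threshold`."""
    tp = fp = fn = 0
    for y, p in zip(outcomes, predicted_probs):
        alert = p > threshold
        if alert and y == 1:
            tp += 1
        elif alert and y == 0:
            fp += 1
        elif not alert and y == 1:
            fn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, ppv


# Hypothetical toy data: 2 adverse outcomes among 10 patients.
outcomes = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
probs = [0.20, 0.02, 0.005, 0.03, 0.004, 0.001, 0.015, 0.002, 0.008, 0.05]

sensitivity, ppv = alert_metrics(outcomes, probs, threshold=0.01)
```

With a rare outcome, a low threshold tends to catch most events while still producing many false alerts, which is the pattern behind the reported combination of 82% sensitivity and 4.5% positive predictive value.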


Subjects
After-Hours Care, Infections, Sepsis, Adult, Female, Humans, Middle Aged, Male, Cohort Studies, Retrospective Studies, Triage/methods, Sepsis/diagnosis, Telephone, Intensive Care Units
4.
Eur J Cardiothorac Surg ; 64(3), 2023 Sep 07.
Article in English | MEDLINE | ID: mdl-37672025

ABSTRACT

OBJECTIVES: The aim of this study was to investigate the performance of the EuroSCORE II over time and dynamics in values of predictors included in the model. METHODS: A cohort study was performed using data from the Netherlands Heart Registration. All cardiothoracic surgical procedures performed between 1 January 2013 and 31 December 2019 were included for analysis. Performance of the EuroSCORE II was assessed across 3-month intervals in terms of calibration and discrimination. For subgroups of major surgical procedures, performance of the EuroSCORE II was assessed across 12-month time intervals. Changes in values of individual EuroSCORE II predictors over time were assessed graphically. RESULTS: A total of 103 404 cardiothoracic surgical procedures were included. Observed mortality risk ranged between 1.9% [95% confidence interval (CI) 1.6-2.4] and 3.6% (95% CI 2.6-4.4) across 3-month intervals, while the mean predicted mortality risk ranged between 3.4% (95% CI 3.3-3.6) and 4.2% (95% CI 3.9-4.6). The corresponding observed:expected ratios ranged from 0.50 (95% CI 0.46-0.61) to 0.95 (95% CI 0.74-1.16). Discriminative performance in terms of the c-statistic ranged between 0.82 (95% CI 0.78-0.89) and 0.89 (95% CI 0.87-0.93). The EuroSCORE II consistently overestimated mortality compared to observed mortality. This finding was consistent across all major cardiothoracic surgical procedures. Distributions of values of individual predictors varied broadly across predictors over time. Most notable trends were a decrease in elective surgery from 75% to 54% and a rise in patients with no or New York Heart Association I class heart failure from 27% to 33%. CONCLUSIONS: The EuroSCORE II shows good discriminative performance, but consistently overestimates mortality risks of all types of major cardiothoracic surgical procedures in the Netherlands.
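
The observed:expected ratio used in this abstract divides the number of observed deaths by the sum of the predicted risks. A value below 1 means the model overestimates mortality. The sketch below is illustrative only: the function name and the toy cohort are assumptions, not data from the study.

```python
def observed_expected_ratio(outcomes, predicted_risks):
    """Observed events divided by expected events (sum of predicted risks)."""
    if len(outcomes) != len(predicted_risks):
        raise ValueError("outcomes and predicted_risks must have equal length")
    expected = sum(predicted_risks)
    if expected == 0:
        raise ValueError("expected number of events is zero")
    return sum(outcomes) / expected


# Hypothetical cohort: 2 deaths in 100 procedures, each with a predicted risk of 4%.
outcomes = [1] * 2 + [0] * 98
predicted = [0.04] * 100

ratio = observed_expected_ratio(outcomes, predicted)  # about 0.5
```

In practice the ratio would be computed per time interval, as the abstract does for 3-month and 12-month windows, to see whether the overestimation changes over time.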


Subjects
Cardiac Surgical Procedures, Humans, Cohort Studies, Heart, Elective Surgical Procedures, Calibration
5.
Radiother Oncol ; 179: 109449, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36566991

ABSTRACT

BACKGROUND: Normal-tissue complication probability (NTCP) models predict complication risk in patients receiving radiotherapy, considering radiation dose to healthy tissues, and are used to select patients for proton therapy, based on their expected reduction in risk after proton therapy versus photon radiotherapy (ΔNTCP). Recommended model evaluation measures include area under the receiver operating characteristic curve (AUC), overall calibration (CITL), and calibration slope (CS), whose precise relation to patient selection is still unclear. We investigated how each measure relates to patient selection outcomes. METHODS: The model validation and consequent patient selection process was simulated within empirical head and neck cancer patient data. By manipulating performance measures independently via model perturbations, the relation between model performance and patient selection was studied. RESULTS: Small reductions in AUC (-0.02) yielded mean changes in ΔNTCP between 0.9-3.2%, and single-model patient selection differences between 2-19%. Deviations (-0.2 or +0.2) in CITL or CS yielded mean changes in ΔNTCP between 0.3-1.4%, and single-model patient selection differences between 1-10%. CONCLUSIONS: Each measure independently impacts ΔNTCP and patient selection and should thus be assessed in a representative, sufficiently large external sample. Our suggested practical model selection approach is to consider the model with the highest AUC and to recalibrate it if needed.
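
One way to see why calibration deviations matter for ΔNTCP is to perturb predicted risks on the logit scale and recompute the photon-minus-proton difference. The sketch below is an assumption-laden illustration, not the authors' simulation procedure: the perturbation form (an intercept shift and a slope applied to the logit), the function names, and the example risks are all hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def perturb(p, intercept_shift=0.0, slope=1.0):
    """Perturb a predicted risk on the logit scale (assumed perturbation form)."""
    return expit(intercept_shift + slope * logit(p))


def delta_ntcp(ntcp_photon, ntcp_proton):
    """Expected risk reduction from proton therapy versus photon radiotherapy."""
    return ntcp_photon - ntcp_proton


# Hypothetical NTCP values for one patient under the two treatment plans.
ntcp_photon, ntcp_proton = 0.30, 0.21

delta_original = delta_ntcp(ntcp_photon, ntcp_proton)

# Same patient after shifting the model intercept by +0.2 on the logit scale.
delta_shifted = delta_ntcp(
    perturb(ntcp_photon, intercept_shift=0.2),
    perturb(ntcp_proton, intercept_shift=0.2),
)
```

Because the logistic curve is nonlinear, the same logit shift moves the two risks by different amounts, so ΔNTCP changes even though both predictions come from the same perturbed model. A patient close to a selection threshold could therefore be classified differently.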


Subjects
Head and Neck Neoplasms, Proton Therapy, Humans, Proton Therapy/adverse effects, Patient Selection, Radiotherapy Dosage, Head and Neck Neoplasms/etiology, Probability, Radiotherapy Planning, Computer-Assisted
6.
NPJ Digit Med ; 5(1): 2, 2022 Jan 10.
Article in English | MEDLINE | ID: mdl-35013569

ABSTRACT

While the opportunities of ML and AI in healthcare are promising, the growth of complex data-driven prediction models requires careful quality and applicability assessment before they are applied and disseminated in daily practice. This scoping review aimed to identify actionable guidance for those closely involved in AI-based prediction model (AIPM) development, evaluation and implementation including software engineers, data scientists, and healthcare professionals and to identify potential gaps in this guidance. We performed a scoping review of the relevant literature providing guidance or quality criteria regarding the development, evaluation, and implementation of AIPMs using a comprehensive multi-stage screening strategy. PubMed, Web of Science, and the ACM Digital Library were searched, and AI experts were consulted. Topics were extracted from the identified literature and summarized across the six phases at the core of this review: (1) data preparation, (2) AIPM development, (3) AIPM validation, (4) software development, (5) AIPM impact assessment, and (6) AIPM implementation into daily healthcare practice. From 2683 unique hits, 72 relevant guidance documents were identified. Substantial guidance was found for data preparation, AIPM development and AIPM validation (phases 1-3), while later phases clearly have received less attention (software development, impact assessment and implementation) in the scientific literature. The six phases of the AIPM development, evaluation and implementation cycle provide a framework for responsible introduction of AI-based prediction models in healthcare. Additional domain and technology specific research may be necessary and more practical experience with implementing AIPMs is needed to support further guidance.

7.
Diagn Progn Res ; 6(1): 1, 2022 Jan 11.
Article in English | MEDLINE | ID: mdl-35016734

ABSTRACT

BACKGROUND: Clinical prediction models are developed widely across medical disciplines. When predictors in such models are highly collinear, unexpected or spurious predictor-outcome associations may occur, thereby potentially reducing face-validity of the prediction model. Collinearity can be dealt with by exclusion of collinear predictors, but when there is no a priori motivation (besides collinearity) to include or exclude specific predictors, such an approach is arbitrary and possibly inappropriate. METHODS: We compare different methods to address collinearity, including shrinkage, dimensionality reduction, and constrained optimization. The effectiveness of these methods is illustrated via simulations. RESULTS: In the conducted simulations, no effect of collinearity was observed on predictive outcomes (AUC, R2, Intercept, Slope) across methods. However, a negative effect of collinearity on the stability of predictor selection was found, affecting all compared methods, but in particular methods that perform strong predictor selection (e.g., Lasso). Methods for which the included set of predictors remained most stable under increased collinearity were Ridge, PCLR, LAELR, and Dropout. CONCLUSIONS: Based on the results, we would recommend refraining from data-driven predictor selection approaches in the presence of high collinearity, because of the increased instability of predictor selection, even in relatively high events-per-variable settings. The selection of certain predictors over others may disproportionally give the impression that included predictors have a stronger association with the outcome than excluded predictors.
