Search | VHL Regional Portal (test)

Auditing Learned Associations in Deep Learning Approaches to Extract Race and Ethnicity from Clinical Text.

Bear Don't Walk IV, Oliver J; Pichon, Adrienne; Nieva, Harry Reyes; Sun, Tony; Altosaar, Jaan; Natarajan, Karthik; Perotte, Adler; Tarczy-Hornoch, Peter; Demner-Fushman, Dina; Elhadad, Noémie.

AMIA Annu Symp Proc ; 2023: 289-298, 2023.

Article in English | MEDLINE | ID: mdl-38222422

ABSTRACT

Complete and accurate race and ethnicity (RE) patient information is important for many areas of biomedical informatics research, such as defining and characterizing cohorts, performing quality assessments, and identifying health inequities. Patient-level RE data is often inaccurate or missing in structured sources, but can be supplemented through clinical notes and natural language processing (NLP). While NLP has made many improvements in recent years with large language models, bias remains an often-unaddressed concern, with research showing that harmful and negative language is more often used for certain racial/ethnic groups than others. We present an approach to audit the learned associations of models trained to identify RE information in clinical text by measuring the concordance between model-derived salient features and manually identified RE-related spans of text. We show that while models perform well on the surface, there exist concerning learned associations and potential for future harms from RE-identification models if left unaddressed.

Subjects

Deep Learning, Ethnicity, Humans, Language, Natural Language Processing

Clinically relevant pretraining is all you need.

Bear Don't Walk IV, Oliver J; Sun, Tony; Perotte, Adler; Elhadad, Noémie.

J Am Med Inform Assoc ; 28(9): 1970-1976, 2021 08 13.

Article in English | MEDLINE | ID: mdl-34151966

ABSTRACT

Clinical notes present a wealth of information for applications in the clinical domain, but heterogeneity across clinical institutions and settings presents challenges for their processing. The clinical natural language processing field has made strides in overcoming domain heterogeneity, while pretrained deep learning models present opportunities to transfer knowledge from one task to another. Pretrained models have performed well when transferred to new tasks; however, it is not well understood if these models generalize across differences in institutions and settings within the clinical domain. We explore if institution or setting specific pretraining is necessary for pretrained models to perform well when transferred to new tasks. We find no significant performance difference between models pretrained across institutions and settings, indicating that clinically pretrained models transfer well across such boundaries. Given a clinically pretrained model, clinical natural language processing researchers may forgo the time-consuming pretraining step without a significant performance drop.

Subjects

Deep Learning, Humans, Natural Language Processing, Research Personnel

FasTag: Automatic text classification of unstructured medical narratives.

Venkataraman, Guhan Ram; Pineda, Arturo Lopez; Bear Don't Walk IV, Oliver J; Zehnder, Ashley M; Ayyar, Sandeep; Page, Rodney L; Bustamante, Carlos D; Rivas, Manuel A.

PLoS One ; 15(6): e0234647, 2020.

Article in English | MEDLINE | ID: mdl-32569327

ABSTRACT

Unstructured clinical narratives are continuously being recorded as part of delivery of care in electronic health records, and dedicated tagging staff spend considerable effort manually assigning clinical codes for billing purposes. Despite these efforts, however, label availability and accuracy are both suboptimal. In this retrospective study, we aimed to automate the assignment of top-level International Classification of Diseases version 9 (ICD-9) codes to clinical records from human and veterinary data stores using minimal manual labor and feature curation. Automating top-level annotations could in turn enable rapid cohort identification, especially in a veterinary setting. To this end, we trained long short-term memory (LSTM) recurrent neural networks (RNNs) on 52,722 human and 89,591 veterinary records. We investigated the accuracy of both separate-domain and combined-domain models and probed model portability. We established relevant baseline classification performances by training Decision Trees (DT) and Random Forests (RF). We also investigated whether transforming the data using MetaMap Lite, a clinical natural language processing tool, affected classification performance. We showed that the LSTM-RNNs accurately classify veterinary and human text narratives into top-level categories with an average weighted macro F1 score of 0.74 and 0.68 respectively. In the "neoplasia" category, the model trained on veterinary data had a high validation accuracy in veterinary data and moderate accuracy in human data, with F1 scores of 0.91 and 0.70 respectively. Our LSTM method scored slightly higher than that of the DT and RF models. The use of LSTM-RNN models represents a scalable structure that could prove useful in cohort identification for comparative oncology studies. Digitization of human and veterinary health information will continue to be a reality, particularly in the form of unstructured narratives. 
Our approach is a step forward for these two domains to learn from and inform one another.

Subjects

Data Mining, Narrative Medicine, Software, Animals, Automation, Databases as Topic, Humans, Reproducibility of Results, Species Specificity

Detecting Social and Behavioral Determinants of Health with Structured and Free-Text Clinical Data.

Feller, Daniel J; Bear Don't Walk IV, Oliver J; Zucker, Jason; Yin, Michael T; Gordon, Peter; Elhadad, Noémie.

Appl Clin Inform ; 11(1): 172-181, 2020 01.

Article in English | MEDLINE | ID: mdl-32131117

ABSTRACT

BACKGROUND: Social and behavioral determinants of health (SBDH) are environmental and behavioral factors that often impede disease management and result in sexually transmitted infections. Despite their importance, SBDH are inconsistently documented in electronic health records (EHRs) and typically collected only in an unstructured format. Evidence suggests that structured data elements present in EHRs can contribute further to identify SBDH in the patient record. OBJECTIVE: Explore the automated inference of both the presence of SBDH documentation and individual SBDH risk factors in patient records. Compare the relative ability of clinical notes and structured EHR data, such as laboratory measurements and diagnoses, to support inference. METHODS: We attempt to infer the presence of SBDH documentation in patient records, as well as patient status of 11 SBDH, including alcohol abuse, homelessness, and sexual orientation. We compare classification performance when considering clinical notes only, structured data only, and notes and structured data together. We perform an error analysis across several SBDH risk factors. RESULTS: Classification models inferring the presence of SBDH documentation achieved good performance (F1 score: 92.7-78.7; F1 considered as the primary evaluation metric). Performance was variable for models inferring patient SBDH risk status; results ranged from F1 = 82.7 for LGBT (lesbian, gay, bisexual, and transgender) status to F1 = 28.5 for intravenous drug use. Error analysis demonstrated that lexical diversity and documentation of historical SBDH status challenge inference of patient SBDH status. Three of five classifiers inferring topic-specific SBDH documentation and 10 of 11 patient SBDH status classifiers achieved highest performance when trained using both clinical notes and structured data. 
CONCLUSION: Our findings suggest that combining clinical free-text notes and structured data provides the best approach for classifying patient SBDH status. Inferring patient SBDH status is most challenging among SBDH with low prevalence and high lexical diversity.

Subjects

Documentation, Risk Assessment/methods, Social Behavior, Text Messaging, Electronic Health Records, Humans, Risk Factors, Supervised Machine Learning
