1.
J Biomed Semantics ; 13(1): 13, 2022 05 08.
Article in English | MEDLINE | ID: mdl-35527259

ABSTRACT

BACKGROUND: The high volume of research focusing on extracting patient information from electronic health records (EHRs) has led to an increase in the demand for annotated corpora, which are a precious resource for both the development and evaluation of natural language processing (NLP) algorithms. The absence of a multipurpose clinical corpus outside the scope of the English language, especially in Brazilian Portuguese, is glaring and severely impacts scientific progress in the biomedical NLP field. METHODS: In this study, a semantically annotated corpus was developed using clinical text from multiple medical specialties, document types, and institutions. In addition, we present (1) a survey listing common aspects, differences, and lessons learned from previous research, (2) a fine-grained annotation schema that can be replicated to guide other annotation initiatives, (3) a web-based annotation tool focusing on an annotation suggestion feature, and (4) both intrinsic and extrinsic evaluation of the annotations. RESULTS: This study resulted in SemClinBr, a corpus of 1,000 clinical notes labeled with 65,117 entities and 11,263 relations. In addition, both negation cues and medical abbreviation dictionaries were generated from the annotations. The average annotator agreement score varied from 0.71 (applying a strict match) to 0.92 (applying a relaxed match that accepts partial overlaps and hierarchically related semantic types). The extrinsic evaluation, which applied the corpus to two downstream NLP tasks, demonstrated the reliability and usefulness of the annotations, with the systems achieving results that were consistent with the agreement scores. CONCLUSION: The SemClinBr corpus and other resources produced in this work can support clinical NLP studies, providing a common development and evaluation resource for the research community and boosting the utilization of EHRs in both clinical practice and biomedical research.
To the best of our knowledge, SemClinBr is the first available Portuguese clinical corpus.
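
The difference between strict and relaxed agreement can be illustrated with a minimal sketch. The abstract does not specify how the agreement score is computed, so the F1-style pairwise score, the greedy matching, the `Span` layout, and the example annotations below are all assumptions made for illustration. The sketch also covers only partial overlaps; it does not model hierarchically related semantic types.

```python
from typing import List, Tuple

# Hypothetical annotation layout: (start offset, end offset, semantic type),
# with the end offset exclusive. This is not the SemClinBr file format.
Span = Tuple[int, int, str]


def spans_overlap(a: Span, b: Span) -> bool:
    """Return True when the two character ranges share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def agreement(ann_a: List[Span], ann_b: List[Span], relaxed: bool = False) -> float:
    """F1-style pairwise agreement between two annotators (an assumed metric).

    Strict: offsets and semantic type must both be identical.
    Relaxed: the semantic type must be identical, but a partial overlap of
    the offsets is enough.
    """
    if not ann_a and not ann_b:
        return 1.0
    unmatched_b = list(ann_b)
    matches = 0
    for a in ann_a:
        for b in unmatched_b:
            same_type = a[2] == b[2]
            same_span = a[:2] == b[:2]
            if same_type and (same_span or (relaxed and spans_overlap(a, b))):
                matches += 1
                unmatched_b.remove(b)  # each annotation is matched at most once
                break
    return 2 * matches / (len(ann_a) + len(ann_b))


# Made-up annotations of the same note by two annotators.
annotator_1 = [(0, 8, "Disorder"), (15, 22, "Procedure"), (30, 34, "Finding")]
annotator_2 = [(0, 8, "Disorder"), (13, 22, "Procedure"), (40, 45, "Finding")]

strict_score = agreement(annotator_1, annotator_2)
relaxed_score = agreement(annotator_1, annotator_2, relaxed=True)
print(f"strict={strict_score:.3f} relaxed={relaxed_score:.3f}")
```

In this toy case the second pair of annotations differs only in its start offset, so it counts as a match under the relaxed criterion but not under the strict one, which is why the relaxed score is higher.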


Subjects
Medicine , Natural Language Processing , Electronic Health Records , Humans , Portugal , Reproducibility of Results
2.
Braz. arch. biol. technol ; 64(spe): e21210142, 2021. tables, graphs
Article in English | LILACS-Express | LILACS | ID: biblio-1350282

ABSTRACT

Sepsis is a systemic response to an infectious disease and a serious concern because mortality increases with every hour of delay in identifying it and starting the patient's treatment. Studies that aim to identify sepsis early are valuable for the healthcare domain. Further, studies that propose machine learning-based models to identify sepsis risk are scarce for the Brazilian scenario. Hence, we propose the early identification of sepsis using data from a Brazilian hospital. We developed a time-series model based on LSTM to predict sepsis in patients considering a three-day timestep. The patients were selected using both ICD-10 and qSOFA criteria, where we supplemented qSOFA with the additional identification of words referring to infections in the clinical texts. Additionally, we tested a Random Forest classifier to classify patients with sepsis using a single timestep before the sepsis event, evaluating the most relevant features. We achieved an accuracy of 0.907, a sensitivity of 0.912, and a specificity of 0.971 when considering a three-day timestep with LSTM. The Random Forest classifier achieved an accuracy of 0.971, a sensitivity of 0.611, and a specificity of 0.998. The features age, blood glucose, systolic blood pressure, diastolic blood pressure, heart rate, respiratory rate, and admission days had the most influence on the algorithm's classification, with age being the most relevant feature. We achieved satisfactory results compared with the literature, considering a scenario of sparsely spaced measurements and a large amount of missing data.
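
The three metrics reported above follow from the counts of a binary confusion matrix. The sketch below shows the standard definitions. The function name and the counts are hypothetical and are not the study's data. They were chosen only to show how a classifier can combine high accuracy with low sensitivity when positive cases are rare.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute accuracy, sensitivity, and specificity from confusion-matrix counts.

    tp/fn: septic patients predicted as septic / as non-septic.
    tn/fp: non-septic patients predicted as non-septic / as septic.
    """
    total = tp + fp + tn + fn
    positives = tp + fn
    negatives = tn + fp
    if total == 0 or positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / positives,  # true positive rate (recall)
        "specificity": tn / negatives,  # true negative rate
    }


# Hypothetical, imbalanced counts: 18 septic and 500 non-septic patients.
metrics = classification_metrics(tp=11, fp=1, tn=499, fn=7)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, 7 of the 18 septic patients are missed, yet accuracy stays above 0.98 because the 500 non-septic patients dominate the total.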

3.
Stud Health Technol Inform ; 264: 123-127, 2019 Aug 21.
Article in English | MEDLINE | ID: mdl-31437898

ABSTRACT

In this paper, we trained a set of Portuguese clinical word embedding models of different granularities from multi-specialty and multi-institutional clinical narrative datasets. Then, we assessed their impact on a downstream biomedical NLP task of Urinary Tract Infection disease identification. Additionally, we intrinsically evaluated our main model using an adapted version of Bio-SimLex for the Portuguese language. Our empirical results showed that the larger, coarse-grained model achieved a slightly better outcome when compared with the small, fine-grained model in the proposed task. Moreover, we obtained satisfactory results with Bio-SimLex intrinsic evaluation.
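
Intrinsic evaluation against a similarity benchmark is commonly done by correlating the model's cosine similarities with human ratings for the same word pairs. The abstract does not state which correlation measure was used, so the use of Spearman's rank correlation below is an assumption. The two-dimensional vectors, the word pairs, and the gold scores are invented for illustration and are taken neither from the trained models nor from Bio-SimLex.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes neither sequence is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented toy embeddings and gold similarity ratings (illustration only).
embeddings: Dict[str, List[float]] = {
    "febre": [0.9, 0.1],
    "pirexia": [0.8, 0.2],
    "infecção": [0.7, 0.4],
    "fratura": [0.1, 0.9],
}
gold_pairs: List[Tuple[str, str, float]] = [
    ("febre", "pirexia", 9.0),
    ("febre", "infecção", 6.0),
    ("febre", "fratura", 1.0),
]

model_scores = [cosine(embeddings[a], embeddings[b]) for a, b, _ in gold_pairs]
gold_scores = [score for _, _, score in gold_pairs]
rho = spearman(model_scores, gold_scores)
print(f"Spearman rho = {rho:.3f}")
```

A higher correlation means that the embedding space orders the word pairs more like the human raters do. Extrinsic evaluation, such as the downstream identification task described in the abstract, measures something different and is not covered by this sketch.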


Subjects
Machine Learning , Natural Language Processing , Language , Narration , Portugal