Search | BVS Regional Portal (test)

Evaluating shallow and deep learning strategies for the 2018 n2c2 shared task on clinical text classification.

Oleynik, Michel; Kugic, Amila; Kasác, Zdenko; Kreuzthaler, Markus.

J Am Med Inform Assoc ; 26(11): 1247-1254, 2019 Nov 01.

Article in English | MEDLINE | ID: mdl-31512729

ABSTRACT

OBJECTIVE: Automated clinical phenotyping is challenging because word-based features quickly turn it into a high-dimensional problem, in which the small, privacy-restricted training datasets might lead to overfitting. Pretrained embeddings might solve this issue by reusing input representation schemes trained on a larger dataset. We sought to evaluate shallow and deep learning text classifiers and the impact of pretrained embeddings in a small clinical dataset.

MATERIALS AND METHODS: We participated in the 2018 National NLP Clinical Challenges (n2c2) Shared Task on cohort selection and received an annotated dataset with medical narratives of 202 patients for multilabel binary text classification. We set our baseline to a majority classifier, to which we compared a rule-based classifier and orthogonal machine learning strategies: support vector machines, logistic regression, and long short-term memory neural networks. We evaluated logistic regression and long short-term memory using both self-trained and pretrained BioWordVec word embeddings as input representation schemes.

RESULTS: The rule-based classifier showed the highest overall micro F1 score (0.9100), with which we finished first in the challenge. Shallow machine learning strategies showed lower overall micro F1 scores, but still higher than deep learning strategies and the baseline. We could not show a difference in classification efficiency between self-trained and pretrained embeddings.

DISCUSSION: Clinical context, negation, and value-based criteria hindered shallow machine learning approaches, while deep learning strategies could not capture the term diversity due to the small training dataset.

CONCLUSION: Shallow methods for clinical phenotyping can still outperform deep learning methods in small imbalanced data, even when supported by pretrained embeddings.

Subjects

Clinical Trials as Topic/methods, Data Mining/methods, Machine Learning, Natural Language Processing, Patient Selection, Classification, Deep Learning, Humans, Logistic Models, Computer Neural Networks
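
The n2c2 abstract above compares classifiers by an overall micro F1 score on a multilabel binary task. As a rough illustration of what micro-averaging means in general, the sketch below pools true positives, false positives, and false negatives across labels before computing F1. The function name `micro_f1` and the per-label counts are hypothetical, and the challenge's official scorer may aggregate differently; this is only the generic textbook definition.

```python
def micro_f1(counts):
    """Micro-averaged F1 from per-label (tp, fp, fn) tuples.

    Generic definition only; not the n2c2 evaluation script.
    """
    tp = sum(c[0] for c in counts)  # pooled true positives
    fp = sum(c[1] for c in counts)  # pooled false positives
    fn = sum(c[2] for c in counts)  # pooled false negatives
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); define it as 0.0 when nothing is counted
    return 2 * tp / denom if denom else 0.0


# Hypothetical counts for three labels (illustrative, not from the paper)
example_counts = [(8, 1, 1), (3, 2, 0), (0, 0, 2)]
score = micro_f1(example_counts)
print(f"micro F1 = {score:.4f}")
```

Because the counts are pooled, frequent labels weigh more heavily on the score than rare ones, which matters on small imbalanced data of the kind the abstract describes.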

Building an Experimental German User Interface Terminology Linked to SNOMED CT.

Hashemian Nik, David; Kasác, Zdenko; Goda, Zsófia; Semlitsch, Anita; Schulz, Stefan.

Stud Health Technol Inform ; 264: 153-157, 2019 Aug 21.

Article in English | MEDLINE | ID: mdl-31437904

ABSTRACT

We describe the process of creating a User Interface Terminology (UIT) with the goal of generating as many German-language interface terms as possible, mapped to the reference terminology SNOMED CT. The purpose is to offer high coverage of medical jargon in order to optimise semantic annotations of clinical documents by text mining systems. The first step consisted of creating an n-gram table into which words and short phrases from the English SNOMED CT description table were automatically extracted and entered. The second step was to fill the n-gram table with human and machine translations, manually enriched with part-of-speech (POS) tags. Top-down and bottom-up methods for manual terminology population were used. Grammar rules were formulated and embedded into a term generator, which then created one-to-many German variants per SNOMED CT description. Currently, the German user interface terminology contains 4,425,948 entries, created out of 111,605 German n-grams, assigned to 95,298 English n-grams. With 341,105 active concepts and 542,462 (non-FSN) descriptions, it corresponds to an average of 13 interface terms per concept and 8.2 per description. A blinded human assessment of the current quality of this resource found its term understandability equivalent to that of a fully automated Web-based translator, which, however, does not yield any synonyms. There are therefore good reasons to further develop this semi-automated terminology engineering method and to recommend it for other language pairs.

Subjects

Semantics, Systematized Nomenclature of Medicine, Data Mining, Humans
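
The averages quoted in the SNOMED CT abstract above can be checked against its own counts. The sketch below assumes the averages are plain ratios of total entries to active concepts and to descriptions, which the abstract does not state explicitly. The variable names are illustrative.

```python
# Figures quoted in the abstract
entries = 4_425_948       # German interface terminology entries
concepts = 341_105        # active SNOMED CT concepts
descriptions = 542_462    # non-FSN descriptions

# Assumption: each average is total entries divided by the unit count
per_concept = entries / concepts
per_description = entries / descriptions

print(f"interface terms per concept:     {per_concept:.2f}")
print(f"interface terms per description: {per_description:.2f}")
```

Under that assumption, both ratios round to the reported values of 13 per concept and 8.2 per description.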

Analysis of Historical Medical Phenomena Using Large N-Gram Corpora.

Kasác, Zdenko; Schulz, Stefan.

Stud Health Technol Inform ; 245: 437-441, 2017.

Article in English | MEDLINE | ID: mdl-29295132

ABSTRACT

Historically, numerous indirect references to real-world phenomena have been conserved in literature. High-quality libraries of digitized books and their derivatives (like the Google NGram Viewer) have proliferated. These tools simplify the visualization of trends in phrase usage within the collective memory of language groups. A straightforward interpretation of these frequency changes is, however, too simplistic to draw conclusions about the underlying reality because it is affected by several sources of bias. Although these resources have been studied in social sciences and psychology, there is still a lack of user-friendly, yet rigorous methods for analyzing phenomena relevant to medicine. We present a methodological framework to study relationships of observable phenomena quantitatively over periods that span centuries. We discuss its suitability for knowledge extraction from current and future large-scale, book-derived n-gram collections.

Subjects

Books, Statistics as Topic, Humans, Language, Medicine
