ABSTRACT
Recent studies in the biomedical domain suggest that learning statistical word representations (static or contextualized word embeddings) on large corpora of specialized data improves the results on downstream natural language processing (NLP) tasks. In this paper, we explore the impact of the data source of word representations on a natural language understanding task. We compared static embeddings learned with Fasttext and contextualized representations learned with ELMo, trained either on general-domain data (Wikipedia) or on specialized data (electronic health records, EHR). The best results were obtained with ELMo representations learned on EHR data for the two sub-tasks (+7% and +4% gain in F1-score). Moreover, the ELMo representations were trained with only a fraction of the data used for Fasttext.
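The distinction between static and contextualized embeddings drawn above can be illustrated with a toy sketch (not the paper's models): a static embedding such as Fasttext assigns one fixed vector per word type, while a contextualized model such as ELMo computes each token's vector as a function of the whole sentence, so an ambiguous clinical term like "discharge" receives different representations in different contexts. The hash-based vectors below are purely illustrative stand-ins for trained embeddings.

```python
import hashlib

def static_embed(word):
    # Toy static embedding: one fixed vector per word type,
    # independent of the sentence it appears in (as with Fasttext).
    digest = hashlib.md5(word.encode()).digest()
    return [b / 255 for b in digest[:4]]

def contextual_embed(word, sentence):
    # Toy contextualized embedding: the vector depends on the
    # surrounding words as well (as with ELMo).
    key = word + "|" + " ".join(sentence)
    digest = hashlib.md5(key.encode()).digest()
    return [b / 255 for b in digest[:4]]

s1 = ["the", "patient", "received", "a", "discharge", "summary"]
s2 = ["purulent", "discharge", "from", "the", "wound"]

# Static: identical vector for "discharge" in both sentences.
same = static_embed("discharge") == static_embed("discharge")
# Contextual: distinct vectors for the two senses of "discharge".
different = contextual_embed("discharge", s1) != contextual_embed("discharge", s2)
print(same, different)
```

In the EHR setting this matters because clinical terms are often polysemous, which is one reason contextualized representations trained on in-domain data can outperform static ones.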
Subjects
Natural Language Processing, Electronic Health Records, Information Storage and Retrieval, Language, Unified Medical Language System

ABSTRACT
We explore the impact of the data source of word representations on different NLP tasks in the clinical domain in French (natural language understanding and text classification). We compared word embeddings (Fasttext) and language models (ELMo), learned either on general-domain data (Wikipedia) or on specialized data (electronic health records, EHR). The best results were obtained with ELMo representations learned on EHR data for the two tasks (+7% and +8% gain in F1-score).