Pesquisa | Portal Regional da BVS (teste)

Leveraging Wikipedia knowledge to classify multilingual biomedical documents.

Antonio Mouriño García, Marcos; Pérez Rodríguez, Roberto; Anido Rifón, Luis.

Artif Intell Med ; 88: 37-57, 2018 06.

Artigo em Inglês | MEDLINE | ID: mdl-29730047

RESUMO

This article presents a classifier that leverages Wikipedia knowledge to represent documents as vectors of concepts weights, and analyses its suitability for classifying biomedical documents written in any language when it is trained only with English documents. We propose the cross-language concept matching technique, which relies on Wikipedia interlanguage links to convert concept vectors between languages. The performance of the classifier is compared to a classifier based on machine translation, and two classifiers based on MetaMap. To perform the experiments, we created two multilingual corpus. The first one, Multi-Lingual UVigoMED (ML-UVigoMED) is composed of 23,647 Wikipedia documents about biomedical topics written in English, German, French, Spanish, Italian, Galician, Romanian, and Icelandic. The second one, English-French-Spanish-German UVigoMED (EFSG-UVigoMED) is composed of 19,210 biomedical abstract extracted from MEDLINE written in English, French, Spanish, and German. The performance of the approach proposed is superior to any of the state-of-the art classifier in the benchmark. We conclude that leveraging Wikipedia knowledge is of great advantage in tasks of multilingual classification of biomedical documents.

Assuntos

Pesquisa Biomédica/classificação , Mineração de Dados/métodos , Documentação/classificação , Enciclopédias como Assunto , Bases de Conhecimento , Multilinguismo , Processamento de Linguagem Natural , Semântica , Humanos

A Bag of Concepts Approach for Biomedical Document Classification Using Wikipedia Knowledge*. Spanish-English Cross-language Case Study.

Mouriño-García, Marcos A; Pérez-Rodríguez, Roberto; Anido-Rifón, Luis E.

Methods Inf Med ; 56(5): 370-376, 2017 10 26.

Artigo em Inglês | MEDLINE | ID: mdl-28816337

RESUMO

OBJECTIVES: The ability to efficiently review the existing literature is essential for the rapid progress of research. This paper describes a classifier of text documents, represented as vectors in spaces of Wikipedia concepts, and analyses its suitability for classification of Spanish biomedical documents when only English documents are available for training. We propose the cross-language concept matching (CLCM) technique, which relies on Wikipedia interlanguage links to convert concept vectors from the Spanish to the English space. METHODS: The performance of the classifier is compared to several baselines: a classifier based on machine translation, a classifier that represents documents after performing Explicit Semantic Analysis (ESA), and a classifier that uses a domain-specific semantic annotator (MetaMap). The corpus used for the experiments (Cross-Language UVigoMED) was purpose-built for this study, and it is composed of 12,832 English and 2,184 Spanish MEDLINE abstracts. RESULTS: The performance of our approach is superior to any other state-of-the art classifier in the benchmark, with performance increases up to: 124% over classical machine translation, 332% over MetaMap, and 60 times over the classifier based on ESA. The results have statistical significance, showing p-values < 0.0001. CONCLUSION: Using knowledge mined from Wikipedia to represent documents as vectors in a space of Wikipedia concepts and translating vectors between language-specific concept spaces, a cross-language classifier can be built, and it performs better than several state-of-the-art classifiers.

Design process and preliminary psychometric study of a video game to detect cognitive impairment in senior adults.

Valladares-Rodriguez, Sonia; Perez-Rodriguez, Roberto; Facal, David; Fernandez-Iglesias, Manuel J; Anido-Rifon, Luis; Mouriño-Garcia, Marcos.

PeerJ ; 5: e3508, 2017.

Artigo em Inglês | MEDLINE | ID: mdl-28674661

RESUMO

INTRODUCTION: Assessment of episodic memory has been traditionally used to evaluate potential cognitive impairments in senior adults. Typically, episodic memory evaluation is based on personal interviews and pen-and-paper tests. This article presents the design, development and a preliminary validation of a novel digital game to assess episodic memory intended to overcome the limitations of traditional methods, such as the cost of its administration, its intrusive character, the lack of early detection capabilities, the lack of ecological validity, the learning effect and the existence of confounding factors. MATERIALS AND METHODS: Our proposal is based on the gamification of the California Verbal Learning Test (CVLT) and it has been designed to comply with the psychometric characteristics of reliability and validity. Two qualitative focus groups and a first pilot experiment were carried out to validate the proposal. RESULTS: A more ecological, non-intrusive and better administrable tool to perform cognitive assessment was developed. Initial evidence from the focus groups and pilot experiment confirmed the developed game's usability and offered promising results insofar its psychometric validity is concerned. Moreover, the potential of this game for the cognitive classification of senior adults was confirmed, and administration time is dramatically reduced with respect to pen-and-paper tests. LIMITATIONS: Additional research is needed to improve the resolution of the game for the identification of specific cognitive impairments, as well as to achieve a complete validation of the psychometric properties of the digital game. CONCLUSION: Initial evidence show that serious games can be used as an instrument to assess the cognitive status of senior adults, and even to predict the onset of mild cognitive impairments or Alzheimer's disease.

Biomedical literature classification using encyclopedic knowledge: a Wikipedia-based bag-of-concepts approach.

Mouriño García, Marcos Antonio; Pérez Rodríguez, Roberto; Anido Rifón, Luis E.

PeerJ ; 3: e1279, 2015.

Artigo em Inglês | MEDLINE | ID: mdl-26468436

RESUMO

Automatic classification of text documents into a set of categories has a lot of applications. Among those applications, the automatic classification of biomedical literature stands out as an important application for automatic document classification strategies. Biomedical staff and researchers have to deal with a lot of literature in their daily activities, so it would be useful a system that allows for accessing to documents of interest in a simple and effective way; thus, it is necessary that these documents are sorted based on some criteria-that is to say, they have to be classified. Documents to classify are usually represented following the bag-of-words (BoW) paradigm. Features are words in the text-thus suffering from synonymy and polysemy-and their weights are just based on their frequency of occurrence. This paper presents an empirical study of the efficiency of a classifier that leverages encyclopedic background knowledge-concretely Wikipedia-in order to create bag-of-concepts (BoC) representations of documents, understanding concept as "unit of meaning", and thus tackling synonymy and polysemy. Besides, the weighting of concepts is based on their semantic relevance in the text. For the evaluation of the proposal, empirical experiments have been conducted with one of the commonly used corpora for evaluating classification and retrieval of biomedical information, OHSUMED, and also with a purpose-built corpus of MEDLINE biomedical abstracts, UVigoMED. Results obtained show that the Wikipedia-based bag-of-concepts representation outperforms the classical bag-of-words representation up to 157% in the single-label classification problem and up to 100% in the multi-label problem for OHSUMED corpus, and up to 122% in the single-label classification problem and up to 155% in the multi-label problem for UVigoMED corpus.

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA