Search | VHL Regional Portal (test)

GENIA corpus--semantically annotated corpus for bio-textmining.

Kim, J-D; Ohta, T; Tateisi, Y; Tsujii, J.

Bioinformatics ; 19 Suppl 1: i180-2, 2003.

Article in English | MEDLINE | ID: mdl-12855455

ABSTRACT

MOTIVATION: Natural language processing (NLP) methods are regarded as useful for raising the potential of text mining from biological literature. The lack of an extensively annotated corpus of this literature, however, is a major bottleneck for applying NLP techniques. The GENIA corpus is being developed to provide reference material that lets NLP techniques work for bio-textmining. RESULTS: GENIA corpus version 3.0, consisting of 2000 MEDLINE abstracts, has been released with more than 400,000 words and almost 100,000 annotations for biological terms.

Subjects

Abstracting and Indexing/methods; Biology/methods; Databases, Bibliographic; Information Storage and Retrieval/methods; Natural Language Processing; Periodicals as Topic; Terminology as Topic; Computational Biology/methods; Database Management Systems; Documentation; MEDLINE

Event extraction from biomedical papers using a full parser.

Yakushiji, A; Tateisi, Y; Miyao, Y; Tsujii, J.

Pac Symp Biocomput ; : 408-19, 2001.

Article in English | MEDLINE | ID: mdl-11262959

ABSTRACT

We have designed and implemented an information extraction (IE) system that uses a full parser, in order to investigate whether full analysis of text with a general-purpose parser and grammar is plausible in the biomedical domain. We partially solved the inefficiency, ambiguity, and low-coverage problems of full parsing by introducing preprocessors, and we propose modules that handle partial parsing results for further improvement. Our approach makes it possible to modularize the system, so that the IE system as a whole is easy to tune to specific domains and easy to maintain and improve by incorporating various techniques for disambiguation, speed-up, and so on. In a preliminary experiment, of 133 argument structures that should be extracted from 97 sentences, we obtained 23% uniquely and 24% with ambiguity. In addition, 20% are extractable from partial rather than complete results of full parsing.

Subjects

Natural Language Processing; Databases, Factual; Electronic Data Processing
