Search | BVS Regional Portal (test)

Exploring the boundaries: gene and protein identification in biomedical text.

Finkel, Jenny; Dingare, Shipra; Manning, Christopher D; Nissim, Malvina; Alex, Beatrice; Grover, Claire.

BMC Bioinformatics ; 6 Suppl 1: S5, 2005.

Article in English | MEDLINE | ID: mdl-15960839

Abstract

BACKGROUND: Good automatic information extraction tools offer hope for automatic processing of the exploding biomedical literature, and successful named entity recognition is a key component for such tools. METHODS: We present a maximum-entropy based system incorporating a diverse set of features for identifying gene and protein names in biomedical abstracts. RESULTS: This system was entered in the BioCreative comparative evaluation and achieved a precision of 0.83 and recall of 0.84 in the "open" evaluation and a precision of 0.78 and recall of 0.85 in the "closed" evaluation. CONCLUSION: Central contributions are rich use of features derived from the training data at multiple levels of granularity, a focus on correctly identifying entity boundaries, and the innovative use of several external knowledge sources including full MEDLINE abstracts and web searches.
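The abstract reports precision and recall but not the combined F-score. As a quick check, the balanced F-score (harmonic mean of precision and recall) can be recomputed from the reported figures. The following is a minimal sketch; the function name `f_score` is our own, and because the inputs are rounded to two decimals the results are only approximations of the official scores.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F-beta); beta=1 gives F1."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Rounded precision/recall pairs as reported in the abstract.
runs = {"open": (0.83, 0.84), "closed": (0.78, 0.85)}

for name, (p, r) in runs.items():
    print(f"{name}: F1 = {f_score(p, r):.3f}")
```

With these rounded inputs, the open run comes out at about 0.835 and the closed run at about 0.813.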

Subjects

Biomedical Research/classification, Genes, Literature, Proteins/classification, Biomedical Research/methods, Computational Biology/classification, Computational Biology/methods, Information Storage and Retrieval/classification, Information Storage and Retrieval/methods, Terminology as Topic

A system for identifying named entities in biomedical text: how results from two evaluations reflect on both the system and the evaluations.

Dingare, Shipra; Nissim, Malvina; Finkel, Jenny; Manning, Christopher; Grover, Claire.

Comp Funct Genomics ; 6(1-2): 77-85, 2005.

Article in English | MEDLINE | ID: mdl-18629295

Abstract

We present a maximum entropy-based system for identifying named entities (NEs) in biomedical abstracts and present its performance in the only two biomedical named entity recognition (NER) comparative evaluations that have been held to date, namely BioCreative and Coling BioNLP. Our system obtained an exact match F-score of 83.2% in the BioCreative evaluation and 70.1% in the BioNLP evaluation. We discuss our system in detail, including its rich use of local features, attention to correct boundary identification, innovative use of external knowledge resources, including parsing and web searches, and rapid adaptation to new NE sets. We also discuss in depth problems with data annotation in the evaluations which caused the final performance to be lower than optimal.
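The scores above are exact-match F-scores: a predicted name counts as correct only if both of its boundaries (and its type) match the gold annotation, which is why boundary identification matters so much here. The following is a minimal sketch of that scoring rule. It assumes entities are represented as hypothetical `(start, end, type)` character-offset tuples and is not the official BioCreative or BioNLP evaluation script.

```python
from typing import Set, Tuple

# Hypothetical entity representation: (start offset, end offset, entity type).
Span = Tuple[int, int, str]


def exact_match_prf(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Precision, recall and F1 where a prediction is a true positive only
    if its start, end and type all match a gold span exactly."""
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    # If tp > 0, both precision and recall are positive, so no division by zero.
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


# Toy example with made-up offsets: predicting "p53 protein" (0, 11)
# where the gold span is "p53" (0, 3) is a boundary error.
gold = {(0, 3, "protein"), (20, 27, "gene")}
pred = {(0, 11, "protein"), (20, 27, "gene")}
p, r, f = exact_match_prf(gold, pred)
print(p, r, f)  # 0.5 0.5 0.5
```

Under this rule a single boundary error is penalized twice, once as a false positive and once as a false negative.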
