Search | VHL Regional Portal

The MiPACQ clinical question answering system.

Cairns, Brian L; Nielsen, Rodney D; Masanz, James J; Martin, James H; Palmer, Martha S; Ward, Wayne H; Savova, Guergana K.

AMIA Annu Symp Proc ; 2011: 171-80, 2011.

Article in English | MEDLINE | ID: mdl-22195068

ABSTRACT

The Multi-source Integrated Platform for Answering Clinical Questions (MiPACQ) is a QA pipeline that integrates a variety of information retrieval and natural language processing systems into an extensible question answering system. We present the system's architecture and an evaluation of MiPACQ on a human-annotated evaluation dataset based on the Medpedia health and medical encyclopedia. Compared with our baseline information retrieval system, the MiPACQ rule-based system demonstrates 84% improvement in Precision at One and the MiPACQ machine-learning-based system demonstrates 134% improvement. Other performance metrics including mean reciprocal rank and area under the precision/recall curves also showed significant improvement, validating the effectiveness of the MiPACQ design and implementation.

Subject(s)

Electronic Health Records , Natural Language Processing , Search Engine , Software , Artificial Intelligence , Computer Systems , Humans , Information Systems

Semantic role labeling for protein transport predicates.

Bethard, Steven; Lu, Zhiyong; Martin, James H; Hunter, Lawrence.

BMC Bioinformatics ; 9: 277, 2008 Jun 11.

Article in English | MEDLINE | ID: mdl-18547432

ABSTRACT

BACKGROUND: Automatic semantic role labeling (SRL) is a natural language processing (NLP) technique that maps sentences to semantic representations. This technique has been widely studied in the recent years, but mostly with data in newswire domains. Here, we report on a SRL model for identifying the semantic roles of biomedical predicates describing protein transport in GeneRIFs - manually curated sentences focusing on gene functions. To avoid the computational cost of syntactic parsing, and because the boundaries of our protein transport roles often did not match up with syntactic phrase boundaries, we approached this problem with a word-chunking paradigm and trained support vector machine classifiers to classify words as being at the beginning, inside or outside of a protein transport role. RESULTS: We collected a set of 837 GeneRIFs describing movements of proteins between cellular components, whose predicates were annotated for the semantic roles AGENT, PATIENT, ORIGIN and DESTINATION. We trained these models with the features of previous word-chunking models, features adapted from phrase-chunking models, and features derived from an analysis of our data. Our models were able to label protein transport semantic roles with 87.6% precision and 79.0% recall when using manually annotated protein boundaries, and 87.0% precision and 74.5% recall when using automatically identified ones. CONCLUSION: We successfully adapted the word-chunking classification paradigm to semantic role labeling, applying it to a new domain with predicates completely absent from any previous studies. By combining the traditional word and phrasal role labeling features with biomedical features like protein boundaries and MEDPOST part of speech tags, we were able to address the challenges posed by the new domain data and subsequently build robust models that achieved F-measures as high as 83.1. This system for extracting protein transport information from GeneRIFs performs well even with proteins identified automatically, and is therefore more robust than the rule-based methods previously used to extract protein transport roles.

Subject(s)

Abstracting and Indexing/methods , Computational Biology/methods , Genes/physiology , Natural Language Processing , Protein Transport/genetics , Animals , Classification/methods , Humans , Information Storage and Retrieval/methods , Information Theory , Pattern Recognition, Automated/methods , Semantics , Terminology as Topic , Vocabulary, Controlled

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL