1.
Brief Bioinform ; 13(4): 460-94, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22833496

ABSTRACT

This article surveys efforts on text mining of the pharmacogenomics literature, mainly from the period 2008 to 2011. Pharmacogenomics (or pharmacogenetics) is the field that studies how human genetic variation impacts drug response. Therefore, publications span the intersection of research in genotypes, phenotypes and pharmacology, a topic that has increasingly become a focus of active research in recent years. This survey covers efforts dealing with the automatic recognition of relevant named entities (e.g. genes, gene variants and proteins, diseases and other pathological phenomena, drugs and other chemicals relevant for medical treatment), as well as various forms of relations between them. A wide range of text genres is considered, such as scientific publications (abstracts, as well as full texts), patent texts and clinical narratives. We also discuss infrastructure and resources needed for advanced text analytics, e.g. document corpora annotated with corresponding semantic metadata (gold standards and training data), biomedical terminologies and ontologies providing domain-specific background knowledge at different levels of formality and specificity, software architectures for building complex and scalable text analytics pipelines and Web services grounded to them, as well as comprehensive ways to disseminate and interact with the typically huge amounts of semiformal knowledge structures extracted by text mining tools. Finally, we consider some of the novel applications that have already been developed in the field of pharmacogenomic text mining and point out perspectives for future research.


Subject(s)
Data Mining/methods , Pharmacogenetics , Data Collection , Information Storage and Retrieval , Natural Language Processing , Publications , Semantics
2.
Database (Oxford) ; 2012: bas021, 2012.
Article in English | MEDLINE | ID: mdl-22529178

ABSTRACT

The need for efficient text-mining tools that support curation of the biomedical literature is ever increasing. In this article, we describe an experiment aimed at verifying whether a text-mining tool capable of extracting meaningful relationships among domain entities can be successfully integrated into the curation workflow of a major biological database. We evaluate in particular (i) the usability of the system's interface, as perceived by users, and (ii) the correlation of the ranking of interactions, as provided by the text-mining system, with the choices of the curators.


Subject(s)
Data Mining/methods , Database Management Systems , Databases, Factual , Pharmacogenetics , Abstracting and Indexing , Biomedical Research , Reproducibility of Results , User-Computer Interface
3.
Pac Symp Biocomput ; : 410-21, 2012.
Article in English | MEDLINE | ID: mdl-22174296

ABSTRACT

Drug-drug interactions (DDIs) can occur when two drugs interact with the same gene product. Most available information about gene-drug relationships is contained within the scientific literature, but is dispersed over a large number of publications, with thousands of new publications added each month. In this setting, automated text mining is an attractive solution for identifying gene-drug relationships and aggregating them to predict novel DDIs. In previous work, we have shown that gene-drug interactions can be extracted from Medline abstracts with high fidelity: we extract not only the genes and drugs, but also the type of relationship expressed in individual sentences (e.g. metabolize, inhibit, activate and many others). We normalize these relationships and map them to a standardized ontology. In this work, we hypothesize that we can combine these normalized gene-drug relationships, drawn from a very broad and diverse literature, to infer DDIs. Using a training set of established DDIs, we have trained a random forest classifier to score potential DDIs based on the features of the normalized assertions extracted from the literature that relate two drugs to a gene product. The classifier recognizes the combinations of relationships, drugs and genes that are most associated with the gold standard DDIs, correctly identifying 79.8% of assertions relating interacting drug pairs and 78.9% of assertions relating noninteracting drug pairs. Most significantly, because our text processing method captures the semantics of individual gene-drug relationships, we can construct mechanistic pharmacological explanations for the newly proposed DDIs. We show how our classifier can be used to explain known DDIs and to uncover new DDIs that have not yet been reported.


Subject(s)
Data Mining/methods , Drug Interactions , Algorithms , Aryl Hydrocarbon Hydroxylases/genetics , Aryl Hydrocarbon Hydroxylases/metabolism , Computational Biology , Cytochrome P-450 CYP2C9 , Cytochrome P-450 CYP3A/genetics , Cytochrome P-450 CYP3A/metabolism , Humans , Knowledge Bases , MEDLINE , Pharmacogenetics/statistics & numerical data , Verapamil/metabolism , Warfarin/metabolism
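The abstract above scores candidate DDIs from normalized assertions that relate two drugs to the same gene product. The candidate-generation step can be sketched as below. This is a minimal illustration only: the `ASSERTIONS` list, the relation names and the `candidate_pairs` helper are hypothetical and are not taken from the paper, and the random forest scoring step is not reproduced.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical normalized assertions in the form (drug, relation, gene).
# Illustrative only; not data from the paper.
ASSERTIONS = [
    ("warfarin", "metabolized_by", "CYP2C9"),
    ("fluconazole", "inhibits", "CYP2C9"),
    ("verapamil", "metabolized_by", "CYP3A4"),
    ("verapamil", "inhibits", "CYP3A4"),
    ("simvastatin", "metabolized_by", "CYP3A4"),
]


def candidate_pairs(assertions):
    """Pair up distinct drugs that are related to the same gene.

    Each candidate keeps the gene and both relation types, which is the
    kind of information a downstream classifier could use as features.
    """
    by_gene = defaultdict(list)
    for drug, relation, gene in assertions:
        by_gene[gene].append((drug, relation))

    candidates = []
    for gene in sorted(by_gene):
        # Sort for a deterministic order, then take every pair of assertions.
        for (d1, r1), (d2, r2) in combinations(sorted(by_gene[gene]), 2):
            if d1 != d2:  # skip two assertions about the same drug
                candidates.append(
                    {"drugs": (d1, d2), "gene": gene, "relations": (r1, r2)}
                )
    return candidates


if __name__ == "__main__":
    for candidate in candidate_pairs(ASSERTIONS):
        print(candidate)
```

Keeping the relation pair alongside the shared gene is what would allow a mechanistic reading of a candidate, for example one drug inhibiting the enzyme that metabolizes the other.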
4.
Biomark Med ; 5(6): 795-806, 2011 Dec.
Article in English | MEDLINE | ID: mdl-22103613

ABSTRACT

The mission of the Pharmacogenomics Knowledge Base (PharmGKB; www.pharmgkb.org ) is to collect, encode and disseminate knowledge about the impact of human genetic variations on drug responses. It is an important worldwide resource of clinical pharmacogenomic biomarkers available to all. The PharmGKB website has evolved to highlight our knowledge curation and aggregation over our previous emphasis on collecting primary data. This review summarizes the methods we use to drive this expanded scope of 'Knowledge Acquisition to Clinical Applications', the new features available on our website and our future goals.


Subject(s)
Biomarkers/metabolism , Databases, Factual , Pharmacogenetics , Genetic Variation , Humans , Internet , Knowledge Bases
5.
J Biomed Semantics ; 2 Suppl 2: S10, 2011 May 17.
Article in English | MEDLINE | ID: mdl-21624156

ABSTRACT

BACKGROUND: Advances in Natural Language Processing (NLP) techniques enable the extraction of fine-grained relationships mentioned in biomedical text. The variability and the complexity of natural language in expressing similar relationships cause the extracted relationships to be highly heterogeneous, which makes the construction of knowledge bases difficult and poses a challenge in using these for data mining or question answering. RESULTS: We report on the semi-automatic construction of the PHARE relationship ontology (the PHArmacogenomic RElationships Ontology) consisting of 200 curated relations from over 40,000 heterogeneous relationships extracted via text-mining. These heterogeneous relations are then mapped to the PHARE ontology using synonyms, entity descriptions and hierarchies of entities and roles. Once mapped, relationships can be normalized and compared using the structure of the ontology to identify relationships that have similar semantics but different syntax. We compare and contrast the manual procedure with a fully automated approach using WordNet to quantify the degree of integration enabled by iterative curation and refinement of the PHARE ontology. The result of such integration is a repository of normalized biomedical relationships, named PHARE-KB, which can be queried using Semantic Web technologies such as SPARQL and can be visualized in the form of a biological network. CONCLUSIONS: The PHARE ontology serves as a common semantic framework to integrate more than 40,000 relationships pertinent to pharmacogenomics. The PHARE ontology forms the foundation of a knowledge base named PHARE-KB. Once populated with relationships, PHARE-KB (i) can be visualized in the form of a biological network to guide human tasks such as database curation and (ii) can be queried programmatically to guide bioinformatics applications such as the prediction of molecular interactions. PHARE is available at http://purl.bioontology.org/ontology/PHARE.
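The synonym-based part of the mapping described above can be illustrated with a small lookup table. This is a sketch under stated assumptions: the `SYNONYMS` table, the canonical relation names and both helper functions are hypothetical and do not reproduce the actual PHARE relations or the entity-description and hierarchy steps.

```python
# Hypothetical synonym table: surface phrase -> canonical relation name.
# These names are illustrative and are not the PHARE ontology's relations.
SYNONYMS = {
    "inhibits": "inhibits",
    "blocks": "inhibits",
    "suppresses": "inhibits",
    "metabolizes": "metabolizes",
    "breaks down": "metabolizes",
}


def normalize(triple, synonyms=SYNONYMS):
    """Map a (subject, phrase, object) triple to its canonical relation.

    Returns None when the phrase has no known mapping, so unmapped
    relationships can be set aside for manual curation.
    """
    subject, phrase, obj = triple
    canonical = synonyms.get(phrase.strip().lower())
    if canonical is None:
        return None
    return (subject, canonical, obj)


def same_semantics(a, b):
    """True when two triples normalize to the same canonical relationship."""
    na, nb = normalize(a), normalize(b)
    return na is not None and na == nb


if __name__ == "__main__":
    first = ("fluconazole", "Blocks", "CYP2C9")
    second = ("fluconazole", "inhibits", "CYP2C9")
    print(normalize(first))
    print(same_semantics(first, second))
```

Once relationships share a canonical form, two sentences with different wording can be compared by simple equality, which is the property the abstract describes as similar semantics but different syntax.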

6.
Pharmacogenomics ; 11(10): 1467-89, 2010 Oct.
Article in English | MEDLINE | ID: mdl-21047206

ABSTRACT

The biomedical literature holds our understanding of pharmacogenomics, but it is dispersed across many journals. In order to integrate our knowledge, connect important facts across publications and generate new hypotheses we must organize and encode the contents of the literature. By creating databases of structured pharmacogenomic knowledge, we can make the value of the literature much greater than the sum of the individual reports. We can, for example, generate candidate gene lists or interpret surprising hits in genome-wide association studies. Text mining automatically adds structure to the unstructured knowledge embedded in millions of publications, and recent years have seen a surge in work on biomedical text mining, some specific to pharmacogenomics literature. These methods enable extraction of specific types of information and can also provide answers to general, systemic queries. In this article, we describe the main tasks of text mining in the context of pharmacogenomics, summarize recent applications and anticipate the next phase of text mining applications.


Subject(s)
Data Mining/trends , Pharmacogenetics/methods , Animals , Data Mining/methods , Databases, Genetic/trends , Humans , Information Storage and Retrieval/methods , Information Storage and Retrieval/trends , Pharmacogenetics/statistics & numerical data , Pharmacogenetics/trends
7.
J Biomed Inform ; 43(6): 1009-19, 2010 Dec.
Article in English | MEDLINE | ID: mdl-20723615

ABSTRACT

Most pharmacogenomics knowledge is contained in the text of published studies, and is thus not available for automated computation. Natural Language Processing (NLP) techniques for extracting relationships in specific domains often rely on hand-built rules and domain-specific ontologies to achieve good performance. In a new and evolving field such as pharmacogenomics (PGx), rules and ontologies may not be available. Recent progress in syntactic NLP parsing in the context of a large corpus of pharmacogenomics text provides new opportunities for automated relationship extraction. We describe an ontology of PGx relationships built starting from a lexicon of key pharmacogenomic entities and a syntactic parse of more than 87 million sentences from 17 million MEDLINE abstracts. We used the syntactic structure of PGx statements to systematically extract commonly occurring relationships and to map them to a common schema. Our extracted relationships have a 70-87.7% precision and involve not only key PGx entities such as genes, drugs, and phenotypes (e.g., VKORC1, warfarin, clotting disorder), but also critical entities that are frequently modified by these key entities (e.g., VKORC1 polymorphism, warfarin response, clotting disorder treatment). The result of our analysis is a network of 40,000 relationships between more than 200 entity types with clear semantics. This network is used to guide the curation of PGx knowledge and provide a computable resource for knowledge discovery.


Subject(s)
Pharmacogenetics/methods , Semantics , Databases, Factual , MEDLINE , Natural Language Processing , Terminology as Topic , United States
9.
Pac Symp Biocomput ; : 305-14, 2010.
Article in English | MEDLINE | ID: mdl-19908383

ABSTRACT

A critical goal of pharmacogenomics research is to identify genes that can explain variation in drug response. We have previously reported a method that creates a genome-scale ranking of genes likely to interact with a drug. The algorithm uses information about drug structure and indications of use to rank the genes. Although the algorithm has good performance, its performance depends on a curated set of drug-gene relationships that is expensive to create and difficult to maintain. In this work, we assess the utility of text mining in extracting a network of drug-gene relationships automatically. This provides a valuable aggregate source of knowledge, subsequently used as input into the algorithm that ranks potential pharmacogenes. Using a drug-gene network created from sentence-level co-occurrence in the full text of scientific articles, we compared the performance to that of a network created by manual curation of those articles. Under a wide range of conditions, we show that a knowledge base derived from text-mining the literature performs as well as, and sometimes better than, a high-quality, manually curated knowledge base. We conclude that we can use relationships mined automatically from the literature as a knowledge base for pharmacogenomics relationships. Additionally, when relationships are missed by text mining, our system can accurately extrapolate new relationships with 77.4% precision.


Subject(s)
Pharmacogenetics/statistics & numerical data , Algorithms , Computational Biology , Data Mining/statistics & numerical data , Humans , Knowledge Bases
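Sentence-level co-occurrence, as used in the abstract above to build the drug-gene network, can be sketched as follows. The lexicons `DRUGS` and `GENES`, the naive sentence splitter and the function name are assumptions made for illustration; the paper's actual lexicons and text processing are not described in this record.

```python
import re
from collections import Counter

# Hypothetical lowercase lexicons; a real system would use curated dictionaries.
DRUGS = {"warfarin", "clopidogrel"}
GENES = {"cyp2c9", "vkorc1", "cyp2c19"}


def cooccurrence_network(text, drugs=DRUGS, genes=GENES):
    """Count how many sentences mention each (drug, gene) pair together."""
    edges = Counter()
    # Naive splitter: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        tokens = set(re.findall(r"[a-z0-9]+", sentence.lower()))
        for drug in tokens & drugs:
            for gene in tokens & genes:
                edges[(drug, gene)] += 1
    return edges


if __name__ == "__main__":
    sample = (
        "Warfarin dose depends on VKORC1 and CYP2C9. "
        "Clopidogrel is activated by CYP2C19. "
        "VKORC1 variants alter warfarin response. "
        "CYP2C9 was genotyped."
    )
    for pair, count in sorted(cooccurrence_network(sample).items()):
        print(pair, count)
```

The edge weights are plain sentence counts, so a pair mentioned together in many sentences gets a stronger edge; any thresholding or ranking on top of these counts would be a separate design choice.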
10.
BMC Bioinformatics ; 10 Suppl 2: S6, 2009 Feb 05.
Article in English | MEDLINE | ID: mdl-19208194

ABSTRACT

BACKGROUND: Pharmacogenomics studies the relationship between genetic variation and the variation in drug response phenotypes. The field is rapidly gaining importance: it promises drugs targeted to particular subpopulations based on genetic background. The pharmacogenomics literature has expanded rapidly, but is dispersed in many journals. It is challenging, therefore, to identify important associations between drugs and molecular entities, particularly genes and gene variants, and thus these critical connections are often lost. Text mining techniques can allow us to convert the free-style text to a computable, searchable format in which pharmacogenomic concepts (such as genes, drugs, polymorphisms, and diseases) are identified, and important links between these concepts are recorded. Availability of full text articles as input into text mining engines is key, as literature abstracts often do not contain sufficient information to identify these pharmacogenomic associations. RESULTS: Thus, building on a tool called Textpresso, we have created the Pharmspresso tool to assist in identifying important pharmacogenomic facts in full text articles. Pharmspresso parses text to find references to human genes, polymorphisms, drugs and diseases and their relationships. It presents these as a series of marked-up text fragments, in which key concepts are visually highlighted. To evaluate Pharmspresso, we used a gold standard of 45 human-curated articles. Pharmspresso identified 78%, 61%, and 74% of target gene, polymorphism, and drug concepts, respectively. CONCLUSION: Pharmspresso is a text analysis tool that extracts pharmacogenomic concepts from the literature automatically and thus captures our current understanding of gene-drug interactions in a computable form. We have made Pharmspresso available at http://pharmspresso.stanford.edu.


Subject(s)
Pharmacogenetics/methods , Software , Computational Biology/methods , Databases, Genetic , Information Storage and Retrieval/methods , Internet
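The evaluation figures in the abstract above (the share of gold-standard concepts that the tool identified) read as recall per concept type. Assuming that interpretation, the calculation can be sketched as below. The concept sets are invented for illustration and do not come from the 45-article gold standard.

```python
def recall(gold, predicted):
    """Fraction of gold-standard concepts that also appear in the predictions."""
    gold = set(gold)
    if not gold:
        raise ValueError("gold standard must contain at least one concept")
    return len(gold & set(predicted)) / len(gold)


if __name__ == "__main__":
    # Hypothetical curated and extracted concepts for a single article.
    gold = {
        "gene": {"CYP2D6", "VKORC1", "TPMT", "CYP2C9"},
        "drug": {"warfarin", "codeine"},
    }
    predicted = {
        "gene": {"CYP2D6", "VKORC1", "TPMT", "ABCB1"},
        "drug": {"warfarin", "codeine", "aspirin"},
    }
    for concept_type in sorted(gold):
        score = recall(gold[concept_type], predicted[concept_type])
        print(f"{concept_type}: {score:.0%}")
```

Recall ignores extra predictions such as the additional gene and drug in the example, so a complete evaluation would normally report precision alongside it.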
11.
Pac Symp Biocomput ; : 439-50, 2009.
Article in English | MEDLINE | ID: mdl-19209721

ABSTRACT

The immune system of higher organisms is, by any standard, complex. To date, using reductionist techniques, immunologists have elucidated many of the basic principles of how the immune system functions, yet our understanding is still far from complete. In an era of high throughput measurements, it is already clear that the scientific knowledge we have accumulated has itself grown larger than our ability to cope with it, and thus it is increasingly important to develop bioinformatics tools with which to navigate the complexity of the information that is available to us. Here, we describe ImmuneXpresso, an information extraction system, tailored for parsing the primary literature of immunology and relating it to experimental data. The immune system is very much dependent on the interactions of various white blood cells with each other, either in synaptic contacts, at a distance using cytokines or chemokines, or both. Therefore, as a first approximation, we used ImmuneXpresso to create a literature derived network of interactions between cells and cytokines. Integration of cell-specific gene expression data facilitates cross-validation of cytokine mediated cell-cell interactions and suggests novel interactions. We evaluate the performance of our automatically generated multi-scale model against existing manually curated data, and show how this system can be used to guide experimentalists in interpreting multi-scale, experimental data. Our methodology is scalable and can be generalized to other systems.


Subject(s)
Cell Communication/immunology , Cytokines/immunology , Immune System/physiology , Knowledge Bases , Animals , Biometry , CD4-Positive T-Lymphocytes/immunology , Cytokines/blood , Databases, Factual , Female , Gene Expression Profiling/statistics & numerical data , Humans , Lymphocyte Subsets/immunology , Male , Mice
12.
Stud Health Technol Inform ; 129(Pt 1): 550-4, 2007.
Article in English | MEDLINE | ID: mdl-17911777

ABSTRACT

In order to make more informed healthcare decisions, consumers need information systems that deliver accurate and reliable information about their illnesses and potential treatments. Reports of randomized clinical trials (RCTs) provide reliable medical evidence about the efficacy of treatments. Current methods to access, search for, and retrieve RCTs are keyword-based, time-consuming, and suffer from poor precision. Personalized semantic search and medical evidence summarization aim to solve this problem. The performance of these approaches may improve if they have access to study subject descriptors (e.g. age, gender, and ethnicity), trial sizes, and diseases/symptoms studied. We have developed a novel method to automatically extract such subject demographic information from RCT abstracts. We used text classification augmented with a Hidden Markov Model to identify sentences containing subject demographics, and subsequently these sentences were parsed using Natural Language Processing techniques to extract relevant information. Our results show accuracy levels of 82.5%, 92.5%, and 92.0% for extraction of subject descriptors, trial sizes, and diseases/symptoms descriptors respectively.


Subject(s)
Abstracting and Indexing , Information Storage and Retrieval/methods , Natural Language Processing , Markov Chains , Randomized Controlled Trials as Topic
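The abstract above combines per-sentence text classification with a Hidden Markov Model to find sentences that describe study subjects. One plausible way to combine the two, sketched here, is to treat classifier scores as emission probabilities and decode the most likely label sequence with the Viterbi algorithm. All probabilities, state names and the function below are assumptions for illustration; the record does not describe the paper's actual model.

```python
import math

# Hypothetical two-state model over the sentences of an abstract.
STATES = ("demographic", "other")
START = {"demographic": 0.2, "other": 0.8}
TRANS = {
    "demographic": {"demographic": 0.6, "other": 0.4},
    "other": {"demographic": 0.2, "other": 0.8},
}


def viterbi(emissions, start=START, trans=TRANS, states=STATES):
    """Return the most likely state sequence.

    `emissions` is a list with one dict per sentence, mapping each state to
    the (assumed) classifier-derived probability of that sentence.
    """
    if not emissions:
        return []
    # Work in log space to avoid underflow on long sequences.
    scores = {s: math.log(start[s]) + math.log(emissions[0][s]) for s in states}
    backpointers = []
    for emission in emissions[1:]:
        new_scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: scores[p] + math.log(trans[p][s]))
            new_scores[s] = (
                scores[prev] + math.log(trans[prev][s]) + math.log(emission[s])
            )
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    sentence_scores = [
        {"demographic": 0.1, "other": 0.9},
        {"demographic": 0.9, "other": 0.1},
        {"demographic": 0.45, "other": 0.55},
    ]
    print(viterbi(sentence_scores))
```

In the example, the third sentence would be labelled "other" by the classifier alone, but the transition model favours staying in the "demographic" state, which illustrates how sequence context can override a weak per-sentence decision.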
13.
AMIA Annu Symp Proc ; : 443-7, 2007 Oct 11.
Article in English | MEDLINE | ID: mdl-18693875

ABSTRACT

Oncologists managing cancer patients use radiology imaging studies to evaluate changes in measurable cancer lesions. Currently, the textual radiology report summarizes the findings, but is disconnected from the primary image data. This makes it difficult for the physician to obtain a visual overview of the location and behavior of the disease. LesionViewer is a prototype software system designed to assist clinicians in comprehending and reviewing radiology imaging studies. The interface provides an Anatomical Summary View of the location of lesions identified in a series of studies, and direct navigation to the relevant primary image data. LesionViewer's Disease Summary View provides a temporal abstraction of the disease behavior between studies utilizing methods of the RECIST guideline. In a usability study, nine physicians used the system to accurately perform clinical tasks appropriate to the analysis of radiology reports and image data. All users reported they would use the system if available.


Subject(s)
Neoplasms/pathology , Radiographic Image Interpretation, Computer-Assisted , Algorithms , Attitude of Health Personnel , Clinical Competence , Humans , Neoplasms/diagnostic imaging , Radiology Information Systems , Software
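The Disease Summary View described above abstracts disease behavior between studies using RECIST methods. A simplified sketch of a RECIST-style target-lesion classification is given below, assuming the commonly cited thresholds: at least a 30% decrease in the sum of lesion diameters from baseline for partial response, and at least a 20% increase from the smallest recorded sum for progression. The function name and inputs are hypothetical, non-target lesions and later guideline refinements are ignored, and this is not LesionViewer's implementation.

```python
def classify_response(baseline_sum, nadir_sum, current_sum, new_lesions=False):
    """Classify target-lesion response from sums of lesion diameters (mm).

    Simplified, assumed rules:
      CR: all target lesions have disappeared (current sum is 0)
      PD: new lesions, or >= 20% increase over the smallest recorded sum
      PR: >= 30% decrease relative to the baseline sum
      SD: none of the above
    """
    if baseline_sum <= 0:
        raise ValueError("baseline_sum must be positive")
    if new_lesions:
        return "PD"
    if current_sum == 0:
        return "CR"
    # Compare with cross-multiplication to avoid dividing by a zero nadir.
    if current_sum > nadir_sum and 100 * current_sum >= 120 * nadir_sum:
        return "PD"
    if 100 * current_sum <= 70 * baseline_sum:
        return "PR"
    return "SD"


if __name__ == "__main__":
    print(classify_response(100, 100, 65))   # 35% decrease from baseline
    print(classify_response(100, 50, 62))    # 24% increase from nadir
    print(classify_response(100, 100, 90))   # neither threshold reached
```

Progression is checked before partial response because a sum can sit below 70% of baseline while still having grown by 20% or more from its smallest recorded value.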
14.
AMIA Annu Symp Proc ; : 175-9, 2006.
Article in English | MEDLINE | ID: mdl-17238326

ABSTRACT

Numerous health decision aids (HDAs) have been developed to increase the participation of patients in shared decision-making, but many have limited accessibility and narrow applicability in clinical care. In the Health e-Decision project, we address these limitations in our work on building general HDAs targeted for older adults. Our approach uses a decision-support software architecture that enables principled methods for HDAs. We have formalized a novel knowledge-based decision model (KBDM), using Protégé OWL, that developers and clinicians can instantiate to tailor the components of the architecture for a particular health problem. In this paper, we present the methods used in the architecture and the knowledgebase design; the latter encompasses influence-diagram concepts, specific health problems, health outcome states, and probabilistic relationships. We discuss how this approach improves upon prior HDA methods. We also show that our use of computer-interpretable knowledge provides a structured, customizable means of enabling patient-centered decision support.


Subject(s)
Decision Support Techniques , Knowledge Bases , Patient Participation , Software , Aged , Decision Making, Computer-Assisted , Humans , Medical Records Systems, Computerized
15.
Nucleic Acids Res ; 33(2): 605-15, 2005.
Article in English | MEDLINE | ID: mdl-15684410

ABSTRACT

Deciphering gene regulatory network architecture amounts to the identification of the regulators, conditions in which they act, genes they regulate, cis-acting motifs they bind, expression profiles they dictate and more complex relationships between alternative regulatory partnerships and alternative regulatory motifs that give rise to sub-modalities of expression profiles. The 'location data' in yeast is a comprehensive resource that provides transcription factor-DNA interaction information in vivo. Here, we provide two contributions: first, we developed means to assess the extent of noise in the location data, and consequently for extracting signals from it. Second, we couple signal extraction with better characterization of the genetic network architecture. We apply two methods for the detection of combinatorial associations between transcription factors (TFs), the integration of which provides a global map of combinatorial regulatory interactions. We discover the capacity of regulatory motifs and TF partnerships to dictate fine-tuned expression patterns of subsets of genes, which are clearly distinct from those displayed by most genes assigned to the same TF. Our findings provide carefully prioritized, high-quality assignments between regulators and regulated genes and as such should prove useful for experimental and computational biologists alike.


Subject(s)
Computational Biology/methods , DNA-Binding Proteins/metabolism , Gene Expression Regulation , Genomics/methods , Transcription Factors/metabolism , Binding Sites , DNA-Binding Proteins/analysis , Data Interpretation, Statistical , Fungal Proteins/metabolism , Gene Expression Profiling , Genome , Promoter Regions, Genetic , Regulatory Sequences, Nucleic Acid , Transcription Factors/analysis , Transcription, Genetic