1.
PLoS One ; 7(7): e40946, 2012.
Article in English | MEDLINE | ID: mdl-22911721

ABSTRACT

BACKGROUND: With the large amount of pharmacological and biological knowledge available in literature, finding novel drug indications for existing drugs using in silico approaches has become increasingly feasible. Typical literature-based approaches generate new hypotheses in the form of protein-protein interaction networks by linking concepts based on their co-occurrences within abstracts. However, this kind of approach tends to generate too many hypotheses, and identifying new drug indications from large networks can be a time-consuming process. METHODOLOGY: In this work, we developed a method that acquires the necessary facts from literature and knowledge bases, and identifies new drug indications through automated reasoning. This is achieved by encoding the molecular effects caused by drug-target interactions and links to various diseases and drug mechanisms as domain knowledge in AnsProlog, a declarative language that is useful for automated reasoning, including reasoning with incomplete information. Unlike other literature-based approaches, our approach is more fine-grained, especially in identifying indirect relationships for drug indications. CONCLUSION/SIGNIFICANCE: To evaluate the capability of our approach in inferring novel drug indications, we applied our method to 943 drugs from DrugBank and asked if any of these drugs have potential anti-cancer activities based on information on their targets and molecular interaction types alone. A total of 507 drugs were found to have the potential to be used for cancer treatments. Among the potential anti-cancer drugs, 67 out of 81 drugs (a recall of 82.7%) are indeed known cancer drugs. In addition, 144 out of 289 drugs (a recall of 49.8%) are non-cancer drugs that are currently tested in clinical trials for cancer treatments. These results suggest that our method is able to infer drug indications (original or alternative) based on their molecular targets and interactions alone and has the potential to discover novel drug indications for existing drugs.
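
The two recall figures quoted in this abstract follow directly from the reported counts. A minimal check in Python; the function name `recall` is our own and does not come from the paper:

```python
def recall(found: int, total: int) -> float:
    """Fraction of a reference set that the method recovered."""
    if total <= 0:
        raise ValueError("total must be positive")
    return found / total


# Counts as reported in the abstract.
known_cancer_drugs = recall(67, 81)    # known cancer drugs recovered
trial_drugs = recall(144, 289)         # non-cancer drugs in cancer trials

print(f"{known_cancer_drugs:.1%}")  # prints 82.7%
print(f"{trial_drugs:.1%}")         # prints 49.8%
```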


Subjects
Computational Biology/methods; Computer Simulation; Drug Discovery/methods; Artificial Intelligence; Databases, Factual; Genetic Association Studies; Humans; Protein Binding
2.
J Biomed Inform ; 45(5): 842-50, 2012 Oct.
Article in English | MEDLINE | ID: mdl-22564364

ABSTRACT

MOTIVATION: Genetic factors determine differences in pharmacokinetics, drug efficacy, and drug responses between individuals and sub-populations. Wrong dosages of drugs can lead to severe adverse drug reactions in individuals whose drug metabolism drastically differs from the "assumed average". Databases such as PharmGKB are excellent sources of pharmacogenetic information on enzymes, genetic variants, and drug response affected by changes in enzymatic activity. Here, we seek to aid researchers, database curators, and clinicians in their search for relevant information by automatically extracting these data from literature. APPROACH: We automatically populate a repository of information on genetic variants, relations to drugs, occurrence in sub-populations, and associations with disease. We mine textual data from PubMed abstracts to discover such genotype-phenotype associations, focusing on SNPs that can be associated with variations in drug response. The overall repository covers relations found between genes, variants, alleles, drugs, diseases, adverse drug reactions, populations, and allele frequencies. We cross-reference these data to EntrezGene, PharmGKB, PubChem, and others. RESULTS: The performance regarding entity recognition and relation extraction yields a precision of 90-92% for the major entity types (gene, drug, disease), and 76-84% for relations involving these types. Comparison of our repository to PharmGKB reveals a coverage of 93% of gene-drug associations in PharmGKB and 97% of the gene-variant mappings based on 180,000 PubMed abstracts. AVAILABILITY: http://bioai4core.fulton.asu.edu/snpshot.


Subjects
Data Mining/methods; Databases, Genetic; Disease/genetics; Pharmacogenetics/methods; Polymorphism, Single Nucleotide; Animals; Genetic Association Studies/methods; Humans; Knowledge Bases; Mice; PubMed; Rats
3.
Bioinformatics ; 26(18): i547-53, 2010 Sep 15.
Article in English | MEDLINE | ID: mdl-20823320

ABSTRACT

MOTIVATION: Identifying drug-drug interactions (DDIs) is a critical process in drug administration and drug development. Clinical support tools often provide comprehensive lists of DDIs, but they usually lack the supporting scientific evidence, and different tools can return inconsistent results. In this article, we propose a novel approach that integrates text mining and automated reasoning to derive DDIs. By extracting various facts about drug metabolism, the approach can recover not only the DDIs that are explicitly mentioned in text but also potential interactions that can be inferred by reasoning. RESULTS: Our approach was able to find several potential DDIs that are not present in DrugBank. We manually evaluated these interactions based on their supporting evidence, and our analysis revealed that 81.3% of these interactions are determined to be correct. This suggests that our approach can uncover potential DDIs with scientific evidence explaining the mechanism of the interactions.
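
The abstract does not spell out its reasoning rules. As a rough sketch of what inferring an interaction from metabolism facts could look like, the example below assumes a single rule: if one drug inhibits an enzyme that metabolizes another drug, the pair is a candidate interaction. The rule, the function name, and all drug and enzyme names are placeholders of ours, not data or logic taken from the paper.

```python
from itertools import product

# Hypothetical extracted facts; the names are placeholders, not data from the paper.
inhibits = {("drug_a", "enzyme_1"), ("drug_c", "enzyme_2")}
metabolized_by = {("drug_b", "enzyme_1"), ("drug_d", "enzyme_3")}


def infer_interactions(inhibits, metabolized_by):
    """Return (inhibitor, substrate, enzyme) triples that share an enzyme."""
    inferred = set()
    for (inhibitor, enz_i), (substrate, enz_m) in product(inhibits, metabolized_by):
        # Assumed rule: an inhibitor of the enzyme that clears another drug
        # is a candidate for interacting with that drug.
        if enz_i == enz_m and inhibitor != substrate:
            inferred.add((inhibitor, substrate, enz_i))
    return inferred


print(infer_interactions(inhibits, metabolized_by))
# prints {('drug_a', 'drug_b', 'enzyme_1')}
```

Each inferred triple keeps the shared enzyme, which mirrors the abstract's emphasis on reporting the evidence behind an interaction rather than the interaction alone.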


Subjects
Data Mining; Drug Interactions; Databases, Factual; Enzymes/metabolism; Feasibility Studies; Humans; Logic; Natural Language Processing; Pharmaceutical Preparations/administration & dosage; Pharmaceutical Preparations/metabolism
4.
Article in English | MEDLINE | ID: mdl-20498514

ABSTRACT

Proteins and their interactions govern virtually all cellular processes, such as regulation, signaling, metabolism, and structure. Most experimental findings pertaining to such interactions are discussed in research papers, which, in turn, get curated by protein interaction databases. Authors, editors, and publishers benefit from efforts to alleviate the tasks of searching for relevant papers, evidence for physical interactions, and proper identifiers for each protein involved. The BioCreative II.5 community challenge addressed these tasks in a competition-style assessment to evaluate and compare different methodologies, to raise awareness of the increasing accuracy of automated methods, and to guide future implementations. In this paper, we present our approaches for protein-named entity recognition, including normalization, and for extraction of protein-protein interactions from full text. Our overall goal is to identify efficient individual components, and we compare various compositions to handle a single full-text article in between 10 seconds and 2 minutes. We propose strategies to transfer document-level annotations to the sentence level, which allows for the creation of a more fine-grained training corpus; we use this corpus to automatically derive around 5,000 patterns. We rank sentences by relevance to the task of finding novel interactions with physical evidence, using a sentence classifier built from this training corpus. Heuristics for paraphrasing sentences help to further remove unnecessary information that might interfere with patterns, such as additional adjectives, clauses, or bracketed expressions. In BioCreative II.5, we achieved an f-score of 22 percent for finding protein interactions, and 43 percent for mapping proteins to UniProt IDs; disregarding species, f-scores are 30 percent and 55 percent, respectively. On average, our best-performing setup required around 2 minutes per full text. All data and pattern sets as well as Java classes that extend third-party software are available as supplementary information (see Appendix).
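
The f-scores reported in this abstract combine precision and recall into one number. The abstract does not give the underlying precision and recall values, so the inputs in the sketch below are made up purely to show the arithmetic; the function name `f_score` is ours.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-measure; with beta = 1 this is the harmonic mean of precision and recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid dividing by zero when nothing was found
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative inputs only, not values from the paper.
print(f_score(0.25, 0.75))  # prints 0.375
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a system with high precision but low recall (or the reverse) still gets a modest f-score.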


Subjects
Computational Biology/methods; Data Mining/methods; Databases, Genetic; Protein Interaction Mapping/methods; Databases, Protein; Natural Language Processing; Periodicals as Topic; Societies, Scientific
5.
Pac Symp Biocomput ; : 465-76, 2010.
Article in English | MEDLINE | ID: mdl-19908398

ABSTRACT

Biological pathways are seen as highly critical in our understanding of the mechanism of biological functions. To collect information about pathways, manual curation has been the most popular method. However, pathway annotation is regarded as heavily time-consuming, as it requires expert curators to identify and collect information from different sources. Even with the pieces of biological facts and interactions collected from various sources, curators have to apply their biological knowledge to arrange the acquired interactions in such a way that together they perform a common biological function as a pathway. In this paper, we propose a novel approach for automated pathway synthesis that acquires facts from hand-curated knowledge bases. To compensate for the incompleteness of the knowledge bases, our approach also obtains facts through automated extraction from Medline abstracts. An essential component of our approach is to apply logical reasoning to the acquired facts based on the biological knowledge about pathways. By representing such biological knowledge, the reasoning component is capable of assigning the ordering to the acquired facts and interactions that is necessary for pathway synthesis. We demonstrate the feasibility of our approach with the development of a system that synthesizes pharmacokinetic pathways. We evaluate our approach by reconstructing the existing pharmacokinetic pathways available in PharmGKB. Our results show that our approach is capable not only of synthesizing these pathways but also of uncovering information that is not available in the manually annotated pathways.
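
One simple way to picture the ordering step described in this abstract is a topological sort over precedence constraints. The paper uses logical reasoning for this; the sketch below swaps in Python's standard `graphlib.TopologicalSorter` instead, and the step names and constraints are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical precedence facts: each key must come after every step in its set.
comes_after = {
    "metabolism_by_enzyme": {"uptake_into_liver"},
    "export_of_metabolite": {"metabolism_by_enzyme"},
    "uptake_into_liver": set(),
}

# static_order() yields the steps so that every prerequisite appears first.
order = list(TopologicalSorter(comes_after).static_order())
print(order)
# prints ['uptake_into_liver', 'metabolism_by_enzyme', 'export_of_metabolite']
```

A plain topological sort only handles "A before B" constraints; a rule-based reasoner can also encode the biological conditions under which such a constraint holds, which is presumably why the authors chose logical reasoning.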


Subjects
Pharmacokinetics; Artificial Intelligence; Carbamates/pharmacokinetics; Computational Biology; Humans; Knowledge Bases; MEDLINE; Metabolic Networks and Pathways; Models, Biological; Piperidines/pharmacokinetics; Pravastatin/pharmacokinetics; Synthetic Biology
6.
Pac Symp Biocomput ; : 87-98, 2009.
Article in English | MEDLINE | ID: mdl-19209697

ABSTRACT

Curated biological knowledge of interactions and pathways is largely available from various databases, and network synthesis is a popular method to gain insight into the data. However, such data from curated databases present a single view of the knowledge to biologists, and may not be suited to researchers' specific needs. On the other hand, Medline abstracts are publicly accessible and encode the necessary information to synthesize different kinds of biological networks. In this paper, we propose a new paradigm in synthesizing biomolecular networks by allowing biologists to create their own networks through queries to a specialized database of Medline abstracts. With this approach, users can specify precisely what kind of information they want in the resulting networks. We demonstrate the feasibility of our approach in the synthesis of gene-drug, gene-disease and protein-protein interaction networks. We show that our approach is capable of synthesizing these networks with high precision and even finds relations that have yet to be curated in public databases. In addition, we demonstrate a scenario of recovering a drug-related pathway using our approach.


Subjects
MEDLINE; Models, Biological; Biometry; Databases, Factual; Disease/genetics; Humans; Natural Language Processing; Pharmacogenetics/statistics & numerical data; Protein Interaction Mapping/statistics & numerical data
7.
J Biomed Inform ; 42(1): 74-81, 2009 Feb.
Article in English | MEDLINE | ID: mdl-18595779

ABSTRACT

We propose a novel semi-supervised clustering method called GO Fuzzy c-means, which enables the simultaneous use of biological knowledge and gene expression data in a probabilistic clustering algorithm. Our method is based on the fuzzy c-means clustering algorithm and utilizes the Gene Ontology annotations as prior knowledge to guide the process of grouping functionally related genes. Unlike traditional clustering methods, our method is capable of assigning genes to multiple clusters, which is a more appropriate representation of the behavior of genes. Two datasets of yeast (Saccharomyces cerevisiae) expression profiles were used to compare our method with other state-of-the-art clustering methods. Our experiments show that our method can produce far more biologically meaningful clusters even with the use of a small percentage of Gene Ontology annotations. In addition, our experiments further indicate that the utilization of prior knowledge in our method can predict gene functions effectively. The source code is freely available at http://sysbio.fulton.asu.edu/gofuzzy/.
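
The ability to assign one gene to several clusters comes from the membership step of fuzzy c-means. The sketch below shows only the standard, unsupervised membership update; it is not the GO-guided variant, whose details the abstract does not give. The function name and the example coordinates are ours.

```python
import math


def fcm_memberships(point, centers, m=2.0):
    """Standard fuzzy c-means membership of one point in each cluster.

    m > 1 is the fuzzifier; values close to 1 give nearly hard assignments.
    """
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on a center belongs fully to that cluster
    # (shared equally if several centers coincide with it).
    zeros = [i for i, d in enumerate(dists) if d == 0.0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0 for i in range(len(centers))]
    power = 2.0 / (m - 1.0)
    # u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1))
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]


# Illustrative 2-D point and two cluster centers (made-up coordinates).
u = fcm_memberships((1.0, 0.0), [(0.0, 0.0), (4.0, 0.0)])
print([round(x, 3) for x in u])  # prints [0.9, 0.1]
```

The memberships always sum to 1, so a point between two centers is shared between them rather than forced into one cluster.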


Subjects
Cluster Analysis; Fuzzy Logic; Gene Expression Profiling/methods; Genes/physiology; Software; Algorithms; Computational Biology; Databases, Genetic; Genes, Fungal/physiology; Internet; Normal Distribution; Oligonucleotide Array Sequence Analysis; Reproducibility of Results; Saccharomyces cerevisiae/genetics
8.
Pac Symp Biocomput ; : 28-39, 2007.
Article in English | MEDLINE | ID: mdl-17992743

ABSTRACT

MOTIVATION: The promise of disease-related discoveries and advances in the post-genome era has yet to be fully realized, with many opportunities for discovery hiding in the millions of biomedical papers published since. Public databases give access to data extracted from the literature by teams of experts, but their coverage is often limited and lags behind recent discoveries. We present a computational method that combines data extracted from the literature with data from curated sources in order to uncover possible gene-disease relationships that are not directly stated or were missed by the initial mining. METHOD: An initial set of genes and proteins is obtained from gene-disease relationships extracted from PubMed abstracts using natural language processing. Interactions involving the corresponding proteins are similarly extracted and integrated with interactions from curated databases (such as BIND and DIP), assigning a confidence measure to each interaction depending on its source. The augmented list of genes and gene products is then ranked by combining two scores: one that reflects the strength of the relationship with the initial set of genes and incorporates user-defined weights, and another that reflects the importance of the gene in maintaining the connectivity of the network. We applied the method to atherosclerosis to assess its effectiveness. RESULTS: Top-ranked proteins from the method are related to atherosclerosis with accuracy between 0.85 and 1.00 for the top 20 and between 0.64 and 0.80 for the top 90 if duplicates are ignored, with 45% of the top 20 and 75% of the top 90 derived by the method, not extracted from text. Thus, though the initial gene set and interactions were automatically extracted from text (and subject to the imprecision of automatic extraction), their use for further hypothesis generation is valuable given adequate computational analysis.


Subjects
Protein Interaction Mapping/statistics & numerical data; Atherosclerosis/etiology; Atherosclerosis/genetics; Computational Biology; Databases, Genetic; Genomics/statistics & numerical data; Humans; Natural Language Processing; Proteomics/statistics & numerical data; PubMed
9.
Bioinformatics ; 21 Suppl 2: ii213-9, 2005 Sep 01.
Article in English | MEDLINE | ID: mdl-16204106

ABSTRACT

MOTIVATION: The current knowledge about biochemical networks is largely incomplete. Thus biologists constantly need to revise or extend existing knowledge. The revision and/or extension are first formulated as theoretical hypotheses, then verified experimentally. Recently, biological data have been produced in great volumes and in diverse formats. It is a major challenge for biologists to process these data to reason about hypotheses. Many computer-aided systems have been developed to assist biologists in undertaking this challenge. The majority of the systems help in finding patterns in data and leave the reasoning to biologists. A few systems have tried to automate the reasoning process of hypothesis formation. These systems generate hypotheses from a knowledge base and given observations. A main drawback of these knowledge-based systems is the knowledge representation formalisms they use. These formalisms are mostly monotonic and are now known to be not well suited to knowledge representation, especially in dealing with the inherently incomplete knowledge about biochemical networks. RESULTS: We present a knowledge-based framework for hypothesis formation for biochemical networks. The framework has been implemented by extending BioSigNet-RR, a knowledge-based system that supports elaboration-tolerant representation and non-monotonic reasoning. Features of the extended system are illustrated by a case study of the p53 signal network. AVAILABILITY: http://www.biosignet.org


Subjects
Algorithms; Artificial Intelligence; Biochemistry/methods; Models, Biological; Proteome/metabolism; Signal Transduction/physiology; Computer Simulation; Models, Chemical; Proteome/chemistry