Search | BVS Regional Portal

Prasad, Rashmi; McRoy, Susan; Frid, Nadya; Joshi, Aravind; Yu, Hong.

BMC Bioinformatics ; 12: 188, 2011 May 23.

Article in English | MEDLINE | ID: mdl-21605399

ABSTRACT

BACKGROUND: Identification of discourse relations, such as causal and contrastive relations, between situations mentioned in text is an important task for biomedical text-mining. A biomedical text corpus annotated with discourse relations would be very useful for developing and evaluating methods for biomedical discourse processing. However, little effort has been made to develop such an annotated resource. RESULTS: We have developed the Biomedical Discourse Relation Bank (BioDRB), in which we have annotated explicit and implicit discourse relations in 24 open-access full-text biomedical articles from the GENIA corpus. Guidelines for the annotation were adapted from the Penn Discourse TreeBank (PDTB), which has discourse relations annotated over open-domain news articles. We introduced new conventions and modifications to the sense classification. We report reliable inter-annotator agreement of over 80% for all sub-tasks. Experiments for identifying the sense of explicit discourse connectives show the connective itself as a highly reliable indicator for coarse sense classification (accuracy 90.9% and F1 score 0.89). These results are comparable to results obtained with the same classifier on the PDTB data. With more refined sense classification, there is degradation in performance (accuracy 69.2% and F1 score 0.28), mainly due to sparsity in the data. The size of the corpus was found to be sufficient for identifying the sense of explicit connectives, with classifier performance stabilizing at about 1900 training instances. Finally, the classifier performs poorly when trained on PDTB and tested on BioDRB (accuracy 54.5% and F1 score 0.57). CONCLUSION: Our work shows that discourse relations can be reliably annotated in biomedical text. Coarse sense disambiguation of explicit connectives can be done with high reliability by using just the connective as a feature, but more refined sense classification requires either richer features or more annotated data. 
The poor performance of a classifier trained in the open domain and tested in the biomedical domain suggests significant differences in the semantic usage of connectives across these domains, and provides robust evidence for a biomedical sublanguage for discourse and the need to develop a specialized biomedical discourse annotated corpus. The results of our cross-domain experiments are consistent with related work on identifying connectives in BioDRB.
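The abstract reports that the connective string alone is a strong feature for coarse sense classification. The following is a minimal sketch of such a connective-only baseline, assuming a hypothetical list of (connective, sense) training pairs. It predicts the sense most often seen with each connective and falls back to the overall majority sense for unseen connectives. It is an illustration, not the classifier used in the study, and the toy data (with labels borrowed from the PDTB top-level classes) is invented.

```python
from collections import Counter, defaultdict

def train_connective_classifier(pairs):
    """Learn the most frequent sense for each connective.

    pairs: iterable of (connective, sense) tuples (hypothetical toy format).
    Returns (per-connective majority table, global majority sense).
    """
    by_connective = defaultdict(Counter)
    overall = Counter()
    for connective, sense in pairs:
        key = connective.lower()
        by_connective[key][sense] += 1
        overall[sense] += 1
    table = {c: counts.most_common(1)[0][0] for c, counts in by_connective.items()}
    return table, overall.most_common(1)[0][0]

def predict(model, connective):
    """Predict a sense using only the connective string as the feature."""
    table, fallback = model
    return table.get(connective.lower(), fallback)

def accuracy(model, pairs):
    """Fraction of (connective, gold sense) pairs predicted correctly."""
    pairs = list(pairs)
    correct = sum(predict(model, c) == s for c, s in pairs)
    return correct / len(pairs)

# Invented toy examples (not BioDRB data).
train_pairs = [
    ("because", "Contingency"), ("because", "Contingency"),
    ("however", "Comparison"), ("but", "Comparison"),
    ("but", "Expansion"), ("but", "Comparison"),
    ("and", "Expansion"), ("then", "Temporal"),
]
test_pairs = [("because", "Contingency"), ("but", "Comparison"), ("also", "Expansion")]

model = train_connective_classifier(train_pairs)
print(predict(model, "But"))  # Comparison
print(accuracy(model, test_pairs))
```

The unseen connective "also" is misclassified because the baseline can only fall back to the global majority sense, which mirrors the abstract's point that finer distinctions need richer features or more data.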

Subjects

Computational Biology/methods , Data Mining , Software , Humans , Natural Language Processing , Semantics

Statistical mechanics of helix bundles using a dynamic programming approach.

Lucas, Adam; Huang, Liang; Joshi, Aravind; Dill, Ken A.

J Am Chem Soc ; 129(14): 4272-81, 2007 Apr 11.

Article in English | MEDLINE | ID: mdl-17362002

ABSTRACT

Despite much study, biomolecule folding cooperativity is not well understood. There are quantitative models for helix-coil transitions and for coil-to-globule transitions, but no accurate models yet treat both chain collapse and secondary structure formation together. We develop here a dynamic programming approach to statistical mechanical partition functions of foldamer chain molecules. We call it the ascending levels model. We apply it to helix-coil and helix-bundle folding and cooperativity. For 14- to 50-mer Baldwin peptides, the model gives good predictions for the heat capacity and helicity versus temperature and urea. The model also gives good fits for the denaturation of Oas's three-helix bundle B domain of protein A (F13W*) and synthetic protein α3C by temperature and guanidine. The model predicts the conformational distributions. It shows that these proteins fold with transitions that are two-state, although the transitions in the Baldwin helices are nearly higher order. The model shows that the recently developed three-helix bundle polypeptoids of Lee et al. fold anti-cooperatively, with a predicted value of ΔHvH/ΔHcal = 0.72. The model also predicts that two-helix bundles are unstable in proteins but stable in peptoids. Our dynamic programming approach provides a general way to explore cooperativity in complex foldable polymers.
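The ratio ΔHvH/ΔHcal quoted above is a standard calorimetric test of cooperativity: it is 1 for an ideal two-state transition and drops below 1 for less cooperative ones. The following minimal worked sketch assumes an ideal two-state transition with a temperature-independent enthalpy; the ΔH and Tm values are hypothetical, not taken from the paper. It computes the excess heat capacity, takes ΔHcal as its area and ΔHvH = 2·Tmax·sqrt(R·Cp,max) from the peak, and forms the ratio. It does not implement the ascending levels model.

```python
import math

R = 8.314  # gas constant, J/(mol K)

def fraction_unfolded(T, dH, Tm):
    """Two-state fraction unfolded, assuming a temperature-independent enthalpy dH."""
    K = math.exp(-dH / R * (1.0 / T - 1.0 / Tm))
    return K / (1.0 + K)

def excess_cp(T, dH, Tm):
    """Excess heat capacity d<H>/dT = dH^2 f (1 - f) / (R T^2) for a two-state model."""
    f = fraction_unfolded(T, dH, Tm)
    return dH * dH * f * (1.0 - f) / (R * T * T)

def enthalpy_ratio(cp, temps):
    """Return dH_vH / dH_cal from an excess heat-capacity curve sampled at temps."""
    values = [cp(T) for T in temps]
    # Calorimetric enthalpy: area under the excess heat-capacity curve (trapezoid rule).
    dH_cal = sum(0.5 * (values[i] + values[i + 1]) * (temps[i + 1] - temps[i])
                 for i in range(len(temps) - 1))
    # van't Hoff enthalpy from the peak height and the temperature of the peak.
    i_max = max(range(len(values)), key=values.__getitem__)
    dH_vH = 2.0 * temps[i_max] * math.sqrt(R * values[i_max])
    return dH_vH / dH_cal

# Hypothetical two-state protein: dH = 200 kJ/mol, Tm = 330 K, scanned 280-380 K.
temps = [280.0 + 0.01 * i for i in range(10001)]
ratio = enthalpy_ratio(lambda T: excess_cp(T, 200e3, 330.0), temps)
print(round(ratio, 3))
```

For this two-state curve the ratio comes out very close to 1. Feeding in a broader curve, for example the sum of two independent transitions with different Tm values, gives a ratio well below 1, which is the sense in which values such as 0.72 signal a departure from two-state behavior.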

Subjects

Proteins/chemistry , Calorimetry , Computer Simulation , Molecular Models , Protein Denaturation , Protein Folding , Protein Secondary Structure , Protein Tertiary Structure , Proteins/metabolism , Temperature , Thermodynamics

Routes are trees: the parsing perspective on protein folding.

Hockenmaier, Julia; Joshi, Aravind K; Dill, Ken A.

Proteins ; 66(1): 1-15, 2007 Jan 01.

Article in English | MEDLINE | ID: mdl-17063473

ABSTRACT

An important puzzle in structural biology is the question of how proteins are able to fold so quickly into their unique native structures. There is much evidence that protein folding is hierarchic. In that case, folding routes are not linear, but have a tree structure. Trees are commonly used to represent the grammatical structure of natural language sentences, and chart parsing algorithms efficiently search the space of all possible trees for a given input string. Here we show that one such method, the CKY algorithm, can be useful both for providing novel insight into the physical protein folding process, and for computational protein structure prediction. As proof of concept, we apply this algorithm to the HP lattice model of proteins. Our algorithm identifies all direct folding route trees to the native state and allows us to construct a simple model of the folding process. Despite its simplicity, our model provides an account for the fact that folding rates depend only on the topology of the native state but not on sequence composition.
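For readers coming from the biology side, the following is a minimal sketch of the CKY algorithm on a toy grammar in Chomsky normal form. It fills a chart over all spans of the input and counts the distinct parse trees for the whole string, which shows how one chart compactly represents many trees. The grammar and sentence are hypothetical; this is the generic textbook algorithm, not the authors' adaptation to the HP lattice model.

```python
from collections import defaultdict

def cky_count(words, lexical, binary, start="S"):
    """Count parse trees for `words` under a grammar in Chomsky normal form.

    lexical: dict word -> list of nonterminals A with a rule A -> word
    binary:  list of (A, B, C) tuples for rules A -> B C
    Returns the number of distinct trees rooted in `start` spanning all words.
    """
    n = len(words)
    # chart[i][j][A] = number of ways nonterminal A derives words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for A in lexical.get(w, []):
            chart[i][i + 1][A] += 1
    for length in range(2, n + 1):            # span length
        for i in range(0, n - length + 1):    # span start
            j = i + length
            for k in range(i + 1, j):         # split point
                left, right = chart[i][k], chart[k][j]
                for A, B, C in binary:
                    if left[B] and right[C]:
                        chart[i][j][A] += left[B] * right[C]
    return chart[0][n][start]

# Hypothetical toy grammar with the classic prepositional-phrase ambiguity.
lexical = {"I": ["NP"], "saw": ["V"], "the": ["Det"], "man": ["N"],
           "with": ["P"], "telescope": ["N"]}
binary = [("S", "NP", "VP"), ("VP", "V", "NP"), ("VP", "VP", "PP"),
          ("NP", "NP", "PP"), ("NP", "Det", "N"), ("PP", "P", "NP")]
sentence = "I saw the man with the telescope".split()
print(cky_count(sentence, lexical, binary))  # 2: the PP attaches to the verb or the noun
```

The chart is filled in O(n³) time in the sentence length even though the number of trees can grow exponentially, which is the efficiency property the abstract relies on when searching the space of folding route trees.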

Subjects

Algorithms , Protein Conformation , Computational Biology , Biological Models , Molecular Models , Protein Folding , Proteins/chemistry

Grammatical representations of macromolecular structure.

Chiang, David; Joshi, Aravind K; Searls, David B.

J Comput Biol ; 13(5): 1077-100, 2006 Jun.

Article in English | MEDLINE | ID: mdl-16796552

ABSTRACT

Since the first application of context-free grammars to RNA secondary structures in 1988, many researchers have used both ad hoc and formal methods from computational linguistics to model RNA and protein structure. We show how nearly all of these methods are based on the same core principles and can be converted into equivalent approaches in the framework of tree-adjoining grammars and related formalisms. We also propose some new approaches that extend these core principles in novel ways.
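The connection between context-free grammars and RNA secondary structure can be seen in the classic Nussinov base-pair maximization algorithm. Its recursion mirrors a simple grammar: a base is either left unpaired or pairs with a later base, enclosing one substructure and followed by another. The following is a minimal sketch, assuming Watson-Crick plus G-U wobble pairs and a minimum hairpin loop of 3 unpaired bases. These choices and the example sequence are illustrative assumptions, and the algorithm is a textbook example rather than one of the tree-adjoining grammar methods discussed in the paper.

```python
def nussinov(seq, min_loop=3):
    """Maximum number of nested base pairs in an RNA sequence (Nussinov algorithm).

    The recursion mirrors a context-free grammar for secondary structure:
    base i is left unpaired, or pairs with a later base k, enclosing the
    substructure i+1..k-1 and followed by an independent substructure k+1..j.
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] = max pairs within seq[i..j] (inclusive); spans too short to pair stay 0
    best = [[0] * n for _ in range(n)]
    for length in range(min_loop + 2, n + 1):      # shortest span that can hold a pair
        for i in range(0, n - length + 1):
            j = i + length - 1
            score = best[i + 1][j]                  # rule 1: base i unpaired
            for k in range(i + min_loop + 1, j + 1):
                if (seq[i], seq[k]) in pairs:       # rule 2: base i pairs with base k
                    inside = best[i + 1][k - 1]
                    after = best[k + 1][j] if k < j else 0
                    score = max(score, inside + 1 + after)
            best[i][j] = score
    return best[0][n - 1]

print(nussinov("GGGAAAUCC"))  # 3: G-C, G-C and G-U stacked around an AAA loop
```

Like CKY, this is a chart-filling dynamic program over subsequences, and it runs in O(n³) time.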

Subjects

Algorithms , Chemical Models , Nucleic Acid Conformation , Protein Secondary Structure , Protein Sequence Analysis , RNA Sequence Analysis , Computational Biology

A grammatical theory for the conformational changes of simple helix bundles.

Chiang, David; Joshi, Aravind K; Dill, Ken A.

J Comput Biol ; 13(1): 21-42, 2006.

Article in English | MEDLINE | ID: mdl-16472020

ABSTRACT

Polymers, including biomolecules such as proteins, have two particularly important types of single-molecule transitions: a helix-coil transition, driven by interactions that are local in the chain, and a collapse transition, driven by nonlocal interactions. A long-standing challenge of polymer statistical mechanics has been to deal with both types of transition in a single theoretical framework. The simplest paradigmatic problem would be a theory of helix-bundle folding. Here, we show how the machinery of formal grammars, originally developed in the context of linguistic analysis and programming-language compilation, provides a simple and general way to combine the Zimm-Bragg model of alpha-helices with the model of Chen and Dill for nonlocal interactions in antiparallel polymeric systems. We use a well-known construction in the theory of formal grammars to give the statistical mechanical partition function for two-helix bundles. Predictions are shown to be quite good in comparison to exact enumerations within a lattice model.
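The helix-coil half of this framework can be illustrated on its own. The following is a minimal sketch of the Zimm-Bragg partition function for a single chain, computed by a left-to-right recursion that is equivalent to the model's 2×2 transfer matrix. It also returns the mean helicity. The boundary convention (each helical residue weighs s, each helical segment pays one nucleation factor σ, coil residues weigh 1) and the parameter values in the usage loop are assumptions. The sketch omits both the Chen-Dill nonlocal interactions and the grammar construction that the paper uses for two-helix bundles.

```python
def zimm_bragg(n, s, sigma):
    """Partition function and mean helicity of an n-residue chain (Zimm-Bragg model).

    Assumed convention: each helical residue has weight s, each helical segment
    pays an extra nucleation factor sigma, and coil residues have weight 1.
    Returns (Z, theta), where theta is the mean fraction of helical residues.
    """
    if n < 1:
        raise ValueError("chain must have at least one residue")
    # z_h / z_c: summed weights of configurations of the first i residues
    # that end in a helical (h) or coil (c) residue.
    z_h, z_c = sigma * s, 1.0
    # m_h / m_c: the same sums with each weight multiplied by its helical-residue count.
    m_h, m_c = sigma * s, 0.0
    for _ in range(1, n):
        new_z_h = s * z_h + sigma * s * z_c   # extend a helix, or nucleate a new one
        new_z_c = z_h + z_c                   # append a coil residue
        new_m_h = s * (m_h + z_h) + sigma * s * (m_c + z_c)
        new_m_c = m_h + m_c
        z_h, z_c, m_h, m_c = new_z_h, new_z_c, new_m_h, new_m_c
    Z = z_h + z_c
    return Z, (m_h + m_c) / (n * Z)

# Helicity of a hypothetical 20-residue chain as the propagation weight s increases.
for s in (0.8, 1.0, 1.2, 1.5):
    print(s, round(zimm_bragg(20, s, 1e-3)[1], 3))
```

A small σ makes helix nucleation costly, so helicity switches on sharply as s passes through 1; this local cooperativity is what the paper combines with nonlocal collapse.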

Subjects

Computational Biology , Molecular Models , Nucleic Acid Conformation , Protein Secondary Structure , Animals , Base Sequence , Humans , Protein Secondary Structure/genetics
