Search | BVS Regional Portal

1.

Missing gene identification using functional coherence scores.

Chitale, Meghana; Khan, Ishita K; Kihara, Daisuke.

Sci Rep ; 6: 31725, 2016 Aug 24.

Article in English | MEDLINE | ID: mdl-27552989

ABSTRACT

Reconstructing metabolic and signaling pathways is an effective way of interpreting a genome sequence. A challenge in pathway reconstruction is that genes in a pathway often cannot be easily found, reflecting the currently imperfect information about the target organism. In this work, we developed a new method for finding missing genes, which integrates multiple features, including gene expression, phylogenetic profile, and function association scores. In particular, to consider the function association between candidate genes and the proteins neighboring the target missing gene in the network, we used the Co-occurrence Association Score (CAS) and the PubMed Association Score (PAS), which are designed to capture the functional coherence of proteins. We showed that adding CAS and PAS substantially improves the accuracy of identifying missing genes in the yeast enzyme-enzyme network compared with using only the conventional features, gene expression and phylogenetic profile. Finally, it was also demonstrated that the accuracy improves by considering indirect neighbors of the target enzyme position in the network using a proper network-topology-based weighting scheme.

Subjects

Computational Biology/methods , Gene Expression Profiling , Gene Expression Regulation, Fungal , Gene Regulatory Networks , Genomics/methods , Saccharomyces cerevisiae/genetics , Algorithms , Chromosome Mapping , Computer Simulation , Enzymes/chemistry , Fungal Proteins/chemistry , Models, Statistical , Phylogeny , Probability , Reproducibility of Results , Saccharomyces cerevisiae/enzymology
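The abstract describes scoring a candidate gene by its function association with the network neighbors of the missing-gene position, with indirect neighbors included through a topology-based weighting. Below is a minimal Python sketch of that idea. The geometric decay `decay ** (hops - 1)`, the function names, and the caller-supplied `assoc` callback (standing in for CAS or PAS) are all assumptions made for illustration. They are not the published weighting scheme or feature integration.

```python
from collections import deque
from typing import Callable, Dict, Hashable, Iterable

Graph = Dict[Hashable, Iterable[Hashable]]  # node -> adjacent nodes

def hop_distances(graph: Graph, source: Hashable, max_hops: int) -> Dict[Hashable, int]:
    """Breadth-first search: hop count to every node within max_hops of source (source excluded)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    del dist[source]
    return dist

def candidate_score(graph: Graph, target: Hashable, candidate: Hashable,
                    assoc: Callable[[Hashable, Hashable], float],
                    max_hops: int = 2, decay: float = 0.5) -> float:
    """Hypothetical score: association of the candidate with each neighbor of the target
    position, where a neighbor h hops away is down-weighted by decay ** (h - 1)."""
    total = 0.0
    for node, hops in hop_distances(graph, target, max_hops).items():
        total += decay ** (hops - 1) * assoc(candidate, node)
    return total

# Toy network around the missing-gene position "T" (hypothetical data).
network = {"T": ["A", "B"], "A": ["T", "C"], "B": ["T"], "C": ["A"]}
scores = {("g1", "A"): 1.0, ("g1", "B"): 2.0, ("g1", "C"): 4.0}
print(candidate_score(network, "T", "g1", lambda g, n: scores.get((g, n), 0.0)))  # 5.0
```

Here the direct neighbors A and B count fully. The indirect neighbor C counts at half weight. Setting `max_hops=1` reduces the score to direct neighbors only.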

2.

PFP/ESG: automated protein function prediction servers enhanced with Gene Ontology visualization tool.

Khan, Ishita K; Wei, Qing; Chitale, Meghana; Kihara, Daisuke.

Bioinformatics ; 31(2): 271-2, 2015 Jan 15.

Article in English | MEDLINE | ID: mdl-25273111

ABSTRACT

UNLABELLED: Protein function prediction (PFP) is an automated function prediction method that predicts Gene Ontology (GO) annotations for a protein sequence using distantly related sequences and contextual associations of GO terms. Extended similarity group (ESG) is another GO prediction algorithm that makes predictions based on iterative sequence database searches. Here, we provide interactive web servers for the PFP and ESG algorithms that are equipped with an effective visualization of the GO predictions in a hierarchical topology. AVAILABILITY: PFP/ESG servers are freely available at http://kiharalab.org/web/pfp.php and http://kiharalab.org/web/esg.php, or access both at http://kiharalab.org/pfp_esg.php. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Algorithms , Computational Biology/methods , Computer Graphics , Gene Ontology , Molecular Sequence Annotation , Proteins/metabolism , Sequence Analysis, Protein/methods , Databases, Protein , Humans

3.

In-depth performance evaluation of PFP and ESG sequence-based function prediction methods in CAFA 2011 experiment.

Chitale, Meghana; Khan, Ishita K; Kihara, Daisuke.

BMC Bioinformatics ; 14 Suppl 3: S2, 2013.

Article in English | MEDLINE | ID: mdl-23514353

ABSTRACT

BACKGROUND: Many Automatic Function Prediction (AFP) methods have been developed to cope with the rapidly growing number of gene sequences available from high-throughput sequencing experiments. To support the development of AFP methods, it is essential to have community-wide experiments for evaluating the performance of existing AFP methods. Critical Assessment of Function Annotation (CAFA) is one such community experiment. The meeting of CAFA was held as a Special Interest Group (SIG) meeting at the Intelligent Systems in Molecular Biology (ISMB) conference in 2011. Here, we perform a detailed analysis of two sequence-based function prediction methods, PFP and ESG, which were developed in our lab, using the predictions submitted to CAFA. RESULTS: We evaluate PFP and ESG using four different measures in comparison with BLAST, Prior, and GOtcha. In addition to the predictions submitted to CAFA, we further investigate the performance of a different scoring function for rank-ordering PFP predictions, as well as PFP/ESG predictions enriched with Priors, which simply add frequently occurring Gene Ontology terms to the predictions. Prediction accuracies of each method were also evaluated separately for different functional categories. Successful and unsuccessful predictions by PFP and ESG are also discussed in comparison with BLAST. CONCLUSION: The in-depth analysis discussed here will complement the overall assessment by the CAFA organizers. Since PFP and ESG are based on sequence database search results, our analyses are not only useful for PFP and ESG users but also shed light on the relationship between the sequence similarity space and the functions that can be inferred from sequences.

Subjects

Proteins/physiology , Sequence Analysis, Protein , Algorithms , Computational Biology/methods , Databases, Protein , Proteins/genetics

4.

A large-scale evaluation of computational protein function prediction.

Radivojac, Predrag; Clark, Wyatt T; Oron, Tal Ronnen; Schnoes, Alexandra M; Wittkop, Tobias; Sokolov, Artem; Graim, Kiley; Funk, Christopher; Verspoor, Karin; Ben-Hur, Asa; Pandey, Gaurav; Yunes, Jeffrey M; Talwalkar, Ameet S; Repo, Susanna; Souza, Michael L; Piovesan, Damiano; Casadio, Rita; Wang, Zheng; Cheng, Jianlin; Fang, Hai; Gough, Julian; Koskinen, Patrik; Törönen, Petri; Nokso-Koivisto, Jussi; Holm, Liisa; Cozzetto, Domenico; Buchan, Daniel W A; Bryson, Kevin; Jones, David T; Limaye, Bhakti; Inamdar, Harshal; Datta, Avik; Manjari, Sunitha K; Joshi, Rajendra; Chitale, Meghana; Kihara, Daisuke; Lisewski, Andreas M; Erdin, Serkan; Venner, Eric; Lichtarge, Olivier; Rentzsch, Robert; Yang, Haixuan; Romero, Alfonso E; Bhat, Prajwal; Paccanaro, Alberto; Hamp, Tobias; Kaßner, Rebecca; Seemayer, Stefan; Vicedo, Esmeralda; Schaefer, Christian.

Nat Methods ; 10(3): 221-7, 2013 Mar.

Article in English | MEDLINE | ID: mdl-23353650

ABSTRACT

Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high. Here we report the results from the first large-scale community-based critical assessment of protein function annotation (CAFA) experiment. Fifty-four methods representing the state of the art for protein function prediction were evaluated on a target set of 866 proteins from 11 organisms. Two findings stand out: (i) today's best protein function prediction algorithms substantially outperform widely used first-generation methods, with large gains on all types of targets; and (ii) although the top methods perform well enough to guide experiments, there is considerable need for improvement of currently available tools.

Subjects

Computational Biology/methods , Molecular Biology/methods , Molecular Sequence Annotation , Proteins/physiology , Algorithms , Animals , Databases, Protein , Exoribonucleases/classification , Exoribonucleases/genetics , Exoribonucleases/physiology , Forecasting , Humans , Proteins/chemistry , Proteins/classification , Proteins/genetics , Species Specificity

5.

Evaluation of function predictions by PFP, ESG, and PSI-BLAST for moonlighting proteins.

Khan, Ishita; Chitale, Meghana; Rayon, Catherine; Kihara, Daisuke.

BMC Proc ; 6 Suppl 7: S5, 2012 Nov 13.

Article in English | MEDLINE | ID: mdl-23173871

ABSTRACT

BACKGROUND: Advancements in function prediction algorithms are enabling large-scale computational annotation of newly sequenced genomes. As the number of functionally well-characterized proteins increases, it has been observed that many proteins are involved in more than one function. These proteins, characterized as moonlighting proteins, show varied functional behavior depending on the cell type, localization in the cell, oligomerization, multiple binding sites, etc. The functional diversity shown by moonlighting proteins may have a significant impact on traditional sequence-based function prediction methods. Here we investigate how well the diverse functions of moonlighting proteins can be predicted by some existing function prediction methods. RESULTS: We have analyzed the performance of three major sequence-based function prediction methods, PSI-BLAST, Protein Function Prediction (PFP), and Extended Similarity Group (ESG), in predicting the diverse functions of moonlighting proteins. In predicting the discrete functions of a set of 19 experimentally identified moonlighting proteins, PFP showed the highest overall recall among the three methods. Although ESG showed the highest precision, its recall was lower than that of PSI-BLAST. Recall by PSI-BLAST greatly improved when BLOSUM45 was used instead of BLOSUM62. CONCLUSION: We have analyzed the performance of PFP, ESG, and PSI-BLAST in predicting the functional diversity of moonlighting proteins. PFP shows overall better performance in predicting diverse moonlighting functions than PSI-BLAST and ESG. Recall by PSI-BLAST greatly improved when BLOSUM45 was used. This analysis indicates that considering weakly similar sequences in prediction enhances the performance of sequence-based AFP methods in predicting the functional diversity of moonlighting proteins. The current study will also motivate the development of novel computational frameworks for the automatic identification of such proteins.
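The comparison above is framed in terms of precision and recall over predicted GO terms. As a rough illustration, the sketch below assumes the simplest per-protein, exact-match definition. The study may instead propagate terms up the GO hierarchy or use a different matching rule. The GO identifiers in the example are arbitrary placeholders.

```python
from typing import Iterable, Tuple

def precision_recall(predicted: Iterable[str], known: Iterable[str]) -> Tuple[float, float]:
    """Exact-match precision and recall of predicted GO terms against the known terms of one protein."""
    predicted, known = set(predicted), set(known)
    correct = len(predicted & known)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(known) if known else 0.0
    return precision, recall

# A protein with two known functions, only one of which is predicted.
p, r = precision_recall({"GO:0003824", "GO:0005515", "GO:0008152"},
                        {"GO:0003824", "GO:0016301"})
print(round(p, 3), round(r, 3))  # 0.333 0.5
```

Under this definition, a method that recovers only one of a moonlighting protein's functions loses recall, whatever its precision.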

6.

Protein domain recurrence and order can enhance prediction of protein functions.

Messih, Mario Abdel; Chitale, Meghana; Bajic, Vladimir B; Kihara, Daisuke; Gao, Xin.

Bioinformatics ; 28(18): i444-i450, 2012 Sep 15.

Article in English | MEDLINE | ID: mdl-22962465

ABSTRACT

MOTIVATION: Burgeoning sequencing technologies have generated massive amounts of genomic and proteomic data. Annotating the functions of proteins identified in this data has become a big and crucial problem. Various computational methods have been developed to infer the protein functions based on either the sequences or domains of proteins. The existing methods, however, ignore the recurrence and the order of the protein domains in this function inference. RESULTS: We developed two new methods to infer protein functions based on protein domain recurrence and domain order. Our first method, DRDO, calculates the posterior probability of the Gene Ontology terms based on domain recurrence and domain order information, whereas our second method, DRDO-NB, relies on the naïve Bayes methodology using the same domain architecture information. Our large-scale benchmark comparisons show strong improvements in the accuracy of the protein function inference achieved by our new methods, demonstrating that domain recurrence and order can provide important information for inference of protein functions. AVAILABILITY: The new models are provided as open source programs at http://sfb.kaust.edu.sa/Pages/Software.aspx. CONTACT: dkihara@cs.purdue.edu, xin.gao@kaust.edu.sa SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Online.

Subjects

Models, Statistical , Protein Structure, Tertiary , Proteins/physiology , Bayes Theorem , Sequence Analysis, Protein
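DRDO-NB is described as applying naive Bayes to domain architecture information. The sketch below illustrates that idea under our own assumptions, not the paper's. A protein is represented by (domain, recurrence count) features plus ordered pairs of adjacent domains. Each GO term then gets its own one-vs-rest Bernoulli naive Bayes log-odds with Laplace smoothing. The function names and the toy domains `A`, `B`, `C` are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set

def architecture_features(domains: List[str]) -> Set[tuple]:
    """Features of an N- to C-terminal domain list: (domain, recurrence count) pairs
    and ordered pairs of adjacent domains."""
    feats = {("rec", d, n) for d, n in Counter(domains).items()}
    feats |= {("order", a, b) for a, b in zip(domains, domains[1:])}
    return feats

def train(proteins: List[List[str]], labels: List[Set[str]]):
    """Count proteins per GO term, per feature, and per (term, feature) combination."""
    n_term: Counter = Counter()
    n_feat: Counter = Counter()
    n_feat_term: Dict[str, Counter] = defaultdict(Counter)
    for domains, terms in zip(proteins, labels):
        feats = architecture_features(domains)
        n_feat.update(feats)
        for term in terms:
            n_term[term] += 1
            n_feat_term[term].update(feats)
    return len(proteins), n_term, n_feat, n_feat_term

def log_odds(model, domains: List[str], term: str, alpha: float = 1.0) -> float:
    """One-vs-rest Bernoulli naive Bayes log-odds that `term` applies, with Laplace
    smoothing; only features present in the query are scored (a simplification)."""
    n, n_term, n_feat, n_feat_term = model
    pos = n_term[term]
    neg = n - pos
    score = math.log((pos + alpha) / (neg + alpha))  # prior log-odds
    for f in architecture_features(domains):
        with_term = n_feat_term[term][f]
        without_term = n_feat[f] - with_term
        score += math.log((with_term + alpha) / (pos + 2 * alpha))
        score -= math.log((without_term + alpha) / (neg + 2 * alpha))
    return score

# Hypothetical training set: domain order A-B goes with GO:1; B-A and C go with GO:2.
model = train([["A", "B"], ["A", "B"], ["B", "A"], ["C"]],
              [{"GO:1"}, {"GO:1"}, {"GO:2"}, {"GO:2"}])
print(log_odds(model, ["A", "B"], "GO:1") > log_odds(model, ["B", "A"], "GO:1"))  # True
```

The same two domains score differently for GO:1 depending on their order. That is the kind of signal recurrence and order features are meant to expose.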

7.

Structure- and sequence-based function prediction for non-homologous proteins.

Sael, Lee; Chitale, Meghana; Kihara, Daisuke.

J Struct Funct Genomics ; 13(2): 111-23, 2012 Jun.

Article in English | MEDLINE | ID: mdl-22270458

ABSTRACT

The structural genomics projects have been accumulating an increasing number of protein structures, many of which remain functionally unknown. In parallel with experimental efforts, computational methods are expected to make a significant contribution to the functional elucidation of such proteins. However, conventional computational methods that transfer functions from homologous proteins do not help much for these uncharacterized protein structures, because they do not have apparent structural or sequence similarity with known proteins. Here, we briefly review two avenues of computational function prediction, i.e. structure-based methods and sequence-based methods. The focus is on our recent developments of local structure-based and sequence-based methods, which can effectively extract function information from distantly related proteins. Two structure-based methods, Pocket-Surfer and Patch-Surfer, identify similar known ligand binding sites for pocket regions in a query protein without using global protein fold similarity information. Two sequence-based methods, protein function prediction and extended similarity group, make use of weakly similar sequences that are conventionally discarded in homology-based function annotation. Combined with experimental methods, we hope that computational methods will make a leading contribution to the functional elucidation of protein structures.

Subjects

Algorithms , Databases, Protein , Proteins/analysis , Sequence Analysis, Protein/methods , Software , Binding Sites , Computational Biology/methods , Internet , Molecular Sequence Annotation , Protein Conformation , Proteins/chemistry , Reproducibility of Results , Sequence Homology, Amino Acid , Structure-Activity Relationship

8.

Quantification of protein group coherence and pathway assignment using functional association.

Chitale, Meghana; Palakodety, Shriphani; Kihara, Daisuke.

BMC Bioinformatics ; 12: 373, 2011 Sep 19.

Article in English | MEDLINE | ID: mdl-21929787

ABSTRACT

BACKGROUND: Genomics and proteomics experiments produce a large amount of data that are awaiting functional elucidation. An important step in analyzing such data is to identify functional units, which consist of proteins that play coherent roles to carry out the function. Importantly, functional coherence is not identical with functional similarity. For example, proteins in the same pathway may not share the same Gene Ontology (GO) terms, but they work in a coordinated fashion so that the aimed function can be performed. Thus, simply applying existing functional similarity measures might not be the best solution to identify functional units in omics data. RESULTS: We have designed two scores for quantifying the functional coherence by considering association of GO terms observed in two biological contexts, co-occurrences in protein annotations and co-mentions in literature in the PubMed database. The counted co-occurrences of GO terms were normalized in a similar fashion as the statistical amino acid contact potential is computed in the protein structure prediction field. We demonstrate that the developed scores can identify functionally coherent protein sets, i.e. proteins in the same pathways, co-localized proteins, and protein complexes, with statistically significant score values showing a better accuracy than existing functional similarity scores. The scores are also capable of detecting protein pairs that interact with each other. It is further shown that the functional coherence scores can accurately assign proteins to their respective pathways. CONCLUSION: We have developed two scores which quantify the functional coherence of sets of proteins. The scores reflect the actual associations of GO terms observed either in protein annotations or in literature. 
It has been shown that they have the ability to accurately distinguish biologically relevant groups of proteins from random ones as well as a good discriminative power for detecting interacting pairs of proteins. The scores were further successfully applied for assigning proteins to pathways.

Subjects

Genomics , Proteins/genetics , Proteins/metabolism , Vocabulary, Controlled , Computational Biology , Probability , Proteins/chemistry , Proteomics , PubMed
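The abstract says co-occurrence counts of GO terms were normalized the way a statistical amino acid contact potential is computed. One plausible reading is a log ratio of the observed number of proteins annotated with both terms to the number expected if pair ends combined at random. This is an assumption, not the published score. The sign is chosen so that positive values mean "co-occur more often than chance". The sketch covers only term pairs taken from annotations. The PubMed co-mention variant and the aggregation into protein-set scores are not shown, and the pseudocount is our own choice.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Iterable, List

def train_pairs(annotations: List[Iterable[str]]):
    """Count unordered GO term pairs co-occurring in one protein's annotation,
    and how often each term appears as one end of a counted pair."""
    pairs: Counter = Counter()
    for terms in annotations:
        pairs.update(combinations(sorted(set(terms)), 2))
    ends: Counter = Counter()
    for (a, b), count in pairs.items():
        ends[a] += count
        ends[b] += count
    return pairs, ends, sum(pairs.values())

def term_association(model, a: str, b: str, pseudo: float = 1.0) -> float:
    """log(observed / expected) pair count for two distinct terms; the expected count
    assumes pair ends combine at random, as in a quasi-chemical contact potential."""
    pairs, ends, total = model
    if a == b or total == 0:
        raise ValueError("need two distinct terms and a non-empty pair table")
    observed = pairs[tuple(sorted((a, b)))]
    expected = ends[a] * ends[b] / (2 * total)  # 2 * x_a * x_b * total, x = ends / (2 * total)
    return math.log((observed + pseudo) / (expected + pseudo))

# Hypothetical annotations: A and B co-occur often, B and D never do.
model = train_pairs([["A", "B"], ["A", "B"], ["A", "B"], ["C", "D"], ["C", "D"], ["A", "C"]])
print(round(term_association(model, "A", "B"), 3))  # 0.693
```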

9.

Functional enrichment analyses and construction of functional similarity networks with high confidence function prediction by PFP.

Hawkins, Troy; Chitale, Meghana; Kihara, Daisuke.

BMC Bioinformatics ; 11: 265, 2010 May 19.

Article in English | MEDLINE | ID: mdl-20482861

ABSTRACT

BACKGROUND: A new paradigm of biological investigation takes advantage of technologies that produce large high-throughput datasets, including genome sequences, protein interactions, and gene expression. The ability of biologists to analyze and interpret such data relies on functional annotation of the included proteins, but even in highly characterized organisms many proteins lack the functional evidence necessary to infer their biological relevance. RESULTS: Here we have applied high-confidence function predictions from our automated prediction system, PFP, to three genome sequences, Escherichia coli, Saccharomyces cerevisiae, and Plasmodium falciparum (malaria). PFP increases the number of annotated genes to over 90% for all of the genomes. Using the large coverage of the function annotation, we introduced functional similarity networks, which represent the functional space of the proteomes. Four different functional similarity networks are constructed for each proteome: one each by considering similarity in a single Gene Ontology (GO) category, i.e. Biological Process, Cellular Component, and Molecular Function, and another by considering overall similarity with the funSim score. The functional similarity networks are shown to have higher modularity than the protein-protein interaction network. Moreover, the funSim score network is distinct from the single GO-score networks in showing a higher clustering degree exponent value, and thus has a higher tendency to be hierarchical. In addition, examining function assignments to the protein-protein interaction network and local regions of genomes has identified numerous cases where subnetworks or local regions have functionally coherent proteins. These results will help in interpreting interactions of proteins and gene orders in a genome. Several examples of both analyses are highlighted.
CONCLUSION: The analyses demonstrate that applying high-confidence predictions from PFP can have a significant impact on a researcher's ability to interpret the immense biological data being generated today. The newly introduced functional similarity networks of the three organisms show different network properties from the protein-protein interaction networks.

Subjects

Computational Biology/methods , Protein Interaction Mapping/methods , Proteins/genetics , Databases, Genetic , Databases, Protein , Proteins/metabolism
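The abstract compares networks by modularity without naming the exact measure. For illustration only, the sketch assumes the standard Newman-Girvan definition for an undirected, unweighted graph with a given partition into communities: Q = sum over communities c of [L_c/m - (d_c/(2m))^2]. Here m is the number of edges, L_c is the number of edges inside c, and d_c is the total degree of c's nodes. The function name and the toy graph are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

def modularity(edges: List[Tuple[Hashable, Hashable]], community: Dict[Hashable, Hashable]) -> float:
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph
    given as an edge list without duplicates or self-loops."""
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")
    internal = defaultdict(int)    # L_c: edges with both ends in community c
    degree_sum = defaultdict(int)  # d_c: summed degree of the nodes in community c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

# Two triangles joined by a single edge, split into their natural communities.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
q = modularity(edges, {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"})
print(round(q, 4))  # 0.3571
```

Comparing networks this way also requires choosing a partition for each network, for example with a community detection algorithm. That step is not shown.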

10.

ESG: extended similarity group method for automated protein function prediction.

Chitale, Meghana; Hawkins, Troy; Park, Changsoon; Kihara, Daisuke.

Bioinformatics ; 25(14): 1739-45, 2009 Jul 15.

Article in English | MEDLINE | ID: mdl-19435743

ABSTRACT

MOTIVATION: The importance of accurate automatic protein function prediction is ever increasing in the face of the large number of newly sequenced genomes and proteomics data awaiting biological interpretation. Conventional methods have focused on high sequence similarity-based annotation transfer, which relies on the concept of homology. However, many cases have been reported in which simple transfer of function from the top hits of a homology search causes erroneous annotation. New methods are required that handle sequence similarity more robustly, combining signals from strongly and weakly similar proteins to predict function for unknown proteins effectively and with high reliability. RESULTS: We present the extended similarity group (ESG) method, which performs iterative sequence database searches and annotates a query sequence with Gene Ontology terms. Each annotation is assigned a probability based on its relative similarity score with the multiple-level neighbors in the protein similarity graph. We depict how the statistical framework of ESG improves prediction accuracy by iteratively taking into account the neighborhood of the query protein in the sequence similarity space. ESG outperforms conventional PSI-BLAST and the protein function prediction (PFP) algorithm. The iterative search is found to be effective in capturing multiple domains in a query protein, enabling accurate prediction of several functions that originate from different domains. AVAILABILITY: The ESG web server is available for automated protein function prediction at http://dragon.bio.purdue.edu/ESG/.

Subjects

Computational Biology/methods , Proteins/chemistry , Sequence Analysis, Protein/methods , Software , Databases, Protein , Sequence Homology, Amino Acid
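ESG is described as weighting each GO term by its relative similarity score across multiple levels of neighbors in the similarity graph. The two-level sketch below is a simplified illustration built on our own assumptions. Similarity scores are nonnegative numbers, such as -log10(E-value). Each search's scores are normalized into relative weights, and a term's score is the weighted share of second-level hits that carry it. `search` stands in for a database search and is hypothetical. The published ESG scoring is not reproduced here.

```python
from collections import defaultdict
from typing import Callable, Dict, Set

Hits = Dict[str, float]  # protein ID -> nonnegative similarity score

def relative_weights(hits: Hits) -> Hits:
    """Scale similarity scores so that they sum to 1 within one search."""
    total = sum(hits.values())
    return {p: s / total for p, s in hits.items()} if total > 0 else {}

def two_level_term_scores(query_hits: Hits, search: Callable[[str], Hits],
                          annotations: Dict[str, Set[str]]) -> Dict[str, float]:
    """Score GO terms through first-level hits and each hit's own (second-level) search results."""
    scores: Dict[str, float] = defaultdict(float)
    for hit, w1 in relative_weights(query_hits).items():
        for neighbor, w2 in relative_weights(search(hit)).items():
            for term in annotations.get(neighbor, ()):
                scores[term] += w1 * w2
    return dict(scores)

# Hypothetical search results; each protein's own search includes a self-hit.
results = {"P1": {"P1": 1.0, "P3": 1.0}, "P2": {"P2": 1.0}}
go = {"P1": {"GO:A"}, "P2": {"GO:C"}, "P3": {"GO:A", "GO:B"}}
print(two_level_term_scores({"P1": 3.0, "P2": 1.0}, lambda p: results.get(p, {}), go))
# {'GO:A': 0.75, 'GO:B': 0.375, 'GO:C': 0.25}
```

The weights at each level sum to 1, so every term score falls between 0 and 1. It can therefore be read loosely as a probability-like confidence.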

11.

PFP: Automated prediction of gene ontology functional annotations with confidence scores using protein sequence data.

Hawkins, Troy; Chitale, Meghana; Luban, Stanislav; Kihara, Daisuke.

Proteins ; 74(3): 566-82, 2009 Feb 15.

Article in English | MEDLINE | ID: mdl-18655063

ABSTRACT

Protein function prediction is a central problem in bioinformatics, increasing in importance recently due to the rapid accumulation of biological data awaiting interpretation. Sequence data represents the bulk of this new stock and is the obvious target for consideration as input, as newly sequenced organisms often lack any other type of biological characterization. We have previously introduced PFP (Protein Function Prediction) as our sequence-based predictor of Gene Ontology (GO) functional terms. PFP interprets the results of a PSI-BLAST search by extracting and scoring individual functional attributes, searching a wide range of E-value sequence matches, and utilizing conventional data mining techniques to fill in missing information. We have shown it to be effective in predicting both specific and low-resolution functional attributes when sufficient data is unavailable. Here we describe (1) significant improvements to the PFP infrastructure, including the addition of prediction significance and confidence scores, (2) a thorough benchmark of performance and comparisons to other related prediction methods, and (3) applications of PFP predictions to genome-scale data. We applied PFP predictions to uncharacterized protein sequences from 15 organisms. Among these sequences, 60-90% could be annotated with a GO molecular function term at high confidence (≥80%). We also applied our predictions to the protein-protein interaction network of the malaria parasite (Plasmodium falciparum). High-confidence GO biological process predictions (≥90%) from PFP increased the number of fully enriched interactions in this dataset from 23% of interactions to 94%. Our benchmark comparison shows significant performance improvement of PFP relative to GOtcha, InterProScan, and PSI-BLAST predictions. This is consistent with the performance of PFP as the overall best predictor in both the AFP-SIG '05 and CASP7 function (FN) assessments.
PFP is available as a web service at http://dragon.bio.purdue.edu/pfp/.

Subjects

Proteins/genetics , Sequence Analysis, Protein/methods , Algorithms , Computational Biology , Databases, Protein , Genes , Protein Interaction Mapping , Proteins/chemistry , Proteome/analysis , Software
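PFP is described as scoring individual GO terms from PSI-BLAST hits over a wide range of E-values. The sketch below shows a simplified version of that kind of scoring under stated assumptions. Each retained hit contributes -log10(E) + b to every term it is annotated with. Hits are kept up to E = 100, and b = 2 keeps every contribution nonnegative, since -log10(100) = -2. These constants, the function name, and the data are illustrative. The GO-term association step that PFP uses to fill in missing information is omitted.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set, Tuple

def evalue_weighted_scores(hits: List[Tuple[str, float]], annotations: Dict[str, Set[str]],
                           b: float = 2.0, max_evalue: float = 100.0) -> Dict[str, float]:
    """Sum -log10(E) + b over the retained hits annotated with each GO term."""
    scores: Dict[str, float] = defaultdict(float)
    for protein, evalue in hits:
        if evalue > max_evalue:
            continue  # hit too weak even for the permissive E-value cutoff
        weight = -math.log10(max(evalue, 1e-300)) + b  # floor avoids log10(0)
        for term in annotations.get(protein, ()):
            scores[term] += weight
    return dict(scores)

# Hypothetical PSI-BLAST hits (protein, E-value) and their GO annotations.
hits = [("P1", 1e-50), ("P2", 10.0), ("P3", 1000.0)]
go = {"P1": {"GO:A"}, "P2": {"GO:A", "GO:B"}, "P3": {"GO:C"}}
print(evalue_weighted_scores(hits, go))
```

In this example, P3 falls outside the cutoff, so GO:C receives no score. The strong hit P1 dominates the score of GO:A, while the weak hit P2 still contributes a small amount to GO:A and GO:B.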

12.

New paradigm in protein function prediction for large scale omics analysis.

Hawkins, Troy; Chitale, Meghana; Kihara, Daisuke.

Mol Biosyst ; 4(3): 223-31, 2008 Mar.

Article in English | MEDLINE | ID: mdl-18437265

ABSTRACT

Biological interpretation of large scale omics data, such as protein-protein interaction data and microarray gene expression data, requires that the function of many genes in a data set is annotated or predicted. Here the predicted function for a gene does not necessarily have to be a detailed biochemical function; a broad class of function, or low-resolution function, may be sufficient to understand why a set of genes shows the observed expression pattern or interaction pattern. In this Highlight, we focus on two recent approaches for function prediction which aim to provide large coverage in function prediction, namely omics data driven approaches and a thorough data mining approach on homology search results.

Subjects

Computational Biology , Proteins/chemistry , Proteins/metabolism , Animals , Databases, Genetic , Models, Biological , Protein Binding , Sequence Analysis
