Effusion: prediction of protein function from sequence similarity networks.

Yunes, Jeffrey M; Babbitt, Patricia C.

Bioinformatics; 35(3): 442-451, 2019 Feb 01.

Article in English | MEDLINE | ID: mdl-30084920

ABSTRACT

Motivation: Critical evaluation of methods for protein function prediction shows that data integration improves the performance of methods that predict protein function, but a basic BLAST-based method is still a top contender. We sought to engineer a method that modernizes the classical approach while avoiding pitfalls common to state-of-the-art methods.

Results: We present a method for predicting protein function, Effusion, which uses a sequence similarity network to add context for homology transfer, a probabilistic model to account for the uncertainty in labels and function propagation, and the structure of the Gene Ontology (GO) to best utilize sparse input labels and make consistent output predictions. Effusion's model makes it practical to integrate rare experimental data and abundant primary sequence and sequence similarity. We demonstrate Effusion's performance using a critical evaluation method and provide an in-depth analysis. We also dissect the design decisions we used to address challenges for predicting protein function. Finally, we propose directions in which the framework of the method can be modified for additional predictive power.

Availability and implementation: The source code for an implementation of Effusion is freely available at https://github.com/babbittlab/effusion.

Supplementary information: Supplementary data are available at Bioinformatics online.
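The abstract mentions using the structure of GO to make consistent output predictions. One common way to get that kind of consistency, which is not necessarily what Effusion does, is to raise each ancestor's score to at least the score of its highest-scoring descendant. The sketch below assumes a toy ontology; the term names and the `parents` mapping are hypothetical and do not come from the paper or its code.

```python
def propagate_scores(scores, parents):
    """Return scores in which every ancestor scores at least as high as its descendants.

    scores  -- dict mapping term -> predicted score in [0, 1]
    parents -- dict mapping term -> iterable of direct parent terms (hypothetical toy data)
    """
    consistent = dict(scores)

    def lift(term, value):
        # Push `value` upward, stopping where an ancestor already scores at least that much.
        for parent in parents.get(term, ()):
            if consistent.get(parent, 0.0) < value:
                consistent[parent] = value
                lift(parent, value)

    for term, value in scores.items():
        lift(term, value)
    return consistent


# Toy hierarchy: leaf -> mid -> root (illustrative names only).
toy_parents = {"term:leaf": ["term:mid"], "term:mid": ["term:root"]}
toy_scores = {"term:leaf": 0.8, "term:mid": 0.3}
result = propagate_scores(toy_scores, toy_parents)
```

In this toy case the leaf's score of 0.8 is carried up to both `term:mid` and `term:root`, so no parent is predicted with less confidence than its child.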

Subject(s)

Computational Biology, Proteins/chemistry, Software, Gene Ontology

GOATOOLS: A Python library for Gene Ontology analyses.

Klopfenstein, D V; Zhang, Liangsheng; Pedersen, Brent S; Ramírez, Fidel; Warwick Vesztrocy, Alex; Naldi, Aurélien; Mungall, Christopher J; Yunes, Jeffrey M; Botvinnik, Olga; Weigel, Mark; Dampier, Will; Dessimoz, Christophe; Flick, Patrick; Tang, Haibao.

Sci Rep; 8(1): 10872, 2018 Jul 18.

Article in English | MEDLINE | ID: mdl-30022098

ABSTRACT

The biological interpretation of gene lists with interesting shared properties, such as up- or down-regulation in a particular experiment, is typically accomplished using gene ontology enrichment analysis tools. Given a list of genes, a gene ontology (GO) enrichment analysis may return hundreds of statistically significant GO results in a "flat" list, which can be challenging to summarize. It can also be difficult to keep pace with rapidly expanding biological knowledge, which often results in daily changes to any of the over 47,000 gene ontologies that describe biological knowledge. GOATOOLS, a Python-based library, makes it more efficient to stay current with the latest ontologies and annotations, perform gene ontology enrichment analyses to determine over- and under-represented terms, and organize results for greater clarity and easier interpretation using a novel GOATOOLS GO grouping method. We performed functional analyses on both stochastic simulation data and real data from a published RNA-seq study to compare the enrichment results from GOATOOLS to two other popular tools: DAVID and GOstats. GOATOOLS is freely available through GitHub: https://github.com/tanghaibao/goatools.
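To make the idea of an over-represented term concrete, the sketch below computes a one-sided hypergeometric p-value for a single term, which is the upper tail of a Fisher's exact test. It uses only the Python standard library and does not call the GOATOOLS API; the function name and the counts are illustrative assumptions.

```python
from math import comb


def enrichment_pvalue(study_hits, study_size, pop_hits, pop_size):
    """Probability of seeing at least `study_hits` annotated genes by chance.

    study_hits -- genes in the study list annotated with the term
    study_size -- total genes in the study list
    pop_hits   -- genes in the background population annotated with the term
    pop_size   -- total genes in the background population
    """
    total = comb(pop_size, study_size)
    upper = min(study_size, pop_hits)
    # Sum the hypergeometric probabilities for k = study_hits .. upper.
    tail = sum(
        comb(pop_hits, k) * comb(pop_size - pop_hits, study_size - k)
        for k in range(study_hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 5 of 5 study genes carry a term held by 5 of 10 background genes.
p_value = enrichment_pvalue(5, 5, 5, 10)
```

Under-representation can be tested the same way by summing the lower tail instead. In a real analysis the p-values for all terms would also need a multiple-testing correction.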

Subject(s)

Alzheimer Disease/genetics, Biomarkers/analysis, Computational Biology/methods, Disease Models, Animal, Gene Expression Regulation, Developmental, Gene Ontology, Software, Algorithms, Alzheimer Disease/pathology, Animals, Gene Expression Profiling, Mice

The Structure-Function Linkage Database.

Akiva, Eyal; Brown, Shoshana; Almonacid, Daniel E; Barber, Alan E; Custer, Ashley F; Hicks, Michael A; Huang, Conrad C; Lauck, Florian; Mashiyama, Susan T; Meng, Elaine C; Mischel, David; Morris, John H; Ojha, Sunil; Schnoes, Alexandra M; Stryke, Doug; Yunes, Jeffrey M; Ferrin, Thomas E; Holliday, Gemma L; Babbitt, Patricia C.

Nucleic Acids Res; 42(Database issue): D521-30, 2014 Jan.

Article in English | MEDLINE | ID: mdl-24271399

ABSTRACT

The Structure-Function Linkage Database (SFLD, http://sfld.rbvi.ucsf.edu/) is a manually curated classification resource describing structure-function relationships for functionally diverse enzyme superfamilies. Members of such superfamilies are diverse in their overall reactions yet share a common ancestor and some conserved active site features associated with conserved functional attributes such as a partial reaction. Thus, despite their different functions, members of these superfamilies 'look alike', making them easy to misannotate. To address this complexity and enable rational transfer of functional features to unknowns only for those members for which we have sufficient functional information, we subdivide superfamily members into subgroups using sequence information, and lastly into families, sets of enzymes known to catalyze the same reaction using the same mechanistic strategy. Browsing and searching options in the SFLD provide access to all of these levels. The SFLD offers manually curated as well as automatically classified superfamily sets, both accompanied by search and download options for all hierarchical levels. Additional information includes multiple sequence alignments, tab-separated files of functional and other attributes, and sequence similarity networks. The latter provide a new and intuitively powerful way to visualize functional trends mapped to the context of sequence similarity.
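A sequence similarity network can be pictured as a graph whose nodes are sequences and whose edges join pairs that score above a chosen similarity cutoff; connected groups then hint at related sequences. The sketch below is a generic illustration, not SFLD's pipeline. The protein names, the scores, and the threshold are hypothetical, and a higher score is assumed to mean greater similarity.

```python
from collections import defaultdict


def build_network(pair_scores, threshold):
    """Build an undirected graph keeping only pairs that score at or above `threshold`."""
    graph = defaultdict(set)
    for (a, b), score in pair_scores.items():
        # Register both nodes so that isolated sequences still appear in the network.
        graph[a]
        graph[b]
        if score >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Return the list of connected node sets, found by depth-first search."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        components.append(component)
    return components


# Hypothetical pairwise similarity scores between four sequences.
toy_scores = {("p1", "p2"): 90, ("p2", "p3"): 85, ("p3", "p4"): 20}
clusters = connected_components(build_network(toy_scores, threshold=50))
```

With a threshold of 50, `p1`, `p2` and `p3` fall into one cluster and `p4` stands alone. Raising or lowering the threshold splits or merges clusters, which is why the cutoff is usually explored rather than fixed in advance.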

Subject(s)

Databases, Protein, Enzymes/chemistry, Enzymes/classification, Enzymes/metabolism, Internet, Molecular Sequence Annotation, Sequence Alignment, Structure-Activity Relationship

A large-scale evaluation of computational protein function prediction.

Radivojac, Predrag; Clark, Wyatt T; Oron, Tal Ronnen; Schnoes, Alexandra M; Wittkop, Tobias; Sokolov, Artem; Graim, Kiley; Funk, Christopher; Verspoor, Karin; Ben-Hur, Asa; Pandey, Gaurav; Yunes, Jeffrey M; Talwalkar, Ameet S; Repo, Susanna; Souza, Michael L; Piovesan, Damiano; Casadio, Rita; Wang, Zheng; Cheng, Jianlin; Fang, Hai; Gough, Julian; Koskinen, Patrik; Törönen, Petri; Nokso-Koivisto, Jussi; Holm, Liisa; Cozzetto, Domenico; Buchan, Daniel W A; Bryson, Kevin; Jones, David T; Limaye, Bhakti; Inamdar, Harshal; Datta, Avik; Manjari, Sunitha K; Joshi, Rajendra; Chitale, Meghana; Kihara, Daisuke; Lisewski, Andreas M; Erdin, Serkan; Venner, Eric; Lichtarge, Olivier; Rentzsch, Robert; Yang, Haixuan; Romero, Alfonso E; Bhat, Prajwal; Paccanaro, Alberto; Hamp, Tobias; Kaßner, Rebecca; Seemayer, Stefan; Vicedo, Esmeralda; Schaefer, Christian.

Nat Methods; 10(3): 221-7, 2013 Mar.

Article in English | MEDLINE | ID: mdl-23353650

ABSTRACT

Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high. Here we report the results from the first large-scale community-based critical assessment of protein function annotation (CAFA) experiment. Fifty-four methods representing the state of the art for protein function prediction were evaluated on a target set of 866 proteins from 11 organisms. Two findings stand out: (i) today's best protein function prediction algorithms substantially outperform widely used first-generation methods, with large gains on all types of targets; and (ii) although the top methods perform well enough to guide experiments, there is considerable need for improvement of currently available tools.

Subject(s)

Computational Biology/methods, Molecular Biology/methods, Molecular Sequence Annotation, Proteins/physiology, Algorithms, Animals, Databases, Protein, Exoribonucleases/classification, Exoribonucleases/genetics, Exoribonucleases/physiology, Forecasting, Humans, Proteins/chemistry, Proteins/classification, Proteins/genetics, Species Specificity
