1.
Bioinformatics ; 35(8): 1388-1394, 2019 04 15.
Article in English | MEDLINE | ID: mdl-30192921

ABSTRACT

MOTIVATION: Biological experiments including proteomics and transcriptomics approaches often reveal sets of proteins that are most likely to be involved in a disease/disorder. To understand the functional nature of a set of proteins, it is important to capture the function of the proteins as a group, even in cases where the function of individual proteins is not known. In this work, we propose a model that takes groups of proteins found to work together in a certain biological context, integrates them into functional relevance networks, and subsequently employs an iterative inference on graphical models to identify group functions of the proteins, which are then extended to predict the function of individual proteins. RESULTS: The proposed algorithm, iterative group function prediction (iGFP), depicts proteins as a graph that represents functional relevance of proteins considering their known functional, proteomics and transcriptional features. Proteins in the graph will be clustered into groups by their mutual functional relevance, which is iteratively updated using a probabilistic graphical model, the conditional random field. iGFP showed robust accuracy even when a substantial number of GO annotations were missing. The perspective of 'group' function annotation opens up novel approaches for understanding the functional nature of proteins in biological systems. AVAILABILITY AND IMPLEMENTATION: http://kiharalab.org/iGFP/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology , Algorithms , Proteins , Proteomics
2.
IEEE/ACM Trans Comput Biol Bioinform ; 15(4): 1247-1258, 2018.
Article in English | MEDLINE | ID: mdl-26415209

ABSTRACT

Proteins carry out their function in a cell through interactions with other proteins. A large-scale protein-protein interaction (PPI) network of an organism provides a static yet essential structure of interactions, which is a valuable clue for understanding the functions of proteins and pathways. PPIs are determined primarily by experimental methods; however, computational PPI prediction methods can supplement or verify PPIs identified by experiment. Here, we developed a novel scoring method for predicting PPIs from Gene Ontology (GO) annotations of proteins. Unlike existing methods that consider functional similarity as an indication of interaction between proteins, the new score, named the protein-protein Interaction Association Score (IAS), was computed from GO term associations of known interacting protein pairs in 49 organisms. IAS was evaluated on PPI data of six organisms and found to outperform existing GO term-based scoring methods. Moreover, consensus scoring methods that combine different scores further improved performance of PPI prediction.


Subject(s)
Gene Ontology , Protein Interaction Mapping/methods , Protein Interaction Maps , Systems Biology/methods , Animals , Fungi/genetics , Humans , Plants/genetics , Protein Interaction Maps/genetics , Protein Interaction Maps/physiology , Proteins/genetics , Proteins/metabolism , Proteins/physiology
3.
Bioinformatics ; 33(14): i83-i91, 2017 Jul 15.
Article in English | MEDLINE | ID: mdl-28881966

ABSTRACT

MOTIVATION: Moonlighting proteins (MPs) are an important class of proteins that perform more than one independent cellular function. MPs have been gaining more attention in recent years as they are found to play important roles in various systems including disease development. MPs also have a significant impact on computational function prediction and annotation in databases. Currently MPs are not labeled as such in biological databases even in cases where multiple distinct functions are known for the proteins. In this work, we propose a novel method named DextMP, which predicts whether a protein is an MP or not based on its textual features extracted from scientific literature and the UniProt database. RESULTS: DextMP extracts three categories of textual information for a protein: titles, abstracts from literature, and function description in UniProt. Three language models were applied and compared: a state-of-the-art deep unsupervised learning algorithm along with two other language models of different types, Term Frequency-Inverse Document Frequency in the bag-of-words category and Latent Dirichlet Allocation in the topic modeling category. Cross-validation results on a dataset of known MPs and non-MPs showed that DextMP successfully predicted MPs with over 91% accuracy with significant improvement over existing MP prediction methods. Lastly, we ran DextMP with the best performing language models and text-based feature combinations on three genomes, human, yeast and Xenopus laevis, and found that about 2.5-35% of the proteomes are potential MPs. AVAILABILITY AND IMPLEMENTATION: Code available at http://kiharalab.org/DextMP. CONTACT: dkihara@purdue.edu.


Subject(s)
Data Mining/methods , Proteomics/methods , Software , Unsupervised Machine Learning , Animals , Databases, Factual , Humans , Models, Biological , Molecular Sequence Annotation , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Xenopus laevis/genetics , Xenopus laevis/metabolism
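Entry 3 names Term Frequency-Inverse Document Frequency (TF-IDF) as one of the bag-of-words language models compared in DextMP. The abstract does not state which weighting variant was used, so the following is only a minimal sketch of one common textbook form (relative term frequency multiplied by the natural log of the inverse document frequency). The function name `tf_idf` and the toy token lists are illustrative assumptions, not part of DextMP.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(documents: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {term: weight} mapping per tokenized document.

    Assumed variant (not taken from the DextMP paper):
    tf = count / document length, idf = ln(N / document frequency).
    """
    n_docs = len(documents)

    # Document frequency: in how many documents each term appears.
    doc_freq: Counter = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    vectors = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical toy "documents", e.g. tokenized function descriptions.
example_docs = [
    ["enzyme", "binds", "dna"],
    ["enzyme", "kinase"],
]
example_vectors = tf_idf(example_docs)
```

In this toy input, a term that occurs in every document (`enzyme`) receives weight zero, while terms specific to one document keep a positive weight; that is the property that makes TF-IDF useful for separating texts by their distinctive vocabulary.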
4.
Methods Mol Biol ; 1611: 1-14, 2017.
Article in English | MEDLINE | ID: mdl-28451967

ABSTRACT

Elucidating biological function of proteins is a fundamental problem in molecular biology and bioinformatics. Conventionally, protein function is annotated based on homology using sequence similarity search tools such as BLAST and FASTA. These methods perform well when obvious homologs exist for a query sequence; however, they will not provide any functional information otherwise. As a result, the functions of many genes in newly sequenced genomes are left unknown, which await functional interpretation. Here, we introduce two webservers for function prediction methods, which effectively use distantly related sequences to improve function annotation coverage and accuracy: Protein Function Prediction (PFP) and Extended Similarity Group (ESG). These two methods have been tested extensively in various benchmark studies and ranked among the top in community-based assessments for computational function annotation, including Critical Assessment of Function Annotation (CAFA) in 2010-2011 (CAFA1) and 2013-2014 (CAFA2). Both servers are equipped with user-friendly visualizations of predicted GO terms, which provide intuitive illustrations of relationships of predicted GO terms. In addition to PFP and ESG, we also introduce NaviGO, a server for the interactive analysis of GO annotations of proteins. All the servers are available at http://kiharalab.org/software.php.


Subject(s)
Computational Biology/methods , Proteins , Software , Algorithms , Databases, Protein , Sequence Analysis, Protein
5.
Methods Mol Biol ; 1611: 45-57, 2017.
Article in English | MEDLINE | ID: mdl-28451971

ABSTRACT

An increasing number of proteins have been found which are capable of performing two or more distinct functions. These proteins, known as moonlighting proteins, have drawn much attention recently as they may play critical roles in disease pathways and development. However, because moonlighting proteins are often found serendipitously, our understanding of moonlighting proteins is still quite limited. In order to lay the foundation for systematic studies of moonlighting proteins, we developed MPFit, a software package for predicting moonlighting proteins from their omics features including protein-protein and gene interaction networks. Here, we describe and demonstrate the algorithm of MPFit and the idea behind it, and provide instructions for using the software.


Subject(s)
Computational Biology/methods , Proteins/metabolism , Proteins/genetics
6.
BMC Bioinformatics ; 18(1): 177, 2017 Mar 20.
Article in English | MEDLINE | ID: mdl-28320317

ABSTRACT

BACKGROUND: The number of genomics and proteomics experiments is growing rapidly, producing an ever-increasing amount of data that are awaiting functional interpretation. A number of function prediction algorithms were developed and improved to enable fast and automatic function annotation. With its well-defined structure and manual curation, Gene Ontology (GO) is the most frequently used vocabulary for representing gene functions. To understand the relationship and similarity between GO annotations of genes, it is important to have a convenient pipeline that quantifies and visualizes the GO function analyses in a systematic fashion. RESULTS: NaviGO is a web-based tool for interactive visualization, retrieval, and computation of functional similarity and associations of GO terms and genes. Similarity of GO terms and gene functions is quantified with six different scores including protein-protein interaction and context based association scores we have developed in our previous works. Interactive navigation of the GO function space provides intuitive and effective real-time visualization of functional groupings of GO terms and genes as well as statistical analysis of enriched functions. CONCLUSIONS: We developed NaviGO, which visualizes and analyses functional similarity and associations of GO terms and genes. The NaviGO webserver is freely available at: http://kiharalab.org/web/navigo.


Subject(s)
Gene Ontology/trends , Genomics/methods
7.
Genome Biol ; 17(1): 184, 2016 09 07.
Article in English | MEDLINE | ID: mdl-27604469

ABSTRACT

BACKGROUND: A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging. RESULTS: We conducted the second critical assessment of functional annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using Gene Ontology and gene-disease associations using Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured expanded analysis compared with CAFA1, with regards to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 to those of CAFA2. CONCLUSIONS: The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology specific, that different performance metrics can be used to probe the nature of accurate predictions, and the relative diversity of predictions in the biological process and human phenotype ontologies. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and usefulness of individual methods remain context-dependent.


Subject(s)
Computational Biology , Proteins/chemistry , Software , Structure-Activity Relationship , Algorithms , Databases, Protein , Gene Ontology , Humans , Molecular Sequence Annotation , Proteins/genetics
8.
Sci Rep ; 6: 31725, 2016 08 24.
Article in English | MEDLINE | ID: mdl-27552989

ABSTRACT

Reconstructing metabolic and signaling pathways is an effective way of interpreting a genome sequence. A challenge in pathway reconstruction is that genes in a pathway often cannot be easily found, reflecting the currently imperfect information about the target organism. In this work, we developed a new method for finding missing genes, which integrates multiple features, including gene expression, phylogenetic profile, and function association scores. In particular, to consider function association between candidate genes and the proteins neighboring the target missing gene in the network, we used the Co-occurrence Association Score (CAS) and the PubMed Association Score (PAS), which are designed for capturing functional coherence of proteins. We showed that adding CAS and PAS substantially improves the accuracy of identifying missing genes in the yeast enzyme-enzyme network compared to the cases when only the conventional features, gene expression and phylogenetic profile, were used. Finally, it was also demonstrated that the accuracy improves by considering indirect neighbors to the target enzyme position in the network using a proper network-topology-based weighting scheme.


Subject(s)
Computational Biology/methods , Gene Expression Profiling , Gene Expression Regulation, Fungal , Gene Regulatory Networks , Genomics/methods , Saccharomyces cerevisiae/genetics , Algorithms , Chromosome Mapping , Computer Simulation , Enzymes/chemistry , Fungal Proteins/chemistry , Models, Statistical , Phylogeny , Probability , Reproducibility of Results , Saccharomyces cerevisiae/enzymology
9.
Bioinformatics ; 32(15): 2281-8, 2016 08 01.
Article in English | MEDLINE | ID: mdl-27153604

ABSTRACT

MOTIVATION: Moonlighting proteins (MPs) show multiple cellular functions within a single polypeptide chain. To understand the overall landscape of their functional diversity, it is important to establish a computational method that can identify MPs on a genome scale. Previously, we have systematically characterized MPs using functional and omics-scale information. In this work, we develop a computational prediction model for automatic identification of MPs using a diverse range of protein association information. RESULTS: We incorporated a diverse range of protein association information to extract characteristic features of MPs, which range from gene ontology (GO), protein-protein interactions, gene expression, phylogenetic profiles, genetic interactions and network-based graph properties to protein structural properties, i.e. intrinsically disordered regions in the protein chain. Then, we used machine learning classifiers using the broad feature space for predicting MPs. Because many known MPs lack some proteomic features, we developed an imputation technique to fill such missing features. Results on the control dataset show that MPs can be predicted with over 98% accuracy when GO terms are available. Furthermore, using only the omics-based features the method can still identify MPs with over 75% accuracy. Last, we applied the method on three genomes: Saccharomyces cerevisiae, Caenorhabditis elegans and Homo sapiens, and found that about 2-10% of proteins in the genomes are potential MPs. AVAILABILITY AND IMPLEMENTATION: Code available at http://kiharalab.org/MPprediction. CONTACT: dkihara@purdue.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology , Proteomics , Genome , Humans , Machine Learning , Phylogeny , Proteins
10.
Gigascience ; 4: 43, 2015.
Article in English | MEDLINE | ID: mdl-26380077

ABSTRACT

BACKGROUND: Functional annotation of novel proteins is one of the central problems in bioinformatics. With the ever-increasing development of genome sequencing technologies, more and more sequence information is becoming available to analyze and annotate. To achieve fast and automatic function annotation, many computational (automated) function prediction (AFP) methods have been developed. To objectively evaluate the performance of such methods on a large scale, community-wide assessment experiments have been conducted. The second round of the Critical Assessment of Function Annotation (CAFA) experiment was held in 2013-2014. Evaluation of participating groups was reported in a special interest group meeting at the Intelligent Systems in Molecular Biology (ISMB) conference in Boston in 2014. Our group participated in both CAFA1 and CAFA2 using multiple, in-house AFP methods. Here, we report benchmark results of our methods obtained in the course of preparation for CAFA2 prior to submitting function predictions for CAFA2 targets. RESULTS: For CAFA2, we updated the annotation databases used by our methods, protein function prediction (PFP) and extended similarity group (ESG), and benchmarked their function prediction performances using the original (older) and updated databases. Performance evaluation for PFP with different settings and ESG are discussed. We also developed two ensemble methods that combine function predictions from six independent, sequence-based AFP methods. We further analyzed the performances of our prediction methods by enriching the predictions with prior distribution of gene ontology (GO) terms. Examples of predictions by the ensemble methods are discussed. CONCLUSIONS: Updating the annotation database was successful, improving the Fmax prediction accuracy score for both PFP and ESG. Adding the prior distribution of GO terms did not make much improvement. Both of the ensemble methods we developed improved the average Fmax score over all individual component methods except for ESG. Our benchmark results will not only complement the overall assessment that will be done by the CAFA organizers, but also help elucidate the predictive powers of sequence-based function prediction methods in general.


Subject(s)
Databases, Protein , Proteins/physiology
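Entry 10 reports results as Fmax scores without defining the measure. As a hedged illustration, the sketch below follows the protein-centric definition commonly used in the CAFA assessments: at each score threshold, precision is averaged over proteins that have at least one prediction at or above the threshold, recall is averaged over all benchmark proteins, and Fmax is the largest harmonic mean over thresholds. The function name `fmax`, the threshold grid, and the toy data are assumptions for illustration, and the sketch does not propagate GO terms to their ancestors as a full evaluation would.

```python
from typing import Dict, Set


def fmax(predictions: Dict[str, Dict[str, float]],
         truth: Dict[str, Set[str]],
         steps: int = 100) -> float:
    """Protein-centric Fmax over thresholds 1/steps, 2/steps, ..., 1.

    predictions: protein -> {GO term: confidence score in [0, 1]}
    truth:       protein -> set of true GO terms (assumed non-empty)
    """
    best = 0.0
    for i in range(1, steps + 1):
        threshold = i / steps
        precisions = []
        recall_sum = 0.0
        for protein, true_terms in truth.items():
            scored = predictions.get(protein, {})
            predicted = {t for t, s in scored.items() if s >= threshold}
            hits = len(predicted & true_terms)
            # Precision only counts proteins with at least one prediction.
            if predicted:
                precisions.append(hits / len(predicted))
            # Recall counts every benchmark protein.
            recall_sum += hits / len(true_terms)
        if not precisions:
            continue
        precision = sum(precisions) / len(precisions)
        recall = recall_sum / len(truth)
        if precision + recall > 0:
            f_score = 2 * precision * recall / (precision + recall)
            best = max(best, f_score)
    return best


# Hypothetical toy benchmark with placeholder term identifiers.
example_predictions = {
    "P1": {"GO:1": 0.9, "GO:2": 0.4},
    "P2": {"GO:3": 0.8},
}
example_truth = {
    "P1": {"GO:1"},
    "P2": {"GO:3", "GO:4"},
}
example_fmax = fmax(example_predictions, example_truth)
```

For the toy data, the best trade-off occurs at thresholds between 0.4 and 0.8, where the low-confidence false positive for P1 is dropped but the correct prediction for P2 is still retained.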
11.
Bioinformatics ; 31(2): 271-2, 2015 Jan 15.
Article in English | MEDLINE | ID: mdl-25273111

ABSTRACT

UNLABELLED: Protein function prediction (PFP) is an automated function prediction method that predicts Gene Ontology (GO) annotations for a protein sequence using distantly related sequences and contextual associations of GO terms. Extended similarity group (ESG) is another GO prediction algorithm that makes predictions based on iterative sequence database searches. Here, we provide interactive web servers for the PFP and ESG algorithms that are equipped with an effective visualization of the GO predictions in a hierarchical topology. AVAILABILITY: PFP/ESG servers are freely available at http://kiharalab.org/web/pfp.php and http://kiharalab.org/web/esg.php, or access both at http://kiharalab.org/pfp_esg.php. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Computational Biology/methods , Computer Graphics , Gene Ontology , Molecular Sequence Annotation , Proteins/metabolism , Sequence Analysis, Protein/methods , Databases, Protein , Humans
12.
Biol Direct ; 9: 30, 2014 Dec 11.
Article in English | MEDLINE | ID: mdl-25497125

ABSTRACT

BACKGROUND: Moonlighting proteins perform two or more cellular functions, which are selected based on various contexts including the cell type in which they are expressed, their oligomerization status, and the binding of different ligands at different sites. To understand the overall landscape of their functional diversity, it is important to establish methods that can identify moonlighting proteins in a systematic fashion. Here, we have developed a computational framework to find moonlighting proteins on a genome scale and identified multiple proteomic characteristics of these proteins. RESULTS: First, we analyzed Gene Ontology (GO) annotations of known moonlighting proteins. We found that the GO annotations of moonlighting proteins can be clustered into multiple groups reflecting their diverse functions. Then, by considering the observed GO term separations, we identified 33 novel moonlighting proteins in Escherichia coli and confirmed them by literature review. Next, we analyzed moonlighting proteins in terms of protein-protein interaction, gene expression, phylogenetic profile, and genetic interaction networks. We found that moonlighting proteins physically interact with a higher number of distinct functional classes of proteins than non-moonlighting ones and also found that most of the physically interacting partners of moonlighting proteins share the latter's primary functions. Interestingly, we also found that moonlighting proteins tend to interact with other moonlighting proteins. In terms of gene expression and phylogenetically related proteins, a weak trend was observed that moonlighting proteins interact with more functionally diverse proteins. Structural characteristics of moonlighting proteins, i.e. intrinsically disordered regions and ligand binding sites, were also investigated. CONCLUSION: Additional functions of moonlighting proteins are difficult to identify by experiments and these proteins also pose a significant challenge for computational function annotation. Our method enables identification of novel moonlighting proteins from current functional annotations in public databases. Moreover, we showed that potential moonlighting proteins without sufficient functional annotations can be identified by analyzing available omics-scale data. Our findings open up new possibilities for investigating the multi-functional nature of proteins at the systems level and for exploring the complex functional interplay of proteins in a cell. REVIEWERS: This article was reviewed by Michael Galperin, Eugene Koonin, and Nick Grishin.


Subject(s)
Computational Biology/methods , Gene Ontology , Genome , Proteome , Databases, Protein , Epistasis, Genetic , Gene Expression , Phylogeny , Protein Interaction Maps
13.
Biochem Soc Trans ; 42(6): 1780-5, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25399606

ABSTRACT

Moonlighting proteins perform multiple independent cellular functions within one polypeptide chain. Moonlighting proteins switch functions depending on various factors including the cell-type in which they are expressed, cellular location, oligomerization status and the binding of different ligands at different sites. Although an increasing number of moonlighting proteins have been experimentally identified in recent years, the quantity of known moonlighting proteins is insufficient to elucidate their overall landscape. Moreover, most moonlighting proteins have been identified as a serendipitous discovery. Hence, characterization of moonlighting proteins using bioinformatics approaches can have a significant impact on the overall understanding of protein function. In this work, we provide a short review of existing computational approaches for illuminating the functional diversity of moonlighting proteins.


Subject(s)
Computational Biology , Proteins/physiology , Biopolymers/chemistry , Biopolymers/metabolism , Databases, Protein , Proteins/chemistry , Proteins/genetics
14.
BMC Bioinformatics ; 14 Suppl 3: S2, 2013.
Article in English | MEDLINE | ID: mdl-23514353

ABSTRACT

BACKGROUND: Many Automatic Function Prediction (AFP) methods were developed to cope with the growing number of gene sequences that are available from high-throughput sequencing experiments. To support the development of AFP methods, it is essential to have community-wide experiments for evaluating performance of existing AFP methods. Critical Assessment of Function Annotation (CAFA) is one such community experiment. The meeting of CAFA was held as a Special Interest Group (SIG) meeting at the Intelligent Systems in Molecular Biology (ISMB) conference in 2011. Here, we perform a detailed analysis of two sequence-based function prediction methods, PFP and ESG, which were developed in our lab, using the predictions submitted to CAFA. RESULTS: We evaluate PFP and ESG using four different measures in comparison with BLAST, Prior, and GOtcha. In addition to the predictions submitted to CAFA, we further investigate performance of a different scoring function to rank order predictions by PFP as well as PFP/ESG predictions enriched with Priors, which simply add frequently occurring Gene Ontology terms as a part of predictions. Prediction accuracies of each method were also evaluated separately for different functional categories. Successful and unsuccessful predictions by PFP and ESG are also discussed in comparison with BLAST. CONCLUSION: The in-depth analysis discussed here will complement the overall assessment by the CAFA organizers. Since PFP and ESG are based on sequence database search results, our analyses are not only useful for PFP and ESG users but will also shed light on the relationship of the sequence similarity space and functions that can be inferred from the sequences.


Subject(s)
Proteins/physiology , Sequence Analysis, Protein , Algorithms , Computational Biology/methods , Databases, Protein , Proteins/genetics
15.
BMC Proc ; 6 Suppl 7: S5, 2012 Nov 13.
Article in English | MEDLINE | ID: mdl-23173871

ABSTRACT

BACKGROUND: Advancements in function prediction algorithms are enabling large-scale computational annotation for newly sequenced genomes. With the increase in the number of functionally well-characterized proteins, it has been observed that there are many proteins involved in more than one function. These proteins, characterized as moonlighting proteins, show varied functional behavior depending on the cell type, localization in the cell, oligomerization, multiple binding sites, etc. The functional diversity shown by moonlighting proteins may have significant impact on the traditional sequence-based function prediction methods. Here we investigate how well diverse functions of moonlighting proteins can be predicted by some existing function prediction methods. RESULTS: We have analyzed the performances of three major sequence-based function prediction methods, PSI-BLAST, the Protein Function Prediction (PFP), and the Extended Similarity Group (ESG), on predicting diverse functions of moonlighting proteins. In predicting discrete functions of a set of 19 experimentally identified moonlighting proteins, PFP showed overall highest recall among the three methods. Although ESG showed the highest precision, its recall was lower than that of PSI-BLAST. Recall by PSI-BLAST greatly improved when BLOSUM45 was used instead of BLOSUM62. CONCLUSION: We have analyzed the performances of PFP, ESG, and PSI-BLAST in predicting the functional diversity of moonlighting proteins. PFP shows overall better performance in predicting diverse moonlighting functions as compared with PSI-BLAST and ESG. Recall by PSI-BLAST greatly improved when BLOSUM45 was used. This analysis indicates that considering weakly similar sequences in prediction enhances the performance of sequence-based AFP methods in predicting functional diversity of moonlighting proteins. The current study will also motivate development of novel computational frameworks for automatic identification of such proteins.
