Results 1 - 15 of 15
1.
Bioinformatics ; 34(2): 323-329, 2018 Jan 15.
Article in English | MEDLINE | ID: mdl-28968857

ABSTRACT

The Quest for Orthologs (QfO) is an open collaboration framework for experts in comparative phylogenomics and related research areas who have an interest in highly accurate orthology predictions and their applications. We here report highlights and discussion points from the QfO meeting 2015 held in Barcelona. Achievements in recent years have established a basis to support developments for improved orthology prediction and to explore new approaches. Central to the QfO effort is proper benchmarking of methods and services, as well as design of standardized datasets and standardized formats to allow sharing and comparison of results. Simultaneously, analysis pipelines have been improved, evaluated and adapted to handle large datasets. All this would not have occurred without the long-term collaboration of Consortium members. Meeting regularly to review and coordinate complementary activities from a broad spectrum of innovative researchers clearly benefits the community. Highlights of the meeting include addressing sources of and legitimacy of disagreements between orthology calls, the context dependency of orthology definitions, special challenges encountered when analyzing very anciently rooted orthologies, orthology in the light of whole-genome duplications, and the concept of orthologous versus paralogous relationships at different levels, including domain-level orthology. Furthermore, particular needs for different applications (e.g. plant genomics, ancient gene families and others) and the infrastructure for making orthology inferences available (e.g. interfaces with model organism databases) were discussed, with several ongoing efforts that are expected to be reported on during the upcoming 2017 QfO meeting.

2.
Nat Methods ; 13(5): 425-30, 2016 May.
Article in English | MEDLINE | ID: mdl-27043882

ABSTRACT

Achieving high accuracy in orthology inference is essential for many comparative, evolutionary and functional genomic analyses, yet the true evolutionary history of genes is generally unknown and orthologs are used for very different applications across phyla, requiring different precision-recall trade-offs. As a result, it is difficult to assess the performance of orthology inference methods. Here, we present a community effort to establish standards and an automated web-based service to facilitate orthology benchmarking. Using this service, we characterize 15 well-established inference methods and resources on a battery of 20 different benchmarks. Standardized benchmarking provides a way for users to identify the most effective methods for the problem at hand, sets a minimum requirement for new tools and resources, and guides the development of more accurate orthology inference methods.


Subject(s)
Computational Biology/standards , Genomics/standards , Phylogeny , Proteomics/standards , Archaea/classification , Archaea/genetics , Bacteria/classification , Bacteria/genetics , Computational Biology/methods , Databases, Genetic , Eukaryota/classification , Eukaryota/genetics , Gene Ontology , Genomics/methods , Models, Genetic , Proteomics/methods , Sequence Analysis, Protein , Sequence Homology , Species Specificity
3.
Genome Biol Evol ; 7(7): 1988-99, 2015 Jul 01.
Article in English | MEDLINE | ID: mdl-26133389

ABSTRACT

Quest for Orthologs (QfO) is a community effort with the goal of improving and benchmarking orthology predictions. As quality assessment assumes prior knowledge of species phylogenies, we investigated the congruency between existing species trees by comparing the relationships of 147 QfO reference organisms from six Tree of Life (ToL)/species tree projects: the National Center for Biotechnology Information (NCBI) taxonomy, Opentree of Life, the sequenced species/species ToL, the 16S ribosomal RNA (rRNA) database, and trees published by Ciccarelli et al. (Ciccarelli FD, et al. 2006. Toward automatic reconstruction of a highly resolved tree of life. Science 311:1283-1287) and by Huerta-Cepas et al. (Huerta-Cepas J, Marcet-Houben M, Gabaldon T. 2014. A nested phylogenetic reconstruction approach provides scalable resolution in the eukaryotic Tree Of Life. PeerJ PrePrints 2:223). Our study reveals that each species tree suggests a different phylogeny: 87 of the 146 (60%) possible splits of a dichotomous and rooted tree are congruent, while all other splits are incongruent in at least one of the species trees. Topological differences are observed not only at deep speciation events, but also within younger clades, such as Hominidae, Rodentia, Laurasiatheria, or rosids. The evolutionary relationships of 27 archaea and bacteria are highly inconsistent. By assessing 458,108 gene trees from 65 genomes, we show that consistent species topologies are more often supported by gene phylogenies than contradicting ones. The largest concordant species tree includes at most 77 of the QfO reference organisms. Results are summarized in the form of a consensus ToL (http://swisstree.vital-it.ch/species_tree) that can serve different benchmarking purposes.


Subject(s)
Phylogeny , Archaea/classification , Archaea/genetics , Bacteria/classification , Bacteria/genetics , Eukaryota/classification , Eukaryota/genetics , Genes
4.
Bioinformatics ; 30(21): 2993-8, 2014 Nov 01.
Article in English | MEDLINE | ID: mdl-25064571

ABSTRACT

Given the rapid increase of species with a sequenced genome, the need to identify orthologous genes between them has emerged as a central bioinformatics task. Many different methods exist for orthology detection, which makes it difficult to decide which one to choose for a particular application. Here, we review the latest developments and issues in the orthology field, and summarize the most recent results reported at the third 'Quest for Orthologs' meeting. We focus on community efforts such as the adoption of reference proteomes, standard file formats and benchmarking. Progress in these areas is good, and they are already beneficial to both orthology consumers and providers. However, a major current issue is that the massive increase in complete proteomes poses computational challenges to many of the ortholog database providers, as most orthology inference algorithms scale at least quadratically with the number of proteomes. The Quest for Orthologs consortium is an open community with a number of working groups that join efforts to enhance various aspects of orthology analysis, such as defining standard formats and datasets, documenting community resources and benchmarking. AVAILABILITY AND IMPLEMENTATION: All such materials are available at http://questfororthologs.org.


Subject(s)
Genomics/methods , Sequence Homology , Algorithms , Protein Structure, Tertiary , Proteome , Sequence Analysis, DNA , Sequence Analysis, Protein
5.
Plant Physiol ; 165(4): 1709-1722, 2014 Aug.
Article in English | MEDLINE | ID: mdl-24920445

ABSTRACT

CASPARIAN STRIP MEMBRANE DOMAIN PROTEINS (CASPs) are four-membrane-span proteins that mediate the deposition of Casparian strips in the endodermis by recruiting the lignin polymerization machinery. CASPs show high stability in their membrane domain, which presents all the hallmarks of a membrane scaffold. Here, we characterized the large family of CASP-like (CASPL) proteins. CASPLs were found in all major divisions of land plants as well as in green algae; homologs outside of the plant kingdom were identified as members of the MARVEL protein family. When ectopically expressed in the endodermis, most CASPLs were able to integrate the CASP membrane domain, which suggests that CASPLs share with CASPs the propensity to form transmembrane scaffolds. Extracellular loops are not necessary for generating the scaffold, since CASP1 was still able to localize correctly when either one of the extracellular loops was deleted. The CASP first extracellular loop was found conserved in euphyllophytes but absent in plants lacking Casparian strips, an observation that may contribute to the study of Casparian strip and root evolution. In Arabidopsis (Arabidopsis thaliana), CASPL showed specific expression in a variety of cell types, such as trichomes, abscission zone cells, peripheral root cap cells, and xylem pole pericycle cells.

6.
PLoS One ; 8(3): e58126, 2013.
Article in English | MEDLINE | ID: mdl-23505460

ABSTRACT

A heme-containing transmembrane ferric reductase domain (FRD) is found in bacterial and eukaryotic protein families, including ferric reductases (FRE) and NADPH oxidases (NOX). The aim of this study was to understand the phylogeny of the FRD superfamily. Bacteria contain FRD proteins consisting only of the ferric reductase domain, such as YedZ and short bFRE proteins. Full length FRE and NOX enzymes are mostly found in eukaryotic cells and all possess a dehydrogenase domain, allowing them to catalyze electron transfer from cytosolic NADPH to extracellular metal ions (FRE) or oxygen (NOX). Metazoa possess YedZ-related STEAP proteins, possibly derived from bacteria through horizontal gene transfer. Phylogenetic analyses suggest that FRE enzymes appeared early in evolution, followed by a transition towards EF-hand containing NOX enzymes (NOX5- and DUOX-like). An ancestral gene of the NOX(1-4) family probably lost the EF-hands and new regulatory mechanisms of increasing complexity evolved in this clade. Two signature motifs were identified: NOX enzymes are distinguished from FRE enzymes through a four amino acid motif spanning from transmembrane domain 3 (TM3) to TM4, and YedZ/STEAP proteins are identified by the replacement of the first canonical heme-spanning histidine by a highly conserved arginine. The FRD superfamily most likely originated in bacteria.


Subject(s)
Biological Evolution , FMN Reductase/chemistry , FMN Reductase/metabolism , Protein Interaction Domains and Motifs , Amino Acid Motifs , Cluster Analysis , Conserved Sequence , FMN Reductase/classification , FMN Reductase/genetics , Heme/chemistry , Heme/metabolism , Models, Biological , Multigene Family , NADH, NADPH Oxidoreductases/chemistry , NADH, NADPH Oxidoreductases/metabolism , Phylogeny , Position-Specific Scoring Matrices , Reactive Oxygen Species/metabolism
7.
Brief Bioinform ; 12(5): 423-35, 2011 Sep.
Article in English | MEDLINE | ID: mdl-21737420

ABSTRACT

Phylogenomic databases provide orthology predictions for species with fully sequenced genomes. Although the goal seems well-defined, the content of these databases differs greatly. Seven ortholog databases (Ensembl Compara, eggNOG, HOGENOM, InParanoid, OMA, OrthoDB, Panther) were compared on the basis of reference trees. For three well-conserved protein families, we observed a generally high specificity of orthology assignments for these databases. We show that differences in the completeness of predicted gene relationships and in the phylogenetic information are, for the great majority, not due to the methods used, but to differences in the underlying database concepts. According to our metrics, none of the databases provides a fully correct and comprehensive protein classification. Our results provide a framework for meaningful and systematic comparisons of phylogenomic databases. In the future, a sustainable set of 'Gold standard' phylogenetic trees could provide a robust method for phylogenomic databases to assess their current quality status, measure changes following new database releases and diagnose improvements subsequent to an upgrade of the analysis procedure.


Subject(s)
Databases, Genetic , Genomics/methods , Phylogeny , Algorithms , Evolution, Molecular , Pilot Projects
8.
Nucleic Acids Res ; 34(11): 3309-16, 2006.
Article in English | MEDLINE | ID: mdl-16835308

ABSTRACT

Correct orthology assignment is a critical prerequisite of numerous comparative genomics procedures, such as function prediction, construction of phylogenetic species trees and genome rearrangement analysis. We present an algorithm for the detection of non-orthologs that arise by mistake in current orthology classification methods based on genome-specific best hits, such as the COGs database. The algorithm works with pairwise distance estimates, rather than computationally expensive and error-prone tree-building methods. The accuracy of the algorithm is evaluated through verification of the distribution of predicted cases, case-by-case phylogenetic analysis and comparisons with predictions from other projects using independent methods. Our results show that a very significant fraction of the COG groups include non-orthologs: using conservative parameters, the algorithm detects non-orthology in a third of all COG groups. Consequently, sequence analysis sensitive to correct orthology assignments will greatly benefit from these findings.


Subject(s)
Algorithms , Databases, Protein , Genomics/methods , Evolution, Molecular , Phylogeny , Proteins/classification , Proteins/genetics , Sequence Alignment , Sequence Analysis, Protein
9.
Nucleic Acids Res ; 34(Database issue): D187-91, 2006 Jan 01.
Article in English | MEDLINE | ID: mdl-16381842

ABSTRACT

The Universal Protein Resource (UniProt) provides a central resource on protein sequences and functional annotation with three database components, each addressing a key need in protein bioinformatics. The UniProt Knowledgebase (UniProtKB), comprising the manually annotated UniProtKB/Swiss-Prot section and the automatically annotated UniProtKB/TrEMBL section, is the preeminent storehouse of protein annotation. The extensive cross-references, functional and feature annotations and literature-based evidence attribution enable scientists to analyse proteins and query across databases. The UniProt Reference Clusters (UniRef) speed similarity searches via sequence space compression by merging sequences that are 100% (UniRef100), 90% (UniRef90) or 50% (UniRef50) identical. Finally, the UniProt Archive (UniParc) stores all publicly available protein sequences, containing the history of sequence data with links to the source databases. UniProt databases continue to grow in size and in availability of information. Recent and upcoming changes to database contents, formats, controlled vocabularies and services are described. New download availability includes all major releases of UniProtKB, sequence collections by taxonomic division and complete proteomes. A bibliography mapping service has been added, and an ID mapping service will be available soon. UniProt databases can be accessed online at http://www.uniprot.org or downloaded at ftp://ftp.uniprot.org/pub/databases/.


Subject(s)
Databases, Protein , Internet , Proteins/chemistry , Proteins/classification , Proteins/physiology , Proteome/chemistry , Sequence Analysis, Protein , Systems Integration , User-Computer Interface
10.
C R Biol ; 328(10-11): 882-99, 2005.
Article in English | MEDLINE | ID: mdl-16286078

ABSTRACT

We all know that the dogma 'one gene, one protein' is obsolete. A functional protein and, likewise, a protein's ultimate function depend not only on the underlying genetic information but also on the ongoing conditions of the cellular system. Frequently the transcript, like the polypeptide, is processed in multiple ways, but only one or a few out of a multitude of possible variants are produced at a time. An overview of processes that can lead to sequence variety and structural diversity in eukaryotes is given. The UniProtKB/Swiss-Prot protein knowledgebase provides a wealth of information regarding protein variety, function and associated disorders. Examples of such annotation are shown and further ones are available at http://www.expasy.org/sprot/tutorial/examples_CRB.


Subject(s)
Knowledge Bases , Proteins/chemistry , Amino Acid Sequence , Molecular Sequence Data , Protein Folding
11.
Nucleic Acids Res ; 33(Database issue): D154-9, 2005 Jan 01.
Article in English | MEDLINE | ID: mdl-15608167

ABSTRACT

The Universal Protein Resource (UniProt) provides the scientific community with a single, centralized, authoritative resource for protein sequences and functional information. Formed by uniting the Swiss-Prot, TrEMBL and PIR protein database activities, the UniProt consortium produces three layers of protein sequence databases: the UniProt Archive (UniParc), the UniProt Knowledgebase (UniProt) and the UniProt Reference (UniRef) databases. The UniProt Knowledgebase is a comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase with extensive cross-references. This centrepiece consists of two sections: UniProt/Swiss-Prot, with fully, manually curated entries; and UniProt/TrEMBL, enriched with automated classification and annotation. During 2004, tens of thousands of Knowledgebase records were manually annotated or updated; we introduced a new comment line topic, TOXIC DOSE, to store information on the acute toxicity of a toxin; the UniProt keyword list was augmented by additional keywords; we improved the documentation of the keywords and are continuously overhauling and standardizing the annotation of post-translational modifications. Furthermore, we introduced a new documentation file of the strains and their synonyms. Many new database cross-references were introduced and we started to make use of Digital Object Identifiers. In collaboration with the Macromolecular Structure Database group at EBI, we also achieved an improved integration with structural databases by residue level mapping of sequences from the Protein Data Bank entries onto corresponding UniProt entries. For convenient sequence searches we provide the UniRef non-redundant sequence databases. The comprehensive UniParc database stores the complete body of publicly available protein sequence data. The UniProt databases can be accessed online (http://www.uniprot.org) or downloaded in several formats (ftp://ftp.uniprot.org/pub). New releases are published every two weeks.


Subject(s)
Databases, Protein , Proteins/chemistry , Amino Acid Sequence , Proteins/physiology , Systems Integration , User-Computer Interface
12.
Proteomics ; 4(6): 1537-50, 2004 Jun.
Article in English | MEDLINE | ID: mdl-15174124

ABSTRACT

High-throughput proteomic studies produce a wealth of new information regarding post-translational modifications (PTMs). The Swiss-Prot knowledge base is faced with the challenge of including this information in a consistent and structured way, in order to facilitate easy retrieval and promote understanding by biologist expert users as well as computer programs. We are therefore standardizing the annotation of PTM features represented in Swiss-Prot. Indeed, a controlled vocabulary has been associated with every described PTM. In this paper, we present the major update of the feature annotation, and, by showing a few examples, explain how the annotation is implemented and what it means. Mod-Prot, a future companion database of Swiss-Prot, devoted to the biological aspects of PTMs (i.e., general description of the process, identity of the modification enzyme(s), taxonomic range, mass modification) is briefly described. Finally we encourage once again the scientific community (i.e., both individual researchers and database maintainers) to interact with us, so that we can continuously enhance the quality and swiftness of our services.


Subject(s)
Databases, Protein , Protein Processing, Post-Translational , Computational Biology , Databases, Protein/standards , Forecasting , Information Systems , Sequence Analysis, Protein , Systems Integration
13.
Brief Bioinform ; 5(1): 39-55, 2004 Mar.
Article in English | MEDLINE | ID: mdl-15153305

ABSTRACT

We describe some of the aspects of Swiss-Prot that make it unique, explain which developments we believe are necessary for the database to continue to play its role as a focal point of protein knowledge, and provide advice pertinent to the development of high-quality knowledge resources on one aspect or another of the life sciences.


Subject(s)
Databases, Protein , Software Design , Amino Acid Sequence , Animals , Databases, Protein/history , History, 20th Century , History, 21st Century , Humans , Information Storage and Retrieval , Internet , Proteins/classification , Proteins/genetics , User-Computer Interface
14.
Nucleic Acids Res ; 32(Database issue): D115-9, 2004 Jan 01.
Article in English | MEDLINE | ID: mdl-14681372

ABSTRACT

To provide the scientific community with a single, centralized, authoritative resource for protein sequences and functional information, the Swiss-Prot, TrEMBL and PIR protein database activities have united to form the Universal Protein Knowledgebase (UniProt) consortium. Our mission is to provide a comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with extensive cross-references and query interfaces. The central database will have two sections, corresponding to the familiar Swiss-Prot (fully manually curated entries) and TrEMBL (enriched with automated classification, annotation and extensive cross-references). For convenient sequence searches, UniProt also provides several non-redundant sequence databases. The UniProt NREF (UniRef) databases provide representative subsets of the knowledgebase suitable for efficient searching. The comprehensive UniProt Archive (UniParc) is updated daily from many public source databases. The UniProt databases can be accessed online (http://www.uniprot.org) or downloaded in several formats (ftp://ftp.uniprot.org/pub). The scientific community is encouraged to submit data for inclusion in UniProt.


Subject(s)
Computational Biology , Databases, Protein , Proteins/chemistry , Proteins/metabolism , Animals , Humans , Internet , Protein Conformation , Proteins/classification , Proteome , Proteomics , Terminology as Topic
15.
Nucleic Acids Res ; 31(1): 365-70, 2003 Jan 01.
Article in English | MEDLINE | ID: mdl-12520024

ABSTRACT

The SWISS-PROT protein knowledgebase (http://www.expasy.org/sprot/ and http://www.ebi.ac.uk/swissprot/) connects amino acid sequences with the current knowledge in the Life Sciences. Each protein entry provides an interdisciplinary overview of relevant information by bringing together experimental results, computed features and sometimes even contradictory conclusions. Detailed expertise that goes beyond the scope of SWISS-PROT is made available via direct links to specialised databases. SWISS-PROT provides annotated entries for all species, but concentrates on the annotation of entries from human (the HPI project) and other model organisms to ensure the presence of high quality annotation for representative members of all protein families. Part of the annotation can be transferred to other family members, as is already done for microbes by the High-quality Automated and Manual Annotation of microbial Proteomes (HAMAP) project. Protein families and groups of proteins are regularly reviewed to keep up with current scientific findings. Complementarily, TrEMBL strives to comprise all protein sequences that are not yet represented in SWISS-PROT, by incorporating a perpetually increasing level of mostly automated annotation. Researchers are welcome to contribute their knowledge to the scientific community by submitting relevant findings to SWISS-PROT at swiss-prot@expasy.org.


Subject(s)
Databases, Protein , Proteins/chemistry , Animals , Archaeal Proteins/chemistry , Bacterial Proteins/chemistry , Humans , Information Storage and Retrieval , Models, Animal , Plant Proteins/chemistry , Proteins/classification , Proteome/chemistry , Proteomics , Systems Integration , Terminology as Topic