Automatic clustering of orthologs and in-paralogs from pairwise species comparisons.

Remm, M; Storm, C E; Sonnhammer, E L.

J Mol Biol ; 314(5): 1041-52, 2001 Dec 14.

Article in English | MEDLINE | ID: mdl-11743721

ABSTRACT

Orthologs are genes in different species that originate from a single gene in the last common ancestor of these species. Such genes have often retained identical biological roles in the present-day organisms. It is hence important to identify orthologs for transferring functional information between genes in different organisms with a high degree of reliability. For example, orthologs of human proteins are often functionally characterized in model organisms. Unfortunately, orthology analysis between human and e.g. invertebrates is often complex because of large numbers of paralogs within protein families. Paralogs that predate the species split, which we call out-paralogs, can easily be confused with true orthologs. Paralogs that arose after the species split, which we call in-paralogs, however, are bona fide orthologs by definition. Orthologs and in-paralogs are typically detected with phylogenetic methods, but these are slow and difficult to automate. Automatic clustering methods based on two-way best genome-wide matches, on the other hand, have so far not separated in-paralogs from out-paralogs effectively. We present a fully automatic method for finding orthologs and in-paralogs from two species. Ortholog clusters are seeded with a two-way best pairwise match, after which an algorithm for adding in-paralogs is applied. The method bypasses multiple alignments and phylogenetic trees, which can be slow and error-prone steps in classical ortholog detection. Still, it robustly detects complex orthologous relationships and assigns confidence values for both orthologs and in-paralogs. The program, called INPARANOID, was tested on all completely sequenced eukaryotic genomes. To assess the quality of INPARANOID results, ortholog clusters were generated from a dataset of worm and mammalian transmembrane proteins, and were compared to clusters derived by manual tree-based ortholog detection methods. This study led to the identification with a high degree of confidence of over a dozen novel worm-mammalian ortholog assignments that were previously undetected because of shortcomings of phylogenetic methods. A WWW server that allows searching for orthologs between human and several fully sequenced genomes is installed at http://www.cgb.ki.se/inparanoid/. This is the first comprehensive resource with orthologs of all fully sequenced eukaryotic genomes. Programs and tables of orthology assignments are available from the same location.
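
The seeding step described in the abstract (a two-way best pairwise match, followed by the addition of in-paralogs) can be illustrated with a small Python sketch. This is not the INPARANOID program itself. The abstract does not state the rule for adding in-paralogs, so the sketch assumes one simple criterion: a sequence joins a cluster when it scores at least as high against the seed ortholog from its own species as the two seed orthologs score against each other. The function names, the score dictionaries, and the toy scores are all hypothetical, and the confidence values mentioned in the abstract are left out.

```python
def best_hits(scores):
    """Map each query sequence to its highest-scoring hit and that score."""
    best = {}
    for (query, hit), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (hit, score)
    return best


def in_paralogs(seed, within_scores, seed_score):
    """Assumed rule: same-species sequences at least as similar to the seed
    as the seed is to its ortholog in the other species."""
    return sorted(
        hit
        for (query, hit), score in within_scores.items()
        if query == seed and hit != seed and score >= seed_score
    )


def cluster_orthologs(a_vs_b, b_vs_a, a_vs_a, b_vs_b):
    """Seed clusters with two-way best matches, then add in-paralogs."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    clusters = []
    for a, (b, score) in best_ab.items():
        # Keep the pair only if the best match is mutual.
        if b not in best_ba or best_ba[b][0] != a:
            continue
        clusters.append({
            "seed": (a, b),
            "inparalogs_a": in_paralogs(a, a_vs_a, score),
            "inparalogs_b": in_paralogs(b, b_vs_b, score),
        })
    return clusters


# Toy similarity scores (hypothetical), keyed by (query, hit).
a_vs_b = {("a1", "b1"): 500, ("a1", "b2"): 200, ("a2", "b1"): 450, ("a3", "b2"): 300}
b_vs_a = {("b1", "a1"): 500, ("b1", "a2"): 450, ("b2", "a3"): 300, ("b2", "a1"): 200}
a_vs_a = {("a1", "a2"): 800, ("a1", "a3"): 100}
b_vs_b = {("b1", "b2"): 150}

clusters = cluster_orthologs(a_vs_b, b_vs_a, a_vs_a, b_vs_b)
for cluster in clusters:
    print(cluster)
```

In the toy data, a2 is closer to a1 than a1 is to b1, so under the assumed rule it is added as an in-paralog of the a1/b1 seed pair, while b2 seeds its own cluster with a3.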

Subject(s)

Caenorhabditis elegans/genetics , Computational Biology/methods , Drosophila melanogaster/genetics , Evolution, Molecular , Genome , Genomics/methods , Sequence Homology , Algorithms , Animals , Automation/methods , Caenorhabditis elegans Proteins/genetics , Cluster Analysis , Databases, Genetic , Drosophila Proteins/genetics , Eukaryotic Cells/metabolism , Humans , Phylogeny , Software , Species Specificity

NIFAS: visual analysis of domain evolution in proteins.

Storm, C E; Sonnhammer, E L.

Bioinformatics ; 17(4): 343-8, 2001 Apr.

Article in English | MEDLINE | ID: mdl-11301303

ABSTRACT

MOTIVATION: Multi-domain proteins have evolved by insertions or deletions of distinct protein domains. Tracing the history of a certain domain combination can be important for functional annotation of multi-domain proteins, and for understanding the function of individual domains. In order to analyze the evolutionary history of the domains in modular proteins it is desirable to inspect a phylogenetic tree based on sequence divergence with the modular architecture of the sequences superimposed on the tree. RESULT: A Java applet, NIFAS, that integrates graphical domain schematics for each sequence in an evolutionary tree was developed. NIFAS retrieves domain information from the Pfam database and uses CLUSTAL W to calculate a tree for a given Pfam domain. The tree can be displayed with symbolic bootstrap values, and to allow the user to focus on a part of the tree, the layout can be altered by swapping nodes, changing the outgroup, and showing/collapsing subtrees. NIFAS is integrated with the Pfam database and is accessible over the internet (http://www.cgr.ki.se/Pfam). As an example, we use NIFAS to analyze the evolution of domains in Protein Kinases C.
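
The idea of superimposing domain architecture on a tree, with node swapping and subtree collapsing, can be sketched in a few lines of Python. This is only an illustration of the data structure, not the NIFAS applet (which is written in Java). The class, the text rendering, the sequence names, and the domain labels are hypothetical placeholders; nothing here is retrieved from Pfam or computed by CLUSTAL W.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str = ""                                  # sequence name (leaves only)
    domains: list = field(default_factory=list)     # domain order along the sequence
    children: list = field(default_factory=list)    # empty for leaves
    collapsed: bool = False                         # hide the subtree when True

    def swap_children(self):
        """Reverse the display order of the child subtrees."""
        self.children.reverse()

    def leaves(self):
        """Return all leaf nodes below (or at) this node."""
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


def render(node, depth=0):
    """Return indented text lines, one per visible node."""
    indent = "  " * depth
    if not node.children:
        return [f"{indent}{node.name} [{'-'.join(node.domains)}]"]
    if node.collapsed:
        return [f"{indent}+ ({len(node.leaves())} sequences collapsed)"]
    lines = [f"{indent}*"]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# Placeholder tree: names and domain labels are illustrative only.
tree = Node(children=[
    Node(children=[
        Node("seq1", ["C1", "C2", "Pkinase"]),
        Node("seq2", ["C1", "Pkinase"]),
    ]),
    Node("seq3", ["Pkinase"]),
])

tree.swap_children()               # seq3 is now drawn first
tree.children[1].collapsed = True  # hide the seq1/seq2 subtree
layout = render(tree)
print("\n".join(layout))
```

Keeping the domain list on each leaf means layout changes such as swapping or collapsing never touch the domain annotation, only how the tree is drawn.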

Subject(s)

Evolution, Molecular , Image Processing, Computer-Assisted , Protein Kinase C/chemistry , Protein Structure, Tertiary , Proteins/chemistry , Software , Computer Graphics , Humans , Protein Kinase C/classification , Proteins/classification
