Simultaneous gene finding in multiple genomes.
König, Stefanie; Romoth, Lars W; Gerischer, Lizzy; Stanke, Mario.
Affiliation
  • König S; Institute of Mathematics and Computer Science, University of Greifswald, Greifswald, 17487, Germany.
  • Romoth LW; Institute of Mathematics and Computer Science, University of Greifswald, Greifswald, 17487, Germany.
  • Gerischer L; Institute of Mathematics and Computer Science, University of Greifswald, Greifswald, 17487, Germany.
  • Stanke M; Institute of Mathematics and Computer Science, University of Greifswald, Greifswald, 17487, Germany.
Bioinformatics; 32(22): 3388-3395, 2016 Nov 15.
Article in English | MEDLINE | ID: mdl-27466621
MOTIVATION: As the tree of life is populated ever more densely with sequenced genomes, the new challenge is the accurate and consistent annotation of entire clades of genomes. We address this problem with a new approach to comparative gene finding that takes a multiple genome alignment of closely related species and simultaneously predicts the location and structure of protein-coding genes in all input genomes, thereby exploiting negative selection and sequence conservation. The model prefers potential gene structures in the different genomes that agree with each other or, if they do not, for which the exon gains and losses are plausible given the species tree. We formulate the multi-species gene finding problem as a binary labeling problem on a graph. The resulting optimization problem is NP-hard, but it can be approximated efficiently using a subgradient-based dual decomposition approach.

RESULTS: The proposed method was tested on whole-genome alignments of 12 vertebrate and 12 Drosophila species. Accuracy was evaluated for human, mouse and Drosophila melanogaster and compared with competing methods. The results suggest that our method is well suited for annotating (a large number of) genomes of closely related species within a clade, in particular when RNA-Seq data are available for many of the genomes. When the genomes are at close to medium distances, transferring existing annotations from one genome to another via the genome alignment is more accurate than previous approaches based on spliced protein alignments.

AVAILABILITY AND IMPLEMENTATION: The method is implemented in C++ as part of Augustus and is available open source at http://bioinf.uni-greifswald.de/augustus/

CONTACT: stefaniekoenig@ymail.com or mario.stanke@uni-greifswald.de

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
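The dual decomposition idea can be illustrated on a deliberately tiny, hypothetical instance. The sketch below is not the Augustus implementation; it assumes two overlapping candidate exons, `e1` and `e2`, aligned across four species on the species tree ((A,B),(C,D)). Each genome may use at most one of the two exons (a stand-in for the per-genome gene-structure subproblem), and each exon's labeling across the tree pays a fixed penalty per gain or loss (a stand-in for the tree subproblem). All names, scores and the penalty are invented for illustration. Lagrange multipliers `lam` couple the two copies of every binary label; each iteration solves both kinds of subproblem exactly and takes a subgradient step on their disagreement. Every dual value is an upper bound on the optimum and the genome-side labeling is always feasible, so the remaining gap shows how far the returned labeling is from provably optimal.

```python
from itertools import product

# Hypothetical toy instance: species tree ((A,B),(C,D)); SPECIES lists its leaves.
TREE = (("A", "B"), ("C", "D"))
SPECIES = ["A", "B", "C", "D"]
# Invented scores for two overlapping candidate exons in each aligned genome.
THETA = {"e1": {"A": 2.0, "B": 1.5, "C": -0.5, "D": 0.5},
         "e2": {"A": 1.0, "B": 1.0, "C": 1.0, "D": 1.0}}
PENALTY = 1.0  # assumed cost of one exon gain or loss on a tree edge


def tree_subproblem(tree, leaf_scores, penalty):
    """Exactly maximize sum_s leaf_scores[s][y_s] - penalty * (#edges whose end labels
    differ) over binary labels y of all tree nodes, by dynamic programming on the tree.
    Returns (best value, {leaf: label})."""
    def solve(node):
        # scores[l]: best value of the subtree if `node` has label l;
        # choose(l): optimal leaf labels of the subtree given that label.
        if isinstance(node, str):
            return list(leaf_scores[node]), lambda l: {node: l}
        kids = [solve(child) for child in node]
        scores, picks = [], []
        for l in (0, 1):
            pick = [max((0, 1), key=lambda k: cs[k] - penalty * (k != l)) for cs, _ in kids]
            scores.append(sum(cs[k] - penalty * (k != l) for (cs, _), k in zip(kids, pick)))
            picks.append(pick)

        def choose(l):
            labels = {}
            for (_, child_choose), k in zip(kids, picks[l]):
                labels.update(child_choose(k))
            return labels
        return scores, choose

    scores, choose = solve(tree)
    root = max((0, 1), key=lambda l: scores[l])
    return scores[root], choose(root)


def genome_subproblem(weights):
    """The candidate exons overlap, so a genome may use at most one of them."""
    best = max(weights, key=weights.get)
    if weights[best] <= 0:
        return 0.0, {e: 0 for e in weights}
    return weights[best], {e: int(e == best) for e in weights}


def primal_value(x, theta, penalty):
    """Objective of a feasible leaf labeling x[(exon, species)]: exon scores minus the
    cheapest gain/loss cost per exon, with leaves forced to x via -inf scores."""
    neg = float("-inf")
    total = sum(theta[e][s] * x[e, s] for e in theta for s in SPECIES)
    for e in theta:
        forced = {s: (0.0, neg) if x[e, s] == 0 else (neg, 0.0) for s in SPECIES}
        total += tree_subproblem(TREE, forced, penalty)[0]
    return total


def brute_force(theta, penalty):
    """Exhaustive optimum over all feasible labelings (tiny instances only)."""
    exons = list(theta)
    best = float("-inf")
    for picks in product([None] + exons, repeat=len(SPECIES)):
        x = {(e, s): int(p == e) for s, p in zip(SPECIES, picks) for e in exons}
        best = max(best, primal_value(x, theta, penalty))
    return best


def dual_decomposition(theta, penalty, iters=200, step=1.0):
    """Subgradient dual decomposition. Returns (best feasible labeling found, its value,
    best dual upper bound); value == bound certifies that the labeling is optimal."""
    exons = list(theta)
    lam = {(e, s): 0.0 for e in exons for s in SPECIES}
    best_x, best_primal, bound = None, float("-inf"), float("inf")
    for t in range(1, iters + 1):
        dual, x, y = 0.0, {}, {}
        for s in SPECIES:  # genome copies of the labels see scores shifted by +lam
            val, lab = genome_subproblem({e: theta[e][s] + lam[e, s] for e in exons})
            dual += val
            x.update({(e, s): lab[e] for e in exons})
        for e in exons:  # tree copies of the labels see leaf scores shifted by -lam
            val, lab = tree_subproblem(TREE, {s: (0.0, -lam[e, s]) for s in SPECIES}, penalty)
            dual += val
            y.update({(e, s): lab[s] for s in SPECIES})
        bound = min(bound, dual)  # every dual value is an upper bound on the optimum
        primal = primal_value(x, theta, penalty)  # x always satisfies the genome constraint
        if primal > best_primal:
            best_x, best_primal = x, primal
        if x == y or bound - best_primal < 1e-9:  # agreement or a closed gap proves optimality
            break
        for k in lam:  # subgradient of the dual w.r.t. lam is x - y
            lam[k] -= step / t * (x[k] - y[k])
    return best_x, best_primal, bound


if __name__ == "__main__":
    x, value, bound = dual_decomposition(THETA, PENALTY)
    print("primal", value, "dual bound", bound, "brute force", brute_force(THETA, PENALTY))
```

On instances this small, `brute_force` can check the result; in a realistic setting the genome subproblems would be full gene-structure predictions and exhaustive search would be out of reach. The diminishing step size `step / t` is one common choice for subgradient methods. Whether the two label copies agree within `iters` iterations depends on the instance, which is why the sketch also returns the best feasible labeling it has seen together with the dual bound.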
Full text: 1 | Collection: 01-internacional | Database: MEDLINE | Main subject: Sequence Alignment / Genome | Type of study: Diagnostic_studies / Prognostic_studies | Limits: Animals / Humans | Language: English | Journal: Bioinformatics | Journal subject: Medical Informatics | Year: 2016 | Document type: Article | Affiliation country: Germany | Country of publication: United Kingdom