Results 1 - 15 of 15
1.
Nature ; 515(7527): 355-64, 2014 Nov 20.
Article in English | MEDLINE | ID: mdl-25409824

ABSTRACT

The laboratory mouse shares the majority of its protein-coding genes with humans, making it the premier model organism in biomedical research, yet the two mammals differ in significant ways. To gain greater insights into both shared and species-specific transcriptional and cellular regulatory programs in the mouse, the Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications and replication domains throughout the mouse genome in diverse cell and tissue types. By comparing with the human genome, we not only confirm substantial conservation in the newly annotated potential functional sequences, but also find a large degree of divergence of sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization. Our results illuminate the wide range of evolutionary forces acting on genes and their regulatory regions, and provide a general resource for research into mammalian biology and mechanisms of human diseases.


Subject(s)
Genome/genetics , Genomics , Mice/genetics , Molecular Sequence Annotation , Animals , Cell Lineage/genetics , Chromatin/genetics , Chromatin/metabolism , Conserved Sequence/genetics , DNA Replication/genetics , Deoxyribonuclease I/metabolism , Gene Expression Regulation/genetics , Gene Regulatory Networks/genetics , Genome-Wide Association Study , Humans , RNA/genetics , Regulatory Sequences, Nucleic Acid/genetics , Species Specificity , Transcription Factors/metabolism , Transcriptome/genetics
2.
PLoS Genet ; 10(5): e1004342, 2014.
Article in English | MEDLINE | ID: mdl-24831947

ABSTRACT

The complex correlation structure of a collection of orthologous DNA sequences is uniquely captured by the "ancestral recombination graph" (ARG), a complete record of coalescence and recombination events in the history of the sample. However, existing methods for ARG inference are computationally intensive, highly approximate, or limited to small numbers of sequences, and, as a consequence, explicit ARG inference is rarely used in applied population genomics. Here, we introduce a new algorithm for ARG inference that is efficient enough to apply to dozens of complete mammalian genomes. The key idea of our approach is to sample an ARG of n chromosomes conditional on an ARG of n − 1 chromosomes, an operation we call "threading." Using techniques based on hidden Markov models, we can perform this threading operation exactly, up to the assumptions of the sequentially Markov coalescent and a discretization of time. An extension allows for threading of subtrees instead of individual sequences. Repeated application of these threading operations results in highly efficient Markov chain Monte Carlo samplers for ARGs. We have implemented these methods in a computer program called ARGweaver. Experiments with simulated data indicate that ARGweaver converges rapidly to the posterior distribution over ARGs and is effective in recovering various features of the ARG for dozens of sequences generated under realistic parameters for human populations. In applications of ARGweaver to 54 human genome sequences from Complete Genomics, we find clear signatures of natural selection, including regions of unusually ancient ancestry associated with balancing selection and reductions in allele age in sites under directional selection. The patterns we observe near protein-coding genes are consistent with a primary influence from background selection rather than hitchhiking, although we cannot rule out a contribution from recurrent selective sweeps.
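The abstract says threading is done exactly with hidden Markov model techniques over a discretized time axis. One standard way to draw an exact sample of a hidden-state path from an HMM posterior is forward filtering followed by stochastic traceback. The sketch below shows only that generic step on a tiny discrete HMM; ARGweaver's real state space (branch and time pairs in local trees) and its SMC-derived transition and emission probabilities are not modeled, and all names here are hypothetical.

```python
import random


def forward_backward_sample(init, trans, emit, obs, rng=random):
    """Sample one hidden-state path from P(path | obs) for a discrete HMM.

    init[k]     : prior probability of state k at the first position
    trans[j][k] : probability of moving from state j to state k
    emit[k][o]  : probability that state k emits observation o
    rng         : any object with a random.choices-style `choices` method
    """
    n_states = len(init)
    states = range(n_states)

    # Forward pass: fwd[i][k] is proportional to P(obs[0..i], state_i = k).
    # Each column is rescaled to sum to 1 to avoid numerical underflow.
    prev = [init[k] * emit[k][obs[0]] for k in states]
    total = sum(prev)
    fwd = [[p / total for p in prev]]
    for o in obs[1:]:
        prev = fwd[-1]
        cur = [emit[k][o] * sum(prev[j] * trans[j][k] for j in states) for k in states]
        total = sum(cur)
        fwd.append([c / total for c in cur])

    # Stochastic traceback: sample the last state from its filtered
    # distribution, then each earlier state given the state after it.
    path = [rng.choices(states, weights=fwd[-1])[0]]
    for i in range(len(obs) - 2, -1, -1):
        nxt = path[-1]
        weights = [fwd[i][j] * trans[j][nxt] for j in states]
        path.append(rng.choices(states, weights=weights)[0])
    path.reverse()
    return path


# Usage with noise-free emissions: the sampled path must equal the observations.
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
emit = [[1.0, 0.0], [0.0, 1.0]]
print(forward_backward_sample(init, trans, emit, [0, 0, 1, 1, 0]))
```

Repeating the call yields independent draws from the posterior over paths, which is the property an MCMC sampler built on repeated threading relies on.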


Subject(s)
Evolution, Molecular , Genome, Human , Recombination, Genetic , Selection, Genetic/genetics , Algorithms , Computer Simulation , Humans , Markov Chains , Models, Genetic , Monte Carlo Method
3.
Genome Res ; 24(3): 475-86, 2014 Mar.
Article in English | MEDLINE | ID: mdl-24310000

ABSTRACT

Accurate gene tree-species tree reconciliation is fundamental to inferring the evolutionary history of a gene family. However, although it has long been appreciated that population-related effects such as incomplete lineage sorting (ILS) can dramatically affect the gene tree, many of the most popular reconciliation methods consider discordance only due to gene duplication and loss (and sometimes horizontal gene transfer). Methods that do model ILS are either highly parameterized or consider a restricted set of histories, thus limiting their applicability and accuracy. To address these challenges, we present a novel algorithm, DLCpar, for inferring a most parsimonious (MP) history of a gene family in the presence of duplications, losses, and ILS. Our algorithm relies on a new reconciliation structure, the labeled coalescent tree (LCT), that simultaneously describes coalescent and duplication-loss history. We show that the LCT representation enables an exhaustive and efficient search over the space of reconciliations, and, for most gene families, the least common ancestor (LCA) mapping is an optimal solution for the species mapping between the gene tree and species tree in an MP LCT. Applying our algorithm to a variety of clades, including flies, fungi, and primates, as well as to simulated phylogenies, we achieve high accuracy, comparable to sophisticated probabilistic reconciliation methods, at reduced run time and with far fewer parameters. These properties enable inferences of the complex evolution of gene families across a broad range of species and large data sets.
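The abstract reports that the least common ancestor (LCA) mapping is usually the optimal species mapping. As a minimal sketch of what that mapping is, the code below performs classical LCA reconciliation of a binary gene tree onto a species tree and counts duplications and losses. It does not model ILS or build DLCpar's labeled coalescent tree; the tree encoding (nested tuples for the gene tree, child-to-parent links for the species tree) and the example species are hypothetical choices for illustration.

```python
def ancestors(node, parent):
    """Return [node, parent(node), ..., root] in the species tree."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def species_lca(a, b, parent):
    """Least common ancestor of species-tree nodes a and b."""
    anc_a = set(ancestors(a, parent))
    for node in ancestors(b, parent):
        if node in anc_a:
            return node
    raise ValueError("nodes are not in the same species tree")


def depth(node, parent):
    """Number of edges from node up to the species-tree root."""
    return len(ancestors(node, parent)) - 1


def reconcile(gene_tree, gene_to_species, parent):
    """LCA-map a binary gene tree onto a species tree.

    gene_tree: a leaf name (str) or a pair (left, right) of subtrees.
    Returns (species node the subtree root maps to, duplications, losses).
    """
    if isinstance(gene_tree, str):
        return gene_to_species[gene_tree], 0, 0
    left, right = gene_tree
    m_l, dup_l, loss_l = reconcile(left, gene_to_species, parent)
    m_r, dup_r, loss_r = reconcile(right, gene_to_species, parent)
    m = species_lca(m_l, m_r, parent)
    # A gene node is a duplication if it maps to the same species node as a child.
    is_dup = m in (m_l, m_r)
    # Losses on each child edge: species branches skipped between the two mappings
    # (a speciation node is expected to descend exactly one branch per child).
    losses = 0
    for m_child in (m_l, m_r):
        gap = depth(m_child, parent) - depth(m, parent)
        losses += gap if is_dup else gap - 1
    return m, dup_l + dup_r + int(is_dup), loss_l + loss_r + losses


# Hypothetical species tree ((human, chimp)HC, mouse)root as child -> parent links.
parent = {"human": "HC", "chimp": "HC", "HC": "root", "mouse": "root", "root": None}
g2s = {"h1": "human", "h2": "human", "c1": "chimp", "m1": "mouse"}
print(reconcile((("h1", "c1"), ("h2", "m1")), g2s, parent))  # ('root', 1, 2)
```

In the usage example the root is inferred as a duplication, with one copy lost in mouse and the other lost in chimp, giving two losses.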


Subject(s)
Diptera/genetics , Evolution, Molecular , Fungi/genetics , Gene Deletion , Gene Duplication , Primates/genetics , Algorithms , Animals , Gene Transfer, Horizontal , Genes , Genome , Models, Genetic , Multigene Family , Phylogeny , Species Specificity
4.
Syst Biol ; 62(1): 110-20, 2013 Jan 01.
Article in English | MEDLINE | ID: mdl-22949484

ABSTRACT

Accurate gene tree reconstruction is a fundamental problem in phylogenetics, with many important applications. However, sequence data alone often lack enough information to confidently support one gene tree topology over many competing alternatives. Here, we present a novel framework for combining sequence data and species tree information, and we describe an implementation of this framework in TreeFix, a new phylogenetic program for improving gene tree reconstructions. Given a gene tree (preferably computed using a maximum-likelihood phylogenetic program), TreeFix finds a "statistically equivalent" gene tree that minimizes a species tree-based cost function. We have applied TreeFix to 2 clades of 12 Drosophila and 16 fungal genomes, as well as to simulated phylogenies, and show that it dramatically improves reconstructions compared with current state-of-the-art programs. Given its accuracy, speed, and simplicity, TreeFix should be applicable to a wide range of analyses and have many important implications for future investigations of gene evolution. The source code and a sample data set are available at http://compbio.mit.edu/treefix.


Subject(s)
Classification/methods , Phylogeny , Software , Animals , Drosophila/classification , Drosophila/genetics , Fungi/classification , Fungi/genetics , Reproducibility of Results
5.
Mol Biol Evol ; 29(11): 3309-20, 2012 Nov.
Article in English | MEDLINE | ID: mdl-22617954

ABSTRACT

The prominent role of Horizontal Gene Transfer (HGT) in the evolution of bacteria is now well documented, but few studies have differentiated between evolutionary events that predominantly cause genes in one lineage to be replaced by homologs from another lineage ("replacing HGT") and events that result in the addition of substantial new genomic material ("additive HGT"). Herein, we make use of the distinct phylogenetic signatures of replacing and additive HGTs in a genome-wide study of the important human pathogen Streptococcus pyogenes (SPY) and its close relatives S. dysgalactiae subspecies equisimilis (SDE) and S. dysgalactiae subspecies dysgalactiae (SDD). Using recently developed statistical models and computational methods, we find evidence for abundant gene flow of both kinds within each of the SPY and SDE clades and of reduced levels of exchange between SPY and SDD. In addition, our analysis strongly supports a pronounced asymmetry in SPY-SDE gene flow, favoring the SPY-to-SDE direction. This finding is of particular interest in light of the recent increase in virulence of pathogenic SDE. We find much stronger evidence for SPY-SDE gene flow among replacing than among additive transfers, suggesting a primary influence from homologous recombination between co-occurring SPY and SDE cells in human hosts. Putative virulence genes are correlated with transfer events, but this correlation is found to be driven by additive, not replacing, HGTs. The genes affected by additive HGTs are enriched for functions having to do with transposition, recombination, and DNA integration, consistent with previous findings, whereas replacing HGTs seem to influence a more diverse set of genes. Additive transfers are also found to be associated with evidence of positive selection. These findings shed new light on the manner in which HGT has shaped pathogenic bacterial genomes.


Subject(s)
Gene Transfer, Horizontal/genetics , Phylogeny , Streptococcus/genetics , Gene Duplication/genetics , Genes, Bacterial/genetics , Genes, Essential/genetics , Humans , Models, Genetic , Selection, Genetic
6.
PLoS One ; 7(2): e31730, 2012.
Article in English | MEDLINE | ID: mdl-22359624

ABSTRACT

The major facilitator superfamily (MFS) transporter Pho84 and the type III transporter Pho89 are responsible for metabolic effects of inorganic phosphate in yeast. While the Pho89 ortholog Pit1 was also shown to be involved in phosphate-activated MAPK in mammalian cells, it is currently unknown whether orthologs of Pho84 have a role in phosphate sensing in metazoan species. We show here that the activation of MAPK by phosphate observed in mammals is conserved in Drosophila cells, and we used this assay to characterize the roles of putative phosphate transporters. Surprisingly, while we found that RNAi-mediated knockdown of the fly Pho89 ortholog dPit had little effect on the activation of MAPK in Drosophila S2R+ cells by phosphate, knockdown of two Pho84/SLC17A1-9 MFS orthologs (MFS10 and MFS13) specifically inhibited this response. Further, using a Xenopus oocyte assay, we show that MFS13 mediates uptake of [³³P]-orthophosphate in a sodium-dependent fashion. Consistent with a role in phosphate physiology, MFS13 is most highly expressed in the Drosophila crop, midgut, Malpighian tubule, and hindgut. Altogether, our findings provide the first evidence that Pho84 orthologs mediate cellular effects of phosphate in metazoan cells. Finally, while phosphate is essential for Drosophila larval development, loss of MFS13 activity is compatible with viability, indicating redundancy at the level of the transporters.


Subject(s)
Drosophila Proteins/physiology , Drosophila melanogaster/metabolism , Phosphates/metabolism , Proton-Phosphate Symporters/physiology , Sodium-Phosphate Cotransporter Proteins, Type III/physiology , Animals , Drosophila Proteins/metabolism , Drosophila melanogaster/genetics , Mitogen-Activated Protein Kinases/metabolism , Proton-Phosphate Symporters/metabolism , Saccharomyces cerevisiae Proteins , Sodium-Phosphate Cotransporter Proteins, Type III/metabolism , Tissue Distribution
7.
Genome Res ; 22(4): 755-65, 2012 Apr.
Article in English | MEDLINE | ID: mdl-22271778

ABSTRACT

Gene phylogenies provide a rich source of information about the way evolution shapes genomes, populations, and phenotypes. In addition to substitutions, evolutionary events such as gene duplication and loss (as well as horizontal transfer) play a major role in gene evolution, and many phylogenetic models have been developed in order to reconstruct and study these events. However, these models typically make the simplifying assumption that population-related effects such as incomplete lineage sorting (ILS) are negligible. While this assumption may have been reasonable in some settings, it has become increasingly problematic as increased genome sequencing has led to denser phylogenies, where effects such as ILS are more prominent. To address this challenge, we present a new probabilistic model, DLCoal, that defines gene duplication and loss in a population setting, such that coalescence and ILS can be directly addressed. Interestingly, this model implies that in addition to the usual gene tree and species tree, there exists a third tree, the locus tree, which will likely have many applications. Using this model, we develop the first general reconciliation method that accurately infers gene duplications and losses in the presence of ILS, and we show its improved inference of orthologs, paralogs, duplications, and losses for a variety of clades, including flies, fungi, and primates. Also, our simulations show that gene duplications increase the frequency of ILS, further illustrating the importance of a joint model. Going forward, we believe that this unified model can offer insights to questions in both phylogenetics and population genetics.


Subject(s)
Algorithms , Evolution, Molecular , Models, Genetic , Phylogeny , Animals , Gene Deletion , Gene Duplication , Gene Transfer, Horizontal , Genetic Loci/genetics , Genome/genetics , Humans , Models, Statistical , Mutation , Species Specificity , Yeasts/classification , Yeasts/genetics
8.
Mol Biol Evol ; 29(2): 689-705, 2012 Feb.
Article in English | MEDLINE | ID: mdl-21900599

ABSTRACT

Although the possibility of gene evolution by domain rearrangements has long been appreciated, current methods for reconstructing and systematically analyzing gene family evolution are limited to events such as duplication, loss, and sometimes, horizontal transfer. However, within the Drosophila clade, we find domain rearrangements occur in 35.9% of gene families, and thus, any comprehensive study of gene evolution in these species will need to account for such events. Here, we present a new computational model and algorithm for reconstructing gene evolution at the domain level. We develop a method for detecting homologous domains between genes and present a phylogenetic algorithm for reconstructing maximum parsimony evolutionary histories that include domain generation, duplication, loss, merge (fusion), and split (fission) events. Using this method, we find that genes involved in fusion and fission are enriched in signaling and development, suggesting that domain rearrangements and reuse may be crucial in these processes. We also find that fusion is more abundant than fission, and that fusion and fission events occur predominantly alongside duplication, with 92.5% and 34.3% of fusion and fission events retaining ancestral architectures in the duplicated copies. We provide a catalog of ∼9,000 genes that undergo domain rearrangement across nine sequenced species, along with possible mechanisms for their formation. These results dramatically expand on evolution at the subgene level and offer several insights into how new genes and functions arise between species.


Subject(s)
Drosophila/genetics , Evolution, Molecular , Phylogeny , Protein Structure, Tertiary/genetics , Algorithms , Animals , Biological Evolution , Gene Duplication , Gene Fusion/genetics , RNA, Messenger/biosynthesis
9.
Nature ; 478(7370): 476-82, 2011 Oct 12.
Article in English | MEDLINE | ID: mdl-21993624

ABSTRACT

The comparison of related genomes has emerged as a powerful lens for genome interpretation. Here we report the sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least 5.5% of the human genome has undergone purifying selection, and locate constrained elements covering ∼4.2% of the genome. We use evolutionary signatures and comparisons with experimental data sets to suggest candidate functions for ∼60% of constrained bases. These elements reveal a small number of new coding exons, candidate stop codon readthrough events and over 10,000 regions of overlapping synonymous constraint within protein-coding exons. We find 220 candidate RNA structural families, and nearly a million elements overlapping potential promoter, enhancer and insulator regions. We report specific amino acid residues that have undergone positive selection, 280,000 non-coding elements exapted from mobile elements and more than 1,000 primate- and human-accelerated elements. Overlap with disease-associated variants indicates that our findings will be relevant for studies of human biology, health and disease.


Subject(s)
Evolution, Molecular , Genome, Human/genetics , Genome/genetics , Mammals/genetics , Animals , Disease , Exons/genetics , Genomics , Health , Humans , Molecular Sequence Annotation , Phylogeny , RNA/classification , RNA/genetics , Selection, Genetic/genetics , Sequence Alignment , Sequence Analysis, DNA
10.
Mol Biol Evol ; 28(1): 273-90, 2011 Jan.
Article in English | MEDLINE | ID: mdl-20660489

ABSTRACT

Recent sequencing and computing advances have enabled phylogenetic analyses to expand to both entire genomes and large clades, thus requiring more efficient and accurate methods designed specifically for the phylogenomic context. Here, we present SPIMAP, an efficient Bayesian method for reconstructing gene trees in the presence of a known species tree. We observe many improvements in reconstruction accuracy, achieved by modeling multiple aspects of evolution, including gene duplication and loss (DL) rates, speciation times, and correlated substitution rate variation across both species and loci. We have implemented and applied this method on two clades of fully sequenced species, 12 Drosophila and 16 fungal genomes as well as simulated phylogenies and find dramatic improvements in reconstruction accuracy as compared with the most popular existing methods, including those that take the species tree into account. We find that reconstruction inaccuracies of traditional phylogenetic methods overestimate the number of DL events by as much as 2-3-fold, whereas our method achieves significantly higher accuracy. We feel that the results and methods presented here will have many important implications for future investigations of gene evolution.


Subject(s)
Bayes Theorem , Computational Biology/methods , Models, Genetic , Phylogeny , Algorithms , Animals , Evolution, Molecular , Gene Duplication , Genes, Fungal , Genes, Insect , Humans
11.
Nature ; 459(7247): 657-62, 2009 Jun 04.
Article in English | MEDLINE | ID: mdl-19465905

ABSTRACT

Candida species are the most common cause of opportunistic fungal infection worldwide. Here we report the genome sequences of six Candida species and compare these and related pathogens and non-pathogens. There are significant expansions of cell wall, secreted and transporter gene families in pathogenic species, suggesting adaptations associated with virulence. Large genomic tracts are homozygous in three diploid species, possibly resulting from recent recombination events. Surprisingly, key components of the mating and meiosis pathways are missing from several species. These include major differences at the mating-type loci (MTL); Lodderomyces elongisporus lacks MTL, and components of the a1/α2 cell identity determinant were lost in other species, raising questions about how mating and cell types are controlled. Analysis of the CUG leucine-to-serine genetic-code change reveals that 99% of ancestral CUG codons were erased and new ones arose elsewhere. Lastly, we revise the Candida albicans gene catalogue, identifying many new genes.


Subject(s)
Candida/physiology , Candida/pathogenicity , Evolution, Molecular , Genome, Fungal/genetics , Reproduction/genetics , Candida/classification , Candida/genetics , Codon/genetics , Conserved Sequence , Diploidy , Genes, Fungal/genetics , Meiosis/genetics , Polymorphism, Genetic , Saccharomyces/classification , Saccharomyces/genetics , Virulence/genetics
12.
PLoS Comput Biol ; 4(4): e1000067, 2008 Apr 18.
Article in English | MEDLINE | ID: mdl-18421375

ABSTRACT

Comparative genomics of multiple related species is a powerful methodology for the discovery of functional genomic elements, and its power should increase with the number of species compared. Here, we use 12 Drosophila genomes to study the power of comparative genomics metrics to distinguish between protein-coding and non-coding regions. First, we study the relative power of different comparative metrics and their relationship to single-species metrics. We find that even relatively simple multi-species metrics robustly outperform advanced single-species metrics, especially for shorter exons (≤240 nt), which are common in animal genomes. Moreover, the two capture largely independent features of protein-coding genes, with different sensitivity/specificity trade-offs, such that their combinations lead to even greater discriminatory power. In addition, we study how discovery power scales with the number and phylogenetic distance of the genomes compared. We find that species at a broad range of distances are comparably effective informants for pairwise comparative gene identification, but that these are surpassed by multi-species comparisons at similar evolutionary divergence. In particular, while pairwise discovery power plateaued at larger distances and never outperformed the most advanced single-species metrics, multi-species comparisons continued to benefit even from the most distant species with no apparent saturation. Last, we find that genes in functional categories typically considered fast-evolving can nonetheless be recovered at very high rates using comparative methods. Our results have implications for comparative genomics analyses in any species, including the human.


Subject(s)
Chromosome Mapping/methods , Drosophila Proteins/genetics , Drosophila/genetics , Genetic Variation/genetics , Open Reading Frames/genetics , Animals , Base Sequence , Discriminant Analysis , Drosophila/classification , Molecular Sequence Data , Reproducibility of Results , Sensitivity and Specificity , Sequence Analysis, DNA/methods , Species Specificity
13.
Nature ; 450(7167): 219-32, 2007 Nov 08.
Article in English | MEDLINE | ID: mdl-17994088

ABSTRACT

Sequencing of multiple related species followed by comparative genomics analysis constitutes a powerful approach for the systematic understanding of any genome. Here, we use the genomes of 12 Drosophila species for the de novo discovery of functional elements in the fly. Each type of functional element shows characteristic patterns of change, or 'evolutionary signatures', dictated by its precise selective constraints. Such signatures enable recognition of new protein-coding genes and exons, spurious and incorrect gene annotations, and numerous unusual gene structures, including abundant stop-codon readthrough. Similarly, we predict non-protein-coding RNA genes and structures, and new microRNA (miRNA) genes. We provide evidence of miRNA processing and functionality from both hairpin arms and both DNA strands. We identify several classes of pre- and post-transcriptional regulatory motifs, and predict individual motif instances with high confidence. We also study how discovery power scales with the divergence and number of species compared, and we provide general guidelines for comparative studies.


Subject(s)
Drosophila/classification , Drosophila/genetics , Evolution, Molecular , Genome, Insect/genetics , Genomics , Animals , Base Sequence , Binding Sites , Conserved Sequence , Drosophila Proteins/genetics , Exons/genetics , Gene Expression Regulation/genetics , Genes, Insect/genetics , MicroRNAs/genetics , Molecular Sequence Data , Organ Specificity , Phylogeny , Untranslated Regions/genetics
14.
Genome Res ; 17(12): 1932-42, 2007 Dec.
Article in English | MEDLINE | ID: mdl-17989260

ABSTRACT

Comparative genomics provides a general methodology for discovering functional DNA elements and understanding their evolution. The availability of many related genomes enables more powerful analyses, but requires rigorous phylogenetic methods to resolve orthologous genes and regions. Here, we use 12 recently sequenced Drosophila genomes and nine fungal genomes to address the problem of accurate gene-tree reconstruction across many complete genomes. We show that existing phylogenetic methods that treat each gene tree in isolation show large-scale inaccuracies, largely due to insufficient phylogenetic information in individual genes. However, we find that gene trees exhibit common properties that can be exploited for evolutionary studies and accurate phylogenetic reconstruction. Evolutionary rates can be decoupled into gene-specific and species-specific components, which can be learned across complete genomes. We develop a phylogenetic reconstruction methodology that exploits these properties and achieves significantly higher accuracy, addressing the species-level heterotachy and enabling studies of gene evolution in the context of species evolution.
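The abstract states that evolutionary rates can be decoupled into gene-specific and species-specific components learned across complete genomes. A minimal sketch of that idea, assuming a purely multiplicative model in which each branch length is a gene rate times a species-branch rate, is shown below. This simple moment-based estimate is an illustration only, not the paper's probabilistic method, and the function name and inputs are hypothetical.

```python
def decompose_rates(lengths):
    """Split branch lengths into gene-specific and species-specific factors.

    lengths: dict gene -> dict species_branch -> branch length, assuming every
    gene tree has already been mapped onto the same set of species branches.
    Assumed model (illustration only): length[g][b] ~= gene_rate[g] * species_rate[b].
    """
    # Gene-specific rate: total tree length of each gene (fast genes have long trees).
    gene_rate = {g: sum(bl.values()) for g, bl in lengths.items()}
    branches = list(next(iter(lengths.values())).keys())
    # Species-specific rate: average share of tree length falling on each branch,
    # after removing each gene's overall rate.
    species_rate = {
        b: sum(lengths[g][b] / gene_rate[g] for g in lengths) / len(lengths)
        for b in branches
    }
    return gene_rate, species_rate


# Usage: gene g2 evolves twice as fast as g1, but both share the same branch pattern.
lengths = {
    "g1": {"a": 0.2, "b": 0.6, "c": 0.2},
    "g2": {"a": 0.4, "b": 1.2, "c": 0.4},
}
gene_rate, species_rate = decompose_rates(lengths)
print(gene_rate, species_rate)
```

Because each gene's branch lengths are normalized by its own total, the species-branch rates sum to 1 and capture only the relative pattern shared across genes, while the overall speed of each gene is absorbed by its gene rate.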


Subject(s)
Evolution, Molecular , Genome , Genomics , Phylogeny , Species Specificity , Animals , Artificial Intelligence , Drosophila/genetics , Genes, Fungal , Genes, Insect , Models, Genetic , Sequence Alignment , Synteny
15.
Plant Physiol ; 133(2): 510-6, 2003 Oct.
Article in English | MEDLINE | ID: mdl-14555780

ABSTRACT

As structural and functional genomics efforts provide the biological community with ever-broadening sets of interrelated data, the need to explore such complex information for subtle relationships expands. We present wCLUTO, a Web-enabled version of the stand-alone application CLUTO, designed to apply clustering methods to genomic information. Its first application is focused on clustering transcriptome data from microarrays. Data can be uploaded by the user into the clustering tool, a choice of several clustering methods can be made and configured, and data are presented to the user in a variety of visual formats, including a three-dimensional "mountain" view of the clusters. Parameters can be explored to rapidly examine a variety of clustering results, and the resulting clusters can be downloaded either for manipulation by other programs or to be saved in a format for publication.


Subject(s)
Internet , Plants/genetics , Research/trends , Algorithms , Computer Simulation , Image Processing, Computer-Assisted , Research Design