1.
Genome Biol Evol ; 11(12): 3353-3371, 2019 12 01.
Article in English | MEDLINE | ID: mdl-31702783

ABSTRACT

The genus Rhododendron (Ericaceae), which includes horticulturally important plants such as azaleas, is a highly diverse and widely distributed genus of >1,000 species. Here, we report the chromosome-scale de novo assembly and genome annotation of Rhododendron williamsianum as a basis for continued study of this large genus. We created multiple short fragment genomic libraries, which were assembled using ALLPATHS-LG. This was followed by contiguity preserving transposase sequencing (CPT-seq) and fragScaff scaffolding of a large fragment library, which improved the assembly by decreasing the number of scaffolds and increasing scaffold length. Chromosome-scale scaffolding was performed by proximity-guided assembly (LACHESIS) using chromatin conformation capture (Hi-C) data. Chromosome-scale scaffolding was further refined and linkage groups defined by restriction-site associated DNA (RAD) sequencing of the parents and progeny of a genetic cross. The resulting linkage map confirmed the LACHESIS clustering and ordering of scaffolds onto chromosomes and rectified large-scale inversions. Assessments of the R. williamsianum genome assembly and gene annotation estimate them to be 89% and 79% complete, respectively. Predicted coding sequences from genome annotation were used in syntenic analyses and for generating age distributions of synonymous substitutions/site between paralogous gene pairs, which identified whole-genome duplications (WGDs) in R. williamsianum. We then analyzed other publicly available Ericaceae genomes for shared WGDs. Based on our spatial and temporal analyses of paralogous gene pairs, we find evidence for two shared, ancient WGDs in Rhododendron and Vaccinium (cranberry/blueberry) members that predate the Ericaceae family and, in one case, the Ericales order.


Subject(s)
Chromosomes, Plant/genetics , Ericaceae/genetics , Genome, Plant/genetics , Rhododendron/genetics , Synteny , Base Sequence , Chromatin/genetics , Chromosome Mapping , Genetic Linkage , Genomic Library , Molecular Sequence Annotation , Transposases/genetics
2.
Cell ; 163(3): 698-711, 2015 Oct 22.
Article in English | MEDLINE | ID: mdl-26496609

ABSTRACT

Most human transcripts are alternatively spliced, and many disease-causing mutations affect RNA splicing. Toward better modeling the sequence determinants of alternative splicing, we measured the splicing patterns of over two million (M) synthetic mini-genes, which include degenerate subsequences totaling over 100 M bases of variation. The massive size of these training data allowed us to improve upon current models of splicing, as well as to gain new mechanistic insights. Our results show that the vast majority of hexamer sequence motifs measurably influence splice site selection when positioned within alternative exons, with multiple motifs acting additively rather than cooperatively. Intriguingly, motifs that enhance (suppress) exon inclusion in alternative 5' splicing also enhance (suppress) exon inclusion in alternative 3' or cassette exon splicing, suggesting a universal mechanism for alternative exon recognition. Finally, our empirically trained models are highly predictive of the effects of naturally occurring variants on alternative splicing in vivo.


Subject(s)
Alternative Splicing , Genome, Human , Models, Genetic , Polymorphism, Single Nucleotide , Base Sequence , Humans , Molecular Sequence Data , Nucleotide Motifs , RNA Splice Sites
3.
PLoS Genet ; 10(10): e1004592, 2014 Oct.
Article in English | MEDLINE | ID: mdl-25340400

ABSTRACT

In addition to their protein coding function, exons can also serve as transcriptional enhancers. Mutations in these exonic-enhancers (eExons) could alter both protein function and transcription. However, the functional consequence of eExon mutations is not well known. Here, using massively parallel reporter assays, we dissect the enhancer activity of three liver eExons (SORL1 exon 17, TRAF3IP2 exon 2, PPARG exon 6) at single nucleotide resolution in the mouse liver. We find that both synonymous and non-synonymous mutations have similar effects on enhancer activity and many of the deleterious mutation clusters overlap known liver-associated transcription factor binding sites. Carrying out a similar massively parallel reporter assay in HeLa cells with these three eExons revealed differences in their mutation profiles compared to the liver, suggesting that enhancers could have distinct operating profiles in different tissues. Our results demonstrate that eExon mutations could lead to multiple phenotypes by disrupting both the protein sequence and enhancer activity and that enhancers can have distinct mutation profiles in different cell types.


Subject(s)
Adaptor Proteins, Signal Transducing/genetics , Enhancer Elements, Genetic , Exons/genetics , Membrane Transport Proteins/genetics , PPAR gamma/genetics , Receptors, LDL/genetics , Animals , Binding Sites , Gene Expression Regulation , HeLa Cells , Humans , Liver/metabolism , Mice , Mutation, Missense , Polymorphism, Single Nucleotide , RNA Splicing/genetics , Transcription Factors/biosynthesis
4.
Nat Biotechnol ; 31(12): 1119-25, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24185095

ABSTRACT

Genomes assembled de novo from short reads are highly fragmented relative to the finished chromosomes of Homo sapiens and key model organisms generated by the Human Genome Project. To address this problem, we need scalable, cost-effective methods to obtain assemblies with chromosome-scale contiguity. Here we show that genome-wide chromatin interaction data sets, such as those generated by Hi-C, are a rich source of long-range information for assigning, ordering and orienting genomic sequences to chromosomes, including across centromeres. To exploit this finding, we developed an algorithm that uses Hi-C data for ultra-long-range scaffolding of de novo genome assemblies. We demonstrate the approach by combining shotgun fragment and short jump mate-pair sequences with Hi-C data to generate chromosome-scale de novo assemblies of the human, mouse and Drosophila genomes, achieving, for the human genome, 98% accuracy in assigning scaffolds to chromosome groups and 99% accuracy in ordering and orienting scaffolds within chromosome groups. Hi-C data can also be used to validate chromosomal translocations in cancer genomes.


Subject(s)
Algorithms , Chromatin/genetics , Chromosome Mapping/methods , Contig Mapping/methods , Sequence Analysis, DNA/methods , Animals , Base Sequence , Drosophila , Humans , Mice , Molecular Sequence Data
5.
Nat Genet ; 45(9): 1021-1028, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23892608

ABSTRACT

Despite continual progress in the cataloging of vertebrate regulatory elements, little is known about their organization and regulatory architecture. Here we describe a massively parallel experiment to systematically test the impact of copy number, spacing, combination and order of transcription factor binding sites on gene expression. A complex library of ∼5,000 synthetic regulatory elements containing patterns from 12 liver-specific transcription factor binding sites was assayed in mice and in HepG2 cells. We find that certain transcription factors act as direct drivers of gene expression in homotypic clusters of binding sites, independent of spacing between sites, whereas others function only synergistically. Heterotypic enhancers are stronger than their homotypic analogs and favor specific transcription factor binding site combinations, mimicking putative native enhancers. Exhaustive testing of binding site permutations suggests that there is flexibility in binding site order. Our findings provide quantitative support for a flexible model of regulatory element activity and suggest a framework for the design of synthetic tissue-specific enhancers.


Subject(s)
Gene Expression Regulation , Models, Biological , Regulatory Sequences, Nucleic Acid , Transcription Factors/metabolism , Animals , Binding Sites , Cell Line , Cluster Analysis , Enhancer Elements, Genetic , Gene Amplification , Gene Dosage , Gene Expression , Genes, Reporter , Humans , Liver/metabolism , Male , Mice , Nucleotide Motifs , Organ Specificity/genetics , Protein Binding
6.
Nat Biotechnol ; 30(3): 265-70, 2012 Feb 26.
Article in English | MEDLINE | ID: mdl-22371081

ABSTRACT

The functional consequences of genetic variation in mammalian regulatory elements are poorly understood. We report the in vivo dissection of three mammalian enhancers at single-nucleotide resolution through a massively parallel reporter assay. For each enhancer, we synthesized a library of >100,000 mutant haplotypes with 2-3% divergence from the wild-type sequence. Each haplotype was linked to a unique sequence tag embedded within a transcriptional cassette. We introduced each enhancer library into mouse liver and measured the relative activities of individual haplotypes en masse by sequencing the transcribed tags. Linear regression analysis yielded highly reproducible estimates of the effect of every possible single-nucleotide change on enhancer activity. The functional consequence of most mutations was modest, with ∼22% affecting activity by >1.2-fold and ∼3% by >2-fold. Several, but not all, positions with higher effects showed evidence for purifying selection, or co-localized with known liver-associated transcription factor binding sites, demonstrating the value of empirical high-resolution functional analysis.


Subject(s)
Enhancer Elements, Genetic , Transcription Factors/genetics , Animals , Binding Sites , Evolution, Molecular , Genes, Reporter , Haplotypes , Humans , Linear Models , Liver/metabolism , Mice , Mutagenesis , Mutation , Transcription Factors/metabolism , Transcription, Genetic
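The abstract of entry 6 describes using linear regression to estimate the effect of each single-nucleotide change from the measured activities of many mutant haplotypes. A minimal sketch of that idea follows, assuming an additive model (activity = intercept + sum of the effects of the mutations a haplotype carries) fitted by ordinary least squares. The function names, the 0/1 encoding, and the toy data are hypothetical illustrations and are not taken from the paper.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augment the matrix so row operations apply to both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_mutation_effects(haplotypes, activities):
    """Least-squares fit of activity ~ intercept + per-mutation effects.

    haplotypes: list of 0/1 lists, one indicator per candidate mutation.
    Returns [intercept, effect_0, effect_1, ...].
    """
    # Prepend a constant 1 to each row for the intercept term.
    rows = [[1.0] + [float(v) for v in h] for h in haplotypes]
    k = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, activities)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical noise-free toy data: three candidate mutations.
toy_haplotypes = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 1, 1]]
toy_activities = [1.0, 0.5, 1.2, 1.0, 0.7, 1.2]
print(fit_mutation_effects(toy_haplotypes, toy_activities))
```

Solving the normal equations directly is adequate for a toy case like this one. A library of more than 100,000 haplotypes would normally call for a numerical library and a more stable solver.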
7.
Science ; 331(6017): 555-61, 2011 Feb 04.
Article in English | MEDLINE | ID: mdl-21292972

ABSTRACT

We describe the draft genome of the microcrustacean Daphnia pulex, which is only 200 megabases and contains at least 30,907 genes. The high gene count is a consequence of an elevated rate of gene duplication resulting in tandem gene clusters. More than a third of Daphnia's genes have no detectable homologs in any other available proteome, and the most amplified gene families are specific to the Daphnia lineage. The coexpansion of gene families interacting within metabolic pathways suggests that the maintenance of duplicated genes is not random, and the analysis of gene expression under different environmental conditions reveals that numerous paralogs acquire divergent expression patterns soon after duplication. Daphnia-specific genes, including many additional loci within sequenced regions that are otherwise devoid of annotations, are the most responsive genes to ecological challenges.


Subject(s)
Daphnia/genetics , Ecosystem , Genome , Adaptation, Physiological , Amino Acid Sequence , Animals , Base Sequence , Chromosome Mapping , Daphnia/physiology , Environment , Evolution, Molecular , Gene Conversion , Gene Duplication , Gene Expression , Gene Expression Profiling , Gene Expression Regulation , Genes , Genes, Duplicate , Metabolic Networks and Pathways/genetics , Molecular Sequence Annotation , Molecular Sequence Data , Multigene Family , Phylogeny , Sequence Analysis, DNA
8.
Nat Biotechnol ; 29(1): 59-63, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21170042

ABSTRACT

Haplotype information is essential to the complete description and interpretation of genomes, genetic diversity and genetic ancestry. Although individual human genome sequencing is increasingly routine, nearly all such genomes are unresolved with respect to haplotype. Here we combine the throughput of massively parallel sequencing with the contiguity information provided by large-insert cloning to experimentally determine the haplotype-resolved genome of a South Asian individual. A single fosmid library was split into a modest number of pools, each providing ∼3% physical coverage of the diploid genome. Sequencing of each pool yielded reads overwhelmingly derived from only one homologous chromosome at any given location. These data were combined with whole-genome shotgun sequence to directly phase 94% of ascertained heterozygous single nucleotide polymorphisms (SNPs) into long haplotype blocks (N50 of 386 kilobases (kbp)). This method also facilitates the analysis of structural variation, for example, to anchor novel insertions to specific locations and haplotypes.


Subject(s)
Asian People/genetics , Genome, Human/genetics , Haplotypes/genetics , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Base Sequence , Cell Line , Heterozygote , Humans , Models, Molecular , Polymorphism, Single Nucleotide/genetics
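The abstract of entry 8 summarizes haplotype block contiguity with an N50 value. As a reminder of what that statistic measures, the sketch below computes N50 under its usual definition: the length L such that blocks of length L or longer together cover at least half of the total length. The function name and the example lengths are illustrative only and are not data from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of block (or scaffold) lengths."""
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest block down until half the total is covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical block lengths in kilobases.
example_blocks_kbp = [500, 400, 300, 200, 100]
print(n50(example_blocks_kbp))  # prints 400
```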
9.
Nat Methods ; 7(2): 119-22, 2010 Feb.
Article in English | MEDLINE | ID: mdl-20081835

ABSTRACT

We demonstrate subassembly, an in vitro library construction method that extends the utility of short-read sequencing platforms to applications requiring long, accurate reads. A long DNA fragment library is converted to a population of nested sublibraries, and a tag sequence directs grouping of short reads derived from the same long fragment, enabling localized assembly of long fragment sequences. Subassembly may facilitate accurate de novo genome assembly and metagenome sequencing.


Subject(s)
Chromosome Mapping/methods , Sequence Analysis, DNA/methods , Base Sequence , Expressed Sequence Tags , Molecular Sequence Data
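The abstract of entry 9 states that a tag sequence directs the grouping of short reads derived from the same long fragment before localized assembly. The grouping step alone can be sketched as below, assuming each read has already been paired with its tag. The input format, function name, and sequences are hypothetical, and the localized assembly of each group is not shown.

```python
from collections import defaultdict


def group_reads_by_tag(tagged_reads):
    """Collect short reads that share a tag, i.e. the same parent fragment.

    tagged_reads: iterable of (tag, read_sequence) pairs.
    Returns a dict mapping each tag to the list of its reads, in input order.
    """
    groups = defaultdict(list)
    for tag, read in tagged_reads:
        groups[tag].append(read)
    return dict(groups)


# Hypothetical tagged reads from two long fragments.
example_reads = [
    ("AACGT", "GATTACA"),
    ("TTGCA", "CCGGTTA"),
    ("AACGT", "TACAGGC"),
]
for tag, reads in group_reads_by_tag(example_reads).items():
    print(tag, reads)
```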
10.
Nat Biotechnol ; 27(12): 1173-5, 2009 Dec.
Article in English | MEDLINE | ID: mdl-19915551

ABSTRACT

We present a method that harnesses massively parallel DNA synthesis and sequencing for the high-throughput functional analysis of regulatory sequences at single-nucleotide resolution. As a proof of concept, we quantitatively assayed the effects of all possible single-nucleotide mutations for three bacteriophage promoters and three mammalian core promoters in a single experiment per promoter. The method may also serve as a rapid screening tool for regulatory element engineering in synthetic biology.


Subject(s)
Algorithms , DNA/chemistry , DNA/genetics , Mutagenesis, Site-Directed/methods , Regulatory Elements, Transcriptional/genetics , Sequence Analysis, DNA/methods , Base Sequence , Molecular Sequence Data
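The abstract of entry 10 refers to assaying all possible single-nucleotide mutations of a promoter. To make the size of such a library concrete, the sketch below enumerates every single-nucleotide substitution of a DNA sequence, which gives three variants per position. The function name and the example sequence are hypothetical, and this only illustrates the enumeration, not the synthesis or assay design used in the paper.

```python
def single_nucleotide_variants(sequence, alphabet="ACGT"):
    """List every single-nucleotide substitution of a DNA sequence.

    Returns tuples of (position, reference_base, alternate_base, variant_sequence),
    with 0-based positions.
    """
    sequence = sequence.upper()
    variants = []
    for pos, ref in enumerate(sequence):
        if ref not in alphabet:
            raise ValueError(f"unexpected base {ref!r} at position {pos}")
        for alt in alphabet:
            if alt != ref:
                mutated = sequence[:pos] + alt + sequence[pos + 1:]
                variants.append((pos, ref, alt, mutated))
    return variants


# A hypothetical 6-bp element yields 6 * 3 = 18 variants.
example_variants = single_nucleotide_variants("TATAAT")
print(len(example_variants))
```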