1.
DNA Res ; 30(1)2023 Feb 01.
Article in English | MEDLINE | ID: mdl-36208288

ABSTRACT

A contiguous assembly of the inbred 'EL10' sugar beet (Beta vulgaris ssp. vulgaris) genome was constructed using PacBio long-read sequencing, BioNano optical mapping, Hi-C scaffolding, and Illumina short-read error correction. The EL10.1 assembly was 540 Mb, of which 96.2% was contained in nine chromosome-sized pseudomolecules with lengths from 52 to 65 Mb, and 31 contigs with a median size of 282 kb that remained unassembled. Gene annotation incorporating RNA-seq data and curated sequences via the MAKER annotation pipeline generated 24,255 gene models. Results indicated that the EL10.1 genome assembly is a contiguous genome assembly highly congruent with the published sugar beet reference genome. Gross duplicate gene analyses of EL10.1 revealed little large-scale intra-genome duplication. Reduced gene copy number for well-annotated gene families relative to other core eudicots was observed, especially for transcription factors. Variation in genome size in B. vulgaris was investigated by flow cytometry among 50 individuals producing estimates from 633 to 875 Mb/1C. Read-depth mapping with short-read whole-genome sequences from other sugar beet germplasm suggested that relatively few regions of the sugar beet genome appeared associated with high-copy number variation.


Subject(s)
Beta vulgaris , Humans , Beta vulgaris/genetics , DNA Copy Number Variations , Chromosomes , Molecular Sequence Annotation , Sugars
2.
Genet Sel Evol ; 54(1): 62, 2022 Sep 14.
Article in English | MEDLINE | ID: mdl-36104777

ABSTRACT

BACKGROUND: The genetic mechanisms that underlie phenotypic differentiation in breeding animals have important implications in evolutionary biology and agriculture. However, the contribution of cis-regulatory variants to pig phenotypes is poorly understood. Therefore, our aim was to elucidate the molecular mechanisms by which non-coding variants cause phenotypic differences in pigs by combining evolutionary biology analyses and functional genomics. RESULTS: We obtained a high-resolution phased chromosome-scale reference genome with a contig N50 of 18.03 Mb for the Luchuan pig breed (a representative eastern breed) and profiled potential selective sweeps in eastern and western pigs by resequencing the genomes of 234 pigs. Multi-tissue transcriptome and chromatin accessibility analyses of these regions suggest that tissue-specific selection pressure is mediated by promoters and distal cis-regulatory elements. Promoter variants that are associated with increased expression of the lysozyme (LYZ) gene in the small intestine might enhance the immunity of the gastrointestinal tract and roughage tolerance in pigs. In skeletal muscle, an enhancer-modulating single-nucleotide polymorphism that is associated with up-regulation of the expression of the troponin C1, slow skeletal and cardiac type (TNNC1) gene might increase the proportion of slow muscle fibers and affect meat quality. CONCLUSIONS: Our work sheds light on the molecular mechanisms by which non-coding variants shape phenotypic differences in pigs and provides valuable resources and novel perspectives to dissect the role of gene regulatory evolution in animal domestication and breeding.


Subject(s)
Genome , Genomics , Animals , Evolution, Molecular , Phenotype , Sequence Analysis, DNA , Swine/genetics
3.
BMC Genomics ; 23(1): 344, 2022 May 04.
Article in English | MEDLINE | ID: mdl-35508966

ABSTRACT

BACKGROUND: The gaur (Bos gaurus) is the largest extant wild bovine species, native to South and Southeast Asia, with unique traits, and is listed as vulnerable by the International Union for Conservation of Nature (IUCN). RESULTS: We report the first gaur reference genome and identify three biological pathways including lysozyme activity, proton transmembrane transporter activity, and oxygen transport with significant changes in gene copy number in gaur compared to other mammals. These may reflect adaptation to challenges related to climate and nutrition. Comparative analyses with domesticated indicine (Bos indicus) and taurine (Bos taurus) cattle revealed genomic signatures of artificial selection, including the expansion of sperm odorant receptor genes in domesticated cattle, which may have important implications for understanding selection for male fertility. CONCLUSIONS: Apart from aiding dissection of economically important traits, the gaur genome will also provide the foundation to conserve the species.


Subject(s)
Receptors, Odorant , Animals , Cattle/genetics , Genome , Genomics , Male , Mammals , Receptors, Odorant/genetics , Spermatozoa , Zona Pellucida Glycoproteins
4.
Nat Biotechnol ; 40(5): 711-719, 2022 05.
Article in English | MEDLINE | ID: mdl-34980911

ABSTRACT

Microbial communities might include distinct lineages of closely related organisms that complicate metagenomic assembly and prevent the generation of complete metagenome-assembled genomes (MAGs). Here we show that deep sequencing using long (HiFi) reads combined with Hi-C binning can address this challenge even for complex microbial communities. Using existing methods, we sequenced the sheep fecal metagenome and identified 428 MAGs with more than 90% completeness, including 44 MAGs in single circular contigs. To resolve closely related strains (lineages), we developed MAGPhase, which separates lineages of related organisms by discriminating variant haplotypes across hundreds of kilobases of genomic sequence. MAGPhase identified 220 lineage-resolved MAGs in our dataset. The ability to resolve closely related microbes in complex microbial communities improves the identification of biosynthetic gene clusters and the precision of assigning mobile genetic elements to host genomes. We identified 1,400 complete and 350 partial biosynthetic gene clusters, most of which are novel, as well as 424 (298) potential host-viral (host-plasmid) associations using Hi-C data.


Subject(s)
Metagenome , Microbiota , Animals , Feces , Metagenome/genetics , Metagenomics , Microbiota/genetics , Sequence Analysis, DNA , Sheep
5.
G3 (Bethesda) ; 12(2)2022 02 04.
Article in English | MEDLINE | ID: mdl-34897429

ABSTRACT

The zebra mussel, Dreissena polymorpha, continues to spread from its native range in Eurasia to Europe and North America, causing billions of dollars in damage and dramatically altering invaded aquatic ecosystems. Despite these impacts, there are few genomic resources for Dreissena or related bivalves. Although the D. polymorpha genome is highly repetitive, we have used a combination of long-read sequencing and Hi-C-based scaffolding to generate a high-quality chromosome-scale genome assembly. Through comparative analysis and transcriptomics experiments, we have gained insights into processes that likely control the invasive success of zebra mussels, including shell formation, synthesis of byssal threads, and thermal tolerance. We identified multiple intact steamer-like elements, a retrotransposon that has been linked to transmissible cancer in marine clams. We also found that D. polymorpha have an unusual 67 kb mitochondrial genome containing numerous tandem repeats, making it the largest observed in Eumetazoa. Together these findings create a rich resource for invasive species research and control efforts.


Subject(s)
Dreissena , Animals , Dreissena/genetics , Ecosystem , Genome , Genomics , Introduced Species
6.
Front Plant Sci ; 12: 720670, 2021.
Article in English | MEDLINE | ID: mdl-34567033

ABSTRACT

A defining component of agroforestry parklands across Sahelo-Sudanian Africa (SSA), the shea tree (Vitellaria paradoxa) is central to sustaining local livelihoods and the farming environments of rural communities. Despite its economic and cultural value, however, not to mention the ecological roles it plays as a dominant parkland species, shea remains semi-domesticated with virtually no history of systematic genetic improvement. In truth, shea's extended juvenile period makes traditional breeding approaches untenable; but the opportunity for genome-assisted breeding is immense, provided the foundational resources are available. Here we report the development and public release of such resources. Using the FALCON-Phase workflow, 162.6 Gb of long-read PacBio sequence data were assembled into a 658.7 Mbp, chromosome-scale reference genome annotated with 38,505 coding genes. Whole genome duplication (WGD) analysis based on this gene space revealed clear signatures of two ancient WGD events in shea's evolutionary past, one prior to the Asterid-Rosid divergence (116-126 Mya) and the other at the root of the order Ericales (65-90 Mya). In a first genome-wide look at the suite of fatty acid (FA) biosynthesis genes that likely govern stearin content, the primary determinant of shea butter quality, relatively high copy numbers of six key enzymes were found (KASI, KASIII, FATB, FAD2, FAD3, and FAX2), some likely originating in shea's more recent WGD event. To help translate these findings into practical tools for characterization, selection, and genome-wide association studies (GWAS), resequencing data from a shea diversity panel were used to develop a database of more than 3.5 million functionally annotated, physically anchored SNPs. Two smaller, more curated sets of suggested SNPs, one for GWAS (104,211 SNPs) and the other targeting FA biosynthesis genes (90 SNPs), are also presented. With these resources, the hope is to support national programs across the shea belt in the strategic, genome-enabled conservation and long-term improvement of the shea tree for SSA.

7.
Nat Commun ; 12(1): 1935, 2021 04 28.
Article in English | MEDLINE | ID: mdl-33911078

ABSTRACT

Haplotype-resolved genome assemblies are important for understanding how combinations of variants impact phenotypes. To date, these assemblies have been best created with complex protocols, such as cultured cells that contain a single-haplotype (haploid) genome, single cells where haplotypes are separated, or co-sequencing of parental genomes in a trio-based approach. These approaches are impractical in most situations. To address this issue, we present FALCON-Phase, a phasing tool that uses ultra-long-range Hi-C chromatin interaction data to extend phase blocks of partially-phased diploid assemblies to chromosome or scaffold scale. FALCON-Phase uses the inherent phasing information in Hi-C reads, skipping variant calling, and reduces the computational complexity of phasing. Our method is validated on three benchmark datasets generated as part of the Vertebrate Genomes Project (VGP), including human, cow, and zebra finch, for which high-quality, fully haplotype-resolved assemblies are available using the trio-based approach. FALCON-Phase is accurate without having parental data and performance is better in samples with higher heterozygosity. For cow and zebra finch the accuracy is 97% compared to 80-91% for human. FALCON-Phase is applicable to any draft assembly that contains long primary contigs and phased associate contigs.


Subject(s)
Contig Mapping/methods , Genome, Human/genetics , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Algorithms , Animals , Cattle , Haplotypes/genetics , Humans , Polymorphism, Single Nucleotide/genetics , Zebrafish/genetics
8.
G3 (Bethesda) ; 11(2)2021 02 09.
Article in English | MEDLINE | ID: mdl-33598705

ABSTRACT

Mummy berry disease, caused by the fungal pathogen Monilinia vaccinii-corymbosi (Mvc), is one of the most economically important diseases of blueberries in North America. Mvc is capable of inducing two separate blighting stages during its life cycle. Infected fruits are rendered mummified and unmarketable. Genomic data for this pathogen is lacking, but could be useful in understanding the reproductive biology of Mvc and the mechanisms it deploys to facilitate host infection. In this study, PacBio sequencing and Hi-C interaction data were utilized to create a chromosome-scale reference genome for Mvc. The genome comprises nine chromosomes with a total length of 30 Mb, an N50 length of 4.06 Mb, and an average 413X sequence coverage. A total of 9399 gene models were predicted and annotated, and BUSCO analysis revealed that 98% of 1,438 searched conserved eukaryotic genes were present in the predicted gene set. Potential effectors were identified, and the mating-type (MAT) locus was characterized. Biotrophic effectors allow the pathogen to avoid recognition by the host plant and evade or mitigate host defense responses during the early stages of fruit infection. Following locule colonization, necrotizing effectors promote the mummification of host tissues. Potential biotrophic effectors utilized by Mvc include chorismate mutase for reducing host salicylate and necrotrophic effectors include necrosis-inducing proteins and hydrolytic enzymes for macerating host tissue. The MAT locus sequences indicate the potential for homothallism in the reference genome, but a deletion allele of the MAT locus, characterized in a second isolate, indicates heterothallism. Further research is needed to verify the roles of individual effectors in virulence and to determine the role of the MAT locus in outcrossing and population genotypic diversity.


Subject(s)
Ascomycota/genetics , Blueberry Plants , Plant Diseases , Fruit , North America , Plant Diseases/microbiology
9.
Mol Ecol Resour ; 21(1): 263-286, 2021 Jan.
Article in English | MEDLINE | ID: mdl-32937018

ABSTRACT

Genome assemblies are currently being produced at an impressive rate by consortia and individual laboratories. The low costs and increasing efficiency of sequencing technologies now enable assembling genomes at unprecedented quality and contiguity. However, the difficulty in assembling repeat-rich and GC-rich regions (genomic "dark matter") limits insights into the evolution of genome structure and regulatory networks. Here, we compare the efficiency of currently available sequencing technologies (short/linked/long reads and proximity ligation maps) and combinations thereof in assembling genomic dark matter. By adopting different de novo assembly strategies, we compare individual draft assemblies to a curated multiplatform reference assembly and identify the genomic features that cause gaps within each assembly. We show that a multiplatform assembly implementing long-read, linked-read and proximity sequencing technologies performs best at recovering transposable elements, multicopy MHC genes, GC-rich microchromosomes and the repeat-rich W chromosome. Telomere-to-telomere assemblies are not a reality yet for most organisms, but by leveraging technology choice it is now possible to minimize genome assembly gaps for downstream analysis. We provide a roadmap to tailor sequencing projects for optimized completeness of both the coding and noncoding parts of nonmodel genomes.


Subject(s)
Genome, Plant , Genomics , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA , Strelitziaceae/genetics , DNA Transposable Elements , Genomics/methods
10.
Front Plant Sci ; 10: 1434, 2019.
Article in English | MEDLINE | ID: mdl-31798605

ABSTRACT

The genome is reprogrammed during development to produce diverse cell types, largely through altered expression and activity of key transcription factors. The accessibility and critical functions of epidermal cells have made them a model for connecting transcriptional events to development in a range of model systems. In Arabidopsis thaliana and many other plants, fertilization triggers differentiation of specialized epidermal seed coat cells that have a unique morphology caused by large extracellular deposits of polysaccharides. Here, we used DNase I-seq to generate regulatory landscapes of A. thaliana seeds at two critical time points in seed coat maturation (4 and 7 DPA), enriching for seed coat cells with the INTACT method. We found over 3,000 developmentally dynamic regulatory DNA elements and explored their relationship with nearby gene expression. The dynamic regulatory elements were enriched for motifs for several transcription factor families, most notably the TCP family at the earlier time point and the MYB family at the later one. To assess the extent to which the observed regulatory sites in seeds added to previously known regulatory sites in A. thaliana, we compared our data to 11 other data sets generated with 7-day-old seedlings for diverse tissues and conditions. Surprisingly, over a quarter of the regulatory, i.e. accessible, bases observed in seeds were novel. Notably, plant regulatory landscapes from different tissues, cell types, or developmental stages were more dynamic than those generated from bulk tissue in response to environmental perturbations, highlighting the importance of extending studies of regulatory DNA to single tissues and cell types during development.

11.
Commun Biol ; 2: 357, 2019.
Article in English | MEDLINE | ID: mdl-31583288

ABSTRACT

Multispecies host-parasite evolution is common, but how parasites evolve after speciating remains poorly understood. Shared evolutionary history and physiology may propel species along similar evolutionary trajectories whereas pursuing different strategies can reduce competition. We test these scenarios in the economically important association between honey bees and ectoparasitic mites by sequencing the genomes of the sister mite species Varroa destructor and Varroa jacobsoni. These genomes were closely related, with 99.7% sequence identity. Among the 9,628 orthologous genes, 4.8% showed signs of positive selection in at least one species. Divergent selective trajectories were discovered in conserved chemosensory gene families (IGR, SNMP), and Halloween genes (CYP) involved in moulting and reproduction. However, there was little overlap in these gene sets and associated GO terms, indicating different selective regimes operating on each of the parasites. Based on our findings, we suggest that species-specific strategies may be needed to combat evolving parasite communities.


Subject(s)
Bees/parasitology , Evolution, Molecular , Varroidae/genetics , Animals , Cytochrome P-450 Enzyme System/genetics , DNA, Mitochondrial , Female , Host-Parasite Interactions , Male , Species Specificity
12.
Genome Biol ; 20(1): 153, 2019 08 02.
Article in English | MEDLINE | ID: mdl-31375138

ABSTRACT

We describe a method that adds long-read sequencing to a mix of technologies used to assemble a highly complex cattle rumen microbial community, and provide a comparison to short read-based methods. Long-read alignments and Hi-C linkage between contigs support the identification of 188 novel virus-host associations and the determination of phage life cycle states in the rumen microbial community. The long-read assembly also identifies 94 antimicrobial resistance genes, compared to only seven alleles in the short-read assembly. We demonstrate novel techniques that work synergistically to improve characterization of biological features in a highly complex rumen microbial community.


Subject(s)
Drug Resistance, Microbial/genetics , Metagenomics/methods , Microbiota/genetics , Sequence Analysis, DNA/methods , Viruses/genetics , Animals , Cattle , Clustered Regularly Interspaced Short Palindromic Repeats , Gene Transfer, Horizontal , Genes, Microbial , Open Reading Frames , Prophages/genetics , Rumen/microbiology , Rumen/virology , Viruses/isolation & purification
13.
ISME J ; 13(10): 2437-2446, 2019 10.
Article in English | MEDLINE | ID: mdl-31147603

ABSTRACT

The rapid spread of antibiotic resistance among bacterial pathogens is a serious human health threat. While a range of environments have been identified as reservoirs of antibiotic resistance genes (ARGs), we lack understanding of the origins of these ARGs and their spread from environment to clinic. This is partly due to our inability to identify the natural bacterial hosts of ARGs and the mobile genetic elements that mediate this spread, such as plasmids and integrons. Here we demonstrate that the in vivo proximity-ligation method Hi-C can reconstruct a known plasmid-host association from a wastewater community, and identify the in situ host range of ARGs, plasmids, and integrons by physically linking them to their host chromosomes. Hi-C detected both previously known and novel associations between ARGs, mobile genetic elements and host genomes, thus validating this method. We showed that IncQ plasmids and class 1 integrons had the broadest host range in this wastewater, and identified bacteria belonging to Moraxellaceae, Bacteroides, and Prevotella, and especially Aeromonadaceae as the most likely reservoirs of ARGs in this community. A better identification of the natural carriers of ARGs will aid the development of strategies to limit resistance spread to pathogens.


Subject(s)
Anti-Bacterial Agents/pharmacology , Bacteria/drug effects , Bacteria/genetics , Drug Resistance, Bacterial , Plasmids/genetics , Bacteria/classification , Bacteria/isolation & purification , Bacterial Infections/microbiology , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Humans , Integrons , Microbiota/drug effects , Phylogeny , Plasmids/metabolism , Wastewater/microbiology
14.
J Am Soc Nephrol ; 30(3): 421-441, 2019 Mar.
Article in English | MEDLINE | ID: mdl-30760496

ABSTRACT

BACKGROUND: Linking genetic risk loci identified by genome-wide association studies (GWAS) to their causal genes remains a major challenge. Disease-associated genetic variants are concentrated in regions containing regulatory DNA elements, such as promoters and enhancers. Although researchers have previously published DNA maps of these regulatory regions for kidney tubule cells and glomerular endothelial cells, maps for podocytes and mesangial cells have not been available. METHODS: We generated regulatory DNA maps (DNase-seq) and paired gene expression profiles (RNA-seq) from primary outgrowth cultures of human glomeruli that were composed mainly of podocytes and mesangial cells. We generated similar datasets from renal cortex cultures, to compare with those of the glomerular cultures. Because regulatory DNA elements can act on target genes across large genomic distances, we also generated a chromatin conformation map from freshly isolated human glomeruli. RESULTS: We identified thousands of unique regulatory DNA elements, many located close to transcription factor genes, which the glomerular and cortex samples expressed at different levels. We found that genetic variants associated with kidney diseases (GWAS) and kidney expression quantitative trait loci were enriched in regulatory DNA regions. By combining GWAS, epigenomic, and chromatin conformation data, we functionally annotated 46 kidney disease genes. CONCLUSIONS: We demonstrate a powerful approach to functionally connect kidney disease-/trait-associated loci to their target genes by leveraging unique regulatory DNA maps and integrated epigenomic and genetic analysis. This process can be applied to other kidney cell types and will enhance our understanding of genome regulation and its effects on gene expression in kidney disease.

15.
BMC Genomics ; 20(1): 120, 2019 Feb 07.
Article in English | MEDLINE | ID: mdl-30732559

ABSTRACT

BACKGROUND: Genes involved in production of secondary metabolites (SMs) in fungi are exceptionally diverse. Even strains of the same species may exhibit differences in metabolite production, a finding that has important implications for drug discovery. Unlike in other eukaryotes, genes producing SMs are often clustered and co-expressed in fungal genomes, but the genetic mechanisms involved in the creation and maintenance of these secondary metabolite biosynthetic gene clusters (SMBGCs) remain poorly understood. RESULTS: In order to address the role of genome architecture and chromosome scale structural variation in generating diversity of SMBGCs, we generated chromosome scale assemblies of six geographically diverse isolates of the insect pathogenic fungus Tolypocladium inflatum, producer of the multi-billion dollar lifesaving immunosuppressant drug cyclosporin, and utilized a Hi-C chromosome conformation capture approach to address the role of genome architecture and structural variation in generating intraspecific diversity in SMBGCs. Our results demonstrate that the exchange of DNA between heterologous chromosomes plays an important role in generating novelty in SMBGCs in fungi. In particular, we demonstrate movement of a polyketide synthase (PKS) and several adjacent genes by translocation to a new chromosome and genomic context, potentially generating a novel PKS cluster. We also provide evidence for inter-chromosomal recombination between nonribosomal peptide synthetases located within subtelomeres and uncover a polymorphic cluster present in only two strains that is closely related to the cluster responsible for biosynthesis of the mycotoxin aflatoxin (AF), a highly carcinogenic compound that is a major public health concern worldwide. In contrast, the cyclosporin cluster, located internally on chromosomes, was conserved across strains, suggesting selective maintenance of this important virulence factor for infection of insects. CONCLUSIONS: This research places the evolution of SMBGCs within the context of whole genome evolution and suggests a role for recombination between chromosomes in generating novel SMBGCs in the medicinal fungus Tolypocladium inflatum.


Subject(s)
Chromosomes, Fungal/genetics , Cyclosporine/metabolism , Gene Rearrangement , Genetic Variation , Hypocreales/genetics , Hypocreales/metabolism , Secondary Metabolism/genetics , Chromosome Duplication , Evolution, Molecular , Genome, Fungal/genetics , Multigene Family/genetics , Recombination, Genetic , Species Specificity
16.
Front Microbiol ; 10: 2986, 2019.
Article in English | MEDLINE | ID: mdl-32038514

ABSTRACT

Polyurethanes (PU) are the sixth most produced plastics with around 18 million tons in 2016, but since they are not recyclable, they are burned or landfilled, generating damage to human health and ecosystems. To elucidate the mechanisms that landfill microbial communities perform to attack recalcitrant PU plastics, we studied the degradative activity of a mixed microbial culture, selected from a municipal landfill by its capability to grow in a water PU dispersion (WPUD) as the only carbon source, as a model for the BP8 landfill microbial community. The WPUD contains a polyether-polyurethane-acrylate (PE-PU-A) copolymer and xenobiotic additives (N-methylpyrrolidone, isopropanol and glycol ethers). To identify the changes that the BP8 microbial community culture generates to the WPUD additives and copolymer, we performed chemical and physical analyses of the biodegradation process during 25 days of cultivation. These analyses included Nuclear magnetic resonance, Fourier transform infrared spectroscopy, Thermogravimetry, Differential scanning calorimetry, Gel permeation chromatography, and Gas chromatography coupled to mass spectrometry techniques. Moreover, for revealing the BP8 community structure and its genetically encoded potential biodegradative capability we also performed a proximity ligation-based metagenomic analysis. The additives present in the WPUD were consumed early whereas the copolymer was cleaved throughout the 25 days of incubation. The analysis of the biodegradation process and the identified biodegradation products showed that BP8 cleaves esters, C-C, and the recalcitrant aromatic urethanes and ether groups by hydrolytic and oxidative mechanisms, both in the soft and the hard segments of the copolymer. The proximity ligation-based metagenomic analysis allowed the reconstruction of five genomes, three of them from novel species. In the metagenome, genes encoding known enzymes, and putative enzymes and metabolic pathways accounting for the biodegradative activity of the BP8 community over the additives and PE-PU-A copolymer were identified. This is the first study revealing the genetically encoded potential biodegradative capability of a microbial community selected from a landfill, that thrives within a WPUD system and shows potential for bioremediation of polyurethane- and xenobiotic additives-contaminated sites.

17.
Gigascience ; 7(8)2018 08 01.
Article in English | MEDLINE | ID: mdl-30107523

ABSTRACT

Background: The fragmented nature of most draft plant genomes has hindered downstream gene discovery, trait mapping for breeding, and other functional genomics applications. There is a pressing need to improve or finish draft plant genome assemblies. Findings: Here, we present a chromosome-scale assembly of the black raspberry genome using single-molecule real-time Pacific Biosciences sequencing and high-throughput chromatin conformation capture (Hi-C) genome scaffolding. The updated V3 assembly has a contig N50 of 5.1 Mb, representing an ∼200-fold improvement over the previous Illumina-based version. Each of the 235 contigs was anchored and oriented into seven chromosomes, correcting several major misassemblies. Black raspberry V3 contains 47 Mb of new sequences including large pericentromeric regions and thousands of previously unannotated protein-coding genes. Among the new genes are hundreds of expanded tandem gene arrays that were collapsed in the Illumina-based assembly. Detailed comparative genomics with the high-quality V4 woodland strawberry genome (Fragaria vesca) revealed near-perfect 1:1 synteny with dramatic divergence in tandem gene array composition. Lineage-specific tandem gene arrays in black raspberry are related to agronomic traits such as disease resistance and secondary metabolite biosynthesis. Conclusions: The improved resolution of tandem gene arrays highlights the need to reassemble these highly complex and biologically important regions in draft plant genomes. The updated, high-quality black raspberry reference genome will be useful for comparative genomics across the horticulturally important Rosaceae family and enable the development of marker assisted breeding in Rubus.


Subject(s)
Genome, Plant , Rubus/genetics , Sequence Analysis, DNA , Chromosomes, Plant , Genomics
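Note on the contig N50 figures quoted in several records here (for example 5.1 Mb in the record above): N50 is conventionally the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The sketch below illustrates that common definition only; the function name `n50` and the toy lengths are hypothetical and are not taken from any of the cited studies, whose own tooling may differ in detail.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths (common definition).

    Hypothetical helper for illustration; not code from any cited paper.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")
    # Walk contigs from longest to shortest, accumulating length.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half the total is the N50.
        if running >= half_total:
            return length


# Toy example with made-up contig lengths (arbitrary units).
print(n50([2, 3, 4, 5, 6]))
```

Because N50 depends only on the length distribution, it says nothing about correctness; a misjoined assembly can have a high N50, which is why the abstracts pair it with measures such as BUSCO completeness or synteny checks.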
18.
Hortic Res ; 5: 8, 2018.
Article in English | MEDLINE | ID: mdl-29423238

ABSTRACT

Black raspberry (Rubus occidentalis L.) is a niche fruit crop valued for its flavor and potential health benefits. The improvement of fruit and cane characteristics via molecular breeding technologies has been hindered by the lack of a high-quality reference genome. The recently released draft genome for black raspberry (ORUS 4115-3) lacks assembly of scaffolds to chromosome scale. We used high-throughput chromatin conformation capture (Hi-C) and Proximity-Guided Assembly (PGA) to cluster and order 9650 out of 11,936 contigs of this draft genome assembly into seven pseudo-chromosomes. The seven pseudo-chromosomes cover ~97.2% of the total contig length (~223.8 Mb). Locating existing genetic markers on the physical map resolved multiple discrepancies in marker order on the genetic map. Centromeric regions were inferred from recombination frequencies of genetic markers, alignment of 303 bp centromeric sequence with the PGA, and heat map showing the physical contact matrix over the entire genome. We demonstrate a high degree of synteny between each of the seven chromosomes of black raspberry and a high-quality reference genome for strawberry (Fragaria vesca L.) assembled using only PacBio long-read sequences. We conclude that PGA is a cost-effective and rapid method of generating chromosome-scale assemblies from Illumina short-read sequencing data.

19.
J Hered ; 108(6): 693-700, 2017 Sep 01.
Article in English | MEDLINE | ID: mdl-28821183

ABSTRACT

Scaffolding genomes into complete chromosome assemblies remains challenging even with the rapidly increasing sequence coverage generated by current next-generation sequence technologies. Even with scaffolding information, many genome assemblies remain incomplete. The genome of the threespine stickleback (Gasterosteus aculeatus), a fish model system in evolutionary genetics and genomics, is not completely assembled despite scaffolding with high-density linkage maps. Here, we first test the ability of a Hi-C based proximity-guided assembly (PGA) to perform a de novo genome assembly from relatively short contigs. Using Hi-C based PGA, we generated complete chromosome assemblies from a distribution of short contigs (20-100 kb). We found that 96.40% of contigs were correctly assigned to linkage groups (LGs), with ordering nearly identical to the previous genome assembly. Using available bacterial artificial chromosome (BAC) end sequences, we provide evidence that some of the few discrepancies between the Hi-C assembly and the existing assembly are due to structural variation between the populations used for the 2 assemblies or errors in the existing assembly. This Hi-C assembly also allowed us to improve the existing assembly, assigning over 60% (13.35 Mb) of the previously unassigned (~21.7 Mb) contigs to LGs. Together, our results highlight the potential of the Hi-C based PGA method to be used in combination with short read data to perform relatively inexpensive de novo genome assemblies. This approach will be particularly useful in organisms in which it is difficult to perform linkage mapping or to obtain high molecular weight DNA required for other scaffolding methods.


Subject(s)
Chromosome Mapping/methods , Smegmamorpha/genetics , Alaska , Animals , Chromosomes, Artificial, Bacterial , Contig Mapping , Genomics , Male , Sequence Analysis, DNA/methods
20.
Nat Genet ; 49(4): 643-650, 2017 Apr.
Article in English | MEDLINE | ID: mdl-28263316

ABSTRACT

The decrease in sequencing cost and increased sophistication of assembly algorithms for short-read platforms has resulted in a sharp increase in the number of species with genome assemblies. However, these assemblies are highly fragmented, with many gaps, ambiguities, and errors, impeding downstream applications. We demonstrate current state of the art for de novo assembly using the domestic goat (Capra hircus) based on long reads for contig formation, short reads for consensus validation, and scaffolding by optical and chromatin interaction mapping. These combined technologies produced what is, to our knowledge, the most continuous de novo mammalian assembly to date, with chromosome-length scaffolds and only 649 gaps. Our assembly represents a ∼400-fold improvement in continuity due to properly assembled gaps, compared to the previously published C. hircus assembly, and better resolves repetitive structures longer than 1 kb, representing the largest repeat family and immune gene complex yet produced for an individual of a ruminant species.


Subject(s)
Chromatin/genetics , Genome/genetics , Goats/genetics , Animals , Chromosomes/genetics , High-Throughput Nucleotide Sequencing/methods , Repetitive Sequences, Nucleic Acid/genetics