Results 1 - 20 of 20
2.
Genome Biol ; 25(1): 91, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38589937

ABSTRACT

BACKGROUND: Although sequencing technologies have boosted the measurement of the genomic diversity of plant crops, it remains challenging to accurately genotype millions of genetic variants, especially structural variations, with only short reads. In recent years, many graph-based variation genotyping methods have been developed to address this issue and tested for human genomes. However, their performance in plant genomes remains largely elusive. Furthermore, pipelines integrating the advantages of current genotyping methods might be required, considering the different complexity of plant genomes. RESULTS: Here we comprehensively evaluate eight such genotypers in different scenarios in terms of variant type and size, sequencing parameters, genomic context, and complexity, as well as graph size, using both simulated and real data sets from representative plant genomes. Our evaluation reveals that there are still great challenges to applying existing methods to plants, such as excessive repeats and variants or high resource consumption. Therefore, we propose a pipeline called Ensemble Variant Genotyper (EVG) that can achieve better genotyping performance in almost all experimental scenarios and comparably higher genotyping recall and precision even using 5× reads. Furthermore, we demonstrate that EVG is more robust with an increasing number of graphed genomes, especially for insertions and deletions. CONCLUSIONS: Our study will provide new insights into the development and application of graph-based genotyping algorithms. We conclude that EVG provides an accurate, unbiased, and cost-effective way for genotyping both small and large variations and will be potentially used in population-scale genotyping for large, repetitive, and heterozygous plant genomes.


Subject(s)
Algorithms , Benchmarking , Humans , Genotype , Genomics/methods , Genotyping Techniques/methods , Genome, Plant , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods
3.
Hortic Res ; 10(12): uhad241, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38156287

ABSTRACT

Tree peony belongs to one of the Saxifragales families, Paeoniaceae. It is one of the most famous ornamental plants, and is also a promising woody oil plant. Although two Paeoniaceae genomes have been released, their assembly qualities are still to be improved. Additionally, more genomes from wild peonies are needed to accelerate genomic-assisted breeding. Here we assemble a high-quality and chromosome-scale 10.3-Gb genome of a wild Tibetan tree peony, Paeonia ludlowii, which features substantial sequence divergence, including around 75% specific sequences and gene-level differentials compared with other peony genomes. Our phylogenetic analyses suggest that Saxifragales and Vitales are sister taxa and, together with rosids, they are the sister taxon to asterids. The P. ludlowii genome is characterized by frequent chromosome reductions, centromere rearrangements, broadly distributed heterochromatin, and recent continuous bursts of transposable element (TE) movement in peony, although it lacks recent whole-genome duplication. These recent TE bursts appeared during the uplift and glacial period of the Qinghai-Tibet Plateau, perhaps contributing to adaptation to rapid climate changes. Further integrated analyses with methylome data revealed that genome expansion in peony might be dynamically affected by complex interactions among TE proliferation, TE removal, and DNA methylation silencing. Such interactions also impact numerous recently duplicated genes, particularly those related to oil biosynthesis and flower traits. This genome resource will not only provide the genomic basis for tree peony breeding but also shed light on the study of the evolution of huge genome structures as well as their protein-coding genes.

4.
Nat Genet ; 55(11): 1964-1975, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37783780

ABSTRACT

The orange subfamily (Aurantioideae) contains several Citrus species cultivated worldwide, such as sweet orange and lemon. The origin of Citrus species has long been debated and less is known about the Aurantioideae. Here, we compiled the genome sequences of 314 accessions, de novo assembled the genomes of 12 species and constructed a graph-based pangenome for Aurantioideae. Our analysis indicates that the ancient Indian Plate is the ancestral area for Citrus-related genera and that South Central China is the primary center of origin of the Citrus genus. We found substantial variations in the sequence and expression of the PH4 gene in Citrus relative to Citrus-related genera. Gene editing and biochemical experiments demonstrate a central role for PH4 in the accumulation of citric acid in citrus fruits. This study provides insights into the origin and evolution of the orange subfamily and a regulatory mechanism underpinning the evolution of fruit taste.


Subject(s)
Citrus sinensis , Citrus , Citrus/genetics , Citrus/metabolism , Citrus sinensis/genetics , Citrus sinensis/metabolism , Citric Acid/metabolism , Fruit/genetics , China
5.
Nat Genet ; 54(3): 342-348, 2022 03.
Article in English | MEDLINE | ID: mdl-35241824

ABSTRACT

Potato is the most widely produced tuber crop worldwide. However, reconstructing the four haplotypes of its autotetraploid genome remained an unsolved challenge. Here, we report the 3.1 Gb haplotype-resolved (at 99.6% precision), chromosome-scale assembly of the potato cultivar 'Otava' based on high-quality long reads, single-cell sequencing of 717 pollen genomes and Hi-C data. Unexpectedly, ~50% of the genome was identical-by-descent due to recent inbreeding, which was contrasted by highly abundant structural rearrangements involving ~20% of the genome. Among 38,214 genes, only 54% were present in all four haplotypes with an average of 3.2 copies per gene. Taking the leaf transcriptome as an example, 11% of the genes were differently expressed in at least one haplotype, where 25% of them were likely regulated through allele-specific DNA methylation. Our work sheds light on the recent breeding history of potato, the functional organization of its tetraploid genome and has the potential to strengthen the future of genomics-assisted breeding.


Subject(s)
Solanum tuberosum , Tetraploidy , Alleles , Chromosomes , Haplotypes/genetics , Plant Breeding , Solanum tuberosum/genetics
6.
Proc Natl Acad Sci U S A ; 118(39)2021 09 28.
Article in English | MEDLINE | ID: mdl-34548402

ABSTRACT

The timing of reproduction is an adaptive trait in many organisms. In plants, the timing, duration, and intensity of flowering differ between annual and perennial species. To identify interspecies variation in these traits, we studied introgression lines derived from hybridization of annual and perennial species, Arabis montbretiana and Arabis alpina, respectively. Recombination mapping identified two tandem A. montbretiana genes encoding MADS-domain transcription factors that confer extreme late flowering on A. alpina. These genes are related to the MADS AFFECTING FLOWERING (MAF) cluster of floral repressors of other Brassicaceae species and were named A. montbretiana (Am) MAF-RELATED (MAR) genes. AmMAR1 but not AmMAR2 prevented floral induction at the shoot apex of A. alpina, strongly enhancing the effect of the MAF cluster, and MAR1 is absent from the genomes of all A. alpina accessions analyzed. Exposure of plants to cold (vernalization) represses AmMAR1 transcription and overcomes its inhibition of flowering. Assembly of the tandem arrays of MAR and MAF genes of six A. alpina accessions and three related species using PacBio long-sequence reads demonstrated that the MARs arose within the Arabis genus by interchromosomal transposition of a MAF1-like gene followed by tandem duplication. Time-resolved comparative RNA-sequencing (RNA-seq) suggested that AmMAR1 may be retained in A. montbretiana to enhance the effect of the AmMAF cluster and extend the duration of vernalization required for flowering. Our results demonstrate that MAF genes transposed independently in different Brassicaceae lineages and suggest that they were retained to modulate adaptive flowering responses that differ even among closely related species.


Subject(s)
Arabis/metabolism , Flowers/metabolism , Gene Duplication , Gene Expression Regulation, Plant , MADS Domain Proteins/metabolism , Phenotype , Plant Proteins/metabolism , Arabis/genetics , Arabis/growth & development , Flowers/genetics , Flowers/growth & development , MADS Domain Proteins/genetics , Plant Proteins/genetics
7.
Mol Biol Evol ; 38(4): 1225-1240, 2021 04 13.
Article in English | MEDLINE | ID: mdl-33247726

ABSTRACT

Although gene duplications provide genetic backup and allow genomic changes under relaxed selection, they may potentially limit gene flow. When different copies of a duplicated gene are pseudofunctionalized in different genotypes, genetic incompatibilities can arise in their hybrid offspring. Although such cases have been reported after manual crosses, it remains unclear whether they occur in nature and how they affect natural populations. Here, we identified four duplicated-gene based incompatibilities including one previously not reported within an artificial Arabidopsis intercross population. Unexpectedly, however, for each of the genetic incompatibilities we also identified the incompatible alleles in natural populations based on the genomes of 1,135 Arabidopsis accessions published by the 1001 Genomes Project. Using the presence of incompatible allele combinations as phenotypes for GWAS, we mapped genomic regions that included additional gene copies which likely rescue the genetic incompatibility. Reconstructing the geographic origins and evolutionary trajectories of the individual alleles suggested that incompatible alleles frequently coexist, even in geographically closed regions, and that their effects can be overcome by additional gene copies collectively shaping the evolutionary dynamics of duplicated genes during population history.


Subject(s)
Arabidopsis/genetics , Gene Duplication , Reproductive Isolation , Alleles , Phylogeography
8.
Genome Biol ; 21(1): 306, 2020 12 29.
Article in English | MEDLINE | ID: mdl-33372615

ABSTRACT

Generating chromosome-level, haplotype-resolved assemblies of heterozygous genomes remains challenging. To address this, we developed gamete binning, a method based on single-cell sequencing of haploid gametes enabling separation of the whole-genome sequencing reads into haplotype-specific reads sets. After assembling the reads of each haplotype, the contigs are scaffolded to chromosome level using a genetic map derived from the gametes. We assemble the two genomes of a diploid apricot tree based on whole-genome sequencing of 445 individual pollen grains. The two haplotype assemblies (N50: 25.5 and 25.8 Mb) feature a haplotyping precision of greater than 99% and are accurately scaffolded to chromosome-level.


Subject(s)
Chromosomes , Genome , Germ Cells , Haplotypes , High-Throughput Nucleotide Sequencing/methods , Diploidy , Genome Size , Haploidy , Heterozygote , Plant Shoots , Pollen/genetics , Polymorphism, Single Nucleotide , Sequence Analysis, DNA , Spain , Whole Genome Sequencing
9.
Nat Commun ; 11(1): 989, 2020 02 20.
Article in English | MEDLINE | ID: mdl-32080174

ABSTRACT

Despite hundreds of sequenced Arabidopsis genomes, very little is known about the degree of genomic collinearity within single species, due to the low number of chromosome-level assemblies. Here, we report chromosome-level reference-quality assemblies of seven Arabidopsis thaliana accessions selected across its global range. Each genome reveals between 13-17 Mb rearranged, and 5-6 Mb non-reference sequences introducing copy-number changes in ~5000 genes, including ~1900 non-reference genes. Quantifying the collinearity between the genomes reveals ~350 euchromatic regions, where accession-specific tandem duplications destroy the collinearity between the genomes. These hotspots of rearrangements are characterized by reduced meiotic recombination in hybrids and genes implicated in biotic stress response. This suggests that hotspots of rearrangements undergo altered evolutionary dynamics, as compared to the rest of the genome, which are mostly based on the accumulation of new mutations and not on the recombination of existing variation, and thereby enable a quick response to the biotic stress.


Subject(s)
Arabidopsis/genetics , Evolution, Molecular , Gene Rearrangement , Genome, Plant , Chromosomes, Plant/genetics , Gene Dosage , Stress, Physiological/genetics , Synteny
10.
Genome Biol ; 20(1): 277, 2019 12 16.
Article in English | MEDLINE | ID: mdl-31842948

ABSTRACT

Genomic differences range from single nucleotide differences to complex structural variations. Current methods typically annotate sequence differences ranging from SNPs to large indels accurately but do not unravel the full complexity of structural rearrangements, including inversions, translocations, and duplications, where highly similar sequence changes in location, orientation, or copy number. Here, we present SyRI, a pairwise whole-genome comparison tool for chromosome-level assemblies. SyRI starts by finding rearranged regions and then searches for differences in the sequences, which are distinguished for residing in syntenic or rearranged regions. This distinction is important as rearranged regions are inherited differently compared to syntenic regions.


Subject(s)
Gene Rearrangement , Genetic Techniques , Genomics/methods , Animals , Arabidopsis , Humans , Software , Synteny
11.
Mol Ecol ; 28(17): 3887-3901, 2019 09.
Article in English | MEDLINE | ID: mdl-31338892

ABSTRACT

Achieving high intraspecific genetic diversity is a critical goal in ecological restoration as it increases the adaptive potential and long-term resilience of populations. Thus, we investigated genetic diversity within and between pristine sites in a fossil floodplain and compared it to sites restored by hay transfer between 1997 and 2014. RAD-seq genotyping revealed that the stenoecious floodplain species Arabis nemorensis is co-occurring with individuals that, based on ploidy, ITS-sequencing and morphology, probably belong to the close relative Arabis sagittata, which has a documented preference for dry calcareous grasslands but has not been reported in floodplain meadows. We show that hay transfer maintains genetic diversity for both species. Additionally, in A. sagittata, transfer from multiple genetically isolated pristine sites resulted in restored sites with increased diversity and admixed local genotypes. In A. nemorensis, transfer did not create novel admixture dynamics because genetic diversity between pristine sites was less differentiated. Thus, the effects of hay transfer on genetic diversity also depend on the genetic make-up of the donor communities of each species, especially when local material is mixed. Our results demonstrate the efficiency of hay transfer for habitat restoration and emphasize the importance of prerestoration characterization of microgeographic patterns of intraspecific diversity of the community to guarantee that restoration practices reach their goal, that is, to maximize the adaptive potential of the entire restored plant community. Overlooking these patterns may alter the balance between species in the community. Additionally, our comparison of summary statistics obtained from de novo- and reference-based RAD-seq pipelines shows that the genomic impact of restoration can be reliably monitored in species lacking prior genomic knowledge.


Subject(s)
Arabis/genetics , Conservation of Natural Resources , Ecosystem , Restriction Mapping , Sequence Analysis, DNA , Genetic Variation , Genetics, Population , Hybridization, Genetic , Recombination, Genetic/genetics , Species Specificity
12.
Nat Plants ; 5(8): 846-855, 2019 08.
Article in English | MEDLINE | ID: mdl-31358959

ABSTRACT

Comparative genomics can unravel the genetic basis of species differences; however, successful reports on quantitative traits are still scarce. Here we present genome assemblies of 31 so-far unassembled Brassicaceae plant species and combine them with 16 previously published assemblies to establish the Brassicaceae Diversity Panel. Using a new interspecies association strategy for quantitative traits, we found a so-far unknown association between the unexpectedly high variation in CG to TG substitution rates in genes and the absence of CHROMOMETHYLASE3 (CMT3) orthologues. Low substitution rates were associated with the loss of CMT3, while species with conserved CMT3 orthologues showed high substitution rates. Species without CMT3 also lacked gene-body methylation (gbM), suggesting an evolutionary trade-off between the unknown function of gbM and low substitution rates in Brassicaceae, possibly due to low mutability of non-methylated cytosines.


Subject(s)
Brassicaceae/genetics , Genome, Plant , Nucleotides/genetics , Brassicaceae/classification , Brassicaceae/metabolism , Chromosome Mapping , Cytosine , Genetic Association Studies , Genomics , Guanine , Methylation , Phylogeny , Quantitative Trait Loci , Thymine
13.
Genome Res ; 27(5): 778-786, 2017 05.
Article in English | MEDLINE | ID: mdl-28159771

ABSTRACT

Long-read sequencing can overcome the weaknesses of short reads in the assembly of eukaryotic genomes; however, at present additional scaffolding is needed to achieve chromosome-level assemblies. We generated Pacific Biosciences (PacBio) long-read data of the genomes of three relatives of the model plant Arabidopsis thaliana and assembled all three genomes into only a few hundred contigs. To improve the contiguities of these assemblies, we generated BioNano Genomics optical mapping and Dovetail Genomics chromosome conformation capture data for genome scaffolding. Despite their technical differences, optical mapping and chromosome conformation capture performed similarly and doubled N50 values. After improving both integration methods, assembly contiguity reached chromosome-arm-levels. We rigorously assessed the quality of contigs and scaffolds using Illumina mate-pair libraries and genetic map information. This showed that PacBio assemblies have high sequence accuracy but can contain several misassemblies, which join unlinked regions of the genome. Most, but not all, of these misjoints were removed during the integration of the optical mapping and chromosome conformation capture data. Even though none of the centromeres were fully assembled, the scaffolds revealed large parts of some centromeric regions, even including some of the heterochromatic regions, which are not present in gold standard reference sequences.


Subject(s)
Chromosomes, Plant/chemistry , Contig Mapping/methods , Genome, Plant , Genomics/methods , Software , Arabidopsis/genetics , Chromosomes, Plant/genetics , Contig Mapping/standards , Genomics/standards
14.
Curr Opin Plant Biol ; 36: 64-70, 2017 04.
Article in English | MEDLINE | ID: mdl-28231512

ABSTRACT

Since the introduction of next generation sequencing, plant genome assembly projects no longer need to rely on dedicated research facilities or community-wide consortia; even individual research groups can sequence and assemble the genomes they are interested in. However, such assemblies are typically not based on the entire breadth of genomic technologies, including genetic and physical maps, and their contiguities tend to be low compared to the full-length gold standard reference sequences. Recently emerging third generation genomic technologies like long-read sequencing or optical mapping promise to bridge this quality gap and enable simple and cost-effective solutions for chromosomal-level assemblies.


Subject(s)
Genome, Plant , Genomics/methods , Animals , Heterozygote , Polyploidy , Sequence Analysis, DNA
15.
Sci Data ; 3: 160076, 2016 Sep 13.
Article in English | MEDLINE | ID: mdl-27622467

ABSTRACT

Over the past 30 years, we have performed many fundamental studies on two Oryza sativa subsp. indica varieties, Zhenshan 97 (ZS97) and Minghui 63 (MH63). To improve the resolution of many of these investigations, we generated two reference-quality genome assemblies using the most advanced sequencing technologies. Using PacBio SMRT technology, we produced over 108 (ZS97) and 174 (MH63) Gb of raw sequence data from 166 (ZS97) and 209 (MH63) pools of BAC clones, and generated ~97 (ZS97) and ~74 (MH63) Gb of paired-end whole-genome shotgun (WGS) sequence data with Illumina sequencing technology. With these data, we successfully assembled two platinum standard reference genomes that have been publicly released. Here we provide the full sets of raw data used to generate these two reference genome assemblies. These data sets can be used to test new programs for better genome assembly and annotation, aid in the discovery of new insights into genome structure, function, and evolution, and help to provide essential support to biological research in general.


Subject(s)
Genome , Oryza/genetics
16.
Proc Natl Acad Sci U S A ; 113(35): E5163-71, 2016 08 30.
Article in English | MEDLINE | ID: mdl-27535938

ABSTRACT

Asian cultivated rice consists of two subspecies: Oryza sativa subsp. indica and O. sativa subsp. japonica. Despite the fact that indica rice accounts for over 70% of total rice production worldwide and is genetically much more diverse, a high-quality reference genome for indica rice has yet to be published. We conducted map-based sequencing of two indica rice lines, Zhenshan 97 (ZS97) and Minghui 63 (MH63), which represent the two major varietal groups of the indica subspecies and are the parents of an elite Chinese hybrid. The genome sequences were assembled into 237 (ZS97) and 181 (MH63) contigs, with an accuracy >99.99%, and covered 90.6% and 93.2% of their estimated genome sizes. Comparative analyses of these two indica genomes uncovered surprising structural differences, especially with respect to inversions, translocations, presence/absence variations, and segmental duplications. Approximately 42% of nontransposable element related genes were identical between the two genomes. Transcriptome analysis of three tissues showed that 1,059-2,217 more genes were expressed in the hybrid than in the parents and that the expressed genes in the hybrid were much more diverse due to their divergence between the parental genomes. The public availability of two high-quality reference genomes for the indica subspecies of rice will have large-ranging implications for plant biology and crop genetic improvement.


Subject(s)
Chromosomes, Plant/genetics , Genetic Variation , Genome, Plant/genetics , Oryza/genetics , Chromosome Mapping/methods , Gene Expression Profiling , Gene Expression Regulation, Plant , Genes, Plant/genetics , INDEL Mutation , Oryza/classification , Polymorphism, Single Nucleotide , Species Specificity
17.
Proc Natl Acad Sci U S A ; 113(28): E4052-60, 2016 07 12.
Article in English | MEDLINE | ID: mdl-27354520

ABSTRACT

Resequencing or reference-based assemblies reveal large parts of the small-scale sequence variation. However, they typically fail to separate such local variation into colinear and rearranged variation, because they usually do not recover the complement of large-scale rearrangements, including transpositions and inversions. Besides the availability of hundreds of genomes of diverse Arabidopsis thaliana accessions, there is so far only one full-length assembled genome: the reference sequence. We have assembled 117 Mb of the A. thaliana Landsberg erecta (Ler) genome into five chromosome-equivalent sequences using a combination of short Illumina reads, long PacBio reads, and linkage information. Whole-genome comparison against the reference sequence revealed 564 transpositions and 47 inversions comprising ∼3.6 Mb, in addition to 4.1 Mb of nonreference sequence, mostly originating from duplications. Although rearranged regions are not different in local divergence from colinear regions, they are drastically depleted for meiotic recombination in heterozygotes. Using a 1.2-Mb inversion as an example, we show that such rearrangement-mediated reduction of meiotic recombination can lead to genetically isolated haplotypes in the worldwide population of A. thaliana. Moreover, we found 105 single-copy genes, which were only present in the reference sequence or the Ler assembly, and 334 single-copy orthologs, which showed an additional copy in only one of the genomes. To our knowledge, this work gives first insights into the degree and type of variation, which will be revealed once complete assemblies replace resequencing or other reference-dependent methods.


Subject(s)
Arabidopsis/genetics , Chromosome Inversion , Chromosomes, Plant , Genomic Structural Variation , Translocation, Genetic , Gene Dosage , Genome, Plant , Haplotypes , Karyotyping
18.
Science ; 350(6267): 1521-4, 2015 Dec 18.
Article in English | MEDLINE | ID: mdl-26680197

ABSTRACT

In terrestrial ecosystems, plants take up phosphate predominantly via association with arbuscular mycorrhizal fungi (AMF). We identified loss of responsiveness to AMF in the rice (Oryza sativa) mutant hebiba, reflected by the absence of physical contact and of characteristic transcriptional responses to fungal signals. Among the 26 genes deleted in hebiba, DWARF 14 LIKE is the one responsible for loss of symbiosis. It encodes an alpha/beta-fold hydrolase that is a component of an intracellular receptor complex involved in the detection of the smoke compound karrikin. Our finding reveals an unexpected plant recognition strategy for AMF and a previously unknown signaling link between symbiosis and plant development.


Subject(s)
Furans/metabolism , Hydrolases/metabolism , Mycorrhizae/physiology , Oryza/enzymology , Oryza/microbiology , Plant Proteins/metabolism , Pyrans/metabolism , Symbiosis/physiology , Hydrolases/genetics , Oryza/genetics , Phosphates/metabolism , Plant Proteins/genetics , Symbiosis/genetics , Transcription, Genetic
19.
Plant J ; 75(6): 954-64, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23738603

ABSTRACT

Heterozygosity is an important feature of many plant genomes, and is related to heterosis. Sweet orange, a highly heterozygous species, is thought to have originated from an inter-species hybrid between pummelo and mandarin. To investigate the heterozygosity of the sweet orange genome and examine how this heterozygosity affects gene expression, we characterized the genome of Valencia orange for single nucleotide variations (SNVs), small insertions and deletions (InDels) and structural variations (SVs), and determined their functional effects on protein-coding genes and non-coding sequences. Almost half of the genes containing large-effect SNVs and InDels were expressed in a tissue-specific manner. We identified 3542 large SVs (>50 bp), including deletions, insertions and inversions. Most of the 296 genes located in large-deletion regions showed low expression levels. RNA-Seq reads and DNA sequencing reads revealed that the alleles of 1062 genes were differentially expressed. In addition, we detected approximately 42 Mb of contigs that were not found in the reference genome of a haploid sweet orange by de novo assembly of unmapped reads, and annotated 134 protein-coding genes within these contigs. We discuss how this heterozygosity affects the quality of genome assembly. This study advances our understanding of the genome architecture of sweet orange, and provides a global view of gene expression at heterozygous loci.


Subject(s)
Citrus sinensis/genetics , Gene Expression Regulation, Plant , Genetic Variation , Genome, Plant , Genome-Wide Association Study/methods , Contig Mapping , Molecular Sequence Annotation , Transcriptome
20.
Nat Genet ; 45(1): 59-66, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23179022

ABSTRACT

Oranges are an important nutritional source for human health and have immense economic value. Here we present a comprehensive analysis of the draft genome of sweet orange (Citrus sinensis). The assembled sequence covers 87.3% of the estimated orange genome, which is relatively compact, as 20% is composed of repetitive elements. We predicted 29,445 protein-coding genes, half of which are in the heterozygous state. With additional sequencing of two more citrus species and comparative analyses of seven citrus genomes, we present evidence to suggest that sweet orange originated from a backcross hybrid between pummelo and mandarin. Focused analysis on genes involved in vitamin C metabolism showed that GalUR, encoding the rate-limiting enzyme of the galacturonate pathway, is significantly upregulated in orange fruit, and the recent expansion of this gene family may provide a genomic basis. This draft genome represents a valuable resource for understanding and improving many important citrus traits in the future.


Subject(s)
Citrus sinensis/genetics , Genome, Plant , Chimera , Chromosome Mapping , Citrus sinensis/metabolism , Cluster Analysis , Computational Biology/methods , Evolution, Molecular , Gene Expression Profiling , Gene Expression Regulation, Plant , Gene Order , Heterozygote , High-Throughput Nucleotide Sequencing , Molecular Sequence Data , Phylogeny , Vitamins/metabolism