Results 1 - 19 of 19
1.
Commun Biol ; 6(1): 902, 2023 09 04.
Article in English | MEDLINE | ID: mdl-37667032

ABSTRACT

High-quality reference genome assemblies, representative of global heterotic patterns, offer an ideal platform to accurately characterize and utilize genetic variation in the primary gene pool of hybrid crops. Here we report three platinum-grade, de novo, near gap-free, chromosome-level reference genome assemblies from the active breeding germplasm of pearl millet, with a high degree of contiguity, completeness, and accuracy. The improved Tift genome (Tift23D2B1-P1-P5) assembly has a contig N50 of 126 Mb, roughly 7,000-fold higher than that of the previous version, and better alignment in centromeric regions. Comparative genome analyses of these three lines demonstrate a high level of collinearity as well as multiple structural variations, including inversions greater than 1 Mb. Differential genes in the improved Tift genome are enriched for serine O-acetyltransferase and glycerol-3-phosphate metabolic process, which play important roles in improving the nutritional quality of seed protein and disease resistance in plants, respectively. Through a genome-wide association study, multiple marker-trait associations are identified for a range of agronomic traits, including grain yield. The improved genome assemblies and marker resources developed in this study provide a comprehensive framework for future applications such as marker-assisted selection of mono/oligogenic traits as well as whole-genome prediction and haplotype-based breeding of complex traits.
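Contig N50 is the contiguity statistic quoted above. As a minimal sketch of the standard definition (the length L such that contigs of length L or longer together cover at least half of the assembly), the function below computes it from a list of lengths; the toy lengths are hypothetical and not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy contig lengths (not data from the study).
contigs = [100, 80, 60, 40, 20]
print(n50(contigs))  # 80
```

Sorting in descending order and stopping at the half-way point keeps the sketch short; real assembly tools apply the same definition but may report it alongside L50 (the number of sequences needed to reach N50).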


Subject(s)
Pennisetum , Pennisetum/genetics , DNA Shuffling , Genome-Wide Association Study , Plant Breeding , Agriculture
2.
BMC Genomics ; 22(1): 23, 2021 Jan 06.
Article in English | MEDLINE | ID: mdl-33407087

ABSTRACT

BACKGROUND: Three-dimensional chromatin loop structures connect regulatory elements to their target genes in regions known as anchors. In complex plant genomes, such as maize, it has been proposed that loops span heterochromatic regions marked by higher repeat content, but little is known about their spatial organization and genome-wide occurrence in relation to transcriptional activity. RESULTS: Here, ultra-deep Hi-C sequencing of maize B73 leaf tissue was combined with gene expression and open chromatin sequencing for chromatin loop discovery and correlation with hierarchical topologically-associating domains (TADs) and transcriptional activity. A majority of all anchors are shared between multiple loops from previous public maize high-resolution interactome datasets, suggesting a highly dynamic environment, with a conserved set of anchors involved in multiple interaction networks. Chromatin loop interiors are marked by higher repeat contents than the anchors flanking them. A small fraction of high-resolution interaction anchors, fully embedded in larger chromatin loops, co-locate with active genes and putative protein-binding sites. Combinatorial analyses indicate that all anchors studied here co-locate with at least 81.5% of expressed genes and 74% of open chromatin regions. Approximately 38% of all Hi-C chromatin loops are fully embedded within hierarchical TAD-like domains, while the remaining ones share anchors with domain boundaries or with distinct domains. These loop types exhibit specific patterns of overlap with open chromatin regions and expressed genes, but no apparent pattern of gene expression. In addition, up to 63% of all unique variants derived from a prior public maize eQTL dataset overlap with Hi-C loop anchors. Anchor annotation suggests that < 7% of all loops detected here are potentially devoid of any genes or regulatory elements. The overall organization of chromatin loop anchors in the maize genome suggests a loop modeling system hypothesized to resemble phase separation of repeat-rich regions. CONCLUSIONS: Sets of conserved chromatin loop anchors mapping to hierarchical domains contain core structural components of the gene expression machinery in maize. The data presented here will be a useful reference to further investigate their function with regard to the formation of transcriptional complexes and the regulation of transcriptional activity in the maize genome.
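The abstract reports the share of expressed genes that co-locate with loop anchors but does not say how co-location was computed. A minimal sketch, assuming co-location means any base-pair overlap between half-open [start, end) intervals on a single chromosome, is shown below; the function name and coordinates are hypothetical.

```python
from bisect import bisect_left


def fraction_overlapping(features, anchors):
    """Fraction of features (start, end) overlapping at least one anchor.

    Intervals are half-open [start, end) on one chromosome; two intervals
    overlap when a.start < f.end and a.end > f.start.
    """
    if not features:
        return 0.0
    anchors = sorted(anchors)
    starts = [s for s, _ in anchors]
    # Running maximum of anchor ends, so one lookup answers "does any anchor
    # that starts before this feature ends also extend past its start?"
    max_end, best = [], float("-inf")
    for _, e in anchors:
        best = max(best, e)
        max_end.append(best)
    hits = 0
    for start, end in features:
        i = bisect_left(starts, end)  # number of anchors with start < end
        if i > 0 and max_end[i - 1] > start:
            hits += 1
    return hits / len(features)


anchors = [(100, 200), (500, 600)]                    # hypothetical anchors
genes = [(150, 250), (300, 400), (590, 700), (600, 650)]
print(fraction_overlapping(genes, anchors))            # 0.5
```

Sorting the anchors once and using a running maximum of their ends keeps each gene lookup logarithmic; genome-scale work would normally use a dedicated interval library instead.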


Subject(s)
Chromatin , Zea mays , Chromatin/genetics , Chromatin Assembly and Disassembly , Gene Expression , Genome, Plant , Zea mays/genetics
3.
Mol Med Rep ; 23(2)2021 02.
Article in English | MEDLINE | ID: mdl-33313948

ABSTRACT

Alzheimer's disease (AD) is a global health issue, but the precise underlying mechanism has not yet been elucidated. The present study aimed to integrate microRNA (miRNA or miR) and mRNA profiles of AD and identify hub genes via bioinformatics analysis. Datasets associated with AD (GSE113141, GSE104249 and GSE138382) were integrated. Bioinformatics analysis was used to identify the hub mRNAs. TargetScan was used to predict miRNAs that have binding sites for the hub genes. Reverse transcription-quantitative PCR (RT-qPCR) and western blot analysis were performed to assess miRNA and mRNA expression levels in APP/PS1 transgenic mice and human U251 cells. Luciferase reporter assay and RNA interference were utilized to verify the functions of these miRNAs in vitro. Bioinformatics analysis demonstrated that expression levels of the gene encoding transmembrane immune signaling adaptor TYROBP were upregulated in both the GSE113141 and GSE104249 datasets; TYROBP also served as the hub gene in AD. miR-628-5p was predicted to have binding sites for TYROBP and was downregulated in GSE138382. RT-qPCR confirmed low miR-628-5p and high TYROBP expression levels in APP/PS1 transgenic mice and human U251 cells. Western blot analysis demonstrated high protein expression levels of amyloid β (Aβ) precursor protein, Aβ and TYROBP in APP/PS1 transgenic mice and U251 cells. Dual luciferase reporter assay confirmed that TYROBP was targeted by miR-628-5p. miR-628-5p/TYROBP may inhibit progressive neurodegeneration in AD and could be used as novel biomarkers and candidate drug targets.


Subject(s)
Adaptor Proteins, Signal Transducing/metabolism , Alzheimer Disease/genetics , Alzheimer Disease/metabolism , Computational Biology , Membrane Proteins/metabolism , MicroRNAs/genetics , Aged , Aged, 80 and over , Amyloid beta-Protein Precursor/metabolism , Animals , Cell Line, Tumor , Databases, Genetic , Female , Gene Expression Regulation , Gene Ontology , Humans , Male , Mice , Mice, Transgenic , MicroRNAs/biosynthesis , Protein Interaction Maps , Up-Regulation
4.
Nat Commun ; 9(1): 4844, 2018 11 19.
Article in English | MEDLINE | ID: mdl-30451840

ABSTRACT

Long-read sequencing technologies have greatly facilitated assemblies of large eukaryotic genomes. In this paper, Oxford Nanopore sequences generated on a MinION sequencer are combined with Bionano Genomics Direct Label and Stain (DLS) optical maps to generate a chromosome-scale de novo assembly of the repeat-rich Sorghum bicolor Tx430 genome. The final assembly consists of 29 scaffolds, encompassing in most cases entire chromosome arms. It has a scaffold N50 of 33.28 Mb and covers 90% of the expected genome length. A sequence accuracy of 99.85% is obtained after aligning the assembly against Illumina Tx430 data and 99.6% of the 34,211 public gene models align to the assembly. Comparisons of Tx430 and BTx623 DLS maps against the public BTx623 v3.0.1 genome assembly suggest substantial discrepancies whose origin remains to be determined. In summary, this study demonstrates that informative assemblies of complex plant genomes can be generated by combining nanopore sequencing with DLS optical maps.


Subject(s)
Genome, Plant , High-Throughput Nucleotide Sequencing , Physical Chromosome Mapping/methods , Sorghum/genetics , Genome Size , Microsatellite Repeats , Nanopores , Staining and Labeling/methods
5.
Plant Genome ; 8(1): eplantgenome2014.08.0037, 2015 Mar.
Article in English | MEDLINE | ID: mdl-33228291

ABSTRACT

Molecular characterization of events is an integral part of the advancement process during genetically modified (GM) crop product development. Assessment of these events is traditionally accomplished by polymerase chain reaction (PCR) and Southern blot analyses. Southern blot analysis can be time-consuming and comparatively expensive and does not provide sequence-level detail. We have developed a sequence-based application, Southern-by-Sequencing (SbS), utilizing sequence capture coupled with next-generation sequencing (NGS) technology to replace Southern blot analysis for event selection in a high-throughput molecular characterization environment. SbS is accomplished by hybridizing indexed and pooled whole-genome DNA libraries from GM plants to biotinylated probes designed to target the sequence of transformation plasmids used to generate events within the pool. This sequence capture process enriches the sequence data obtained for targeted regions of interest (transformation plasmid DNA). Taking advantage of the DNA adjacent to the targeted bases (referred to as next-to-target sequence) that accompanies the targeted transformation plasmid sequence, the data analysis detects plasmid-to-genome and plasmid-to-plasmid junctions introduced during insertion into the plant genome. Analysis of these junction sequences provides sequence-level information as to the following: the number of insertion loci including detection of unlinked, independently segregating, small DNA fragments; copy number; rearrangements, truncations, or deletions of the intended insertion DNA; and the presence of transformation plasmid backbone sequences. This molecular evidence from SbS analysis is used to characterize and select GM plants meeting optimal molecular characterization criteria. SbS technology has proven to be a robust event screening tool for use in a high-throughput molecular characterization environment.
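The abstract describes detecting plasmid-to-genome and plasmid-to-plasmid junctions from next-to-target sequence but does not describe its pipeline. The sketch below only illustrates the junction idea under a hypothetical representation: each read has already been split into consecutive aligned segments, each given as (reference name, boundary coordinate), with the label "plasmid" for the transformation plasmid. It ignores circular-plasmid wraparound, strand orientation, and alignment quality, and all names are illustrative assumptions.

```python
from collections import Counter

PLASMID = "plasmid"  # hypothetical label for plasmid-aligned segments


def classify_junction(seg_a, seg_b):
    """Classify the boundary between two consecutive aligned segments.

    Each segment is (reference, boundary_position): the reference it aligned
    to and the coordinate where it meets the next segment of the read.
    """
    ref_a, pos_a = seg_a
    ref_b, pos_b = seg_b
    on_plasmid_a = ref_a == PLASMID
    on_plasmid_b = ref_b == PLASMID
    if on_plasmid_a != on_plasmid_b:
        return "plasmid-to-genome"
    if on_plasmid_a and on_plasmid_b and pos_b != pos_a + 1:
        # Two plasmid pieces that are not contiguous in the plasmid map.
        return "plasmid-to-plasmid"
    return None  # contiguous plasmid sequence, or genome-only sequence


def tally_junctions(reads):
    """Count junction types across reads given as lists of segments."""
    counts = Counter()
    for segments in reads:
        for a, b in zip(segments, segments[1:]):
            kind = classify_junction(a, b)
            if kind is not None:
                counts[kind] += 1
    return counts


reads = [
    [(PLASMID, 500), ("chr3", 12_000)],  # insert border
    [(PLASMID, 900), (PLASMID, 2_300)],  # rearranged plasmid pieces
    [(PLASMID, 100), (PLASMID, 101)],    # contiguous plasmid sequence
]
print(tally_junctions(reads))
```

Counting distinct plasmid-to-genome junctions per plant is one way such data could hint at the number of insertion loci, but the abstract does not specify how SbS performs that step.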

6.
Plant Cell ; 25(12): 4827-43, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24368787

ABSTRACT

Branched-chain amino acids (BCAAs) are three of the nine essential amino acids in human and animal diets and are important for numerous processes in development and growth. However, seed BCAA levels in major crops are insufficient to meet dietary requirements, making genetic improvement for increased and balanced seed BCAAs an important nutritional target. Addressing this issue requires a better understanding of the genetics underlying seed BCAA content and composition. Here, a genome-wide association study and haplotype analysis for seed BCAA traits in Arabidopsis thaliana revealed a strong association with a chromosomal interval containing two branched-chain amino acid transferases, BCAT1 and BCAT2. Linkage analysis, reverse genetic approaches, and molecular complementation analysis demonstrated that allelic variation at BCAT2 is responsible for the natural variation of seed BCAAs in this interval. Complementation analysis of a bcat2 null mutant with two significantly different alleles from accessions Bayreuth-0 and Shahdara is consistent with BCAT2 contributing to natural variation in BCAA levels, glutamate recycling, and free amino acid homeostasis in seeds in an allele-dependent manner. The seed-specific phenotype of bcat2 null alleles, its strong transcription induction during late seed development, and its subcellular localization to the mitochondria are consistent with a unique, catabolic role for BCAT2 in BCAA metabolism in seeds.


Subject(s)
Amino Acids, Branched-Chain/metabolism , Arabidopsis Proteins/genetics , Arabidopsis/metabolism , Genome, Plant , Transaminases/genetics , Amino Acids, Branched-Chain/genetics , Arabidopsis/genetics , Arabidopsis Proteins/metabolism , Arabidopsis Proteins/physiology , Chromosome Mapping , Genetic Association Studies , Genetic Linkage , Haplotypes , Nutritive Value , Seeds/genetics , Seeds/metabolism , Transaminases/metabolism , Transaminases/physiology
7.
Plant Cell ; 25(12): 4812-26, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24368792

ABSTRACT

Experimental approaches targeting carotenoid biosynthetic enzymes have successfully increased the seed β-carotene content of crops. However, linkage analysis of seed carotenoids in Arabidopsis thaliana recombinant inbred populations showed that only 21% of quantitative trait loci, including those for β-carotene, encode carotenoid biosynthetic enzymes in their intervals. Thus, numerous loci remain uncharacterized and underutilized in biofortification approaches. Linkage mapping and genome-wide association studies of Arabidopsis seed carotenoids identified CAROTENOID CLEAVAGE DIOXYGENASE4 (CCD4) as a major negative regulator of seed carotenoid content, especially β-carotene. Loss of CCD4 function did not affect carotenoid homeostasis during seed development but greatly reduced carotenoid degradation during seed desiccation, increasing β-carotene content 8.4-fold relative to the wild type. Allelic complementation of a ccd4 null mutant demonstrated that single-nucleotide polymorphisms and insertions and deletions at the locus affect dry seed carotenoid content, due at least partly to differences in CCD4 expression. CCD4 also plays a major role in carotenoid turnover during dark-induced leaf senescence, with β-carotene accumulation again most strongly affected in the ccd4 mutant. These results demonstrate that CCD4 plays a major role in β-carotene degradation in drying seeds and senescing leaves and suggest that CCD4 orthologs would be promising targets for stabilizing and increasing the level of provitamin A carotenoids in seeds of major food crops.


Subject(s)
Arabidopsis Proteins/physiology , Arabidopsis/enzymology , Dioxygenases/physiology , Plant Proteins/physiology , beta Carotene/biosynthesis , Arabidopsis/metabolism , Arabidopsis Proteins/genetics , Arabidopsis Proteins/metabolism , Cellular Senescence , Chromosome Mapping , Dioxygenases/genetics , Dioxygenases/metabolism , Homeostasis , Mutagenesis, Insertional , Plant Leaves/metabolism , Plant Proteins/genetics , Plant Proteins/metabolism , Plants, Genetically Modified/metabolism , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Seeds/genetics , Seeds/metabolism , Sequence Deletion
8.
Rice (N Y) ; 6(1): 4, 2013 Feb 06.
Article in English | MEDLINE | ID: mdl-24280374

ABSTRACT

BACKGROUND: Rice research has been enabled by access to the high quality reference genome sequence generated in 2005 by the International Rice Genome Sequencing Project (IRGSP). To further facilitate genomic-enabled research, we have updated and validated the genome assembly and sequence for the Nipponbare cultivar of Oryza sativa (japonica group). RESULTS: The Nipponbare genome assembly was updated by revising and validating the minimal tiling path of clones with the optical map for rice. Sequencing errors in the revised genome assembly were identified by re-sequencing the genome of two different Nipponbare individuals using the Illumina Genome Analyzer II/IIx platform. A total of 4,886 sequencing errors were identified in 321 Mb of the assembled genome, indicating an error rate in the original IRGSP assembly of only 0.15 per 10,000 nucleotides. A small number (five) of insertions/deletions were identified using longer reads generated using the Roche 454 pyrosequencing platform. As the re-sequencing data were generated from two different individuals, we were able to identify a number of allelic differences between the original individual used in the IRGSP effort and the two individuals used in the re-sequencing effort. The revised assembly, termed Os-Nipponbare-Reference-IRGSP-1.0, is now being used in updated releases of the Rice Annotation Project and the Michigan State University Rice Genome Annotation Project, thereby providing a unified set of pseudomolecules for the rice community. CONCLUSIONS: A revised, error-corrected, and validated assembly of the Nipponbare cultivar of rice was generated using optical map data, re-sequencing data, and manual curation that will facilitate ongoing and future research in rice. Detection of polymorphisms between three different Nipponbare individuals highlights that allelic differences between individuals should be considered in diversity studies.
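The error rate quoted above follows from the two reported figures. A short worked check, using only the abstract's numbers (321 Mb is itself rounded, so the result is approximate):

```python
def errors_per_10k(errors, assembled_bp):
    """Sequencing errors per 10,000 assembled nucleotides."""
    return errors / assembled_bp * 10_000


rate = errors_per_10k(4_886, 321_000_000)  # 4,886 errors in 321 Mb
print(f"{rate:.2f} errors per 10,000 nt")  # 0.15 errors per 10,000 nt

# The same rate expressed as per-base accuracy of the original assembly.
accuracy = 100 * (1 - rate / 10_000)
print(f"{accuracy:.4f}% per-base accuracy")
```

Expressing the rate per 10,000 nucleotides rather than as a percentage keeps very small error rates readable.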

9.
G3 (Bethesda) ; 3(8): 1287-99, 2013 Aug 07.
Article in English | MEDLINE | ID: mdl-23733887

ABSTRACT

Tocopherols and tocotrienols, collectively known as tocochromanols, are the major lipid-soluble antioxidants in maize (Zea mays L.) grain. Given that individual tocochromanols differ in their degree of vitamin E activity, variation for tocochromanol composition and content in grain from among diverse maize inbred lines has important nutritional and health implications for enhancing the vitamin E and antioxidant contents of maize-derived foods through plant breeding. Toward this end, we conducted a genome-wide association study of six tocochromanol compounds and 14 of their sums, ratios, and proportions with a 281-line maize inbred association panel that was genotyped for 591,822 SNP markers. In addition to providing further insight into the association between ZmVTE4 (γ-tocopherol methyltransferase) haplotypes and α-tocopherol content, we also detected a novel association between ZmVTE1 (tocopherol cyclase) and tocotrienol composition. In a pathway-level analysis, we assessed the genetic contribution of 60 a priori candidate genes encoding the core tocochromanol pathway (VTE genes) and reactions for pathways supplying the isoprenoid tail and aromatic head group of tocochromanols. This analysis identified two additional genes, ZmHGGT1 (homogentisate geranylgeranyltransferase) and one prephenate dehydratase paralog (of four in the genome), that also modestly contribute to tocotrienol variation in the panel. Collectively, our results identify the most favorable ZmVTE4 haplotype and suggest three new gene targets for increasing vitamin E and antioxidant levels through marker-assisted selection.
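To make the idea of a marker-trait association concrete, the sketch below runs a naive single-marker scan: for each SNP, it regresses the trait on allele dosage (0, 1, 2) by ordinary least squares and reports the slope and R². This is an illustrative assumption, not the study's method; association panels like the one above are usually analysed with mixed models that correct for population structure and kinship, which this sketch omits. The SNP names and values are hypothetical.

```python
def single_marker_scan(genotypes, phenotype):
    """Naive single-SNP association scan by ordinary least squares.

    genotypes: dict mapping SNP id -> list of allele dosages (0, 1, 2),
               one value per inbred line
    phenotype: list of trait values in the same line order
    Returns dict SNP id -> (slope, r_squared). No covariates, kinship, or
    population-structure correction is applied.
    """
    n = len(phenotype)
    mean_y = sum(phenotype) / n
    syy = sum((y - mean_y) ** 2 for y in phenotype)
    results = {}
    for snp, dosages in genotypes.items():
        mean_x = sum(dosages) / n
        sxx = sum((x - mean_x) ** 2 for x in dosages)
        if sxx == 0 or syy == 0:
            continue  # monomorphic marker or constant trait: nothing to test
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotype))
        slope = sxy / sxx
        r_squared = sxy * sxy / (sxx * syy)
        results[snp] = (slope, r_squared)
    return results


# Hypothetical dosages for five lines and a hypothetical trait.
genotypes = {
    "snp1": [0, 1, 2, 0, 2],
    "snp2": [1, 1, 1, 1, 1],  # monomorphic, skipped
    "snp3": [0, 2, 0, 2, 1],
}
trait = [1.0, 2.0, 3.0, 1.0, 3.0]
for snp, (slope, r2) in single_marker_scan(genotypes, trait).items():
    print(snp, round(slope, 3), round(r2, 3))
```

A real scan would also convert each fit into a p-value and correct for the hundreds of thousands of markers tested.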


Subject(s)
Genome-Wide Association Study , Tocopherols/metabolism , Tocotrienols/metabolism , Zea mays/genetics , Alkyl and Aryl Transferases/genetics , Alkyl and Aryl Transferases/metabolism , Genotype , Haplotypes , Intramolecular Transferases/genetics , Intramolecular Transferases/metabolism , Linkage Disequilibrium , Methyltransferases/genetics , Methyltransferases/metabolism , Plant Proteins/genetics , Plant Proteins/metabolism , Polymorphism, Single Nucleotide , Zea mays/metabolism
10.
Plant J ; 71(3): 492-502, 2012 Aug.
Article in English | MEDLINE | ID: mdl-22443345

ABSTRACT

The Poaceae family, also known as the grasses, includes agronomically important cereal crops such as rice, maize, sorghum, and wheat. Previous comparative studies have shown that much of the gene content is shared among the grasses; however, functional conservation of orthologous genes has yet to be explored. To gain an understanding of the genome-wide patterns of evolution of gene expression across reproductive tissues, we employed a sequence-based approach to compare analogous transcriptomes in species representing three Poaceae subgroups including the Pooideae (Brachypodium distachyon), the Panicoideae (sorghum), and the Ehrhartoideae (rice). Our transcriptome analyses reveal that only a fraction of orthologous genes exhibit conserved expression patterns. A high proportion of conserved orthologs include genes that are upregulated in physiologically similar tissues such as leaves, anther, pistil, and embryo, while orthologs that are highly expressed in seeds show the most diverged expression patterns. More generally, we show that evolution of gene expression profiles and coding sequences in the grasses may be linked. Genes that are highly and broadly expressed tend to be conserved at the coding sequence level while genes with narrow expression patterns show accelerated rates of sequence evolution. We further show that orthologs in syntenic genomic blocks are more likely to share correlated expression patterns compared with non-syntenic orthologs. These findings are important for agricultural improvement because sequence information is transferred from model species, such as Brachypodium, rice, and sorghum to crop plants without sequenced genomes.


Subject(s)
Evolution, Molecular , Gene Expression/genetics , Genome, Plant/genetics , Poaceae/genetics , Synteny/genetics , Transcriptome/genetics , Brachypodium/genetics , Brachypodium/growth & development , Cluster Analysis , Flowers/genetics , Flowers/growth & development , Gene Expression Profiling , Genomics , Open Reading Frames/genetics , Oryza/genetics , Oryza/growth & development , Phylogeny , Plant Leaves/genetics , Plant Leaves/growth & development , Poaceae/growth & development , RNA, Plant/genetics , Seeds/genetics , Seeds/growth & development , Sequence Analysis, RNA , Sorghum/genetics , Sorghum/growth & development
11.
PLoS One ; 6(10): e26801, 2011.
Article in English | MEDLINE | ID: mdl-22046362

ABSTRACT

Advances in molecular breeding in potato have been limited by its complex biological system, which includes vegetative propagation, autotetraploidy, and extreme heterozygosity. The availability of the potato genome and accompanying gene complement with corresponding gene structure, location, and functional annotation are powerful resources for understanding this complex plant and advancing molecular breeding efforts. Here, we report a reference for the potato transcriptome using 32 tissues and growth conditions from the doubled monoploid Solanum tuberosum Group Phureja clone DM1-3 516R44 for which a genome sequence is available. Analysis of greater than 550 million RNA-Seq reads permitted the detection and quantification of expression levels of over 22,000 genes. Hierarchical clustering and principal component analyses captured the biological variability underlying gene expression differences among tissues, indicating tissue-specific gene expression and identifying genes with tissue- or condition-restricted expression. Using gene co-expression network analysis, we identified 18 gene modules that represent tissue-specific transcriptional networks of major potato organs and developmental stages. This information provides a powerful resource for potato research as well as studies on other members of the Solanaceae family.


Subject(s)
Genome, Plant/genetics , Solanum tuberosum/genetics , Solanum tuberosum/standards , Transcriptome/genetics , Clone Cells , Gene Expression , Genes, Plant , Organ Specificity , Reference Standards
12.
Plant J ; 66(4): 553-63, 2011 May.
Article in English | MEDLINE | ID: mdl-21299659

ABSTRACT

Maize is an important model species and a major constituent of human and animal diets. It has also emerged as a potential feedstock and model system for bioenergy research due to recent worldwide interest in developing plant biomass-based, carbon-neutral liquid fuels. To understand how the underlying genome sequence results in specific plant phenotypes, information on the temporal and spatial transcription patterns of genes is crucial. Here we present a comprehensive atlas of global transcription profiles across developmental stages and plant organs. We used a NimbleGen microarray containing 80,301 probe sets to profile transcription patterns in 60 distinct tissues representing 11 major organ systems of inbred line B73. Of the 30,892 probe sets representing the filtered B73 gene models, 91.4% were expressed in at least one tissue. Interestingly, 44.5% of the probe sets were expressed in all tissues, indicating a substantial overlap of gene expression among plant organs. Clustering of maize tissues based on global gene expression profiles resulted in formation of groups of biologically related tissues. We utilized this dataset to examine the expression of genes that encode enzymes in the lignin biosynthetic pathway, and found that expansion of distinct gene families was accompanied by divergent, tissue-specific transcription patterns of the paralogs. This comprehensive expression atlas represents a valuable resource for gene discovery and functional characterization in maize.


Subject(s)
Gene Expression Profiling , Lignin/biosynthesis , Plant Roots/genetics , Zea mays/genetics , Biosynthetic Pathways , Chromosome Mapping , DNA, Complementary , Gene Expression Regulation, Developmental , Gene Expression Regulation, Plant , Genes, Plant , Germination , Lignin/genetics , Plant Leaves/genetics , Plant Leaves/metabolism , Plant Roots/metabolism , Principal Component Analysis , Seedlings/genetics , Seedlings/metabolism , Seeds/growth & development , Seeds/metabolism , Zea mays/growth & development , Zea mays/metabolism
13.
Genome Biol ; 11(7): R73, 2010.
Article in English | MEDLINE | ID: mdl-20626842

ABSTRACT

BACKGROUND: Pythium ultimum is a ubiquitous oomycete plant pathogen responsible for a variety of diseases on a broad range of crop and ornamental species. RESULTS: The P. ultimum genome (42.8 Mb) encodes 15,290 genes and has extensive sequence similarity and synteny with related Phytophthora species, including the potato blight pathogen Phytophthora infestans. Whole transcriptome sequencing revealed expression of 86% of genes, with detectable differential expression of suites of genes under abiotic stress and in the presence of a host. The predicted proteome includes a large repertoire of proteins involved in plant pathogen interactions, although, surprisingly, the P. ultimum genome does not encode any classical RXLR effectors and relatively few Crinkler genes in comparison to related phytopathogenic oomycetes. A lower number of enzymes involved in carbohydrate metabolism were present compared to Phytophthora species, with the notable absence of cutinases, suggesting a significant difference in virulence mechanisms between P. ultimum and more host-specific oomycete species. Although we observed a high degree of orthology with Phytophthora genomes, there were novel features of the P. ultimum proteome, including an expansion of genes involved in proteolysis and genes unique to Pythium. We identified a small gene family of cadherins, proteins involved in cell adhesion, the first report of these in a genome outside the metazoans. CONCLUSIONS: Access to the P. ultimum genome has revealed not only core pathogenic mechanisms within the oomycetes but also lineage-specific genes associated with the alternative virulence and lifestyles found within the pythiaceous lineages compared to the Peronosporaceae.


Subject(s)
Genome/genetics , Plants/microbiology , Proteins/genetics , Pythium/genetics , Pythium/pathogenicity , Antifungal Agents/pharmacology , Base Sequence , Cadherins/genetics , Carbohydrate Metabolism/drug effects , Carbohydrate Metabolism/genetics , Gene Order/genetics , Gene Rearrangement/genetics , Genome, Mitochondrial/genetics , Genomics , Host-Pathogen Interactions/drug effects , Host-Pathogen Interactions/genetics , Humans , Multigene Family/genetics , Phylogeny , Proteins/metabolism , Pythium/drug effects , Pythium/growth & development , Repetitive Sequences, Nucleic Acid/genetics , Sequence Alignment , Sequence Analysis, DNA , Synteny/genetics
14.
BMC Evol Biol ; 10: 41, 2010 Feb 12.
Article in English | MEDLINE | ID: mdl-20152032

ABSTRACT

BACKGROUND: The availability of genome and transcriptome sequences for a number of species permits the identification and characterization of conserved as well as divergent genes such as lineage-specific genes which have no detectable sequence similarity to genes from other lineages. While genes conserved among taxa provide insight into the core processes among species, lineage-specific genes provide insights into evolutionary processes and biological functions that are likely clade or species specific. RESULTS: Comparative analyses using the Arabidopsis thaliana genome and sequences from 178 other species within the Plant Kingdom enabled the identification of 24,624 A. thaliana genes (91.7%) that were termed Evolutionary Conserved (EC) as defined by sequence similarity to a database entry as well as two sets of lineage-specific genes within A. thaliana. One of the A. thaliana lineage-specific gene sets shares sequence similarity only with sequences from species within the Brassicaceae family and is termed Conserved Brassicaceae-Specific Genes (914, 3.4%, CBSG). The other set of A. thaliana lineage-specific genes, the Arabidopsis Lineage-Specific Genes (1,324, 4.9%, ALSG), lack sequence similarity to any sequence outside A. thaliana. While many CBSGs (76.7%) and ALSGs (52.9%) are transcribed, the majority of the CBSGs (76.1%) and ALSGs (94.4%) have no annotated function. Co-expression analysis indicated significant enrichment of the CBSGs and ALSGs in multiple functional categories suggesting their involvement in a wide range of biological functions. Subcellular localization prediction revealed that the CBSGs were significantly enriched in proteins targeted to the secretory pathway (412, 45.1%).
Among the 107 putatively secreted CBSGs with known functions, 67 encode a putative pollen coat protein or cysteine-rich protein with sequence similarity to the S-locus cysteine-rich protein that is the pollen determinant controlling allele-specific pollen rejection in self-incompatible Brassicaceae species. Overall, the ALSGs and CBSGs were more highly methylated in floral tissue compared to the ECs. Single Nucleotide Polymorphism (SNP) analysis showed an elevated ratio of non-synonymous to synonymous SNPs within the ALSGs (1.99) and CBSGs (1.65) relative to the EC set (0.92), mainly caused by an elevated number of non-synonymous SNPs, indicating that they are fast-evolving at the protein sequence level. CONCLUSIONS: Our analyses suggest that while a significant fraction of the A. thaliana proteome is conserved within the Plant Kingdom, evolutionarily distinct sets of genes that may function in defining biological processes unique to these lineages have arisen within the Brassicaceae and A. thaliana.
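The abstract gives counts and percentages for the three gene sets but not the denominator. Assuming the EC, CBSG, and ALSG sets are disjoint and together cover every gene analysed (consistent with 91.7% + 3.4% + 4.9% = 100%), a quick check reproduces the reported shares, along with the 45.1% secreted-CBSG figure:

```python
counts = {"EC": 24_624, "CBSG": 914, "ALSG": 1_324}
# Assumption: the three sets are disjoint and cover every gene analysed.
total = sum(counts.values())
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}
print(total, shares)

# Share of CBSGs predicted to be targeted to the secretory pathway.
secreted_cbsg_share = round(100 * 412 / counts["CBSG"], 1)
print(secreted_cbsg_share)
```

The implied total of 26,862 genes is derived here from the stated counts, not reported in the abstract.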


Subject(s)
Arabidopsis Proteins/analysis , Arabidopsis/genetics , Brassicaceae/genetics , Arabidopsis Proteins/genetics , DNA Methylation , Polymorphism, Single Nucleotide
15.
BMC Plant Biol ; 8: 18, 2008 Feb 19.
Article in English | MEDLINE | ID: mdl-18284697

ABSTRACT

BACKGROUND: High gene numbers in plant genomes reflect polyploidy and major gene duplication events. Oryza sativa, cultivated rice, is a diploid monocotyledonous species with a ~390 Mb genome that has undergone segmental duplication of a substantial portion of its genome. This, coupled with other genetic events such as tandem duplications, has resulted in a substantial number of its genes, and resulting proteins, occurring in paralogous families. RESULTS: Using a computational pipeline that utilizes Pfam and novel protein domains, we characterized paralogous families in rice and compared these with paralogous families in the model dicotyledonous diploid species, Arabidopsis thaliana. Arabidopsis, which has undergone genome duplication as well, has a substantially smaller genome (~120 Mb) and gene complement compared to rice. Overall, 53% and 68% of the non-transposable element-related rice and Arabidopsis proteins could be classified into paralogous protein families, respectively. Singleton and paralogous family genes differed substantially in their likelihood of encoding a protein of known or putative function; 26% and 66% of singleton genes compared to 73% and 96% of the paralogous family genes encode a known or putative protein in rice and Arabidopsis, respectively. Furthermore, a major skew in the distribution of specific gene function was observed; a total of 17 Gene Ontology categories in both rice and Arabidopsis were statistically significant in their differential distribution between paralogous family and singleton proteins. In contrast to mammalian organisms, we found that duplicated genes in rice and Arabidopsis tend to have more alternative splice forms. Using data from Massively Parallel Signature Sequencing, we show that a significant portion of the duplicated genes in rice show divergent expression although a correlation between sequence divergence and correlation of expression could be seen in very young genes. 
CONCLUSION: Collectively, these data suggest that while co-regulation and conserved function are present in some paralogous protein family members, evolutionary pressures have resulted in functional divergence with differential expression patterns.


Subject(s)
Genes, Plant/genetics , Multigene Family/genetics , Oryza/genetics , Plant Proteins/genetics , Arabidopsis/genetics , Expressed Sequence Tags , Gene Duplication , Phylogeny , Protein Isoforms
16.
Plant Physiol ; 145(4): 1311-22, 2007 Dec.
Article in English | MEDLINE | ID: mdl-17951464

ABSTRACT

Using the rice (Oryza sativa) sp. japonica genome annotation, along with genomic sequence and clustered transcript assemblies from 184 species in the plant kingdom, we have identified a set of 861 rice genes that are evolutionarily conserved among six diverse species within the Poaceae yet lack significant sequence similarity with plant species outside the Poaceae. This set of evolutionarily conserved and lineage-specific rice genes is termed conserved Poaceae-specific genes (CPSGs) to reflect the presence of significant sequence similarity across three separate Poaceae subfamilies. The vast majority of rice CPSGs (86.6%) encode proteins with no putative function or functionally characterized protein domain. For the remaining CPSGs, 8.8% encode an F-box domain-containing protein and 4.5% encode a protein with a putative function. On average, the CPSGs have fewer exons, shorter total gene length, and elevated GC content when compared with genes annotated as either transposable elements (TEs) or those genes having significant sequence similarity in a species outside the Poaceae. Multiple sequence alignments of the CPSGs with sequences from other Poaceae species show conservation across a putative domain, a novel domain, or the entire coding length of the protein. At the genome level, syntenic alignments between sorghum (Sorghum bicolor) and 103 of the 861 rice CPSGs (12.0%) could be made, demonstrating an additional level of conservation for this set of genes within the Poaceae. The extensive sequence similarity in evolutionarily distinct species within the Poaceae family, together with an additional screen for TE-related structural characteristics and sequence, argues against these CPSGs being misannotated TEs. Collectively, these data confirm that we have identified a specific set of genes that are highly conserved within, as well as specific to, the Poaceae.


Subject(s)
Conserved Sequence , Genes, Plant , Poaceae/genetics , Synteny , Amino Acid Sequence , Base Sequence , Molecular Sequence Data , Oryza/genetics , Sorghum/genetics
17.
Nucleic Acids Res ; 35(Database issue): D883-7, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17145706

ABSTRACT

In The Institute for Genomic Research Rice Genome Annotation project (http://rice.tigr.org), we have continued to update the rice genome sequence with new data and improve the quality of the annotation. In our current release of annotation (Release 4.0; January 12, 2006), we have identified 42,653 non-transposable element-related genes encoding 49,472 gene models as a result of the detection of alternative splicing. We have refined our identification methods for transposable element-related genes resulting in 13,237 genes that are related to transposable elements. Through incorporation of multiple transcript and proteomic expression data sets, we have been able to annotate 24,799 genes (31,739 gene models), representing approximately 50% of the total gene models, as expressed in the rice genome. All structural and functional annotation is viewable through our Rice Genome Browser which currently supports 59 tracks. Enhanced data access is available through web interfaces, FTP downloads and a Data Extractor tool developed in order to support discrete dataset downloads.


Subject(s)
Databases, Genetic , Genome, Plant , Oryza/genetics , DNA Transposable Elements , DNA, Complementary/chemistry , Expressed Sequence Tags/chemistry , Gene Expression , Internet , Oryza/metabolism , Proteomics , User-Computer Interface
18.
Genome Biol ; 7(5): R41, 2006.
Article in English | MEDLINE | ID: mdl-16719932

ABSTRACT

BACKGROUND: Introns are under less selection pressure than exons, and consequently, intronic sequences have a higher rate of gain and loss than exons. In a number of plant species, a large portion of the genome has been segmentally duplicated, giving rise to a large set of duplicated genes. The recent completion of the rice genome in which segmental duplication has been documented has allowed us to investigate intron evolution within rice, a diploid monocotyledonous species. RESULTS: Analysis of segmental duplication in rice revealed that 159 Mb of the 371 Mb genome and 21,570 of the 43,719 non-transposable element-related genes were contained within a duplicated region. In these duplicated regions, 3,101 collinear paired genes were present. Using this set of segmentally duplicated genes, we investigated intron evolution from full-length cDNA-supported non-transposable element-related gene models of rice. Using gene pairs that have an ortholog in the dicotyledonous model species Arabidopsis thaliana, we identified more intron loss (49 introns within 35 gene pairs) than intron gain (5 introns within 5 gene pairs) following segmental duplication. We were unable to demonstrate preferential intron loss at the 3' end of genes as previously reported in mammalian genomes. However, we did find that the four nucleotides of exons that flank lost introns had less frequently used 4-mers. CONCLUSION: We observed that intron evolution within rice following segmental duplication is largely dominated by intron loss. In two of the five cases of intron gain within segmentally duplicated genes, the gained sequences were similar to transposable elements.


Subject(s)
Evolution, Molecular , Gene Duplication , Genes, Plant , Introns , Oryza/genetics , Amino Acid Sequence , Conserved Sequence , Exons , Genomics , Molecular Sequence Data , RNA Splice Sites
19.
Plant Physiol ; 138(1): 18-26, 2005 May.
Article in English | MEDLINE | ID: mdl-15888674

ABSTRACT

We have developed a rice (Oryza sativa) genome annotation database (Osa1) that provides structural and functional annotation for this emerging model species. Using the sequence of O. sativa subsp. japonica cv Nipponbare from the International Rice Genome Sequencing Project, pseudomolecules, or virtual contigs, of the 12 rice chromosomes were constructed. Our most recent release, version 3, represents our third build of the pseudomolecules and is composed of 98% finished sequence. Genes were identified using a series of computational methods developed for Arabidopsis (Arabidopsis thaliana) that were modified for use with the rice genome. In release 3 of our annotation, we identified 57,915 genes, of which 14,196 are related to transposable elements. Of the remaining 43,719 non-transposable element-related genes, 18,545 (42.4%) were annotated with a putative function, 5,777 (13.2%) were annotated as encoding an expressed protein with no known function, and the remaining 19,397 (44.4%) were annotated as encoding a hypothetical protein. Multiple splice forms (5,873) were detected for 2,538 genes, resulting in a total of 61,250 gene models in the rice genome. We incorporated experimental evidence into 18,252 gene models to improve the quality of the structural annotation. A series of functional data types has been annotated for the rice genome that includes alignment with genetic markers, assignment of gene ontologies, identification of flanking sequence tags, alignment with homologs from related species, and syntenic mapping with other cereal species. All structural and functional annotation data are available through interactive search and display windows as well as through download of flat files. To integrate the data with other genome projects, the annotation data are available through a Distributed Annotation System and a Genome Browser. All data can be obtained through the project Web pages at http://rice.tigr.org.


Subject(s)
Databases, Genetic , Genes, Plant , Genome, Plant , Oryza/genetics , Computational Biology