1.
Sci Rep ; 8(1): 8529, 2018 Jun 04.
Article in English | MEDLINE | ID: mdl-29867103

ABSTRACT

Many Cactaceae species exhibit determinate growth of the primary root as a consequence of root apical meristem (RAM) exhaustion. The genetic regulation of this growth pattern is unknown. Here, we de novo assembled and annotated the root apex transcriptome of the Pachycereus pringlei primary root at three developmental stages, with active or exhausted RAM. The assembled transcriptome is robust and comprehensive, and was used to infer a transcriptional regulatory network of the primary root apex. Putative orthologues of Arabidopsis regulators of RAM maintenance, as well as putative lineage-specific transcripts were identified. The transcriptome revealed putative orthologues of most proteins involved in housekeeping processes, hormone signalling, and metabolic pathways. Our results suggest that specific transcriptional programs operate in the root apex at specific developmental time points. Moreover, the transcriptional state of the P. pringlei root apex as the RAM becomes exhausted is comparable to the transcriptional state of cells from the meristematic, elongation, and differentiation zones of Arabidopsis roots along the root axis. We suggest that the transcriptional program underlying the drought stress response is induced during Cactaceae root development, and that lineage-specific transcripts could contribute to RAM exhaustion in Cactaceae.


Subject(s)
Cactaceae/growth & development , Gene Expression Profiling , Gene Expression Regulation, Plant/physiology , Meristem/growth & development , Signal Transduction/physiology , Arabidopsis/growth & development
2.
Plant Genome ; 8(2): eplantgenome2014.10.0068, 2015 Jul.
Article in English | MEDLINE | ID: mdl-33228299

ABSTRACT

Upland cotton (Gossypium hirsutum L.) has a narrow germplasm base, which constrains marker development and hampers intraspecific breeding. A pressing need exists for high-throughput single nucleotide polymorphism (SNP) markers that can be readily applied to germplasm in breeding and breeding-related research programs. Despite progress made in developing new sequencing technologies during the past decade, the cost of sequencing remains substantial when one is dealing with numerous samples and large genomes. Several strategies have been proposed to lower the cost of sequencing for multiple genotypes of large-genome species like cotton, such as transcriptome sequencing and reduced-representation DNA sequencing. This paper reports the development of a transcriptome assembly of the inbred line Texas Marker-1 (TM-1), a genetic standard for cotton, its usefulness as a reference for RNA sequencing (RNA-seq)-based SNP identification, and the availability of transcriptome sequences of four other cotton cultivars. An assembly of TM-1 was made using Roche 454 transcriptome reads combined with an assembly of all available public expressed sequence tag (EST) sequences of TM-1. The TM-1 assembly consists of 72,450 contigs with a total of 70 million bp. Functional predictions of the transcripts were estimated by alignment to selected protein databases. Transcriptome sequences of the five lines, including TM-1, were obtained using an Illumina Genome Analyzer-II, and the short reads were mapped to the TM-1 assembly to discover SNPs among the five lines. We identified >14,000 unfiltered allelic SNPs, of which ∼3,700 SNPs were retained for assay development after applying several rigorous filters. This paper reports availability of the reference transcriptome assembly and shows its utility in developing intraspecific SNP markers in upland cotton.

3.
BMC Genomics ; 15: 945, 2014 Oct 30.
Article in English | MEDLINE | ID: mdl-25359292

ABSTRACT

BACKGROUND: Cotton (Gossypium spp.) is the largest producer of natural fibers for textiles and is an important crop worldwide. Crop production consists primarily of G. hirsutum L., an allotetraploid. However, elite cultivars express very small amounts of variation due to the species' monophyletic origin, domestication and further bottlenecks due to selection. Conversely, wild cotton species harbor extensive genetic diversity of prospective utility to improve many beneficial agronomic traits, fiber characteristics, and resistance to disease and drought. Introgression of traits from wild species can provide a natural way to incorporate advantageous traits through breeding to generate higher-producing cotton cultivars and more sustainable production systems. Interspecific introgression efforts by conventional methods are very time-consuming and costly, but can be expedited using marker-assisted selection. RESULTS: Using transcriptome sequencing we have developed the first gene-associated single nucleotide polymorphism (SNP) markers for wild cotton species G. tomentosum, G. mustelinum, G. armourianum and G. longicalyx. Markers were also developed for a secondary cultivated species G. barbadense cv. 3-79. A total of 62,832 non-redundant SNP markers were developed from the five wild species which can be utilized for interspecific germplasm introgression into cultivated G. hirsutum and are directly associated with genes. Over 500 of the G. barbadense markers have been validated by whole-genome radiation hybrid mapping. Overall 1,060 SNPs from the five different species have been screened and shown to produce acceptable genotyping assays. CONCLUSIONS: This large set of 62,832 SNPs relative to cultivated G. hirsutum will allow for the first high-density mapping of genes from five wild species that affect traits of interest, including beneficial agronomic and fiber characteristics. Upon mapping, the markers can be utilized for marker-assisted introgression of new germplasm into cultivated cotton and in subsequent breeding of agronomically adapted types, including cultivar development.


Subject(s)
Breeding , Chromosome Mapping , Genes, Plant , Gossypium/genetics , Polymorphism, Single Nucleotide , Chromosomes, Plant , Computational Biology , Crosses, Genetic , Genetic Markers , Genome, Plant , Genotyping Techniques , Reproducibility of Results , Sequence Deletion , Transcriptome
4.
RNA ; 20(12): 1987-99, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25344399

ABSTRACT

The experimental induction of RNA silencing in plants often involves expression of transgenes encoding inverted repeat (IR) sequences to produce abundant dsRNAs that are processed into small RNAs (sRNAs). These sRNAs are key mediators of post-transcriptional gene silencing (PTGS) and determine its specificity. Despite its application in agriculture and broad utility in plant research, the mechanism of IR-PTGS is incompletely understood. We generated four sets of 60 Arabidopsis plants, each containing IR transgenes expressing different configurations of uidA and CHALCONE SYNTHASE (At-CHS) gene fragments. Levels of PTGS were found to depend on the orientation and position of the fragment in the IR construct. Deep sequencing and mapping of sRNAs to corresponding transgene-derived and endogenous transcripts identified distinctive patterns of differential sRNA accumulation that revealed similarities among sRNAs associated with IR-PTGS and endogenous sRNAs linked to uncapped mRNA decay. Detailed analyses of poly-A cleavage products from At-CHS mRNA confirmed this hypothesis. We also found unexpected associations between sRNA accumulation and the presence of predicted open reading frames in the trigger sequence. In addition, strong IR-PTGS affected the prevalence of endogenous sRNAs, which has implications for the use of PTGS for experimental or applied purposes.


Subject(s)
Gene Silencing , RNA Interference , RNA, Messenger/genetics , RNA, Small Interfering/genetics , Acyltransferases/genetics , Arabidopsis/genetics , Gene Expression Regulation, Plant , High-Throughput Nucleotide Sequencing , Inverted Repeat Sequences/genetics , Plants, Genetically Modified/genetics , RNA Stability/genetics , RNA, Double-Stranded/genetics , Signal Transduction
5.
Ann Bot ; 112(2): 239-52, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23666887

ABSTRACT

BACKGROUND AND AIMS: Species of Cactaceae are well adapted to arid habitats. Determinate growth of the primary root, which involves early and complete root apical meristem (RAM) exhaustion and differentiation of cells at the root tip, has been reported for some Cactoideae species as a root adaptation to aridity. In this study, the primary root growth patterns of Cactaceae taxa from diverse habitats are classified as being determinate or indeterminate, and the molecular mechanisms underlying RAM maintenance in Cactaceae are explored. Genes that were induced in the primary root of Stenocereus gummosus before RAM exhaustion are identified. METHODS: Primary root growth was analysed in Cactaceae seedlings cultivated in vertically oriented Petri dishes. Differentially expressed transcripts were identified after reverse northern blots of clones from a suppression subtractive hybridization cDNA library. KEY RESULTS: All species analysed from six tribes of the Cactoideae subfamily that inhabit arid and semi-arid regions exhibited determinate primary root growth. However, species from the Hylocereeae tribe, which inhabit mesic regions, exhibited mostly indeterminate primary root growth. Preliminary results suggest that seedlings of members of the Opuntioideae subfamily have mostly determinate primary root growth, whereas those of the Maihuenioideae and Pereskioideae subfamilies have mostly indeterminate primary root growth. Seven selected transcripts encoding homologues of heat stress transcription factor B4, histone deacetylase, fibrillarin, phosphoethanolamine methyltransferase, cytochrome P450 and gibberellin-regulated protein were upregulated in S. gummosus root tips during the initial growth phase. CONCLUSIONS: Primary root growth in Cactoideae species matches their environment. The data imply that determinate growth of the primary root became fixed after separation of the Cactoideae/Opuntioideae and Maihuenioideae/Pereskioideae lineages, and that the genetic regulation of RAM maintenance and its loss in Cactaceae is orchestrated by genes involved in the regulation of gene expression, signalling, and redox and hormonal responses.


Subject(s)
Adaptation, Physiological , Biological Evolution , Cactaceae/physiology , Plant Roots/physiology , Cactaceae/cytology , Cactaceae/genetics , Cactaceae/growth & development , Cell Differentiation , DNA, Complementary/genetics , Ecosystem , Gene Expression Regulation, Plant , Gene Library , Meristem/cytology , Meristem/genetics , Meristem/growth & development , Meristem/physiology , Oxidation-Reduction , Phenotype , Phylogeny , Plant Growth Regulators , Plant Proteins/genetics , Plant Roots/cytology , Plant Roots/genetics , Plant Roots/growth & development , RNA, Plant/genetics , Seedlings/cytology , Seedlings/genetics , Seedlings/growth & development , Seedlings/physiology , Signal Transduction , Stress, Physiological
6.
PLoS One ; 8(2): e55913, 2013.
Article in English | MEDLINE | ID: mdl-23409088

ABSTRACT

Several applications of high throughput genome and transcriptome sequencing would benefit from a reduction of the high-copy-number sequences in the libraries being sequenced and analyzed, particularly when applied to species with large genomes. We adapted and analyzed the consequences of a method that utilizes a thermostable duplex-specific nuclease for reducing the high-copy components in transcriptomic and genomic libraries prior to sequencing. This reduces the time, cost, and computational effort of obtaining informative transcriptomic and genomic sequence data for both fully sequenced and non-sequenced genomes. It also reduces contamination from organellar DNA in preparations of nuclear DNA. Hybridization in the presence of 3 M tetramethylammonium chloride (TMAC), which equalizes the rates of hybridization of GC and AT nucleotide pairs, reduced the bias against sequences with high GC content. Consequences of this method on the reduction of high-copy and enrichment of low-copy sequences are reported for Arabidopsis and lettuce.


Subject(s)
Gene Library , Genome, Plant , High-Throughput Nucleotide Sequencing , Arabidopsis/drug effects , Arabidopsis/genetics , Base Composition , Computational Biology/methods , Deoxyribonucleases , Gene Expression Profiling , Genes, Chloroplast , High-Throughput Nucleotide Sequencing/methods , Lactuca/drug effects , Lactuca/genetics , Open Reading Frames , Quaternary Ammonium Compounds/pharmacology , Repetitive Sequences, Nucleic Acid , Transcriptome
7.
Cytometry A ; 81(7): 627-34, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22674817

ABSTRACT

Here we report a new variant of AmCyan fluorescent protein that has been specifically designed for multicolor cell analysis. AmCyan is one of the existing violet fluorochromes for use in flow cytometers equipped with a violet (405 nm) laser. It is also widely used as a label in fluorescence spectroscopy. Limitations on its use are due to the significant AmCyan fluorescence spillover into the FITC detector, due to excitation of AmCyan by the blue (488 nm) laser. In order to resolve this problem, we modified the excitation profile of AmCyan. The new fluorescent protein that we developed, AmCyan100, has an emission profile similar to AmCyan with an emission maximum at 500 nm, but its excitation maximum is shifted to 395 nm, which coincides more closely with the violet laser line and decreases the excitation with the blue laser, thus reducing the spillover observed with the original AmCyan. Moreover, this new protein has a Stokes shift of more than 100 nm compared to the Stokes shift of 31 nm in its precursor. Our data also suggest that AmCyan100-mAb conjugates have brightness similar to AmCyan-mAb conjugates. In summary, AmCyan100 conjugates have minimal spillover into the FITC detector, and can potentially replace existing AmCyan conjugates in multicolor flow cytometry without any changes in instrumental setup and existing reagent panel design.


Subject(s)
Fluorescent Dyes/chemistry , Green Fluorescent Proteins/chemistry , Amino Acid Substitution , CD4-Positive T-Lymphocytes/cytology , CD4-Positive T-Lymphocytes/metabolism , CD8-Positive T-Lymphocytes/cytology , CD8-Positive T-Lymphocytes/metabolism , Cloning, Molecular , Cytokines/metabolism , Escherichia coli , Fixatives/chemistry , Flow Cytometry , Fluorescence , Fluorescent Antibody Technique, Direct , Formaldehyde/chemistry , Green Fluorescent Proteins/genetics , Humans , Mutagenesis , Polymers/chemistry , Protein Engineering , Tissue Fixation
8.
Am J Bot ; 99(2): 209-18, 2012 Feb.
Article in English | MEDLINE | ID: mdl-22058181

ABSTRACT

PREMISE OF STUDY: Weeds cause considerable environmental and economic damage. However, genomic characterization of weeds has lagged behind that of model plants and crop species. Here we describe the development of genomic tools and resources for 11 weeds from the Compositae family that will serve as a basis for subsequent population and comparative genomic analyses. Because hybridization has been suggested as a stimulus for the evolution of invasiveness, we also analyze these genomic data for evidence of hybridization. METHODS: We generated 22 expressed sequence tag (EST) libraries for the 11 targeted weeds using Sanger, 454, and Illumina sequencing, compared the coverage and quality of sequence assemblies, and developed NimbleGen microarrays for expression analyses in five taxa. When possible, we also compared the distributions of Ks values between orthologs of congeneric taxa to detect and quantify hybridization and introgression. RESULTS: Gene discovery was enhanced by sequencing from multiple tissues, normalization of cDNA libraries, and especially greater sequencing depth. However, assemblies from short sequence reads sometimes failed to resolve close paralogs. Substantial introgression was detected in Centaurea and Helianthus, but not in Ambrosia and Lactuca. CONCLUSIONS: Transcriptome sequencing using next-generation platforms has greatly reduced the cost of genomic studies of nonmodel organisms, and the ESTs and microarrays reported here will accelerate evolutionary and molecular investigations of Compositae weeds. Our study also shows how ortholog comparisons can be used to approximately estimate the genome-wide extent of introgression and to identify genes that have been exchanged between hybridizing taxa.


Subject(s)
Asteraceae/genetics , Expressed Sequence Tags , Genomics/methods , Hybridization, Genetic , DNA, Complementary/genetics , Databases, Genetic , Evolution, Molecular , Gene Expression Profiling , Gene Library , Genetic Variation , Oligonucleotide Array Sequence Analysis , RNA, Plant/genetics
9.
BMC Genomics ; 12: 389, 2011 Aug 02.
Article in English | MEDLINE | ID: mdl-21810238

ABSTRACT

BACKGROUND: Among next generation sequence technologies, platforms such as Illumina and SOLiD produce short reads but with higher coverage and lower cost per sequenced nucleotide than 454 or Sanger. A challenge now is to develop efficient strategies to use short-read length platforms for de novo assembly and marker development. The scope of this study was to develop a de novo assembly of carrot ESTs from multiple genotypes using the Illumina platform, and to identify polymorphisms. RESULTS: A de novo assembly of transcriptome sequence from four genetic backgrounds produced 58,751 contigs and singletons. Over 50% of these assembled sequences were annotated allowing detection of transposable elements and new carrot anthocyanin genes. Presence of multiple genetic backgrounds in our assembly allowed the identification of 114 computationally polymorphic SSRs, and 20,058 SNPs at a depth of coverage of 20× or more. Polymorphisms were predominantly between inbred lines except for the cultivated × wild RIL pool which had high intra-sample polymorphism. About 90% and 88% of tested SSR and SNP primers amplified a product, of which 70% and 46%, respectively, were of the expected size. Out of verified SSR and SNP markers 84% and 82% were polymorphic. About 25% of SNPs genotyped were polymorphic in two diverse mapping populations. CONCLUSIONS: This study confirmed the potential of short read platforms for de novo EST assembly and identification of genetic polymorphisms in carrot. In addition we produced the first large-scale transcriptome of carrot, a species lacking genomic resources.


Subject(s)
Daucus carota/genetics , Expressed Sequence Tags , Genetic Variation , Transcriptome , Contig Mapping , DNA, Plant/genetics , Gene Expression Profiling/methods , Genes, Plant , Genetic Markers , Genotype , Introns , Molecular Sequence Annotation , Polymorphism, Single Nucleotide , Sequence Analysis, DNA/methods
10.
BMC Genomics ; 11: 408, 2010 Jun 29.
Article in English | MEDLINE | ID: mdl-20584339

ABSTRACT

BACKGROUND: More than 80% of the wheat genome is composed of transposable elements (TEs). Since active TEs can move to different locations and potentially impose a significant mutational load, their expression is suppressed in the genome via small non-coding RNAs (sRNAs). sRNAs guide silencing of TEs at the transcriptional (mainly 24-nt sRNAs) and post-transcriptional (mainly 21-nt sRNAs) levels. In this study, we report the distribution of these two types of sRNAs among the different classes of wheat TEs, the regions targeted within the TEs, and their impact on the methylation patterns of the targeted regions. RESULTS: We constructed an sRNA library from hexaploid wheat and developed a database that included our library and three other publicly available sRNA libraries from wheat. For five completely sequenced wheat BAC contigs, most perfectly matching sRNAs represented TE sequences, suggesting that a large fraction of the wheat sRNAs originated from TEs. An analysis of all wheat TEs present in the Triticeae Repeat Sequence database showed that sRNA abundance was correlated with the estimated number of TEs within each class. Most of the sRNAs perfectly matching miniature inverted repeat transposable elements (MITEs) belonged to the 21-nt class and were mainly targeted to the terminal inverted repeats (TIRs). In contrast, most of the sRNAs matching class I and class II TEs belonged to the 24-nt class and were mainly targeted to the long terminal repeats (LTRs) in the class I TEs and to the terminal repeats in CACTA transposons. An analysis of the mutation frequency in potentially methylated sites revealed a three-fold increase in TE mutation frequency relative to intron and untranslated genic regions. This increase is consistent with wheat TEs being preferentially methylated, likely by sRNA targeting. CONCLUSIONS: Our study examines the wheat epigenome in relation to known TEs. sRNA-directed transcriptional and post-transcriptional silencing plays important roles in the short-term suppression of TEs in the wheat genome, whereas DNA methylation and increased mutation rates may provide a long-term mechanism to inactivate TEs.


Subject(s)
DNA Methylation , DNA Transposable Elements/genetics , RNA, Untranslated/genetics , Triticum/genetics , Chromosomes, Artificial, Bacterial/genetics , DNA, Intergenic/genetics , Databases, Nucleic Acid , Genome, Plant/genetics , INDEL Mutation , Kinetics , Repetitive Sequences, Nucleic Acid/genetics
11.
Theor Appl Genet ; 120(1): 85-91, 2009 Dec.
Article in English | MEDLINE | ID: mdl-19820913

ABSTRACT

Due to their highly polymorphic and codominant nature, simple-sequence repeat (SSR) markers are a common choice for assaying genetic diversity and genetic mapping. In this paper, we describe the generation of an expressed-sequence tag (EST) collection for the oilseed crop safflower and the subsequent development of EST-SSR markers for the genetic analysis of safflower and related species. We assembled 40,874 reads into 19,395 unigenes, of which 4,416 (22.8%) contained at least one SSR. Primer pairs were developed and tested for 384 of these loci, resulting in a collection of 104 polymorphic markers that amplify reliably across 27 accessions (3 species) of the genus Carthamus. These markers exhibited a high level of polymorphism, with an average of 6.0 ± 0.4 alleles per locus and an average gene diversity of 0.54 ± 0.03 across Carthamus species. In terms of cross-taxon transferability, 50% of these primer pairs produced an amplicon in at least one other species in the Asteraceae, and 28% produced an amplicon in at least one species outside the safflower subfamily (i.e., lettuce, sunflower, and/or Gerbera). These markers represent a valuable resource for the genetic analysis of safflower and related species, and also have the potential to facilitate comparative map-based analyses across a broader array of taxa within the Asteraceae.


Subject(s)
Carthamus tinctorius/classification , Carthamus tinctorius/genetics , Expressed Sequence Tags , Polymorphism, Genetic , Gene Library , Molecular Sequence Data , Phylogeny
12.
Mol Biol Evol ; 25(11): 2445-55, 2008 Nov.
Article in English | MEDLINE | ID: mdl-18728074

ABSTRACT

Of the approximately 250,000 species of flowering plants, nearly one in ten are members of the Compositae (Asteraceae), a diverse family found in almost every habitat on all continents except Antarctica. With an origin in the mid Eocene, the Compositae is also a relatively young family with remarkable diversifications during the last 40 My. Previous cytologic and systematic investigations suggested that paleopolyploidy may have occurred in at least one Compositae lineage, but a recent analysis of genomic data was equivocal. We tested for evidence of paleopolyploidy in the evolutionary history of the family using recently available expressed sequence tag (EST) data from the Compositae Genome Project. Combined with data available on GenBank, we analyzed nearly 1 million ESTs from 18 species representing seven genera and four tribes. Our analyses revealed at least three ancient whole-genome duplications in the Compositae: a paleopolyploidization shared by all analyzed taxa and placed near the origin of the family just prior to the rapid radiation of its tribes, and independent genome duplications near the base of the tribes Mutisieae and Heliantheae. These results are consistent with previous research implicating paleopolyploidy in the evolution and diversification of the Heliantheae. Further, we observed parallel retention of duplicate genes from the basal Compositae genome duplication across all tribes, despite divergence times of 33-38 My among these lineages. This pattern of retention was also repeated for the paleologs from the Heliantheae duplication. Intriguingly, the categories of genes retained in duplicate were substantially different from those in Arabidopsis. In particular, we found that genes annotated to structural components or cellular organization Gene Ontology categories were significantly enriched among paleologs, whereas genes associated with transcription and other regulatory functions were significantly underrepresented. Our results suggest that paleopolyploidy can yield strikingly consistent signatures of gene retention in plant genomes despite extensive lineage radiations and recurrent genome duplications but that these patterns vary substantially among higher taxonomic categories.


Subject(s)
Asteraceae/genetics , Gene Duplication , Genes, Plant , Polyploidy , Evolution, Molecular , Expressed Sequence Tags
13.
Theor Appl Genet ; 117(7): 1021-9, 2008 Nov.
Article in English | MEDLINE | ID: mdl-18633591

ABSTRACT

Simple sequence repeats (SSRs) are abundant and frequently highly polymorphic in transcribed sequences and widely targeted for marker development in eukaryotes. Sunflower (Helianthus annuus) transcript assemblies were built and mined to identify SSRs and insertions-deletions (INDELs) for marker development, comparative mapping, and other genomics applications in sunflower. We describe the spectrum and frequency of SSRs identified in the sunflower EST database, a catalog of 16,643 EST-SSRs, a collection of 484 EST-SSR and 43 EST-INDEL markers developed from common sunflower ESTs, polymorphisms of the markers among the parents of several intraspecific and interspecific mapping populations, and the transferability of the markers to closely and distantly related species in the Compositae. Of 17,904 unigenes in the transcript assembly, 1,956 (10.9%) harbored one or more SSRs with repeat counts of n ≥ 5. EST-SSR markers were 1.6-fold more polymorphic among exotic than elite genotypes and 0.7-fold less polymorphic than non-genic SSR markers. Of 466 EST-SSR or INDEL markers screened for cross-species amplification and polymorphisms, 413 (88.6%) amplified alleles from one or more wild species (H. argophyllus, H. tuberosus, H. anomalus, H. paradoxus, and H. deserticola), whereas 69 (14.8%) amplified alleles from safflower (Carthamus tinctorius) and 67 (14.4%) amplified alleles from lettuce (Lactuca sativa); hence, only a fraction were transferable to distantly related genera in the Compositae, whereas most were transferable to wild relatives of H. annuus. Several thousand additional SSRs were identified in the EST database and supply a wealth of templates for EST-SSR marker development in sunflower.
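
Illustrative note: the abstract above reports unigenes harboring SSRs with repeat counts of n ≥ 5 but does not say how the mining was done. The following is only a minimal sketch of one way such a scan could look, not the authors' pipeline. The function name `find_ssrs`, the restriction to perfect repeats, and the motif length range of 2 to 6 bp are assumptions made for this example.

```python
import re

def find_ssrs(seq, min_repeats=5, max_motif_len=6):
    """Return sorted (start, motif, repeat_count) tuples for perfect tandem repeats.

    Hypothetical helper: motif lengths 2..max_motif_len are an assumption,
    since the abstract only states the repeat-count threshold (n >= 5).
    """
    seq = seq.upper()
    hits = []
    for motif_len in range(2, max_motif_len + 1):
        # One motif copy followed by at least (min_repeats - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves built from a shorter unit
            # (e.g. ATAT is just AT twice), so each repeat is reported once.
            if any(motif == motif[:k] * (motif_len // k)
                   for k in range(1, motif_len) if motif_len % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // motif_len))
    return sorted(hits)

# Toy sequence with an (AT)6 and an (AGC)5 repeat; positions are 0-based.
example = "GGC" + "AT" * 6 + "CCG" + "AGC" * 5 + "TT"
print(find_ssrs(example))
```

Applied to each unigene in an assembly, a scan of this kind would give the count of sequences with at least one qualifying SSR; imperfect or compound repeats, which dedicated SSR-mining tools usually handle, are outside this sketch.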


Subject(s)
Expressed Sequence Tags , Helianthus/genetics , INDEL Mutation , Minisatellite Repeats , Polymorphism, Genetic , Asteraceae/classification , Computational Biology , Databases, Genetic , Genetic Markers , Species Specificity
14.
Nat Biotechnol ; 22(8): 1006-11, 2004 Aug.
Article in English | MEDLINE | ID: mdl-15247925

ABSTRACT

Large-scale sequencing of short mRNA-derived tags can establish the qualitative and quantitative characteristics of a complex transcriptome. We sequenced 12,304,362 tags from five diverse libraries of Arabidopsis thaliana using massively parallel signature sequencing (MPSS). A total of 48,572 distinct signatures, each representing a different transcript, were expressed at significant levels. These signatures were compared to the annotation of the A. thaliana genomic sequence; in the five libraries, this comparison yielded between 17,353 and 18,361 genes with sense expression, and between 5,487 and 8,729 genes with antisense expression. An additional 6,691 MPSS signatures mapped to unannotated regions of the genome. Expression was demonstrated for 1,168 genes for which expression data were previously unknown. Alternative polyadenylation was observed for more than 25% of A. thaliana genes transcribed in these libraries. The MPSS expression data suggest that the A. thaliana transcriptome is complex and contains many as-yet uncharacterized variants of normal coding transcripts.


Subject(s)
Arabidopsis Proteins/genetics , Arabidopsis Proteins/metabolism , Arabidopsis/genetics , Arabidopsis/metabolism , Sequence Alignment/methods , Sequence Analysis, RNA/methods , Transcription, Genetic/genetics , Computing Methodologies , Expressed Sequence Tags , Gene Expression Profiling/methods , Gene Expression Regulation, Plant/genetics , Genome, Plant , Peptide Library