ABSTRACT
The efficacy of using Serial Analysis of Gene Expression (SAGE) to analyze the transcriptome of the model dicotyledonous plant Arabidopsis was assessed. We describe an iterative tag-to-gene matching process that exploits the availability of the whole genome sequence of Arabidopsis. The expression patterns of 98% of the annotated Arabidopsis genes could theoretically be evaluated through SAGE, and, using the iterative matching process, 79% could be identified by a tag found at a unique site in the genome. A total of 145,170 reliable experimental tags from two Arabidopsis leaf tissue SAGE libraries were analyzed, of which 29,632 were distinct. The majority (93%) of the 12,988 experimental tags observed more than once could be matched within the Arabidopsis genome. However, only 78% were matched to a single locus within the genome, reflecting the complexities associated with working in a highly duplicated genome. In addition to a comprehensive assessment of gene expression in Arabidopsis leaf tissue, we describe evidence of transcription from pseudogenes as well as evidence of alternative mRNA processing and antisense transcription. This collection of experimental SAGE tags could be exploited to assist in the ongoing annotation of the Arabidopsis genome.
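
To illustrate why a tag may match one locus, several loci, or none, the following is a minimal sketch of a single tag-to-gene matching pass. It is not the matching procedure used in this study, whose iterative steps are not detailed here. It assumes the conventional SAGE design (the NlaIII anchoring site CATG and a 10-base tag taken downstream of the 3'-most site), and the function names (`virtual_tag`, `build_tag_index`, `classify`) and the toy sequences are hypothetical.

```python
from collections import defaultdict

ANCHOR = "CATG"  # assumed anchoring-enzyme site (NlaIII), as in conventional SAGE
TAG_LEN = 10     # assumed tag length in bases, as in conventional SAGE


def virtual_tag(transcript):
    """Return the tag after the 3'-most anchor site that leaves a full-length tag.

    Returns None when no anchor site is followed by TAG_LEN bases.
    """
    pos = transcript.rfind(ANCHOR)
    while pos != -1:
        start = pos + len(ANCHOR)
        tag = transcript[start:start + TAG_LEN]
        if len(tag) == TAG_LEN:
            return tag
        # Site too close to the 3' end: fall back to the next site upstream.
        pos = transcript.rfind(ANCHOR, 0, pos)
    return None


def build_tag_index(transcripts):
    """Map each virtual tag to the set of gene identifiers that produce it."""
    index = defaultdict(set)
    for gene_id, seq in transcripts.items():
        tag = virtual_tag(seq.upper())
        if tag is not None:
            index[tag].add(gene_id)
    return index


def classify(observed_tags, index):
    """Label each observed tag as 'unique', 'ambiguous', or 'unmatched'."""
    result = {}
    for tag in observed_tags:
        genes = sorted(index.get(tag, ()))
        if not genes:
            label = "unmatched"
        elif len(genes) == 1:
            label = "unique"
        else:
            label = "ambiguous"
        result[tag] = (label, genes)
    return result


# Toy sequences invented for illustration; they are not Arabidopsis data.
toy_transcripts = {
    "geneA": "GGGCATGAAACCCGGGTTTAAA",
    "geneB": "TTCATGAAACCCGGGTCC",   # duplicate-like gene sharing geneA's tag
    "geneC": "ACATGTTTTGGGGCCAA",
}
toy_index = build_tag_index(toy_transcripts)
toy_calls = classify(["AAACCCGGGT", "TTTTGGGGCC", "ACGTACGTAC"], toy_index)
```

In this toy case the tag shared by `geneA` and `geneB` is labeled ambiguous, which mirrors the situation in a highly duplicated genome where a tag cannot be assigned to a single locus.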