ABSTRACT
This is the first report of a systematic study of expressed genes by expressed sequence tag (EST) analysis in oil palm, a species of the order Arecales, a phylogenetically key clade of monocotyledons that is not widely represented in the sequence databases. Five cDNA libraries were generated from male and female inflorescences, shoot apices and zygotic embryos, and systematic unidirectional sequencing was performed. A total of 2411 valid EST sequences were obtained. Cluster analysis identified 209 groups of related sequences and 1874 singletons. Putative functions were assigned to 1252 of the resulting 2083 non-redundant ESTs. The EST database described here is a first step towards gene discovery and cDNA array-based expression analysis in oil palm.
Subject(s)
Arecaceae/genetics , Expressed Sequence Tags , Base Sequence , Cluster Analysis , DNA, Complementary , DNA, Plant , Flowers/genetics , Flowers/growth & development , Gene Expression Profiling , Gene Library , Molecular Sequence Data , Plant Stems/genetics , Plant Stems/growth & development , Seeds/genetics , Seeds/growth & development , Sequence Analysis, DNA
ABSTRACT
Genomic projects depend heavily on genome annotations and are limited by current deficiencies in the published predictions of gene structure and function. It follows that improved annotation will allow better data mining of genomes and more secure planning and design of experiments. The purpose of the GeneFarm project is to obtain homogeneous, reliable, documented and traceable annotations for Arabidopsis nuclear genes and gene products, and to enter them into an added-value database. This re-annotation project is being performed exhaustively on every member of each gene family. Family-wide annotation makes the task easier and more efficient than a gene-by-gene approach, since many features obtained for one gene can be extrapolated to some or all of the other genes in the family. A complete annotation procedure based on the most efficient prediction tools available is being used by 16 partner laboratories, each contributing annotated families from its field of expertise. A database, named GeneFarm, and an associated user-friendly interface for querying the annotations have been developed. More than 3000 genes distributed over 300 families have been annotated and are available at http://genoplante-info.infobiogen.fr/Genefarm/. Furthermore, a collaboration with the Swiss Institute of Bioinformatics is underway to integrate the GeneFarm data into the protein knowledgebase Swiss-Prot.