Mining microorganism EST databases in the quest for new proteins

Faria-Campos, A. C; Cerqueira, G. C; Anacleto, C; Carvalho, C. M. de; Ortega, J. M

Faria-Campos, A. C; Universidade Federal de Minas Gerais. Instituto de Ciências Biológicas. Departamento de Bioquímica e Imunologia. Belo Horizonte. BR
Cerqueira, G. C; Universidade Federal de Minas Gerais. Instituto de Ciências Biológicas. Departamento de Bioquímica e Imunologia. Belo Horizonte. BR
Anacleto, C; Universidade Federal de Minas Gerais. Instituto de Ciências Biológicas. Departamento de Bioquímica e Imunologia. Belo Horizonte. BR
Carvalho, C. M. de; Universidade Federal de Minas Gerais. Instituto de Ciências Biológicas. Departamento de Bioquímica e Imunologia. Belo Horizonte. BR
Ortega, J. M; Universidade Federal de Minas Gerais. Instituto de Ciências Biológicas. Departamento de Bioquímica e Imunologia. Belo Horizonte. BR

Genet. mol. res. (Online) ; 2(1): 169-177, Mar. 2003.

Article in English | LILACS | ID: lil-417613

ABSTRACT

Microorganisms with large genomes are commonly the subjects of single-round partial sequencing of cDNA, generating expressed sequence tags (ESTs). Usually there is a long delay between gene discovery by EST projects and submission of amino acid sequences to public databases. We analyzed the relationship between available ESTs and protein sequences and used the sequences available in the secondary database, clusters of orthologous groups (COG), to investigate ESTs from eight microorganisms of medical and/or economic relevance, selecting candidate ESTs that may be further pursued for protein characterization. The organisms chosen were Paracoccidioides brasiliensis, Dictyostelium discoideum, Fusarium graminearum, Plasmodium yoelii, Magnaporthe grisea, Emericella nidulans, Chlamydomonas reinhardtii and Eimeria tenella, each of which has more than 10,000 ESTs available in dbEST. A total of 77,114 protein sequences from COG were used, corresponding to 3,201 distinct genes. At least 212 of these were capable of identifying candidate ESTs for further studies (E. tenella). This number was extended to over 700 candidate ESTs (C. reinhardtii, F. graminearum). Remarkably, even the organism that presents the highest number of ESTs corresponding to known proteins, P. yoelii, showed a considerable number of candidate ESTs for protein characterization (477). For some organisms, such as P. brasiliensis, M. grisea and F. graminearum, bioinformatics has allowed for automatic annotation of up to about 20% of the ESTs that did not correspond to proteins already characterized in the organism. In conclusion, 4,093 ESTs from these eight organisms that are homologous to COG genes were selected as candidates for protein characterization.
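
The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which search tool, thresholds, or data structures were used, so everything below is an assumption made for illustration: the function name `select_candidates`, the EST and COG identifiers, and the premise that similarity searches have already been run and reduced to sets of EST identifiers. The sketch only shows the set logic: keep ESTs that are homologous to a COG gene but do not correspond to a protein already characterized in the same organism.

```python
from typing import Dict, Set


def select_candidates(
    cog_hits: Dict[str, Set[str]],
    known_protein_hits: Set[str],
) -> Dict[str, Set[str]]:
    """Return, per COG, the ESTs that hit that COG but match no known protein.

    cog_hits: hypothetical mapping from a COG identifier to the ESTs of one
        organism that were found similar to a protein in that COG.
    known_protein_hits: hypothetical set of ESTs that already correspond to
        a protein sequence deposited for the same organism.
    """
    candidates: Dict[str, Set[str]] = {}
    for cog_id, ests in cog_hits.items():
        remaining = ests - known_protein_hits  # drop already-characterized ESTs
        if remaining:  # keep only COGs that still identify a candidate
            candidates[cog_id] = remaining
    return candidates


# Toy data, invented for illustration only (not taken from the study).
cog_hits = {
    "COG0001": {"est1", "est2"},
    "COG0002": {"est3"},
    "COG0003": {"est4"},
}
known_protein_hits = {"est2", "est3"}

candidates = select_candidates(cog_hits, known_protein_hits)
n_cogs = len(candidates)
n_ests = len(set().union(*candidates.values()))
```

In this toy case, counting the keys of `candidates` corresponds to the number of COG genes that identify candidates, and counting the distinct ESTs across its values corresponds to the number of candidate ESTs; the figures reported in the abstract come from the authors' own analysis, not from this sketch.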

Subject(s)

Animals; Databases, Protein; Expressed Sequence Tags; Sequence Analysis, Protein; Chlamydomonas reinhardtii/genetics; Dictyostelium/genetics; Eimeria tenella/genetics; Emericella/genetics; Fusarium/genetics; Genome; Magnaporthe/genetics; Paracoccidioides/genetics; Plasmodium yoelii/genetics; Proteins/genetics; Sequence Homology, Amino Acid

Full text: Available
Index: LILACS (Americas)
Main subject: Expressed Sequence Tags / Sequence Analysis, Protein / Databases, Protein
Limits: Animals
Language: English
Journal: Genet. mol. res. (Online)
Journal subject: Molecular Biology / Genetics
Year: 2003
Type: Article
Affiliation country: Brazil
Institution/Affiliation country: Universidade Federal de Minas Gerais/BR
