Mining ORESTES no-match database: can we still contribute to cancer transcriptome?

Fonseca, R. da S; Carraro, D. M; Brentani, H

Affiliation

Fonseca, R. da S; Hospital do Câncer. Laboratório de Bioinformática. São Paulo. BR
Carraro, D. M; Instituto Ludwig para Pesquisa sobre o Câncer. Laboratório de Análise de Expressão Gênica. São Paulo. BR
Brentani, H; Hospital do Câncer. Laboratório de Bioinformática. São Paulo. BR

Genet. mol. res. (Online) ; 5(1): 24-32, Mar. 31, 2006.

Article in English | LILACS | ID: lil-449149

Responsible library: BR1.1

ABSTRACT

The Human Cancer Genome Project generated about 1 million expressed sequence tags by the ORESTES method, principally with the aim of obtaining data from cancer. Of this total, 341,680 showed no similarity with sequences in the public transcript databases and are referred to as "no-match". Some of them represent low-abundance or difficult-to-detect human transcripts, but others represent genomic contamination or immature mRNA. We ran a bioinformatics pipeline to determine the novelty of ORESTES "no-match" datasets from prostate or breast tissues. We started with 14,908 clusters mapped on the human genome. A total of 2226 clusters, originating from more than two libraries or singletons with gaps upon genome alignment, were selected. Ninety-four clusters with canonical splice sites, the most stringent criterion for being considered a gene, were manually inspected with respect to their genomic hits. Of the manually inspected clusters, 49.6% contained new sequences: 42.2% were probable low-expression alternative forms of characterized genes and 7.4% were unpredicted genes. RT-PCR followed by sequencing was performed to validate the largest spliced sequence from eight clusters, confirming five sequences as true human transcript fragments. In an in silico analysis, some of them were differentially expressed between tumor and normal tissue. We conclude that, after cleanup of the no-match dataset, about 939 new exons and 165 unpredicted genes remain that could complete the prostate or breast transcriptome.
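
The abstract does not state how the final estimates were obtained. A minimal sketch, assuming they come from applying the manually inspected percentages to the 2226 selected clusters, reproduces both figures. The function name `extrapolate` and the derivation itself are assumptions for illustration, not the authors' stated method.

```python
# Figures as reported in the abstract.
SELECTED_CLUSTERS = 2226        # clusters retained after library/gap filtering
PCT_ALTERNATIVE_FORMS = 42.2    # % probable alternative forms of known genes
PCT_UNPREDICTED_GENES = 7.4     # % unpredicted genes


def extrapolate(percent: float, total: int) -> int:
    """Scale a percentage to a rounded cluster count (assumed derivation)."""
    return round(percent / 100 * total)


new_exons = extrapolate(PCT_ALTERNATIVE_FORMS, SELECTED_CLUSTERS)
unpredicted_genes = extrapolate(PCT_UNPREDICTED_GENES, SELECTED_CLUSTERS)

print(new_exons, unpredicted_genes)  # prints "939 165"
```

Under this assumption, each cluster classified as an alternative form is counted as one new exon, which matches the "about 939 new exons and 165 unpredicted genes" reported in the conclusion. The two percentages also sum to the 49.6% of clusters reported as containing new sequences.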

Subject(s)

Humans; Male; Female; Expressed Sequence Tags; Open Reading Frames/genetics; Breast Neoplasms/genetics; Prostatic Neoplasms/genetics; Transcription, Genetic/genetics; Cluster Analysis; Databases, Genetic; Genome, Human/genetics; Reverse Transcriptase Polymerase Chain Reaction

Full text: 1
Index: LILACS
Main subject: Prostatic Neoplasms / Transcription, Genetic / Breast Neoplasms / Open Reading Frames / Expressed Sequence Tags
Study type: Prognostic_studies
Limits: Female / Humans / Male
Language: En
Journal: Genet. mol. res. (Online)
Journal subject: Molecular Biology / Genetics
Year: 2006
Document type: Article