ABSTRACT
Most analyses of expression data exploit information on the variability of gene intensity across samples. This information is sensitive to initial data processing, which affects the final conclusions. However, expression data also contain scale-free information that is directly comparable between different samples. We propose to classify expression data using the pairwise ratios of gene expression values rather than their absolute intensities. This information is robust to data processing and thus more attractive for classification analyses. The proposed analysis scheme exploits only information on the relative gene expression levels within each sample. Testing on publicly available datasets leads to superior classification results.
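The idea of scale-free pairwise ratios can be illustrated with a minimal sketch. The abstract does not specify how the ratios are computed or selected, so the use of log2 ratios over all gene pairs, the function name `pairwise_log_ratios`, and the intensity values below are assumptions made only for illustration. The sketch shows that such features do not change when a whole sample is multiplied by a constant factor, which is one simple form of processing-dependent scaling.

```python
import math
from itertools import combinations


def pairwise_log_ratios(sample):
    """Return log2(x_i / x_j) for every gene pair i < j within one sample.

    Assumption: all intensities are strictly positive.
    """
    return [
        math.log2(sample[i] / sample[j])
        for i, j in combinations(range(len(sample)), 2)
    ]


# Hypothetical intensities for four genes measured in one sample.
sample = [120.0, 30.0, 60.0, 15.0]

# The same sample after a global multiplicative rescaling,
# standing in for a different normalization of the raw data.
rescaled = [2.5 * x for x in sample]

features = pairwise_log_ratios(sample)
features_rescaled = pairwise_log_ratios(rescaled)

# The ratio features agree, although the absolute intensities differ.
same = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(features, features_rescaled)
)
print(same)
```

Only multiplicative per-sample scaling is covered here; other processing steps, such as background subtraction, are not guaranteed to leave ratios unchanged.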
Subjects
Breast Neoplasms/classification , Breast Neoplasms/genetics , Gene Expression Profiling , Glioma/classification , Glioma/genetics , Neoplasms/classification , Neoplasms/genetics , Databases as Topic/statistics & numerical data , Female , Gene Expression Profiling/statistics & numerical data , Gene Expression Regulation, Neoplastic , Humans , Microarray Analysis
ABSTRACT
The Maximal Margin (MAMA) linear programming classification algorithm has recently been proposed and tested for cancer classification based on expression data. It demonstrated sound performance on publicly available expression datasets. We developed a web interface to give potential users easy access to the MAMA classification tool. Basic and advanced options provide flexibility in use. The input data format is the same as that used in most publicly available datasets. This makes the web resource particularly convenient for users who work in expression data analysis but are not machine learning experts.
Subjects
Algorithms , Gene Expression Profiling/methods , Internet , Oligonucleotide Array Sequence Analysis/methods , Programming, Linear , Software , Artificial Intelligence , User-Computer Interface
ABSTRACT
The PEDANT genome database (http://pedant.gsf.de) provides exhaustive automatic analysis of genomic sequences by a large variety of established bioinformatics tools through a comprehensive Web-based user interface. One hundred and seventy-seven completely sequenced and unfinished genomes have been processed so far, including recently published large eukaryotic genomes (mouse, human). In this contribution, we describe the current status of the PEDANT database and the novel analytical features added to the PEDANT server in 2002. These include: (i) integration with the BioRS data retrieval system, which allows fast text queries, (ii) pre-computed sequence clusters in each complete genome, (iii) a comprehensive set of tools for genome comparison, including genome comparison tables and protein function prediction based on genomic context, and (iv) computation and visualization of protein-protein interaction (PPI) networks based on experimental data. The availability of functional and structural predictions for 650 000 genomic proteins in a well-organized form makes PEDANT a useful resource for both functional and structural genomics.