Search | VHL Regional Portal

Impact of missing data imputation methods on gene expression clustering and classification.

de Souto, Marcilio C P; Jaskowiak, Pablo A; Costa, Ivan G.

BMC Bioinformatics ; 16: 64, 2015 Feb 26.

Article in English | MEDLINE | ID: mdl-25888091

ABSTRACT

BACKGROUND: Several missing value imputation methods for gene expression data have been proposed in the literature. In the past few years, researchers have been putting a great deal of effort into presenting systematic evaluations of the different imputation algorithms. Initially, most algorithms were assessed with an emphasis on the accuracy of the imputation, using metrics such as the root mean squared error. However, it has become clear that the success of the estimation of the expression value should be evaluated in more practical terms as well. One can consider, for example, the ability of the method to preserve the significant genes in the dataset, or its discriminative/predictive power for classification/clustering purposes. RESULTS AND CONCLUSIONS: We performed a broad analysis of the impact of five well-known missing value imputation methods on three clustering and four classification methods, in the context of 12 cancer gene expression datasets. We employed a statistical framework, for the first time in this field, to assess whether different imputation methods improve the performance of the clustering/classification methods. Our results suggest that the imputation methods evaluated have a minor impact on the classification and downstream clustering analyses. Simple methods such as replacing the missing values with the mean or median performed as well as more complex strategies. The datasets analyzed in this study are available at http://costalab.org/Imputation/.

Subjects

Algorithms; Cluster Analysis; Gene Expression Profiling/methods; Oligonucleotide Array Sequence Analysis/methods; Data Interpretation, Statistical; Humans
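
The abstract of this record names mean and median replacement as the simple imputation baselines, and root mean squared error as the usual accuracy metric. The following is a minimal illustrative sketch of both ideas, not the authors' code: the function names `impute` and `rmse`, the toy matrix, and the choice to fill per row (gene) are assumptions made here for illustration only.

```python
import math
import statistics


def impute(matrix, strategy="mean"):
    """Fill None entries in each row (one gene) with that row's mean or median."""
    if strategy not in ("mean", "median"):
        raise ValueError("strategy must be 'mean' or 'median'")
    fill = statistics.mean if strategy == "mean" else statistics.median
    completed = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        # Assumption: a row with no observed values at all is filled with 0.0.
        value = fill(observed) if observed else 0.0
        completed.append([value if v is None else v for v in row])
    return completed


def rmse(truth, estimate, mask):
    """Root mean squared error over the artificially removed entries in mask."""
    squared = [(truth[i][j] - estimate[i][j]) ** 2 for i, j in mask]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical toy data: 2 genes x 4 samples, with two entries hidden on purpose.
truth = [[1.0, 2.0, 3.0, 10.0],
         [4.0, 4.0, 5.0, 7.0]]
mask = [(0, 3), (1, 0)]
observed = [row[:] for row in truth]
for i, j in mask:
    observed[i][j] = None

mean_filled = impute(observed, "mean")
median_filled = impute(observed, "median")
print(rmse(truth, mean_filled, mask), rmse(truth, median_filled, mask))
```

Hiding known values and scoring the reconstruction is the accuracy-oriented evaluation the abstract describes as the initial approach; the study itself goes further and compares downstream clustering and classification performance, which this sketch does not attempt.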

Guest editorial for special section on BSB 2012.

de Souto, Marcilio C P; Kann, Maricel.

IEEE/ACM Trans Comput Biol Bioinform ; 10(4): 817-8, 2013.

Article in English | MEDLINE | ID: mdl-24490260

Subjects

Computational Biology; Brazil; Humans; Societies, Scientific

Clustering cancer gene expression data: a comparative study.

de Souto, Marcilio C P; Costa, Ivan G; de Araujo, Daniel S A; Ludermir, Teresa B; Schliep, Alexander.

BMC Bioinformatics ; 9: 497, 2008 Nov 27.

Article in English | MEDLINE | ID: mdl-19038021

ABSTRACT

BACKGROUND: The use of clustering methods for the discovery of cancer subtypes has drawn a great deal of attention in the scientific community. While bioinformaticians have proposed new clustering methods that take advantage of characteristics of the gene expression data, the medical community has a preference for using "classic" clustering methods. There have been no studies thus far performing a large-scale evaluation of different clustering methods in this context. RESULTS/CONCLUSION: We present the first large-scale analysis of seven different clustering methods and four proximity measures for the analysis of 35 cancer gene expression data sets. Our results reveal that the finite mixture of Gaussians, followed closely by k-means, exhibited the best performance in terms of recovering the true structure of the data sets. These methods also exhibited, on average, the smallest difference between the actual number of classes in the data sets and the best number of clusters as indicated by our validation criteria. Furthermore, hierarchical methods, which have been widely used by the medical community, exhibited a poorer recovery performance than that of the other methods evaluated. Moreover, as a stable basis for the assessment and comparison of different clustering methods for cancer gene expression data, this study provides a common group of data sets (benchmark data sets) to be shared among researchers and used for comparisons with new methods. The data sets analyzed in this study are available at http://algorithmics.molgen.mpg.de/Supplements/CompCancer/.

Subjects

Computational Biology/methods; Gene Expression Profiling; Neoplasms/diagnosis; Algorithms; Cluster Analysis; DNA, Complementary/metabolism; Gene Expression Regulation, Neoplastic; Genes, Neoplasm; Humans; Models, Biological; Models, Statistical; Multigene Family; Neoplasms/genetics; Normal Distribution; Oligonucleotide Array Sequence Analysis; Pattern Recognition, Automated/methods

Equivalence between RAM-based neural networks and probabilistic automata.

de Souto, Marcilio C P; Ludermir, Teresa B; de Oliveira, Wilson R.

IEEE Trans Neural Netw ; 16(4): 996-9, 2005 Jul.

Article in English | MEDLINE | ID: mdl-16121742

ABSTRACT

In this letter, the computational power of a class of random access memory (RAM)-based neural networks, called general single-layer sequential weightless neural networks (GSSWNNs), is analyzed. The theoretical results presented, besides aiding the understanding of the temporal behavior of these networks, could also provide useful insights for the development of new learning algorithms.

Subjects

Algorithms; Models, Statistical; Neural Networks, Computer; Pattern Recognition, Automated/methods; Computer Simulation

The VIIth Brazilian Symposium on Artificial Neural Networks. Introduction by guest editors.

Ludermir, Teresa B; De Souto, Marcilio C P.

Int J Neural Syst ; 13(2): 55-7, 2003 Apr.

Article in English | MEDLINE | ID: mdl-12923917

Subjects

Artificial Intelligence; Computer Simulation; Congresses as Topic; Neural Networks, Computer; Humans
