ABSTRACT
In this report, we address the question of how to combine the nonlinearities of neurons into networks that can model increasingly varying and progressively more complex functions. A fundamental approach is the use of higher-level representations produced by restricted Boltzmann machines and (denoising) autoencoders. We present the Denoising Autoencoder Self-Organizing Map (DASOM), which integrates the latter into a hierarchically organized hybrid model whose front-end component is a grid of topologically ordered neurons. The approach is to interpose a layer of hidden representations between the input space and the neural lattice of the self-organizing map; the parameters are then adjusted by the proposed unsupervised learning algorithm. The model therefore maintains the clustering properties of its predecessor while extending and enhancing its visualization capacity, which enables the inclusion and analysis of the intermediate representation space. A comprehensive series of experiments, comprising optical recognition of text and images and cancer type clustering and categorization, is used to demonstrate DASOM's efficiency, performance and projection capabilities.
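The layering described above (input space, then hidden representations, then the neural lattice) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the masking-noise corruption, the sigmoid encoder, the random untrained weights and the 2x2 lattice are hypothetical, and the proposed learning algorithm itself is not reproduced here.

```python
import math
import random

def corrupt(x, drop_prob, rng):
    """Masking noise: zero each input component with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else v for v in x]

def encode(x, weights, biases):
    """Hidden representation h = sigmoid(W x + b)."""
    return [
        1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(row, x)) + b)))
        for row, b in zip(weights, biases)
    ]

def best_matching_unit(h, lattice):
    """Grid position of the neuron whose codebook vector is closest to h."""
    return min(
        lattice,
        key=lambda pos: sum((a - b) ** 2 for a, b in zip(lattice[pos], h)),
    )

rng = random.Random(0)
n_in, n_hidden = 4, 3

# Assumed, untrained parameters: random encoder weights and codebook vectors.
weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
biases = [0.0] * n_hidden
lattice = {
    (r, c): [rng.random() for _ in range(n_hidden)]
    for r in range(2)
    for c in range(2)
}

x = [1.0, 0.0, 0.5, 0.25]
noisy = corrupt(x, 0.3, rng)       # what a denoising training step would see
h = encode(x, weights, biases)     # hidden layer interposed before the map
bmu = best_matching_unit(h, lattice)
print("hidden:", h, "winning neuron:", bmu)
```

In this sketch the lattice neurons live in the hidden space rather than the input space, which is the structural point of the description above; how the encoder and the codebook vectors are trained jointly is left out.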
Subjects
Unsupervised Machine Learning/standards , Cluster Analysis , Humans , Signal-to-Noise Ratio
ABSTRACT
The present study devises mapping methodologies and projection techniques that visualize the results of clustering biological sequence data. The Sequence Data Density Display (SDDD) and Sequence Likelihood Projection (SLP) visualizations represent the input symbolic sequences in a lower-dimensional space in such a way that the clusters and relations of data elements are depicted graphically. Both operate in combination with the Self-Organizing Hidden Markov Model Map (SOHMMM). The resulting unified framework can analyze raw sequence data automatically and directly. This analysis is carried out with little or no prior information or domain knowledge.
Subjects
Algorithms , Computational Biology/methods , Markov Chains , Statistical Models , Proteins/chemistry , Cluster Analysis , Computer Simulation , Protein Databases , Humans , Molecular Models , Computer Neural Networks , Protein Conformation
ABSTRACT
A hybrid approach combining the Self-Organizing Map (SOM) and the Hidden Markov Model (HMM) is presented. The Self-Organizing Hidden Markov Model Map (SOHMMM) sits at the intersection of the theoretical foundations and algorithmic realizations of its constituents. The respective architectures and learning methodologies are fused in an attempt to meet the increasing requirements imposed by the properties of deoxyribonucleic acid (DNA), ribonucleic acid (RNA), and protein chain molecules. The fusion of the SOM unsupervised training and the HMM dynamic programming algorithms yields a novel on-line gradient descent unsupervised learning algorithm, which is fully integrated into the SOHMMM. Since the SOHMMM carries out probabilistic sequence analysis with little or no prior knowledge, it can have a variety of applications in clustering, dimensionality reduction and visualization of large-scale sequence spaces, as well as in sequence discrimination, search and classification. Two series of experiments, based on artificial sequence data and splice junction gene sequences, demonstrate the SOHMMM's characteristics and capabilities.
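As a rough illustration of how an HMM dynamic programming routine can be combined with a map of competing units, the sketch below scores a sequence against a small set of HMMs with the scaled forward algorithm and picks the best-scoring one. It assumes that each lattice node holds one discrete HMM and that the winner is the node with the highest sequence likelihood; the function names, the two toy models and that winner rule are hypothetical, and the on-line gradient descent update is not shown.

```python
import math

def forward_log_likelihood(seq, init, trans, emit):
    """Log P(seq | HMM) computed with the scaled forward algorithm."""
    n_states = len(init)
    alpha = [init[s] * emit[s][seq[0]] for s in range(n_states)]
    log_lik = 0.0
    for t in range(len(seq)):
        if t > 0:
            # Propagate through the transitions, then weight by the emission.
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n_states))
                * emit[s][seq[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_lik

def best_matching_node(seq, lattice):
    """Index of the node whose HMM assigns seq the highest likelihood."""
    scores = [forward_log_likelihood(seq, *hmm) for hmm in lattice]
    return max(range(len(lattice)), key=lambda i: scores[i])

# Two toy 2-state HMMs over the DNA alphabet (assumed parameters).
at_rich = {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}
gc_rich = {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
lattice = [
    (init, trans, [at_rich, at_rich]),
    (init, trans, [gc_rich, gc_rich]),
]

print(best_matching_node("ATATTA", lattice))
print(best_matching_node("GCGGCC", lattice))
```

Working in log space with per-step rescaling keeps the likelihoods comparable across nodes even for long sequences, which matters if the winner is chosen by likelihood as assumed here.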