Search | VHL Regional Portal

1.

Reducing the dimensionality of data with neural networks.

Hinton, G E; Salakhutdinov, R R.

Science ; 313(5786): 504-7, 2006 Jul 28.

Article in English | MEDLINE | ID: mdl-16873662

ABSTRACT

High-dimensional data can be converted to low-dimensional codes by training a multilayer neural network with a small central layer to reconstruct high-dimensional input vectors. Gradient descent can be used for fine-tuning the weights in such "autoencoder" networks, but this works well only if the initial weights are close to a good solution. We describe an effective way of initializing the weights that allows deep autoencoder networks to learn low-dimensional codes that work much better than principal components analysis as a tool to reduce the dimensionality of data.

2.

Classical and Bayesian inference in neuroimaging: theory.

Friston, K J; Penny, W; Phillips, C; Kiebel, S; Hinton, G; Ashburner, J.

Neuroimage ; 16(2): 465-83, 2002 Jun.

Article in English | MEDLINE | ID: mdl-12030832

ABSTRACT

This paper reviews hierarchical observation models, used in functional neuroimaging, in a Bayesian light. It emphasizes the common ground shared by classical and Bayesian methods to show that conventional analyses of neuroimaging data can be usefully extended within an empirical Bayesian framework. In particular we formulate the procedures used in conventional data analysis in terms of hierarchical linear models and establish a connection between classical inference and parametric empirical Bayes (PEB) through covariance component estimation. This estimation is based on an expectation maximization or EM algorithm. The key point is that hierarchical models not only provide for appropriate inference at the highest level but that one can revisit lower levels suitably equipped to make Bayesian inferences. Bayesian inferences eschew many of the difficulties encountered with classical inference and characterize brain responses in a way that is more directly predicated on what one is interested in. The motivation for Bayesian approaches is reviewed and the theoretical background is presented in a way that relates to conventional methods, in particular restricted maximum likelihood (ReML). This paper is a technical and theoretical prelude to subsequent papers that deal with applications of the theory to a range of important issues in neuroimaging. These issues include; (i) Estimating nonsphericity or variance components in fMRI time-series that can arise from serial correlations within subject, or are induced by multisubject (i.e., hierarchical) studies. (ii) Spatiotemporal Bayesian models for imaging data, in which voxels-specific effects are constrained by responses in other voxels. (iii) Bayesian estimation of nonlinear models of hemodynamic responses and (iv) principled ways of mixing structural and functional priors in EEG source reconstruction. Although diverse, all these estimation problems are accommodated by the PEB framework described in this paper.

Subject(s)

Bayes Theorem , Brain/physiology , Diagnostic Imaging , Algorithms , Humans , Likelihood Functions , Linear Models , Magnetic Resonance Imaging , Models, Neurological , Statistics as Topic/methods , Tomography, Emission-Computed

3.

Computation by neural networks.

Hinton, G E.

Nat Neurosci ; 3 Suppl: 1170, 2000 Nov.

Article in English | MEDLINE | ID: mdl-11127833

Subject(s)

Learning/physiology , Models, Neurological , Nerve Net/physiology , Neurons/physiology , Perception/physiology , Synapses/physiology , Animals , History, 20th Century , Humans , Nerve Net/cytology , Neurons/cytology

4.

SMEM algorithm for mixture models.

Ueda, N; Nakano, R; Ghahramani, Z; Hinton, G E.

Neural Comput ; 12(9): 2109-28, 2000 Sep.

Article in English | MEDLINE | ID: mdl-10976141

ABSTRACT

We present a split-and-merge expectation-maximization (SMEM) algorithm to overcome the local maxima problem in parameter estimation of finite mixture models. In the case of mixture models, local maxima often involve having too many components of a mixture model in one part of the space and too few in another, widely separated part of the space. To escape from such configurations, we repeatedly perform simultaneous split-and-merge operations using a new criterion for efficiently selecting the split-and-merge candidates. We apply the proposed algorithm to the training of gaussian mixtures and mixtures of factor analyzers using synthetic and real data and show the effectiveness of using the split-and-merge operations to improve the likelihood of both the training data and of held-out test data. We also show the practical usefulness of the proposed algorithm by applying it to image compression and pattern recognition problems.

Subject(s)

Algorithms , Pattern Recognition, Automated , Image Processing, Computer-Assisted , Likelihood Functions , Models, Neurological , Models, Statistical , Pattern Recognition, Visual

5.

Variational learning for switching state-space models.

Ghahramani, Z; Hinton, G E.

Neural Comput ; 12(4): 831-64, 2000 Apr.

Article in English | MEDLINE | ID: mdl-10770834

ABSTRACT

We introduce a new statistical model for time series that iteratively segments data into regimes with approximately linear dynamics and learnsthe parameters of each of these linear regimes. This model combines and generalizes two of the most widely used stochastic time-series models -- hidden Markov models and linear dynamical systems -- and is closely related to models that are widely used in the control and econometrics literatures. It can also be derived by extending the mixture of experts neural network (Jacobs, Jordan, Nowlan, & Hinton, 1991) to its fully dynamical version, in which both expert and gating networks are recurrent. Inferring the posterior probabilities of the hidden states of this model is computationally intractable, and therefore the exact expectation maximization (EM) algorithm cannot be applied. However, we present a variational approximation that maximizes a lower bound on the log-likelihood and makes use of both the forward and backward recursions for hidden Markov models and the Kalman filter recursions for linear dynamical systems. We tested the algorithm on artificial data sets and a natural data set of respiration force from a patient with sleep apnea. The results suggest that variational approximations are a viable method for inference and learning in switching state-space models.

Subject(s)

Computer Simulation , Algorithms , Artificial Intelligence , Humans , Linear Models , Markov Chains , Models, Econometric , Models, Statistical , Neural Networks, Computer , Respiratory Mechanics/physiology , Sleep Apnea Syndromes/physiopathology , Stochastic Processes

6.

Variational learning in nonlinear gaussian belief networks.

Frey, B J; Hinton, G E.

Neural Comput ; 11(1): 193-213, 1999 Jan 01.

Article in English | MEDLINE | ID: mdl-9950729

ABSTRACT

We view perceptual tasks such as vision and speech recognition as inference problems where the goal is to estimate the posterior distribution over latent variables (e.g., depth in stereo vision) given the sensory input. The recent flurry of research in independent component analysis exemplifies the importance of inferring the continuous-valued latent variables of input data. The latent variables found by this method are linearly related to the input, but perception requires nonlinear inferences such as classification and depth estimation. In this article, we present a unifying framework for stochastic neural networks with nonlinear latent variables. Nonlinear units are obtained by passing the outputs of linear gaussian units through various nonlinearities. We present a general variational method that maximizes a lower bound on the likelihood of a training set and give results on two visual feature extraction problems. We also show how the variational method can be used for pattern classification and compare the performance of these nonlinear networks with other methods on the problem of handwritten digit recognition.

Subject(s)

Learning/physiology , Neural Networks, Computer , Nonlinear Dynamics , Depth Perception/physiology , Handwriting , Pattern Recognition, Automated , Pattern Recognition, Visual/physiology , Stochastic Processes

7.

Cascaded redundancy reduction.

de Sa, V R; Hinton, G E.

Network ; 9(1): 73-84, 1998 Feb.

Article in English | MEDLINE | ID: mdl-9861979

ABSTRACT

We describe a method for incrementally constructing a hierarchical generative model of an ensemble of binary data vectors. The model is composed of stochastic, binary, logistic units. Hidden units are added to the model one at a time with the goal of minimizing the information required to describe the data vectors using the model. In addition to the top-down generative weights that define the model, there are bottom-up recognition weights that determine the binary states of the hidden units given a data vector. Even though the stochastic generative model can produce each data vector in many ways, the recognition model is forced to pick just one of these ways. The recognition model therefore underestimates the ability of the generative model to predict the data, but this underestimation greatly simplifies the process of searching for the generative and recognition weights of a new hidden unit.

Subject(s)

Neural Networks, Computer , Algorithms , Stochastic Processes

8.

A comparison of statistical learning methods on the Gusto database.

Ennis, M; Hinton, G; Naylor, D; Revow, M; Tibshirani, R.

Stat Med ; 17(21): 2501-8, 1998 Nov 15.

Article in English | MEDLINE | ID: mdl-9819841

ABSTRACT

We apply a battery of modern, adaptive non-linear learning methods to a large real database of cardiac patient data. We use each method to predict 30 day mortality from a large number of potential risk factors, and we compare their performances. We find that none of the methods could outperform a relatively simple logistic regression model previously developed for this problem.

Subject(s)

Databases as Topic , Logistic Models , Myocardial Infarction/drug therapy , Myocardial Infarction/mortality , Neural Networks, Computer , Thrombolytic Therapy , Humans , Risk Factors , Survival Rate

9.

Aspartylglucosaminuria in a Canadian family.

Gordon, B A; Rupar, C A; Rip, J W; Haust, M D; Coulter-Mackie, M B; Scott, E; Hinton, G G.

Clin Invest Med ; 21(3): 114-23, 1998 Jun.

Article in English | MEDLINE | ID: mdl-9627765

ABSTRACT

Aspartylglucosaminuria (McKusick 208400) is a lysosomopathy associated with aspartylglucosaminidase (L-aspartamido-beta-N-acetylglucosamine amidohydrolase, EC 3.5.1.26) deficiency. It has been most frequently encountered in Finland, where the regional incidence may be as high as 1 in 3600 births. In North America it is very rare, having been reported in only 8 patients. We encountered 4 patients with aspartylglucosaminuria in a Canadian family of 12 siblings. The 4 siblings affected--2 brothers and 2 sisters--were apparently normal at birth; however, their developmental milestones, particularly speech, were slow, and they acquired only a simple vocabulary. Throughout life, there was a progressive coarsening of facial features; 3 had inguinal hernia and recurrent diarrhea; all became severely retarded and by the 4th decade showed evident deterioration of both cognitive and motor skills; 2 exhibited cyclical behavioural changes. Three of the siblings have died, at 33, 39 and 44 years of age. Two died of bronchopneumonia and 1 of asphyxiation following aspiration. In the urine of all 4 siblings, and in the 1 liver examined, we found 2-acetamido-1-N-(4-L-aspartyl)-2-deoxy-beta-D-glucosamine (GlcNAc-Asn) and alpha-D-mannose-(1,6)-beta-D-mannose-(1,4)-2-acetamido- 2-deoxy-beta-D-glucose-(1,4)-2-acetamido-1-N-(4-L-aspartyl)-2-deoxy-beta - D-glucosamine (Man2-GlcNAc2-Asn). Compared with the level of activity in controls, aspartylglucosaminidase activity was less than 2% in fibroblasts from 3 of the siblings, less than 0.5% in leukocytes from 1 sibling, and less than 1% in the liver of 1 sibling, whereas other acid hydrolase activities in these tissues were normal. Ultrastructural studies of skin showed that fibroblasts, endothelial cells and pericytes contained vacuoles with fine reticulo-floccular material. Glial and neuronal cells of the central nervous system showed similar inclusions as well as others composed of concentric or parallel membranous arrays intermingled with lipid droplets.

Subject(s)

Acetylglucosamine/analogs & derivatives , Aspartylglucosaminuria , Lysosomal Storage Diseases/genetics , Acetylglucosamine/urine , Adult , Aspartylglucosylaminase/genetics , Canada , Child , Female , Humans , Lysosomal Storage Diseases/urine , Male , Middle Aged , Pedigree

10.

Glove-TalkII--a neural-network interface which maps gestures to parallel formant speech synthesizer controls.

Fels, S S; Hinton, G E.

IEEE Trans Neural Netw ; 9(1): 205-12, 1998.

Article in English | MEDLINE | ID: mdl-18252442

ABSTRACT

Glove-TalkII is a system which translates hand gestures to speech through an adaptive interface. Hand gestures are mapped continuously to ten control parameters of a parallel formant speech synthesizer. The mapping allows the hand to act as an artificial vocal tract that produces speech in real time. This gives an unlimited vocabulary in addition to direct control of fundamental frequency and volume. Currently, the best version of Glove-TalkII uses several input devices (including a Cyberglove, a ContactGlove, a three-space tracker, and a foot pedal), a parallel formant speech synthesizer, and three neural networks. The gesture-to-speech task is divided into vowel and consonant production by using a gating network to weight the outputs of a vowel and a consonant neural network. The gating network and the consonant network are trained with examples from the user. The vowel network implements a fixed user-defined relationship between hand position and vowel sound and does not require any training examples from the user. Volume, fundamental frequency, and stop consonants are produced with a fixed mapping from the input devices. One subject has trained to speak intelligibly with Glove-TalkII. He speaks slowly but with far more natural sounding pitch variations than a text-to-speech synthesizer.

11.

Generative models for discovering sparse distributed representations.

Hinton, G E; Ghahramani, Z.

Philos Trans R Soc Lond B Biol Sci ; 352(1358): 1177-90, 1997 Aug 29.

Article in English | MEDLINE | ID: mdl-9304685

ABSTRACT

We describe a hierarchical, generative model that can be viewed as a nonlinear generalization of factor analysis and can be implemented in a neural network. The model uses bottom-up, top-down and lateral connections to perform Bayesian perceptual inference correctly. Once perceptual inference has been performed the connection strengths can be updated using a very simple learning rule that only requires locally available information. We demonstrate that the network learns to extract sparse, distributed, hierarchical representations.

Subject(s)

Logistic Models , Neural Networks, Computer , Perception , Algorithms , Cerebral Cortex/physiology , Humans , Normal Distribution , Sleep , Wakefulness

12.

Modeling the manifolds of images of handwritten digits.

Hinton, G E; Dayan, P; Revow, M.

IEEE Trans Neural Netw ; 8(1): 65-74, 1997.

Article in English | MEDLINE | ID: mdl-18255611

ABSTRACT

This paper describes two new methods for modeling the manifolds of digitized images of handwritten digits. The models allow a priori information about the structure of the manifolds to be combined with empirical data. Accurate modeling of the manifolds allows digits to be discriminated using the relative probability densities under the alternative models. One of the methods is grounded in principal components analysis, the other in factor analysis. Both methods are based on locally linear low-dimensional approximations to the underlying data manifold. Links with other methods that model the manifold are discussed.

13.

Glove-talk II - a neural-network interface which maps gestures to parallel formant speech synthesizer controls.

Fels, S S; Hinton, G E.

IEEE Trans Neural Netw ; 8(5): 977-84, 1997.

Article in English | MEDLINE | ID: mdl-18255700

ABSTRACT

Glove-Talk II is a system which translates hand gestures to speech through an adaptive interface. Hand gestures are mapped continuously to ten control parameters of a parallel formant speech synthesizer. The mapping allows the hand to act as an artificial vocal tract that produces speech in real time. This gives an unlimited vocabulary in addition to direct control of fundamental frequency and volume. Currently, the best version of Glove-Talk II uses several input devices, a parallel formant speech synthesizer, and three neural networks. The gesture-to-speech task is divided into vowel and consonant production by using a gating network to weight the outputs of a vowel and a consonant neural network. The gating network and the consonant network are trained with examples from the user. The vowel network implements a fixed user-defined relationship between hand position and vowel sound and does not require any training examples from the user. Volume, fundamental frequency, and stop consonants are produced with a fixed mapping from the input devices. With Glove-Talk II, the subject can speak slowly but with far more natural sounding pitch variations than a text-to-speech synthesizer.

14.

The Helmholtz machine.

Dayan, P; Hinton, G E; Neal, R M; Zemel, R S.

Neural Comput ; 7(5): 889-904, 1995 Sep.

Article in English | MEDLINE | ID: mdl-7584891

ABSTRACT

Discovering the structure inherent in a set of patterns is a fundamental aim of statistical inference or learning. One fruitful approach is to build a parameterized stochastic generative model, independent draws from which are likely to produce the patterns. For all but the simplest generative models, each pattern can be generated in exponentially many ways. It is thus intractable to adjust the parameters to maximize the probability of the observed patterns. We describe a way of finessing this combinatorial explosion by maximizing an easily computed lower bound on the probability of the observations. Our method can be viewed as a form of hierarchical self-supervised learning that may relate to the function of bottom-up and top-down cortical processing pathways.

Subject(s)

Pattern Recognition, Automated , Algorithms , Feedback , Humans , Models, Psychological , Pattern Recognition, Visual , Perception/physiology , Stochastic Processes

15.

The "wake-sleep" algorithm for unsupervised neural networks.

Hinton, G E; Dayan, P; Frey, B J; Neal, R M.

Science ; 268(5214): 1158-61, 1995 May 26.

Article in English | MEDLINE | ID: mdl-7761831

ABSTRACT

An unsupervised learning algorithm for a multilayer network of stochastic neurons is described. Bottom-up "recognition" connections convert the input into representations in successive hidden layers, and top-down "generative" connections reconstruct the representation in one layer from the representation in the layer above. In the "wake" phase, neurons are driven by recognition connections, and generative connections are adapted to increase the probability that they would reconstruct the correct activity vector in the layer below. In the "sleep" phase, neurons are driven by generative connections, and recognition connections are adapted to increase the probability that they would produce the correct activity vector in the layer above.

Subject(s)

Algorithms , Neural Networks, Computer , Probability , Stochastic Processes

16.

Simulating brain damage.

Hinton, G E; Plaut, D C; Shallice, T.

Sci Am ; 269(4): 76-82, 1993 Oct.

Article in English | MEDLINE | ID: mdl-8235551

Subject(s)

Brain Injuries/physiopathology , Computer Simulation , Nerve Net , Brain/physiology , Humans , Neurons/physiology

17.

Successful treatment of hereditary trembling chin with botulinum toxin.

Gordon, K; Cadera, W; Hinton, G.

J Child Neurol ; 8(2): 154-6, 1993 Apr.

Article in English | MEDLINE | ID: mdl-8505478

ABSTRACT

Hereditary trembling chin is an autosomal dominant condition characterized by recurrent bouts of tremor involving the chin. These episodes are precipitated by emotional upset. There has been considerable debate about the gravity of this condition. This may be a benign movement disorder; however, the rhythmic trembling of mentalis at rest or during times of stress in these patients is often misinterpreted as betraying an incipient emotional upset. For this reason, some patients with this condition may find it socially disabling. We have recently successfully treated one such family with regular botulinum toxin injections to the mentalis muscle.

Subject(s)

Botulinum Toxins/administration & dosage , Chin , Tremor/genetics , Adult , Child , Humans , Injections, Intramuscular , Male , Neurologic Examination/drug effects , Pedigree , Tremor/drug therapy

18.

Glove-Talk: a neural network interface between a data-glove and a speech synthesizer.

Fels, S S; Hinton, G E.

IEEE Trans Neural Netw ; 4(1): 2-8, 1993.

Article in English | MEDLINE | ID: mdl-18267698

ABSTRACT

To illustrate the potential of multilayer neural networks for adaptive interfaces, a VPL Data-Glove connected to a DECtalk speech synthesizer via five neural networks was used to implement a hand-gesture to speech system. Using minor variations of the standard backpropagation learning procedure, the complex mapping of hand movements to speech is learned using data obtained from a single ;speaker' in a simple training phase. With a 203 gesture-to-word vocabulary, the wrong word is produced less than 1% of the time, and no word is produced about 5% of the time. Adaptive control of the speaking rate and word stress is also available. The training times and final performance speed are improved by using small, separate networks for each naturally defined subtask. The system demonstrates that neural networks can be used to develop the complex mappings required in a high bandwidth interface that adapts to the individual user.

19.

How neural networks learn from experience.

Hinton, G E.

Sci Am ; 267(3): 144-51, 1992 Sep.

Article in English | MEDLINE | ID: mdl-1502516

Subject(s)

Learning , Neural Networks, Computer , Algorithms , Brain/physiology , Humans

20.

Self-organizing neural network that discovers surfaces in random-dot stereograms.

Becker, S; Hinton, G E.

Nature ; 355(6356): 161-3, 1992 Jan 09.

Article in English | MEDLINE | ID: mdl-1729650

ABSTRACT

The standard form of back-propagation learning is implausible as a model of perceptual learning because it requires an external teacher to specify the desired output of the network. We show how the external teacher can be replaced by internally derived teaching signals. These signals are generated by using the assumption that different parts of the perceptual input have common causes in the external world. Small modules that look at separate but related parts of the perceptual input discover these common causes by striving to produce outputs that agree with each other. The modules may look at different modalities (such as vision and touch), or the same modality at different times (for example, the consecutive two-dimensional views of a rotating three-dimensional object), or even spatially adjacent parts of the same image. Our simulations show that when our learning procedure is applied to adjacent patches of two-dimensional images, it allows a neural network that has no prior knowledge of the third dimension to discovery depth in random dot stereograms of curved surfaces.

Subject(s)

Artificial Intelligence , Neural Networks, Computer , Mathematics

ABSTRACT

ABSTRACT

Subject(s)

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL