Search | VHL Regional Portal

1.

Adaptive importance sampling to accelerate training of a neural probabilistic language model.

Bengio, Y; Senecal, J S.

IEEE Trans Neural Netw ; 19(4): 713-22, 2008 Apr.

Article in English | MEDLINE | ID: mdl-18390314

ABSTRACT

Previous work on statistical language modeling has shown that it is possible to train a feedforward neural network to approximate probabilities over sequences of words, resulting in significant error reduction when compared to standard baseline models based on n-grams. However, training the neural network model with the maximum-likelihood criterion requires computations proportional to the number of words in the vocabulary. In this paper, we introduce adaptive importance sampling as a way to accelerate training of the model. The idea is to use an adaptive n-gram model to track the conditional distributions produced by the neural network. We show that a very significant speedup can be obtained on standard problems.

Subject(s)

Language , Models, Statistical , Neural Networks, Computer , Programming Languages , Computer Simulation , Humans , Markov Chains

2.

Locally linear embedding for dimensionality reduction in QSAR.

L'Heureux, P J; Carreau, J; Bengio, Y; Delalleau, O; Yue, S Y.

J Comput Aided Mol Des ; 18(7-9): 475-82, 2004.

Article in English | MEDLINE | ID: mdl-15729847

ABSTRACT

Current practice in Quantitative Structure Activity Relationship (QSAR) methods usually involves generating a great number of chemical descriptors and then cutting them back with variable selection techniques. Variable selection is an effective method to reduce the dimensionality but may discard some valuable information. This paper introduces Locally Linear Embedding (LLE), a local non-linear dimensionality reduction technique, that can statistically discover a low-dimensional representation of the chemical data. LLE is shown to create more stable representations than other non-linear dimensionality reduction algorithms, and to be capable of capturing non-linearity in chemical data.

Subject(s)

Quantitative Structure-Activity Relationship , Artificial Intelligence , Humans , Learning , Least-Squares Analysis , Neural Networks, Computer

3.

Bias learning, knowledge sharing.

Ghosn, J; Bengio, Y.

IEEE Trans Neural Netw ; 14(4): 748-65, 2003.

Article in English | MEDLINE | ID: mdl-18238057

ABSTRACT

Biasing properly the hypothesis space of a learner has been shown to improve generalization performance. Methods for achieving this goal have been proposed, that range from designing and introducing a bias into a learner to automatically learning the bias. Multitask learning methods fall into the latter category. When several related tasks derived from the same domain are available, these methods use the domain-related knowledge coded in the training examples of all the tasks as a source of bias. We extend some of the ideas presented in this field and describe a new approach that identifies a family of hypotheses, represented by a manifold in hypothesis space, that embodies domain-related knowledge. This family is learned using training examples sampled from a group of related tasks. Learning models trained on these tasks are only allowed to select hypotheses that belong to the family. We show that the new approach encompasses a large variety of families which can be learned. A statistical analysis on a class of related tasks is performed that shows significantly improved performances when using this approach.

4.

Experiments on the application of IOHMMs to model financial returns series.

Bengio, Y; Lauzon, V P; Ducharme, R.

IEEE Trans Neural Netw ; 12(1): 113-23, 2001.

Article in English | MEDLINE | ID: mdl-18244367

ABSTRACT

Input-output hidden Markov models (IOHMM) are conditional hidden Markov models in which the emission (and possibly the transition) probabilities can be conditioned on an input sequence. For example, these conditional distributions can be linear, logistic, or nonlinear (using for example multilayer neural networks). We compare the generalization performance of several models which are special cases of input-output hidden Markov models on financial time-series prediction tasks: an unconditional Gaussian, a conditional linear Gaussian, a mixture of Gaussians, a mixture of conditional linear Gaussians, a hidden Markov model, and various IOHMMs. The experiments compare these models on predicting the conditional density of returns of market and sector indices. Note that the unconditional Gaussian estimates the first moment with the historical average. The results show that, although for the first moment the historical average gives the best results, for the higher moments, the IOHMMs yielded significantly better performance, as estimated by the out-of-sample likelihood.

5.

Cost functions and model combination for VaR-based asset allocation using neural networks.

Chapados, N; Bengio, Y.

IEEE Trans Neural Netw ; 12(4): 890-906, 2001.

Article in English | MEDLINE | ID: mdl-18249920

ABSTRACT

We introduce an asset-allocation framework based on the active control of the value-at-risk of the portfolio. Within this framework, we compare two paradigms for making the allocation using neural networks. The first one uses the network to make a forecast of asset behavior, in conjunction with a traditional mean-variance allocator for constructing the portfolio. The second paradigm uses the network to directly make the portfolio allocation decisions. We consider a method for performing soft input variable selection, and show its considerable utility. We use model combination (committee) methods to systematize the choice of hyperparameters during training. We show that committees using both paradigms are significantly outperforming the benchmark market performance.

6.

Boosting neural networks.

Schwenk, H; Bengio, Y.

Neural Comput ; 12(8): 1869-87, 2000 Aug.

Article in English | MEDLINE | ID: mdl-10953242

ABSTRACT

Boosting is a general method for improving the performance of learning algorithms. A recently proposed boosting algorithm, AdaBoost, has been applied with great success to several benchmark machine learning problems using mainly decision trees as base classifiers. In this article we investigate whether AdaBoost also works as well with neural networks, and we discuss the advantages and drawbacks of different versions of the AdaBoost algorithm. In particular, we compare training methods based on sampling the training set and weighting the cost function. The results suggest that random resampling of the training data is not the main explanation of the success of the improvements brought by AdaBoost. This is in contrast to bagging, which directly aims at reducing variance and for which random resampling is essential to obtain the reduction in generalization error. Our system achieves about 1.4% error on a data set of on-line handwritten digits from more than 200 writers. A boosted multilayer network achieved 1.5% error on the UCI letters and 8.1% error on the UCI satellite data set, which is significantly better than boosted decision trees.

Subject(s)

Algorithms , Neural Networks, Computer , Handwriting

7.

Gradient-based optimization of hyperparameters.

Bengio, Y.

Neural Comput ; 12(8): 1889-900, 2000 Aug.

Article in English | MEDLINE | ID: mdl-10953243

ABSTRACT

Many machine learning algorithms can be formulated as the minimization of a training criterion that involves a hyperparameter. This hyperparameter is usually chosen by trial and error with a model selection criterion. In this article we present a methodology to optimize several hyperparameters, based on the computation of the gradient of a model selection criterion with respect to the hyperparameters. In the case of a quadratic training criterion, the gradient of the selection criterion with respect to the hyperparameters is efficiently computed by backpropagating through a Cholesky decomposition. In the more general case, we show that the implicit function theorem can be used to derive a formula for the hyperparameter gradient involving second derivatives of the training criterion.

Subject(s)

Algorithms , Artificial Intelligence

8.

Taking on the curse of dimensionality in joint distributions using neural networks.

Bengio, S; Bengio, Y.

IEEE Trans Neural Netw ; 11(3): 550-7, 2000.

Article in English | MEDLINE | ID: mdl-18249784

ABSTRACT

The curse of dimensionality is severe when modeling high-dimensional discrete data: the number of possible combinations of the variables explodes exponentially. In this paper, we propose a new architecture for modeling high-dimensional data that requires resources (parameters and computations) that grow at most as the square of the number of variables, using a multilayer neural network to represent the joint distribution of the variables as the product of conditional distributions. The neural network can be interpreted as a graphical model without hidden random variables, but in which the conditional distributions are tied through the hidden units. The connectivity of the neural network can be pruned by using dependency tests between the variables (thus reducing significantly the number of parameters). Experiments on modeling the distribution of several discrete data sets show statistically significant improvements over other methods such as naive Bayes and comparable Bayesian networks and show that significant improvements can be obtained by pruning the network.

9.

Using a financial training criterion rather than a prediction criterion.

Bengio, Y.

Int J Neural Syst ; 8(4): 433-43, 1997 Aug.

Article in English | MEDLINE | ID: mdl-9730019

ABSTRACT

The application of this work is to decision making with financial time series, using learning algorithms. The traditional approach is to train a model using a prediction criterion, such as minimizing the squared error between predictions and actual values of a dependent variable, or maximizing the likelihood of a conditional model of the dependent variable. We find here with noisy time series that better results can be obtained when the model is directly trained in order to maximize the financial criterion of interest, here gains and losses (including those due to transactions) incurred during trading. Experiments were performed on portfolio selection with 35 Canadian stocks.

Subject(s)

Artificial Intelligence , Models, Economic , Neural Networks, Computer , Algorithms

10.

Input-output HMMs for sequence processing.

Bengio, Y; Frasconi, P.

IEEE Trans Neural Netw ; 7(5): 1231-49, 1996.

Article in English | MEDLINE | ID: mdl-18263517

ABSTRACT

We consider problems of sequence processing and propose a solution based on a discrete-state model in order to represent past context. We introduce a recurrent connectionist architecture having a modular structure that associates a subnetwork to each state. The model has a statistical interpretation we call input-output hidden Markov model (IOHMM). It can be trained by the estimation-maximization (EM) or generalized EM (GEM) algorithms, considering state trajectories as missing data, which decouples temporal credit assignment and actual parameter estimation. The model presents similarities to hidden Markov models (HMMs), but allows us to map input sequences to output sequences, using the same processing style as recurrent neural networks. IOHMMs are trained using a more discriminant learning paradigm than HMMs, while potentially taking advantage of the EM algorithm. We demonstrate that IOHMMs are well suited for solving grammatical inference problems on a benchmark problem. Experimental results are presented for the seven Tomita grammars, showing that these adaptive models can attain excellent generalization.

11.

LeRec: a NN/HMM hybrid for on-line handwriting recognition.

Bengio, Y; LeCun, Y; Nohl, C; Burges, C.

Neural Comput ; 7(6): 1289-303, 1995 Nov.

Article in English | MEDLINE | ID: mdl-7584903

ABSTRACT

We introduce a new approach for on-line recognition of handwritten words written in unconstrained mixed style. The preprocessor performs a word-level normalization by fitting a model of the word structure using the EM algorithm. Words are then coded into low resolution "annotated images" where each pixel contains information about trajectory direction and curvature. The recognizer is a convolution network that can be spatially replicated. From the network output, a hidden Markov model produces word scores. The entire system is globally trained to minimize word-level errors.

Subject(s)

Algorithms , Handwriting , Neural Networks, Computer , Pattern Recognition, Automated , Humans , Markov Chains , Reproducibility of Results

12.

Learning long-term dependencies with gradient descent is difficult.

Bengio, Y; Simard, P; Frasconi, P.

IEEE Trans Neural Netw ; 5(2): 157-66, 1994.

Article in English | MEDLINE | ID: mdl-18267787

ABSTRACT

Recurrent neural networks can be used to map input sequences to output sequences, such as for recognition, production or prediction problems. However, practical difficulties have been reported in training recurrent neural networks to perform tasks in which the temporal contingencies present in the input/output sequences span long intervals. We show why gradient based learning algorithms face an increasingly difficult problem as the duration of the dependencies to be captured increases. These results expose a trade-off between efficient learning by gradient descent and latching on information for long periods. Based on an understanding of this problem, alternatives to standard gradient descent are considered.

13.

Global optimization of a neural network-hidden Markov model hybrid.

Bengio, Y; De Mori, R; Flammia, G; Kompe, R.

IEEE Trans Neural Netw ; 3(2): 252-9, 1992.

Article in English | MEDLINE | ID: mdl-18276426

ABSTRACT

The integration of multilayered and recurrent artificial neural networks (ANNs) with hidden Markov models (HMMs) is addressed. ANNs are suitable for approximating functions that compute new acoustic parameters, whereas HMMs have been proven successful at modeling the temporal structure of the speech signal. In the approach described, the ANN outputs constitute the sequence of observation vectors for the HMM. An algorithm is proposed for global optimization of all the parameters. Results on speaker-independent recognition experiments using this integrated ANN-HMM system on the TIMIT continuous speech database are reported.

14.

Efficient recognition of immunoglobulin domains from amino acid sequences using a neural network.

Bengio, Y; Pouliot, Y.

Comput Appl Biosci ; 6(4): 319-24, 1990 Oct.

Article in English | MEDLINE | ID: mdl-2257492

ABSTRACT

A neural network was trained using back propagation to recognize immunoglobulin domains from amino acid sequences. The program was designed to identify proteins exhibiting such domains with minimal rates of false positives and false negatives. The National Biomedical Research Foundation NEW protein sequences database was scanned to evaluate the performance of the program in recognizing mouse immunoglobulin sequences. The program correctly recognized 55 out of 56 mouse immunoglobulin sequences, corresponding to a recognition efficiency of 98.2% with an overall false positive rate of 7.3%. These data demonstrate that neural network-based search programs are well suited to search for sequences characterized by only a few well-conserved subsequences.

Subject(s)

Algorithms , Artificial Intelligence , Immunoglobulins/chemistry , Amino Acid Sequence , Animals , Cattle , False Negative Reactions , Humans , Mice , Microcomputers , Molecular Sequence Data , Rats , Software

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL