Search | VHL Regional Portal

Distributed Newton Methods for Deep Neural Networks.

Wang, Chien-Chih; Tan, Kent Loong; Chen, Chun-Ting; Lin, Yu-Hsiang; Keerthi, S Sathiya; Mahajan, Dhruv; Sundararajan, S; Lin, Chih-Jen.

Neural Comput ; 30(6): 1673-1724, 2018 06.

Article in English | MEDLINE | ID: mdl-29652589

ABSTRACT

Deep learning involves a difficult nonconvex optimization problem with a large number of weights between any two adjacent layers of a deep structure. To handle large data sets or complicated networks, distributed training is needed, but the calculation of function, gradient, and Hessian is expensive. In particular, the communication and the synchronization cost may become a bottleneck. In this letter, we focus on situations where the model is distributedly stored and propose a novel distributed Newton method for training deep neural networks. By variable and feature-wise data partitions and some careful designs, we are able to explicitly use the Jacobian matrix for matrix-vector products in the Newton method. Some techniques are incorporated to reduce the running time as well as memory consumption. First, to reduce the communication cost, we propose a diagonalization method such that an approximate Newton direction can be obtained without communication between machines. Second, we consider subsampled Gauss-Newton matrices for reducing the running time as well as the communication cost. Third, to reduce the synchronization cost, we terminate the process of finding an approximate Newton direction even though some nodes have not finished their tasks. Details of some implementation issues in distributed environments are thoroughly investigated. Experiments demonstrate that the proposed method is effective for the distributed training of deep neural networks. Compared with stochastic gradient methods, it is more robust and may give better test accuracy.

Support vector ordinal regression.

Chu, Wei; Keerthi, S Sathiya.

Neural Comput ; 19(3): 792-815, 2007 Mar.

Article in English | MEDLINE | ID: mdl-17298234

ABSTRACT

In this letter, we propose two new support vector approaches for ordinal regression, which optimize multiple thresholds to define parallel discriminant hyperplanes for the ordinal scales. Both approaches guarantee that the thresholds are properly ordered at the optimal solution. The size of these optimization problems is linear in the number of training samples. The sequential minimal optimization algorithm is adapted for the resulting optimization problems; it is extremely easy to implement and scales efficiently as a quadratic function of the number of examples. The results of numerical experiments on some benchmark and real-world data sets, including applications of ordinal regression to information retrieval, verify the usefulness of these approaches.

Subject(s)

Algorithms , Artificial Intelligence , Logistic Models , Pattern Recognition, Automated/methods , Discriminant Analysis , Humans , Information Storage and Retrieval , Weights and Measures

Fast generalized cross-validation algorithm for sparse model learning.

Sundararajan, S; Shevade, Shirish; Keerthi, S Sathiya.

Neural Comput ; 19(1): 283-301, 2007 Jan.

Article in English | MEDLINE | ID: mdl-17134326

ABSTRACT

We propose a fast, incremental algorithm for designing linear regression models. The proposed algorithm generates a sparse model by optimizing multiple smoothing parameters using the generalized cross-validation approach. The performances on synthetic and real-world data sets are compared with other incremental algorithms such as Tipping and Faul's fast relevance vector machine, Chen et al.'s orthogonal least squares, and Orr's regularized forward selection. The results demonstrate that the proposed algorithm is competitive.

Subject(s)

Algorithms , Artificial Intelligence , Linear Models , Least-Squares Analysis

An improved conjugate gradient scheme to the solution of least squares SVM.

Chu, Wei; Ong, Chong Jin; Keerthi, S Sathiya.

IEEE Trans Neural Netw ; 16(2): 498-501, 2005 Mar.

Article in English | MEDLINE | ID: mdl-15787157

ABSTRACT

The least square support vector machines (LS-SVM) formulation corresponds to the solution of a linear system of equations. Several approaches to its numerical solutions have been proposed in the literature. In this letter, we propose an improved method to the numerical solution of LS-SVM and show that the problem can be solved using one reduced system of linear equations. Compared with the existing algorithm for LS-SVM, the approach used in this letter is about twice as efficient. Numerical results using the proposed method are provided for comparisons with other existing algorithms.

Subject(s)

Least-Squares Analysis

An efficient method for computing leave-one-out error in support vector machines with Gaussian kernels.

Lee, Martin M S; Keerthi, S Sathiya; Ong, Chong Jin; DeCoste, Dennis.

IEEE Trans Neural Netw ; 15(3): 750-7, 2004 May.

Article in English | MEDLINE | ID: mdl-15384561

ABSTRACT

In this paper, we give an efficient method for computing the leave-one-out (LOO) error for support vector machines (SVMs) with Gaussian kernels quite accurately. It is particularly suitable for iterative decomposition methods of solving SVMs. The importance of various steps of the method is illustrated in detail by showing the performance on six benchmark datasets. The new method often leads to speedups of 10-50 times compared to standard LOO error computation. It has good promise for use in hyperparameter tuning and model comparison

Subject(s)

Computing Methodologies , Normal Distribution , Research Design/statistics & numerical data

Bayesian support vector regression using a unified loss function.

Chu, Wei; Keerthi, S Sathiya; Ong, Chong Jin.

IEEE Trans Neural Netw ; 15(1): 29-44, 2004 Jan.

Article in English | MEDLINE | ID: mdl-15387245

ABSTRACT

In this paper, we use a unified loss function, called the soft insensitive loss function, for Bayesian support vector regression. We follow standard Gaussian processes for regression to set up the Bayesian framework, in which the unified loss function is used in the likelihood evaluation. Under this framework, the maximum a posteriori estimate of the function values corresponds to the solution of an extended support vector regression problem. The overall approach has the merits of support vector regression such as convex quadratic programming and sparsity in solution representation. It also has the advantages of Bayesian methods for model adaptation and error bars of its predictions. Experimental results on simulated and real-world data sets indicate that the approach works well even on large data sets.

Subject(s)

Bayes Theorem , Regression Analysis

Asymptotic behaviors of support vector machines with Gaussian kernel.

Keerthi, S Sathiya; Lin, Chih-Jen.

Neural Comput ; 15(7): 1667-89, 2003 Jul.

Article in English | MEDLINE | ID: mdl-12816571

ABSTRACT

Support vector machines (SVMs) with the gaussian (RBF) kernel have been popular for practical use. Model selection in this class of SVMs involves two hyperparameters: the penalty parameter C and the kernel width sigma. This letter analyzes the behavior of the SVM classifier when these hyperparameters take very small or very large values. Our results help in understanding the hyperparameter space that leads to an efficient heuristic method of searching for hyperparameter values with small generalization errors. The analysis also indicates that if complete model selection using the gaussian kernel has been conducted, there is no need to consider linear SVM.

Subject(s)

Models, Theoretical , Normal Distribution

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL