Pesquisa | Portal Regional da BVS

Distributed Newton Methods for Deep Neural Networks.

Wang, Chien-Chih; Tan, Kent Loong; Chen, Chun-Ting; Lin, Yu-Hsiang; Keerthi, S Sathiya; Mahajan, Dhruv; Sundararajan, S; Lin, Chih-Jen.

Neural Comput ; 30(6): 1673-1724, 2018 06.

Artigo em Inglês | MEDLINE | ID: mdl-29652589

RESUMO

Deep learning involves a difficult nonconvex optimization problem with a large number of weights between any two adjacent layers of a deep structure. To handle large data sets or complicated networks, distributed training is needed, but the calculation of function, gradient, and Hessian is expensive. In particular, the communication and the synchronization cost may become a bottleneck. In this letter, we focus on situations where the model is distributedly stored and propose a novel distributed Newton method for training deep neural networks. By variable and feature-wise data partitions and some careful designs, we are able to explicitly use the Jacobian matrix for matrix-vector products in the Newton method. Some techniques are incorporated to reduce the running time as well as memory consumption. First, to reduce the communication cost, we propose a diagonalization method such that an approximate Newton direction can be obtained without communication between machines. Second, we consider subsampled Gauss-Newton matrices for reducing the running time as well as the communication cost. Third, to reduce the synchronization cost, we terminate the process of finding an approximate Newton direction even though some nodes have not finished their tasks. Details of some implementation issues in distributed environments are thoroughly investigated. Experiments demonstrate that the proposed method is effective for the distributed training of deep neural networks. Compared with stochastic gradient methods, it is more robust and may give better test accuracy.

Support vector ordinal regression.

Chu, Wei; Keerthi, S Sathiya.

Neural Comput ; 19(3): 792-815, 2007 Mar.

Artigo em Inglês | MEDLINE | ID: mdl-17298234

RESUMO

In this letter, we propose two new support vector approaches for ordinal regression, which optimize multiple thresholds to define parallel discriminant hyperplanes for the ordinal scales. Both approaches guarantee that the thresholds are properly ordered at the optimal solution. The size of these optimization problems is linear in the number of training samples. The sequential minimal optimization algorithm is adapted for the resulting optimization problems; it is extremely easy to implement and scales efficiently as a quadratic function of the number of examples. The results of numerical experiments on some benchmark and real-world data sets, including applications of ordinal regression to information retrieval, verify the usefulness of these approaches.

Assuntos

Algoritmos , Inteligência Artificial , Modelos Logísticos , Reconhecimento Automatizado de Padrão/métodos , Análise Discriminante , Humanos , Armazenamento e Recuperação da Informação , Pesos e Medidas

Fast generalized cross-validation algorithm for sparse model learning.

Sundararajan, S; Shevade, Shirish; Keerthi, S Sathiya.

Neural Comput ; 19(1): 283-301, 2007 Jan.

Artigo em Inglês | MEDLINE | ID: mdl-17134326

RESUMO

We propose a fast, incremental algorithm for designing linear regression models. The proposed algorithm generates a sparse model by optimizing multiple smoothing parameters using the generalized cross-validation approach. The performances on synthetic and real-world data sets are compared with other incremental algorithms such as Tipping and Faul's fast relevance vector machine, Chen et al.'s orthogonal least squares, and Orr's regularized forward selection. The results demonstrate that the proposed algorithm is competitive.

Assuntos

Algoritmos , Inteligência Artificial , Modelos Lineares , Análise dos Mínimos Quadrados

An improved conjugate gradient scheme to the solution of least squares SVM.

Chu, Wei; Ong, Chong Jin; Keerthi, S Sathiya.

IEEE Trans Neural Netw ; 16(2): 498-501, 2005 Mar.

Artigo em Inglês | MEDLINE | ID: mdl-15787157

RESUMO

The least square support vector machines (LS-SVM) formulation corresponds to the solution of a linear system of equations. Several approaches to its numerical solutions have been proposed in the literature. In this letter, we propose an improved method to the numerical solution of LS-SVM and show that the problem can be solved using one reduced system of linear equations. Compared with the existing algorithm for LS-SVM, the approach used in this letter is about twice as efficient. Numerical results using the proposed method are provided for comparisons with other existing algorithms.

Assuntos

Análise dos Mínimos Quadrados

An efficient method for computing leave-one-out error in support vector machines with Gaussian kernels.

Lee, Martin M S; Keerthi, S Sathiya; Ong, Chong Jin; DeCoste, Dennis.

IEEE Trans Neural Netw ; 15(3): 750-7, 2004 May.

Artigo em Inglês | MEDLINE | ID: mdl-15384561

RESUMO

In this paper, we give an efficient method for computing the leave-one-out (LOO) error for support vector machines (SVMs) with Gaussian kernels quite accurately. It is particularly suitable for iterative decomposition methods of solving SVMs. The importance of various steps of the method is illustrated in detail by showing the performance on six benchmark datasets. The new method often leads to speedups of 10-50 times compared to standard LOO error computation. It has good promise for use in hyperparameter tuning and model comparison

Assuntos

Metodologias Computacionais , Distribuição Normal , Projetos de Pesquisa/estatística & dados numéricos

Bayesian support vector regression using a unified loss function.

Chu, Wei; Keerthi, S Sathiya; Ong, Chong Jin.

IEEE Trans Neural Netw ; 15(1): 29-44, 2004 Jan.

Artigo em Inglês | MEDLINE | ID: mdl-15387245

RESUMO

In this paper, we use a unified loss function, called the soft insensitive loss function, for Bayesian support vector regression. We follow standard Gaussian processes for regression to set up the Bayesian framework, in which the unified loss function is used in the likelihood evaluation. Under this framework, the maximum a posteriori estimate of the function values corresponds to the solution of an extended support vector regression problem. The overall approach has the merits of support vector regression such as convex quadratic programming and sparsity in solution representation. It also has the advantages of Bayesian methods for model adaptation and error bars of its predictions. Experimental results on simulated and real-world data sets indicate that the approach works well even on large data sets.

Assuntos

Teorema de Bayes , Análise de Regressão

Asymptotic behaviors of support vector machines with Gaussian kernel.

Keerthi, S Sathiya; Lin, Chih-Jen.

Neural Comput ; 15(7): 1667-89, 2003 Jul.

Artigo em Inglês | MEDLINE | ID: mdl-12816571

RESUMO

Support vector machines (SVMs) with the gaussian (RBF) kernel have been popular for practical use. Model selection in this class of SVMs involves two hyperparameters: the penalty parameter C and the kernel width sigma. This letter analyzes the behavior of the SVM classifier when these hyperparameters take very small or very large values. Our results help in understanding the hyperparameter space that leads to an efficient heuristic method of searching for hyperparameter values with small generalization errors. The analysis also indicates that if complete model selection using the gaussian kernel has been conducted, there is no need to consider linear SVM.

Assuntos

Modelos Teóricos , Distribuição Normal

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA