Shuffling-type gradient method with bandwidth-based step sizes for finite-sum optimization.
Liang, Yuqing; Yang, Yang; Liu, Jinlan; Xu, Dongpo.
Affiliation
  • Liang Y; Key Laboratory for Applied Statistics of MOE, School of Mathematics and Statistics, Northeast Normal University, Changchun 130024, China.
  • Yang Y; Key Laboratory for Applied Statistics of MOE, School of Mathematics and Statistics, Northeast Normal University, Changchun 130024, China.
  • Liu J; Department of Mathematics, Changchun Normal University, Changchun 130032, China. Electronic address: liujinlan@ccsfu.edu.cn.
  • Xu D; Key Laboratory for Applied Statistics of MOE, School of Mathematics and Statistics, Northeast Normal University, Changchun 130024, China. Electronic address: xudp100@nenu.edu.cn.
Neural Netw; 179: 106514, 2024 Nov.
Article in En | MEDLINE | ID: mdl-39024708
ABSTRACT
The shuffling-type gradient method is a popular machine learning algorithm that solves finite-sum optimization problems by randomly shuffling the samples during iterations. In this paper, we explore the convergence properties of the shuffling-type gradient method under mild assumptions. Specifically, we employ a bandwidth-based step size strategy that covers both monotonic and non-monotonic step sizes, thereby providing a unified convergence guarantee in terms of the step size. Additionally, we replace the lower-bound assumption on the objective function with one on the loss function, thereby eliminating the restrictions on the variance and the second-order moment of the stochastic gradient that are difficult to verify in practice. For non-convex objectives, we recover the last-iterate convergence of the shuffling-type gradient algorithm with a less cumbersome proof, and we also establish a convergence rate for the minimum gradient norm over the iterations. Under the Polyak-Łojasiewicz (PL) condition, we prove that the function value of the last iterate converges to the lower bound of the objective function. By selecting appropriate boundary functions, we further improve previous sublinear convergence rate results. Overall, this paper contributes to the understanding of the shuffling-type gradient method and its convergence properties, providing insights for optimizing finite-sum problems in machine learning. Finally, numerical experiments demonstrate the efficiency of the shuffling-type gradient method with bandwidth-based step sizes and validate our theoretical results.
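To make the setting concrete, below is a minimal sketch of a random-reshuffling gradient loop whose step size is kept inside a band between lower and upper boundary functions. The function name shuffling_gradient, the illustrative 1/t boundary functions, and the per-epoch uniform draw of the step size are assumptions made for this demo; they are not the exact step-size rule or analysis from the paper.

```python
import numpy as np

def shuffling_gradient(grads, x0, n_epochs=50,
                       lower=lambda t: 0.5 / t, upper=lambda t: 2.0 / t,
                       seed=0):
    """Random-reshuffling gradient sketch with a bandwidth-based step size.

    grads        : list of per-sample gradient functions g_i(x) for the
                   finite sum f(x) = (1/n) * sum_i f_i(x)
    lower, upper : boundary functions; the step size used in epoch t is
                   drawn from the band [lower(t), upper(t)] (illustrative,
                   possibly non-monotonic across epochs)
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    n = len(grads)
    for t in range(1, n_epochs + 1):
        # step size for this epoch, kept inside the bandwidth [lower(t), upper(t)]
        eta = rng.uniform(lower(t), upper(t))
        perm = rng.permutation(n)          # reshuffle the samples once per epoch
        for i in perm:                     # one incremental pass over the data
            x = x - (eta / n) * grads[i](x)
    return x

# Toy usage (hypothetical data): least squares with f_i(x) = 0.5 * (a_i @ x - b_i)^2
rng = np.random.default_rng(1)
A, b = rng.standard_normal((100, 5)), rng.standard_normal(100)
grads = [lambda x, a=A[i], bi=b[i]: a * (a @ x - bi) for i in range(100)]
x_hat = shuffling_gradient(grads, np.zeros(5))
```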
Subjects
Keywords

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Algorithms / Machine Learning Language: En Journal: Neural Netw / Neural netw / Neural networks Journal subject: Neurology Year of publication: 2024 Document type: Article Country of affiliation: China Country of publication: United States
