Pesquisa | Portal Regional da BVS

1.

Optimization and Learning With Randomly Compressed Gradient Updates.

Huang, Zhanliang; Lei, Yunwen; Kabán, Ata.

Neural Comput ; 35(7): 1234-1287, 2023 Jun 12.

Artigo em Inglês | MEDLINE | ID: mdl-37187168

RESUMO

Gradient descent methods are simple and efficient optimization algorithms with widespread applications. To handle high-dimensional problems, we study compressed stochastic gradient descent (SGD) with low-dimensional gradient updates. We provide a detailed analysis in terms of both optimization rates and generalization rates. To this end, we develop uniform stability bounds for CompSGD for both smooth and nonsmooth problems, based on which we develop almost optimal population risk bounds. Then we extend our analysis to two variants of SGD: batch and mini-batch gradient descent. Furthermore, we show that these variants achieve almost optimal rates compared to their high-dimensional gradient setting. Thus, our results provide a way to reduce the dimension of gradient updates without affecting the convergence rate in the generalization analysis. Moreover, we show that the same result also holds in the differentially private setting, which allows us to reduce the dimension of added noise with "almost free" cost.

2.

SGFNNs: Signed Graph Filtering-based Neural Networks for Predicting Drug-Drug Interactions.

Chen, Ming; Jiang, Wei; Pan, Yi; Dai, Jianhua; Lei, Yunwen; Ji, Chunyan.

J Comput Biol ; 29(10): 1104-1116, 2022 10.

Artigo em Inglês | MEDLINE | ID: mdl-35723646

RESUMO

Capturing comprehensive information about drug-drug interactions (DDIs) is one of the key tasks in public health and drug development. Recently, graph neural networks (GNNs) have received increasing attention in the drug discovery domain due to their capability of integrating drugs profiles and the network structure into a low-dimensional feature space for predicting links and classification. Most of GNN models for DDI predictions are built on an unsigned graph, which tends to represent associated nodes with similar embedding results. However, semantic correlation between drugs, such as degressive effects, or even adverse side reactions should be disassortative. In this study, we put forward signed GNNs to model assortative and disassortative relationships within drug pairs. Since negative links exclude direct generalization of spectral filters on unsigned graph, we divide the signed graph into two unsigned subgraphs to dedicate two spectral filters, which captures both commonality and difference of drug pairs. For drug representations we derive two signed graph filtering-based neural networks (SGFNNs) which integrate signed graph structures and drug node attributes. Moreover, we use an end-to-end framework for learning DDIs, where an SGFNN together with a discriminator is jointly trained under a problem-specific loss function. The experimental results on two prediction problems show that our framework can obtain significant improvements compared with baselines. The case study further verifies the validation of our method.

Assuntos

Descoberta de Drogas , Redes Neurais de Computação , Interações Medicamentosas , Semântica

3.

Learning Rates for Stochastic Gradient Descent With Nonconvex Objectives.

Lei, Yunwen; Tang, Ke.

IEEE Trans Pattern Anal Mach Intell ; 43(12): 4505-4511, 2021 Dec.

Artigo em Inglês | MEDLINE | ID: mdl-33755555

RESUMO

Stochastic gradient descent (SGD) has become the method of choice for training highly complex and nonconvex models since it can not only recover good solutions to minimize training errors but also generalize well. Computational and statistical properties are separately studied to understand the behavior of SGD in the literature. However, there is a lacking study to jointly consider the computational and statistical properties in a nonconvex learning setting. In this paper, we develop novel learning rates of SGD for nonconvex learning by presenting high-probability bounds for both computational and statistical errors. We show that the complexity of SGD iterates grows in a controllable manner with respect to the iteration number, which sheds insights on how an implicit regularization can be achieved by tuning the number of passes to balance the computational and statistical errors. As a byproduct, we also slightly refine the existing studies on the uniform convergence of gradients by showing its connection to Rademacher chaos complexities.

4.

Stochastic Gradient Descent for Nonconvex Learning Without Bounded Gradient Assumptions.

Lei, Yunwen; Hu, Ting; Li, Guiying; Tang, Ke.

IEEE Trans Neural Netw Learn Syst ; 31(10): 4394-4400, 2020 Oct.

Artigo em Inglês | MEDLINE | ID: mdl-31831449

RESUMO

Stochastic gradient descent (SGD) is a popular and efficient method with wide applications in training deep neural nets and other nonconvex models. While the behavior of SGD is well understood in the convex learning setting, the existing theoretical results for SGD applied to nonconvex objective functions are far from mature. For example, existing results require to impose a nontrivial assumption on the uniform boundedness of gradients for all iterates encountered in the learning process, which is hard to verify in practical implementations. In this article, we establish a rigorous theoretical foundation for SGD in nonconvex learning by showing that this boundedness assumption can be removed without affecting convergence rates, and relaxing the standard smoothness assumption to Hölder continuity of gradients. In particular, we establish sufficient conditions for almost sure convergence as well as optimal convergence rates for SGD applied to both general nonconvex and gradient-dominated objective functions. A linear convergence is further derived in the case with zero variances.

5.

Analysis of Online Composite Mirror Descent Algorithm.

Lei, Yunwen; Zhou, Ding-Xuan.

Neural Comput ; 29(3): 825-860, 2017 03.

Artigo em Inglês | MEDLINE | ID: mdl-28095196

RESUMO

We study the convergence of the online composite mirror descent algorithm, which involves a mirror map to reflect the geometry of the data and a convex objective function consisting of a loss and a regularizer possibly inducing sparsity. Our error analysis provides convergence rates in terms of properties of the strongly convex differentiable mirror map and the objective function. For a class of objective functions with Hölder continuous gradients, the convergence rates of the excess (regularized) risk under polynomially decaying step sizes have the order [Formula: see text] after [Formula: see text] iterates. Our results improve the existing error analysis for the online composite mirror descent algorithm by avoiding averaging and removing boundedness assumptions, and they sharpen the existing convergence rates of the last iterate for online gradient descent without any boundedness assumptions. Our methodology mainly depends on a novel error decomposition in terms of an excess Bregman distance, refined analysis of self-bounding properties of the objective function, and the resulting one-step progress bounds.

6.

Generalization performance of radial basis function networks.

Lei, Yunwen; Ding, Lixin; Zhang, Wensheng.

IEEE Trans Neural Netw Learn Syst ; 26(3): 551-64, 2015 Mar.

Artigo em Inglês | MEDLINE | ID: mdl-25720010

RESUMO

This paper studies the generalization performance of radial basis function (RBF) networks using local Rademacher complexities. We propose a general result on controlling local Rademacher complexities with the L1 -metric capacity. We then apply this result to estimate the RBF networks' complexities, based on which a novel estimation error bound is obtained. An effective approximation error bound is also derived by carefully investigating the Hölder continuity of the lp loss function's derivative. Furthermore, it is demonstrated that the RBF network minimizing an appropriately constructed structural risk admits a significantly better learning rate when compared with the existing results. An empirical study is also performed to justify the application of our structural risk in model selection.

7.

Refined rademacher chaos complexity bounds with applications to the multikernel learning problem.

Lei, Yunwen; Ding, Lixin.

Neural Comput ; 26(4): 739-60, 2014 Apr.

Artigo em Inglês | MEDLINE | ID: mdl-24479777

RESUMO

Estimating the Rademacher chaos complexity of order two is important for understanding the performance of multikernel learning (MKL) machines. In this letter, we develop a novel entropy integral for Rademacher chaos complexities. As compared to the previous bounds, our result is much improved in that it introduces an adjustable parameter Îµ to prohibit the divergence of the involved integral. With the use of the iteration technique in Steinwart and Scovel (2007), we also apply our Rademacher chaos complexity bound to the MKL problems and improve existing learning rates.

Assuntos

Inteligência Artificial , Aprendizagem/fisiologia , Dinâmica não Linear , Reconhecimento Automatizado de Padrão , Algoritmos , Humanos

8.

Generalization ability of fractional polynomial models.

Lei, Yunwen; Ding, Lixin; Ding, Yiming.

Neural Netw ; 49: 59-73, 2014 Jan.

Artigo em Inglês | MEDLINE | ID: mdl-24140985

RESUMO

In this paper, the problem of learning the functional dependency between input and output variables from scattered data using fractional polynomial models (FPM) is investigated. The estimation error bounds are obtained by calculating the pseudo-dimension of FPM, which is shown to be equal to that of sparse polynomial models (SPM). A linear decay of the approximation error is obtained for a class of target functions which are dense in the space of continuous functions. We derive a structural risk analogous to the Schwartz Criterion and demonstrate theoretically that the model minimizing this structural risk can achieve a favorable balance between estimation and approximation errors. An empirical model selection comparison is also performed to justify the usage of this structural risk in selecting the optimal complexity index from the data. We show that the construction of FPM can be efficiently addressed by the variable projection method. Furthermore, our empirical study implies that FPM could attain better generalization performance when compared with SPM and cubic splines.

Assuntos

Algoritmos , Inteligência Artificial , Generalização Psicológica , Simulação por Computador , Análise dos Mínimos Quadrados , Modelos Neurológicos , Modelos Estatísticos

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA