Search | VHL Regional Portal

Asynchronous Parallel Large-Scale Gaussian Process Regression.

Dang, Zhiyuan; Gu, Bin; Deng, Cheng; Huang, Heng.

IEEE Trans Neural Netw Learn Syst ; 35(6): 8683-8694, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38587955

ABSTRACT

Gaussian process regression (GPR) is an important nonparametric learning method in machine learning research with many real-world applications. It is well known that training large-scale GPR is challenging because of its heavy computational cost and large memory requirements. To address this problem, we propose an asynchronous doubly stochastic gradient algorithm for large-scale GPR training. We formulate GPR as a convex optimization problem, namely kernel ridge regression. To solve this convex kernel problem efficiently, we first use the random feature mapping method to approximate the kernel model and then use two unbiased stochastic approximations, stochastic variance reduced gradient and stochastic coordinate descent, to update the solution asynchronously and in parallel. In this way, our algorithm scales well in both sample size and dimensionality and speeds up training. More importantly, we prove that our algorithm has a global linear convergence rate. Experimental results on eight large-scale benchmark datasets, covering both regression and classification tasks, show that the proposed algorithm outperforms existing state-of-the-art GPR methods.
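The first two steps named in this abstract (casting the problem as kernel ridge regression, then approximating the kernel with a random feature map) can be illustrated with a small NumPy sketch. It is not the authors' algorithm: the asynchronous, variance-reduced, coordinate-wise updates are replaced here by a plain closed-form ridge solve, and the RBF kernel, the toy data, and all names (`random_fourier_features`, `fit_ridge`, `gamma`, `lam`) are assumptions chosen for illustration.

```python
import numpy as np

def random_fourier_features(X, W, b):
    """Map X of shape (n, d) to random Fourier features of shape (n, D)."""
    D = W.shape[1]
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

def fit_ridge(Z, y, lam):
    """Closed-form minimizer of ||Z w - y||^2 + lam * ||w||^2."""
    D = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(D), Z.T @ y)

rng = np.random.default_rng(0)
n, d, D = 200, 1, 300          # samples, input dimension, number of random features
gamma, lam = 0.5, 1e-3         # assumed kernel width and ridge penalty

# Toy regression target (hypothetical data, not from the paper).
X = rng.uniform(-3.0, 3.0, size=(n, d))
y = np.sin(X[:, 0])

# For k(x, x') = exp(-gamma * ||x - x'||^2), frequencies follow N(0, 2 * gamma * I).
W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, D))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

Z = random_fourier_features(X, W, b)   # explicit features replace the n x n kernel matrix
w = fit_ridge(Z, y, lam)
train_mse = float(np.mean((Z @ w - y) ** 2))
```

Working with the `(n, D)` feature matrix instead of the `(n, n)` kernel matrix is what makes stochastic, per-sample and per-coordinate updates possible in the first place; the closed-form solve above only stands in for them on a toy scale.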

Deep Multiview Collaborative Clustering.

Yang, Xu; Deng, Cheng; Dang, Zhiyuan; Tao, Dacheng.

IEEE Trans Neural Netw Learn Syst ; 34(1): 516-526, 2023 Jan.

Article in English | MEDLINE | ID: mdl-34370671

ABSTRACT

Clustering methods have attracted ever-increasing attention in the machine learning and computer vision communities in recent years. In this article, we focus on real-world applications where a sample can be represented by multiple views. Traditional methods learn a common latent space for multiview samples without considering the diversity of multiview representations and use K-means to obtain the final results, which is time- and space-consuming. In contrast, we propose a novel end-to-end deep multiview clustering model with collaborative learning to predict the clustering results directly. Specifically, multiple autoencoder networks are used to embed multiview data into various latent spaces, and a heterogeneous graph learning module is employed to fuse the latent representations adaptively, which can learn specific weights for different views of each sample. In addition, intra-view collaborative learning is framed to optimize each single-view clustering task and provide more discriminative latent representations. Simultaneously, inter-view collaborative learning is employed to obtain complementary information and promote a consistent cluster structure for a better clustering solution. Experimental results on several datasets show that our method significantly outperforms several state-of-the-art clustering approaches.

Large-Scale Nonlinear AUC Maximization via Triply Stochastic Gradients.

Dang, Zhiyuan; Li, Xiang; Gu, Bin; Deng, Cheng; Huang, Heng.

IEEE Trans Pattern Anal Mach Intell ; 44(3): 1385-1398, 2022 Mar.

Article in English | MEDLINE | ID: mdl-32946382

ABSTRACT

Learning to improve AUC performance on imbalanced data is an important machine learning research problem. Most AUC maximization methods assume that the model function is linear in the original feature space. However, this assumption is not suitable for problems that are only nonlinearly separable. Although some nonlinear AUC maximization methods exist, scaling up nonlinear AUC maximization is still an open question. To address this problem, in this paper, we propose a novel large-scale nonlinear AUC maximization method (named TSAM) based on triply stochastic gradient descent. Specifically, we first use random Fourier features to approximate the kernel function. After that, we use the triply stochastic gradients w.r.t. the pairwise loss and random feature to iteratively update the solution. Finally, we prove that TSAM converges to the optimal solution at a rate of O(1/t) after t iterations. Experimental results on a variety of benchmark datasets not only confirm the scalability of TSAM, but also show a significant reduction in computational time compared with existing batch learning algorithms, while retaining similar generalization performance.
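The idea of optimizing a pairwise loss over random Fourier features can be sketched as follows. This is a simplified illustration, not TSAM itself: the features are drawn once up front rather than resampled at every iteration, so only two of the three sources of randomness (a random positive and a random negative sample) appear in the loop. The pairwise squared loss, the step size `eta`, the toy data, and the helper `auc` are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy nonlinear problem (hypothetical): positives lie inside the unit circle.
n, d = 400, 2
X = rng.normal(size=(n, d))
y = np.where(np.linalg.norm(X, axis=1) < 1.0, 1, -1)

# Random Fourier features for k(x, x') = exp(-gamma * ||x - x'||^2).
D, gamma = 200, 1.0
W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, D))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)
Z = np.sqrt(2.0 / D) * np.cos(X @ W + b)

pos, neg = Z[y == 1], Z[y == -1]
w = np.zeros(D)
eta = 0.2  # assumed constant step size

for t in range(5000):
    # Sample one positive and one negative example.
    zp = pos[rng.integers(len(pos))]
    zn = neg[rng.integers(len(neg))]
    diff = zp - zn
    margin = w @ diff  # f(x+) - f(x-)
    # Pairwise squared loss (1 - margin)^2 has gradient -2 * (1 - margin) * diff.
    w += eta * 2.0 * (1.0 - margin) * diff

def auc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs ranked correctly."""
    return float(np.mean(scores_pos[:, None] > scores_neg[None, :]))

train_auc = auc(pos @ w, neg @ w)
```

A model that is linear in the original two coordinates cannot rank this circular data well, which is the limitation the abstract attributes to linear AUC methods; the random feature map is what lets a linear update rule learn a nonlinear scoring function.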

Scaling Up Generalized Kernel Methods.

Gu, Bin; Dang, Zhiyuan; Huo, Zhouyuan; Deng, Cheng; Huang, Heng.

IEEE Trans Pattern Anal Mach Intell ; 44(7): 3767-3778, 2022 Jul.

Article in English | MEDLINE | ID: mdl-33591910

ABSTRACT

Kernel methods have achieved tremendous success in the past two decades. In the current big data era, data collection has grown tremendously. However, existing kernel methods are not scalable enough at either the training or the prediction step. To address this challenge, in this paper, we first introduce a general sparse kernel learning formulation based on the random feature approximation, where the loss functions are possibly non-convex. To reduce the number of random features required in experiments, we also use this formulation with the orthogonal random feature approximation. Then we propose a new asynchronous parallel doubly stochastic algorithm for large-scale sparse kernel learning (AsyDSSKL). To the best of our knowledge, AsyDSSKL is the first algorithm that combines asynchronous parallel computation with doubly stochastic optimization. We also provide a comprehensive convergence guarantee for AsyDSSKL. Importantly, experimental results on various large-scale real-world datasets show that AsyDSSKL is significantly more computationally efficient than existing kernel methods at both the training and prediction steps.
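The abstract does not say how the orthogonal random features are constructed. One common construction, shown here as an assumption, takes the Q factor of a Gaussian matrix and rescales its columns by chi-distributed norms so that each frequency vector keeps the length distribution of a Gaussian draw. The function name `orthogonal_random_frequencies` and the RBF kernel parameterization are hypothetical.

```python
import numpy as np

def orthogonal_random_frequencies(d, D, gamma, rng):
    """Return a (d, D) frequency matrix built from stacked orthogonal d x d blocks."""
    blocks = []
    n_blocks = -(-D // d)  # ceil(D / d)
    for _ in range(n_blocks):
        G = rng.normal(size=(d, d))
        Q, _ = np.linalg.qr(G)  # Q has orthonormal columns
        # Column norms follow a chi distribution with d degrees of freedom,
        # matching the norm of a standard Gaussian vector in R^d.
        s = np.sqrt(rng.chisquare(df=d, size=d))
        blocks.append(Q * s)    # broadcasting scales column j by s[j]
    W = np.hstack(blocks)[:, :D]
    # Scale for k(x, x') = exp(-gamma * ||x - x'||^2).
    return np.sqrt(2.0 * gamma) * W

rng = np.random.default_rng(0)
d, D, gamma = 4, 10, 0.5
W = orthogonal_random_frequencies(d, D, gamma, rng)

# Frequencies within one block are mutually orthogonal.
gram = W[:, :d].T @ W[:, :d]
off_diagonal = gram - np.diag(np.diag(gram))

# The frequencies plug into the usual cosine feature map.
X = rng.normal(size=(5, d))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)
Z = np.sqrt(2.0 / D) * np.cos(X @ W + b)
```

Because the directions within a block do not overlap, fewer features are typically needed for a given kernel approximation quality than with independent Gaussian draws, which matches the motivation stated in the abstract.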
