Search | BVS Regional Portal

1.

Scalable Convex Multiple Sequence Alignment via Entropy-Regularized Dual Decomposition.

Zhang, Jiong; Yen, Ian E H; Ravikumar, Pradeep; Dhillon, Inderjit S.

JMLR Workshop Conf Proc ; 54: 1514-1522, 2017 Apr.

Article in English | MEDLINE | ID: mdl-28871272

ABSTRACT

Multiple Sequence Alignment (MSA) is one of the fundamental tasks in biological sequence analysis that underlies applications such as phylogenetic trees, profiles, and structure prediction. The task, however, is NP-hard, and the current practice resorts to heuristic and local-search methods. Recently, a convex optimization approach for MSA was proposed based on the concept of atomic norm [23], which demonstrates significant improvement over existing methods in the quality of alignments. However, the convex program is challenging to solve because its constraint is the intersection of two atomic-norm balls, for which the existing algorithm can only handle sequences of length up to 50, with an iteration complexity subject to constants of unknown relation to the natural parameters of MSA. In this work, we propose an accelerated dual decomposition algorithm that exploits entropy regularization to induce closed-form solutions for each atomic-norm-constrained subproblem, giving a single-loop algorithm whose iteration complexity is linear in the problem size (total length of all sequences). The proposed algorithm gives significantly better alignments than existing methods on sequences of length up to hundreds, where the existing convex programming method fails to converge in one day.
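The abstract's remark that entropy regularization induces closed-form solutions can be illustrated on a much simpler constraint set. The sketch below is not the paper's algorithm: it assumes the feasible set is the probability simplex rather than an atomic-norm ball, and the function name `entropy_regularized_argmin` and the parameter `mu` are hypothetical. On the simplex, minimizing a linear cost plus `mu` times the negative entropy has the well-known softmax solution, so no inner iterative solver is needed.

```python
import math


def entropy_regularized_argmin(costs, mu):
    """Minimize <costs, x> + mu * sum(x_i * log(x_i)) over the probability simplex.

    Assumption: the simplex stands in for the constraint set; the closed form
    is x_i proportional to exp(-costs_i / mu).
    """
    if mu <= 0:
        raise ValueError("mu must be positive")
    shift = min(costs)  # subtract the minimum so the exponentials cannot overflow
    weights = [math.exp(-(c - shift) / mu) for c in costs]
    total = sum(weights)
    return [w / total for w in weights]


smooth = entropy_regularized_argmin([1.0, 2.0, 3.0], mu=1.0)
sharp = entropy_regularized_argmin([1.0, 2.0, 3.0], mu=0.01)
```

With a large `mu` the solution spreads mass across all coordinates; as `mu` shrinks it concentrates on the lowest-cost coordinate, approaching the solution of the unregularized linear problem.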

2.

Greedy Direction Method of Multiplier for MAP Inference of Large Output Domain.

Huang, Xiangru; Huang, Qixing; Yen, Ian E H; Ravikumar, Pradeep; Zhang, Ruohan; Dhillon, Inderjit S.

JMLR Workshop Conf Proc ; 54: 1550-1559, 2017 Apr.

Article in English | MEDLINE | ID: mdl-28871273

ABSTRACT

Maximum-a-Posteriori (MAP) inference lies at the heart of Graphical Models and Structured Prediction. Despite the intractability of exact MAP inference, approximate methods based on LP relaxations have exhibited superior performance across a wide range of applications. Yet for problems involving large output domains (i.e., the state space for each variable is large), standard LP relaxations can easily give rise to a large number of variables and constraints which are beyond the limit of existing optimization algorithms. In this paper, we introduce an effective MAP inference method for problems with large output domains. The method builds upon alternating minimization of an Augmented Lagrangian that exploits the sparsity of messages through greedy optimization techniques. A key feature of our greedy approach is to introduce variables in an on-demand manner with a pre-built data structure over local factors. This results in a single-loop algorithm of sublinear cost per iteration and O(log(1/ε))-type iteration complexity to achieve ε sub-optimality. In addition, we introduce a variant of the Greedy Direction Method of Multiplier (GDMM) for binary MAP inference problems with a large number of factors. Empirically, the proposed algorithms demonstrate orders of magnitude speedup over state-of-the-art MAP inference techniques on MAP inference problems including Segmentation, Protein Folding, Graph Matching, and Multilabel prediction with pairwise interaction.
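To make the MAP inference task concrete, here is a minimal sketch of exact MAP on a chain-structured pairwise model using max-sum dynamic programming (Viterbi). This is a standard textbook technique swapped in for illustration, not the GDMM method of the abstract; the function name `chain_map` and the score layout are assumptions. Its cost grows with the square of the number of states per variable, which hints at why large output domains are expensive even in this tractable special case.

```python
def chain_map(unary, pairwise):
    """Exact MAP labeling of a chain model by max-sum dynamic programming.

    unary[t][s]    : score of state s at position t (hypothetical layout)
    pairwise[a][b] : score of moving from state a to state b, shared by all edges
    Returns (best_total_score, best_labels). Cost is O(n * k^2) for k states.
    """
    n, k = len(unary), len(unary[0])
    score = list(unary[0])  # best score of any prefix ending in each state
    back = []               # back[t-1][b] = best predecessor of state b at position t
    for t in range(1, n):
        new_score, ptr = [], []
        for b in range(k):
            best_a = max(range(k), key=lambda a: score[a] + pairwise[a][b])
            new_score.append(score[best_a] + pairwise[best_a][b] + unary[t][b])
            ptr.append(best_a)
        back.append(ptr)
        score = new_score
    last = max(range(k), key=lambda b: score[b])
    labels = [last]
    for ptr in reversed(back):  # follow back-pointers from the end to the start
        labels.append(ptr[labels[-1]])
    labels.reverse()
    return score[last], labels


# Three variables, two states each; the pairwise term rewards agreeing neighbours.
best_score, best_labels = chain_map(
    unary=[[0.5, 0.0], [0.0, 0.0], [0.0, 3.0]],
    pairwise=[[1.0, 0.0], [0.0, 1.0]],
)
```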

3.

Latent Feature Lasso.

Yen, Ian E H; Lee, Wei-Cheng; Chang, Sung-En; Suggala, Arun S; Lin, Shou-De; Ravikumar, Pradeep.

Proc Mach Learn Res ; 70: 3949-3957, 2017 Aug.

Article in English | MEDLINE | ID: mdl-31225526

ABSTRACT

The latent feature model (LFM), proposed in (Griffiths & Ghahramani, 2005), but possibly with earlier origins, is a generalization of a mixture model, where each instance is generated not from a single latent class but from a combination of latent features. Thus, each instance has an associated latent binary feature incidence vector indicating the presence or absence of a feature. Due to its combinatorial nature, inference of LFMs is considerably intractable, and accordingly, most of the attention has focused on nonparametric LFMs, with priors such as the Indian Buffet Process (IBP) on infinite binary matrices. Recent efforts to tackle this complexity either still have computational complexity that is exponential, or sample complexity that is high-order polynomial w.r.t. the number of latent features. In this paper, we address this outstanding problem of tractable estimation of LFMs via a novel atomic-norm regularization, which gives an algorithm with polynomial run-time and sample complexity without impractical assumptions on the data distribution.
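The difference between a mixture model and a latent feature model can be sketched with a small data generator. The abstract only states that each instance combines latent features selected by a binary incidence vector; the additive combination, the Gaussian noise, the Bernoulli prior `p`, and the name `sample_latent_feature_data` below are all assumptions made for illustration.

```python
import random


def sample_latent_feature_data(n, features, noise_std=0.0, p=0.5, seed=0):
    """Draw n instances from a simple additive latent feature model.

    Assumptions (not from the abstract): features combine additively, each
    feature is switched on independently with probability p, and the noise
    is Gaussian with standard deviation noise_std.
    Returns (Z, X): binary incidence vectors and the generated instances.
    """
    rng = random.Random(seed)
    k, d = len(features), len(features[0])
    Z, X = [], []
    for _ in range(n):
        z = [1 if rng.random() < p else 0 for _ in range(k)]  # any subset of features
        x = [
            sum(z[j] * features[j][t] for j in range(k)) + rng.gauss(0.0, noise_std)
            for t in range(d)
        ]
        Z.append(z)
        X.append(x)
    return Z, X


# Two latent features in two dimensions, generated without noise.
Z, X = sample_latent_feature_data(n=20, features=[[1.0, 0.0], [0.0, 1.0]])
```

A mixture model would correspond to restricting each row of `Z` to exactly one active entry; allowing any subset gives 2^k possible incidence vectors for k features, which is the combinatorial difficulty the abstract refers to.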

4.

A Convex Atomic-Norm Approach to Multiple Sequence Alignment and Motif Discovery.

Yen, Ian E H; Lin, Xin; Zhang, Jiong; Ravikumar, Pradeep; Dhillon, Inderjit S.

JMLR Workshop Conf Proc ; 48: 2272-2280, 2016.

Article in English | MEDLINE | ID: mdl-27559428

ABSTRACT

Multiple Sequence Alignment and Motif Discovery, both known to be NP-hard, are two fundamental tasks in Bioinformatics. Existing approaches to these two problems are based on either local-search methods, such as Expectation Maximization (EM) and Gibbs Sampling, or greedy heuristic methods. In this work, we develop a convex relaxation approach to both problems based on the recent concept of atomic norm and develop a new algorithm, termed Greedy Direction Method of Multiplier, for solving the convex relaxation with two convex atomic constraints. Experiments show that our convex relaxation approach produces solutions of higher quality than the standard tools widely used in the Bioinformatics community on the Multiple Sequence Alignment and Motif Discovery problems.

5.

Dual Decomposed Learning with Factorwise Oracles for Structural SVMs of Large Output Domain.

Yen, Ian E H; Huang, Xiangru; Zhong, Kai; Zhang, Ruohan; Ravikumar, Pradeep; Dhillon, Inderjit S.

Adv Neural Inf Process Syst ; 29: 5030-5038, 2016 Dec.

Article in English | MEDLINE | ID: mdl-29657512

ABSTRACT

Many applications of machine learning involve structured outputs with large domains, where learning of a structured predictor is prohibitive due to repetitive calls to an expensive inference oracle. In this work, we show that by decomposing training of a Structural Support Vector Machine (SVM) into a series of multiclass SVM problems connected through messages, one can replace an expensive structured oracle with Factorwise Maximization Oracles (FMOs) that allow efficient implementation with complexity sublinear in the size of the factor domain. A Greedy Direction Method of Multiplier (GDMM) algorithm is then proposed to exploit the sparsity of messages while guaranteeing convergence to ε sub-optimality after O(log(1/ε)) passes of FMOs over every factor. We conduct experiments on chain-structured and fully-connected problems of large output domains, where the proposed approach is orders of magnitude faster than current state-of-the-art algorithms for training Structural SVMs.
