Results 1 - 20 of 22
1.
J Comput Graph Stat ; 32(3): 938-949, 2023.
Article in English | MEDLINE | ID: mdl-37822489

ABSTRACT

Proximal Markov Chain Monte Carlo is a novel construct that lies at the intersection of Bayesian computation and convex optimization, and it has helped popularize the use of nondifferentiable priors in Bayesian statistics. Existing formulations of proximal MCMC, however, require hyperparameters and regularization parameters to be prespecified. In this work, we extend the paradigm of proximal MCMC by introducing a new class of nondifferentiable priors called epigraph priors. As a proof of concept, we place trend filtering, originally a nonparametric regression problem, in a parametric setting to provide a posterior median fit along with credible intervals as measures of uncertainty. The key idea is to replace the nonsmooth term in the posterior density with its Moreau-Yosida envelope, which enables the application of the gradient-based MCMC sampler Hamiltonian Monte Carlo. The proposed method identifies the appropriate amount of smoothing in a data-driven way, thereby automating regularization parameter selection. Compared with conventional proximal MCMC methods, our method is mostly tuning free, achieving simultaneous calibration of the mean, scale, and regularization parameters in a fully Bayesian framework.
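The Moreau-Yosida envelope named in this abstract has a well-known closed form for the absolute-value function (the building block of many nondifferentiable priors): it is the Huber function, and its gradient is Lipschitz, which is what makes gradient-based samplers applicable. A minimal NumPy sketch of that special case, not the authors' implementation:

```python
import numpy as np

def prox_abs(x, lam):
    # Soft-thresholding: the proximal operator of the absolute value.
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def moreau_env_abs(x, lam):
    # Moreau-Yosida envelope of |x| with parameter lam, evaluated via
    # env(x) = |prox(x)| + (x - prox(x))^2 / (2 lam); equals the Huber function.
    p = prox_abs(x, lam)
    return np.abs(p) + (x - p) ** 2 / (2.0 * lam)

def moreau_grad_abs(x, lam):
    # Gradient of the envelope: (x - prox(x)) / lam, Lipschitz with constant 1/lam.
    return (x - prox_abs(x, lam)) / lam
```

For |x| <= lam the envelope is quadratic, x^2/(2 lam), and beyond that it is linear, |x| - lam/2, so the nonsmooth kink at zero is smoothed away while the tails are preserved.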

2.
Technometrics ; 65(1): 117-126, 2023.
Article in English | MEDLINE | ID: mdl-37448596

ABSTRACT

Building on previous research of Chi and Chi (2022), the current paper revisits estimation in robust structured regression under the L2E criterion. We adopt the majorization-minimization (MM) principle to design a new algorithm for updating the vector of regression coefficients. Our sharp majorization achieves faster convergence than the previous alternating proximal gradient descent algorithm (Chi and Chi, 2022). In addition, we reparameterize the model by substituting precision for scale and estimate precision via a modified Newton's method. This simplifies and accelerates overall estimation. We also introduce distance-to-set penalties to enable constrained estimation under nonconvex constraint sets. This tactic also improves performance in coefficient estimation and structure recovery. Finally, we demonstrate the merits of our improved tactics through a rich set of simulation examples and a real data application.

3.
Technometrics ; 65(4): 537-552, 2023.
Article in English | MEDLINE | ID: mdl-38213317

ABSTRACT

The growing prevalence of tensor data, or multiway arrays, in science and engineering applications motivates the need for tensor decompositions that are robust against outliers. In this paper, we present a robust Tucker decomposition estimator based on the L2 criterion, called the Tucker-L2E. Our numerical experiments demonstrate that Tucker-L2E has empirically stronger recovery performance in more challenging high-rank scenarios compared with existing alternatives. The appropriate Tucker-rank can be selected in a data-driven manner with cross-validation or hold-out validation. The practical effectiveness of Tucker-L2E is validated on real data applications in fMRI tensor denoising, PARAFAC analysis of fluorescence data, and feature extraction for classification of corrupted images.

4.
Stat Anal Data Min ; 15(3): 303-313, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35756358

ABSTRACT

Many machine learning algorithms depend on weights that quantify row and column similarities of a data matrix. The choice of weights can dramatically impact the effectiveness of the algorithm. Nonetheless, the problem of choosing weights has arguably not been given enough study. When a data matrix is completely observed, Gaussian kernel affinities can be used to quantify the local similarity between pairs of rows and pairs of columns. Computing weights in the presence of missing data, however, becomes challenging. In this paper, we propose a new method to construct row and column affinities even when data are missing by building off a co-clustering technique. This method takes advantage of solving the optimization problem for multiple pairs of cost parameters and filling in the missing values with increasingly smooth estimates. It exploits the coupled similarity structure among both the rows and columns of a data matrix. We show these affinities can be used to perform tasks such as data imputation, clustering, and matrix completion on graphs.
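For the completely observed case mentioned in this abstract, Gaussian kernel affinities between rows are straightforward; the missing-data construction that is the paper's actual contribution is more involved. A sketch of the complete-data baseline, with an assumed bandwidth parameter `sigma`:

```python
import numpy as np

def gaussian_affinities(X, sigma=1.0):
    # Pairwise Gaussian kernel weights between the rows of X:
    # W[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)).
    # Apply to X.T to obtain column affinities instead.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))
```

The resulting matrix is symmetric with ones on the diagonal; identical rows get weight 1 and distant rows get weight near 0, which is the local-similarity behavior the abstract describes.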

5.
J Comput Graph Stat ; 31(4): 1051-1062, 2022.
Article in English | MEDLINE | ID: mdl-36721836

ABSTRACT

We introduce a user-friendly computational framework for implementing robust versions of a wide variety of structured regression methods with the L2 criterion. In addition to introducing an algorithm for performing L2E regression, our framework enables robust regression with the L2 criterion for additional structural constraints, works without requiring complex tuning procedures on the precision parameter, can be used to identify heterogeneous subpopulations, and can incorporate readily available non-robust structured regression solvers. We provide convergence guarantees for the framework and demonstrate its flexibility with some examples. Supplementary materials for this article are available online.

6.
Annu Int Conf IEEE Eng Med Biol Soc ; 2021: 4432-4435, 2021 11.
Article in English | MEDLINE | ID: mdl-34892203

ABSTRACT

Coronary bifurcation lesions are a leading cause of Coronary Artery Disease (CAD). Despite their prevalence, coronary bifurcation lesions remain difficult to treat due to our incomplete understanding of how various features of lesion anatomy synergistically disrupt normal hemodynamic flow. In this work, we employ an interpretable machine learning algorithm, the Classification and Regression Tree (CART), to model the impact of these geometric features on local hemodynamic quantities. We generate a synthetic arterial database via computational fluid dynamics simulations and apply the CART approach to predict the time-averaged wall shear stress (TAWSS) at two different locations within the cardiac vasculature. Our experimental results show that CART can estimate a simple, interpretable, yet accurately predictive nonlinear model of TAWSS as a function of such features. Clinical relevance: the fitted tree models have the potential to refine predictions of disturbed hemodynamic flow based on an individual's cardiac and lesion anatomy and consequently make progress towards personalized treatment planning for CAD patients.
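The basic primitive behind CART is an exhaustive search for the split threshold that minimizes within-node squared error. The one-level "stump" below is a hand-rolled illustration of that search on hypothetical toy data; the study itself presumably fit full trees with a standard library:

```python
import numpy as np

def best_split(x, y):
    # Exhaustive search for the threshold on feature x that minimizes the
    # total squared error of a one-level regression tree (a "stump").
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best_sse, best_t = np.inf, None
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # no valid threshold between tied feature values
        t = 0.5 * (xs[i] + xs[i - 1])
        left, right = ys[:i], ys[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse, best_t = sse, t
    return best_t
```

A full CART model applies this search recursively to each resulting node and over all candidate features, then prunes; the stump shows why the fitted model is interpretable, since every prediction reduces to a sequence of threshold comparisons.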


Subject(s)
Coronary Artery Disease , Hemodynamics , Heart , Humans , Machine Learning , Stress, Mechanical
7.
J Comput Graph Stat ; 30(1): 115-124, 2021.
Article in English | MEDLINE | ID: mdl-34025100

ABSTRACT

Joint models are popular for analyzing data with multivariate responses. We propose a sparse multivariate single index model, where responses and predictors are linked by unspecified smooth functions and multiple matrix-level penalties are employed to select predictors and induce low-rank structures across responses. An alternating direction method of multipliers (ADMM) based algorithm is proposed for model estimation. We demonstrate the effectiveness of the proposed model in simulation studies and an application to a genetic association study.

8.
Sci Rep ; 11(1): 8145, 2021 04 14.
Article in English | MEDLINE | ID: mdl-33854076

ABSTRACT

Conventional invasive diagnostic imaging techniques do not adequately resolve complex Type B and C coronary lesions, which present unique challenges, require personalized treatment, and result in worsened patient outcomes. These lesions are often excluded from large-scale non-invasive clinical trials, and no validated approach exists to characterize hemodynamic quantities and guide percutaneous intervention for such lesions. This work identifies key biomarkers that differentiate complex Type B and C lesions from simple Type A lesions by introducing and validating a coronary angiography-based computational fluid dynamics (CFD-CA) framework for intracoronary assessment of complex lesions at ultrahigh resolution. Among the 14 patients selected in this study, 7 patients with Type B and C lesions were included in the complex lesion group, comprising ostial, bifurcation, and serial lesions as well as lesions where flow was supplied by a collateral bed. The simple lesion group included 7 patients with lesions that were discrete, [Formula: see text] long, and readily accessible. Intracoronary assessment was performed using the CFD-CA framework and validated by comparison to clinically measured pressure-based indices, such as FFR. Local pressure, endothelial shear stress (ESS), and velocity profiles were derived for all patients. We validate the accuracy of our CFD-CA framework and report excellent agreement with invasive measurements ([Formula: see text]). The ultrahigh resolution achieved by the model enables physiological assessment in complex lesions and quantifies hemodynamic metrics in all vessels down to 1 mm in diameter. Importantly, we demonstrate that, in contrast to traditional pressure-based metrics, intracoronary hemodynamic forces such as ESS differ significantly in complex lesions compared with simple lesions at both resting and hyperemic physiological states (n = 14, [Formula: see text]). Higher ESS was observed in the complex lesion group ([Formula: see text] Pa) than in the simple lesion group ([Formula: see text] Pa). Complex coronary lesions thus have higher ESS than simple lesions; this differential hemodynamic evaluation can provide much-needed insight into the increase in adverse outcomes for such patients and has incremental prognostic value over traditional pressure-based indices, such as FFR.


Subject(s)
Coronary Angiography/methods , Coronary Disease/diagnostic imaging , Radiographic Image Interpretation, Computer-Assisted/methods , Computer Simulation , Coronary Disease/classification , Diagnosis, Differential , Hemodynamics , Humans , Shear Strength
9.
Bioinformatics ; 37(20): 3667-3669, 2021 Oct 25.
Article in English | MEDLINE | ID: mdl-33904580

ABSTRACT

SUMMARY: Biclustering is a generalization of clustering used to identify simultaneous grouping patterns in observations (rows) and features (columns) of a data matrix. Recently, the biclustering task has been formulated as a convex optimization problem. While this convex recasting of the problem has attractive properties, existing algorithms do not scale well. To address this problem and make convex biclustering a practical tool for analyzing larger data, we propose an implementation of fast convex biclustering called COBRAC, which reduces computing time by iteratively compressing the problem size along the solution path. We apply COBRAC to several gene expression datasets to demonstrate its effectiveness and efficiency. In addition to the standalone version of COBRAC, we developed a web server for online calculation and visualization, with downloadable interactive results. AVAILABILITY AND IMPLEMENTATION: The source code and test data are available at https://github.com/haidyi/cvxbiclustr or https://zenodo.org/record/4620218. The web server is available at https://cvxbiclustr.ericchi.com. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

10.
Comput Sci Eng ; 23(6): 42-51, 2021.
Article in English | MEDLINE | ID: mdl-35784398

ABSTRACT

Modern technologies produce a deluge of complicated data. In neuroscience, for example, minimally invasive experimental methods can take recordings of large populations of neurons at high resolution under a multitude of conditions. Such data arrays possess non-trivial interdependencies along each of their axes. Insights into these data arrays may lay the foundations of advanced treatments for nervous system disorders. The potential impacts of such data, however, will not be fully realized unless the techniques for analyzing them keep pace. Specifically, there is an urgent, growing need for methods for estimating the low-dimensional structure and geometry in big and noisy data arrays. This article reviews a framework for identifying complicated underlying patterns in such data and also recounts the key role that the Department of Energy Computational Sciences Graduate Fellowship played in setting the stage for this work to be done by the author.

11.
Article in English | MEDLINE | ID: mdl-33312074

ABSTRACT

Cluster analysis is a fundamental tool for pattern discovery of complex heterogeneous data. Prevalent clustering methods mainly focus on vector or matrix-variate data and are not applicable to general-order tensors, which arise frequently in modern scientific and business applications. Moreover, there is a gap between statistical guarantees and computational efficiency for existing tensor clustering solutions due to the nature of their non-convex formulations. In this work, we bridge this gap by developing a provable convex formulation of tensor co-clustering. Our convex co-clustering (CoCo) estimator enjoys stability guarantees and its computational and storage costs are polynomial in the size of the data. We further establish a non-asymptotic error bound for the CoCo estimator, which reveals a surprising "blessing of dimensionality" phenomenon that does not exist in vector or matrix-variate cluster analysis. Our theoretical findings are supported by extensive simulated studies. Finally, we apply the CoCo estimator to the cluster analysis of advertisement click tensor data from a major online company. Our clustering results provide meaningful business insights to improve advertising effectiveness.

12.
Stat ; 8(1)2020.
Article in English | MEDLINE | ID: mdl-32655193

ABSTRACT

Canonical correlation analysis (CCA) is a multivariate analysis technique for estimating a linear relationship between two sets of measurements. Modern acquisition technologies, for example, those arising in neuroimaging and remote sensing, produce data in the form of multidimensional arrays or tensors. Classic CCA is not appropriate for dealing with tensor data due to the multidimensional structure and ultrahigh dimensionality of such modern data. In this paper, we present tensor CCA (TCCA) to discover relationships between two tensors while simultaneously preserving the multidimensional structure of the tensors and utilizing substantially fewer parameters. Furthermore, we show how to employ a parsimonious covariance structure to gain additional stability and efficiency. We delineate population and sample problems for each model and propose efficient estimation algorithms with global convergence guarantees. We also describe a probabilistic model for TCCA that enables the generation of synthetic data with desired canonical variates and correlations. Simulation studies illustrate the performance of our methods.

13.
IEEE Signal Process Mag ; 37(6): 160-173, 2020 Nov.
Article in English | MEDLINE | ID: mdl-33473243

ABSTRACT

Graph signal processing (GSP) is an important methodology for studying data residing on irregular structures. As acquired data is increasingly taking the form of multi-way tensors, new signal processing tools are needed to maximally utilize the multi-way structure within the data. In this paper, we review modern signal processing frameworks generalizing GSP to multi-way data, starting from graph signals coupled to familiar regular axes such as time in sensor networks, and then extending to general graphs across all tensor modes. This widely applicable paradigm motivates reformulating and improving upon classical problems and approaches to creatively address the challenges in tensor-based data. We synthesize common themes arising from current efforts to combine GSP with tensor analysis and highlight future directions in extending GSP to the multi-way paradigm.
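A core GSP primitive referenced by surveys like this one is the graph Fourier transform: expanding a signal in the eigenbasis of the graph Laplacian, whose eigenvalues play the role of frequencies. A small self-contained sketch (dense linear algebra, suitable only for small graphs):

```python
import numpy as np

def graph_laplacian(A):
    # Combinatorial Laplacian L = D - A of an undirected graph
    # given by its symmetric adjacency matrix A.
    return np.diag(A.sum(axis=1)) - A

def gft(A, signal):
    # Graph Fourier transform: eigendecompose the Laplacian (eigenvalues
    # in ascending order, i.e., increasing graph frequency) and project
    # the signal onto the eigenvectors.
    w, V = np.linalg.eigh(graph_laplacian(A))
    return w, V.T @ signal
```

A constant signal is "DC" in this sense: all of its energy lands on the zero-eigenvalue component, and filtering in this spectral domain is how smoothing and denoising on irregular structures are defined. The multi-way extensions the paper reviews apply such transforms along each tensor mode.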

14.
Biometrics ; 73(1): 10-19, 2017 03.
Article in English | MEDLINE | ID: mdl-27163413

ABSTRACT

In the biclustering problem, we seek to simultaneously group observations and features. While biclustering has applications in a wide array of domains, ranging from text mining to collaborative filtering, the problem of identifying structure in high-dimensional genomic data motivates this work. In this context, biclustering enables us to identify subsets of genes that are co-expressed only within a subset of experimental conditions. We present a convex formulation of the biclustering problem that possesses a unique global minimizer and an iterative algorithm, COBRA, that is guaranteed to identify it. Our approach generates an entire solution path of possible biclusters as a single tuning parameter is varied. We also show how to reduce the problem of selecting this tuning parameter to solving a trivial modification of the convex biclustering problem. The key contributions of our work are its simplicity, interpretability, and algorithmic guarantees: features that are arguably lacking in the current alternative algorithms. We demonstrate the advantages of our approach, which include stably and reproducibly identifying biclusters, on simulated and real microarray data.


Subject(s)
Cluster Analysis , Data Interpretation, Statistical , Gene Regulatory Networks , Algorithms , Computational Biology/methods , Databases, Genetic , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis
15.
PLoS Comput Biol ; 11(5): e1004228, 2015 May.
Article in English | MEDLINE | ID: mdl-25965340

ABSTRACT

The primary goal in cluster analysis is to discover natural groupings of objects. The field of cluster analysis is crowded with diverse methods that make special assumptions about data and address different scientific aims. Despite its shortcomings in accuracy, hierarchical clustering is the dominant clustering method in bioinformatics. Biologists find the trees constructed by hierarchical clustering visually appealing and in tune with their evolutionary perspective. Hierarchical clustering operates on multiple scales simultaneously. This is essential, for instance, in transcriptome data, where one may be interested in making qualitative inferences about how lower-order relationships like gene modules lead to higher-order relationships like pathways or biological processes. The recently developed method of convex clustering preserves the visual appeal of hierarchical clustering while ameliorating its propensity to make false inferences in the presence of outliers and noise. The solution paths generated by convex clustering reveal relationships between clusters that are hidden by static methods such as k-means clustering. The current paper derives and tests a novel proximal distance algorithm for minimizing the objective function of convex clustering. The algorithm separates parameters, accommodates missing data, and supports prior information on relationships. Our program CONVEXCLUSTER incorporating the algorithm is implemented on ATI and nVidia graphics processing units (GPUs) for maximal speed. Several biological examples illustrate the strengths of convex clustering and the ability of the proximal distance algorithm to handle high-dimensional problems. CONVEXCLUSTER can be freely downloaded from the UCLA Human Genetics web site at http://www.genetics.ucla.edu/software/.


Subject(s)
Cluster Analysis , Computational Biology/methods , Pattern Recognition, Automated/methods , Algorithms , Databases, Genetic , Gene Expression Profiling/methods , Humans , Software
16.
J Comput Graph Stat ; 24(4): 994-1013, 2015.
Article in English | MEDLINE | ID: mdl-27087770

ABSTRACT

Clustering is a fundamental problem in many scientific applications. Standard methods such as k-means, Gaussian mixture models, and hierarchical clustering, however, are beset by local minima, which are sometimes drastically suboptimal. Recently introduced convex relaxations of k-means and hierarchical clustering shrink cluster centroids toward one another and ensure a unique global minimizer. In this work we present two splitting methods for solving the convex clustering problem. The first is an instance of the alternating direction method of multipliers (ADMM); the second is an instance of the alternating minimization algorithm (AMA). In contrast to previously considered algorithms, our ADMM and AMA formulations provide simple and unified frameworks for solving the convex clustering problem under the previously studied norms and open the door to potentially novel norms. We demonstrate the performance of our algorithm on both simulated and real data examples. While the differences between the two algorithms appear to be minor on the surface, complexity analysis and numerical experiments show AMA to be significantly more efficient. This article has supplemental materials available online.
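The convex clustering objective described here penalizes differences between cluster centroids so that increasing the penalty fuses them together. The papers above use non-squared norm penalties solved by ADMM or AMA; as a simplified illustration only, the squared (ridge-type) fusion penalty over all row pairs admits a closed form, which already exhibits the characteristic shrinkage of centroids toward the grand mean:

```python
import numpy as np

def shrink_centroids(X, gamma):
    # Minimize 0.5 * sum_i ||x_i - u_i||^2 + (gamma/2) * sum_{i,j} ||u_i - u_j||^2.
    # Setting the gradient to zero gives U = (I + gamma * L)^{-1} X, where
    # L = n*I - J is the Laplacian of the complete graph on the n rows.
    # NOTE: this squared penalty is a stand-in; the papers use ||u_i - u_j||
    # (not squared), which yields exact fusions and requires ADMM/AMA.
    n = X.shape[0]
    L = n * np.eye(n) - np.ones((n, n))
    return np.linalg.solve(np.eye(n) + gamma * L, X)
```

At gamma = 0 the centroids equal the data points; as gamma grows they all collapse to the column means, tracing a continuous solution path in between. The non-squared penalty sharpens this path so that subsets of centroids fuse exactly, which is what produces cluster assignments.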

17.
Math Program ; 146: 409-436, 2014 Aug 01.
Article in English | MEDLINE | ID: mdl-25392563

ABSTRACT

The problem of minimizing a continuously differentiable convex function over an intersection of closed convex sets is ubiquitous in applied mathematics. It is particularly interesting when it is easy to project onto each separate set, but nontrivial to project onto their intersection. Algorithms based on Newton's method such as the interior point method are viable for small to medium-scale problems. However, modern applications in statistics, engineering, and machine learning are posing problems with potentially tens of thousands of parameters or more. We revisit this convex programming problem and propose an algorithm that scales well with dimensionality. Our proposal is an instance of a sequential unconstrained minimization technique and revolves around three ideas: the majorization-minimization principle, the classical penalty method for constrained optimization, and quasi-Newton acceleration of fixed-point algorithms. The performance of our distance majorization algorithms is illustrated in several applications.
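The penalty-plus-projection recipe in this abstract can be made concrete on a toy instance. The sketch below (an assumed two-set example in the plane, not the paper's applications) minimizes 0.5*||x - a||^2 over the intersection of the unit box and a halfspace by penalizing squared distances to each set and letting the penalty constant grow; each inner MM update has a closed form because projections onto the individual sets are easy:

```python
import numpy as np

def proj_box(x):
    # Projection onto the unit box [0, 1]^2.
    return np.clip(x, 0.0, 1.0)

def proj_halfspace(x):
    # Projection onto the halfspace x1 + x2 <= 1.
    excess = x.sum() - 1.0
    return x - (excess / 2.0) * np.ones(2) if excess > 0 else x

def proximal_distance(a, rhos=(1e0, 1e1, 1e2, 1e3, 1e4, 1e5), inner=200):
    # Sequential unconstrained minimization: for each penalty constant rho,
    # majorize dist(x, C_k)^2 by ||x - P_k(x_n)||^2 and minimize
    # 0.5||x - a||^2 + (rho/2) * sum_k ||x - P_k(x_n)||^2 in closed form.
    x = a.copy()
    for rho in rhos:
        for _ in range(inner):
            p1, p2 = proj_box(x), proj_halfspace(x)
            x = (a + rho * (p1 + p2)) / (1.0 + 2.0 * rho)
    return x
```

For a = (2, 2) the constrained minimizer is (0.5, 0.5), and the iterates approach it as rho is ramped up. Only projections onto the separate sets are ever required, which is the scalability argument the abstract makes.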

18.
Am Math Mon ; 121(2): 95-108, 2014 Feb.
Article in English | MEDLINE | ID: mdl-25242816

ABSTRACT

In a recent issue of this journal, Mordukhovich, Nam, and Salinas pose and solve an interesting non-differentiable generalization of the Heron problem in the framework of modern convex analysis. In the generalized Heron problem, one is given k + 1 closed convex sets in ℝ d equipped with its Euclidean norm and asked to find the point in the last set such that the sum of the distances to the first k sets is minimal. In later work, the authors generalize the Heron problem even further, relax its convexity assumptions, study its theoretical properties, and pursue subgradient algorithms for solving the convex case. Here, we revisit the original problem solely from the numerical perspective. By exploiting the majorization-minimization (MM) principle of computational statistics and rudimentary techniques from differential calculus, we are able to construct a very fast algorithm for solving the Euclidean version of the generalized Heron problem.

19.
Int Stat Rev ; 82(1): 46-70, 2014 Apr 01.
Article in English | MEDLINE | ID: mdl-25242858

ABSTRACT

Modern computational statistics is turning more and more to high-dimensional optimization to handle the deluge of big data. Once a model is formulated, its parameters can be estimated by optimization. Because model parsimony is important, models routinely include nondifferentiable penalty terms such as the lasso. This sober reality complicates minimization and maximization. Our broad survey stresses a few important principles in algorithm design. Rather than view these principles in isolation, it is more productive to mix and match them. A few well chosen examples illustrate this point. Algorithm derivation is also emphasized, and theory is downplayed, particularly the abstractions of the convex calculus. Thus, our survey should be useful and accessible to a broad audience.
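The lasso penalty mentioned here is the canonical nondifferentiable term, and proximal gradient descent (ISTA) is one of the standard algorithm-design patterns such surveys cover: alternate a gradient step on the smooth loss with soft-thresholding. A minimal sketch, offered as a generic illustration rather than the survey's own code:

```python
import numpy as np

def soft(z, t):
    # Soft-thresholding: proximal operator of t * ||.||_1.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista(X, y, lam, step=None, iters=500):
    # Proximal gradient (ISTA) for the lasso:
    #   minimize 0.5 * ||y - X b||^2 + lam * ||b||_1.
    if step is None:
        step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (X @ b - y)
        b = soft(b - step * grad, step * lam)
    return b
```

With an orthogonal design the iteration converges immediately to the soft-thresholded least-squares solution, which makes the connection between the penalty and exact sparsity easy to see: coefficients below the threshold are set exactly to zero.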

20.
Comput Stat Data Anal ; 80: 117-128, 2014 Dec 01.
Article in English | MEDLINE | ID: mdl-25143662

ABSTRACT

Estimation of a covariance matrix or its inverse plays a central role in many statistical methods. For these methods to work reliably, estimated matrices must not only be invertible but also well-conditioned. The current paper introduces a novel prior to ensure a well-conditioned maximum a posteriori (MAP) covariance estimate. The prior shrinks the sample covariance estimator towards a stable target and leads to a MAP estimator that is consistent and asymptotically efficient. Thus, the MAP estimator gracefully transitions towards the sample covariance matrix as the number of samples grows relative to the number of covariates. The utility of the MAP estimator is demonstrated in two standard applications, discriminant analysis and EM clustering, in this sampling regime.
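The shrinkage-toward-a-target idea can be sketched with the simplest such estimator: a convex combination of the sample covariance and a scaled identity. This is a generic stand-in for the paper's MAP estimator (the actual prior and weights differ), but it shows the conditioning benefit in the n < p regime where the sample covariance is singular:

```python
import numpy as np

def shrunk_covariance(X, alpha):
    # Linear shrinkage of the sample covariance S toward the target
    # mu * I, where mu = trace(S) / p preserves the average variance.
    # alpha in [0, 1]: 0 returns S, 1 returns the target.
    S = np.cov(X, rowvar=False)
    p = S.shape[0]
    mu = np.trace(S) / p
    return (1.0 - alpha) * S + alpha * mu * np.eye(p)
```

Even when S is rank-deficient, the shrunk estimate has smallest eigenvalue at least alpha * mu, so it is invertible and well-conditioned; as the sample size grows one would let alpha shrink toward 0 to recover the sample covariance, mirroring the graceful transition described above.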
