Results 1 - 6 of 6
1.
J Mach Learn Res ; 18(1): 110-114, 2017.
Article in English | MEDLINE | ID: mdl-29599649

ABSTRACT

SnapVX is a high-performance solver for convex optimization problems defined on networks. For problems of this form, SnapVX provides a fast and scalable solution with guaranteed global convergence. It combines the capabilities of two open source software packages: Snap.py and CVXPY. Snap.py is a large scale graph processing library, and CVXPY provides a general modeling framework for small-scale subproblems. SnapVX offers a customizable yet easy-to-use Python interface with "out-of-the-box" functionality. Based on the Alternating Direction Method of Multipliers (ADMM), it is able to efficiently store, analyze, parallelize, and solve large optimization problems from a variety of different applications. Documentation, examples, and more can be found on the SnapVX website at http://snap.stanford.edu/snapvx.
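
To illustrate the kind of ADMM iteration the abstract refers to, here is a minimal sketch in plain Python. It does not use the SnapVX, Snap.py, or CVXPY APIs. It assumes the simplest possible setting, global consensus, in which every node i holds a quadratic objective 0.5 * (x_i - a_i)^2 and all nodes must agree on one shared value. The function name `consensus_admm` and its parameters are hypothetical.

```python
def consensus_admm(targets, rho=1.0, iterations=100):
    """Consensus ADMM for: minimize sum_i 0.5 * (x_i - a_i)^2 s.t. x_i = z.

    targets: list of the a_i values (one per node).
    rho: ADMM penalty parameter (assumed value, not taken from the paper).
    """
    n = len(targets)
    x = [0.0] * n   # local copies, one per node
    u = [0.0] * n   # scaled dual variables, one per node
    z = 0.0         # shared consensus variable
    for _ in range(iterations):
        # x-update: each node minimizes its own objective plus a proximity
        # term; for this quadratic objective the minimizer is closed-form.
        x = [(a + rho * (z - ui)) / (1.0 + rho) for a, ui in zip(targets, u)]
        # z-update: average the local copies plus their duals.
        z = sum(xi + ui for xi, ui in zip(x, u)) / n
        # u-update: accumulate each node's disagreement with the consensus.
        u = [ui + xi - z for xi, ui in zip(x, u)]
    return z, x


z, x = consensus_admm([1.0, 2.0, 6.0])
```

The x-update can run independently on every node, which is the property that makes this family of methods easy to parallelize. For this toy objective the consensus value approaches the mean of the targets.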

2.
KDD ; 2017: 205-213, 2017 Aug.
Article in English | MEDLINE | ID: mdl-29770256

ABSTRACT

Many important problems can be modeled as a system of interconnected entities, where each entity is recording time-dependent observations or measurements. In order to spot trends, detect anomalies, and interpret the temporal dynamics of such data, it is essential to understand the relationships between the different entities and how these relationships evolve over time. In this paper, we introduce the time-varying graphical lasso (TVGL), a method of inferring time-varying networks from raw time series data. We cast the problem in terms of estimating a sparse time-varying inverse covariance matrix, which reveals a dynamic network of interdependencies between the entities. Since dynamic network inference is a computationally expensive task, we derive a scalable message-passing algorithm based on the Alternating Direction Method of Multipliers (ADMM) to solve this problem in an efficient way. We also discuss several extensions, including a streaming algorithm to update the model and incorporate new observations in real time. Finally, we evaluate our TVGL algorithm on both real and synthetic datasets, obtaining interpretable results and outperforming state-of-the-art baselines in terms of both accuracy and scalability.
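
The estimation problem described above can be written compactly. The notation below is an assumption based on the standard presentation of the method: Θ_i is the inverse covariance estimate at time step i, S_i is the empirical covariance of the n_i observations at that step, λ controls sparsity, β controls how strongly consecutive estimates are tied together, and ψ is a convex penalty on the change between them.

```latex
\begin{aligned}
\underset{\Theta_1,\dots,\Theta_T \,\in\, \mathbb{S}^p_{++}}{\text{minimize}}
  \quad & \sum_{i=1}^{T} \Bigl( -\,l_i(\Theta_i) + \lambda \,\lVert \Theta_i \rVert_{\mathrm{od},1} \Bigr)
          + \beta \sum_{i=2}^{T} \psi\bigl(\Theta_i - \Theta_{i-1}\bigr), \\
\text{where} \quad & l_i(\Theta_i) = n_i \bigl( \log\det \Theta_i - \operatorname{Tr}(S_i \Theta_i) \bigr).
\end{aligned}
```

Here the norm with subscript od,1 denotes the sum of absolute values of the off-diagonal entries, which is what encourages a sparse network at each time step.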

3.
KDD ; 2017: 215-223, 2017 Aug.
Article in English | MEDLINE | ID: mdl-29770257

ABSTRACT

Subsequence clustering of multivariate time series is a useful tool for discovering repeated patterns in temporal data. Once these patterns have been discovered, seemingly complicated datasets can be interpreted as a temporal sequence of only a small number of states, or clusters. For example, raw sensor data from a fitness-tracking application can be expressed as a timeline of a select few actions (e.g., walking, sitting, running). However, discovering these patterns is challenging because it requires simultaneous segmentation and clustering of the time series. Furthermore, interpreting the resulting clusters is difficult, especially when the data is high-dimensional. Here we propose a new method of model-based clustering, which we call Toeplitz Inverse Covariance-based Clustering (TICC). Each cluster in the TICC method is defined by a correlation network, or Markov random field (MRF), characterizing the interdependencies between different observations in a typical subsequence of that cluster. Based on this graphical representation, TICC simultaneously segments and clusters the time series data. We solve the TICC problem through alternating minimization, using a variation of the expectation-maximization (EM) algorithm. We derive closed-form solutions to efficiently solve the two resulting subproblems in a scalable way, through dynamic programming and the alternating direction method of multipliers (ADMM), respectively. We validate our approach by comparing TICC to several state-of-the-art baselines in a series of synthetic experiments, and we then demonstrate on an automobile sensor dataset how TICC can be used to learn interpretable clusters in real-world scenarios.
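
The dynamic-programming subproblem mentioned above, assigning each time point to a cluster, can be sketched as a Viterbi-style shortest path. This is an illustrative sketch, not the authors' implementation. It assumes that `costs[t][k]` already holds the cost (for example a negative log-likelihood) of placing point t in cluster k, and that a fixed penalty `beta` is paid whenever consecutive points change cluster. The function name `assign_clusters` is hypothetical.

```python
def assign_clusters(costs, beta):
    """Return the minimum-cost cluster label for each time point.

    costs: list of T rows, each with K per-cluster costs (assumed given).
    beta: penalty added each time the label changes between t-1 and t.
    """
    num_clusters = len(costs[0])
    total = list(costs[0])   # best cost of any path ending in each cluster
    back = []                # back[t-1][k] = best predecessor of k at time t
    for row in costs[1:]:
        new_total, pointers = [], []
        for k in range(num_clusters):
            # Cheapest way to arrive in cluster k from any previous cluster.
            best_j = min(
                range(num_clusters),
                key=lambda j: total[j] + (0 if j == k else beta),
            )
            switch = 0 if best_j == k else beta
            new_total.append(total[best_j] + switch + row[k])
            pointers.append(best_j)
        total = new_total
        back.append(pointers)
    # Trace the optimal path backwards from the cheapest final cluster.
    k = min(range(num_clusters), key=lambda j: total[j])
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    path.reverse()
    return path


toy_costs = [[0, 5], [0, 5], [1, 0], [0, 5]]
smooth = assign_clusters(toy_costs, beta=3)   # switching is expensive
greedy = assign_clusters(toy_costs, beta=0)   # switching is free
```

With a large `beta` the third point stays in the first cluster despite its slightly higher cost there, which is the temporal-consistency effect the penalty is meant to produce. With `beta=0` each point simply takes its cheapest cluster.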

4.
Proc Mach Learn Res ; 54: 1302-1310, 2017 Apr.
Article in English | MEDLINE | ID: mdl-30931433

ABSTRACT

Markov random fields (MRFs) are a useful tool for modeling relationships present in large and high-dimensional data. Often, this data comes from various sources and can have diverse distributions, for example a combination of numerical, binary, and categorical variables. Here, we define the pairwise exponential Markov random field (PE-MRF), an approach capable of modeling exponential family distributions in heterogeneous domains. We develop a scalable method of learning the graphical structure across the variables by solving a regularized approximated maximum likelihood problem. Specifically, we first derive a tractable upper bound on the log-partition function. We then use this upper bound to derive the group graphical lasso, a generalization of the classic graphical lasso problem to heterogeneous domains. To solve this problem, we develop a fast algorithm based on the alternating direction method of multipliers (ADMM). We also prove that our estimator is sparsistent, with guaranteed recovery of the true underlying graphical structure, and that it has a polynomially faster runtime than the current state-of-the-art method for learning such distributions. Experiments on synthetic and real-world examples demonstrate that our approach is both efficient and accurate at uncovering the structure of heterogeneous data.
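
Group-sparse penalties of the kind the group graphical lasso generalizes are commonly handled inside ADMM with a group soft-thresholding step, the proximal operator of a scaled Euclidean norm. The sketch below shows that standard operator in plain Python. Whether the paper's solver uses exactly this update is an assumption, and the function name `group_soft_threshold` is hypothetical.

```python
import math


def group_soft_threshold(block, threshold):
    """Proximal operator of threshold * ||block||_2 for one group of entries.

    Shrinks the whole block toward zero and sets it exactly to zero when
    its Euclidean norm is at most the threshold.
    """
    norm = math.sqrt(sum(v * v for v in block))
    if norm <= threshold:
        # The entire group is removed, i.e. no edge between the two variables.
        return [0.0 for _ in block]
    scale = 1.0 - threshold / norm
    return [scale * v for v in block]


kept = group_soft_threshold([3.0, 4.0], 1.0)     # norm 5, scaled by 0.8
dropped = group_soft_threshold([0.3, 0.4], 1.0)  # norm 0.5, zeroed out
```

Because a block is either kept (shrunken) or zeroed as a unit, all parameters linking a pair of possibly multi-dimensional variables appear or vanish together, which is how a group penalty yields a sparse graph over heterogeneous variables.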

5.
KDD ; 2015: 387-396, 2015 Aug.
Article in English | MEDLINE | ID: mdl-27398260

ABSTRACT

Convex optimization is an essential tool for modern data analysis, as it provides a framework to formulate and solve many problems in machine learning and data mining. However, general convex optimization solvers do not scale well, and scalable solvers are often specialized to only work on a narrow class of problems. Therefore, there is a need for simple, scalable algorithms that can solve many common optimization problems. In this paper, we introduce the network lasso, a generalization of the group lasso to a network setting that allows for simultaneous clustering and optimization on graphs. We develop an algorithm based on the Alternating Direction Method of Multipliers (ADMM) to solve this problem in a distributed and scalable manner, which allows for guaranteed global convergence even on large graphs. We also examine a non-convex extension of this approach. We then demonstrate that many types of problems can be expressed in our framework. We focus on three in particular - binary classification, predicting housing prices, and event detection in time series data - comparing the network lasso to baseline approaches and showing that it is both a fast and accurate method of solving large optimization problems.
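
The network lasso problem described above can be stated as follows. The notation is an assumption based on the standard presentation: the graph has node set V and edge set E, each node i has a variable x_i and a convex objective f_i, each edge (j, k) has a nonnegative weight w_jk, and λ is a regularization parameter.

```latex
\underset{x_i,\; i \in \mathcal{V}}{\text{minimize}} \quad
  \sum_{i \in \mathcal{V}} f_i(x_i)
  \;+\; \lambda \sum_{(j,k) \in \mathcal{E}} w_{jk} \,\lVert x_j - x_k \rVert_2
```

Because the edge term uses an unsquared Euclidean norm, it drives the differences across many edges to exactly zero, so connected nodes end up sharing identical variables. This is what produces the simultaneous clustering and optimization: at λ = 0 every node is solved independently, and for sufficiently large λ all nodes agree on a single value.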

6.
Environ Manage ; 49(2): 502-15, 2012 Feb.
Article in English | MEDLINE | ID: mdl-22037616

ABSTRACT

Ecosystem restoration in south Florida is a state and national priority centered on the Everglades wetlands. However, urban development pressures affect the restoration potential and remaining habitat functions of the natural undeveloped areas. Land use (LU) planning often focuses at the local level, but a better understanding of the cumulative effects of small projects at the landscape level is needed to support ecosystem restoration and preservation. The South Florida Ecosystem Portfolio Model (SFL EPM) is a regional LU planning tool developed to help stakeholders visualize LU scenario evaluation and improve communication about regional effects of LU decisions. One component of the SFL EPM is ecological value (EV), which is evaluated through modeled ecological criteria related to ecosystem services using metrics for (1) biodiversity potential, (2) threatened and endangered species, (3) rare and unique habitats, (4) landscape pattern and fragmentation, (5) water quality buffer potential, and (6) ecological restoration potential. In this article, we demonstrate the calculation of EV using two case studies: (1) assessing altered EV in the Biscayne Gateway area by comparing 2004 LU to potential LU in 2025 and 2050, and (2) the cumulative impact of adding limestone mines south of Miami. Our analyses spatially convey changing regional EV resulting from conversion of local natural and agricultural areas to urban, industrial, or extractive use. Different simulated local LU scenarios may result in different alterations in calculated regional EV. These case studies demonstrate methods that may facilitate evaluation of potential future LU patterns and incorporate EV into decision making.


Subject(s)
Environment , Models, Theoretical , Urbanization , Calcium Carbonate , Decision Making , Florida , Mining , Planning Techniques , Water Quality