Search | VHL Regional Portal

1.

Local hypergraph clustering using capacity releasing diffusion.

Ibrahim, Rania; Gleich, David F.

PLoS One ; 15(12): e0243485, 2020.

Article in English | MEDLINE | ID: mdl-33362247

ABSTRACT

Local graph clustering is an important machine learning task that aims to find a well-connected cluster near a set of seed nodes. Recent results have revealed that incorporating higher order information significantly enhances the results of graph clustering techniques. The majority of existing research in this area focuses on spectral graph theory-based techniques. However, an alternative perspective on local graph clustering arises from using max-flow and min-cut on the objectives, which offer distinctly different guarantees. For instance, a new method called capacity releasing diffusion (CRD) was recently proposed and shown to preserve local structure around the seeds better than spectral methods. The method was also the first local clustering technique that is not subject to the quadratic Cheeger inequality by assuming a good cluster near the seed nodes. In this paper, we propose a local hypergraph clustering technique called hypergraph CRD (HG-CRD) by extending the CRD process to cluster based on higher order patterns, encoded as hyperedges of a hypergraph. Moreover, we theoretically show that HG-CRD gives results about a quantity called motif conductance, rather than a biased version used in previous experiments. Experimental results on synthetic datasets and real world graphs show that HG-CRD enhances the clustering quality.

Subject(s)

Machine Learning , Algorithms , Cluster Analysis , Diffusion

2.

Non-Exhaustive, Overlapping Clustering.

Whang, Joyce Jiyoung; Hou, Yangyang; Gleich, David F; Dhillon, Inderjit S.

IEEE Trans Pattern Anal Mach Intell ; 41(11): 2644-2659, 2019 Nov.

Article in English | MEDLINE | ID: mdl-30080141

ABSTRACT

Traditional clustering algorithms, such as K-Means, output a clustering that is disjoint and exhaustive, i.e., every single data point is assigned to exactly one cluster. However, in many real-world datasets, clusters can overlap and there are often outliers that do not belong to any cluster. While this is a well-recognized problem, most existing algorithms address either overlap or outlier detection and do not tackle the problem in a unified way. In this paper, we propose an intuitive objective function, which we call the NEO-K-Means (Non-Exhaustive, Overlapping K-Means) objective, that captures the issues of overlap and non-exhaustiveness in a unified manner. Our objective function can be viewed as a reformulation of the traditional K-Means objective, with easy-to-understand parameters that capture the degrees of overlap and non-exhaustiveness. By considering an extension to weighted kernel K-Means, we show that we can also apply our NEO-K-Means idea to overlapping community detection, which is an important task in network analysis. To optimize the NEO-K-Means objective, we develop not only fast iterative algorithms but also more sophisticated algorithms using low-rank semidefinite programming techniques. Our experimental results show that the new objective and algorithms are effective in finding ground-truth clusterings that have varied overlap and non-exhaustiveness; for the case of graphs, we show that our method outperforms state-of-the-art overlapping community detection algorithms.
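As a rough illustration of what an overlapping, non-exhaustive clustering looks like, the sketch below evaluates a K-Means-style sum of squared distances for an assignment in which a point may belong to several clusters or to none. The function names, the data layout, and the toy points are assumptions made for this example. It does not reproduce the NEO-K-Means algorithms or their parameterization.

```python
def overlapping_sse(points, clusters):
    """Sum of squared distances to each cluster's centroid.

    points:   list of equal-length tuples of floats
    clusters: list of lists of point indices; a point may appear in
              several clusters (overlap) or in none (outlier)
    """
    total = 0.0
    for members in clusters:
        if not members:
            continue
        dim = len(points[members[0]])
        centroid = [sum(points[i][d] for i in members) / len(members)
                    for d in range(dim)]
        for i in members:
            total += sum((points[i][d] - centroid[d]) ** 2 for d in range(dim))
    return total


def overlaps_and_outliers(n_points, clusters):
    """Indices assigned to more than one cluster, and indices assigned to none."""
    counts = [0] * n_points
    for members in clusters:
        for i in set(members):
            counts[i] += 1
    overlaps = [i for i, c in enumerate(counts) if c > 1]
    outliers = [i for i, c in enumerate(counts) if c == 0]
    return overlaps, outliers


# Toy data: point 2 sits between two groups, point 4 is far from everything.
points = [(0.0,), (2.0,), (4.0,), (6.0,), (100.0,)]
clusters = [[0, 1, 2], [2, 3]]

sse = overlapping_sse(points, clusters)
overlaps, outliers = overlaps_and_outliers(len(points), clusters)
```

Leaving point 4 unassigned keeps it from dragging a centroid toward it, and sharing point 2 lets both clusters account for it.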

3.

Multimodal network diffusion predicts future disease-gene-chemical associations.

Lin, Chih-Hsu; Konecki, Daniel M; Liu, Meng; Wilson, Stephen J; Nassar, Huda; Wilkins, Angela D; Gleich, David F; Lichtarge, Olivier.

Bioinformatics ; 35(9): 1536-1543, 2019 05 01.

Article in English | MEDLINE | ID: mdl-30304494

ABSTRACT

MOTIVATION: Precision medicine is an emerging field with hopes to improve patient treatment and reduce morbidity and mortality. To these ends, computational approaches have predicted associations among genes, chemicals and diseases. Such efforts, however, were often limited to using just some available association types. This lowers prediction coverage and, since prior evidence shows that integrating heterogeneous data is likely beneficial, it may limit accuracy. Therefore, we systematically tested whether using more association types improves prediction. RESULTS: We study multimodal networks linking diseases, genes and chemicals (drugs) by applying three diffusion algorithms and varying information content. Ten-fold cross-validation shows that these networks are internally consistent, both within and across association types. Also, diffusion methods recovered missing edges, even if all the edges from an entire mode of association were removed. This suggests that information is transferable between these association types. As a realistic validation, time-stamped experiments simulated the predictions of future associations based solely on information known prior to a given date. The results show that many future published results are predictable from current associations. Moreover, in most cases, using more association types increases prediction coverage without significantly decreasing sensitivity and specificity. In case studies, literature-supported validation shows that these predictions mimic human-formulated hypotheses. Overall, this study suggests that diffusion over a more comprehensive multimodal network will generate more useful hypotheses of associations among diseases, genes and chemicals, which may guide the development of precision therapies. AVAILABILITY AND IMPLEMENTATION: Code and data are available at https://github.com/LichtargeLab/multimodal-network-diffusion. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Algorithms , Computational Biology , Diffusion , Humans , Precision Medicine

4.

Gauss's law for networks directly reveals community boundaries.

Sinha, Ayan; Gleich, David F; Ramani, Karthik.

Sci Rep ; 8(1): 11909, 2018 08 09.

Article in English | MEDLINE | ID: mdl-30093660

ABSTRACT

The study of network topology provides insight into the function and behavior of physical, social, and biological systems. A natural step towards discovering the organizing principles of these complex topologies is to identify a reduced network representation using cohesive subgroups or communities. This procedure often uncovers the underlying mechanisms governing the functional assembly of complex networks. A community is usually defined as a subgraph or a set of nodes that has more edges than would be expected from a simple, null distribution of edges over the graph. This view drives objectives such as modularity. Another perspective, corresponding to objectives like conductance or density, is that communities are groups of nodes that have extremal properties with respect to the number of internal edges and cut edges. Here we show that identifying community boundaries rather than communities results in a more accurate decomposition of the network into informative components. We derive a network analog of Gauss's law that relates a measure of flux through a subgraph's boundary to the connectivity among the subgraph's nodes. Our Gauss's law for networks naturally characterizes a community as a subgraph with high flux through its boundary. Aggregating flux over these boundaries gives rise to a Laplacian and forms the basis of our "Laplacian modularity" quality function for community detection that is applicable to general network types. This technique allows us to determine communities that are both overlapping and hierarchically organized.
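The conductance objective mentioned in this abstract can be made concrete with a short sketch. It uses the standard definition (cut edges divided by the smaller of the two side volumes) on an undirected graph stored as a dictionary of neighbor sets. The function name and the toy graph are assumptions for illustration, and the paper's flux-based "Laplacian modularity" is not implemented here.

```python
def conductance(adj, subset):
    """Conductance of `subset` in an undirected graph {node: set(neighbors)}."""
    subset = set(subset)
    # Edges with exactly one endpoint inside the subset.
    cut = sum(1 for u in subset for v in adj[u] if v not in subset)
    # Volume = sum of degrees on each side.
    vol_in = sum(len(adj[u]) for u in subset)
    vol_out = sum(len(adj[u]) for u in adj) - vol_in
    denom = min(vol_in, vol_out)
    return cut / denom if denom else float("inf")


# Two triangles joined by the single edge 2-3.
graph = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
phi = conductance(graph, {0, 1, 2})  # one cut edge, volume 7 on each side
```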

5.

A geometric approach to characterize the functional identity of single cells.

Mohammadi, Shahin; Ravindra, Vikram; Gleich, David F; Grama, Ananth.

Nat Commun ; 9(1): 1516, 2018 04 17.

Article in English | MEDLINE | ID: mdl-29666373

ABSTRACT

Single-cell transcriptomic data has the potential to radically redefine our view of cell-type identity. Cells that were previously believed to be homogeneous are now clearly distinguishable in terms of their expression phenotype. Methods for automatically characterizing the functional identity of cells, and their associated properties, can be used to uncover processes involved in lineage differentiation as well as sub-typing cancer cells. They can also be used to suggest personalized therapies based on molecular signatures associated with pathology. We develop a new method, called ACTION, to infer the functional identity of cells from their transcriptional profile, classify them based on their dominant function, and reconstruct regulatory networks that are responsible for mediating their identity. Using ACTION, we identify novel melanoma subtypes with differential survival rates and therapeutic responses, for which we provide biomarkers along with their underlying regulatory networks.

Subject(s)

Cell Differentiation/genetics , Gene Expression Profiling/methods , Models, Genetic , Single-Cell Analysis/methods , Transcriptome/physiology , Animals , Biomarkers, Tumor/genetics , Cell Line, Tumor , Datasets as Topic , Gene Regulatory Networks/physiology , Humans , Melanoma/genetics , Melanoma/therapy , Mice , Phenotype , Survival Rate , Treatment Outcome , Tumor Microenvironment/genetics

6.

AptRank: an adaptive PageRank model for protein function prediction on bi-relational graphs.

Jiang, Biaobin; Kloster, Kyle; Gleich, David F; Gribskov, Michael.

Bioinformatics ; 33(12): 1829-1836, 2017 Jun 15.

Article in English | MEDLINE | ID: mdl-28200073

ABSTRACT

MOTIVATION: Diffusion-based network models are widely used for protein function prediction using protein network data and have been shown to outperform neighborhood-based and module-based methods. Recent studies have shown that integrating the hierarchical structure of the Gene Ontology (GO) data dramatically improves prediction accuracy. However, previous methods usually either used the GO hierarchy to refine the prediction results of multiple classifiers, or flattened the hierarchy into a function-function similarity kernel. No study has taken the GO hierarchy into account together with the protein network as a two-layer network model. RESULTS: We first construct a Bi-relational graph (Birg) model comprised of both protein-protein association and function-function hierarchical networks. We then propose two diffusion-based methods, BirgRank and AptRank, both of which use PageRank to diffuse information on this two-layer graph model. BirgRank is a direct application of traditional PageRank with fixed decay parameters. In contrast, AptRank utilizes an adaptive diffusion mechanism to improve the performance of BirgRank. We evaluate the ability of both methods to predict protein function on yeast, fly and human protein datasets, and compare with four previous methods: GeneMANIA, TMC, ProteinRank and clusDCA. We design four different validation strategies: missing function prediction, de novo function prediction, guided function prediction and newly discovered function prediction to comprehensively evaluate predictability of all six methods. We find that both BirgRank and AptRank outperform the previous methods, especially in missing function prediction when using only 10% of the data for training. AVAILABILITY AND IMPLEMENTATION: The MATLAB code is available at https://github.rcac.purdue.edu/mgribsko/aptrank . CONTACT: gribskov@purdue.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Computational Biology/methods , Gene Ontology , Proteins/metabolism , Software , Algorithms , Animals , Drosophila/metabolism , Humans , Proteins/physiology , Saccharomyces cerevisiae/metabolism

7.

Local Higher-Order Graph Clustering.

Yin, Hao; Benson, Austin R; Leskovec, Jure; Gleich, David F.

KDD ; 2017: 555-564, 2017 Aug.

Article in English | MEDLINE | ID: mdl-29770258

ABSTRACT

Local graph clustering methods aim to find a cluster of nodes by exploring a small region of the graph. These methods are attractive because they enable targeted clustering around a given seed node and are faster than traditional global graph clustering methods because their runtime does not depend on the size of the input graph. However, current local graph partitioning methods are not designed to account for the higher-order structures crucial to the network, nor can they effectively handle directed networks. Here we introduce a new class of local graph clustering methods that address these issues by incorporating higher-order network information captured by small subgraphs, also called network motifs. We develop the Motif-based Approximate Personalized PageRank (MAPPR) algorithm that finds clusters containing a seed node with minimal motif conductance, a generalization of the conductance metric for network motifs. We generalize existing theory to prove the fast running time (independent of the size of the graph) and obtain theoretical guarantees on the cluster quality (in terms of motif conductance). We also develop a theory of node neighborhoods for finding sets that have small motif conductance, and apply these results to the case of finding good seed nodes to use as input to the MAPPR algorithm. Experimental validation on community detection tasks in both synthetic and real-world networks shows that our new framework MAPPR outperforms the current edge-based personalized PageRank methodology.
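The general shape of a motif-based, seeded clustering pipeline can be sketched as follows, assuming the triangle as the motif: weight each edge by the number of triangles containing it, diffuse from the seed with personalized PageRank on the weighted graph, then take the best prefix of a sweep over degree-normalized scores. This is a minimal sketch, not the authors' implementation. In particular, plain power iteration over the whole graph stands in for the approximate, local PageRank computation the abstract describes, and all function names, the `alpha` value, and the toy graph are assumptions.

```python
def triangle_weights(adj):
    """Weight each edge by the number of triangles that contain it."""
    return {u: {v: len(adj[u] & adj[v]) for v in adj[u] if adj[u] & adj[v]}
            for u in adj}


def seeded_pagerank(w, seed, alpha=0.85, iters=200):
    """Power iteration for a random walk on w that restarts at `seed`."""
    deg = {u: sum(w[u].values()) for u in w}
    p = {u: 0.0 for u in w}
    p[seed] = 1.0
    for _ in range(iters):
        nxt = {u: 0.0 for u in w}
        nxt[seed] = 1.0 - alpha
        for u, mass in p.items():
            if deg[u] == 0:
                nxt[seed] += alpha * mass  # no motif edges: return to the seed
            else:
                for v, wt in w[u].items():
                    nxt[v] += alpha * mass * wt / deg[u]
        p = nxt
    return p, deg


def sweep_cut(w, p, deg):
    """Best prefix (lowest weighted conductance) of nodes sorted by p/deg."""
    order = sorted((u for u in w if deg[u] > 0 and p[u] > 0),
                   key=lambda u: p[u] / deg[u], reverse=True)
    total = sum(deg.values())
    inside, vol, cut = set(), 0, 0
    best, best_phi = set(), float("inf")
    for u in order:
        # Adding u turns its edges into `inside` from cut edges into internal ones.
        cut += deg[u] - 2 * sum(wt for v, wt in w[u].items() if v in inside)
        vol += deg[u]
        inside.add(u)
        denom = min(vol, total - vol)
        if denom > 0 and cut / denom < best_phi:
            best, best_phi = set(inside), cut / denom
    return best, best_phi


# Two triangles joined by the edge 2-3, which lies in no triangle.
graph = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
weights = triangle_weights(graph)
scores, degrees = seeded_pagerank(weights, seed=0)
cluster, phi = sweep_cut(weights, scores, degrees)
```

In this toy graph the bridging edge 2-3 receives no motif weight, so the diffusion never leaves the seed's triangle and the sweep returns that triangle.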

8.

Triangular Alignment (TAME): A Tensor-Based Approach for Higher-Order Network Alignment.

Mohammadi, Shahin; Gleich, David F; Kolda, Tamara G; Grama, Ananth.

IEEE/ACM Trans Comput Biol Bioinform ; 14(6): 1446-1458, 2017.

Article in English | MEDLINE | ID: mdl-27483461

ABSTRACT

Network alignment has extensive applications in comparative interactomics. Traditional approaches aim to simultaneously maximize the number of conserved edges and the underlying similarity of aligned entities. We propose a novel formulation of the network alignment problem that extends topological similarity to higher-order structures and provides a new objective function that maximizes the number of aligned substructures. This objective function corresponds to an integer programming problem, which is NP-hard. Consequently, we identify a closely related surrogate function whose maximization results in a tensor eigenvector problem. Based on this formulation, we present an algorithm called Triangular AlignMEnt (TAME), which attempts to maximize the number of aligned triangles across networks. Using a case study on the NAPAbench dataset, we show that triangular alignment is capable of producing mappings with high node correctness. We further evaluate our method by aligning yeast and human interactomes. Our results indicate that TAME outperforms the state-of-the-art alignment methods in terms of conserved triangles. In addition, we show that the number of conserved triangles is more significantly correlated with node correctness and co-expression of edges than the number of conserved edges is. Our formulation and resulting algorithms can be easily extended to arbitrary motifs.

Subject(s)

Algorithms , Computational Biology/methods , Protein Interaction Mapping/methods , Sequence Alignment/methods , Gene Expression Profiling , Humans , Yeasts/genetics , Yeasts/metabolism

9.

Higher-order organization of complex networks.

Benson, Austin R; Gleich, David F; Leskovec, Jure.

Science ; 353(6295): 163-6, 2016 Jul 08.

Article in English | MEDLINE | ID: mdl-27387949

ABSTRACT

Networks are a fundamental tool for understanding and modeling complex systems in physics, biology, neuroscience, engineering, and social science. Many networks are known to exhibit rich, lower-order connectivity patterns that can be captured at the level of individual nodes and edges. However, higher-order organization of complex networks--at the level of small network subgraphs--remains largely unknown. Here, we develop a generalized framework for clustering networks on the basis of higher-order connectivity patterns. This framework provides mathematical guarantees on the optimality of obtained clusters and scales to networks with billions of edges. The framework reveals higher-order organization in a number of networks, including information propagation units in neuronal networks and hub structure in transportation networks. Results show that networks exhibit rich higher-order organizational structures that are exposed by clustering based on higher-order connectivity patterns.
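To make "clustering on the basis of higher-order connectivity patterns" more tangible, the sketch below scores a node set by a triangle-based conductance: the number of triangles split by the set, divided by the smaller count of triangle endpoints on either side. The choice of the triangle as the motif, the function names, and the toy graph are assumptions for illustration. The framework in the paper covers general motifs and includes an optimization step that is not shown here.

```python
from itertools import combinations


def triangles(adj):
    """All triangles of an undirected graph {node: set(neighbors)}."""
    found = set()
    for u in adj:
        for v, w in combinations(sorted(adj[u]), 2):
            if w in adj[v]:
                found.add(frozenset((u, v, w)))
    return found


def triangle_conductance(adj, subset):
    """Triangles split by `subset`, over the smaller triangle-endpoint volume."""
    subset = set(subset)
    tris = triangles(adj)
    cut = sum(1 for t in tris if 0 < len(t & subset) < 3)
    vol_in = sum(len(t & subset) for t in tris)
    vol_out = 3 * len(tris) - vol_in
    denom = min(vol_in, vol_out)
    return cut / denom if denom else float("inf")


# Three triangles in a chain: {0,1,2}, {2,3,4}, {3,4,5}.
graph = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3, 4},
    3: {2, 4, 5}, 4: {2, 3, 5}, 5: {3, 4},
}
phi = triangle_conductance(graph, {0, 1, 2})  # 1 split triangle / 4 endpoints
```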

10.

Tensor Spectral Clustering for Partitioning Higher-order Network Structures.

Benson, Austin R; Gleich, David F; Leskovec, Jure.

Proc SIAM Int Conf Data Min ; 2015: 118-126, 2015.

Article in English | MEDLINE | ID: mdl-27812399

ABSTRACT

Spectral graph theory-based methods represent an important class of tools for studying the structure of networks. Spectral methods are based on a first-order Markov chain derived from a random walk on the graph and thus they cannot take advantage of important higher-order network substructures such as triangles, cycles, and feed-forward loops. Here we propose a Tensor Spectral Clustering (TSC) algorithm that allows for modeling higher-order network structures in a graph partitioning framework. Our TSC algorithm allows the user to specify which higher-order network structures (cycles, feed-forward loops, etc.) should be preserved by the network clustering. Higher-order network structures of interest are represented using a tensor, which we then partition by developing a multilinear spectral method. Our framework can be applied to discovering layered flows in networks as well as graph anomaly detection, which we illustrate on synthetic networks. In directed networks, a higher-order structure of particular interest is the directed 3-cycle, which captures feedback loops in networks. We demonstrate that our TSC algorithm produces large partitions that cut fewer directed 3-cycles than standard spectral clustering algorithms.
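The evaluation criterion in the last sentence, how many directed 3-cycles a partition cuts, is easy to compute directly. The sketch below enumerates directed 3-cycles and counts those with nodes on both sides of a given split. It illustrates only this measurement, not the TSC algorithm itself, and the function names and example graph are assumptions.

```python
def directed_3_cycles(succ):
    """Directed 3-cycles u -> v -> w -> u, each reported once with the smallest node first."""
    cycles = set()
    for u in succ:
        for v in succ[u]:
            if v <= u:
                continue
            for w in succ.get(v, ()):
                if w > u and w != v and u in succ.get(w, ()):
                    cycles.add((u, v, w))
    return cycles


def cut_cycles(succ, side):
    """Number of directed 3-cycles with nodes on both sides of the split."""
    side = set(side)
    return sum(1 for c in directed_3_cycles(succ)
               if 0 < len(side.intersection(c)) < 3)


# Two feedback loops, 0 -> 1 -> 2 -> 0 and 3 -> 4 -> 5 -> 3, linked by the edge 2 -> 3.
succ = {0: [1], 1: [2], 2: [0, 3], 3: [4], 4: [5], 5: [3]}

good_split = cut_cycles(succ, {0, 1, 2})  # separates the loops without cutting either
bad_split = cut_cycles(succ, {0, 1, 3})   # breaks both loops
```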

11.

Dimensionality of social networks using motifs and eigenvalues.

Bonato, Anthony; Gleich, David F; Kim, Myunghwan; Mitsche, Dieter; Pralat, Pawel; Tian, Yanhua; Young, Stephen J.

PLoS One ; 9(9): e106052, 2014.

Article in English | MEDLINE | ID: mdl-25188391

ABSTRACT

We consider the dimensionality of social networks, and develop experiments aimed at predicting that dimension. We find that a social network model with nodes and links sampled from an m-dimensional metric space with power-law distributed influence regions best fits samples from real-world networks when m scales logarithmically with the number of nodes of the network. This supports a logarithmic dimension hypothesis, and we provide evidence with two different social networks, Facebook and LinkedIn. Further, we employ two different methods for confirming the hypothesis: the first uses the distribution of motif counts, and the second exploits the eigenvalue distribution.

Subject(s)

Social Networking , Computer Graphics , Humans , Mathematical Concepts , Models, Theoretical , Support Vector Machine
