Search | VHL Regional Portal

Projecting genetic associations through gene expression patterns highlights disease etiology and drug mechanisms.

Pividori, Milton; Lu, Sumei; Li, Binglan; Su, Chun; Johnson, Matthew E; Wei, Wei-Qi; Feng, Qiping; Namjou, Bahram; Kiryluk, Krzysztof; Kullo, Iftikhar J; Luo, Yuan; Sullivan, Blair D; Voight, Benjamin F; Skarke, Carsten; Ritchie, Marylyn D; Grant, Struan F A; Greene, Casey S.

Nat Commun ; 14(1): 5562, 2023 09 09.

Article in English | MEDLINE | ID: mdl-37689782

ABSTRACT

Genes act in concert with each other in specific contexts to perform their functions. Determining how these genes influence complex traits requires a mechanistic understanding of expression regulation across different conditions. It has been shown that this insight is critical for developing new therapies. Transcriptome-wide association studies have helped uncover the role of individual genes in disease-relevant mechanisms. However, modern models of the architecture of complex traits predict that gene-gene interactions play a crucial role in disease origin and progression. Here we introduce PhenoPLIER, a computational approach that maps gene-trait associations and pharmacological perturbation data into a common latent representation for a joint analysis. This representation is based on modules of genes with similar expression patterns across the same conditions. We observe that diseases are significantly associated with gene modules expressed in relevant cell types, and our approach is accurate in predicting known drug-disease pairs and inferring mechanisms of action. Furthermore, using a CRISPR screen to analyze lipid regulation, we find that functionally important players lack associations but are prioritized in trait-associated modules by PhenoPLIER. By incorporating groups of co-expressed genes, PhenoPLIER can contextualize genetic associations and reveal potential targets missed by single-gene strategies.

Subject(s)

Clustered Regularly Interspaced Short Palindromic Repeats , Epistasis, Genetic , Causality , Gene Regulatory Networks , Transcriptome
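The PhenoPLIER abstract above describes mapping gene-trait associations and drug perturbation data into one latent space built from gene modules. A minimal sketch of that idea follows, assuming a plain weighted-sum projection; this is a simplification and not necessarily the projection PhenoPLIER itself uses. The gene names, loadings, and scores are all hypothetical toy values.

```python
# Hypothetical toy loadings: rows are genes, columns are gene modules.
GENES = ["G1", "G2", "G3", "G4"]
LOADINGS = [
    [0.9, 0.0],
    [0.8, 0.1],
    [0.0, 0.7],
    [0.1, 0.9],
]


def project_to_modules(gene_scores, loadings):
    """Project per-gene scores onto modules with a plain weighted sum."""
    n_modules = len(loadings[0])
    return [
        sum(loadings[g][k] * gene_scores[g] for g in range(len(gene_scores)))
        for k in range(n_modules)
    ]


# Hypothetical gene-trait association scores and drug-induced expression changes.
trait_scores = [2.0, 1.5, 0.1, 0.0]
drug_scores = [-1.0, -2.0, 0.0, 0.2]

# Both data types now live in the same module space and can be compared.
trait_modules = project_to_modules(trait_scores, LOADINGS)
drug_modules = project_to_modules(drug_scores, LOADINGS)

print("trait in module space:", trait_modules)
print("drug in module space:", drug_modules)
```

In this toy case the trait loads mostly on the first module while the drug pushes that module in the opposite direction, which is the kind of pattern a joint analysis in module space could look for.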

Hetnet connectivity search provides rapid insights into how two biomedical entities are related.

Himmelstein, Daniel S; Zietz, Michael; Rubinetti, Vincent; Kloster, Kyle; Heil, Benjamin J; Alquaddoomi, Faisal; Hu, Dongbo; Nicholson, David N; Hao, Yun; Sullivan, Blair D; Nagle, Michael W; Greene, Casey S.

bioRxiv ; 2023 Jan 07.

Article in English | MEDLINE | ID: mdl-36711546

ABSTRACT

Hetnets, short for "heterogeneous networks", contain multiple node and relationship types and offer a way to encode biomedical knowledge. One such example, Hetionet, connects 11 types of nodes - including genes, diseases, drugs, pathways, and anatomical structures - with over 2 million edges of 24 types. Previous work has demonstrated that supervised machine learning methods applied to such networks can identify drug repurposing opportunities. However, a training set of known relationships does not exist for many types of node pairs, even when it would be useful to examine how nodes of those types are meaningfully connected. For example, users may be curious not only how metformin is related to breast cancer, but also how the GJA1 gene might be involved in insomnia. We developed a new procedure, termed hetnet connectivity search, that proposes important paths between any two nodes without requiring a supervised gold standard. The algorithm behind connectivity search identifies types of paths that occur more frequently than would be expected by chance (based on node degree alone). We find that predictions are broadly similar to those from previously described supervised approaches for certain node type pairs. Scoring of individual paths is based on the most specific paths of a given type. Several optimizations were required to precompute significant instances of node connectivity at the scale of large knowledge graphs. We implemented the method on Hetionet and provide an online interface at https://het.io/search. We provide an open source implementation of these methods in our new Python package named hetmatpy.
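The idea of comparing observed path counts with what node degree alone would predict can be sketched as follows. This is an illustrative toy, not the hetmatpy implementation: it counts two-step compound-gene-disease paths and estimates a null expectation from degree-preserving edge swaps. The node names, edge lists, and function names are hypothetical.

```python
import random


def count_paths(edges_ab, edges_bc, source, target):
    """Count two-step paths source -> b -> target."""
    mids_from_source = {b for a, b in edges_ab if a == source}
    return sum(1 for b, c in edges_bc if c == target and b in mids_from_source)


def degree_preserving_shuffle(edges, rng, n_swaps=200):
    """Swap endpoints of random edge pairs; every node keeps its degree."""
    edges = list(edges)
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        new1, new2 = (a, d), (c, b)
        if new1 in present or new2 in present:
            continue  # the swap would duplicate an existing edge
        present -= {edges[i], edges[j]}
        present |= {new1, new2}
        edges[i], edges[j] = new1, new2
    return edges


# Hypothetical toy network with two edge types.
compound_gene = [
    ("metformin", "g1"), ("metformin", "g2"),
    ("aspirin", "g1"), ("aspirin", "g3"), ("drugX", "g4"),
]
gene_disease = [
    ("g1", "cancer"), ("g2", "cancer"), ("g3", "asthma"), ("g4", "asthma"),
]

observed = count_paths(compound_gene, gene_disease, "metformin", "cancer")

rng = random.Random(0)
null_counts = []
for _ in range(500):
    cg = degree_preserving_shuffle(compound_gene, rng)
    gd = degree_preserving_shuffle(gene_disease, rng)
    null_counts.append(count_paths(cg, gd, "metformin", "cancer"))
expected = sum(null_counts) / len(null_counts)

print("observed paths:", observed)
print("mean under degree-preserving null:", expected)
```

A path type whose observed count clearly exceeds the null mean is a candidate for being informative about how the two nodes are related; the real method works at the scale of Hetionet and relies on precomputation rather than repeated shuffling per query.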

Hetnet connectivity search provides rapid insights into how biomedical entities are related.

Gigascience ; 12, 2022 12 28.

Article in English | MEDLINE | ID: mdl-37503959

ABSTRACT

BACKGROUND: Hetnets, short for "heterogeneous networks," contain multiple node and relationship types and offer a way to encode biomedical knowledge. One such example, Hetionet, connects 11 types of nodes (including genes, diseases, drugs, pathways, and anatomical structures) with over 2 million edges of 24 types. Previous work has demonstrated that supervised machine learning methods applied to such networks can identify drug repurposing opportunities. However, a training set of known relationships does not exist for many types of node pairs, even when it would be useful to examine how nodes of those types are meaningfully connected. For example, users may be curious about not only how metformin is related to breast cancer but also how a given gene might be involved in insomnia. FINDINGS: We developed a new procedure, termed hetnet connectivity search, that proposes important paths between any 2 nodes without requiring a supervised gold standard. The algorithm behind connectivity search identifies types of paths that occur more frequently than would be expected by chance (based on node degree alone). Several optimizations were required to precompute significant instances of node connectivity at the scale of large knowledge graphs. CONCLUSION: We implemented the method on Hetionet and provide an online interface at https://het.io/search. We provide an open-source implementation of these methods in our new Python package named hetmatpy.

Subject(s)

Algorithms , Probability

Parameterized algorithms for identifying gene co-expression modules via weighted clique decomposition.

Cooley, Madison; Greene, Casey S; Issac, Davis; Pividori, Milton; Sullivan, Blair D.

Proc 2021 SIAM Conf Appl Comput Discret Algorithms (2021) ; 2021: 111-122, 2021.

Article in English | MEDLINE | ID: mdl-35391741

ABSTRACT

We present a new combinatorial model for identifying regulatory modules in gene co-expression data using a decomposition into weighted cliques. To capture complex interaction effects, we generalize the previously-studied weighted edge clique partition problem. As a first step, we restrict ourselves to the noise-free setting, and show that the problem is fixed parameter tractable when parameterized by the number of modules (cliques). We present two new algorithms for finding these decompositions, using linear programming and integer partitioning to determine the clique weights. Further, we implement these algorithms in Python and test them on a biologically-inspired synthetic corpus generated using real-world data from transcription factors and a latent variable analysis of co-expression in varying cell types.
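To make the notion of a decomposition into weighted cliques concrete, here is a small checker, written under the assumption that an edge's weight equals the sum of the weights of the cliques containing both of its endpoints. It only verifies a proposed decomposition in the noise-free setting; it is not one of the paper's algorithms, and the function names and toy data are hypothetical.

```python
from itertools import combinations


def edge_weights_from_cliques(cliques):
    """Sum, for every vertex pair, the weights of the cliques containing both."""
    weights = {}
    for vertices, weight in cliques:
        for u, v in combinations(sorted(vertices), 2):
            weights[(u, v)] = weights.get((u, v), 0) + weight
    return weights


def is_valid_decomposition(target, cliques):
    """True if the cliques reproduce the target edge weights exactly."""
    return edge_weights_from_cliques(cliques) == target


# Hypothetical co-expression graph: two overlapping modules share the edge (b, c).
target = {
    ("a", "b"): 2, ("a", "c"): 2, ("b", "c"): 3,
    ("b", "d"): 1, ("c", "d"): 1,
}
modules = [({"a", "b", "c"}, 2), ({"b", "c", "d"}, 1)]

print(is_valid_decomposition(target, modules))
```

The shared edge (b, c) carries weight 3 because both modules contribute to it, which is the overlap that a strict partition into cliques cannot express.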

Exploring neighborhoods in large metagenome assembly graphs using spacegraphcats reveals hidden sequence diversity.

Brown, C Titus; Moritz, Dominik; O'Brien, Michael P; Reidl, Felix; Reiter, Taylor; Sullivan, Blair D.

Genome Biol ; 21(1): 164, 2020 07 06.

Article in English | MEDLINE | ID: mdl-32631445

ABSTRACT

Genomes computationally inferred from large metagenomic data sets are often incomplete and may be missing functionally important content and strain variation. We introduce an information retrieval system for large metagenomic data sets that exploits the sparsity of DNA assembly graphs to efficiently extract subgraphs surrounding an inferred genome. We apply this system to recover missing content from genome bins and show that substantial genomic sequence variation is present in a real metagenome. Our software implementation is available at https://github.com/spacegraphcats/spacegraphcats under the 3-Clause BSD License.

Subject(s)

Algorithms , Genetic Variation , Genome , Metagenomics/methods , Software

Benchmarking treewidth as a practical component of tensor network simulations.

Dumitrescu, Eugene F; Fisher, Allison L; Goodrich, Timothy D; Humble, Travis S; Sullivan, Blair D; Wright, Andrew L.

PLoS One ; 13(12): e0207827, 2018.

Article in English | MEDLINE | ID: mdl-30562341

ABSTRACT

Tensor networks are powerful factorization techniques which reduce resource requirements for numerically simulating principal quantum many-body systems and algorithms. The computational complexity of a tensor network simulation depends on the tensor ranks and the order in which they are contracted. Unfortunately, computing optimal contraction sequences (orderings) in general is known to be a computationally difficult (NP-complete) task. In 2005, Markov and Shi showed that optimal contraction sequences correspond to optimal (minimum width) tree decompositions of a tensor network's line graph, relating the contraction sequence problem to a rich literature in structural graph theory. While treewidth-based methods have largely been ignored in favor of dataset-specific algorithms in the prior tensor networks literature, we demonstrate their practical relevance for problems arising from two distinct methods used in quantum simulation: multi-scale entanglement renormalization ansatz (MERA) datasets and quantum circuits generated by the quantum approximate optimization algorithm (QAOA). We exhibit multiple regimes where treewidth-based algorithms outperform domain-specific algorithms, while demonstrating that the optimal choice of algorithm has a complex dependence on the network density, expected contraction complexity, and user run time requirements. We further provide an open source software framework designed with an emphasis on accessibility and extendability, enabling replicable experimental evaluations and future exploration of competing methods by practitioners.

Subject(s)

Algorithms , Computer Simulation , Software , Benchmarking , Computer Graphics , Quantum Theory
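The dependence of simulation cost on contraction order, noted in the abstract above, already shows up in the simplest tensor network: a chain of three matrices. The sketch below counts scalar multiplications for the two possible orders. The shapes are hypothetical, and the example does not involve treewidth or the paper's software.

```python
def contract(left, right):
    """Return (scalar multiplications, resulting shape) for left @ right."""
    (rows, inner), (inner_right, cols) = left, right
    if inner != inner_right:
        raise ValueError("shared index dimensions must match")
    return rows * inner * cols, (rows, cols)


# Hypothetical shapes for a three-matrix chain A @ B @ C.
A, B, C = (10, 1000), (1000, 5), (5, 500)

# Order 1: contract A with B first, then the result with C.
cost_ab, AB = contract(A, B)
cost_ab_c, _ = contract(AB, C)
left_first = cost_ab + cost_ab_c

# Order 2: contract B with C first, then A with the result.
cost_bc, BC = contract(B, C)
cost_a_bc, _ = contract(A, BC)
right_first = cost_bc + cost_a_bc

print("(A B) C cost:", left_first)   # 75000
print("A (B C) cost:", right_first)  # 7500000
```

Both orders give the same final matrix, yet one needs 100 times more multiplications. For general networks the number of possible orders grows quickly, which is why systematic methods for choosing a contraction sequence matter.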
