1.
bioRxiv ; 2024 Feb 12.
Article in English | MEDLINE | ID: mdl-38405697

ABSTRACT

Clustering is commonly used in single-cell RNA-sequencing (scRNA-seq) pipelines to characterize cellular heterogeneity. However, current methods face two main limitations. First, they require user-specified heuristics which add time and complexity to bioinformatic workflows; second, they rely on post-selective differential expression analyses to identify marker genes driving cluster differences, which has been shown to be subject to inflated false discovery rates. We address these challenges by introducing nonparametric clustering of single-cell populations (NCLUSION): an infinite mixture model that leverages Bayesian sparse priors to identify marker genes while simultaneously performing clustering on single-cell expression data. NCLUSION uses a scalable variational inference algorithm to perform these analyses on datasets with up to millions of cells. By analyzing publicly available scRNA-seq studies, we demonstrate that NCLUSION (i) matches the performance of other state-of-the-art clustering techniques with significantly reduced runtime and (ii) provides statistically robust and biologically relevant transcriptomic signatures for each of the clusters it identifies. Overall, NCLUSION represents a reliable hypothesis-generating tool for understanding patterns of expression variation present in single-cell populations.

2.
bioRxiv ; 2024 Jan 11.
Article in English | MEDLINE | ID: mdl-38260340

ABSTRACT

Understanding morphological variation is an important task in many areas of computational biology. Recent studies have focused on developing computational tools for the task of sub-image selection which aims at identifying structural features that best describe the variation between classes of shapes. A major part in assessing the utility of these approaches is to demonstrate their performance on both simulated and real datasets. However, when creating a model for shape statistics, real data can be difficult to access and the sample sizes for these data are often small due to them being expensive to collect. Meanwhile, the current landscape of generative models for shapes has been mostly limited to approaches that use black-box inference, making it difficult to systematically assess the power and calibration of sub-image models. In this paper, we introduce the α-shape sampler: a probabilistic framework for generating realistic 2D and 3D shapes based on probability distributions which can be learned from real data. We demonstrate our framework using proof-of-concept examples and in two real applications in biology where we generate (i) 2D images of healthy and septic neutrophils and (ii) 3D computed tomography (CT) scans of primate mandibular molars. The α-shape sampler R package is open-source and can be downloaded at https://github.com/lcrawlab/ashapesampler.

3.
G3 (Bethesda) ; 13(8)2023 08 09.
Article in English | MEDLINE | ID: mdl-37243672

ABSTRACT

Epistasis, commonly defined as the interaction between genetic loci, is known to play an important role in the phenotypic variation of complex traits. As a result, many statistical methods have been developed to identify genetic variants that are involved in epistasis, and nearly all of these approaches carry out this task by focusing on analyzing one trait at a time. Previous studies have shown that jointly modeling multiple phenotypes can often dramatically increase statistical power for association mapping. In this study, we present the "multivariate MArginal ePIstasis Test" (mvMAPIT), a multioutcome generalization of a recently proposed epistatic detection method which seeks to detect marginal epistasis or the combined pairwise interaction effects between a given variant and all other variants. By searching for marginal epistatic effects, one can identify genetic variants that are involved in epistasis without the need to identify the exact partners with which the variants interact, thus potentially alleviating much of the statistical and computational burden associated with conventional explicit search-based methods. Our proposed mvMAPIT builds upon this strategy by taking advantage of correlation structure between traits to improve the identification of variants involved in epistasis. We formulate mvMAPIT as a multivariate linear mixed model and develop a multitrait variance component estimation algorithm for efficient parameter inference and P-value computation. Together with reasonable model approximations, our proposed approach is scalable to moderately sized genome-wide association studies. With simulations, we illustrate the benefits of mvMAPIT over univariate (or single-trait) epistatic mapping strategies. We also apply the mvMAPIT framework to protein sequence data from two broadly neutralizing anti-influenza antibodies and approximately 2,000 heterogeneous stock mice from the Wellcome Trust Centre for Human Genetics. The mvMAPIT R package can be downloaded at https://github.com/lcrawlab/mvMAPIT.


Subject(s)
Epistasis, Genetic , Genome-Wide Association Study , Humans , Animals , Mice , Phenotype , Quantitative Trait Loci , Algorithms
4.
PLoS Comput Biol ; 19(5): e1011162, 2023 05.
Article in English | MEDLINE | ID: mdl-37220151

ABSTRACT

Natural products are chemical compounds that form the basis of many therapeutics used in the pharmaceutical industry. In microbes, natural products are synthesized by groups of colocalized genes called biosynthetic gene clusters (BGCs). With advances in high-throughput sequencing, there has been an increase of complete microbial isolate genomes and metagenomes, from which a vast number of BGCs are undiscovered. Here, we introduce a self-supervised learning approach designed to identify and characterize BGCs from such data. To do this, we represent BGCs as chains of functional protein domains and train a masked language model on these domains. We assess the ability of our approach to detect BGCs and characterize BGC properties in bacterial genomes. We also demonstrate that our model can learn meaningful representations of BGCs and their constituent domains, detect BGCs in microbial genomes, and predict BGC product classes. These results highlight self-supervised neural networks as a promising framework for improving BGC prediction and classification.


Subject(s)
Biological Products , Genome, Bacterial , Metagenome , Multigene Family/genetics , Biological Products/metabolism , Supervised Machine Learning
5.
iScience ; 25(7): 104553, 2022 Jul 15.
Article in English | MEDLINE | ID: mdl-35769876

ABSTRACT

In this paper, we propose a new approach for variable selection using a collection of Bayesian neural networks with a focus on quantifying uncertainty over which variables are selected. Motivated by fine-mapping applications in statistical genetics, we refer to our framework as an "ensemble of single-effect neural networks" (ESNN) which generalizes the "sum of single effects" regression framework by both accounting for nonlinear structure in genotypic data (e.g., dominance effects) and having the capability to model discrete phenotypes (e.g., case-control studies). Through extensive simulations, we demonstrate our method's ability to produce calibrated posterior summaries such as credible sets and posterior inclusion probabilities, particularly for traits with genetic architectures that have significant proportions of non-additive variation driven by correlated variants. Lastly, we use real data to demonstrate that the ESNN framework improves upon the state of the art for identifying true effect variables underlying various complex traits.

6.
PLoS Comput Biol ; 18(5): e1010045, 2022 05.
Article in English | MEDLINE | ID: mdl-35500014

ABSTRACT

Identifying structural differences among proteins can be a non-trivial task. When contrasting ensembles of protein structures obtained from molecular dynamics simulations, biologically-relevant features can be easily overshadowed by spurious fluctuations. Here, we present SINATRA Pro, a computational pipeline designed to robustly identify topological differences between two sets of protein structures. Algorithmically, SINATRA Pro works by first taking in the 3D atomic coordinates for each protein snapshot and summarizing them according to their underlying topology. Statistically significant topological features are then projected back onto a user-selected representative protein structure, thus facilitating the visual identification of biophysical signatures of different protein ensembles. We assess the ability of SINATRA Pro to detect minute conformational changes in five independent protein systems of varying complexities. In all test cases, SINATRA Pro identifies known structural features that have been validated by previous experimental and computational studies, as well as novel features that are also likely to be biologically-relevant according to the literature. These results highlight SINATRA Pro as a promising method for facilitating the non-trivial task of pattern recognition in trajectories resulting from molecular dynamics simulations, with substantially increased resolution.


Subject(s)
Data Science , Molecular Dynamics Simulation , Biophysics , Protein Conformation , Proteins/chemistry
7.
Am J Hum Genet ; 109(5): 871-884, 2022 05 05.
Article in English | MEDLINE | ID: mdl-35349783

ABSTRACT

Since 2005, genome-wide association (GWA) datasets have been largely biased toward sampling European ancestry individuals, and recent studies have shown that GWA results estimated from self-identified European individuals are not transferable to non-European individuals because of various confounding challenges. Here, we demonstrate that enrichment analyses that aggregate SNP-level association statistics at multiple genomic scales (from genes to genomic regions and pathways) have been underutilized in the GWA era and can generate biologically interpretable hypotheses regarding the genetic basis of complex trait architecture. We illustrate examples of the robust associations generated by enrichment analyses while studying 25 continuous traits assayed in 566,786 individuals from seven diverse self-identified human ancestries in the UK Biobank and the Biobank Japan as well as 44,348 admixed individuals from the PAGE consortium including cohorts of African American, Hispanic and Latin American, Native Hawaiian, and American Indian/Alaska Native individuals. We identify 1,000 gene-level associations that are genome-wide significant in at least two ancestry cohorts across these 25 traits as well as highly conserved pathway associations with triglyceride levels in European, East Asian, and Native Hawaiian cohorts.


Subject(s)
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Genome-Wide Association Study/methods , Humans , Multifactorial Inheritance , Phenotype , Polymorphism, Single Nucleotide/genetics , Racial Groups
8.
Cell ; 184(25): 6119-6137.e26, 2021 12 09.
Article in English | MEDLINE | ID: mdl-34890551

ABSTRACT

Prognostically relevant RNA expression states exist in pancreatic ductal adenocarcinoma (PDAC), but our understanding of their drivers, stability, and relationship to therapeutic response is limited. To examine these attributes systematically, we profiled metastatic biopsies and matched organoid models at single-cell resolution. In vivo, we identify a new intermediate PDAC transcriptional cell state and uncover distinct site- and state-specific tumor microenvironments (TMEs). Benchmarking models against this reference map, we reveal strong culture-specific biases in cancer cell transcriptional state representation driven by altered TME signals. We restore expression state heterogeneity by adding back in vivo-relevant factors and show plasticity in culture models. Further, we prove that non-genetic modulation of cell state can strongly influence drug responses, uncovering state-specific vulnerabilities. This work provides a broadly applicable framework for aligning cell states across in vivo and ex vivo settings, identifying drivers of transcriptional plasticity and manipulating cell state to target associated vulnerabilities.


Subject(s)
Biomarkers, Tumor/metabolism , Carcinoma, Pancreatic Ductal/metabolism , Pancreatic Neoplasms/metabolism , Tumor Microenvironment , Adult , Aged , Cell Line, Tumor , Female , Gene Expression Regulation, Neoplastic , Humans , Male , Middle Aged , Single-Cell Analysis
10.
PLoS Genet ; 17(8): e1009754, 2021 08.
Article in English | MEDLINE | ID: mdl-34411094

ABSTRACT

In this article, we present Biologically Annotated Neural Networks (BANNs), a nonlinear probabilistic framework for association mapping in genome-wide association (GWA) studies. BANNs are feedforward models with partially connected architectures that are based on biological annotations. This setup yields a fully interpretable neural network where the input layer encodes SNP-level effects, and the hidden layer models the aggregated effects among SNP-sets. We treat the weights and connections of the network as random variables with prior distributions that reflect how genetic effects manifest at different genomic scales. The BANNs software uses variational inference to provide posterior summaries which allow researchers to simultaneously perform (i) mapping with SNPs and (ii) enrichment analyses with SNP-sets on complex traits. Through simulations, we show that our method improves upon state-of-the-art association mapping and enrichment approaches across a wide range of genetic architectures. We then further illustrate the benefits of BANNs by analyzing real GWA data assayed in approximately 2,000 heterogeneous stock mice from the Wellcome Trust Centre for Human Genetics and approximately 7,000 individuals from the Framingham Heart Study. Lastly, using a random subset of individuals of European ancestry from the UK Biobank, we show that BANNs is able to replicate known associations in high- and low-density lipoprotein cholesterol content.


Subject(s)
Genome-Wide Association Study/methods , Molecular Sequence Annotation/methods , Animals , Genome/genetics , Genomics/methods , Genotype , Humans , Models, Genetic , Multifactorial Inheritance/genetics , Neural Networks, Computer , Phenotype , Polymorphism, Single Nucleotide/genetics , Software
11.
Genome Biol ; 22(1): 213, 2021 07 23.
Article in English | MEDLINE | ID: mdl-34301310

ABSTRACT

Large-scale phenotype data can enhance the power of genomic prediction in plant and animal breeding, as well as human genetics. However, the statistical foundation of multi-trait genomic prediction is based on the multivariate linear mixed effect model, a tool notorious for its fragility when applied to more than a handful of traits. We present MegaLMM, a statistical framework and associated software package for mixed model analyses of a virtually unlimited number of traits. Using three examples with real plant data, we show that MegaLMM can leverage thousands of traits at once to significantly improve genetic value prediction accuracy.


Subject(s)
Arabidopsis/genetics , Genome, Plant , Models, Genetic , Quantitative Trait, Heritable , Software , Triticum/genetics , Zea mays/genetics , Bayes Theorem , Gene-Environment Interaction , Genomics , Genotype , Humans , Phenotype , Plant Breeding
12.
PLoS Genet ; 17(3): e1008887, 2021 03.
Article in English | MEDLINE | ID: mdl-33735180

ABSTRACT

The winged insects of the order Diptera are colloquially named for their most recognizable phenotype: flight. These insects rely on flight for a number of important life history traits, such as dispersal, foraging, and courtship. Despite the importance of flight, relatively little is known about the genetic architecture of flight performance. Accordingly, we sought to uncover the genetic modifiers of flight using a measure of flies' reaction and response to an abrupt drop in a vertical flight column. We conducted a genome-wide association study (GWAS) using 197 of the Drosophila Genetic Reference Panel (DGRP) lines, and identified a combination of additive and marginal variants, epistatic interactions, whole genes, and enrichment across interaction networks. Egfr, a highly pleiotropic developmental gene, was among the most significant additive variants identified. We functionally validated 13 of the additive candidate genes (Adgf-A/Adgf-A2/CG32181, bru1, CadN, flapper (CG11073), CG15236, flippy (CG9766), CREG, Dscam4, form3, fry, Lasp/CG9692, Pde6, Snoo), and introduce a novel approach to whole gene significance screens: PEGASUS_flies. Additionally, we identified ppk23, an Acid Sensing Ion Channel (ASIC) homolog, as an important hub for epistatic interactions. We propose a model that suggests genetic modifiers of wing and muscle morphology, nervous system development and function, BMP signaling, sexually dimorphic neural wiring, and gene regulation are all important for the observed differences in flight performance in a natural population. Additionally, these results represent a snapshot of the genetic modifiers affecting drop-response flight performance in Drosophila, with implications for other insects.


Subject(s)
Drosophila melanogaster/genetics , Drosophila/genetics , Gene Expression Regulation, Developmental , Genetic Variation , Neurogenesis/genetics , Animals , Drosophila/embryology , Drosophila melanogaster/metabolism , Epigenesis, Genetic , Female , Flight, Animal , Genetic Association Studies , Male , Phenotype , Polymorphism, Single Nucleotide
13.
Mol Cancer Ther ; 20(1): 183-190, 2021 01.
Article in English | MEDLINE | ID: mdl-33087512

ABSTRACT

Glycogen synthase kinase-3β (GSK-3β), a serine/threonine kinase, has been implicated in the pathogenesis of many cancers, with involvement in cell-cycle regulation, apoptosis, and immune response. Small-molecule GSK-3β inhibitors are currently undergoing clinical investigation. Tumor sequencing has revealed genomic alterations in GSK-3β, yet an assessment of the genomic landscape in malignancies is lacking. This study assessed >100,000 tumors from two databases to analyze GSK-3β alterations. GSK-3β expression and immune cell infiltrate data were analyzed across cancer types, and programmed death-ligand 1 (PD-L1) expression was compared between GSK-3β-mutated and wild-type tumors. GSK-3β was mutated at a rate of 1%. The majority of mutated residues were in the kinase domain, with frequent mutations occurring in a GSK-3β substrate binding pocket. Uterine endometrioid carcinoma was the most commonly mutated (4%) tumor, and copy-number variations were most commonly observed in squamous histologies. Significant differences across cancer types for GSK-3β-mutated tumors were observed for B cells (P = 0.018), monocytes (P = 0.002), dendritic cells (P = 0.005), neutrophils (P = 0.0003), and endothelial cells (P = 0.014). GSK-3β mRNA expression was highest in melanoma. The frequency of PD-L1 expression was higher among GSK-3β-mutated tumors compared with wild type in colorectal cancer (P = 0.03), endometrial cancer (P = 0.05), melanoma (P = 0.02), ovarian carcinoma (P = 0.0001), and uterine sarcoma (P = 0.002). Overall, GSK-3β molecular alterations were detected in approximately 1% of solid tumors, tumors with GSK-3β mutations displayed a microenvironment with increased infiltration of B cells, and GSK-3β mutations were associated with increased PD-L1 expression in selected histologies. These results advance the understanding of GSK-3β complex signaling network interfacing with key pathways involved in carcinogenesis and immune response.


Subject(s)
Genome, Human , Glycogen Synthase Kinase 3 beta/metabolism , Neoplasms/enzymology , Neoplasms/genetics , B7-H1 Antigen/metabolism , Cohort Studies , DNA Copy Number Variations/genetics , Glycogen Synthase Kinase 3 beta/genetics , Humans , Mutation/genetics , RNA, Messenger/genetics , RNA, Messenger/metabolism , Tumor Microenvironment/genetics
14.
PLoS Genet ; 16(6): e1008855, 2020 06.
Article in English | MEDLINE | ID: mdl-32542026

ABSTRACT

Traditional univariate genome-wide association studies generate false positives and negatives due to difficulties distinguishing associated variants from variants with spurious nonzero effects that do not directly influence the trait. Recent efforts have been directed at identifying genes or signaling pathways enriched for mutations in quantitative traits or case-control studies, but these can be computationally costly and hampered by strict model assumptions. Here, we present gene-ε, a new approach for identifying statistical associations between sets of variants and quantitative traits. Our key insight is that enrichment studies on the gene-level are improved when we reformulate the genome-wide SNP-level null hypothesis to identify spurious small-to-intermediate SNP effects and classify them as non-causal. gene-ε efficiently identifies enriched genes under a variety of simulated genetic architectures, achieving greater than a 90% true positive rate at 1% false positive rate for polygenic traits. Lastly, we apply gene-ε to summary statistics derived from six quantitative traits using European-ancestry individuals in the UK Biobank, and identify enriched genes that are in biologically relevant pathways.


Subject(s)
Genome-Wide Association Study/statistics & numerical data , Models, Genetic , Multifactorial Inheritance/genetics , Polymorphism, Single Nucleotide , Quantitative Trait Loci/genetics , Data Interpretation, Statistical , Databases, Genetic/statistics & numerical data , Humans , United Kingdom , White People/genetics
15.
Ann Biomed Eng ; 48(8): 2218-2232, 2020 Aug.
Article in English | MEDLINE | ID: mdl-32303872

ABSTRACT

Here we demonstrate a technique to generate proteomic signatures of specific cell types within heterogeneous populations. While our method is broadly applicable across biological systems, we have limited the current work to study neural cell types isolated from human, post-mortem Alzheimer's disease (AD) and age-matched non-symptomatic (NS) brains. Motivating the need for this tool, we conducted an initial meta-analysis of current, human AD proteomics studies. While the results broadly corroborated major neurodegenerative disease hypotheses, cell type-specific predictions were limited. By adapting our Formaldehyde-fixed Intracellular Target-Sorted Antigen Retrieval (FITSAR) method for proteomics and applying this technique to characterize AD and NS brains, we generated enriched neuron and astrocyte proteomic profiles for a sample set of donors (available at www.fitsarpro.appspot.com). Results showed the feasibility for using FITSAR to evaluate cell-type specific hypotheses. Our overall methodological approach provides an accessible platform to determine protein presence in specific cell types and emphasizes the need for protein-compatible techniques to resolve systems complicated by cellular heterogeneity.


Subject(s)
Alzheimer Disease/metabolism , Astrocytes/metabolism , Brain/metabolism , Neurons/metabolism , Proteomics , Alzheimer Disease/pathology , Astrocytes/pathology , Brain/pathology , Neurons/pathology
16.
Nat Genet ; 52(4): 408-417, 2020 04.
Article in English | MEDLINE | ID: mdl-32203462

ABSTRACT

Local adaptation directs populations towards environment-specific fitness maxima through acquisition of positively selected traits. However, rapid environmental changes can identify hidden fitness trade-offs that turn adaptation into maladaptation, resulting in evolutionary traps. Cancer, a disease that is prone to drug resistance, is in principle susceptible to such traps. We therefore performed pooled CRISPR-Cas9 knockout screens in acute myeloid leukemia (AML) cells treated with various chemotherapies to map the drug-dependent genetic basis of fitness trade-offs, a concept known as antagonistic pleiotropy (AP). We identified a PRC2-NSD2/3-mediated MYC regulatory axis as a drug-induced AP pathway whose ability to confer resistance to bromodomain inhibition and sensitivity to BCL-2 inhibition templates an evolutionary trap. Across diverse AML cell-line and patient-derived xenograft models, we find that acquisition of resistance to bromodomain inhibition through this pathway exposes coincident hypersensitivity to BCL-2 inhibition. Thus, drug-induced AP can be leveraged to design evolutionary traps that selectively target drug resistance in cancer.


Subject(s)
Drug Resistance, Neoplasm/genetics , Genetic Pleiotropy/genetics , Neoplasms/genetics , Adaptation, Physiological/genetics , Animals , Biological Evolution , CRISPR-Cas Systems/genetics , Cell Line , Cell Line, Tumor , Environment , Genetic Fitness/genetics , HEK293 Cells , HL-60 Cells , Humans , Mice , Nuclear Proteins/genetics , Phenotype , Quantitative Trait Loci/genetics , Transcription Factors/genetics
17.
Am J Physiol Cell Physiol ; 317(2): C155-C166, 2019 08 01.
Article in English | MEDLINE | ID: mdl-30917031

ABSTRACT

Many different subpopulations of subcellular extracellular vesicles (EVs) have been described. EVs are released from all cell types and have been shown to regulate normal physiological homeostasis, as well as pathological states by influencing cell proliferation, differentiation, organ homing, injury and recovery, as well as disease progression. In this review, we focus on the bidirectional actions of vesicles from normal and diseased cells on normal or leukemic target cells; and on the leukemic microenvironment as a whole. EVs from human bone marrow mesenchymal stem cells (MSC) can have a healing effect, reversing the malignant phenotype in prostate and colorectal cancer, as well as mitigating radiation damage to marrow. The role of EVs in leukemia and their bimodal cross talk with the encompassing microenvironment remains to be fully characterized. This may provide insight for clinical advances via the application of EVs as potential therapy and the employment of statistical and machine learning models to capture the pleiotropic effects EVs endow to a dynamic microenvironment, possibly allowing for precise therapeutic intervention.


Subject(s)
Biomarkers, Tumor/metabolism , Extracellular Vesicles/metabolism , Leukemia/metabolism , Mesenchymal Stem Cells/metabolism , Neoplastic Stem Cells/metabolism , Tumor Microenvironment , Animals , Antineoplastic Agents/therapeutic use , Biomarkers, Tumor/genetics , Cell Communication , Drug Resistance, Neoplasm , Extracellular Vesicles/drug effects , Extracellular Vesicles/genetics , Extracellular Vesicles/pathology , Humans , Leukemia/drug therapy , Leukemia/genetics , Leukemia/pathology , Machine Learning , Mesenchymal Stem Cells/drug effects , Mesenchymal Stem Cells/pathology , Neoplastic Stem Cells/drug effects , Neoplastic Stem Cells/pathology , Phenotype , Signal Transduction , Systems Biology/methods
18.
PLoS Genet ; 15(2): e1007978, 2019 02.
Article in English | MEDLINE | ID: mdl-30735486

ABSTRACT

Linear mixed effect models are powerful tools used to account for population structure in genome-wide association studies (GWASs) and estimate the genetic architecture of complex traits. However, fully-specified models are computationally demanding and common simplifications often lead to reduced power or biased inference. We describe Grid-LMM (https://github.com/deruncie/GridLMM), an extendable algorithm for repeatedly fitting complex linear models that account for multiple sources of heterogeneity, such as additive and non-additive genetic variance, spatial heterogeneity, and genotype-environment interactions. Grid-LMM can compute approximate (yet highly accurate) frequentist test statistics or Bayesian posterior summaries at a genome-wide scale in a fraction of the time compared to existing general-purpose methods. We apply Grid-LMM to two types of quantitative genetic analyses. The first is focused on accounting for spatial variability and non-additive genetic variance while scanning for QTL; and the second aims to identify gene expression traits affected by non-additive genetic variation. In both cases, modeling multiple sources of heterogeneity leads to new discoveries.


Subject(s)
Algorithms , Linear Models , Models, Genetic , Animals , Arabidopsis/genetics , Arabidopsis/growth & development , Bayes Theorem , Body Weight/genetics , Computer Simulation , Flowers/genetics , Flowers/growth & development , Gene-Environment Interaction , Genetic Markers , Genetic Variation , Genome-Wide Association Study/statistics & numerical data , Humans , Mice , Quantitative Trait Loci
19.
Ann Appl Stat ; 13(2): 958-989, 2019 Jun.
Article in English | MEDLINE | ID: mdl-32542104

ABSTRACT

The central aim in this paper is to address variable selection questions in nonlinear and nonparametric regression. Motivated by statistical genetics, where nonlinear interactions are of particular interest, we introduce a novel and interpretable way to summarize the relative importance of predictor variables. Methodologically, we develop the "RelATive cEntrality" (RATE) measure to prioritize candidate genetic variants that are not just marginally important, but whose associations also stem from significant covarying relationships with other variants in the data. We illustrate RATE through Bayesian Gaussian process regression, but the methodological innovations apply to other "black box" methods. It is known that nonlinear models often exhibit greater predictive accuracy than linear models, particularly for phenotypes generated by complex genetic architectures. With detailed simulations and two real data association mapping studies, we show that applying RATE enables an explanation for this improved performance.

20.
Nat Commun ; 9(1): 3513, 2018 08 29.
Article in English | MEDLINE | ID: mdl-30158527

ABSTRACT

While inhibitors of BCL-2 family proteins (BH3 mimetics) have shown promise as anti-cancer agents, the various dependencies or co-dependencies of diverse cancers on BCL-2 genes remain poorly understood. Here we develop a drug screening approach to define the sensitivity of cancer cells from ten tissue types to all possible combinations of selective BCL-2, BCL-XL, and MCL-1 inhibitors and discover that most cell lines depend on at least one combination for survival. We demonstrate that expression levels of BCL-2 genes predict single mimetic sensitivity, whereas EMT status predicts synergistic dependence on BCL-XL+MCL-1. Lastly, we use a CRISPR/Cas9 screen to discover that BFL-1 and BCL-w promote resistance to all tested combinations of BCL-2, BCL-XL, and MCL-1 inhibitors. Together, these results provide a roadmap for rationally targeting BCL-2 family dependencies in diverse human cancers and motivate the development of selective BFL-1 and BCL-w inhibitors to overcome intrinsic resistance to BH3 mimetics.


Subject(s)
Neoplasms/metabolism , Proto-Oncogene Proteins c-bcl-2/metabolism , Animals , Antineoplastic Agents, Phytogenic/pharmacology , Bridged Bicyclo Compounds, Heterocyclic/pharmacology , Cell Line, Tumor , Dose-Response Relationship, Drug , Male , Mice , RNA, Messenger/metabolism , Sulfonamides/pharmacology , bcl-X Protein/metabolism