Search | VHL Regional Portal (test)

1.

Higher-Order Least Squares: Assessing Partial Goodness of Fit of Linear Causal Models.

Schultheiss, Christoph; Bühlmann, Peter; Yuan, Ming.

J Am Stat Assoc ; 119(546): 1019-1031, 2024.

Article in English | MEDLINE | ID: mdl-38974187

ABSTRACT

We introduce a simple diagnostic test for assessing the overall or partial goodness of fit of a linear causal model with errors being independent of the covariates. In particular, we consider situations where hidden confounding is potentially present. We develop a method and discuss its capability to distinguish between covariates that are confounded with the response by latent variables and those that are not. Thus, we provide a test and methodology for partial goodness of fit. The test is based on comparing a novel higher-order least squares principle with ordinary least squares. In spite of its simplicity, the proposed method is extremely general and is also proven to be valid for high-dimensional settings. Supplementary materials for this article are available online.

2.

Model selection over partially ordered sets.

Taeb, Armeen; Bühlmann, Peter; Chandrasekaran, Venkat.

Proc Natl Acad Sci U S A ; 121(8): e2314228121, 2024 Feb 20.

Article in English | MEDLINE | ID: mdl-38363866

ABSTRACT

In problems such as variable selection and graph estimation, models are characterized by Boolean logical structure such as the presence or absence of a variable or an edge. Consequently, false-positive error or false-negative error can be specified as the number of variables/edges that are incorrectly included or excluded in an estimated model. However, there are several other problems such as ranking, clustering, and causal inference in which the associated model classes do not admit transparent notions of false-positive and false-negative errors due to the lack of an underlying Boolean logical structure. In this paper, we present a generic approach to endow a collection of models with partial order structure, which leads to a hierarchical organization of model classes as well as natural analogs of false-positive and false-negative errors. We describe model selection procedures that provide false-positive error control in our general setting, and we illustrate their utility with numerical experiments.
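The Boolean case described in this abstract, where false-positive and false-negative errors are counts of wrongly included or excluded variables, can be illustrated with a minimal sketch. The function name `selection_errors` and the variable labels are hypothetical and are not taken from the paper; the partial-order generalization the authors propose is not shown here.

```python
def selection_errors(true_support, estimated_support):
    """Count false positives and false negatives for a selected variable set.

    Both arguments are iterables of variable (or edge) identifiers.
    """
    true_set = set(true_support)
    est_set = set(estimated_support)
    false_positives = est_set - true_set   # included but not truly present
    false_negatives = true_set - est_set   # truly present but excluded
    return len(false_positives), len(false_negatives)


# Hypothetical example: the true model uses x1, x3, x5.
fp, fn = selection_errors({"x1", "x3", "x5"}, {"x1", "x2", "x5", "x7"})
print(fp, fn)  # 2 1
```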

3.

Predicting sepsis using deep learning across international sites: a retrospective development and validation study.

Moor, Michael; Bennett, Nicolas; Plecko, Drago; Horn, Max; Rieck, Bastian; Meinshausen, Nicolai; Bühlmann, Peter; Borgwardt, Karsten.

EClinicalMedicine ; 62: 102124, 2023 Aug.

Article in English | MEDLINE | ID: mdl-37588623

ABSTRACT

Background: When sepsis is detected, organ damage may have progressed to irreversible stages, leading to poor prognosis. The use of machine learning for predicting sepsis early has shown promise; however, international validations are missing. Methods: This was a retrospective, observational, multi-centre cohort study. We developed and externally validated a deep learning system for the prediction of sepsis in the intensive care unit (ICU). To our knowledge, our analysis represents the first international, multi-centre in-ICU cohort study for sepsis prediction using deep learning. Our dataset contains 136,478 unique ICU admissions, representing a refined and harmonised subset of four large ICU databases comprising data collected from ICUs in the US, the Netherlands, and Switzerland between 2001 and 2016. Using the international consensus definition Sepsis-3, we derived hourly-resolved sepsis annotations, amounting to 25,694 (18.8%) patient stays with sepsis. We compared our approach to clinical baselines as well as machine learning baselines and performed an extensive internal and external statistical validation within and across databases, reporting area under the receiver-operating-characteristic curve (AUC). Findings: Averaged over sites, our model was able to predict sepsis with an AUC of 0.846 (95% confidence interval [CI], 0.841-0.852) on a held-out validation cohort internal to each site, and an AUC of 0.761 (95% CI, 0.746-0.770) when validating externally across sites. Given access to a small fine-tuning set (10% per site), the transfer to target sites was improved to an AUC of 0.807 (95% CI, 0.801-0.813). Our model raised 1.4 false alerts per true alert and detected 80% of the septic patients 3.7 h (95% CI, 3.0-4.3) prior to the onset of sepsis, opening a vital window for intervention.
Interpretation: By monitoring clinical and laboratory measurements in a retrospective simulation of a real-time prediction scenario, a deep learning system for the detection of sepsis generalised to previously unseen ICU cohorts, internationally. Funding: This study was funded by the Personalized Health and Related Technologies (PHRT) strategic focus area of the ETH domain.
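For readers unfamiliar with the AUC metric reported in this abstract, the following minimal sketch computes it as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `auc` and the toy labels and scores are hypothetical; this is not the study's evaluation code and it does not reproduce the confidence intervals.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: iterable of 0/1 outcomes; scores: predicted risk per case.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive case ranked above negative case
            elif p == n:
                wins += 0.5      # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy example with four hypothetical patient stays.
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```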

4.

Distributional regression modeling via generalized additive models for location, scale, and shape: An overview through a data set from learning analytics.

Marmolejo-Ramos, Fernando; Tejo, Mauricio; Brabec, Marek; Kuzilek, Jakub; Joksimovic, Srecko; Kovanovic, Vitomir; González, Jorge; Kneib, Thomas; Bühlmann, Peter; Kook, Lucas; Briseño-Sánchez, Guillermo; Ospina, Raydonal.

Wiley Interdiscip Rev Data Min Knowl Discov ; 13(1): e1479, 2023.

Article in English | MEDLINE | ID: mdl-37502671

ABSTRACT

Technological developments now make it possible to gather large amounts of data in several research fields. Learning analytics (LA)/educational data mining has access to big observational unstructured data captured from educational settings and relies mostly on unsupervised machine learning (ML) algorithms to make sense of such data. Generalized additive models for location, scale, and shape (GAMLSS) are a supervised statistical learning framework that allows modeling all the parameters of the distribution of the response variable with respect to the explanatory variables. This article overviews the power and flexibility of GAMLSS in relation to some ML techniques. The capability of GAMLSS to be tailored toward causality via causal regularization is also briefly discussed. This overview is illustrated via a data set from the field of LA. This article is categorized under: Application Areas > Education and Learning; Algorithmic Development > Statistics; Technologies > Machine Learning.

5.

Single-cell profiling of alveolar rhabdomyosarcoma reveals RAS pathway inhibitors as cell-fate hijackers with therapeutic relevance.

Danielli, Sara G; Porpiglia, Ermelinda; De Micheli, Andrea J; Navarro, Natalia; Zellinger, Michael J; Bechtold, Ingrid; Kisele, Samanta; Volken, Larissa; Marques, Joana G; Kasper, Stephanie; Bode, Peter K; Henssen, Anton G; Gürgen, Dennis; Delattre, Olivier; Surdez, Didier; Roma, Josep; Bühlmann, Peter; Blau, Helen M; Wachtel, Marco; Schäfer, Beat W.

Sci Adv ; 9(6): eade9238, 2023 Feb 10.

Article in English | MEDLINE | ID: mdl-36753540

ABSTRACT

Rhabdomyosarcoma (RMS) is a group of pediatric cancers with features of developing skeletal muscle. The cellular hierarchy and mechanisms leading to developmental arrest remain elusive. Here, we combined single-cell RNA sequencing, mass cytometry, and high-content imaging to resolve intratumoral heterogeneity of patient-derived primary RMS cultures. We show that the aggressive alveolar RMS (aRMS) subtype contains plastic muscle stem-like cells and cycling progenitors that drive tumor growth, and a subpopulation of differentiated cells that lost its proliferative potential and correlates with better outcomes. While chemotherapy eliminates cycling progenitors, it enriches aRMS for muscle stem-like cells. We screened for drugs hijacking aRMS toward clinically favorable subpopulations and identified a combination of RAF and MEK inhibitors that potently induces myogenic differentiation and inhibits tumor growth. Overall, our work provides insights into the developmental states underlying aRMS aggressiveness, chemoresistance, and progression and identifies the RAS pathway as a promising therapeutic target.

Subjects

Antineoplastic Agents, Alveolar Rhabdomyosarcoma, Rhabdomyosarcoma, Child, Humans, Alveolar Rhabdomyosarcoma/drug therapy, Alveolar Rhabdomyosarcoma/genetics, Alveolar Rhabdomyosarcoma/pathology, Rhabdomyosarcoma/drug therapy, Rhabdomyosarcoma/genetics, Rhabdomyosarcoma/pathology, Skeletal Muscle/metabolism, Cell Differentiation, Antineoplastic Agents/therapeutic use, Tumor Cell Line

6.

DOUBLY DEBIASED LASSO: HIGH-DIMENSIONAL INFERENCE UNDER HIDDEN CONFOUNDING.

Guo, Zijian; Cevid, Domagoj; Bühlmann, Peter.

Ann Stat ; 50(3): 1320-1347, 2022 Jun.

Article in English | MEDLINE | ID: mdl-35958884

ABSTRACT

Inferring causal relationships or related associations from observational data can be invalidated by the existence of hidden confounding. We focus on a high-dimensional linear regression setting, where the measured covariates are affected by hidden confounding and propose the Doubly Debiased Lasso estimator for individual components of the regression coefficient vector. Our advocated method simultaneously corrects both the bias due to estimation of high-dimensional parameters as well as the bias caused by the hidden confounding. We establish its asymptotic normality and also prove that it is efficient in the Gauss-Markov sense. The validity of our methodology relies on a dense confounding assumption, i.e. that every confounding variable affects many covariates. The finite sample performance is illustrated with an extensive simulation study and a genomic application.

7.

Distributional anchor regression.

Kook, Lucas; Sick, Beate; Bühlmann, Peter.

Stat Comput ; 32(3): 39, 2022.

Article in English | MEDLINE | ID: mdl-35582000

ABSTRACT

Prediction models often fail if train and test data do not stem from the same distribution. Out-of-distribution (OOD) generalization to unseen, perturbed test data is a desirable but difficult-to-achieve property for prediction models and in general requires strong assumptions on the data generating process (DGP). In a causally inspired perspective on OOD generalization, the test data arise from a specific class of interventions on exogenous random variables of the DGP, called anchors. Anchor regression models, introduced by Rothenhäusler et al. (J R Stat Soc Ser B 83(2):215-246, 2021. 10.1111/rssb.12398), protect against distributional shifts in the test data by employing causal regularization. However, so far anchor regression has only been used with a squared-error loss which is inapplicable to common responses such as censored continuous or ordinal data. Here, we propose a distributional version of anchor regression which generalizes the method to potentially censored responses with at least an ordered sample space. To this end, we combine a flexible class of parametric transformation models for distributional regression with an appropriate causal regularizer under a more general notion of residuals. In an exemplary application and several simulation scenarios we demonstrate the extent to which OOD generalization is possible.
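As background to this abstract, the squared-error anchor regression objective of Rothenhäusler et al., which the distributional version generalizes, can be written as below. The notation is an assumption introduced here for illustration and does not appear in the abstract: $Y$ is the response vector, $X$ the covariate matrix, $A$ the matrix of anchor variables, $\Pi_A$ the projection onto the column space of $A$, and $\gamma \ge 0$ the causal regularization parameter.

```latex
\hat{b}^{\gamma}
  = \arg\min_{b}\;
    \bigl\| (I - \Pi_A)(Y - Xb) \bigr\|_2^2
    + \gamma \,\bigl\| \Pi_A (Y - Xb) \bigr\|_2^2,
\qquad
\Pi_A = A (A^{\top} A)^{-1} A^{\top}.
```

Under this formulation, $\gamma = 1$ recovers ordinary least squares, while larger values of $\gamma$ penalize the part of the residual that is explained by the anchors more heavily.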

8.

ricu: R's interface to intensive care data.

Bennett, Nicolas; Plecko, Drago; Ukor, Ida-Fong; Meinshausen, Nicolai; Bühlmann, Peter.

Gigascience ; 12, 2022 Dec 28.

Article in English | MEDLINE | ID: mdl-37318234

ABSTRACT

OBJECTIVE: To develop a unified framework for analyzing data from 5 large publicly available intensive care unit (ICU) datasets. FINDINGS: Using 3 American (Medical Information Mart for Intensive Care III, Medical Information Mart for Intensive Care IV, electronic ICU) and 2 European (Amsterdam University Medical Center Database, High Time Resolution ICU Dataset) databases, we constructed a mapping for each database to a set of clinically relevant concepts, which are grounded in the Observational Medical Outcomes Partnership Vocabulary wherever possible. Furthermore, we performed synchronization in the units of measurement and data type representation. On top of this, we built functionality, which allows the user to download, set up, and load data from all of the 5 databases, through a unified Application Programming Interface. The resulting ricu R-package represents the computational infrastructure for handling publicly available ICU datasets, and its latest release allows the user to load 119 existing clinical concepts from the 5 data sources. CONCLUSION: The ricu R-package (available on GitHub and CRAN) is the first tool that enables users to analyze publicly available ICU datasets simultaneously (datasets are available upon request from respective owners). Such an interface saves researchers time when analyzing ICU data and helps reproducibility. We hope that ricu can become a community-wide effort, so that data harmonization is not repeated by each research group separately. One current limitation is that concepts were added on a case-to-case basis, and therefore the resulting dictionary of concepts is not comprehensive. Further work is needed to make the dictionary comprehensive.

Subjects

Critical Care, Intensive Care Units, Humans, Reproducibility of Results, Critical Care/methods, Factual Databases, Data Management

9.

Multiomic profiling of the liver across diets and age in a diverse mouse population.

Williams, Evan G; Pfister, Niklas; Roy, Suheeta; Statzer, Cyril; Haverty, Jack; Ingels, Jesse; Bohl, Casey; Hasan, Moaraj; Cuklina, Jelena; Bühlmann, Peter; Zamboni, Nicola; Lu, Lu; Ewald, Collin Y; Williams, Robert W; Aebersold, Ruedi.

Cell Syst ; 13(1): 43-57.e6, 2022 Jan 19.

Article in English | MEDLINE | ID: mdl-34666007

ABSTRACT

We profiled the liver transcriptome, proteome, and metabolome in 347 individuals from 58 isogenic strains of the BXD mouse population across age (7 to 24 months) and diet (low or high fat) to link molecular variations to metabolic traits. Several hundred genes are affected by diet and/or age at the transcript and protein levels. Orthologs of two aging-associated genes, St7 and Ctsd, were knocked down in C. elegans, reducing longevity in wild-type and mutant long-lived strains. The multiomics data were analyzed as segregating gene networks according to each independent variable, providing causal insight into dietary and aging effects. Candidates were cross-examined in an independent diversity outbred mouse liver dataset segregating for similar diets, with ~80%-90% of diet-related candidate genes found in common across datasets. Together, we have developed a large multiomics resource for multivariate analysis of complex traits and demonstrate a methodology for moving from observational associations to causal connections.

Subjects

Caenorhabditis elegans, Liver, Animals, Caenorhabditis elegans/genetics, Diet, Gene Regulatory Networks, Liver/metabolism, Mice, Transcriptome/genetics

10.

Identifying cancer pathway dysregulations using differential causal effects.

Jablonski, Kim Philipp; Pirkl, Martin; Cevid, Domagoj; Bühlmann, Peter; Beerenwinkel, Niko.

Bioinformatics ; 38(6): 1550-1559, 2022 Mar 4.

Article in English | MEDLINE | ID: mdl-34927666

ABSTRACT

MOTIVATION: Signaling pathways control cellular behavior. Dysregulated pathways, for example, due to mutations that cause genes and proteins to be expressed abnormally, can lead to diseases, such as cancer. RESULTS: We introduce a novel computational approach, called Differential Causal Effects (dce), which compares normal to cancerous cells using the statistical framework of causality. The method allows detection of individual edges in a signaling pathway that are dysregulated in cancer cells, while accounting for confounding. Hence, technical artifacts have less influence on the results and dce is more likely to detect the true biological signals. We extend the approach to handle unobserved dense confounding, where each latent variable, such as batch effects or cell cycle states, affects many covariates. We show that dce outperforms competing methods on synthetic datasets and on CRISPR knockout screens. We validate its latent confounding adjustment properties on a GTEx (Genotype-Tissue Expression) dataset. Finally, in an exploratory analysis on breast cancer data from TCGA (The Cancer Genome Atlas), we recover known and discover new genes involved in breast cancer progression. AVAILABILITY AND IMPLEMENTATION: The method dce is freely available as an R package on Bioconductor (https://bioconductor.org/packages/release/bioc/html/dce.html) as well as on https://github.com/cbg-ethz/dce. The GitHub repository also contains the Snakemake workflows needed to reproduce all results presented here. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Breast Neoplasms, Software, Humans, Female, Genome, Signal Transduction

11.

Toward causality and improving external validity.

Bühlmann, Peter.

Proc Natl Acad Sci U S A ; 117(42): 25963-25965, 2020 Oct 20.

Article in English | MEDLINE | ID: mdl-33046646

Subjects

Genetics, Causality, Reproducibility of Results, Research

12.

SPHN/PHRT: Forming a Swiss-Wide Infrastructure for Data-Driven Sepsis Research.

Egli, Adrian; Battegay, Manuel; Büchler, Andrea C; Bühlmann, Peter; Calandra, Thierry; Eckert, Philippe; Furrer, Hansjakob; Greub, Gilbert; Jakob, Stephan M; Kaiser, Laurent; Leib, Stephen L; Marsch, Stephan; Meinshausen, Nicolai; Pagani, Jean-Luc; Pugin, Jerome; Rätsch, Gunnar; Schrenzel, Jacques; Schüpbach, Reto; Siegemund, Martin; Zamboni, Nicola; Zbinden, Reinhard; Zinkernagel, Annelies; Borgwardt, Karsten.

Stud Health Technol Inform ; 270: 1163-1167, 2020 Jun 16.

Article in English | MEDLINE | ID: mdl-32570564

ABSTRACT

Sepsis is a highly heterogeneous syndrome with variable causes and outcomes. As part of the SPHN/PHRT funding program, we aim to build a highly interoperable, interconnected network for data collection, exchange and analysis of patients on intensive care units in order to predict sepsis onset and mortality earlier. All five University Hospitals, Universities, the Swiss Institute of Bioinformatics and ETH Zurich are involved in this multi-disciplinary project. With two prospective clinical observational studies, we test our infrastructure setup and improve the framework gradually and generate relevant data for research.

Subjects

Sepsis, University Hospitals, Humans, Intensive Care Units, Observational Studies as Topic, Prospective Studies, Switzerland

13.

A multi-marker association method for genome-wide association studies without the need for population structure correction.

Klasen, Jonas R; Barbez, Elke; Meier, Lukas; Meinshausen, Nicolai; Bühlmann, Peter; Koornneef, Maarten; Busch, Wolfgang; Schneeberger, Korbinian.

Nat Commun ; 7: 13299, 2016 Nov 10.

Article in English | MEDLINE | ID: mdl-27830750

ABSTRACT

All common genome-wide association (GWA) methods rely on population structure correction, to avoid false genotype-to-phenotype associations. However, population structure correction is a stringent penalization, which also impedes identification of real associations. Using recent statistical advances, we developed a new GWA method, called Quantitative Trait Cluster Association Test (QTCAT), enabling simultaneous multi-marker associations while considering correlations between markers. With this, QTCAT overcomes the need for population structure correction and also reflects the polygenic nature of complex traits better than single-marker methods. Using simulated data, we show that QTCAT clearly outperforms linear mixed model approaches. Moreover, using QTCAT to reanalyse public human, mouse and Arabidopsis GWA data revealed nearly all known and some previously undetected associations. Following up on the most significant novel association in the Arabidopsis data allowed us to identify a so far unknown component of root growth.

Subjects

Chromosome Mapping/methods, Genetic Association Studies/methods, Genome-Wide Association Study/methods, Quantitative Trait Loci/genetics, Arabidopsis/genetics, Gene Frequency, Plant Genome/genetics, Genotype, Linear Models, Phenotype, Single Nucleotide Polymorphism, Reproducibility of Results

14.

Methods for causal inference from gene perturbation experiments and validation.

Meinshausen, Nicolai; Hauser, Alain; Mooij, Joris M; Peters, Jonas; Versteeg, Philip; Bühlmann, Peter.

Proc Natl Acad Sci U S A ; 113(27): 7361-8, 2016 Jul 5.

Article in English | MEDLINE | ID: mdl-27382150

ABSTRACT

Inferring causal effects from observational and interventional data is a highly desirable but ambitious goal. Many of the computational and statistical methods are plagued by fundamental identifiability issues, instability, and unreliable performance, especially for large-scale systems with many measured variables. We present software and provide some validation of a recently developed methodology based on an invariance principle, called invariant causal prediction (ICP). The ICP method quantifies confidence probabilities for inferring causal structures and thus leads to more reliable and confirmatory statements for causal relations and predictions of external intervention effects. We validate the ICP method and some other procedures using large-scale genome-wide gene perturbation experiments in Saccharomyces cerevisiae. The results suggest that prediction and prioritization of future experimental interventions, such as gene deletions, can be improved by using our statistical inference techniques.

Subjects

Genetic Models, Statistics as Topic, Algorithms, Flow Cytometry, Gene Deletion, Saccharomyces cerevisiae, Software

15.

Assessing statistical significance in multivariable genome wide association analysis.

Buzdugan, Laura; Kalisch, Markus; Navarro, Arcadi; Schunk, Daniel; Fehr, Ernst; Bühlmann, Peter.

Bioinformatics ; 32(13): 1990-2000, 2016 Jul 1.

Article in English | MEDLINE | ID: mdl-27153677

ABSTRACT

MOTIVATION: Although Genome Wide Association Studies (GWAS) genotype a very large number of single nucleotide polymorphisms (SNPs), the data are often analyzed one SNP at a time. The low predictive power of single SNPs, coupled with the high significance threshold needed to correct for multiple testing, greatly decreases the power of GWAS. RESULTS: We propose a procedure in which all the SNPs are analyzed in a multiple generalized linear model, and we show its use for extremely high-dimensional datasets. Our method yields P-values for assessing significance of single SNPs or groups of SNPs while controlling for all other SNPs and the familywise error rate (FWER). Thus, our method tests whether or not a SNP carries any additional information about the phenotype beyond that available from all the other SNPs. This rules out spurious correlations between phenotypes and SNPs that can arise from marginal methods because the 'spuriously correlated' SNP merely happens to be correlated with the 'truly causal' SNP. In addition, the method offers a data-driven approach to identifying and refining groups of SNPs that jointly contain informative signals about the phenotype. We demonstrate the value of our method by applying it to the seven diseases analyzed by the Wellcome Trust Case Control Consortium (WTCCC). We show, in particular, that our method is also capable of finding significant SNPs that were not identified in the original WTCCC study, but were replicated in other independent studies. AVAILABILITY AND IMPLEMENTATION: Reproducibility of our research is supported by the open-source Bioconductor package hierGWAS. CONTACT: peter.buehlmann@stat.math.ethz.ch. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Computational Biology/methods, Genome-Wide Association Study, Single Nucleotide Polymorphism, Cluster Analysis, Computer Simulation, Genotype, Humans, Linear Models, Phenotype, Reproducibility of Results

16.

A Sequential Rejection Testing Method for High-Dimensional Regression with Correlated Variables.

Mandozzi, Jacopo; Bühlmann, Peter.

Int J Biostat ; 12(1): 79-95, 2016 May 1.

Article in English | MEDLINE | ID: mdl-27227719

ABSTRACT

We propose a general, modular method for significance testing of groups (or clusters) of variables in a high-dimensional linear model. In the presence of high correlations among the covariables, due to serious problems of identifiability, it is indispensable to focus on detecting groups of variables rather than singletons. We propose an inference method that allows hierarchical structures to be built in. It relies on repeated sample splitting and sequential rejection, and we prove that it asymptotically controls the familywise error rate. It can be implemented on any collection of clusters and leads to improved power in comparison to more standard non-sequential rejection methods. We complement the theoretical analysis with empirical results for simulated and real data.

Subjects

Biostatistics/methods, Computational Biology/methods, Statistical Data Interpretation, Statistical Models

17.

Arabidopsis GERANYLGERANYL DIPHOSPHATE SYNTHASE 11 is a hub isozyme required for the production of most photosynthesis-related isoprenoids.

Ruiz-Sola, M Águila; Coman, Diana; Beck, Gilles; Barja, M Victoria; Colinas, Maite; Graf, Alexander; Welsch, Ralf; Rütimann, Philipp; Bühlmann, Peter; Bigler, Laurent; Gruissem, Wilhelm; Rodríguez-Concepción, Manuel; Vranová, Eva.

New Phytol ; 209(1): 252-64, 2016 Jan.

Article in English | MEDLINE | ID: mdl-26224411

ABSTRACT

Most plastid isoprenoids, including photosynthesis-related metabolites such as carotenoids and the side chain of chlorophylls, tocopherols (vitamin E), phylloquinones (vitamin K), and plastoquinones, derive from geranylgeranyl diphosphate (GGPP) synthesized by GGPP synthase (GGPPS) enzymes. Seven out of 10 functional GGPPS isozymes in Arabidopsis thaliana reside in plastids. We aimed to address the function of different GGPPS paralogues for plastid isoprenoid biosynthesis. We constructed a gene co-expression network (GCN) using GGPPS paralogues as guide genes and genes from the upstream and downstream pathways as query genes. Furthermore, knock-out and/or knock-down ggpps mutants were generated and their growth and metabolic phenotypes were analyzed. Also, interacting protein partners of GGPPS11 were searched for. Our data showed that GGPPS11, encoding the only plastid isozyme essential for plant development, functions as a hub gene among GGPPS paralogues and is required for the production of all major groups of plastid isoprenoids. Furthermore, we showed that the GGPPS11 protein physically interacts with enzymes that use GGPP for the production of carotenoids, chlorophylls, tocopherols, phylloquinone, and plastoquinone. GGPPS11 is a hub isozyme required for the production of most photosynthesis-related isoprenoids. Both gene co-expression and protein-protein interaction likely contribute to the channeling of GGPP by GGPPS11.

Subjects

Alkyl and Aryl Transferases/metabolism, Arabidopsis Proteins/metabolism, Arabidopsis/enzymology, Terpenes/metabolism, Alkyl and Aryl Transferases/genetics, Arabidopsis/genetics, Arabidopsis Proteins/genetics, Carotenoids/metabolism, Chlorophyll/metabolism, Isoenzymes, Phenotype, Photosynthesis, Plastids/enzymology, Polyisoprenyl Phosphates/metabolism, Protein Interaction Mapping

18.

Structural intervention distance for evaluating causal graphs.

Peters, Jonas; Bühlmann, Peter.

Neural Comput ; 27(3): 771-99, 2015 Mar.

Article in English | MEDLINE | ID: mdl-25602767

ABSTRACT

Causal inference relies on the structure of a graph, often a directed acyclic graph (DAG). Different graphs may result in different causal inference statements and different intervention distributions. To quantify such differences, we propose a (pre-)metric between DAGs, the structural intervention distance (SID). The SID is based on a graphical criterion only and quantifies the closeness between two DAGs in terms of their corresponding causal inference statements. It is therefore well suited for evaluating graphs that are used for computing interventions. Instead of DAGs, it is also possible to compare CPDAGs, completed partially directed acyclic graphs that represent Markov equivalence classes. The SID differs significantly from the widely used structural Hamming distance and therefore constitutes a valuable additional measure. We discuss properties of this distance and provide a (reasonably) efficient implementation with software code available on the first author's home page.
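For context, the structural Hamming distance that this abstract contrasts with the SID can be sketched as follows. This is not the SID and not the authors' software: it is a minimal illustration that assumes both graphs share the same node set, that edges are given as (parent, child) tuples without self-loops, and that a reversed edge counts as a single error (conventions differ on this point). The function name is hypothetical.

```python
def structural_hamming_distance(edges_a, edges_b):
    """Count node pairs whose edge status differs between two directed graphs.

    A missing, extra, or reversed edge between a pair of nodes counts as 1.
    """
    a, b = set(edges_a), set(edges_b)
    # Collect every unordered node pair that is connected in either graph.
    pairs = {frozenset(edge) for edge in a | b}
    distance = 0
    for pair in pairs:
        u, v = tuple(pair)
        status_a = ((u, v) in a, (v, u) in a)
        status_b = ((u, v) in b, (v, u) in b)
        if status_a != status_b:
            distance += 1
    return distance


# Hypothetical example: one reversed edge (1-2) and one extra edge (1-3).
g1 = {(1, 2), (2, 3)}
g2 = {(2, 1), (2, 3), (1, 3)}
print(structural_hamming_distance(g1, g2))  # 2
```

Because this count only looks at edge mismatches, two graphs at the same Hamming distance from a reference graph can still imply very different intervention distributions, which is the gap the SID is designed to address.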

19.

Simultaneous analysis of large-scale RNAi screens for pathogen entry.

Rämö, Pauli; Drewek, Anna; Arrieumerlou, Cécile; Beerenwinkel, Niko; Ben-Tekaya, Houchaima; Cardel, Bettina; Casanova, Alain; Conde-Alvarez, Raquel; Cossart, Pascale; Csúcs, Gábor; Eicher, Simone; Emmenlauer, Mario; Greber, Urs; Hardt, Wolf-Dietrich; Helenius, Ari; Kasper, Christoph; Kaufmann, Andreas; Kreibich, Saskia; Kühbacher, Andreas; Kunszt, Peter; Low, Shyan Huey; Mercer, Jason; Mudrak, Daria; Muntwiler, Simone; Pelkmans, Lucas; Pizarro-Cerdá, Javier; Podvinec, Michael; Pujadas, Eva; Rinn, Bernd; Rouilly, Vincent; Schmich, Fabian; Siebourg-Polster, Juliane; Snijder, Berend; Stebler, Michael; Studer, Gabriel; Szczurek, Ewa; Truttmann, Matthias; von Mering, Christian; Vonderheit, Andreas; Yakimovich, Artur; Bühlmann, Peter; Dehio, Christoph.

BMC Genomics ; 15: 1162, 2014 Dec 22.

Article in English | MEDLINE | ID: mdl-25534632

ABSTRACT

BACKGROUND: Large-scale RNAi screening has become an important technology for identifying genes involved in biological processes of interest. However, the quality of large-scale RNAi screening is often degraded by off-target effects. In order to find statistically significant effector genes for pathogen entry, we systematically analyzed entry pathways in human host cells for eight pathogens using image-based kinome-wide siRNA screens with siRNAs from three vendors. We propose a Parallel Mixed Model (PMM) approach that simultaneously analyzes several non-identical screens performed with the same RNAi libraries. RESULTS: We show that PMM gains statistical power for hit detection due to parallel screening. PMM allows incorporating siRNA weights that can be assigned according to available information on RNAi quality. Moreover, PMM is able to estimate a sharedness score that can be used to focus follow-up efforts on generic or specific gene regulators. By fitting a PMM model to our data, we found several novel hit genes for most of the pathogens studied. CONCLUSIONS: Our results show that parallel RNAi screening can improve the results of individual screens. This is particularly relevant now that large-scale parallel datasets are becoming more and more publicly available. Our comprehensive siRNA dataset provides a public, freely available resource for further statistical and biological analyses in the high-content, high-throughput siRNA screening field.

Subjects

Genomics/methods, RNA Interference, Small Interfering RNA/genetics, Cell Line, Gene Library, Genomics/standards, High-Throughput Screening Assays, Host-Pathogen Interactions/genetics, Humans, ROC Curve, Reproducibility of Results

20.

Statistical approach to protein quantification.

Gerster, Sarah; Kwon, Taejoon; Ludwig, Christina; Matondo, Mariette; Vogel, Christine; Marcotte, Edward M; Aebersold, Ruedi; Bühlmann, Peter.

Mol Cell Proteomics ; 13(2): 666-77, 2014 Feb.

Article in English | MEDLINE | ID: mdl-24255132

ABSTRACT

A major goal in proteomics is the comprehensive and accurate description of a proteome. This task includes not only the identification of proteins in a sample, but also the accurate quantification of their abundance. Although mass spectrometry typically provides information on peptide identity and abundance in a sample, it does not directly measure the concentration of the corresponding proteins. Specifically, most mass-spectrometry-based approaches (e.g. shotgun proteomics or selected reaction monitoring) allow one to quantify peptides using chromatographic peak intensities or spectral counting information. Ultimately, based on these measurements, one wants to infer the concentrations of the corresponding proteins. Inferring properties of the proteins based on experimental peptide evidence is often a complex problem because of the ambiguity of peptide assignments and different chemical properties of the peptides that affect the observed concentrations. We present SCAMPI, a novel generic and statistically sound framework for computing protein abundance scores based on quantified peptides. In contrast to most previous approaches, our model explicitly includes information from shared peptides to improve protein quantitation, especially in eukaryotes with many homologous sequences. The model accounts for uncertainty in the input data, leading to statistical prediction intervals for the protein scores. Furthermore, peptides with extreme abundances can be reassessed and classified as either regular data points or actual outliers. We used the proposed model with several datasets and compared its performance to that of other, previously used approaches for protein quantification in bottom-up mass spectrometry.

Subjects

Computational Biology/methods, Statistical Data Interpretation, Proteins/analysis, Proteomics/statistics & numerical data, Tumor Cell Line, Protein Databases/statistics & numerical data, Humans, Isotope Labeling/methods, Leptospira interrogans/metabolism, Acute Myeloid Leukemia/metabolism, Markov Chains, Proteomics/methods, Research Design, Software
