Search | BVS Regional Portal

Improving GWAS discovery and genomic prediction accuracy in biobank data.

Orliac, Etienne J; Trejo Banos, Daniel; Ojavee, Sven E; Läll, Kristi; Mägi, Reedik; Visscher, Peter M; Robinson, Matthew R.

Proc Natl Acad Sci U S A ; 119(31): e2121279119, 2022 08 02.

Article in English | MEDLINE | ID: mdl-35905320

ABSTRACT

Genetically informed, deep-phenotyped biobanks are an important research resource and it is imperative that the most powerful, versatile, and efficient analysis approaches are used. Here, we apply our recently developed Bayesian grouped mixture of regressions model (GMRM) in the UK and Estonian Biobanks and obtain the highest genomic prediction accuracy reported to date across 21 heritable traits. When compared to other approaches, GMRM accuracy was greater than annotation prediction models run in the LDAK or LDPred-funct software by 15% (SE 7%) and 14% (SE 2%), respectively, and was 18% (SE 3%) greater than a baseline BayesR model without single-nucleotide polymorphism (SNP) markers grouped into minor allele frequency-linkage disequilibrium (MAF-LD) annotation categories. For height, the prediction accuracy $R^2$ was 47% in a UK Biobank holdout sample, which was 76% of the estimated $h^2_{\mathrm{SNP}}$. We then extend our GMRM prediction model to provide mixed-linear model association (MLMA) SNP marker estimates for genome-wide association (GWAS) discovery, which increased the independent loci detected to 16,162 in unrelated UK Biobank individuals, compared to 10,550 from BoltLMM and 10,095 from Regenie, a 62 and 65% increase, respectively. The average $\chi^2$ value of the leading markers increased by 15.24 (SE 0.41) for every 1% increase in prediction accuracy gained over a baseline BayesR model across the traits. Thus, we show that modeling genetic associations accounting for MAF and LD differences among SNP markers, and incorporating prior knowledge of genomic function, is important for both genomic prediction and discovery in large-scale individual-level studies.

Subjects

Genetic Databases, Genome-Wide Association Study, Precision Medicine, Heritable Quantitative Trait, Bayes Theorem, England, Estonia, Genomics, Genotype, Humans, Phenotype, Single Nucleotide Polymorphism

Genomic architecture and prediction of censored time-to-event phenotypes with a Bayesian genome-wide analysis.

Ojavee, Sven E; Kousathanas, Athanasios; Trejo Banos, Daniel; Orliac, Etienne J; Patxot, Marion; Läll, Kristi; Mägi, Reedik; Fischer, Krista; Kutalik, Zoltan; Robinson, Matthew R.

Nat Commun ; 12(1): 2337, 2021 04 20.

Article in English | MEDLINE | ID: mdl-33879782

ABSTRACT

While recent advancements in computation and modelling have improved the analysis of complex traits, our understanding of the genetic basis of the time at symptom onset remains limited. Here, we develop a Bayesian approach (BayesW) that provides probabilistic inference of the genetic architecture of age-at-onset phenotypes in a sampling scheme that facilitates biobank-scale time-to-event analyses. We show in extensive simulation work the benefits BayesW provides in terms of number of discoveries, model performance and genomic prediction. In the UK Biobank, we find many thousands of common genomic regions underlying the age-at-onset of high blood pressure (HBP), cardiac disease (CAD), and type-2 diabetes (T2D), and for the genetic basis of onset reflecting the underlying genetic liability to disease. Age-at-menopause and age-at-menarche are also highly polygenic, but with higher variance contributed by low frequency variants. Genomic prediction into the Estonian Biobank data shows that BayesW gives higher prediction accuracy than other approaches.

Subjects

Age of Onset, Human Genome, Genetic Models, Multifactorial Inheritance, Age Factors, Algorithms, Bayes Theorem, Cardiovascular Diseases/genetics, Computer Simulation, Genetic Databases, Type 2 Diabetes Mellitus/genetics, Estonia, Female, Genetic Association Studies, Genome-Wide Association Study, Genomics, Humans, Hypertension/genetics, Menarche/genetics, Menopause/genetics, Phenotype, Single Nucleotide Polymorphism, United Kingdom

Multi-method genome- and epigenome-wide studies of inflammatory protein levels in healthy older adults.

Hillary, Robert F; Trejo-Banos, Daniel; Kousathanas, Athanasios; McCartney, Daniel L; Harris, Sarah E; Stevenson, Anna J; Patxot, Marion; Ojavee, Sven Erik; Zhang, Qian; Liewald, David C; Ritchie, Craig W; Evans, Kathryn L; Tucker-Drob, Elliot M; Wray, Naomi R; McRae, Allan F; Visscher, Peter M; Deary, Ian J; Robinson, Matthew R; Marioni, Riccardo E.

Genome Med ; 12(1): 60, 2020 07 08.

Article in English | MEDLINE | ID: mdl-32641083

ABSTRACT

BACKGROUND: The molecular factors which control circulating levels of inflammatory proteins are not well understood. Furthermore, association studies between molecular probes and human traits are often performed by linear model-based methods which may fail to account for complex structure and interrelationships within molecular datasets. METHODS: In this study, we perform genome- and epigenome-wide association studies (GWAS/EWAS) on the levels of 70 plasma-derived inflammatory protein biomarkers in healthy older adults (Lothian Birth Cohort 1936; n = 876; Olink® inflammation panel). We employ a Bayesian framework (BayesR+) which can account for issues pertaining to data structure and unknown confounding variables (with sensitivity analyses using ordinary least squares- (OLS) and mixed model-based approaches). RESULTS: We identified 13 SNPs associated with 13 proteins (n = 1 SNP each) concordant across OLS and Bayesian methods. We identified 3 CpG sites spread across 3 proteins (n = 1 CpG each) that were concordant across OLS, mixed-model and Bayesian analyses. Tagged genetic variants accounted for up to 45% of variance in protein levels (for MCP2, 36% of variance alone attributable to 1 polymorphism). Methylation data accounted for up to 46% of variation in protein levels (for CXCL10). Up to 66% of variation in protein levels (for VEGFA) was explained using genetic and epigenetic data combined. We demonstrated putative causal relationships between CD6 and IL18R1 with inflammatory bowel disease and between IL12B and Crohn's disease. CONCLUSIONS: Our data may aid understanding of the molecular regulation of the circulating inflammatory proteome as well as causal relationships between inflammatory mediators and disease.

Subjects

Biomarkers, Epigenomics, Genome-Wide Association Study, Genomics, Proteins/genetics, Age Factors, Aged, Aged 80 and over, Blood Proteins/genetics, Computational Biology/methods, DNA Methylation, Disease Susceptibility, Genetic Epigenesis, Epigenomics/methods, Female, Gene Expression Regulation, Genomics/methods, Healthy Volunteers, Humans, Inflammation/etiology, Inflammation/metabolism, Inflammation Mediators, Male, Middle Aged, Single Nucleotide Polymorphism, Proteins/metabolism, Quantitative Trait Loci

Bayesian reassessment of the epigenetic architecture of complex traits.

Trejo Banos, Daniel; McCartney, Daniel L; Patxot, Marion; Anchieri, Lucas; Battram, Thomas; Christiansen, Colette; Costeira, Ricardo; Walker, Rosie M; Morris, Stewart W; Campbell, Archie; Zhang, Qian; Porteous, David J; McRae, Allan F; Wray, Naomi R; Visscher, Peter M; Haley, Chris S; Evans, Kathryn L; Deary, Ian J; McIntosh, Andrew M; Hemani, Gibran; Bell, Jordana T; Marioni, Riccardo E; Robinson, Matthew R.

Nat Commun ; 11(1): 2865, 2020 06 08.

Article in English | MEDLINE | ID: mdl-32513961

ABSTRACT

Linking epigenetic marks to clinical outcomes improves insight into molecular processes, disease prediction, and therapeutic target identification. Here, a statistical approach is presented to infer the epigenetic architecture of complex disease, determine the variation captured by epigenetic effects, and estimate phenotype-epigenetic probe associations jointly. Implicitly adjusting for probe correlations, data structure (cell-count or relatedness), and single-nucleotide polymorphism (SNP) marker effects, improves association estimates and in 9,448 individuals, 75.7% (95% CI 71.70-79.3) of body mass index (BMI) variation and 45.6% (95% CI 37.3-51.9) of cigarette consumption variation was captured by whole blood methylation array data. Pathway-linked probes of blood cholesterol, lipid transport and sterol metabolism for BMI, and xenobiotic stimuli response for smoking, showed >1.5 times larger associations with >95% posterior inclusion probability. Prediction accuracy improved by 28.7% for BMI and 10.2% for smoking over a LASSO model, with age-, and tissue-specificity, implying associations are a phenotypic consequence rather than causal.

Subjects

Genetic Epigenesis, Heritable Quantitative Trait, Adult, Algorithms, Bayes Theorem, Biomarkers/analysis, Body Mass Index, Computer Simulation, DNA Methylation/genetics, Humans, Molecular Sequence Annotation, Organ Specificity/genetics, Reproducibility of Results

Closed-loop cycles of experiment design, execution, and learning accelerate systems biology model development in yeast.

Coutant, Anthony; Roper, Katherine; Trejo-Banos, Daniel; Bouthinon, Dominique; Carpenter, Martin; Grzebyta, Jacek; Santini, Guillaume; Soldano, Henry; Elati, Mohamed; Ramon, Jan; Rouveirol, Celine; Soldatova, Larisa N; King, Ross D.

Proc Natl Acad Sci U S A ; 116(36): 18142-18147, 2019 09 03.

Article in English | MEDLINE | ID: mdl-31420515

ABSTRACT

One of the most challenging tasks in modern science is the development of systems biology models: Existing models are often very complex but generally have low predictive performance. The construction of high-fidelity models will require hundreds/thousands of cycles of model improvement, yet few current systems biology research studies complete even a single cycle. We combined multiple software tools with integrated laboratory robotics to execute three cycles of model improvement of the prototypical eukaryotic cellular transformation, the yeast (Saccharomyces cerevisiae) diauxic shift. In the first cycle, a model outperforming the best previous diauxic shift model was developed using bioinformatic and systems biology tools. In the second cycle, the model was further improved using automatically planned experiments. In the third cycle, hypothesis-led experiments improved the model to a greater extent than achieved using high-throughput experiments. All of the experiments were formalized and communicated to a cloud laboratory automation system (Eve) for automatic execution, and the results stored on the semantic web for reuse. The final model adds a substantial amount of knowledge about the yeast diauxic shift: 92 genes (+45%), and 1,048 interactions (+147%). This knowledge is also relevant to understanding cancer, the immune system, and aging. We conclude that systems biology software tools can be combined and integrated with laboratory robots in closed-loop cycles.

Subjects

Computational Biology, Fungal Gene Expression Regulation, Robotics, Saccharomyces cerevisiae, Software, Systems Biology, Saccharomyces cerevisiae/genetics, Saccharomyces cerevisiae/metabolism

A Bayesian approach for structure learning in oscillating regulatory networks.

Trejo Banos, Daniel; Millar, Andrew J; Sanguinetti, Guido.

Bioinformatics ; 31(22): 3617-24, 2015 Nov 15.

Article in English | MEDLINE | ID: mdl-26177966

ABSTRACT

MOTIVATION: Oscillations lie at the core of many biological processes, from the cell cycle, to circadian oscillations and developmental processes. Time-keeping mechanisms are essential to enable organisms to adapt to varying conditions in environmental cycles, from day/night to seasonal. Transcriptional regulatory networks are one of the mechanisms behind these biological oscillations. However, while identifying cyclically expressed genes from time series measurements is relatively easy, determining the structure of the interaction network underpinning the oscillation is a far more challenging problem. RESULTS: Here, we explicitly leverage the oscillatory nature of the transcriptional signals and present a method for reconstructing network interactions tailored to this special but important class of genetic circuits. Our method is based on projecting the signal onto a set of oscillatory basis functions using a Discrete Fourier Transform. We build a Bayesian Hierarchical model within a frequency domain linear model in order to enforce sparsity and incorporate prior knowledge about the network structure. Experiments on real and simulated data show that the method can lead to substantial improvements over competing approaches if the oscillatory assumption is met, and remains competitive also in cases it is not. AVAILABILITY: DSS, experiment scripts and data are available at http://homepages.inf.ed.ac.uk/gsanguin/DSS.zip. CONTACT: d.trejo-banos@sms.ed.ac.uk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
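The projection step mentioned in the abstract, expressing a time series in a set of oscillatory basis functions via a Discrete Fourier Transform, can be illustrated with a minimal sketch. This is not the authors' DSS code and does not include the Bayesian hierarchical model; the function name `dominant_fourier_component`, the sampling scheme, and the synthetic expression profile are all assumptions made for illustration only.

```python
import numpy as np


def dominant_fourier_component(signal, sampling_interval=1.0):
    """Project a real-valued time series onto the DFT basis and return
    (frequency, amplitude, phase) of its strongest non-constant component.

    Hypothetical helper for illustration; not part of the DSS software.
    """
    signal = np.asarray(signal, dtype=float)
    n = signal.size
    # Remove the mean so the zero-frequency term does not dominate.
    coeffs = np.fft.rfft(signal - signal.mean())
    freqs = np.fft.rfftfreq(n, d=sampling_interval)
    # Skip index 0 (constant term) when searching for the peak.
    k = int(np.argmax(np.abs(coeffs[1:]))) + 1
    # Amplitude scaling is valid for bins strictly between 0 and Nyquist.
    amplitude = 2.0 * np.abs(coeffs[k]) / n
    phase = float(np.angle(coeffs[k]))
    return float(freqs[k]), float(amplitude), phase


# Synthetic (assumed) expression profile: sampled every 2 h over 48 h,
# oscillating with a 24 h period around a baseline of 3.0.
t = np.arange(0, 48, 2.0)
expression = 3.0 + 1.5 * np.cos(2 * np.pi * t / 24.0)

freq, amp, phase = dominant_fourier_component(expression, sampling_interval=2.0)
period = 1.0 / freq
print(f"period={period:.1f} h, amplitude={amp:.2f}, phase={phase:.2f} rad")
```

In a frequency-domain linear model of the kind the abstract describes, complex coefficients like these (rather than the raw time points) would presumably serve as the regression variables; how the paper constructs and regularizes that regression is not specified here.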

Subjects

Gene Regulatory Networks, Arabidopsis/genetics, Bayes Theorem, Cell Cycle/genetics, Circadian Clocks/genetics, Computer Simulation, Genetic Databases, Gene Expression Profiling, Fungal Gene Expression Regulation, Plant Gene Expression Regulation, Saccharomyces cerevisiae/cytology, Saccharomyces cerevisiae/genetics
