Search | VHL Regional Portal

1.

BridgePRS leverages shared genetic effects across ancestries to increase polygenic risk score portability.

Hoggart, Clive J; Choi, Shing Wan; García-González, Judit; Souaiaia, Tade; Preuss, Michael; O'Reilly, Paul F.

Nat Genet ; 56(1): 180-186, 2024 Jan.

Article in English | MEDLINE | ID: mdl-38123642

ABSTRACT

Here we present BridgePRS, a novel Bayesian polygenic risk score (PRS) method that leverages shared genetic effects across ancestries to increase PRS portability. We evaluate BridgePRS via simulations and real UK Biobank data across 19 traits in individuals of African, South Asian and East Asian ancestry, using both UK Biobank and Biobank Japan genome-wide association study summary statistics; out-of-cohort validation is performed in the Mount Sinai (New York) BioMe biobank. BridgePRS is compared with the leading alternative, PRS-CSx, and two other PRS methods. Simulations suggest that the performance of BridgePRS relative to PRS-CSx increases as uncertainty increases: with lower trait heritability, higher polygenicity and greater between-population genetic diversity; and when causal variants are not present in the data. In real data, BridgePRS has a 61% larger average R² than PRS-CSx in out-of-cohort prediction of African ancestry samples in BioMe (P = 6 × 10⁻⁵). BridgePRS is a computationally efficient, user-friendly and powerful approach for PRS analyses in non-European ancestries.

Subject(s)

Genetic Predisposition to Disease , Genetic Risk Score , Humans , Risk Factors , Genome-Wide Association Study , Bayes Theorem , Polymorphism, Single Nucleotide/genetics , Multifactorial Inheritance/genetics

2.

PRSet: Pathway-based polygenic risk score analyses and software.

Choi, Shing Wan; García-González, Judit; Ruan, Yunfeng; Wu, Hei Man; Porras, Christian; Johnson, Jessica; Hoggart, Clive J; O'Reilly, Paul F.

PLoS Genet ; 19(2): e1010624, 2023 02.

Article in English | MEDLINE | ID: mdl-36749789

ABSTRACT

Polygenic risk scores (PRSs) have been among the leading advances in biomedicine in recent years. As a proxy of genetic liability, PRSs are utilised across multiple fields and applications. While numerous statistical and machine learning methods have been developed to optimise their predictive accuracy, these typically distil genetic liability to a single number based on aggregation of an individual's genome-wide risk alleles. This results in a key loss of information about an individual's genetic profile, which could be critical given the functional sub-structure of the genome and the heterogeneity of complex disease. In this manuscript, we introduce a 'pathway polygenic' paradigm of disease risk, in which multiple genetic liabilities underlie complex diseases, rather than a single genome-wide liability. We describe a method and accompanying software, PRSet, for computing and analysing pathway-based PRSs, in which polygenic scores are calculated across genomic pathways for each individual. We evaluate the potential of pathway PRSs in two distinct ways, creating two major sections: (1) In the first section, we benchmark PRSet as a pathway enrichment tool, evaluating its capacity to capture GWAS signal in pathways. We find that for target sample sizes of >10,000 individuals, pathway PRSs have similar power for evaluating pathway enrichment to the leading methods MAGMA and LD score regression, with the distinct advantage of providing individual-level estimates of genetic liability for each pathway, opening up a range of pathway-based PRS applications. (2) In the second section, we evaluate the performance of pathway PRSs for disease stratification. We show that using a supervised disease stratification approach, pathway PRSs (computed by PRSet) outperform two standard genome-wide PRSs (computed by C+T and lassosum) for classifying disease subtypes in 20 of 21 scenarios tested.
As the definition and functional annotation of pathways becomes increasingly refined, we expect pathway PRSs to offer key insights into the heterogeneity of complex disease and treatment response, to generate biologically tractable therapeutic targets from polygenic signal, and, ultimately, to provide a powerful path to precision medicine.

Subject(s)

Genomics , Multifactorial Inheritance , Humans , Risk Factors , Multifactorial Inheritance/genetics , Genome-Wide Association Study , Software , Genetic Predisposition to Disease
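The 'pathway polygenic' idea replaces one genome-wide score with one score per pathway. A minimal sketch, assuming SNPs have already been mapped to genes and pathways are given as gene sets, restricts the additive sum to the SNPs whose gene falls in each pathway. The abstract does not describe PRSet's SNP-to-gene mapping rules or its enrichment statistics, so the function name and inputs below are hypothetical.

```python
# Hypothetical pathway-PRS sketch; assumes SNPs are already mapped to genes.
def pathway_prs(
    dosages: list[list[float]],
    betas: list[float],
    snp_ids: list[str],
    snp_to_gene: dict[str, str],
    pathways: dict[str, set[str]],
) -> dict[str, list[float]]:
    """Return one additive score per individual for each pathway.

    Only SNPs whose mapped gene belongs to the pathway contribute, so each
    individual gets a vector of pathway scores instead of a single number.
    """
    scores: dict[str, list[float]] = {}
    for name, genes in pathways.items():
        # indices of SNPs annotated to a gene in this pathway (unmapped SNPs skipped)
        idx = [j for j, snp in enumerate(snp_ids) if snp_to_gene.get(snp) in genes]
        scores[name] = [sum(betas[j] * row[j] for j in idx) for row in dosages]
    return scores


if __name__ == "__main__":
    print(pathway_prs(
        dosages=[[2, 1, 0], [0, 2, 1]],
        betas=[0.5, -0.2, 1.0],
        snp_ids=["rs1", "rs2", "rs3"],
        snp_to_gene={"rs1": "GENE_A", "rs2": "GENE_B", "rs3": "GENE_C"},
        pathways={"PATHWAY_1": {"GENE_A", "GENE_B"}, "PATHWAY_2": {"GENE_C"}},
    ))
```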

3.

Bridging a diagnostic Kawasaki disease classifier from a microarray platform to a qRT-PCR assay.

Kuiper, Rowan; Wright, Victoria J; Habgood-Coote, Dominic; Shimizu, Chisato; Huigh, Daphne; Tremoulet, Adriana H; van Keulen, Danielle; Hoggart, Clive J; Rodriguez-Manzano, Jesus; Herberg, Jethro A; Kaforou, Myrsini; Tempel, Dennie; Burns, Jane C; Levin, Michael.

Pediatr Res ; 93(3): 559-569, 2023 02.

Article in English | MEDLINE | ID: mdl-35732822

ABSTRACT

BACKGROUND: Kawasaki disease (KD) is a systemic vasculitis that mainly affects children under 5 years of age. Up to 30% of patients develop coronary artery abnormalities, which are reduced with early treatment. Timely diagnosis of KD is challenging but may become more straightforward with the recent discovery of a whole-blood host response classifier that discriminates KD patients from patients with other febrile conditions. Here, we bridged this microarray-based classifier to a clinically applicable quantitative reverse transcription-polymerase chain reaction (qRT-PCR) assay: the Kawasaki Disease Gene Expression Profiling (KiDs-GEP) classifier. METHODS: We designed and optimized a qRT-PCR assay and applied it to a subset of samples previously used for the classifier discovery to reweight the original classifier. RESULTS: The performance of the KiDs-GEP classifier was comparable to the original classifier with a cross-validated area under the ROC curve of 0.964 [95% CI: 0.924-1.00] vs 0.992 [95% CI: 0.978-1.00], respectively. Both classifiers demonstrated similar trends over various disease conditions, with the clearest distinction between individuals diagnosed with KD vs viral infections. CONCLUSION: We successfully bridged the microarray-based classifier into the KiDs-GEP classifier, a more rapid and more cost-efficient qRT-PCR assay, bringing a diagnostic test for KD closer to the hospital clinical laboratory. IMPACT: A diagnostic test is needed for Kawasaki disease and is currently not available. We describe the development of a One-Step multiplex qRT-PCR assay and the subsequent modification (i.e., bridging) of the microarray-based host response classifier previously described by Wright et al. The bridged KiDs-GEP classifier performs well in discriminating Kawasaki disease patients from febrile controls. This host response clinical test for Kawasaki disease can be adapted to the hospital clinical laboratory.

Subject(s)

Mucocutaneous Lymph Node Syndrome , Child , Humans , Child, Preschool , Mucocutaneous Lymph Node Syndrome/diagnosis , Mucocutaneous Lymph Node Syndrome/genetics , Reverse Transcriptase Polymerase Chain Reaction , Gene Expression Profiling , Fever , ROC Curve

4.

EraSOR: a software tool to eliminate inflation caused by sample overlap in polygenic score analyses.

Choi, Shing Wan; Mak, Timothy Shin Heng; Hoggart, Clive J; O'Reilly, Paul F.

Gigascience ; 12, 2022 Dec 28.

Article in English | MEDLINE | ID: mdl-37326441

ABSTRACT

BACKGROUND: Polygenic risk score (PRS) analyses are now routinely applied across biomedical research. However, as PRS studies grow in size, there is an increased risk of sample overlap between the genome-wide association study (GWAS) from which the PRS is derived and the "target sample," in which PRSs are computed and hypotheses are tested. Despite the wide recognition of the sample overlap problem, its potential impact on the results from PRS studies has not yet been quantified, and no analytical solution has been provided. FINDINGS: Here, we first conduct a comprehensive investigation into the scale of the sample overlap problem, finding that PRS results can be substantially inflated even in the presence of minimal overlap. Next, we introduce a method and software, EraSOR (Erase Sample Overlap and Relatedness), which eliminates the inflation caused by sample overlap (and close relatedness) in almost all settings tested here. CONCLUSIONS: EraSOR could be useful in PRS studies (with target sample >1,000) similar to those investigated here, either (i) to mitigate the potential effects of known or unknown intercohort overlap and close relatedness or (ii) as a sensitivity tool to highlight the possible presence of sample overlap before its direct removal, when possible, or else to provide a lower bound on PRS analysis results after accounting for potential sample overlap.

Subject(s)

Genome-Wide Association Study , Multifactorial Inheritance , Humans , Software , Risk Assessment/methods , Risk Factors , Genetic Predisposition to Disease

5.

Detectable A Disintegrin and Metalloproteinase With Thrombospondin Motifs-1 in Serum Is Associated With Adverse Outcome in Pediatric Sepsis.

Boeddha, Navin P; Driessen, Gertjan J; Hagedoorn, Nienke N; Kohlfuerst, Daniela S; Hoggart, Clive J; van Rijswijk, Angelique L; Ekinci, Ebru; Priem, Debby; Schlapbach, Luregn J; Herberg, Jethro A; de Groot, Ronald; Anderson, Suzanne T; Fink, Colin G; Carrol, Enitan D; van der Flier, Michiel; Martinón-Torres, Federico; Levin, Michael; Leebeek, Frank W; Zenz, Werner; de Maat, Moniek P M; Hazelzet, Jan A; Emonts, Marieke; Dik, Willem A.

Crit Care Explor ; 3(11): e0569, 2021 Nov.

Article in English | MEDLINE | ID: mdl-34765980

ABSTRACT

IMPORTANCE: A Disintegrin and Metalloproteinase with Thrombospondin Motifs-1 is hypothesized to play a role in the pathogenesis of invasive infection, but studies in sepsis are lacking. OBJECTIVES: To study A Disintegrin and Metalloproteinase with Thrombospondin Motifs-1 protein level in pediatric sepsis and to study the association with outcome. DESIGN: Data from two prospective cohort studies. SETTING AND PARTICIPANTS: Cohort 1 is from a single-center study involving children admitted to PICU with meningococcal sepsis (samples obtained at three time points). Cohort 2 includes patients from a multicenter study involving children admitted to the hospital with invasive bacterial infections of differing etiologies (samples obtained within 48 hr after hospital admission). MAIN OUTCOMES AND MEASURES: Primary outcome measure was mortality. Secondary outcome measures were PICU-free days at day 28 and hospital length of stay. RESULTS: In cohort 1 (n = 59), nonsurvivors more frequently had A Disintegrin and Metalloproteinase with Thrombospondin Motifs-1 levels above the detection limit than survivors at admission to PICU (8/11 [73%] and 6/23 [26%], respectively; p = 0.02) and at t = 24 hours (2/3 [67%] and 3/37 [8%], respectively; p = 0.04). In cohort 2 (n = 240), A Disintegrin and Metalloproteinase with Thrombospondin Motifs-1 levels in patients within 48 hours after hospital admission were more frequently above the detection limit than in healthy controls (110/240 [46%] and 14/64 [22%], respectively; p = 0.001). Nonsurvivors more often had detectable A Disintegrin and Metalloproteinase with Thrombospondin Motifs-1 levels than survivors (16/21 [76%] and 94/219 [43%], respectively; p = 0.003), which was mostly attributable to patients with Neisseria meningitidis. 
CONCLUSIONS AND RELEVANCE: In children with bacterial infection, detection of A Disintegrin and Metalloproteinase with Thrombospondin Motifs-1 within 48 hours after hospital admission is associated with death, particularly in meningococcal sepsis. Future studies should confirm the prognostic value of A Disintegrin and Metalloproteinase with Thrombospondin Motifs-1 and should study pathophysiologic mechanisms.
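The abstract reports p-values for 2 × 2 comparisons of detectable versus undetectable levels but does not name the test. Assuming a two-sided Fisher's exact test (our assumption), the sketch below computes it from exact hypergeometric counts using only the standard library. For cohort 1 at PICU admission (8/11 nonsurvivors vs 6/23 survivors detectable) it gives p ≈ 0.023, and at 24 hours (2/3 vs 3/37) p ≈ 0.036, consistent with the reported p = 0.02 and p = 0.04.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same row and
    column margins that is no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def weight(k: int) -> int:
        # unnormalised probability (integer count) that the top-left cell equals k
        return comb(col1, k) * comb(n - col1, row1 - k)

    observed = weight(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    extreme = sum(weight(k) for k in range(lo, hi + 1) if weight(k) <= observed)
    return extreme / comb(n, row1)


if __name__ == "__main__":
    # rows: nonsurvivors, survivors; columns: detectable, undetectable
    print(fisher_exact_two_sided(8, 3, 6, 17))  # PICU admission
    print(fisher_exact_two_sided(2, 1, 3, 34))  # t = 24 hours
```

Working with integer counts keeps the "no more likely than observed" comparison exact, avoiding floating-point ties.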

6.

Evaluation of Host Serum Protein Biomarkers of Tuberculosis in sub-Saharan Africa.

Morris, Thomas C; Hoggart, Clive J; Chegou, Novel N; Kidd, Martin; Oni, Tolu; Goliath, Rene; Wilkinson, Katalin A; Dockrell, Hazel M; Sichali, Lifted; Banda, Louis; Crampin, Amelia C; French, Neil; Walzl, Gerhard; Levin, Michael; Wilkinson, Robert J; Hamilton, Melissa S.

Front Immunol ; 12: 639174, 2021.

Article in English | MEDLINE | ID: mdl-33717190

ABSTRACT

Accurate and affordable point-of-care diagnostics for tuberculosis (TB) are needed. Host serum protein signatures have been derived for use in primary care settings; however, validation of these in secondary care settings is lacking. We evaluated serum protein biomarkers, discovered in primary care cohorts from Africa, by reapplying them to patients from secondary care. In this nested case-control study, concentrations of 22 proteins were quantified in sera from 292 patients from Malawi and South Africa who presented predominantly to secondary care. Recruitment was based on the intention of local clinicians to test for TB. The case definition for TB was culture positivity for Mycobacterium tuberculosis, and for other diseases (OD) a confirmed alternative diagnosis. Equal numbers of TB and OD patients were selected. Within each group, there were equal numbers with and without HIV and from each site. Patients were split into training and test sets for biosignature discovery. A nine-protein signature to distinguish TB from OD was discovered comprising fibrinogen, alpha-2-macroglobulin, CRP, MMP-9, transthyretin, complement factor H, IFN-gamma, IP-10, and TNF-alpha. This signature had an area under the receiver operating characteristic curve in the training set of 90% (95% CI 86-95%), and, after adjusting the cut-off for increased sensitivity, a sensitivity and specificity in the test set of 92% (95% CI 80-98%) and 71% (95% CI 56-84%), respectively. The best single biomarker was complement factor H [area under the receiver operating characteristic curve 70% (95% CI 64-76%)]. Biosignatures consisting of host serum proteins may function as point-of-care screening tests for TB in African hospitals. Complement factor H is identified as a new biomarker for such signatures.

Subject(s)

Biomarkers/blood , Complement Factor H/metabolism , HIV Infections/diagnosis , HIV-1/physiology , Mycobacterium tuberculosis/physiology , Tuberculosis, Pulmonary/diagnosis , Adult , Africa South of the Sahara/epidemiology , Complement Factor H/genetics , Female , Fibrinogen/genetics , Fibrinogen/metabolism , HIV Infections/epidemiology , Humans , Male , Middle Aged , Point-of-Care Testing , Tuberculosis, Pulmonary/epidemiology
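The abstract summarises the signature by its area under the ROC curve and by sensitivity and specificity at a chosen cut-off. As a generic sketch (not the study's code), the empirical AUC equals the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half; sensitivity and specificity follow from calling scores at or above the cut-off positive. Lowering the cut-off, as the authors did to favour sensitivity, raises sensitivity at the cost of specificity. The example scores are hypothetical.

```python
def auc_mann_whitney(case_scores: list[float], control_scores: list[float]) -> float:
    """Probability a random case outscores a random control; ties count one half.

    This equals the area under the empirical ROC curve.
    """
    wins = 0.0
    for x in case_scores:
        for z in control_scores:
            wins += 1.0 if x > z else (0.5 if x == z else 0.0)
    return wins / (len(case_scores) * len(control_scores))


def sensitivity_specificity(
    case_scores: list[float], control_scores: list[float], cutoff: float
) -> tuple[float, float]:
    """Call score >= cutoff positive; return (sensitivity, specificity)."""
    sens = sum(x >= cutoff for x in case_scores) / len(case_scores)
    spec = sum(z < cutoff for z in control_scores) / len(control_scores)
    return sens, spec


if __name__ == "__main__":
    tb, other = [3.0, 4.0, 5.0], [1.0, 2.0, 3.0]  # hypothetical signature scores
    print(auc_mann_whitney(tb, other))            # 8.5 of 9 case-control pairs won
    print(sensitivity_specificity(tb, other, 3.0))
```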

7.

A three-marker protein biosignature distinguishes tuberculosis from other respiratory diseases in Gambian children.

Togun, Toyin; Hoggart, Clive J; Agbla, Schadrac C; Gomez, Marie P; Egere, Uzochukwu; Sillah, Abdou K; Saidy, Binta; Mendy, Francis; Pai, Madhukar; Kampmann, Beate.

EBioMedicine ; 58: 102909, 2020 Aug.

Article in English | MEDLINE | ID: mdl-32711253

ABSTRACT

BACKGROUND: Our study aimed to identify a host cytokine biosignature that could distinguish childhood tuberculosis (TB) from other respiratory diseases (OD). METHODS: Cytokine responses in prospectively recruited children with symptoms suggestive of TB were measured in whole blood assay supernatants, harvested after overnight incubation, using a Luminex platform. We used logistic regression models with a Least Absolute Shrinkage and Selection Operator (LASSO) penalty to identify the optimal biosignature associated with confirmed TB disease in the training set. We subsequently assessed its performance in the test set. FINDINGS: Of the 431 children included in the study, 44 had bacteriologically confirmed TB, 60 had clinically diagnosed TB, and 327 had OD. All children were HIV-negative. Application of LASSO regression models to the training set (n = 260) resulted in the combination of IL-1ra, IL-7 and IP-10 from unstimulated samples as the optimally discriminant cytokine biosignature associated with bacteriologically confirmed TB. In the test set (n = 171), this biosignature distinguished children diagnosed with TB disease, irrespective of microbiological confirmation, from OD with an area under the receiver operating characteristic curve (AUC) of 0·74 (95% CI: 0·67, 0·81), and demonstrated sensitivity and specificity of 72·2% (95% CI: 60·4, 82·1%) and 75·0% (95% CI: 64·9, 83·4%), respectively, with its performance independent of age group and of age- and sex-adjusted nutritional status. INTERPRETATION: This novel biosignature of childhood TB derived from unstimulated supernatants is promising. Independent validation with further optimisation will improve its performance and translational potential. FUNDING: Steinberg Fellowship (McGill University); Grand Challenges Canada; MRC Program Grant.

Subject(s)

Biomarkers/blood , Chemokine CXCL10/blood , Interleukin 1 Receptor Antagonist Protein/blood , Interleukin-7/blood , Respiratory Tract Infections/diagnosis , Tuberculosis, Pulmonary/diagnosis , Adolescent , Child , Child, Preschool , Diagnosis, Differential , Female , Gambia , Humans , Infant , Male , Mycobacterium tuberculosis/isolation & purification , Prospective Studies , Regression Analysis , Respiratory Tract Infections/blood , Respiratory Tract Infections/microbiology , Sensitivity and Specificity , Tuberculosis, Pulmonary/blood
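LASSO-penalised logistic regression selects a sparse set of markers by adding λ·Σ|w_j| to the logistic loss, which drives uninformative coefficients exactly to zero. The sketch below fits such a model by proximal gradient descent (ISTA: a gradient step followed by soft-thresholding). This is a swapped-in illustration of the technique, not the study's software or tuning; in practice λ would be chosen by cross-validation, and the feature values are hypothetical.

```python
import math


def _sigmoid(z: float) -> float:
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(z: float, t: float) -> float:
    """Proximal operator of t*|w|: shrink z towards zero by t, clipping at zero."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def lasso_logistic(X: list[list[float]], y: list[int], lam: float,
                   lr: float = 0.1, iters: int = 2000) -> tuple[float, list[float]]:
    """Minimise mean log-loss + lam * sum|w_j| by ISTA; the intercept is unpenalised."""
    n, p = len(X), len(X[0])
    b0, w = 0.0, [0.0] * p
    for _ in range(iters):
        grad0, grad = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            r = _sigmoid(b0 + sum(wj * xij for wj, xij in zip(w, xi))) - yi  # p_hat - y
            grad0 += r / n
            for j in range(p):
                grad[j] += r * xi[j] / n
        b0 -= lr * grad0                                   # plain gradient step
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad)]
    return b0, w


if __name__ == "__main__":
    # feature 0 separates the classes; feature 1 is symmetric noise
    X = [[-1.0, 0.5], [-1.0, -0.5], [-0.5, 0.5], [-0.5, -0.5],
         [1.0, 0.5], [1.0, -0.5], [0.5, 0.5], [0.5, -0.5]]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    print(lasso_logistic(X, y, lam=0.01))  # informative weight kept, noise weight zero
    print(lasso_logistic(X, y, lam=1.0))   # strong penalty zeroes every weight
```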

8.

Massively parallel reporter assays of melanoma risk variants identify MX2 as a gene promoting melanoma.

Choi, Jiyeon; Zhang, Tongwu; Vu, Andrew; Ablain, Julien; Makowski, Matthew M; Colli, Leandro M; Xu, Mai; Hennessey, Rebecca C; Yin, Jinhu; Rothschild, Harriet; Gräwe, Cathrin; Kovacs, Michael A; Funderburk, Karen M; Brossard, Myriam; Taylor, John; Pasaniuc, Bogdan; Chari, Raj; Chanock, Stephen J; Hoggart, Clive J; Demenais, Florence; Barrett, Jennifer H; Law, Matthew H; Iles, Mark M; Yu, Kai; Vermeulen, Michiel; Zon, Leonard I; Brown, Kevin M.

Nat Commun ; 11(1): 2718, 2020 06 01.

Article in English | MEDLINE | ID: mdl-32483191

ABSTRACT

Genome-wide association studies (GWAS) have identified ~20 melanoma susceptibility loci, most of which are not functionally characterized. Here we report an approach integrating massively-parallel reporter assays (MPRA) with cell-type-specific epigenome and expression quantitative trait loci (eQTL) to identify susceptibility genes/variants from multiple GWAS loci. From 832 high-LD variants, we identify 39 candidate functional variants from 14 loci displaying allelic transcriptional activity, a subset of which corroborates four colocalizing melanocyte cis-eQTL genes. Among these, we further characterize the locus encompassing the HIV-1 restriction gene, MX2 (Chr21q22.3), and validate a functional intronic variant, rs398206. rs398206 mediates the binding of the transcription factor, YY1, to increase MX2 levels, consistent with the cis-eQTL of MX2 in primary human melanocytes. Melanocyte-specific expression of human MX2 in a zebrafish model demonstrates accelerated melanoma formation in a BRAFV600E background. Our integrative approach streamlines GWAS follow-up studies and highlights a pleiotropic function of MX2 in melanoma susceptibility.

Subject(s)

Genetic Predisposition to Disease/genetics , Genome-Wide Association Study/methods , Melanoma/genetics , Mutation , Myxovirus Resistance Proteins/genetics , Polymorphism, Single Nucleotide , Animals , Cell Line, Tumor , Disease Models, Animal , Gene Expression Regulation , Genes, Reporter/genetics , HEK293 Cells , Humans , Melanocytes/metabolism , Melanoma/pathology , Proto-Oncogene Proteins B-raf/genetics , Proto-Oncogene Proteins B-raf/metabolism , Quantitative Trait Loci/genetics , Zebrafish/genetics , Zebrafish/metabolism
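"Allelic transcriptional activity" in an MPRA is commonly summarised by comparing reporter RNA output with input DNA for each allele. A minimal sketch, assuming library-size-normalised counts, activity = log2(RNA/DNA), and allelic skew = activity(alt) minus activity(ref), is shown below. The study's actual statistical model (replicate handling, significance testing) is not given in the abstract, and the function names and counts are hypothetical.

```python
from math import log2


def element_activity(rna_count: int, dna_count: int, rna_total: int, dna_total: int) -> float:
    """log2 of the library-size-normalised RNA/DNA ratio for one reporter element."""
    return log2((rna_count / rna_total) / (dna_count / dna_total))


def allelic_skew(ref: tuple[int, int], alt: tuple[int, int],
                 rna_total: int, dna_total: int) -> float:
    """Activity difference (alt minus ref); ref and alt are (rna_count, dna_count)."""
    return (element_activity(*alt, rna_total, dna_total)
            - element_activity(*ref, rna_total, dna_total))


if __name__ == "__main__":
    # hypothetical counts: the alt allele yields twice the RNA per DNA molecule
    print(allelic_skew(ref=(200, 100), alt=(400, 100), rna_total=10**6, dna_total=10**6))
```

A positive skew indicates higher reporter expression from the alternative allele; a single-replicate value like this would still need a statistical test across replicates before calling a variant functional.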

9.

Secondary re-analysis of the FEAST trial - Authors' reply.

Levin, Michael; Cunnington, Aubrey J; Hoggart, Clive J.

Lancet Respir Med ; 7(10): e31, 2019 10.

Article in English | MEDLINE | ID: mdl-31556400

Subject(s)

Albumins

10.

Effects of saline or albumin fluid bolus in resuscitation: evidence from re-analysis of the FEAST trial.

Levin, Michael; Cunnington, Aubrey J; Wilson, Clare; Nadel, Simon; Lang, Hans Joerg; Ninis, Nelly; McCulloch, Mignon; Argent, Andrew; Buys, Heloise; Moxon, Christopher A; Best, Abigail; Nijman, Ruud G; Hoggart, Clive J.

Lancet Respir Med ; 7(7): 581-593, 2019 07.

Article in English | MEDLINE | ID: mdl-31196803

ABSTRACT

BACKGROUND: Fluid resuscitation is the recommended management of shock, but it increased mortality in febrile African children in the FEAST trial. We hypothesised that fluid bolus-induced deaths in FEAST would be associated with detectable changes in cardiovascular, neurological, or respiratory function, oxygen-carrying capacity, and blood biochemistry. METHODS: We developed composite scores for respiratory, cardiovascular, and neurological function using vital sign data from the FEAST trial, and used them to compare participants from FEAST with those from four other cohorts and to identify differences between the bolus (n=2097) and no bolus (n=1044) groups of FEAST. We calculated the odds of adverse outcome for each ten-unit increase in baseline score using logistic regression for each cohort. Within FEAST participants, we also compared haemoglobin and plasma biochemistry between bolus and non-bolus patients, assessed the effects of these factors along with the vital sign scores on the contribution of bolus to mortality using Cox proportional hazard models, and used Bayesian clustering to identify subgroups that differed in response to bolus. The FEAST trial is registered with ISRCTN, number ISRCTN69856593. FINDINGS: Increasing respiratory (odds ratio 1·09, 95% CI 1·07-1·11), neurological (1·26, 1·21-1·31), and cardiovascular scores (1·09, 1·05-1·14) were associated with death in FEAST (all p<0·0001), and with adverse outcomes for specific scores in the four other cohorts. In FEAST, fluid bolus increased respiratory and neurological scores and decreased cardiovascular score at 1 h after commencement of the infusion. Fluid bolus recipients had a mean 0·33 g/dL (95% CI 0·20-0·46) reduction in haemoglobin concentration after 8 h (p<0·0001), and at 24 h had a decrease of 1·41 mEq/L (95% CI 0·76-2·06; p=0·0002) in mean base excess, an increase of 1·65 mmol/L (0·47-2·8; p=0·0070) in mean chloride, and a decrease of 0·96 mmol/L (0·45-1·47; p=0·0003) in bicarbonate.
There were similar effects of fluid bolus in three patient subgroups, identified on the basis of their baseline characteristics. Hyperchloraemic acidosis and respiratory and neurological dysfunction induced by saline or albumin bolus explained the excess mortality due to bolus in Cox survival models. INTERPRETATION: In the resuscitation of febrile children, albumin and saline boluses can cause respiratory and neurological dysfunction, hyperchloraemic acidosis, and reduction in haemoglobin concentration. The findings support the notion that fluid resuscitation with unbuffered electrolyte solutions may cause harm and their use should be cautioned. The effects of lower volumes of buffered solutions should be evaluated further. FUNDING: Medical Research Council, Department for International Development, National Institute for Health Research, Imperial College Biomedical Research Centre.

Subject(s)

Albumins/therapeutic use , Fluid Therapy/adverse effects , Resuscitation/adverse effects , Saline Solution/therapeutic use , Shock/mortality , Shock/therapy , Adolescent , Child , Child, Preschool , Cohort Studies , Female , Fluid Therapy/methods , Humans , Infant , Male , Resuscitation/methods , Risk Assessment , Shock/etiology , Survival Rate
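The findings are expressed as odds ratios per ten-unit increase in a baseline score from logistic regression. Assuming the score s enters the model linearly in the logit (which a single per-ten-unit odds ratio implies), the respiratory-score estimate of 1·09 converts to a per-unit coefficient, and larger differences compound multiplicatively. The 20-unit figure below is an extrapolation that relies on that linearity; it is not reported in the abstract.

```latex
\operatorname{logit} \Pr(\text{death} \mid s) = \beta_0 + \beta s
\;\Longrightarrow\;
\mathrm{OR}_{\Delta s = 10} = e^{10\beta} = 1.09,
\qquad
\beta = \frac{\ln 1.09}{10} \approx 0.0086,
\qquad
\mathrm{OR}_{\Delta s = 20} = e^{20\beta} = 1.09^{2} \approx 1.19 .
```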

11.

Diagnosis of Kawasaki Disease Using a Minimal Whole-Blood Gene Expression Signature.

Wright, Victoria J; Herberg, Jethro A; Kaforou, Myrsini; Shimizu, Chisato; Eleftherohorinou, Hariklia; Shailes, Hannah; Barendregt, Anouk M; Menikou, Stephanie; Gormley, Stuart; Berk, Maurice; Hoang, Long Truong; Tremoulet, Adriana H; Kanegaye, John T; Coin, Lachlan J M; Glodé, Mary P; Hibberd, Martin; Kuijpers, Taco W; Hoggart, Clive J; Burns, Jane C; Levin, Michael.

JAMA Pediatr ; 172(10): e182293, 2018 10 01.

Article in English | MEDLINE | ID: mdl-30083721

ABSTRACT

Importance: To date, there is no diagnostic test for Kawasaki disease (KD). Diagnosis is based on clinical features shared with other febrile conditions, frequently resulting in delayed or missed treatment and an increased risk of coronary artery aneurysms. Objective: To identify a whole-blood gene expression signature that distinguishes children with KD in the first week of illness from other febrile conditions. Design, Setting, and Participants: The case-control study comprised a discovery group that included a training and test set and a validation group of children with KD or comparator febrile illness. The setting was pediatric centers in the United Kingdom, Spain, the Netherlands, and the United States. The training and test discovery group comprised 404 children with infectious and inflammatory conditions (78 KD, 84 other inflammatory diseases, and 242 bacterial or viral infections) and 55 healthy controls. The independent validation group comprised 102 patients with KD, including 72 in the first 7 days of illness, and 130 febrile controls. The study dates were March 1, 2009, to November 14, 2013, and data analysis took place from January 1, 2015, to December 31, 2017. Main Outcomes and Measures: Whole-blood gene expression was evaluated using microarrays, and minimal transcript sets distinguishing KD were identified using a novel variable selection method (parallel regularized regression model search). The ability of transcript signatures (implemented as disease risk scores) to discriminate KD cases from controls was assessed by area under the curve (AUC), sensitivity, and specificity at the optimal cut point according to the Youden index. Results: Among 404 patients in the discovery set, there were 78 with KD (median age, 27 months; 55.1% male) and 326 febrile controls (median age, 37 months; 56.4% male). 
Among 202 patients in the validation set, there were 72 with KD (median age, 34 months; 62.5% male) and 130 febrile controls (median age, 17 months; 56.9% male). A 13-transcript signature identified in the discovery training set distinguished KD from other infectious and inflammatory conditions in the discovery test set, with AUC of 96.2% (95% CI, 92.5%-99.9%), sensitivity of 81.7% (95% CI, 60.0%-94.8%), and specificity of 92.1% (95% CI, 84.0%-97.0%). In the validation set, the signature distinguished KD from febrile controls, with AUC of 94.6% (95% CI, 91.3%-98.0%), sensitivity of 85.9% (95% CI, 76.8%-92.6%), and specificity of 89.1% (95% CI, 83.0%-93.7%). The signature was applied to clinically defined categories of definite, highly probable, and possible KD, resulting in AUCs of 98.1% (95% CI, 94.5%-100%), 96.3% (95% CI, 93.3%-99.4%), and 70.0% (95% CI, 53.4%-86.6%), respectively, mirroring certainty of clinical diagnosis. Conclusions and Relevance: In this study, a 13-transcript blood gene expression signature distinguished KD from other febrile conditions. Diagnostic accuracy increased with certainty of clinical diagnosis. A test incorporating the 13-transcript disease risk score may enable earlier diagnosis and treatment of KD and reduce inappropriate treatment in those with other diagnoses.

Subject(s)

Gene Expression Profiling/methods , Mucocutaneous Lymph Node Syndrome/diagnosis , RNA/blood , Child, Preschool , Diagnosis, Differential , Female , Genetic Markers , Humans , Infant , Male , Mucocutaneous Lymph Node Syndrome/blood , Mucocutaneous Lymph Node Syndrome/genetics , RNA/genetics , Reproducibility of Results , Retrospective Studies , Severity of Illness Index , Transcription, Genetic
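The abstract implements the 13-transcript signature as a disease risk score and picks the cut point by the Youden index (J = sensitivity + specificity − 1). A minimal sketch follows, assuming the commonly used DRS form of summed expression of up-regulated transcripts minus summed expression of down-regulated ones (an assumption; the abstract does not give the formula). Transcript names and scores are hypothetical, and candidate cut-offs are restricted to observed scores.

```python
def disease_risk_score(expr: dict[str, float], up: list[str], down: list[str]) -> float:
    """Assumed DRS: sum of up-regulated minus sum of down-regulated (log) expression."""
    return sum(expr[t] for t in up) - sum(expr[t] for t in down)


def youden_cutoff(case_scores: list[float], control_scores: list[float]) -> tuple[float, float]:
    """Return (cutoff, J) maximising J = sensitivity + specificity - 1.

    A score >= cutoff is called positive; candidate cut-offs are the observed scores.
    """
    best_cut, best_j = None, float("-inf")
    for c in sorted(set(case_scores) | set(control_scores)):
        sens = sum(x >= c for x in case_scores) / len(case_scores)
        spec = sum(z < c for z in control_scores) / len(control_scores)
        j = sens + spec - 1
        if j > best_j:
            best_cut, best_j = c, j
    return best_cut, best_j


if __name__ == "__main__":
    sample = {"TRANSCRIPT_UP_1": 5.0, "TRANSCRIPT_UP_2": 3.0, "TRANSCRIPT_DOWN_1": 1.0}
    print(disease_risk_score(sample, ["TRANSCRIPT_UP_1", "TRANSCRIPT_UP_2"], ["TRANSCRIPT_DOWN_1"]))
    print(youden_cutoff([1.2, 2.0, 3.0, 4.0], [0.0, 1.0, 2.5]))  # hypothetical DRS values
```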

12.

npInv: accurate detection and genotyping of inversions using long read sub-alignment.

Shao, Haojing; Ganesamoorthy, Devika; Duarte, Tania; Cao, Minh Duc; Hoggart, Clive J; Coin, Lachlan J M.

BMC Bioinformatics ; 19(1): 261, 2018 07 13.

Article in English | MEDLINE | ID: mdl-30001702

ABSTRACT

BACKGROUND: Detection of genomic inversions remains challenging. Many existing methods primarily target inversions with a non-repetitive breakpoint, leaving inverted repeat (IR)-mediated non-allelic homologous recombination (NAHR) inversions largely unexplored. RESULT: We present npInv, a novel tool specifically for detecting and genotyping NAHR inversions using sub-alignment of long-read sequencing data. We benchmark npInv against other tools on both simulated and real data. We use npInv to generate a whole-genome inversion map for NA12878 consisting of 30 NAHR inversions (of which 15 are novel), including all previously known NAHR-mediated inversions in NA12878 with flanking IR shorter than 7 kb. Our genotyping accuracy on this dataset was 94%. We used PCR to confirm the presence of two of these novel inversions. We show that there is a near-linear relationship between the length of flanking IR and the minimum inversion size, without inverted repeats. CONCLUSION: The application of npInv shows high accuracy in both simulated and real data. The results give deeper insight into inversions.

Subject(s)

Chromosome Inversion/genetics , Genotype , Humans

13.

Diagnostic Test Accuracy of a 2-Transcript Host RNA Signature for Discriminating Bacterial vs Viral Infection in Febrile Children.

Herberg, Jethro A; Kaforou, Myrsini; Wright, Victoria J; Shailes, Hannah; Eleftherohorinou, Hariklia; Hoggart, Clive J; Cebey-López, Miriam; Carter, Michael J; Janes, Victoria A; Gormley, Stuart; Shimizu, Chisato; Tremoulet, Adriana H; Barendregt, Anouk M; Salas, Antonio; Kanegaye, John; Pollard, Andrew J; Faust, Saul N; Patel, Sanjay; Kuijpers, Taco; Martinón-Torres, Federico; Burns, Jane C; Coin, Lachlan J M; Levin, Michael.

JAMA ; 316(8): 835-45, 2016.

Article in English | MEDLINE | ID: mdl-27552617

ABSTRACT

IMPORTANCE: Because clinical features do not reliably distinguish bacterial from viral infection, many children worldwide receive unnecessary antibiotic treatment, while bacterial infection is missed in others. OBJECTIVE: To identify a blood RNA expression signature that distinguishes bacterial from viral infection in febrile children. DESIGN, SETTING, AND PARTICIPANTS: Febrile children presenting to participating hospitals in the United Kingdom, Spain, the Netherlands, and the United States between 2009 and 2013 were prospectively recruited, comprising a discovery group and a validation group. Each group was classified after microbiological investigation as having definite bacterial infection, definite viral infection, or indeterminate infection. RNA expression signatures distinguishing definite bacterial from viral infection were identified in the discovery group, and diagnostic performance was assessed in the validation group. Additional validation was undertaken in separate studies of children with meningococcal disease (n = 24) and inflammatory diseases (n = 48) and on published gene expression datasets. EXPOSURES: A 2-transcript RNA expression signature distinguishing bacterial infection from viral infection was evaluated against clinical and microbiological diagnosis. MAIN OUTCOMES AND MEASURES: Definite bacterial and viral infection was confirmed by culture or molecular detection of the pathogens. Performance of the RNA signature was evaluated in the definite bacterial and viral group and in the indeterminate infection group. RESULTS: The discovery group of 240 children (median age, 19 months; 62% male) included 52 with definite bacterial infection, of whom 36 (69%) required intensive care, and 92 with definite viral infection, of whom 32 (35%) required intensive care. Ninety-six children had indeterminate infection. Analysis of RNA expression data identified a 38-transcript signature distinguishing bacterial from viral infection.
A smaller (2-transcript) signature (FAM89A and IFI44L) was identified by removing highly correlated transcripts. When this 2-transcript signature was implemented as a disease risk score in the validation group (130 children, with 23 definite bacterial, 28 definite viral, and 79 indeterminate infections; median age, 17 months; 57% male), all 23 patients with microbiologically confirmed definite bacterial infection were classified as bacterial (sensitivity, 100% [95% CI, 100%-100%]) and 27 of 28 patients with definite viral infection were classified as viral (specificity, 96.4% [95% CI, 89.3%-100%]). When applied to additional validation datasets from patients with meningococcal and inflammatory diseases, bacterial infection was identified with a sensitivity of 91.7% (95% CI, 79.2%-100%) and 90.0% (95% CI, 70.0%-100%), respectively, and with specificity of 96.0% (95% CI, 88.0%-100%) and 95.8% (95% CI, 89.6%-100%). Of the children in the indeterminate groups, 46.3% (63/136) were classified as having bacterial infection, although 94.9% (129/136) received antibiotic treatment. CONCLUSIONS AND RELEVANCE: This study provides preliminary data regarding test accuracy of a 2-transcript host RNA signature discriminating bacterial from viral infection in febrile children. Further studies are needed in diverse groups of patients to assess accuracy and clinical utility of this test in different clinical settings.

Subject(s)

Antigens/blood , Bacterial Infections/diagnosis , Cytoskeletal Proteins/blood , Fever/microbiology , Fever/virology , RNA/blood , Virus Diseases/diagnosis , Anti-Bacterial Agents/administration & dosage , Antigens/genetics , Area Under Curve , Bacterial Infections/complications , Bacterial Infections/genetics , Biomarkers/blood , Child, Preschool , Coinfection/diagnosis , Coinfection/microbiology , Coinfection/virology , Cytoskeletal Proteins/genetics , Diagnosis, Differential , Female , Fever/blood , Gene Expression Profiling , Genetic Markers , Humans , Infant , Logistic Models , Male , Prospective Studies , RNA/analysis , RNA/genetics , Risk , Sensitivity and Specificity , Severity of Illness Index , Virus Diseases/complications , Virus Diseases/genetics

14.

Predicting IVIG resistance in UK Kawasaki disease.

Davies, Sarah; Sutton, Natalina; Blackstock, Sarah; Gormley, Stuart; Hoggart, Clive J; Levin, Michael; Herberg, Jethro A.

Arch Dis Child ; 100(4): 366-8, 2015 Apr.

Article in English | MEDLINE | ID: mdl-25670405

ABSTRACT

The Kobayashi score (KS) predicts intravenous immunoglobulin (IVIG) resistance in Japanese children with Kawasaki disease (KD) and has been used to select patients for early corticosteroid treatment. We tested the ability of the KS to predict IVIG resistance and coronary artery abnormalities (CAA) in 78 children treated for KD in our UK centre. 19/59 children were IVIG non-responsive. This was not predicted by a high KS (11/19 IVIG non-responders, compared with 26/40 responders, had a score ≥4; p=0.77). CAA were not predicted by KS (12/20 children with CAA vs 25/39 with normal echo had a score ≥4; p=0.78). Low albumin and haemoglobin, and high C-reactive protein were significantly associated with CAA. The KS does not predict IVIG resistance or CAA in our population. This highlights the need for biomarkers to identify children at increased risk of CAA, and to select patients for anti-inflammatory treatment in addition to IVIG.
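The comparisons above are 2x2 tables (outcome group by KS ≥4 vs. <4). The abstract does not name the test used, so the sketch below assumes a two-sided Fisher exact test, written in plain Python with `math.comb`; its p-values may therefore not match the reported 0.77 and 0.78 exactly.

```python
from math import comb


def _table_prob(a: int, row1: int, col1: int, n: int) -> float:
    # Hypergeometric probability that cell (1,1) equals a, given fixed margins.
    return comb(col1, a) * comb(n - col1, row1 - a) / comb(n, row1)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    p_obs = _table_prob(a, row1, col1, n)
    p = 0.0
    for x in range(lo, hi + 1):
        px = _table_prob(x, row1, col1, n)
        # Sum every table no more likely than the observed one (small tolerance).
        if px <= p_obs * (1 + 1e-7):
            p += px
    return min(p, 1.0)


# Rows: IVIG non-responders, responders; columns: KS >= 4, KS < 4.
p_ivig = fisher_exact_two_sided(11, 19 - 11, 26, 40 - 26)
# Rows: children with CAA, normal echo; columns: KS >= 4, KS < 4.
p_caa = fisher_exact_two_sided(12, 20 - 12, 25, 39 - 25)
print(f"IVIG resistance: p = {p_ivig:.2f}; CAA: p = {p_caa:.2f}")
```

In both tables the observed count of high scores sits close to what the margins alone would predict, which is why neither comparison approaches significance.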

Subject(s)

Coronary Aneurysm/prevention & control , Immunoglobulins, Intravenous/therapeutic use , Mucocutaneous Lymph Node Syndrome/drug therapy , C-Reactive Protein/metabolism , Child , Child, Preschool , Coronary Aneurysm/etiology , Drug Resistance , Female , Hemoglobins/metabolism , Humans , Infant , Male , Mucocutaneous Lymph Node Syndrome/complications , Prognosis , Retrospective Studies , Serum Albumin/metabolism

15.

Novel approach identifies SNPs in SLC2A10 and KCNK9 with evidence for parent-of-origin effect on body mass index.

Hoggart, Clive J; Venturini, Giulia; Mangino, Massimo; Gomez, Felicia; Ascari, Giulia; Zhao, Jing Hua; Teumer, Alexander; Winkler, Thomas W; Tsernikova, Natalia; Luan, Jian'an; Mihailov, Evelin; Ehret, Georg B; Zhang, Weihua; Lamparter, David; Esko, Tõnu; Macé, Aurelien; Rüeger, Sina; Bochud, Pierre-Yves; Barcella, Matteo; Dauvilliers, Yves; Benyamin, Beben; Evans, David M; Hayward, Caroline; Lopez, Mary F; Franke, Lude; Russo, Alessia; Heid, Iris M; Salvi, Erika; Vendantam, Sailaja; Arking, Dan E; Boerwinkle, Eric; Chambers, John C; Fiorito, Giovanni; Grallert, Harald; Guarrera, Simonetta; Homuth, Georg; Huffman, Jennifer E; Porteous, David; Moradpour, Darius; Iranzo, Alex; Hebebrand, Johannes; Kemp, John P; Lammers, Gert J; Aubert, Vincent; Heim, Markus H; Martin, Nicholas G; Montgomery, Grant W; Peraita-Adrados, Rosa; Santamaria, Joan; Negro, Francesco.

PLoS Genet ; 10(7): e1004508, 2014 Jul.

Article in English | MEDLINE | ID: mdl-25078964

ABSTRACT

The phenotypic effect of some single nucleotide polymorphisms (SNPs) depends on their parental origin. We present a novel approach to detect parent-of-origin effects (POEs) in genome-wide genotype data of unrelated individuals. The method exploits increased phenotypic variance in the heterozygous genotype group relative to the homozygous groups. We applied the method to >56,000 unrelated individuals to search for POEs influencing body mass index (BMI). Six lead SNPs were carried forward for replication in five family-based studies (of ~4,000 trios). Two SNPs replicated: the paternal rs2471083-C allele (located near the imprinted KCNK9 gene) and the paternal rs3091869-T allele (located near the SLC2A10 gene) increased BMI equally (beta = 0.11 SD, P < 0.0027) compared with the respective maternal alleles. Real-time PCR experiments on lymphoblastoid cell lines from the CEPH families showed that expression of both genes depended on the parental origin of the SNP alleles (P < 0.01). Our scheme opens new opportunities to exploit GWAS data of unrelated individuals to identify POEs and demonstrates that they play an important role in adult obesity.
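Why a parent-of-origin effect inflates heterozygote variance: if only the paternally inherited allele acts, heterozygotes are a 50/50 mix of carriers whose effect allele came from the father (shifted by beta) and carriers whose effect allele came from the mother (unshifted). Their variance therefore rises by about beta²/4, while each homozygous group stays homogeneous. The simulation below assumes this hypothetical imprinting model and compares heterozygote variance with pooled within-homozygote variance. It illustrates the signal only and is not the published test statistic.

```python
import random
import statistics


def simulate_groups(n: int, maf: float, beta: float, seed: int = 1):
    """Simulate a trait under a hypothetical paternal-only (imprinting) model."""
    rng = random.Random(seed)
    groups = {0: [], 1: [], 2: []}  # keyed by effect-allele count
    for _ in range(n):
        paternal = rng.random() < maf
        maternal = rng.random() < maf
        # Only the paternally inherited effect allele shifts the trait.
        y = rng.gauss(0.0, 1.0) + (beta if paternal else 0.0)
        groups[int(paternal) + int(maternal)].append(y)
    return groups


def het_variance_ratio(groups) -> float:
    """Heterozygote variance divided by pooled within-homozygote variance."""
    ss, df = 0.0, 0
    for g in (groups[0], groups[2]):
        m = statistics.fmean(g)
        ss += sum((y - m) ** 2 for y in g)
        df += len(g) - 1
    return statistics.variance(groups[1]) / (ss / df)


# Theory predicts a ratio of about 1 + beta**2 / 4 under the POE model, and 1 without it.
ratio_poe = het_variance_ratio(simulate_groups(20_000, 0.3, beta=1.0))
ratio_null = het_variance_ratio(simulate_groups(20_000, 0.3, beta=0.0))
print(f"with POE: {ratio_poe:.3f}, without POE: {ratio_null:.3f}")
```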

Subject(s)

Glucose Transport Proteins, Facilitative/genetics , Obesity/genetics , Polymorphism, Single Nucleotide/genetics , Potassium Channels, Tandem Pore Domain/genetics , Adult , Body Mass Index , Female , Gene Expression Regulation , Genetic Predisposition to Disease , Genome-Wide Association Study , Genomic Imprinting , Genotype , Humans , Male , Obesity/pathology , White People/genetics

16.

Genome-wide association study of primary tooth eruption identifies pleiotropic loci associated with height and craniofacial distances.

Fatemifar, Ghazaleh; Hoggart, Clive J; Paternoster, Lavinia; Kemp, John P; Prokopenko, Inga; Horikoshi, Momoko; Wright, Victoria J; Tobias, Jon H; Richmond, Stephen; Zhurov, Alexei I; Toma, Arshed M; Pouta, Anneli; Taanila, Anja; Sipila, Kirsi; Lähdesmäki, Raija; Pillas, Demetris; Geller, Frank; Feenstra, Bjarke; Melbye, Mads; Nohr, Ellen A; Ring, Susan M; St Pourcain, Beate; Timpson, Nicholas J; Davey Smith, George; Jarvelin, Marjo-Riitta; Evans, David M.

Hum Mol Genet ; 22(18): 3807-17, 2013 Sep 15.

Article in English | MEDLINE | ID: mdl-23704328

ABSTRACT

Twin and family studies indicate that the timing of primary tooth eruption is highly heritable, with estimates typically exceeding 80%. To identify variants involved in primary tooth eruption, we performed a population-based genome-wide association study of 'age at first tooth' and 'number of teeth' using 5998 and 6609 individuals, respectively, from the Avon Longitudinal Study of Parents and Children (ALSPAC) and 5403 individuals from the 1966 Northern Finland Birth Cohort (NFBC1966). We tested 2 446 724 SNPs imputed in both studies. Analyses were controlled for the effect of gestational age, sex and age of measurement. Results from the two studies were combined using fixed effects inverse variance meta-analysis. We identified a total of 15 independent loci, with 10 loci reaching genome-wide significance (P < 5 × 10(-8)) for 'age at first tooth' and 11 loci for 'number of teeth'. Together, these associations explain 6.06% of the variation in 'age at first tooth' and 4.76% of the variation in 'number of teeth'. The identified loci included eight previously unidentified loci, some containing genes known to play a role in tooth and other developmental pathways, including an SNP in the protein-coding region of BMP4 (rs17563, P = 9.080 × 10(-17)). Three of these loci, containing the genes HMGA2, AJUBA and ADK, also showed evidence of association with craniofacial distances, particularly those indexing facial width. Our results suggest that the genome-wide association approach is a powerful strategy for detecting variants involved in tooth eruption, and potentially craniofacial growth and more generally organ development.
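Fixed-effects inverse-variance meta-analysis, as named in this abstract, weights each study's estimate by 1/se², so the pooled estimate is Σwᵢβᵢ / Σwᵢ with standard error 1/√Σwᵢ. A minimal sketch of the textbook formula follows; the per-study numbers are hypothetical, not results from the paper.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # the genome-wide significance threshold quoted above


def ivw_fixed_effects(betas, ses):
    """Fixed-effects inverse-variance meta-analysis of per-study estimates."""
    weights = [1.0 / se ** 2 for se in ses]  # w_i = 1 / se_i^2
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return beta, se, p


# Hypothetical estimates for one SNP from two cohorts.
beta, se, p = ivw_fixed_effects([0.20, 0.40], [0.10, 0.10])
print(f"beta = {beta:.3f}, se = {se:.4f}, p = {p:.2e}, "
      f"genome-wide significant: {p < GENOME_WIDE_ALPHA}")
```

With equal standard errors the pooled estimate is just the mean, but the pooled standard error shrinks by a factor of √2, which is where the gain in power from combining cohorts comes from.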

Subject(s)

Body Height/genetics , Face/anatomy & histology , Genetic Loci , Tooth Eruption/genetics , Chromosomes, Human , Dentition , Female , Finland , Genetic Pleiotropy , Genome-Wide Association Study , Humans , Longitudinal Studies , Polymorphism, Single Nucleotide

17.

The effect of genomic inversions on estimation of population genetic parameters from SNP data.

Seich Al Basatena, Nafisa-Katrin; Hoggart, Clive J; Coin, Lachlan J; O'Reilly, Paul F.

Genetics ; 193(1): 243-53, 2013 Jan.

Article in English | MEDLINE | ID: mdl-23150602

ABSTRACT

In recent years it has emerged that structural variants have a substantial impact on genomic variation. Inversion polymorphisms represent a significant class of structural variant, and despite the challenges in their detection, data on inversions in the human genome are increasing rapidly. Statistical methods for inferring parameters such as the recombination rate and the selection coefficient have generally been developed without accounting for the presence of inversions. Here we exploit new software for simulating inversions in population genetic data, invertFREGENE, to assess the potential impact of inversions on such methods. Using data simulated by invertFREGENE, as well as real data from several sources, we test whether large inversions have a disruptive effect on widely applied population genetics methods for inferring recombination rates, for detecting selection, and for controlling for population structure in genome-wide association studies (GWAS). We find that recombination rates estimated by LDhat are biased downward at inversion loci relative to the true contemporary recombination rates at the loci but that recombination hotspots are not falsely inferred at inversion breakpoints as may have been expected. We find that the integrated haplotype score (iHS) method for detecting selection appears robust to the presence of inversions. Finally, we observe a strong bias in the genome-wide results of principal components analysis (PCA), used to control for population structure in GWAS, in the presence of even a single large inversion, confirming the necessity to thin SNPs by linkage disequilibrium at large physical distances to obtain unbiased results.
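The recommendation to thin SNPs by linkage disequilibrium "at large physical distances" can be illustrated with a greedy pruning pass. The sketch below keeps a SNP only if its r² with every already-kept SNP inside a window is at or below a threshold. The window size, threshold and toy genotypes are hypothetical; the idea is similar in spirit to PLINK's `--indep-pairwise` pruning but does not reproduce it.

```python
def r_squared(x, y) -> float:
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: treat as uncorrelated
    return sxy * sxy / (sxx * syy)


def ld_thin(snps, window_bp: int, r2_max: float):
    """Greedy LD thinning on one chromosome.

    snps: list of (position, dosages) sorted by position.
    Returns the positions of the SNPs that are kept.
    """
    kept = []
    for pos, dos in snps:
        if all(r_squared(dos, kd) <= r2_max
               for kp, kd in kept if pos - kp <= window_bp):
            kept.append((pos, dos))
    return [pos for pos, _ in kept]


# Toy data: the SNP at 5 Mb is in perfect LD with the one at 100 bp,
# as might happen across the span of a large inversion.
snps = [(100, [0, 1, 2, 1, 0]),
        (5_000_000, [0, 1, 2, 1, 0]),
        (9_000_000, [1, 0, 1, 2, 1])]
print(ld_thin(snps, window_bp=10_000_000, r2_max=0.2))  # wide window
print(ld_thin(snps, window_bp=1_000_000, r2_max=0.2))   # narrow window
```

Only the wide window catches the long-range correlation; a short window leaves both correlated SNPs in, which is the kind of residual signal that can distort principal components.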

Subject(s)

Genetics, Population , Models, Genetic , Polymorphism, Single Nucleotide , Humans , Principal Component Analysis , Recombination, Genetic , Selection, Genetic

18.

A multi-SNP locus-association method reveals a substantial fraction of the missing heritability.

Ehret, Georg B; Lamparter, David; Hoggart, Clive J; Whittaker, John C; Beckmann, Jacques S; Kutalik, Zoltán.

Am J Hum Genet ; 91(5): 863-71, 2012 Nov 02.

Article in English | MEDLINE | ID: mdl-23122585

ABSTRACT

There are many known examples of multiple semi-independent associations at individual loci; such associations might arise either because of true allelic heterogeneity or because of imperfect tagging of an unobserved causal variant. This phenomenon is of great importance in monogenic traits but has not yet been systematically investigated and quantified in complex-trait genome-wide association studies (GWASs). Here, we describe a multi-SNP association method that estimates the effect of loci harboring multiple association signals by using GWAS summary statistics. Applying the method to a large anthropometric GWAS meta-analysis (from the Genetic Investigation of Anthropometric Traits consortium study), we show that for height, body mass index (BMI), and waist-to-hip ratio (WHR), 3%, 2%, and 1%, respectively, of additional phenotypic variance can be explained on top of the previously reported 10% (height), 1.5% (BMI), and 1% (WHR). The method also permitted a substantial increase (by up to 50%) in the number of loci that replicate in a discovery-validation design. Specifically, we identified 74 loci at which the multi-SNP, a linear combination of SNPs, explains significantly more variance than does the best individual SNP. A detailed analysis of multi-SNPs shows that most of the additional variability explained is derived from SNPs that are not in linkage disequilibrium with the lead SNP, suggesting a major contribution of allelic heterogeneity to the missing heritability.
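The "multi-SNP" idea can be illustrated with a standard linear-algebra identity: if r is the vector of marginal SNP-phenotype correlations and R is the SNP-SNP LD correlation matrix, the variance explained by the best linear combination of those SNPs is rᵀR⁻¹r. The sketch below uses that identity, assuming standardized genotypes and phenotype; the paper's own estimator may differ in detail, and the z-scores, sample size and LD value in the example are hypothetical.

```python
import math


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("LD matrix is singular")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def marginal_r_from_z(z, n):
    # Small-effect approximation: correlation ~ z / sqrt(sample size).
    return [zi / math.sqrt(n) for zi in z]


def joint_r2(marginal_r, ld):
    """Variance explained by the best linear combination: r^T R^{-1} r."""
    w = solve(ld, marginal_r)
    return sum(ri * wi for ri, wi in zip(marginal_r, w))


# Hypothetical locus with two signals in weak LD.
r = marginal_r_from_z([8.0, 6.0], n=10_000)
best_single = max(ri ** 2 for ri in r)
print(f"best single SNP: {best_single:.4f}, "
      f"multi-SNP: {joint_r2(r, [[1.0, 0.2], [0.2, 1.0]]):.4f}")
```

When the two SNPs are nearly independent, their contributions roughly add, which matches the abstract's observation that most of the extra variance comes from SNPs not in LD with the lead SNP.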

Subject(s)

Genome-Wide Association Study , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Body Mass Index , Humans , Lipids/blood , Lipids/genetics , Phenotype , Waist-Hip Ratio

19.

MultiPhen: joint model of multiple phenotypes can increase discovery in GWAS.

O'Reilly, Paul F; Hoggart, Clive J; Pomyen, Yotsawat; Calboli, Federico C F; Elliott, Paul; Jarvelin, Marjo-Riitta; Coin, Lachlan J M.

PLoS One ; 7(5): e34861, 2012.

Article in English | MEDLINE | ID: mdl-22567092

ABSTRACT

The genome-wide association study (GWAS) approach has discovered hundreds of genetic variants associated with diseases and quantitative traits. However, despite clinical overlap and statistical correlation between many phenotypes, GWAS are generally performed one-phenotype-at-a-time. Here we compare the performance of modelling multiple phenotypes jointly with that of the standard univariate approach. We introduce a new method and software, MultiPhen, that models multiple phenotypes simultaneously in a fast and interpretable way. By performing ordinal regression, MultiPhen tests the linear combination of phenotypes most associated with the genotypes at each SNP, and thus potentially captures effects hidden to single phenotype GWAS. We demonstrate via simulation that this approach provides a dramatic increase in power in many scenarios. There is a boost in power for variants that affect multiple phenotypes and for those that affect only one phenotype. While other multivariate methods have similar power gains, we describe several benefits of MultiPhen over these. In particular, we demonstrate that other multivariate methods that assume the genotypes are normally distributed, such as canonical correlation analysis (CCA) and MANOVA, can have highly inflated type-1 error rates when testing case-control or non-normal continuous phenotypes, while MultiPhen produces no such inflation. To test the performance of MultiPhen on real data we applied it to lipid traits in the Northern Finland Birth Cohort 1966 (NFBC1966). In these data MultiPhen discovers 21% more independent SNPs with known associations than the standard univariate GWAS approach, while applying MultiPhen in addition to the standard approach provides 37% increased discovery. 
The most associated linear combinations of the lipids estimated by MultiPhen at the leading SNPs accurately reflect the Friedewald Formula, suggesting that MultiPhen could be used to refine the definition of existing phenotypes or uncover novel heritable phenotypes.
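The Friedewald formula mentioned here is itself a linear combination of lipid traits: LDL ≈ TC − HDL − TG/5 in mg/dL (TG/2.2 in mmol/L), generally considered invalid when triglycerides exceed about 400 mg/dL (4.5 mmol/L). The sketch below encodes it and adds a cosine-similarity helper for comparing an estimated coefficient vector with the Friedewald weights. The estimated vector in the example is hypothetical, not a MultiPhen output, and the comparison is only meaningful if both vectors are on the same phenotype scale.

```python
import math

# Friedewald weights on (total cholesterol, HDL, triglycerides), mg/dL scale.
FRIEDEWALD_MG_DL = (1.0, -1.0, -0.2)


def friedewald_ldl(total_chol: float, hdl: float, triglycerides: float,
                   units: str = "mg/dL") -> float:
    """Estimate LDL cholesterol with the Friedewald formula."""
    divisor = {"mg/dL": 5.0, "mmol/L": 2.2}[units]
    tg_limit = {"mg/dL": 400.0, "mmol/L": 4.5}[units]
    if triglycerides > tg_limit:
        raise ValueError("Friedewald estimate is not valid at this triglyceride level")
    return total_chol - hdl - triglycerides / divisor


def cosine_similarity(u, v) -> float:
    """Cosine of the angle between two coefficient vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


print(friedewald_ldl(200, 50, 150))  # mg/dL
# Hypothetical estimated combination of (TC, HDL, TG) coefficients.
estimated = (0.9, -1.1, -0.25)
print(f"similarity to Friedewald: {cosine_similarity(estimated, FRIEDEWALD_MG_DL):.3f}")
```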

Subject(s)

Genome-Wide Association Study/methods , Models, Theoretical , Phenotype

20.

Fine-scale estimation of location of birth from genome-wide single-nucleotide polymorphism data.

Hoggart, Clive J; O'Reilly, Paul F; Kaakinen, Marika; Zhang, Weihua; Chambers, John C; Kooner, Jaspal S; Coin, Lachlan J M; Jarvelin, Marjo-Riitta.

Genetics ; 190(2): 669-77, 2012 Feb.

Article in English | MEDLINE | ID: mdl-22095078

ABSTRACT

Systematic nonrandom mating in populations results in genetic stratification and is predominantly caused by geographic separation, providing the opportunity to infer individuals' birthplace from genetic data. Such inference has been demonstrated for individuals' country of birth, but here we use data from the Northern Finland Birth Cohort 1966 (NFBC1966) to investigate the characteristics of genetic structure within a population and subsequently develop a method for inferring location to a finer scale. Principal component analysis (PCA) shows that while the first PCs are particularly informative for location, there is also location information in the higher-order PCs, but it cannot be captured by a linear model. We introduce a new method, pcLOCATE, which is able to exploit this information to improve the accuracy of location inference. pcLOCATE uses individuals' PC values to estimate the probability of birth in each town and then averages over all towns to give an estimated longitude and latitude of birth using a fully Bayesian model. We apply pcLOCATE to the NFBC1966 data to estimate parental birthplace, testing with successively more PCs and finding the model with the top 23 PCs to be the most accurate, with a median distance of 23 km between the estimated and the true location. pcLOCATE predicts the most recent residence of NFBC1966 individuals to a median distance of 47 km. We also apply pcLOCATE to Indian individuals from the London Life Sciences Prospective Population Study (LOLIPOP) data, and find that birthplace is predicted to a median distance of 54 km from the true location. A method with such accuracy is potentially valuable in population genetics and forensics.
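The final step described here, averaging town coordinates weighted by each town's estimated birth probability, and the accuracy metric, a median distance between estimated and true locations, can be sketched as below. This covers only those two steps: the Bayesian model that turns PC values into town probabilities is not reproduced, the towns, coordinates and probabilities are hypothetical, and the great-circle (haversine) distance is an assumed choice of metric. A plain weighted mean of latitude and longitude is a reasonable approximation over a region the size of Finland but not over large areas.

```python
import math
import statistics


def weighted_location(town_probs, town_coords):
    """Probability-weighted mean (lat, lon) over towns."""
    total = sum(town_probs.values())
    lat = sum(p * town_coords[t][0] for t, p in town_probs.items()) / total
    lon = sum(p * town_coords[t][1] for t, p in town_probs.items()) / total
    return lat, lon


def haversine_km(a, b, radius_km: float = 6371.0) -> float:
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))


# Hypothetical towns and per-individual town probabilities.
coords = {"town_a": (64.0, 25.0), "town_b": (66.0, 27.0), "town_c": (65.0, 28.5)}
individuals = [
    ({"town_a": 0.7, "town_b": 0.2, "town_c": 0.1}, (64.2, 25.3)),
    ({"town_a": 0.1, "town_b": 0.3, "town_c": 0.6}, (65.1, 28.0)),
    ({"town_a": 0.2, "town_b": 0.6, "town_c": 0.2}, (65.8, 26.5)),
]
errors = [haversine_km(weighted_location(p, coords), true_loc)
          for p, true_loc in individuals]
print(f"median error: {statistics.median(errors):.1f} km")
```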

Subject(s)

Genetics, Population , Polymorphism, Single Nucleotide , Principal Component Analysis , Residence Characteristics/statistics & numerical data , Adult , Aged , Algorithms , Cohort Studies , Female , Finland , Genotype , Humans , London , Male , Middle Aged , Models, Statistical
