1.
Cell Chem Biol ; 30(11): 1377-1389.e8, 2023 11 16.
Article in English | MEDLINE | ID: mdl-37586370

ABSTRACT

TruAB Discovery is an approach that integrates cellular immunology, high-throughput immunosequencing, bioinformatics, and computational biology in order to discover naturally occurring human antibodies for prophylactic or therapeutic use. We adapted our previously described pairSEQ technology to pair B cell receptor heavy and light chains of SARS-CoV-2 spike protein-binding antibodies derived from enriched antigen-specific memory B cells and bulk antibody-secreting cells. We identified approximately 60,000 productive, in-frame, paired antibody sequences, from which 2,093 antibodies were selected for functional evaluation based on abundance, isotype and patterns of somatic hypermutation. The exceptionally diverse antibodies included RBD-binders with broad neutralizing activity against SARS-CoV-2 variants, and S2-binders with broad specificity against betacoronaviruses and the ability to block membrane fusion. A subset of these RBD- and S2-binding antibodies demonstrated robust protection against challenge in hamster and mouse models. This high-throughput approach can accelerate discovery of diverse, multifunctional antibodies against any target of interest.


Subject(s)
COVID-19 , SARS-CoV-2 , Animals , Mice , Humans , Antibodies, Neutralizing , Broadly Neutralizing Antibodies , Antibodies, Viral
2.
Cell Rep ; 42(1): 112014, 2023 01 31.
Article in English | MEDLINE | ID: mdl-36681898

ABSTRACT

The SARS-CoV-2 Omicron variant of concern (VoC) and its sublineages contain 31-36 mutations in spike and escape neutralization by most therapeutic antibodies. In a pseudovirus neutralization assay, 66 of the nearly 400 candidate therapeutics in the Coronavirus Immunotherapeutic Consortium (CoVIC) panel neutralize Omicron and multiple Omicron sublineages. Among natural immunoglobulin Gs (IgGs), especially those in the receptor-binding domain (RBD)-2 epitope community, nearly all Omicron neutralizers recognize spike bivalently, with both antigen-binding fragments (Fabs) simultaneously engaging adjacent RBDs on the same spike. Most IgGs that do not neutralize Omicron bind either entirely monovalently or have some (22%-50%) monovalent occupancy. Cleavage of bivalent-binding IgGs to Fabs abolishes neutralization and binding affinity, with disproportionate loss of activity against Omicron pseudovirus and spike. These results suggest that VoC-resistant antibodies overcome mutagenic substitution via avidity. Hence, vaccine strategies targeting future SARS-CoV-2 variants should consider epitope display with spacing and organization identical to trimeric spike.


Subject(s)
COVID-19 , Humans , SARS-CoV-2 , Ethnicity , Epitopes , Antibodies, Viral , Antibodies, Neutralizing , Neutralization Tests
3.
BMC Cancer ; 20(1): 612, 2020 Jun 30.
Article in English | MEDLINE | ID: mdl-32605647

ABSTRACT

BACKGROUND: The clonoSEQ® Assay (Adaptive Biotechnologies Corporation, Seattle, USA) identifies and tracks unique disease-associated immunoglobulin (Ig) sequences by next-generation sequencing of IgH, IgK, and IgL rearrangements and IgH-BCL1/2 translocations in malignant B cells. Here, we describe studies to validate the analytical performance of the assay using patient samples and cell lines. METHODS: Sensitivity and specificity were established by defining the limit of detection (LoD), limit of quantitation (LoQ) and limit of blank (LoB) in genomic DNA (gDNA) from 66 patients with multiple myeloma (MM), acute lymphoblastic leukemia (ALL), or chronic lymphocytic leukemia (CLL), and three cell lines. Healthy donor gDNA was used as a diluent to contrive samples with specific DNA masses and malignant-cell frequencies. Precision was validated using a range of samples contrived from patient gDNA, healthy donor gDNA, and 9 cell lines to generate measurable residual disease (MRD) frequencies spanning clinically relevant thresholds. Linearity was determined using samples contrived from cell line gDNA spiked into healthy gDNA to generate 11 MRD frequencies for each DNA input, then confirmed using clinical samples. Quantitation accuracy was assessed by (1) comparing clonoSEQ and multiparametric flow cytometry (mpFC) measurements of ALL and MM cell lines diluted in healthy mononuclear cells, and (2) analyzing precision study data for bias between clonoSEQ MRD results in diluted gDNA and those expected from mpFC based on original, undiluted samples. Repeatability of nucleotide base calls was assessed via the assay's ability to recover malignant clonotype sequences across several replicates, process features, and MRD levels. RESULTS: LoD and LoQ were estimated at 1.903 cells and 2.390 malignant cells, respectively. LoB was zero in healthy donor gDNA. Precision ranged from 18% CV (coefficient of variation) at higher DNA inputs to 68% CV near the LoD. Variance component analysis showed MRD results were robust, with expected laboratory process variations contributing ≤3% CV. Linearity and accuracy were demonstrated for each disease across orders of magnitude of clonal frequencies. Nucleotide sequence error rates were extremely low. CONCLUSIONS: These studies validate the analytical performance of the clonoSEQ Assay and demonstrate its potential as a highly sensitive diagnostic tool for selected lymphoid malignancies.
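
The precision figures in this abstract are reported as percent CV. As a reminder of what that statistic measures, here is a minimal Python sketch that computes it for a set of replicate measurements. The function name `percent_cv` and the replicate values are illustrative assumptions, not data or code from the study, and the study's own variance component analysis is more involved than this.

```python
from statistics import mean, stdev


def percent_cv(values):
    """Coefficient of variation in percent: sample SD divided by the mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * stdev(values) / m


# Hypothetical replicate measurements (made up for illustration only).
replicates = [8.0, 10.0, 12.0]
print(f"{percent_cv(replicates):.1f}% CV")
```

A CV near the low end of the reported range means replicates cluster tightly around their mean. Larger values, as reported near the LoD, mean the spread is a substantial fraction of the measured quantity.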


Subject(s)
High-Throughput Nucleotide Sequencing/instrumentation , Leukemia, Lymphocytic, Chronic, B-Cell/diagnosis , Multiple Myeloma/diagnosis , Precursor Cell Lymphoblastic Leukemia-Lymphoma/diagnosis , Reagent Kits, Diagnostic , Bone Marrow/pathology , Cyclin D1/genetics , Gene Rearrangement , Humans , Immunoglobulin Heavy Chains/genetics , Immunoglobulin lambda-Chains/genetics , Immunoglobulins/genetics , Leukemia, Lymphocytic, Chronic, B-Cell/blood , Leukemia, Lymphocytic, Chronic, B-Cell/genetics , Leukemia, Lymphocytic, Chronic, B-Cell/therapy , Limit of Detection , Multiple Myeloma/blood , Multiple Myeloma/genetics , Multiple Myeloma/therapy , Neoplasm, Residual , Precursor Cell Lymphoblastic Leukemia-Lymphoma/blood , Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics , Precursor Cell Lymphoblastic Leukemia-Lymphoma/therapy , Proto-Oncogene Proteins c-bcl-2/genetics , Translocation, Genetic
4.
Science ; 356(6334): 200-205, 2017 04 14.
Article in English | MEDLINE | ID: mdl-28408606

ABSTRACT

Immunotherapy has clinical activity in certain virally associated cancers. However, the tumor antigens targeted in successful treatments remain poorly defined. We used a personalized immunogenomic approach to elucidate the global landscape of antitumor T cell responses in complete regression of human papillomavirus-associated metastatic cervical cancer after tumor-infiltrating adoptive T cell therapy. Remarkably, immunodominant T cell reactivities were directed against mutated neoantigens or a cancer germline antigen, rather than canonical viral antigens. T cells targeting viral tumor antigens did not display preferential in vivo expansion. Both viral and nonviral tumor antigen-specific T cells resided predominantly in the programmed cell death 1 (PD-1)-expressing T cell compartment, which suggests that PD-1 blockade may unleash diverse antitumor T cell reactivities. These findings suggest a new paradigm of targeting nonviral antigens in immunotherapy of virally associated cancers.


Subject(s)
Antigens, Neoplasm/immunology , Carcinoma, Squamous Cell/therapy , DNA-Binding Proteins/immunology , Immunotherapy, Adoptive/methods , Oncogene Proteins, Viral/immunology , Papillomavirus Infections/therapy , Repressor Proteins/immunology , Uterine Cervical Neoplasms/therapy , Antigens, Neoplasm/genetics , Carcinoma, Squamous Cell/virology , DNA-Binding Proteins/genetics , Female , Human papillomavirus 16/immunology , Human papillomavirus 18/immunology , Humans , Lymphocytes, Tumor-Infiltrating/immunology , Lymphocytes, Tumor-Infiltrating/transplantation , Oncogene Proteins, Viral/genetics , Papillomavirus Infections/complications , Programmed Cell Death 1 Receptor/antagonists & inhibitors , Programmed Cell Death 1 Receptor/genetics , Programmed Cell Death 1 Receptor/metabolism , Repressor Proteins/genetics , T-Lymphocytes/immunology , T-Lymphocytes/transplantation , Uterine Cervical Neoplasms/virology
5.
Cancer Immunol Res ; 4(9): 734-43, 2016 09 02.
Article in English | MEDLINE | ID: mdl-27354337

ABSTRACT

Adoptive transfer of T cells with engineered T-cell receptor (TCR) genes that target tumor-specific antigens can mediate cancer regression. Accumulating evidence suggests that the clinical success of many immunotherapies is mediated by T cells targeting mutated neoantigens unique to the patient. We hypothesized that the most frequent TCR clonotypes infiltrating the tumor were reactive against tumor antigens. To test this hypothesis, we developed a multistep strategy that involved TCRB deep sequencing of the CD8⁺PD-1⁺ T-cell subset, matching of TCRA-TCRB pairs by pairSEQ and single-cell RT-PCR, followed by testing of the TCRs for tumor-antigen specificity. Analysis of 12 fresh metastatic melanomas revealed that in 11 samples, up to 5 tumor-reactive TCRs were present in the 5 most frequently occurring clonotypes, which included reactivity against neoantigens. These data show the feasibility of developing a rapid, personalized TCR-gene therapy approach that targets the unique set of antigens presented by the autologous tumor without the need to identify their immunologic reactivity. Cancer Immunol Res; 4(9); 734-43. ©2016 AACR.


Subject(s)
Antigens, Neoplasm/metabolism , Lymphocytes, Tumor-Infiltrating/immunology , Lymphocytes, Tumor-Infiltrating/metabolism , Neoplasms/immunology , Neoplasms/metabolism , Receptors, Antigen, T-Cell/metabolism , T-Lymphocyte Subsets/immunology , T-Lymphocyte Subsets/metabolism , Adult , Aged , Biomarkers , CD8-Positive T-Lymphocytes/immunology , CD8-Positive T-Lymphocytes/metabolism , Clonal Evolution , Female , Humans , Immunophenotyping , Lymphocyte Count , Lymphocytes, Tumor-Infiltrating/pathology , Male , Middle Aged , Mutation , Neoplasm Metastasis , Neoplasm Staging , Neoplasms/pathology , T-Lymphocyte Subsets/pathology , Young Adult
6.
Mol Biol Evol ; 33(3): 657-69, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26545921

ABSTRACT

Genetic variation harbors signatures of natural selection driven by selective pressures that are often unknown. Estimating the ages of selection signals may allow reconstructing the history of environmental changes that shaped human phenotypes and diseases. We have developed an approximate Bayesian computation (ABC) approach to estimate allele ages under a model of selection on new mutations and under demographic models appropriate for human populations. We have applied it to two resequencing data sets: An ultra-high depth data set from a relatively small sample of unrelated individuals and a lower depth data set in a larger sample with transmission information. In addition to evaluating the accuracy of our method based on simulations, for each SNP, we assessed the consistency between the posterior probabilities estimated by the ABC approach and the ancient DNA record, finding good agreement between the two types of data and methods. Applying this ABC approach to data for eight single nucleotide polymorphisms (SNPs), we were able to rule out an onset of selection prior to the dispersal out-of-Africa for three of them and more recent than the spread of agriculture for an additional three SNPs.


Subject(s)
Genetics, Population , Models, Genetic , Selection, Genetic , Alleles , Bayes Theorem , Computational Biology/methods , Computer Simulation , Evolution, Molecular , Gene Frequency , Genetic Variation , Humans , Polymorphism, Single Nucleotide , Sequence Analysis, DNA
7.
Nat Commun ; 6: 8111, 2015 Sep 14.
Article in English | MEDLINE | ID: mdl-26368830

ABSTRACT

Imputing genotypes from reference panels created by whole-genome sequencing (WGS) provides a cost-effective strategy for augmenting the single-nucleotide polymorphism (SNP) content of genome-wide arrays. The UK10K Cohorts project has generated a data set of 3,781 whole genomes sequenced at low depth (average 7x), aiming to exhaustively characterize genetic variation down to 0.1% minor allele frequency in the British population. Here we demonstrate the value of this resource for improving imputation accuracy at rare and low-frequency variants in both a UK and an Italian population. We show that large increases in imputation accuracy can be achieved by re-phasing WGS reference panels after initial genotype calling. We also present a method for combining WGS panels to improve variant coverage and downstream imputation accuracy, which we illustrate by integrating 7,562 WGS haplotypes from the UK10K project with 2,184 haplotypes from the 1000 Genomes Project. Finally, we introduce a novel approximation that maintains speed without sacrificing imputation accuracy for rare variants.


Subject(s)
Gene Frequency , Genetic Variation , Genotype , Haplotypes , Models, Statistical , White People/genetics , Adolescent , Adult , Aged , Aged, 80 and over , Alleles , Genome, Human , Humans , Italy , Middle Aged , Models, Genetic , Polymorphism, Single Nucleotide , United Kingdom , Young Adult
8.
Sci Transl Med ; 7(301): 301ra131, 2015 Aug 19.
Article in English | MEDLINE | ID: mdl-26290413

ABSTRACT

The T cell receptor (TCR) protein is a heterodimer composed of an α chain and a β chain. TCR genes undergo somatic DNA rearrangements to generate the diversity of T cell binding specificities needed for effective immunity. Recently, high-throughput immunosequencing methods have been developed to profile the TCR α (TCRA) and TCR β (TCRB) repertoires. However, these methods cannot determine which TCRA and TCRB chains combine to form a specific TCR, which is essential for many functional and therapeutic applications. We describe and validate a method called pairSEQ, which can leverage the diversity of TCR sequences to accurately pair hundreds of thousands of TCRA and TCRB sequences in a single experiment. Our TCR pairing method uses standard laboratory consumables and equipment without the need for single-cell technologies. We show that pairSEQ can be applied to T cells from both blood and solid tissues, such as tumors.


Subject(s)
Receptors, Antigen, T-Cell/metabolism , CD8-Positive T-Lymphocytes/metabolism , Humans , Receptors, Antigen, T-Cell/genetics , Receptors, Antigen, T-Cell, alpha-beta/genetics , Receptors, Antigen, T-Cell, alpha-beta/metabolism
9.
Clin Cancer Res ; 20(17): 4540-8, 2014 Sep 01.
Article in English | MEDLINE | ID: mdl-24970842

ABSTRACT

PURPOSE: High-throughput sequencing (HTS) of immunoglobulin heavy-chain genes (IGH) in unselected clinical samples for minimal residual disease (MRD) in B lymphoblastic leukemia (B-ALL) has not been tested. As current MRD-detecting methods such as flow cytometry or patient-specific qPCR are complex or difficult to standardize in the clinical laboratory, sequencing may enhance clinical prognostication. EXPERIMENTAL DESIGN: We sequenced IGH in paired pretreatment and day 29 post-treatment samples using residual material from consecutive, unselected samples from the Children's Oncology Group AALL0932 trial to measure MRD as compared with flow cytometry. We assessed the impact of ongoing recombination at IGH on MRD detection in post-treatment samples. Finally, we evaluated a subset of cases with discordant MRD results between flow cytometry and sequencing. RESULTS: We found clonal IGH rearrangements in 92 of 98 pretreatment patient samples. Furthermore, while ongoing recombination of IGH was evident, index clones typically prevailed in MRD-positive post-treatment samples, suggesting that clonal evolution at IGH does not contribute substantively to tumor fitness. MRD was detected by sequencing in all flow cytometry-positive cases with no false-negative results. In addition, in a subset of patients, MRD was detected by sequencing, but not by flow cytometry, including a fraction with MRD levels within the sensitivity of flow cytometry. We provide data that suggest that this discordance in some patients may be due to the phenotypic maturation of the transformed cell. CONCLUSION: Our results provide strong support for HTS of IGH to enhance clinical prognostication in B-ALL.


Subject(s)
High-Throughput Nucleotide Sequencing , Immunoglobulin Heavy Chains/genetics , Leukemia, B-Cell/genetics , Neoplasm, Residual/genetics , Disease-Free Survival , Flow Cytometry , Gene Rearrangement, B-Lymphocyte, Heavy Chain/genetics , Humans , Leukemia, B-Cell/complications , Leukemia, B-Cell/pathology , Neoplasm, Residual/etiology , Neoplasm, Residual/pathology , Prognosis
10.
Am J Hum Genet ; 93(4): 687-96, 2013 Oct 03.
Article in English | MEDLINE | ID: mdl-24094745

ABSTRACT

High-throughput sequencing technologies produce short sequence reads that can contain phase information if they span two or more heterozygote genotypes. This information is not routinely used by current methods that infer haplotypes from genotype data. We have extended the SHAPEIT2 method to use phase-informative sequencing reads to improve phasing accuracy. Our model incorporates the read information in a probabilistic model through base quality scores within each read. The method is primarily designed for high-coverage sequence data or data sets that already have genotypes called. One important application is phasing of single samples sequenced at high coverage for use in medical sequencing and studies of rare diseases. Our method can also use existing panels of reference haplotypes. We tested the method by using a mother-father-child trio sequenced at high-coverage by Illumina together with the low-coverage sequence data from the 1000 Genomes Project (1000GP). We found that use of phase-informative reads increases the mean distance between switch errors by 22% from 274.4 kb to 328.6 kb. We also used male chromosome X haplotypes from the 1000GP samples to simulate sequencing reads with varying insert size, read length, and base error rate. When using short 100 bp paired-end reads, we found that using mixtures of insert sizes produced the best results. When using longer reads with high error rates (5-20 kb read with 4%-15% error per base), phasing performance was substantially improved.
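
Phasing accuracy in this abstract is expressed through switch errors. The Python sketch below shows one common way to locate them: compare an inferred haplotype with the true one at heterozygous sites and flag each site where agreement with the truth flips. The function name, input layout, and example values are assumptions made for illustration. This is not SHAPEIT2 code, and the paper's exact scoring procedure may differ.

```python
def switch_error_positions(true_hap, inferred_hap, positions):
    """Return positions of heterozygous sites where the inferred phase flips.

    true_hap, inferred_hap: 0/1 alleles carried by one haplotype at each
    heterozygous site, in genomic order. positions: site coordinates in bp.
    """
    if not (len(true_hap) == len(inferred_hap) == len(positions)):
        raise ValueError("all inputs must have the same length")
    # True where the inferred haplotype carries the same allele as the truth.
    agree = [t == i for t, i in zip(true_hap, inferred_hap)]
    # A switch error occurs wherever agreement changes between adjacent sites.
    return [positions[k] for k in range(1, len(agree)) if agree[k] != agree[k - 1]]


# Hypothetical example with six heterozygous sites.
truth = [0, 1, 1, 0, 1, 0]
inferred = [0, 1, 0, 1, 0, 0]
sites = [100, 250, 400, 900, 1500, 2100]
print(switch_error_positions(truth, inferred, sites))  # [400, 2100]
```

Fewer switch errors over the same phased span gives a longer mean distance between them, which is the quantity the abstract reports in kb.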


Subject(s)
Genome, Human , Haplotypes/genetics , Sequence Analysis, DNA/methods , Child , Fathers , Female , Genotype , Humans , Male , Models, Genetic , Mothers , Polymorphism, Single Nucleotide
11.
Nature ; 502(7471): 377-80, 2013 Oct 17.
Article in English | MEDLINE | ID: mdl-23995691

ABSTRACT

Statins are prescribed widely to lower plasma low-density lipoprotein (LDL) concentrations and cardiovascular disease risk and have been shown to have beneficial effects in a broad range of patients. However, statins are associated with an increased risk, albeit small, of clinical myopathy and type 2 diabetes. Despite evidence for substantial genetic influence on LDL concentrations, pharmacogenomic trials have failed to identify genetic variations with large effects on either statin efficacy or toxicity, and have produced little information regarding mechanisms that modulate statin response. Here we identify a downstream target of statin treatment by screening for the effects of in vitro statin exposure on genetic associations with gene expression levels in lymphoblastoid cell lines derived from 480 participants of a clinical trial of simvastatin treatment. This analysis identified six expression quantitative trait loci (eQTLs) that interacted with simvastatin exposure, including rs9806699, a cis-eQTL for the gene glycine amidinotransferase (GATM) that encodes the rate-limiting enzyme in creatine synthesis. We found this locus to be associated with incidence of statin-induced myotoxicity in two separate populations (meta-analysis odds ratio = 0.60). Furthermore, we found that GATM knockdown in hepatocyte-derived cell lines attenuated transcriptional response to sterol depletion, demonstrating that GATM may act as a functional link between statin-mediated lowering of cholesterol and susceptibility to statin-induced myopathy.


Subject(s)
Amidinotransferases/genetics , Gene Expression Regulation/drug effects , Hydroxymethylglutaryl-CoA Reductase Inhibitors/adverse effects , Muscular Diseases/chemically induced , Quantitative Trait Loci/genetics , Simvastatin/adverse effects , Amidinotransferases/deficiency , Amidinotransferases/metabolism , Cell Line , Cholesterol/deficiency , Cholesterol/metabolism , Cholesterol/pharmacology , Gene Knockdown Techniques , Humans , Hydroxymethylglutaryl-CoA Reductase Inhibitors/pharmacology , Lymphocytes/cytology , Lymphocytes/drug effects , Lymphocytes/metabolism , Muscular Diseases/genetics , Muscular Diseases/metabolism , Polymorphism, Single Nucleotide/genetics , Simvastatin/pharmacology , Sterol Regulatory Element Binding Proteins/metabolism , Transcription, Genetic/drug effects
12.
Genome Res ; 23(5): 749-61, 2013 May.
Article in English | MEDLINE | ID: mdl-23478400

ABSTRACT

Short insertions and deletions (indels) are the second most abundant form of human genetic variation, but our understanding of their origins and functional effects lags behind that of other types of variants. Using population-scale sequencing, we have identified a high-quality set of 1.6 million indels from 179 individuals representing three diverse human populations. We show that rates of indel mutagenesis are highly heterogeneous, with 43%-48% of indels occurring in 4.03% of the genome, whereas in the remaining 96% their prevalence is 16 times lower than SNPs. Polymerase slippage can explain upwards of three-fourths of all indels, with the remainder being mostly simple deletions in complex sequence. However, insertions do occur and are significantly associated with pseudo-palindromic sequence features compatible with the fork stalling and template switching (FoSTeS) mechanism more commonly associated with large structural variations. We introduce a quantitative model of polymerase slippage, which enables us to identify indel-hypermutagenic protein-coding genes, some of which are associated with recurrent mutations leading to disease. Accounting for mutational rate heterogeneity due to sequence context, we find that indels across functional sequence are generally subject to stronger purifying selection than SNPs. We find that indel length modulates selection strength, and that indels affecting multiple functionally constrained nucleotides undergo stronger purifying selection. We further find that indels are enriched in associations with gene expression and find evidence for a contribution of nonsense-mediated decay. Finally, we show that indels can be integrated in existing genome-wide association studies (GWAS); although we do not find direct evidence that potentially causal protein-coding indels are enriched with associations to known disease-associated SNPs, our findings suggest that the causal variant underlying some of these associations may be indels.


Subject(s)
Evolution, Molecular , Genome, Human , INDEL Mutation/genetics , Genetics, Population , Genome-Wide Association Study , High-Throughput Nucleotide Sequencing , Humans , Mutagenesis, Insertional , Mutation Rate , Polymorphism, Single Nucleotide
13.
Nat Genet ; 44(8): 955-9, 2012 Jul 22.
Article in English | MEDLINE | ID: mdl-22820512

ABSTRACT

The 1000 Genomes Project and disease-specific sequencing efforts are producing large collections of haplotypes that can be used as reference panels for genotype imputation in genome-wide association studies (GWAS). However, imputing from large reference panels with existing methods imposes a high computational burden. We introduce a strategy called 'pre-phasing' that maintains the accuracy of leading methods while reducing computational costs. We first statistically estimate the haplotypes for each individual within the GWAS sample (pre-phasing) and then impute missing genotypes into these estimated haplotypes. This reduces the computational cost because (i) the GWAS samples must be phased only once, whereas standard methods would implicitly repeat phasing with each reference panel update, and (ii) it is much faster to match a phased GWAS haplotype to one reference haplotype than to match two unphased GWAS genotypes to a pair of reference haplotypes. We implemented our approach in the MaCH and IMPUTE2 frameworks, and we tested it on data sets from the Wellcome Trust Case Control Consortium 2 (WTCCC2), the Genetic Association Information Network (GAIN), the Women's Health Initiative (WHI) and the 1000 Genomes Project. This strategy will be particularly valuable for repeated imputation as reference panels evolve.
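
Point (ii) above can be made concrete with a toy cost model. The Python sketch below counts the candidate reference matches per marker for one individual. It assumes a phased individual contributes two haplotypes that are each matched against single reference haplotypes, while an unphased individual is matched against ordered pairs of reference haplotypes. The function name and this cost model are illustrative assumptions, not the MaCH or IMPUTE2 implementation.

```python
def candidate_matches_per_site(n_ref_haplotypes, phased):
    """Toy count of reference matches considered per marker for one individual."""
    if n_ref_haplotypes < 1:
        raise ValueError("need at least one reference haplotype")
    if phased:
        # Two phased haplotypes, each compared with every reference haplotype.
        return 2 * n_ref_haplotypes
    # One unphased genotype compared with every ordered pair of reference haplotypes.
    return n_ref_haplotypes ** 2


n = 2184  # hypothetical reference panel size, chosen only for illustration
print(candidate_matches_per_site(n, phased=True))   # 4368
print(candidate_matches_per_site(n, phased=False))  # 4769856
```

Under this simplified model the phased case grows linearly with panel size and the unphased case grows quadratically, so the gap widens as reference panels get larger.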


Subject(s)
Genome-Wide Association Study/methods , Genotype , Computational Biology , Databases, Genetic , Genome-Wide Association Study/statistics & numerical data , Haplotypes , Human Genome Project , Humans
14.
Eur J Hum Genet ; 20(7): 801-5, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22293688

ABSTRACT

We hypothesize that imputation based on data from the 1000 Genomes Project can identify novel association signals on a genome-wide scale due to the dense marker map and the large number of haplotypes. To test the hypothesis, the Wellcome Trust Case Control Consortium (WTCCC) Phase I genotype data were imputed using 1000 Genomes as reference (20100804 EUR), and seven case/control association studies were performed using imputed dosages. We observed two 'missed' disease-associated variants that were undetectable by the original WTCCC analysis, but were reported by later studies after the 2007 WTCCC publication. One is within the IL2RA gene for association with type 1 diabetes and the other in proximity with the CDKN2B gene for association with type 2 diabetes. We also identified two refined associations. One is SNP rs11209026 in exon 9 of IL23R for association with Crohn's disease, which is predicted to be probably damaging by PolyPhen2. The other refined variant is in the CUX2 gene region for association with type 1 diabetes, where the newly identified top SNP rs1265564 has an association P-value of 1.68 × 10⁻¹⁶. The new lead SNP for the two refined loci provides a more plausible explanation for the disease association. We demonstrated that 1000 Genomes-based imputation could indeed identify both novel (in our case, 'missed' because they were detected and replicated by studies after 2007) and refined signals. We anticipate the findings derived from this study to provide timely information when individual groups and consortia are beginning to engage in 1000 Genomes-based imputation.


Subject(s)
Genetic Testing/methods , Genome, Human , Genome-Wide Association Study/methods , Polymorphism, Single Nucleotide , Algorithms , Case-Control Studies , Chromosome Mapping/methods , Crohn Disease/genetics , Cyclin-Dependent Kinase Inhibitor p15/genetics , Diabetes Mellitus, Type 1/genetics , Diabetes Mellitus, Type 2/genetics , Exons , Genetic Linkage , Genetic Loci , Genetic Predisposition to Disease , Genetic Testing/standards , Genome-Wide Association Study/standards , Haplotypes , Humans , Interleukin-2 Receptor alpha Subunit/genetics , Reference Standards
15.
PLoS Genet ; 7(6): e1002105, 2011 Jun.
Article in English | MEDLINE | ID: mdl-21655089

ABSTRACT

Genome-wide association studies (GWAS) have identified 14 tagging single nucleotide polymorphisms (tagSNPs) that are associated with the risk of colorectal cancer (CRC), and several of these tagSNPs are near bone morphogenetic protein (BMP) pathway loci. The penalty of multiple testing implicit in GWAS increases the attraction of complementary approaches for disease gene discovery, including candidate gene- or pathway-based analyses. The strongest candidate loci for additional predisposition SNPs are arguably those already known both to have functional relevance and to be involved in disease risk. To investigate this proposition, we searched for novel CRC susceptibility variants close to the BMP pathway genes GREM1 (15q13.3), BMP4 (14q22.2), and BMP2 (20p12.3) using sample sets totalling 24,910 CRC cases and 26,275 controls. We identified new, independent CRC predisposition SNPs close to BMP4 (rs1957636, P = 3.93 × 10⁻¹⁰) and BMP2 (rs4813802, P = 4.65 × 10⁻¹¹). Near GREM1, we found using fine-mapping that the previously-identified association between tagSNP rs4779584 and CRC actually resulted from two independent signals represented by rs16969681 (P = 5.33 × 10⁻⁸) and rs11632715 (P = 2.30 × 10⁻¹⁰). As low-penetrance predisposition variants become harder to identify (owing to small effect sizes and/or low risk allele frequencies), approaches based on informed candidate gene selection may become increasingly attractive. Our data emphasise that genetic fine-mapping studies can deconvolute associations that have arisen owing to independent correlation of a tagSNP with more than one functional SNP, thus explaining some of the apparently missing heritability of common diseases.


Subject(s)
Bone Morphogenetic Protein 2/genetics , Bone Morphogenetic Protein 4/genetics , Colorectal Neoplasms/genetics , Genetic Predisposition to Disease , Intercellular Signaling Peptides and Proteins/genetics , Aged , Bone Morphogenetic Protein 2/metabolism , Bone Morphogenetic Protein 4/metabolism , Case-Control Studies , Colorectal Neoplasms/metabolism , Gene Frequency , Genetic Variation , Genome-Wide Association Study , Humans , Male , Middle Aged , Polymorphism, Single Nucleotide , Quantitative Trait, Heritable , Signal Transduction
16.
G3 (Bethesda) ; 1(6): 457-70, 2011 Nov.
Article in English | MEDLINE | ID: mdl-22384356

ABSTRACT

Genotype imputation is a statistical technique that is often used to increase the power and resolution of genetic association studies. Imputation methods work by using haplotype patterns in a reference panel to predict unobserved genotypes in a study dataset, and a number of approaches have been proposed for choosing subsets of reference haplotypes that will maximize accuracy in a given study population. These panel selection strategies become harder to apply and interpret as sequencing efforts like the 1000 Genomes Project produce larger and more diverse reference sets, which led us to develop an alternative framework. Our approach is built around a new approximation that uses local sequence similarity to choose a custom reference panel for each study haplotype in each region of the genome. This approximation makes it computationally efficient to use all available reference haplotypes, which allows us to bypass the panel selection step and to improve accuracy at low-frequency variants by capturing unexpected allele sharing among populations. Using data from HapMap 3, we show that our framework produces accurate results in a wide range of human populations. We also use data from the Malaria Genetic Epidemiology Network (MalariaGEN) to provide recommendations for imputation-based studies in Africa. We demonstrate that our approximation improves efficiency in large, sequence-based reference panels, and we discuss general computational strategies for modern reference datasets. Genome-wide association studies will soon be able to harness the power of thousands of reference genomes, and our work provides a practical way for investigators to use this rich information. New methodology from this study is implemented in the IMPUTE2 software package.
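
The idea of choosing a custom reference panel by local sequence similarity can be sketched in a few lines of Python. The example below ranks reference haplotypes by Hamming distance to a study haplotype at the typed markers of one region and keeps the `k` closest. The choice of Hamming distance, the function name, and the data are assumptions made for illustration. This is not the IMPUTE2 source code.

```python
def select_custom_panel(study_hap, reference_haps, k):
    """Return indices of the k reference haplotypes most similar to study_hap.

    Similarity is measured as Hamming distance over the typed markers of a
    single genomic region. Ties keep the original reference order.
    """
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))

    ranked = sorted(
        range(len(reference_haps)),
        key=lambda idx: hamming(study_hap, reference_haps[idx]),
    )
    return ranked[:k]


# Hypothetical region with five typed markers and four reference haplotypes.
study = [0, 1, 1, 0, 1]
reference = [
    [0, 1, 1, 0, 1],  # distance 0
    [1, 0, 0, 1, 0],  # distance 5
    [0, 1, 0, 0, 1],  # distance 1
    [0, 0, 1, 1, 1],  # distance 2
]
print(select_custom_panel(study, reference, k=2))  # [0, 2]
```

Repeating this selection for each study haplotype in each region gives every haplotype its own small panel, so the full reference set stays available without being copied from in every calculation.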

17.
Nat Rev Genet ; 11(7): 499-511, 2010 Jul.
Article in English | MEDLINE | ID: mdl-20517342

ABSTRACT

In the past few years genome-wide association (GWA) studies have uncovered a large number of convincingly replicated associations for many complex human diseases. Genotype imputation has been used widely in the analysis of GWA studies to boost power, fine-map associations and facilitate the combination of results across studies using meta-analysis. This Review describes the details of several different statistical methods for imputing genotypes, illustrates and discusses the factors that influence imputation performance, and reviews methods that can be used to assess imputation performance and test association at imputed SNPs.


Subject(s)
Biostatistics/methods , Genome-Wide Association Study , Genotype , Models, Genetic , Polymorphism, Single Nucleotide
18.
PLoS Genet ; 5(6): e1000529, 2009 Jun.
Article in English | MEDLINE | ID: mdl-19543373

ABSTRACT

Genotype imputation methods are now being widely used in the analysis of genome-wide association studies. Most imputation analyses to date have used the HapMap as a reference dataset, but new reference panels (such as controls genotyped on multiple SNP chips and densely typed samples from the 1,000 Genomes Project) will soon allow a broader range of SNPs to be imputed with higher accuracy, thereby increasing power. We describe a genotype imputation method (IMPUTE version 2) that is designed to address the challenges presented by these new datasets. The main innovation of our approach is a flexible modelling framework that increases accuracy and combines information across multiple reference panels while remaining computationally feasible. We find that IMPUTE v2 attains higher accuracy than other methods when the HapMap provides the sole reference panel, but that the size of the panel constrains the improvements that can be made. We also find that imputation accuracy can be greatly enhanced by expanding the reference panel to contain thousands of chromosomes and that IMPUTE v2 outperforms other methods in this setting at both rare and common SNPs, with overall error rates that are 15%-20% lower than those of the closest competing method. One particularly challenging aspect of next-generation association studies is to integrate information across multiple reference panels genotyped on different sets of SNPs; we show that our approach to this problem has practical advantages over other suggested solutions.


Subject(s)
Genome-Wide Association Study/methods , Genetics, Population , Genotype , Humans , Polymorphism, Single Nucleotide , Software
19.
Nat Genet ; 41(6): 657-65, 2009 Jun.
Article in English | MEDLINE | ID: mdl-19465909

ABSTRACT

We report a genome-wide association (GWA) study of severe malaria in The Gambia. The initial GWA scan included 2,500 children genotyped on the Affymetrix 500K GeneChip, and a replication study included 3,400 children. We used this to examine the performance of GWA methods in Africa. We found considerable population stratification, and also that signals of association at known malaria resistance loci were greatly attenuated owing to weak linkage disequilibrium (LD). To investigate possible solutions to the problem of low LD, we focused on the HbS locus, sequencing this region of the genome in 62 Gambian individuals and then using these data to conduct multipoint imputation in the GWA samples. This increased the signal of association, from P = 4 × 10⁻⁷ to P = 4 × 10⁻¹⁴, with the peak of the signal located precisely at the HbS causal variant. Our findings provide proof of principle that fine-resolution multipoint imputation, based on population-specific sequencing data, can substantially boost authentic GWA signals and enable fine mapping of causal variants in African populations.


Subject(s)
Genome-Wide Association Study , Hemoglobin, Sickle/genetics , Malaria/genetics , Polymorphism, Single Nucleotide , Chromosome Mapping , Ethnicity/genetics , Gambia , Genetic Variation , Humans , Linkage Disequilibrium , Polymorphism, Genetic , Reference Values , Severity of Illness Index