2.
Nat Genet ; 53(2): 185-194, 2021 02.
Article in English | MEDLINE | ID: mdl-33462484

ABSTRACT

Clinical laboratory tests are a critical component of the continuum of care. We evaluate the genetic basis of 35 blood and urine laboratory measurements in the UK Biobank (n = 363,228 individuals). We identify 1,857 loci associated with at least one trait, containing 3,374 fine-mapped associations and additional sets of large-effect (>0.1 s.d.) protein-altering, human leukocyte antigen (HLA) and copy number variant (CNV) associations. Through Mendelian randomization (MR) analysis, we discover 51 causal relationships, including previously known agonistic effects of urate on gout and cystatin C on stroke. Finally, we develop polygenic risk scores (PRSs) for each biomarker and build 'multi-PRS' models for diseases using 35 PRSs simultaneously, which improved chronic kidney disease, type 2 diabetes, gout and alcoholic cirrhosis genetic risk stratification in an independent dataset (FinnGen; n = 135,500) relative to single-disease PRSs. Together, our results delineate the genetic basis of biomarkers and their causal influences on diseases and improve genetic risk stratification for common diseases.


Subject(s)
Biomarkers/blood , Biomarkers/urine , HLA Antigens/genetics , Proteins/genetics , Biological Specimen Banks , Cardiovascular Diseases/genetics , Cardiovascular Diseases/metabolism , DNA Copy Number Variations , Diabetes Mellitus, Type 2/genetics , Diabetes Mellitus, Type 2/metabolism , Genetic Pleiotropy , Humans , Linkage Disequilibrium , Liver-Specific Organic Anion Transporter 1/genetics , Mendelian Randomization Analysis , Polymorphism, Single Nucleotide , Renal Insufficiency, Chronic , Serine Endopeptidases/genetics , United Kingdom
3.
JAMA ; 321(14): 1391-1399, 2019 04 09.
Article in English | MEDLINE | ID: mdl-30964529

ABSTRACT

Importance: Data sets linking comprehensive genomic profiling (CGP) to clinical outcomes may accelerate precision medicine. Objective: To assess whether a database that combines EHR-derived clinical data with CGP can identify and extend associations in non-small cell lung cancer (NSCLC). Design, Setting, and Participants: Clinical data from EHRs were linked with CGP results for 28 998 patients from 275 US oncology practices. Among 4064 patients with NSCLC, exploratory associations between tumor genomics and patient characteristics with clinical outcomes were conducted, with data obtained between January 1, 2011, and January 1, 2018. Exposures: Tumor CGP, including presence of a driver alteration (a pathogenic or likely pathogenic alteration in a gene shown to drive tumor growth); tumor mutation burden (TMB), defined as the number of mutations per megabase; and clinical characteristics gathered from EHRs. Main Outcomes and Measures: Overall survival (OS), time receiving therapy, maximal therapy response (as documented by the treating physician in the EHR), and clinical benefit rate (fraction of patients with stable disease, partial response, or complete response) to therapy. Results: Among 4064 patients with NSCLC (median age, 66.0 years; 51.9% female), 3183 (78.3%) had a history of smoking, 3153 (77.6%) had nonsquamous cancer, and 871 (21.4%) had an alteration in EGFR, ALK, or ROS1 (701 [17.2%] with EGFR, 128 [3.1%] with ALK, and 42 [1.0%] with ROS1 alterations). There were 1946 deaths in 7 years. For patients with a driver alteration, improved OS was observed among those treated with (n = 575) vs not treated with (n = 560) targeted therapies (median, 18.6 months [95% CI, 15.2-21.7] vs 11.4 months [95% CI, 9.7-12.5] from advanced diagnosis; P < .001). 
TMB (in mutations/Mb) was significantly higher among smokers vs nonsmokers (8.7 [IQR, 4.4-14.8] vs 2.6 [IQR, 1.7-5.2]; P < .001) and significantly lower among patients with vs without an alteration in EGFR (3.5 [IQR, 1.76-6.1] vs 7.8 [IQR, 3.5-13.9]; P < .001), ALK (2.1 [IQR, 0.9-4.0] vs 7.0 [IQR, 3.5-13.0]; P < .001), RET (4.6 [IQR, 1.7-8.7] vs 7.0 [IQR, 2.6-13.0]; P = .004), or ROS1 (4.0 [IQR, 1.2-9.6] vs 7.0 [IQR, 2.6-13.0]; P = .03). In patients treated with anti-PD-1/PD-L1 therapies (n = 1290, 31.7%), TMB of 20 or more was significantly associated with improved OS from therapy initiation (16.8 months [95% CI, 11.6-24.9] vs 8.5 months [95% CI, 7.6-9.7]; P < .001), longer time receiving therapy (7.8 months [95% CI, 5.5-11.1] vs 3.3 months [95% CI, 2.8-3.7]; P < .001), and increased clinical benefit rate (80.7% vs 56.7%; P < .001) vs TMB less than 20. Conclusions and Relevance: Among patients with NSCLC included in a longitudinal database of clinical data linked to CGP results from routine care, exploratory analyses replicated previously described associations between clinical and genomic characteristics, between driver mutations and response to targeted therapy, and between TMB and response to immunotherapy. These findings demonstrate the feasibility of creating a clinicogenomic database derived from routine clinical experience and provide support for further research and discovery evaluating this approach in oncology.


Subject(s)
Carcinoma, Non-Small-Cell Lung/genetics , Databases, Genetic , Electronic Health Records , Immunotherapy , Lung Neoplasms/genetics , Mutation , Aged , Biomarkers, Tumor/analysis , Carcinoma, Non-Small-Cell Lung/therapy , Datasets as Topic , Female , Gene Expression Profiling , Genomics , Genotype , Humans , Male , Medical Record Linkage , Middle Aged , Precision Medicine , Programmed Cell Death 1 Receptor/analysis
6.
JAMA ; 320(5): 469-477, 2018 08 07.
Article in English | MEDLINE | ID: mdl-30088010

ABSTRACT

Importance: Broad-based genomic sequencing is being used more frequently for patients with advanced non-small cell lung cancer (NSCLC). However, little is known about the association between broad-based genomic sequencing and treatment selection or survival among patients with advanced NSCLC in a community oncology setting. Objective: To compare clinical outcomes between patients with advanced NSCLC who received broad-based genomic sequencing vs a control group of patients who received routine testing for EGFR mutations and/or ALK rearrangements alone. Design, Setting, and Participants: Retrospective cohort study of patients with chart-confirmed advanced NSCLC between January 1, 2011, and July 31, 2016, and who received care at 1 of 191 oncology practices across the United States using the Flatiron Health Database. Patients were diagnosed with stage IIIB/IV or unresectable nonsquamous NSCLC who received at least 1 line of antineoplastic treatment. Exposures: Receipt of either broad-based genomic sequencing or routine testing (EGFR and/or ALK only). Broad-based genomic sequencing included any multigene panel sequencing assay examining more than 30 genes prior to third-line treatment. Main Outcomes and Measures: Primary outcomes were 12-month mortality and overall survival from the start of first-line treatment. Secondary outcomes included frequency of genetic alterations and treatments received. Results: Among 5688 individuals with advanced NSCLC (median age, 67 years [interquartile range, 41-85], 63.6% white, 80% with a history of smoking), 875 (15.4%) received broad-based genomic sequencing and 4813 (84.6%) received routine testing. Among patients who received broad-based genomic sequencing, 4.5% received targeted treatment based on testing results, 9.8% received routine EGFR/ALK targeted treatment, and 85.1% received no targeted treatment. 
Unadjusted mortality rates at 12 months were 49.2% for patients undergoing broad-based genomic sequencing and 35.9% for patients undergoing routine testing. Using an instrumental variable analysis, there was no significant association between broad-based genomic sequencing and 12-month mortality (predicted probability of death at 12 months, 41.1% for broad-based genomic sequencing vs 44.4% for routine testing; difference -3.6% [95% CI, -18.4% to 11.1%]; P = .63). The results were consistent in the propensity score-matched survival analysis (42.0% vs 45.1%; hazard ratio, 0.92 [95% CI, 0.73 to 1.11]; P = .40) vs unmatched cohort (hazard ratio, 0.69 [95% CI, 0.62 to 0.77]; log-rank P < .001). Conclusions and Relevance: Among patients with advanced non-small cell lung cancer receiving care in the community oncology setting, broad-based genomic sequencing directly informed treatment in a minority of patients and was not independently associated with better survival.


Subject(s)
Carcinoma, Non-Small-Cell Lung/genetics , Lung Neoplasms/genetics , Adult , Aged , Aged, 80 and over , Anaplastic Lymphoma Kinase , Antineoplastic Agents/therapeutic use , Carcinoma, Non-Small-Cell Lung/mortality , Carcinoma, Non-Small-Cell Lung/therapy , DNA, Neoplasm/analysis , Female , Genes, erbB-1 , Genomics , Genotype , Humans , Immunotherapy , Lung Neoplasms/mortality , Lung Neoplasms/therapy , Male , Middle Aged , Mutation , Neoplasm Staging , Receptor Protein-Tyrosine Kinases/genetics , Retrospective Studies , Sequence Analysis, DNA , Survival Analysis
7.
Health Aff (Millwood) ; 37(5): 765-772, 2018 05.
Article in English | MEDLINE | ID: mdl-29733723

ABSTRACT

The majority of US adult cancer patients today are diagnosed and treated outside the context of any clinical trial (that is, in the real world). Although these patients are not part of a research study, their clinical data are still recorded. Indeed, data captured in electronic health records form an ever-growing, rich digital repository of longitudinal patient experiences, treatments, and outcomes. Likewise, genomic data from tumor molecular profiling are increasingly guiding oncology care. Linking real-world clinical and genomic data, as well as information from other co-occurring data sets, could create study populations that provide generalizable evidence for precision medicine interventions. However, the infrastructure required to link, ensure quality, and rapidly learn from such composite data is complex. We outline the challenges and describe a novel approach to building a real-world clinico-genomic database of patients with cancer. This work represents a case study in how data collected during routine patient care can inform precision medicine efforts for the population at large. We suggest that health policies can promote innovation by defining appropriate uses of real-world evidence, establishing data standards, and incentivizing data sharing.


Subject(s)
Carcinoma, Non-Small-Cell Lung/drug therapy , Carcinoma, Non-Small-Cell Lung/genetics , Information Dissemination , Lung Neoplasms/drug therapy , Lung Neoplasms/genetics , Proto-Oncogene Proteins p21(ras)/genetics , Electronic Health Records , Female , Follow-Up Studies , Genomics , Humans , Molecular Targeted Therapy , Prognosis , Protein Kinase Inhibitors/therapeutic use , Proto-Oncogene Proteins p21(ras)/drug effects , Sirolimus/analogs & derivatives , Sirolimus/therapeutic use , Treatment Outcome
9.
Sci Data ; 4: 170179, 2017 12 19.
Article in English | MEDLINE | ID: mdl-29257133

ABSTRACT

To investigate the genetic basis of type 2 diabetes (T2D) to high resolution, the GoT2D and T2D-GENES consortia catalogued variation from whole-genome sequencing of 2,657 European individuals and exome sequencing of 12,940 individuals of multiple ancestries. Over 27M SNPs, indels, and structural variants were identified, including 99% of low-frequency (minor allele frequency [MAF] 0.1-5%) non-coding variants in the whole-genome sequenced individuals and 99.7% of low-frequency coding variants in the whole-exome sequenced individuals. Each variant was tested for association with T2D in the sequenced individuals, and, to increase power, most were tested in larger numbers of individuals (>80% of low-frequency coding variants in ~82 K Europeans via the exome chip, and ~90% of low-frequency non-coding variants in ~44 K Europeans via genotype imputation). The variants, genotypes, and association statistics from these analyses provide the largest reference to date of human genetic information relevant to T2D, for use in activities such as T2D-focused genotype imputation, functional characterization of variants or genes, and other novel analyses to detect associations between sequence variation and T2D.


Subject(s)
Diabetes Mellitus, Type 2/genetics , Genetic Variation , Humans , White People
10.
Diabetes ; 66(11): 2903-2914, 2017 11.
Article in English | MEDLINE | ID: mdl-28838971

ABSTRACT

Type 2 diabetes (T2D) affects more than 415 million people worldwide, and its costs to the health care system continue to rise. To identify common or rare genetic variation with potential therapeutic implications for T2D, we analyzed and replicated genome-wide protein coding variation in a total of 8,227 individuals with T2D and 12,966 individuals without T2D of Latino descent. We identified a novel genetic variant in the IGF2 gene associated with ∼20% reduced risk for T2D. This variant, which has an allele frequency of 17% in the Mexican population but is rare in Europe, prevents splicing between IGF2 exons 1 and 2. We show in vitro and in human liver and adipose tissue that the variant is associated with a specific, allele-dosage-dependent reduction in the expression of IGF2 isoform 2. In individuals who do not carry the protective allele, expression of IGF2 isoform 2 in adipose is positively correlated with both incidence of T2D and increased plasma glycated hemoglobin in individuals without T2D, providing support that the protective effects are mediated by reductions in IGF2 isoform 2. Broad phenotypic examination of carriers of the protective variant revealed no association with other disease states or impaired reproductive health. These findings suggest that reducing IGF2 isoform 2 expression in relevant tissues has potential as a new therapeutic strategy for T2D, even beyond the Latin American population, with no major adverse effects on health or reproduction.


Subject(s)
Diabetes Mellitus, Type 2/genetics , Insulin-Like Growth Factor II/metabolism , RNA Splice Sites/genetics , Adipose Tissue , Cell Line , Gene Expression Regulation/physiology , Genetic Variation , Genotype , Humans , Insulin-Like Growth Factor II/genetics , Liver , Mexican Americans/genetics , Mexico , Protein Isoforms , Stem Cells , White People
11.
Nature ; 536(7614): 41-47, 2016 08 04.
Article in English | MEDLINE | ID: mdl-27398621

ABSTRACT

The genetic architecture of common traits, including the number, frequency, and effect sizes of inherited variants that contribute to individual risk, has been long debated. Genome-wide association studies have identified scores of common variants associated with type 2 diabetes, but in aggregate, these explain only a fraction of the heritability of this disease. Here, to test the hypothesis that lower-frequency variants explain much of the remainder, the GoT2D and T2D-GENES consortia performed whole-genome sequencing in 2,657 European individuals with and without diabetes, and exome sequencing in 12,940 individuals from five ancestry groups. To increase statistical power, we expanded the sample size via genotyping and imputation in a further 111,548 subjects. Variants associated with type 2 diabetes after sequencing were overwhelmingly common and most fell within regions previously identified by genome-wide association studies. Comprehensive enumeration of sequence variation is necessary to identify functional alleles that provide important clues to disease pathophysiology, but large-scale sequencing does not support the idea that lower-frequency variants have a major role in predisposition to type 2 diabetes.


Subject(s)
Diabetes Mellitus, Type 2/genetics , Genetic Predisposition to Disease/genetics , Genetic Variation/genetics , Alleles , DNA Mutational Analysis , Europe/ethnology , Exome , Genome-Wide Association Study , Genotyping Techniques , Humans , Sample Size
12.
PLoS Genet ; 11(4): e1005165, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25906071

ABSTRACT

Genome and exome sequencing in large cohorts enables characterization of the role of rare variation in complex diseases. Success in this endeavor, however, requires investigators to test a diverse array of genetic hypotheses which differ in the number, frequency and effect sizes of underlying causal variants. In this study, we evaluated the power of gene-based association methods to interrogate such hypotheses, and examined the implications for study design. We developed a flexible simulation approach, using 1000 Genomes data, to (a) generate sequence variation at human genes in up to 10K case-control samples, and (b) quantify the statistical power of a panel of widely used gene-based association tests under a variety of allelic architectures, locus effect sizes, and significance thresholds. For loci explaining ~1% of phenotypic variance underlying a common dichotomous trait, we find that all methods have low absolute power to achieve exome-wide significance (~5-20% power at α = 2.5 × 10⁻⁶) in 3K individuals; even in 10K samples, power is modest (~60%). The combined application of multiple methods increases sensitivity, but does so at the expense of a higher false positive rate. MiST, SKAT-O, and KBAC have the highest individual mean power across simulated datasets, but we observe wide architecture-dependent variability in the individual loci detected by each test, suggesting that inferences about disease architecture from analysis of sequencing studies can differ depending on which methods are used. Our results imply that tens of thousands of individuals, extensive functional annotation, or highly targeted hypothesis testing will be required to confidently detect or exclude rare variant signals at complex disease loci.


Subject(s)
Genetic Diseases, Inborn , Genetic Variation , Genome-Wide Association Study , Models, Theoretical , Alleles , Computer Simulation , Diabetes Mellitus, Type 2/genetics , Exome/genetics , Genetic Predisposition to Disease , Humans , Linkage Disequilibrium , Phenotype
13.
Am J Hum Genet ; 94(5): 710-20, 2014 May 01.
Article in English | MEDLINE | ID: mdl-24768551

ABSTRACT

Finnish samples have been extensively utilized in studying single-gene disorders, where the founder effect has clearly aided in discovery, and more recently in genome-wide association studies of complex traits, where the founder effect has had less obvious impacts. As the field starts to explore rare variants' contribution to polygenic traits, it is of great importance to characterize and confirm the Finnish founder effect in sequencing data and to assess its implications for rare-variant association studies. Here, we employ forward simulation, guided by empirical deep resequencing data, to model the genetic architecture of quantitative polygenic traits in both the general European and the Finnish populations simultaneously. We demonstrate that power of rare-variant association tests is higher in the Finnish population, especially when variants' phenotypic effects are tightly coupled with fitness effects and therefore reflect a greater contribution of rarer variants. SKAT-O, variable-threshold tests, and single-variant tests are more powerful than other rare-variant methods in the Finnish population across a range of genetic models. We also compare the relative power and efficiency of exome array genotyping to those of high-coverage exome sequencing. At a fixed cost, less expensive genotyping strategies have far greater power than sequencing; in a fixed number of samples, however, genotyping arrays miss a substantial portion of genetic signals detected in sequencing, even in the Finnish founder population. As genetic studies probe sequence variation at greater depth in more diverse populations, our simulation approach provides a framework for evaluating various study designs for gene discovery.


Subject(s)
Computer Simulation , Founder Effect , Models, Genetic , Population/genetics , White People/genetics , Diabetes Mellitus, Type 2/genetics , Exome/genetics , Finland , Humans , Multifactorial Inheritance/genetics
14.
Nat Genet ; 45(12): 1418-27, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24141362

ABSTRACT

The genetic architecture of human diseases governs the success of genetic mapping and the future of personalized medicine. Although numerous studies have queried the genetic basis of common disease, contradictory hypotheses have been advocated about features of genetic architecture (for example, the contribution of rare versus common variants). We developed an integrated simulation framework, calibrated to empirical data, to enable the systematic evaluation of such hypotheses. For type 2 diabetes (T2D), two simple parameters--(i) the target size for causal mutation and (ii) the coupling between selection and phenotypic effect--define a broad space of architectures. Whereas extreme models are excluded by the combination of epidemiology, linkage and genome-wide association studies, many models remain consistent, including those where rare variants explain either little (<25%) or most (>80%) of T2D heritability. Ongoing sequencing and genotyping studies will further constrain the space of possible architectures, but very large samples (for example, >250,000 unselected individuals) will be required to localize most of the heritability underlying T2D and other traits characterized by these models.


Subject(s)
Diabetes Mellitus, Type 2/genetics , Disease/genetics , Empirical Research , Multifactorial Inheritance , Computer Simulation , Genetic Linkage , Genetic Predisposition to Disease , Genetic Variation , Genome-Wide Association Study , Humans , Models, Genetic
15.
Nat Genet ; 45(11): 1380-5, 2013 Nov.
Article in English | MEDLINE | ID: mdl-24097065

ABSTRACT

Genome sequencing can identify individuals in the general population who harbor rare coding variants in genes for Mendelian disorders and who may consequently have increased disease risk. Previous studies of rare variants in phenotypically extreme individuals display ascertainment bias and may demonstrate inflated effect-size estimates. We sequenced seven genes for maturity-onset diabetes of the young (MODY) in well-phenotyped population samples (n = 4,003). We filtered rare variants according to two prediction criteria for disease-causing mutations: reported previously in MODY or satisfying stringent de novo thresholds (rare, conserved and protein damaging). Approximately 1.5% and 0.5% of randomly selected individuals from the Framingham and Jackson Heart Studies, respectively, carry variants from these two classes. However, the vast majority of carriers remain euglycemic through middle age. Accurate estimates of variant effect sizes from population-based sequencing are needed to avoid falsely predicting a substantial fraction of individuals as being at risk for MODY or other Mendelian diseases.


Subject(s)
Diabetes Mellitus, Type 2/genetics , Genetic Variation , Adult , Base Sequence , Basic Helix-Loop-Helix Transcription Factors/genetics , Chromosome Mapping , Female , Genetic Predisposition to Disease , Germinal Center Kinases , Hepatocyte Nuclear Factor 1-alpha/genetics , Hepatocyte Nuclear Factor 1-beta/genetics , Hepatocyte Nuclear Factor 4/genetics , Homeodomain Proteins/genetics , Humans , Male , Middle Aged , Phenotype , Protein Serine-Threonine Kinases/genetics , Risk , Sequence Analysis, DNA , Trans-Activators/genetics , Young Adult
16.
Nat Protoc ; 8(11): 2281-2308, 2013 Nov.
Article in English | MEDLINE | ID: mdl-24157548

ABSTRACT

Targeted nucleases are powerful tools for mediating genome alteration with high precision. The RNA-guided Cas9 nuclease from the microbial clustered regularly interspaced short palindromic repeats (CRISPR) adaptive immune system can be used to facilitate efficient genome engineering in eukaryotic cells by simply specifying a 20-nt targeting sequence within its guide RNA. Here we describe a set of tools for Cas9-mediated genome editing via nonhomologous end joining (NHEJ) or homology-directed repair (HDR) in mammalian cells, as well as generation of modified cell lines for downstream functional studies. To minimize off-target cleavage, we further describe a double-nicking strategy using the Cas9 nickase mutant with paired guide RNAs. This protocol provides experimentally derived guidelines for the selection of target sites, evaluation of cleavage efficiency and analysis of off-target activity. Beginning with target design, gene modifications can be achieved within as little as 1-2 weeks, and modified clonal cell lines can be derived within 2-3 weeks.


Subject(s)
Clustered Regularly Interspaced Short Palindromic Repeats , Genetic Engineering/methods , Genome , Base Sequence , Cell Culture Techniques , Cell Line , DNA End-Joining Repair , DNA Mutational Analysis , DNA Repair , Deoxyribonucleases/chemistry , Deoxyribonucleases/genetics , Genotyping Techniques , HEK293 Cells , Humans , Molecular Sequence Data , Mutagenesis , Transfection
17.
Nat Biotechnol ; 31(9): 827-32, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23873081

ABSTRACT

The Streptococcus pyogenes Cas9 (SpCas9) nuclease can be efficiently targeted to genomic loci by means of single-guide RNAs (sgRNAs) to enable genome editing. Here, we characterize SpCas9 targeting specificity in human cells to inform the selection of target sites and avoid off-target effects. Our study evaluates >700 guide RNA variants and SpCas9-induced indel mutation levels at >100 predicted genomic off-target loci in 293T and 293FT cells. We find that SpCas9 tolerates mismatches between guide RNA and target DNA at different positions in a sequence-dependent manner, sensitive to the number, position and distribution of mismatches. We also show that SpCas9-mediated cleavage is unaffected by DNA methylation and that the dosage of SpCas9 and sgRNA can be titrated to minimize off-target modification. To facilitate mammalian genome engineering applications, we provide a web-based software tool to guide the selection and validation of target sequences as well as off-target analyses.


Subject(s)
DNA/genetics , Deoxyribonucleases/genetics , Genetic Engineering/methods , Transcription Factors/genetics , Bacterial Proteins/genetics , Base Pair Mismatch , Base Sequence , Molecular Sequence Data , Streptococcus pyogenes/genetics , RNA, Small Untranslated
18.
PLoS One ; 7(9): e44196, 2012.
Article in English | MEDLINE | ID: mdl-23028501

ABSTRACT

Chromosomal translocations are frequent features of cancer genomes that contribute to disease progression. These rearrangements result from formation and illegitimate repair of DNA double-strand breaks (DSBs), a process that requires spatial colocalization of chromosomal breakpoints. The "contact first" hypothesis suggests that translocation partners colocalize in the nuclei of normal cells, prior to rearrangement. It is unclear, however, the extent to which spatial interactions based on three-dimensional genome architecture contribute to chromosomal rearrangements in human disease. Here we intersect Hi-C maps of three-dimensional chromosome conformation with collections of 1,533 chromosomal translocations from cancer and germline genomes. We show that many translocation-prone pairs of regions genome-wide, including the cancer translocation partners BCR-ABL and MYC-IGH, display elevated Hi-C contact frequencies in normal human cells. Considering tissue specificity, we find that translocation breakpoints reported in human hematologic malignancies have higher Hi-C contact frequencies in lymphoid cells than those reported in sarcomas and epithelial tumors. However, translocations from multiple tissue types show significant correlation with Hi-C contact frequencies, suggesting that both tissue-specific and universal features of chromatin structure contribute to chromosomal alterations. Our results demonstrate that three-dimensional genome architecture shapes the landscape of rearrangements directly observed in human disease and establish Hi-C as a key method for dissecting these effects.


Subject(s)
Computational Biology/methods , Genome, Human , Translocation, Genetic , Chromosome Breakpoints , Chromosomes/chemistry , Chromosomes/genetics , Humans , Neoplasms/genetics , Organ Specificity/genetics
19.
Nat Genet ; 43(8): 801-5, 2011 Jul 24.
Article in English | MEDLINE | ID: mdl-21775993

ABSTRACT

Noncoding variants at human chromosome 9p21 near CDKN2A and CDKN2B are associated with type 2 diabetes, myocardial infarction, aneurysm, vertical cup disc ratio and at least five cancers. Here we compare approaches to more comprehensively assess genetic variation in the region. We carried out targeted sequencing at high coverage in 47 individuals and compared the results to pilot data from the 1000 Genomes Project. We imputed variants into type 2 diabetes and myocardial infarction cohorts directly from targeted sequencing, from a genotyped reference panel derived from sequencing and from 1000 Genomes Project low-coverage data. Polymorphisms with frequency >5% were captured well by all strategies. Imputation of intermediate-frequency polymorphisms required a higher density of tag SNPs in disease samples than is available on first-generation genome-wide association study (GWAS) arrays. Our association analyses identified more comprehensive sets of variants showing equivalent statistical association with type 2 diabetes or myocardial infarction, but did not identify stronger associations than the original GWAS signals.


Subject(s)
Chromosome Mapping , Chromosomes, Human, Pair 9/genetics , Diabetes Mellitus, Type 2/genetics , Genetic Variation/genetics , Genome-Wide Association Study , Myocardial Infarction/genetics , Polymorphism, Single Nucleotide/genetics , Genome, Human , Haplotypes/genetics , Humans
20.
Am J Hum Genet ; 88(2): 183-92, 2011 Feb 11.
Article in English | MEDLINE | ID: mdl-21310275

ABSTRACT

Assessing the significance of novel genetic variants revealed by DNA sequencing is a major challenge to the integration of genomic techniques with medical practice. Many variants remain difficult to classify by traditional genetic methods. Computational methods have been developed that could contribute to classifying these variants, but they have not been properly validated and are generally not considered mature enough to be used effectively in a clinical setting. We developed a computational method for predicting the effects of missense variants detected in patients with hypertrophic cardiomyopathy (HCM). We used a curated clinical data set of 74 missense variants in six genes associated with HCM to train and validate an automated predictor. The predictor is based on support vector regression and uses phylogenetic and structural features specific to genes involved in HCM. Ten-fold cross validation estimated our predictor's sensitivity at 94% (95% confidence interval: 83%-98%) and specificity at 89% (95% confidence interval: 72%-100%). This corresponds to an odds ratio of 10 for a prediction of pathogenic (95% confidence interval: 4.0-infinity), or an odds ratio of 9.9 for a prediction of benign (95% confidence interval: 4.6-21). Coverage (proportion of variants for which a prediction was made) was 57% (95% confidence interval: 49%-64%). This performance exceeds that of existing methods that are not specifically designed for HCM. The accuracy of this predictor provides support for the clinical use of automated predictions alongside family segregation and population frequency data in the interpretation of new missense variants and suggests future development of similar tools for other diseases.


Subject(s)
Cardiomyopathy, Hypertrophic/genetics , Computational Biology , Genetic Variation/genetics , Mutation, Missense/genetics , Nuclear Proteins/genetics , Genetic Predisposition to Disease , Humans