1.
Cancer Epidemiol Biomarkers Prev ; 29(2): 427-433, 2020 02.
Article in English | MEDLINE | ID: mdl-31748258

ABSTRACT

BACKGROUND: Obesity is a major risk factor for esophageal adenocarcinoma (EA) and its precursor Barrett's esophagus (BE). Research suggests that individuals with high genetic risk for obesity have a higher BE/EA risk. To facilitate understanding of biological factors that lead to progression from BE to EA, the present study investigated the shared genetic background of BE/EA and obesity-related traits. METHODS: Cross-trait linkage disequilibrium score regression was applied to summary statistics from genome-wide association meta-analyses on BE/EA and on obesity traits. Body mass index (BMI) was used as a proxy for general obesity, and waist-to-hip ratio (WHR) for abdominal obesity. For single marker analyses, all genome-wide significant risk alleles for BMI and WHR were compared with summary statistics of the BE/EA meta-analyses. RESULTS: Sex-combined analyses revealed a significant genetic correlation between BMI and BE/EA (rg = 0.13, P = 2 × 10⁻⁴) and an rg of 0.12 between WHR and BE/EA (P = 1 × 10⁻²). Sex-specific analyses revealed a pronounced genetic correlation between BMI and EA in females (rg = 0.17, P = 1.2 × 10⁻³), and WHR and EA in males (rg = 0.18, P = 1.51 × 10⁻²). On the single marker level, significant enrichment of concordant effects was observed for BMI and BE/EA risk variants (P = 8.45 × 10⁻³) and WHR and BE/EA risk variants (P = 2 × 10⁻²). CONCLUSIONS: Our study provides evidence for sex-specific genetic correlations that might reflect specific biological mechanisms. The data demonstrate that shared genetic factors are particularly relevant in progression from BE to EA. IMPACT: Our study quantifies the genetic correlation between BE/EA and obesity. Further research is now warranted to elucidate these effects and to understand the shared pathophysiology.


Subject(s)
Adenocarcinoma/genetics , Barrett Esophagus/genetics , Esophageal Neoplasms/genetics , Obesity/genetics , Quantitative Trait Loci , Adenocarcinoma/pathology , Barrett Esophagus/pathology , Body Mass Index , Disease Progression , Esophageal Neoplasms/pathology , Female , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Linkage Disequilibrium , Male , Meta-Analysis as Topic , Polymorphism, Single Nucleotide , Risk Assessment , Risk Factors , Sex Factors , Waist-Hip Ratio
2.
PLoS One ; 13(10): e0205895, 2018.
Article in English | MEDLINE | ID: mdl-30379966

ABSTRACT

Bipolar disorder (BD) is a major psychiatric illness affecting around 1% of the global population. BD is characterized by recurrent manic and depressive episodes, and has an estimated heritability of around 70%. Research has identified the first BD susceptibility genes. However, the underlying pathways and regulatory networks remain largely unknown. Research suggests that the cumulative impact of common alleles with small effects explains only around 25-38% of the phenotypic variance for BD. A plausible hypothesis therefore is that rare, high penetrance variants may contribute to BD risk. The present study investigated the role of rare, nonsynonymous, and potentially functional variants via whole exome sequencing in 15 BD cases from two large, multiply affected families from Cuba. The high prevalence of BD in these pedigrees renders them promising in terms of the identification of genetic risk variants with large effect sizes. In addition, SNP array data were used to calculate polygenic risk scores for affected and unaffected family members. After correction for multiple testing, no significant increase in polygenic risk scores for common, BD-associated genetic variants was found in BD cases compared to healthy relatives. Exome sequencing identified a total of 17 rare and potentially damaging variants in 17 genes. The identified variants were shared by all investigated BD cases in the respective pedigree. The most promising variant was located in the gene SERPING1 (p.L349F), which has been reported previously as a genome-wide significant risk gene for schizophrenia. The present data suggest novel candidate genes for BD susceptibility, and may facilitate the discovery of disease-relevant pathways and regulatory networks.


Subject(s)
Bipolar Disorder/genetics , Complement C1 Inhibitor Protein/genetics , Exome , Gene Regulatory Networks , Genetic Predisposition to Disease , Polymorphism, Single Nucleotide , Alleles , Bipolar Disorder/diagnosis , Bipolar Disorder/physiopathology , Cuba , Family , Female , Gene Expression , Genome-Wide Association Study , Humans , Male , Pedigree , Penetrance , Risk , Exome Sequencing
3.
Am J Respir Cell Mol Biol ; 59(5): 614-622, 2018 11.
Article in English | MEDLINE | ID: mdl-29949718

ABSTRACT

Genome-wide association studies have identified common variants associated with chronic obstructive pulmonary disease (COPD). Whole-genome sequencing (WGS) offers comprehensive coverage of the entire genome, as compared with genotyping arrays or exome sequencing. We hypothesized that WGS in subjects with severe COPD and smoking control subjects with normal pulmonary function would allow us to identify novel genetic determinants of COPD. We sequenced 821 patients with severe COPD and 973 control subjects from the COPDGene and Boston Early-Onset COPD studies, including both non-Hispanic white and African American individuals. We performed single-variant and grouped-variant analyses, and in addition, we assessed the overlap of variants between sequencing- and array-based imputation. Our most significantly associated variant was in a known region near HHIP (combined P = 1.6 × 10⁻⁹); additional variants approaching genome-wide significance included previously described regions in CHRNA5, TNS1, and SERPINA6/SERPINA1 (the latter in African American individuals). None of our associations were clearly driven by rare variants, and we found minimal evidence of replication of genes identified by previously reported smaller sequencing studies. With WGS, we identified more than 20 million new variants, not seen with imputation, including more than 10,000 of potential importance in previously identified COPD genome-wide association study regions. WGS in severe COPD identifies a large number of potentially important functional variants, with the strongest associations being in known COPD risk loci, including HHIP and SERPINA1. Larger sample sizes will be needed to identify associated variants in novel regions of the genome.


Subject(s)
Genome-Wide Association Study , Lung/metabolism , Polymorphism, Single Nucleotide , Pulmonary Disease, Chronic Obstructive/genetics , Severity of Illness Index , Whole Genome Sequencing/methods , Black or African American/statistics & numerical data , Aged , Case-Control Studies , Cohort Studies , Female , Genetic Predisposition to Disease , Humans , Lung/pathology , Male , Middle Aged , Pulmonary Disease, Chronic Obstructive/ethnology , White People/statistics & numerical data
4.
Biostatistics ; 19(3): 295-306, 2018 07 01.
Article in English | MEDLINE | ID: mdl-28968646

ABSTRACT

To quantify polygenic effects, i.e. undetected genetic effects, in large-scale association studies, we propose a generalized estimating equation (GEE) based estimation framework. We develop a marginal model for single-variant association test statistics of complex diseases that generalizes existing approaches such as LD Score regression and that is applicable to population-based designs, to family-based designs or to arbitrary combinations of both. We extend the standard GEE approach so that the parameters of the proposed marginal model can be estimated based on working-correlation/linkage-disequilibrium (LD) matrices from external reference panels. Our method achieves substantial efficiency gains over standard approaches, while it is robust against misspecification of the LD structure, i.e. the LD structure of the reference panel can differ substantially from the true LD structure in the study population. In simulation studies and in applications to population-based and family-based studies, we illustrate the features of the proposed GEE framework. Our results suggest that our approach can be up to 100% more efficient than existing methodology.
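For orientation, the LD Score regression model that the abstract names as a special case is usually written as below. This is the standard published form of that model, not the GEE marginal model itself, which the abstract does not spell out. Here χ²_j is the association statistic of variant j, ℓ_j its LD score, N the sample size, M the number of variants, h² the heritability explained by those variants, and a a term capturing confounding bias.

```latex
\mathbb{E}\left[\chi^2_j \mid \ell_j\right] = \frac{N h^2}{M}\,\ell_j + N a + 1
```

Regressing the statistics on the LD scores therefore yields a slope that is proportional to h² and an intercept that absorbs confounding.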


Subject(s)
Biostatistics/methods , Genome-Wide Association Study/methods , Linkage Disequilibrium , Models, Statistical , Computer Simulation , Humans , Mental Disorders/genetics , Regression Analysis
5.
Genet Epidemiol ; 42(1): 123-126, 2018 02.
Article in English | MEDLINE | ID: mdl-29159827

ABSTRACT

For family-based association studies, Horvath et al. proposed an algorithm for the association analysis between haplotypes and arbitrary phenotypes when the phase of the haplotypes is unknown, that is, genotype data is given. Their approach to haplotype analysis maintains the original features of the TDT/FBAT-approach, that is, complete robustness against genetic confounding and misspecification of the phenotype. The algorithm has been implemented in the FBAT and PBAT software package and has been used in numerous substantive manuscripts. Here, we propose a simplification of the original algorithm that maintains the original approach but reduces the computational burden of the approach substantially and gives valuable insights regarding the conditional distribution. With the modified algorithm, the application to whole-genome sequencing (WGS) studies becomes feasible; for example, in sliding window approaches or spatial-clustering approaches. The reduction of the computational burden that our modification provides is especially dramatic when both parental genotypes are missing. For example, for eight variants and 441 nuclear families with mostly offspring-only families, in a WGS study at the APOE locus, the running time decreased from approximately 21 hr for the original algorithm to 0.11 sec after our modification.


Subject(s)
Algorithms , Haplotypes , Nuclear Family , Phenotype , Apolipoproteins E/genetics , Cluster Analysis , Female , Humans , Male , Models, Genetic , Time Factors , Whole Genome Sequencing
6.
Immunogenetics ; 69(6): 359-369, 2017 06.
Article in English | MEDLINE | ID: mdl-28386644

ABSTRACT

Mast cell activation syndrome (MCAS) and systemic mastocytosis (SM) are two clinical systemic mast cell activation disease variants. Few studies to date have investigated the genetic basis of MCAS. The present study had two aims. The first was to investigate whether peripheral blood leukocytes from MCAS patients also harbor somatic mutations in genes implicated in SM, using next-generation sequencing (NGS) technology and a relatively large MCAS cohort. We also addressed the question of whether some of the mutations previously reported as somatic are in fact germline mutations. The second was to identify germline mutations of relevance to MCAS pathogenesis. Here, mutation frequency in the present MCAS cohort was compared to that in public and in-house databases in the case of frequent variants, and co-segregation was investigated in multiply affected families in the case of rare variants (allele frequency < 1%). MCAS diagnoses were assigned according to current criteria. Twenty-five candidate genes were selected on the basis of published findings for SM. NGS was performed using a 76 kbp custom-designed Agilent SureSelect Target Enrichment and an Illumina HiSeq 2000 2 × 100 bp sequencing run. NGS revealed 67 germline mutations. No somatic mutations were detected. None of the germline mutations showed unequivocal association with MCAS. Failure to detect somatic mutations was probably attributable to the dilution of mutated mast cell DNA in normal leukocyte DNA. The present exploratory association findings suggest that some of the detected germline mutations may be functionally relevant and explain familial aggregation. Independent replication studies are therefore warranted.


Subject(s)
Leukocytes/metabolism , Mastocytosis/genetics , Mutation , Adolescent , Adult , Aged , Aged, 80 and over , Alleles , Amino Acid Substitution , Biomarkers , DNA Mutational Analysis , Female , Gene Frequency , Genome-Wide Association Study , Genomics/methods , Germ-Line Mutation , High-Throughput Nucleotide Sequencing , Humans , Male , Mastocytosis/diagnosis , Middle Aged , Pedigree , Phenotype , Polymorphism, Single Nucleotide , Syndrome , Young Adult
7.
Nat Commun ; 8: 14694, 2017 03 08.
Article in English | MEDLINE | ID: mdl-28272467

ABSTRACT

Male-pattern baldness (MPB) is a common and highly heritable trait characterized by androgen-dependent, progressive hair loss from the scalp. Here, we carry out the largest GWAS meta-analysis of MPB to date, comprising 10,846 early-onset cases and 11,672 controls from eight independent cohorts. We identify 63 MPB-associated loci (P < 5 × 10⁻⁸, METAL) of which 23 have not been reported previously. The 63 loci explain ∼39% of the phenotypic variance in MPB and highlight several plausible candidate genes (FGF5, IRF4, DKK2) and pathways (melatonin signalling, adipogenesis) that are likely to be implicated in the key pathophysiological features of MPB and may represent promising targets for the development of novel therapeutic options. The data provide molecular evidence that rather than being an isolated trait, MPB shares a substantial biological basis with numerous other human phenotypes and may deserve evaluation as an early prognostic marker, for example, for prostate cancer, sudden cardiac arrest and neurodegenerative disorders.


Subject(s)
Alopecia/genetics , 3-Oxo-5-alpha-Steroid 4-Dehydrogenase/genetics , Adipogenesis/genetics , Case-Control Studies , Fibroblast Growth Factor 5/genetics , Genetic Association Studies , Genome-Wide Association Study , Genotype , Humans , Intercellular Signaling Peptides and Proteins/genetics , Interferon Regulatory Factors/genetics , Male , Melatonin , Membrane Proteins/genetics , Phenotype , Signal Transduction/genetics , Trans-Activators/genetics
8.
Twin Res Hum Genet ; 20(3): 257-259, 2017 06.
Article in English | MEDLINE | ID: mdl-28345502

ABSTRACT

VEGAS (versatile gene-based association study) is a popular methodological framework to perform gene-based tests based on summary statistics from single-variant analyses. The approach incorporates linkage disequilibrium information from reference panels to account for the correlation of test statistics. The gene-based test can utilize three different types of tests. In 2015, the improved framework VEGAS2, using more detailed reference panels, was published. Both versions provide user-friendly web- and offline-based tools for the analysis. However, the implementation of the popular top-percentage test is erroneous in both versions. The p values provided by VEGAS2 are deflated/anti-conservative. Based on real data examples, we demonstrate that this can increase substantially the rate of false-positive findings and can lead to inconsistencies between different test options. We also provide code that allows the user of VEGAS to compute correct p values.


Subject(s)
Genetic Association Studies/statistics & numerical data , Linkage Disequilibrium/genetics , Polymorphism, Single Nucleotide/genetics , Data Interpretation, Statistical , Humans
9.
Bioinformatics ; 33(13): 1972-1979, 2017 Jul 01.
Article in English | MEDLINE | ID: mdl-28334167

ABSTRACT

MOTIVATION: In order to minimize the effects of genetic confounding on the analysis of high-throughput genetic association studies, e.g. (whole-genome) sequencing (WGS) studies, genome-wide association studies (GWAS), etc., we propose a general framework to assess and to test formally for genetic heterogeneity among study subjects. As the approach fully utilizes the recent ancestor information captured by rare variants, it is especially powerful in WGS studies. Even for relatively moderate sample sizes, the proposed testing framework is able to identify study subjects that are genetically too similar, e.g. cryptic relationships, or that are genetically too different, e.g. population substructure. The approach is computationally fast, enabling the application to whole-genome sequencing data, and straightforward to implement. RESULTS: Simulation studies illustrate the overall performance of our approach. In an application to the 1000 Genomes Project, we outline an analysis/cleaning pipeline that utilizes our approach to formally assess whether study subjects are related and whether population substructure is present. In the analysis of the 1000 Genomes Project data, our approach revealed subjects that are most likely related, but had previously passed standard qc-filters. AVAILABILITY AND IMPLEMENTATION: An implementation of our method, Similarity Test for Estimating Genetic Outliers (STEGO), is available in the R package stego from Github at https://github.com/dschlauch/stego . CONTACT: dschlauch@fas.harvard.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genetic Association Studies/methods , Genome, Human , Software , Whole Genome Sequencing/methods , Genetics, Population , Humans
10.
Genet Epidemiol ; 41(4): 332-340, 2017 05.
Article in English | MEDLINE | ID: mdl-28318110

ABSTRACT

For the association analysis of whole-genome sequencing (WGS) studies, we propose an efficient and fast spatial-clustering algorithm. Compared to existing analysis approaches for WGS data, that define the tested regions either by sliding or consecutive windows of fixed sizes along variants, a meaningful grouping of nearby variants into consecutive regions has the advantage that, compared to sliding window approaches, the number of tested regions is likely to be smaller. In comparison to consecutive, fixed-window approaches, our approach is likely to group nearby variants together. Given existing biological evidence that disease-associated mutations tend to physically cluster in specific regions along the chromosome, the identification of meaningful groups of nearby located variants could thus lead to a potential power gain for association analysis. Our algorithm defines consecutive genomic regions based on the physical positions of the variants, assuming an inhomogeneous Poisson process and groups together nearby variants. As parameters are estimated locally, the algorithm takes the differing variant density along the chromosome into account and provides locally optimal partitioning of variants into consecutive regions. An R-implementation of the algorithm is provided. We discuss the theoretical advances of our algorithm compared to existing, window-based approaches and show the performance and advantage of our introduced algorithm in a simulation study and by an application to Alzheimer's disease WGS data. Our analysis identifies a region in the ITGB3 gene that potentially harbors disease susceptibility loci for Alzheimer's disease. The region-based association signal of ITGB3 replicates in an independent data set and achieves formally genome-wide significance. Software Implementation: An implementation of the algorithm in R is available at: https://github.com/heidefier/cluster_wgs_data.
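The idea of cutting the chromosome into consecutive regions wherever variants stop clustering can be sketched as follows. This is a simplified stand-in, not the published algorithm: it assumes exponential waiting times between variants with a rate estimated from neighbouring gaps, and the function name `partition_variants` and the parameters `window` and `alpha` are hypothetical choices made for illustration.

```python
import math

def partition_variants(positions, window=5, alpha=0.05):
    """Split variant positions into consecutive regions (illustrative only).

    A boundary is placed before a variant when the gap to its predecessor is
    unusually large under an exponential model whose mean is estimated from
    the neighbouring gaps, a local stand-in for an inhomogeneous Poisson rate.
    Expects a non-empty list of positions.
    """
    positions = sorted(positions)
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    regions = [[positions[0]]]
    for k, gap in enumerate(gaps):
        lo, hi = max(0, k - window), min(len(gaps), k + window + 1)
        # Local estimate excludes the gap under test.
        neighbours = gaps[lo:k] + gaps[k + 1:hi]
        mean_gap = sum(neighbours) / len(neighbours) if neighbours else gap
        # P(gap >= observed) for exponential waiting times with mean mean_gap.
        tail = math.exp(-gap / mean_gap) if mean_gap > 0 else 1.0
        if tail < alpha:
            regions.append([])  # start a new region at this variant
        regions[-1].append(positions[k + 1])
    return regions

toy_positions = [100, 110, 120, 130, 1000, 1010, 1020, 1030]
print(partition_variants(toy_positions))
```

Because the mean gap is re-estimated around every position, dense and sparse stretches of the chromosome are judged against their own local variant density rather than a single genome-wide threshold.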


Subject(s)
Genome-Wide Association Study , Genome , Sequence Analysis, DNA , Algorithms , Alzheimer Disease/genetics , Cluster Analysis , Computer Simulation , Genomics , Humans , Models, Genetic , Software
11.
Alcohol Clin Exp Res ; 40(8): 1627-32, 2016 08.
Article in English | MEDLINE | ID: mdl-27374936

ABSTRACT

BACKGROUND: Common variants in the gene GATA binding protein 4 (GATA4) show association with alcohol dependence (AD). The aim of this study was to identify rare variants in GATA4 in order to elucidate the role of this gene in AD susceptibility. Identification of rare variants may provide a more complete picture of the allelic architecture at this risk locus. METHODS: Sanger sequencing of all 6 coding exons of GATA4 was performed in 528 patients and 517 controls. Four in silico prediction tools were used to determine the effect of a DNA variant on the amino acid sequence and protein function. Five variants were included in the replication step. Of these, 4 were successfully genotyped in our replication cohort of 655 patients and 1,501 controls. All patients fulfilled DSM-IV criteria for AD, and all individuals were of German descent. RESULTS: In the discovery step, 19 different heterozygous variants were identified. Four patient-specific and potentially functionally relevant variants were followed up. Only the variant S379S (c.1137C>T) remained patient specific (1/1,166 patients vs. 0/1,997 controls). None of the variants showed a statistically significant association with AD. CONCLUSIONS: The present study elucidated the role of GATA4 in AD susceptibility by identifying rare variants via Sanger sequencing and subsequent replication. Although novel patient-specific rare variants of GATA4 were identified, none received support in the independent replication step. However, given previous robust findings of association with common variants, GATA4 remains a promising candidate gene for AD.


Subject(s)
Alcoholism/diagnosis , Alcoholism/genetics , GATA4 Transcription Factor/genetics , Genetic Association Studies/methods , Genetic Variation/genetics , Adult , Cohort Studies , Female , Follow-Up Studies , Humans , Male , Middle Aged
12.
Bioinformatics ; 32(9): 1366-72, 2016 05 01.
Article in English | MEDLINE | ID: mdl-26722118

ABSTRACT

MOTIVATION: Population stratification is one of the major sources of confounding in genetic association studies, potentially causing false-positive and false-negative results. Here, we present a novel approach for the identification of population substructure in high-density genotyping data/next generation sequencing data. The approach exploits the co-appearances of rare genetic variants in individuals. The method can be applied to all available genetic loci and is computationally fast. Using sequencing data from the 1000 Genomes Project, the features of the approach are illustrated and compared to existing methodology (i.e. EIGENSTRAT). We examine the effects of different cutoffs for the minor allele frequency on the performance of the approach. We find that our approach works particularly well for genetic loci with very small minor allele frequencies. The results suggest that the inclusion of rare-variant data/sequencing data in our approach provides a much higher resolution picture of population substructure than can be obtained with existing methodology. Furthermore, in simulation studies, we find scenarios where our method was able to control the type 1 error more precisely and showed higher power. CONTACT: dmitry.prokopenko@uni-bonn.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
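A minimal sketch of what "co-appearance of rare variants" can mean in practice: for each pair of individuals, count the rare variants that both carry. The abstract does not give the authors' actual similarity statistic, so the function `rare_allele_sharing`, the genotype coding (minor-allele counts 0, 1, 2) and the `maf_cutoff` parameter are assumptions made for illustration only.

```python
from itertools import combinations

def rare_allele_sharing(genotypes, maf_cutoff=0.05):
    """Count, for every pair of individuals, the rare variants both carry.

    genotypes: one list per individual holding minor-allele counts (0, 1, 2),
    with one entry per variant. Returns {(i, j): shared_count} for i < j.
    """
    n = len(genotypes)
    n_variants = len(genotypes[0])
    # Minor allele frequency per variant: allele count / (2 * individuals).
    maf = [sum(g[v] for g in genotypes) / (2 * n) for v in range(n_variants)]
    # Keep variants that are observed at least once but stay below the cutoff.
    rare = [v for v in range(n_variants) if 0 < maf[v] <= maf_cutoff]
    sharing = {}
    for i, j in combinations(range(n), 2):
        sharing[(i, j)] = sum(
            1 for v in rare if genotypes[i][v] > 0 and genotypes[j][v] > 0
        )
    return sharing

toy_genotypes = [
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 0, 1],
    [0, 1, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
]
# A loose cutoff is used here only because the toy sample is tiny.
print(rare_allele_sharing(toy_genotypes, maf_cutoff=0.2))
```

Pairs that share many rare alleles are candidates for belonging to the same subpopulation; the common fourth variant in the toy data is ignored because it exceeds the frequency cutoff.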


Subject(s)
Genome , Animals , Computer Simulation , Gene Frequency , Genetic Association Studies , Genetic Variation , Genotype , High-Throughput Nucleotide Sequencing , Humans
13.
PLoS One ; 10(12): e0145152, 2015.
Article in English | MEDLINE | ID: mdl-26716445

ABSTRACT

As recombination events are not uniformly distributed along the human genome, the estimation of fine-scale recombination maps, e.g. in the HapMap Project, has been one of the major research endeavors over the last couple of years. For simulation studies, these estimates provide realistic reference scenarios to design future studies and to develop novel methodology. To achieve a feasible framework for the estimation of such recombination maps, existing methodology uses sample probabilities for a two-locus model with recombination, with recent advances allowing for computationally fast implementations. In this work, we extend the existing theoretical framework for the recombination rate estimation to the presence of population substructure. We show under which assumptions the existing methodology can still be applied. We illustrate our extension of the methodology by an extensive simulation study.


Subject(s)
Genome, Human/genetics , Recombination, Genetic/genetics , Computer Simulation , Genetics, Population/methods , HapMap Project , Humans , Models, Genetic , Regression Analysis
14.
PLoS One ; 10(6): e0130708, 2015.
Article in English | MEDLINE | ID: mdl-26098940

ABSTRACT

One of the main caveats of association studies is possible bias due to population stratification. Existing methods rely on model-based approaches like STRUCTURE and ADMIXTURE or on principal component analysis like EIGENSTRAT. Here we provide a novel visualization technique and describe the problem of population substructure from a graph-theoretical point of view. We group the sequenced individuals into triads, which depict the relational structure, on the basis of a predefined pairwise similarity measure. We then merge the triads into a network and apply community detection algorithms in order to identify homogeneous subgroups or communities, which can further be incorporated as covariates into logistic regression. We apply our method to populations from different continents in the 1000 Genomes Project and evaluate the type 1 error based on the empirical p-values. The application to 1000 Genomes data suggests that the network approach provides a very fine resolution of the underlying ancestral population structure. In addition, we show in simulations that, in the presence of discrete population structures, our approach maintains the type 1 error more precisely than existing approaches.


Subject(s)
Algorithms , Models, Genetic , Population/genetics , Humans , Polymorphism, Single Nucleotide
15.
Am J Med Genet B Neuropsychiatr Genet ; 168B(5): 354-62, 2015 Jul.
Article in English | MEDLINE | ID: mdl-26010163

ABSTRACT

Transcription factor 4 (TCF4) is one of the most robust of all reported schizophrenia risk loci and is supported by several genetic and functional lines of evidence. While numerous studies have implicated common genetic variation at TCF4 in schizophrenia risk, the role of rare, small-sized variants at this locus, such as single nucleotide variants and short indels, which are below the resolution of chip-based arrays, requires further exploration. The aim of the present study was to investigate the association between rare TCF4 sequence variants and schizophrenia. Exon-targeted resequencing was performed in 190 German schizophrenia patients. Six rare variants at the coding exons and flanking sequences of the TCF4 gene were identified, including two missense variants and one splice site variant. These six variants were then pooled with nine additional rare variants identified in 379 European participants of the 1000 Genomes Project, and all 15 variants were genotyped in an independent German sample (n = 1,808 patients; n = 2,261 controls). These data were then analyzed using six statistical methods developed for the association analysis of rare variants. No significant association (P < 0.05) was found. However, the results from our association and power analyses suggest that further research into the possible involvement of rare TCF4 sequence variants in schizophrenia risk is warranted by the assessment of larger cohorts with higher statistical power to identify rare variant associations.


Subject(s)
Basic Helix-Loop-Helix Leucine Zipper Transcription Factors/genetics , Genetic Predisposition to Disease , Genetic Variation , Schizophrenia/genetics , Transcription Factors/genetics , Female , Genotype , Humans , Male , Transcription Factor 4 , White People/genetics
16.
Genet Epidemiol ; 38(8): 714-21, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25250875

ABSTRACT

DNA methylation may represent an important contributor to the missing heritability described in complex trait genetics. However, technology to measure DNA methylation has outpaced statistical methods for analysis. Taking advantage of the recent finding that methylated sites cluster together, we propose a Spatial Clustering Method (SCM) to detect differentially methylated regions (DMRs) in the genome in case and control studies using spatial location information. This new method compares the distribution of distances in cases and controls between DNA methylation marks in the genomic region of interest. A statistic is computed based on these distances. A proper type I error rate is maintained, and statistical significance is evaluated using a permutation test. The effectiveness of the SCM we propose is evaluated by a simulation study. By simulating a simple disease model, we demonstrate that SCM has good power to detect DMRs associated with the disease. Finally, we applied the SCM to an exploratory analysis of chromosome 14 from a colorectal cancer data set and identified statistically significant genomic regions. Identification of these regions should lead to a better understanding of methylated sites and their contribution to disease. The SCM can be used as a reliable statistical method for the identification of DMRs associated with disease states in exploratory epigenetic analyses.
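The general shape of a distance-based permutation test like the one described can be sketched as follows. The abstract does not define the SCM statistic, so the statistic used here (the absolute difference between groups in each subject's mean distance between neighbouring methylated sites) is an assumed placeholder, and the function names are hypothetical.

```python
import random

def mean_adjacent_distance(sites):
    """Mean distance between neighbouring methylated positions of one subject.

    Expects at least two positions per subject.
    """
    sites = sorted(sites)
    gaps = [b - a for a, b in zip(sites, sites[1:])]
    return sum(gaps) / len(gaps)

def permutation_p_value(case_sites, control_sites, n_perm=2000, seed=0):
    """Permutation p-value for a case/control difference in mean distance."""
    values = [mean_adjacent_distance(s) for s in case_sites + control_sites]
    n_cases = len(case_sites)

    def statistic(vals):
        cases, controls = vals[:n_cases], vals[n_cases:]
        return abs(sum(cases) / len(cases) - sum(controls) / len(controls))

    observed = statistic(values)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        shuffled = values[:]
        rng.shuffle(shuffled)  # reassign case/control labels at random
        if statistic(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)

toy_cases = [[0, 10, 20, 30] for _ in range(5)]
toy_controls = [[0, 100, 200, 300] for _ in range(5)]
print(permutation_p_value(toy_cases, toy_controls))
```

Shuffling the group labels breaks any link between disease status and site spacing, so the proportion of shuffles that match or exceed the observed statistic estimates the p-value without distributional assumptions.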


Subject(s)
DNA Methylation , Chromosomes, Human, Pair 14 , Cluster Analysis , Colorectal Neoplasms/genetics , Genome, Human , Genomics/methods , Humans , Models, Genetic
17.
Birth Defects Res A Clin Mol Teratol ; 100(6): 493-8, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24706492

ABSTRACT

BACKGROUND: The genes Gremlin-1 (GREM1) and Noggin (NOG) are components of the bone morphogenetic protein 4 pathway, which has been implicated in craniofacial development. Both genes map to recently identified susceptibility loci (chromosomal region 15q13, 17q22) for nonsyndromic cleft lip with or without cleft palate (nsCL/P). The aim of the present study was to determine whether rare variants in either gene are implicated in nsCL/P etiology. METHODS: The complete coding regions, untranslated regions, and splice sites of GREM1 and NOG were sequenced in 96 nsCL/P patients and 96 controls of Central European ethnicity. Three burden and four nonburden tests were performed. Statistically significant results were followed up in a second case-control sample (n = 96, respectively). For rare variants observed in cases, segregation analyses were performed. RESULTS: In NOG, four rare sequence variants (minor allele frequency < 1%) were identified. Here, burden and nonburden analyses generated nonsignificant results. In GREM1, 33 variants were identified, 15 of which were rare. Of these, five were novel. Significant p-values were generated in three nonburden analyses. Segregation analyses revealed incomplete penetrance for all variants investigated. CONCLUSION: Our study did not provide support for NOG being the causal gene at 17q22. However, the observation of a significant excess of rare variants in GREM1 supports the hypothesis that this is the causal gene at chr. 15q13. Because no single causal variant was identified, future sequencing analyses of GREM1 should involve larger samples and the investigation of regulatory elements.


Subject(s)
Carrier Proteins/genetics , Cleft Lip/genetics , Cleft Palate/genetics , Intercellular Signaling Peptides and Proteins/genetics , Alleles , Bone Morphogenetic Protein 4/genetics , Bone Morphogenetic Protein 4/metabolism , Carrier Proteins/metabolism , Case-Control Studies , Chromosomes, Human, Pair 15 , Chromosomes, Human, Pair 17 , Cleft Lip/epidemiology , Cleft Lip/metabolism , Cleft Palate/epidemiology , Cleft Palate/metabolism , DNA Mutational Analysis , Female , Gene Expression Regulation, Developmental , Gene Frequency , Genetic Loci , Genome-Wide Association Study , Germany/epidemiology , Humans , Intercellular Signaling Peptides and Proteins/metabolism , Male , Open Reading Frames , Penetrance , Signal Transduction , Untranslated Regions , White People
18.
Bioinformatics ; 30(2): 157-64, 2014 Jan 15.
Article in English | MEDLINE | ID: mdl-24262215

ABSTRACT

MOTIVATION: For samples of unrelated individuals, we propose a general analysis framework in which hundreds of thousands of genetic loci can be tested simultaneously for association with complex phenotypes. The approach is built on spatial-clustering methodology, assuming that genetic loci that are associated with the target phenotype cluster in certain genomic regions. In contrast to standard methodology for multilocus analysis, which has focused on the dimension reduction of the data, our multilocus association-clustering test profits from the availability of large numbers of genetic loci by detecting clusters of loci that are associated with the phenotype. RESULTS: The approach is computationally fast and powerful, enabling the simultaneous association testing of large genomic regions. Even the entire genome or certain chromosomes can be tested simultaneously. Using simulation studies, the properties of the approach are evaluated. In an application to a genome-wide association study for chronic obstructive pulmonary disease, we illustrate the practical relevance of the proposed method by simultaneously testing all genotyped loci of the genome-wide association study and by testing each chromosome individually. Our findings suggest that statistical methodology that incorporates spatial-clustering information will be especially useful in whole-genome sequencing studies in which millions or billions of base pairs are recorded and grouped by genomic regions or genes, and are tested jointly for association. AVAILABILITY AND IMPLEMENTATION: Implementation of the approach is available upon request.


Subject(s)
Genetic Loci , Genetic Predisposition to Disease , Genome, Human , Genome-Wide Association Study , Genomics/methods , Polymorphism, Single Nucleotide/genetics , Pulmonary Disease, Chronic Obstructive/genetics , Case-Control Studies , Chromosomes, Human/genetics , Cluster Analysis , Computer Simulation , Genetic Markers , Genotype , Humans , Phenotype
19.
Bioinformatics ; 28(23): 3027-33, 2012 Dec 01.
Article in English | MEDLINE | ID: mdl-23044548

ABSTRACT

MOTIVATION: For the analysis of rare variants in sequence data, numerous approaches have been suggested. Fixed and flexible threshold approaches collapse the rare variant information of a genomic region into a test statistic with reduced dimensionality. Alternatively, the rare variant information can be combined in statistical frameworks that are based on suitable regression models, machine learning, etc. Although the existing approaches provide powerful tests that can incorporate information on allele frequencies and prior biological knowledge, differences in the spatial clustering of rare variants between cases and controls cannot be incorporated. Based on the assumption that deleterious variants and protective variants cluster or occur in different parts of the genomic region of interest, we propose a testing strategy for rare variants that builds on spatial cluster methodology and that guides the identification of the biologically relevant segments of the region. Our approach does not require any assumption about the directions of the genetic effects. RESULTS: In simulation studies, we assess the power of the clustering approach and compare it with existing methodology. Our simulation results suggest that the clustering approach for rare variants is well powered, even in situations that are ideal for standard methods. The efficiency of our spatial clustering approach is not affected by the presence of rare variants that have opposite effect size directions. An application to a sequencing study for non-syndromic cleft lip with or without cleft palate (NSCL/P) demonstrates its practical relevance. The proposed testing strategy is applied to a genomic region on chromosome 15q13.3 that was implicated in NSCL/P etiology in a previous genome-wide association study, and its results are compared with standard approaches. AVAILABILITY: Source code and documentation for the implementation in R will be provided online. Currently, the R implementation only supports genotype data. We are currently working on an extension for VCF files. CONTACT: heide.fier@googlemail.com.


Subject(s)
Alleles , Cleft Lip/genetics , Cleft Palate/genetics , Computer Simulation , Brain/abnormalities , Chromosomes, Human, Pair 15/genetics , Cluster Analysis , Gene Frequency , Genome-Wide Association Study , Genotype , Humans
20.
Birth Defects Res A Clin Mol Teratol ; 94(11): 925-33, 2012 Nov.
Article in English | MEDLINE | ID: mdl-23081944

ABSTRACT

BACKGROUND: Nonsyndromic cleft lip with or without cleft palate (NSCL/P) is one of the most common of all congenital anomalies, and has a multifactorial etiology involving both environmental and genetic factors. Recent genome-wide association studies (GWAS) identified strong association between a locus on chromosome 10q25.3 and NSCL/P in European samples. One gene at 10q25.3, the ventral anterior homeobox 1 (VAX1) gene, is considered a strong candidate gene for craniofacial malformations. The purpose of the present study was to provide further evidence that VAX1 is the causal gene at the 10q25.3 locus through identification of an excess of rare mutations in patients with NSCL/P. METHODS: The 5'UTR, complete coding regions, and adjacent splice sites of the two known VAX1 isoforms were sequenced in 384 patients with NSCL/P and 384 controls of Central European descent. Observed variants were investigated with respect to familial cosegregation or de novo occurrence, and in silico analyses were performed to identify putative effects on the transcript or protein level. RESULTS: Eighteen single-base variants were found, 15 of them rare and previously unreported. In the long VAX1 isoform, predicted functionally relevant variants were observed more often in NSCL/P cases, although this difference was not significant (p = 0.17). Analysis of family members demonstrated incomplete cosegregation in most pedigrees. CONCLUSION: Our data do not support the hypothesis that highly penetrant rare variants in VAX1 are a cause of NSCL/P. To determine whether VAX1 is the causative gene at 10q25.3, further research, in particular into the biologic function of its long isoform, is warranted. Birth Defects Research (Part A), 2012.


Subject(s)
Cleft Lip/genetics , Cleft Palate/genetics , Homeodomain Proteins/genetics , Mutation , Polymorphism, Single Nucleotide , Transcription Factors/genetics , White People , Alleles , Amino Acid Sequence , Case-Control Studies , Chromosomes, Human, Pair 10 , Cleft Lip/pathology , Cleft Palate/pathology , Female , Genetic Loci , Humans , Male , Molecular Sequence Data , Pedigree , Protein Isoforms/genetics , Sequence Analysis, DNA