ABSTRACT
BACKGROUND: Crohn's disease (CD) is one of the two main types of inflammatory bowel disease affecting the gastrointestinal tract. Its heritability has been estimated at 0.75. Several genes linked to CD risk have been identified through linkage-based studies, candidate gene association studies, and, more recently, genome-wide association studies (GWAS). Nevertheless, to our knowledge, a compendium of all the genes that have been associated with CD is lacking. METHODS: We conducted functional analyses of a gene set generated from a systematic review in which genes potentially related to CD were analyzed and classified according to the genetic evidence reported and their putative biological function. For this, we retrieved and analyzed 2496 abstracts comprising 1067 human genes plus 22 publications regarding 133 genes from the GWAS Catalog. Each gene was then curated and categorized according to the type of evidence associating it with Crohn's disease. RESULTS: We identified 126 genes associated with Crohn's disease risk by specific experiments. Additionally, 71 genes were associated through GWAS alone, 18 with treatment response, 41 with disease complications, and 81 with related diseases. Bioinformatic analysis of the 126 genes supports their importance in Crohn's disease and highlights genes associated with specific aspects such as symptoms, drugs, and comorbidities. Importantly, most genes were not included in commercial genetic panels, suggesting that Crohn's disease is genetically underdiagnosed. CONCLUSIONS: We identified a total of 126 genes from PubMed and 71 from GWAS that showed evidence of association with diagnosis, 18 with treatment response, and 41 with disease complications in Crohn's disease. This prioritized gene catalog can be explored at http://victortrevino.bioinformatics.mx/CrohnDisease .
Subjects
Crohn Disease , Inflammatory Bowel Diseases , Computational Biology , Crohn Disease/diagnosis , Genome-Wide Association Study , Humans
ABSTRACT
BACKGROUND: Crohn's disease (CD) is a type of inflammatory bowel disease (IBD) that affects the gastrointestinal tract with diverse symptoms. At present, genome-wide association studies (GWAS) have discovered more than 140 genetic loci associated with CD from several datasets. Using the usual univariate GWAS methods, researchers have discovered common variants with small effects. Univariate methods assume independence among the variants and therefore miss subtle combinatorial signals. Multivariate approaches have improved risk prediction and have complemented univariate methods for elucidating the etiology of complex traits and potential novel associations. However, the current multivariate models for CD have been assessed on three datasets (published from 2006 to 2008) under unrelated methodological settings, showing a broad performance spectrum. Notably, these multivariate studies do not analyze potential novel variants. Here, we aimed to perform a robust multivariate analysis of a CD dataset different from the one commonly used, and we used the information yielded by the models to determine whether they could provide additional information about potential novel variants of CD. METHODS: We compared different multivariate methods and models, LASSO (least absolute shrinkage and selection operator), XGBoost, random forest (RF), Bootstrap stage-wise model selection (BSWiMS), and LDpred, using a strict random subsampling approach to predict CD risk from a recent GWAS dataset, that of the United Kingdom IBD Genetics Consortium (UKIBDGC), made available in 2017, which had not been used for CD prediction studies. In addition, we assessed the effect of common strategies for increasing and decreasing the number of single-nucleotide polymorphism (SNP) markers (genotype imputation and linkage disequilibrium (LD)-clumping, respectively).
RESULTS: We found that the LDpred model without any imputation was the best of all the tested models for predicting CD risk (area under the receiver operating characteristic curve (AUROC) = 0.667 ± 0.024) in this dataset. We validated the best models using a second dataset (National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) IBD Genetics Consortium, which was previously used in CD prediction studies), in which LDpred was also the best method with a similar performance (AUROC = 0.634 ± 0.009). Based on the importance of the variants yielded by the multivariate models, we identified a previously unnoticed region within chromosome 6, tagged by SNP rs4945943; this region was close to the gene MARCKS, which appeared to contribute to CD risk. CONCLUSIONS: This research is the first multivariate prediction analysis applied to the UKIBDGC dataset. Our robust multivariate analysis enabled us to identify a candidate variant that may contribute to CD risk. Multivariate methods are valuable tools for identifying genes that contribute to disease risk.
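The abstract above reports performance as a mean AUROC with a spread across random subsamples. The following is a minimal sketch of that evaluation scheme only, not the authors' pipeline: the cohort is synthetic, the additive scoring rule in `fit_weights` is a toy stand-in for LASSO, LDpred and the other models named in the abstract, and every function name and parameter value here is an assumption made for illustration.

```python
import math
import random
import statistics

def auroc(scores, labels):
    """Probability that a random case scores above a random control (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def simulate(n, m, causal, rng):
    """Synthetic cohort (hypothetical): m SNPs coded 0/1/2; only `causal` indices affect risk."""
    cohort = []
    for _ in range(n):
        g = [(rng.random() < 0.3) + (rng.random() < 0.3) for _ in range(m)]
        logit = 0.8 * (sum(g[j] for j in causal) - 0.6 * len(causal))
        y = int(rng.random() < 1 / (1 + math.exp(-logit)))
        cohort.append((g, y))
    return cohort

def fit_weights(train):
    """Toy additive score: per-SNP weight = mean allele count in cases minus controls."""
    m = len(train[0][0])
    cases = [g for g, y in train if y == 1]
    ctrls = [g for g, y in train if y == 0]
    return [
        statistics.fmean([g[j] for g in cases]) - statistics.fmean([g[j] for g in ctrls])
        for j in range(m)
    ]

def subsample_auroc(cohort, repeats, test_frac, rng):
    """Repeated random subsampling: reshuffle, refit on the training part, score the held-out part."""
    n_test = int(len(cohort) * test_frac)
    results = []
    for _ in range(repeats):
        shuffled = cohort[:]
        rng.shuffle(shuffled)
        test, train = shuffled[:n_test], shuffled[n_test:]
        w = fit_weights(train)
        scores = [sum(wj * gj for wj, gj in zip(w, g)) for g, _ in test]
        results.append(auroc(scores, [y for _, y in test]))
    return results

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    cohort = simulate(n=400, m=20, causal=(0, 1, 2), rng=rng)
    aucs = subsample_auroc(cohort, repeats=20, test_frac=0.3, rng=rng)
    print(f"AUROC = {statistics.fmean(aucs):.3f} ± {statistics.stdev(aucs):.3f}")
```

The weights are refit inside every repeat so that no held-out individual influences the model that scores it. That separation is what makes the mean ± spread an honest estimate of out-of-sample performance.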
Subjects
Crohn Disease , Inflammatory Bowel Diseases , Crohn Disease/genetics , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study , Humans , Polymorphism, Single Nucleotide/genetics
ABSTRACT
BACKGROUND: Late-Onset Alzheimer's Disease (LOAD) is a leading form of dementia. There is no effective cure for LOAD, so treatment efforts depend on preventive cognitive therapies, which stand to benefit from a timely estimate of the risk of developing the disease. Fortunately, a growing number of Machine Learning methods that are well positioned to address this challenge are becoming available. RESULTS: We conducted systematic comparisons of representative Machine Learning models for predicting LOAD from genetic variation data provided by the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort. Our experimental results demonstrate that the best models tested achieved an area under the ROC curve of ~72%. CONCLUSIONS: Machine learning models are promising alternatives for estimating the genetic risk of LOAD. Systematic machine learning model selection also provides the opportunity to identify new genetic markers potentially associated with the disease.
Subjects
Alzheimer Disease/genetics , Age of Onset , Aged , Benchmarking , Cohort Studies , Female , Genomics , Humans , Machine Learning , Male , Neuroimaging/methods , ROC Curve
ABSTRACT
To identify genetic variants influencing bone mineral density (BMD) in the Mexican-Mestizo population, we performed a GWAS of BMD at the femoral neck (FN) and lumbar spine (LS) in Mexican-Mestizo postmenopausal women. In the discovery sample, 300,000 SNPs were genotyped in a cohort of 411 postmenopausal women, and seven SNPs were analyzed in the replication cohort (n = 420). The combined results of a meta-analysis of the discovery and replication samples identified two loci, RMND1 (rs6904364, P = 2.77 × 10⁻⁴) and CCDC170 (rs17081341, P = 1.62 × 10⁻⁵), associated with FN BMD. We also compared our results with those of the Genetic Factors for Osteoporosis (GEFOS) Consortium meta-analysis. The comparison revealed two loci previously reported in the GEFOS meta-analysis, SOX6 (rs7128738) and PKDCC (rs11887431), associated with FN and LS BMD, respectively, in our study population. Interestingly, rs17081341, which is rare in Caucasians (minor allele frequency < 0.03), was found at high frequency in our population, which suggests that this association could be specific to non-Caucasian populations. In conclusion, this first pilot Mexican GWA study of BMD confirmed previously identified loci and also demonstrated the importance of studying variability in diverse and/or specific populations.
ABSTRACT
Type 2 diabetes (T2D) affects more than 415 million people worldwide, and its costs to the health care system continue to rise. To identify common or rare genetic variation with potential therapeutic implications for T2D, we analyzed and replicated genome-wide protein coding variation in a total of 8,227 individuals with T2D and 12,966 individuals without T2D of Latino descent. We identified a novel genetic variant in the IGF2 gene associated with ~20% reduced risk for T2D. This variant, which has an allele frequency of 17% in the Mexican population but is rare in Europe, prevents splicing between IGF2 exons 1 and 2. We show in vitro and in human liver and adipose tissue that the variant is associated with a specific, allele-dosage-dependent reduction in the expression of IGF2 isoform 2. In individuals who do not carry the protective allele, expression of IGF2 isoform 2 in adipose is positively correlated with both incidence of T2D and increased plasma glycated hemoglobin in individuals without T2D, providing support that the protective effects are mediated by reductions in IGF2 isoform 2. Broad phenotypic examination of carriers of the protective variant revealed no association with other disease states or impaired reproductive health. These findings suggest that reducing IGF2 isoform 2 expression in relevant tissues has potential as a new therapeutic strategy for T2D, even beyond the Latin American population, with no major adverse effects on health or reproduction.
Subjects
Diabetes Mellitus, Type 2/genetics , Insulin-Like Growth Factor II/metabolism , RNA Splice Sites/genetics , Adipose Tissue , Cell Line , Gene Expression Regulation/physiology , Genetic Variation , Genotype , Humans , Insulin-Like Growth Factor II/genetics , Liver , Mexican Americans/genetics , Mexico , Protein Isoforms , Stem Cells , White People
ABSTRACT
IMPORTANCE: Latino populations have one of the highest prevalences of type 2 diabetes worldwide. OBJECTIVES: To investigate the association between rare protein-coding genetic variants and prevalence of type 2 diabetes in a large Latino population and to explore potential molecular and physiological mechanisms for the observed relationships. DESIGN, SETTING, AND PARTICIPANTS: Whole-exome sequencing was performed on DNA samples from 3756 Mexican and US Latino individuals (1794 with type 2 diabetes and 1962 without diabetes) recruited from 1993 to 2013. One variant was further tested for allele frequency and association with type 2 diabetes in large multiethnic data sets of 14,276 participants and characterized in experimental assays. MAIN OUTCOME AND MEASURES: Prevalence of type 2 diabetes. Secondary outcomes included age of onset, body mass index, and effect on protein function. RESULTS: A single rare missense variant (c.1522G>A [p.E508K]) was associated with type 2 diabetes prevalence (odds ratio [OR], 5.48; 95% CI, 2.83-10.61; P = 4.4 × 10⁻⁷) in hepatocyte nuclear factor 1-α (HNF1A), the gene responsible for maturity onset diabetes of the young type 3 (MODY3). This variant was observed in 0.36% of participants without type 2 diabetes and 2.1% of participants with it. In multiethnic replication data sets, the p.E508K variant was seen only in Latino patients (n = 1443 with type 2 diabetes and 1673 without it) and was associated with type 2 diabetes (OR, 4.16; 95% CI, 1.75-9.92; P = .0013). In experimental assays, HNF-1A protein encoding the p.E508K mutant demonstrated reduced transactivation activity of its target promoter compared with a wild-type protein. In our data, carriers and noncarriers of the p.E508K mutation with type 2 diabetes had no significant differences in compared clinical characteristics, including age at onset.
The mean (SD) age for carriers was 45.3 years (11.2) vs 47.5 years (11.5) for noncarriers (P = .49) and the mean (SD) BMI for carriers was 28.2 (5.5) vs 29.3 (5.3) for noncarriers (P = .19). CONCLUSIONS AND RELEVANCE: Using whole-exome sequencing, we identified a single low-frequency variant in the MODY3-causing gene HNF1A that is associated with type 2 diabetes in Latino populations and may affect protein function. This finding may have implications for screening and therapeutic modification in this population, but additional studies are required.
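The odds ratios, confidence intervals, and P values reported in this abstract can be cross-checked against one another. The sketch below assumes the intervals are Wald-type, that is, symmetric on the log-odds scale under a normal approximation. The abstract does not state how they were computed, and the function name `wald_from_ci` is introduced here purely for illustration.

```python
import math

Z_975 = 1.959964  # normal quantile for a two-sided 95% interval

def wald_from_ci(odds_ratio, ci_low, ci_high):
    """Recover the log-OR standard error, z statistic, and two-sided p from an OR and its 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_975)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p

# Values copied from the abstract; nothing else is assumed about the data.
reported = {
    "discovery (OR 5.48, 95% CI 2.83-10.61)": (5.48, 2.83, 10.61),
    "replication (OR 4.16, 95% CI 1.75-9.92)": (4.16, 1.75, 9.92),
}

for label, (or_value, low, high) in reported.items():
    se, z, p = wald_from_ci(or_value, low, high)
    print(f"{label}: SE = {se:.3f}, z = {z:.2f}, p = {p:.2e}")
```

Under that assumption, the recovered p-values land close to the reported P = 4.4 × 10⁻⁷ and P = .0013, which suggests the reported figures are internally consistent. A large discrepancy would instead point to a different interval method or a transcription error.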
Subjects
Diabetes Mellitus, Type 2/genetics , Hepatocyte Nuclear Factor 1-alpha/genetics , Adult , Age of Onset , Aged , Female , Genotype , Hispanic or Latino/genetics , Humans , Male , Mexico , Middle Aged , Mutation, Missense , Sequence Analysis, DNA , United States
ABSTRACT
Mexico harbors great cultural and ethnic diversity, yet fine-scale patterns of human genome-wide variation from this region remain largely uncharacterized. We studied genomic variation within Mexico from over 1000 individuals representing 20 indigenous and 11 mestizo populations. We found striking genetic stratification among indigenous populations within Mexico at varying degrees of geographic isolation. Some groups were as differentiated as Europeans are from East Asians. Pre-Columbian genetic substructure is recapitulated in the indigenous ancestry of admixed mestizo individuals across the country. Furthermore, two independently phenotyped cohorts of Mexicans and Mexican Americans showed a significant association between subcontinental ancestry and lung function. Thus, accounting for fine-scale ancestry patterns is critical for medical and population genetic studies within Mexico, in Mexican-descent populations, and likely in many other populations worldwide.