1.
BMC Bioinformatics ; 25(1): 181, 2024 May 08.
Article in English | MEDLINE | ID: mdl-38720247

ABSTRACT

BACKGROUND: RNA sequencing combined with machine learning techniques has provided a modern approach to the molecular classification of cancer. Class predictors, reflecting the disease class, can be constructed for known tissue types using the gene expression measurements extracted from cancer patients. One challenge of current cancer predictors is that they often have suboptimal performance estimates when integrating molecular datasets generated from different labs. Often, the quality of the data is variable, procured differently, and contains unwanted noise hampering the ability of a predictive model to extract useful information. Data preprocessing methods can be applied in attempts to reduce these systematic variations and harmonize the datasets before they are used to build a machine learning model for resolving tissue of origin. RESULTS: We aimed to investigate the impact of data preprocessing steps, focusing on normalization, batch effect correction, and data scaling, through trial and comparison. Our goal was to improve the cross-study predictions of tissue of origin for common cancers on large-scale RNA-Seq datasets derived from thousands of patients and over a dozen tumor types. The results showed that the choice of data preprocessing operations affected the performance of the associated classifier models constructed for tissue of origin predictions in cancer. CONCLUSION: By using TCGA as a training set and applying data preprocessing methods, we demonstrated that batch effect correction improved performance measured by weighted F1-score in resolving tissue of origin against an independent GTEx test dataset. On the other hand, the use of data preprocessing operations worsened classification performance when the independent test dataset was aggregated from separate studies in ICGC and GEO. Therefore, based on our findings with these publicly available large-scale RNA-Seq datasets, the application of data preprocessing techniques to a machine learning pipeline is not always appropriate.
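
For readers unfamiliar with the metric named in this abstract, the following is a minimal sketch of how a weighted F1-score is commonly computed: the F1 of each class is averaged with weights proportional to that class's share of the true labels. It is an illustration only, not the authors' code; the function name `weighted_f1` and the toy tissue labels are assumptions made for this example.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Average per-class F1, weighting each class by its share of y_true.

    Hypothetical helper for illustration; not taken from the cited article.
    """
    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp  # true samples of this class that were missed
        f1 = 2 * tp / (2 * tp + fp + fn)  # denominator >= n >= 1
        score += (n / total) * f1
    return score


# Toy labels (assumed): three lung samples and one breast sample.
truth = ["lung", "lung", "lung", "breast"]
preds = ["lung", "lung", "breast", "breast"]
print(round(weighted_f1(truth, preds), 4))
```

Because the weights follow class frequency, a classifier that performs well on common tumor types can score highly even if rare types are misclassified, which is worth keeping in mind when comparing results across test sets with different class mixes.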


Subject(s)
Machine Learning , Neoplasms , RNA-Seq , Humans , RNA-Seq/methods , Neoplasms/genetics , Transcriptome/genetics , Sequence Analysis, RNA/methods , Gene Expression Profiling/methods , Computational Biology/methods
2.
Genes Brain Behav ; 22(6): e12851, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37259642

ABSTRACT

Anxiety disorders are common and can be debilitating, with effective treatments remaining hampered by an incomplete understanding of the underlying genetic etiology. Improvements have been made in understanding the genetic influences on mouse behavioral models of anxiety, yet the extent to which genes identified in these experimental systems contribute to genetic variation in human anxiety phenotypes remains unclear. Leveraging new and existing large-scale human genome-wide association studies, we tested whether sets of genes previously identified in mouse anxiety-like behavior studies contribute to a range of human anxiety disorders. When tested as individual genes, 13 mouse-identified genes were associated with human anxiety phenotypes, suggesting an overlap of individual genes contributing to both mouse models of anxiety-like behaviors and human anxiety traits. When genes were tested as sets, we identified 14 significant associations between mouse gene sets and human anxiety, but the majority of gene sets showed no significant association with human anxiety phenotypes. These few significant associations indicate a need to identify and develop more translatable mouse models by identifying sets of genes that "match" between model systems and specific human phenotypes of interest. We suggest that continuing to develop improved behavioral paradigms and finer-scale experimental data, for instance from individual neuronal subtypes or cell-type-specific expression data, is likely to improve our understanding of the genetic etiology and underlying functional changes in anxiety disorders.


Subject(s)
Anxiety Disorders , Genome-Wide Association Study , Humans , Mice , Animals , Anxiety Disorders/genetics , Anxiety/genetics , Phenotype
3.
PLoS Genet ; 19(5): e1010693, 2023 05.
Article in English | MEDLINE | ID: mdl-37216417

ABSTRACT

It remains unknown to what extent gene-gene interactions contribute to complex traits. Here, we introduce a new approach using predicted gene expression to perform exhaustive transcriptome-wide interaction studies (TWISs) for multiple traits across all pairs of genes expressed in several tissue types. Using imputed transcriptomes, we simultaneously reduce the computational challenge and improve interpretability and statistical power. We discover (in the UK Biobank) and replicate (in independent cohorts) several interaction associations, and find several hub genes with numerous interactions. We also demonstrate that TWIS can identify novel associated genes because genes with many or strong interactions have smaller single-locus model effect sizes. Finally, we develop a method to test gene set enrichment of TWIS associations (E-TWIS), finding numerous pathways and networks enriched in interaction associations. Epistasis may be widespread, and our procedure represents a tractable framework for beginning to explore gene interactions and identify novel genomic targets.


Subject(s)
Epistasis, Genetic , Transcriptome , Transcriptome/genetics , Multifactorial Inheritance/genetics , Gene Regulatory Networks/genetics , Phenotype , Polymorphism, Single Nucleotide , Genome-Wide Association Study/methods
4.
Nicotine Tob Res ; 25(5): 1030-1038, 2023 04 06.
Article in English | MEDLINE | ID: mdl-36444815

ABSTRACT

INTRODUCTION: Smoking behaviors are partly heritable, yet the genetic and environmental mechanisms underlying smoking phenotypes are not fully understood. Developmental nicotine exposure (DNE) is a significant risk factor for smoking and leads to gene expression changes in mouse models; however, it is unknown whether the same genes whose expression is impacted by DNE are also those underlying smoking genetic liability. We examined whether genes whose expression in D1-type striatal medium spiny neurons is altered by DNE in the mouse are also associated with human smoking behaviors. METHODS: Specifically, we assessed whether human orthologs of mouse-identified genes, either individually or as a set, were genetically associated with five human smoking traits using MAGMA and S-LDSC while implementing a novel expression-based gene-SNP annotation methodology. RESULTS: We found no strong evidence that these gene sets were more strongly associated with smoking behaviors than the rest of the genome, but ten of these individual genes were significantly associated with three of the five human smoking traits examined (p < 2.5e-6). Three of these genes have not been reported previously and were discovered only when implementing the expression-based annotation. CONCLUSIONS: These results suggest the genes whose expression is impacted by DNE in mice are largely distinct from those contributing to smoking genetic liability in humans. However, examining a single mouse neuronal cell type may be too fine a resolution for comparison, suggesting that experimental manipulation of nicotine consumption, reward, or withdrawal in mice may better capture genes related to the complex genetics of human tobacco use.
IMPLICATIONS: Genes whose expression is impacted by DNE in mouse D1-type striatal medium spiny neurons were not found to be, as a whole, more strongly associated with human smoking behaviors than the rest of the genome, though ten individual mouse-identified genes were associated with human smoking traits. This suggests little overlap between the genetic mechanisms impacted by DNE and those influencing heritable liability to smoking phenotypes in humans. Further research is warranted to characterize how developmental nicotine exposure paradigms in mice can be translated to understand nicotine use in humans and their heritable effects on smoking.


Subject(s)
Nicotine , Smoking , Humans , Animals , Mice , Smoking/genetics , Phenotype , Tobacco Smoking , Disease Models, Animal
5.
Eur J Hum Genet ; 2022 Nov 29.
Article in English | MEDLINE | ID: mdl-36446896

ABSTRACT

Complex traits show clear patterns of tissue-specific expression influenced by single nucleotide polymorphisms (SNPs), yet current strategies aggregate SNP effects to genes by employing simple physical proximity-based windows. Here, we examined whether incorporating SNPs with effects on tissue-specific cis-expression would improve our ability to detect trait-relevant tissues across 31 complex traits using stratified linkage disequilibrium score regression (S-LDSC). We found that a physical proximity annotation produced more significant tissue enrichments and larger S-LDSC regression coefficients, as compared to an expression-based annotation. Furthermore, we showed that our expression-based annotation did not outperform an annotation strategy in which an equal number of randomly chosen SNPs were annotated to genes within the same genomic window, suggesting extensive redundancy among SNP effect estimates due to linkage disequilibrium. That said, current sample sizes limit estimation of cis-genetic SNP effects; therefore, we recommend reexamination of the expression-based annotation when larger tissue-specific expression datasets become available. To examine the influence of sample size, we used a large whole blood eQTL reference panel (N = 31,684) applying a similar expression-based annotation strategy. We found that significant cis-expression QTLs in whole blood did not outperform the physical proximity annotation when estimating tissue-specific SNP heritability enrichment for either high- or low-density lipoprotein phenotypes but performed similarly for inflammatory bowel disease. Finally, we report new and updated tissue enrichment estimates across 31 complex traits, such as significant heritability enrichment of the frontal cortex for cognitive performance, educational attainment, and intelligence, providing further evidence of this structure's importance in higher cognitive function.

6.
Front Cell Neurosci ; 15: 629279, 2021.
Article in English | MEDLINE | ID: mdl-33897370

ABSTRACT

Microglia are the primary resident immune cells of the central nervous system that maintain physiological homeostasis in the brain and contribute to the pathogenesis of many psychiatric disorders and neurodegenerative diseases. Due to the lack of appropriate human cellular models, it is difficult to study the basic pathophysiological processes linking microglia to brain diseases. In this study, we adopted an induced microglia-like (iMG) cellular model derived from peripheral blood monocytes with granulocyte-macrophage colony-stimulating factor (GM-CSF) and interleukin-34 (IL-34). We characterized and validated this in vitro cellular model by morphology, immunocytochemistry, gene expression profiles, and functional study. Our results indicated that the iMG cells developed typical microglial ramified morphology, expressed microglia-specific surface markers (P2RY12 and TMEM119), and possessed phagocytic activity. Principal component analyses and multidimensional scaling analyses of RNA-seq data showed that iMG cells were distinct from monocytes and induced macrophages (iMacs) but clustered closer to human microglia and hiPSC-induced microglia. Heatmap analyses also found that iMG cells, but not monocytes, were closely clustered with human primary microglia. Further pathway and relative expression analysis indicated that unique genes from iMG cells were involved in the regulation of the complement system, especially in the synapse and ion transport. Overall, our data demonstrated that the iMG model mimicked many features of the brain resident microglia, highlighting its utility in the study of microglial function in many brain diseases, such as schizophrenia and Alzheimer's disease (AD).

8.
Transl Psychiatry ; 10(1): 307, 2020 09 01.
Article in English | MEDLINE | ID: mdl-32873781

ABSTRACT

Schizophrenia (SCZ) is a severe psychiatric disorder with a strong genetic component. High heritability of SCZ suggests a major role for transmitted genetic variants. Furthermore, SCZ is also associated with a marked reduction in fecundity, leading to the hypothesis that alleles with large effects on risk might often occur de novo. In this study, we conducted whole-genome sequencing for 23 families from two cohorts with unaffected siblings and parents. Two nonsense de novo mutations (DNMs) in GJC1 and HIST1H2AD were identified in SCZ patients. Ten genes (DPYSL2, NBPF1, SDK1, ZNF595, ZNF718, GCNT2, SNX9, AACS, KCNQ1, and MSI2) were found to carry more DNMs in SCZ patients than their unaffected siblings by burden test. Expression analyses indicated that these DNM-implicated genes showed significantly higher expression in the prefrontal cortex during the prenatal stage. The DNM in the GJC1 gene is highly likely a loss-of-function mutation (pLI = 0.94), leading to dysregulation of ion channels in the glutamatergic excitatory neurons. Analysis of rare variants in an independent exome sequencing dataset indicates that GJC1 has significantly more rare variants in SCZ patients than in unaffected controls. Data from genome-wide association studies suggested that common variants in the GJC1 gene may be associated with SCZ and SCZ-related traits. Genes co-expressed with GJC1 are involved in SCZ, SCZ-associated pathways, and drug targets. This evidence suggests that GJC1 may be a risk gene for SCZ and its function may be involved in prenatal and early neurodevelopment, a vulnerable period for developmental disorders such as SCZ.


Subject(s)
Schizophrenia , China , Connexins/genetics , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Mutation , Schizophrenia/genetics , Siblings
9.
Schizophr Res Treatment ; 2020: 1638403, 2020.
Article in English | MEDLINE | ID: mdl-32774919

ABSTRACT

Schizophrenia is a complex disorder with many comorbid conditions. In this study, we used polygenic risk scores (PRSs) from schizophrenia and comorbid traits to explore consistent cluster structure in schizophrenia patients. With 10 comorbid traits, we found a stable 4-cluster structure in two datasets (MGS and SSCCS). When the same traits and parameters were applied for the patients in a clinical trial of antipsychotics, the CATIE study, a 5-cluster structure was observed. One of the 4 clusters found in the MGS and SSCCS was further split into two clusters in CATIE, while the other 3 clusters remained unchanged. For the 5 CATIE clusters, we evaluated their association with the changes of clinical symptoms, neurocognitive functions, and laboratory tests between the enrollment baseline and the end of Phase I trial. Class I was found responsive to treatment, with significant reduction for the total, positive, and negative symptoms (p = 0.0001, 0.0099, and 0.0028, respectively), and improvement for cognitive functions (VIGILANCE, p = 0.0099; PROCESSING SPEED, p = 0.0006; WORKING MEMORY, p = 0.0023; and REASONING, p = 0.0015). Class II had modest reduction of positive symptoms (p = 0.0492) and better PROCESSING SPEED (p = 0.0071). Class IV had a specific reduction of negative symptoms (p = 0.0111) and modest cognitive improvement for all tested domains. Interestingly, Class IV was also associated with decreased lymphocyte counts and increased neutrophil counts, an indication of ongoing inflammation or immune dysfunction. In contrast, Classes III and V showed no symptom reduction but a higher level of phosphorus. Overall, our results suggest that PRSs from schizophrenia and comorbid traits can be utilized to classify patients into subtypes with distinctive clinical features. This genetic susceptibility based subtyping may be useful to facilitate more effective treatment and outcome prediction.

10.
Sci Rep ; 9(1): 12717, 2019 09 03.
Article in English | MEDLINE | ID: mdl-31481703

ABSTRACT

Recent studies imply that rare variants contribute to the risk of schizophrenia; however, the exact variants or genes responsible for this condition are largely unknown. In this study, we conducted whole genome sequencing (WGS) of 20 Chinese families. Each family consisted of at least two affected siblings diagnosed with schizophrenia and at least one unaffected sibling. We examined functional variants that were found in affected sibling(s) but not in unaffected sibling(s) within a family. Matching this criterion, a frameshift heterozygous deletion of CA (-/CA) at chromosome 18:24722722, also referred to as rs752084147, in the Carbohydrate Sulfotransferase 9 (CHST9) gene, was detected in two families. This deletion was confirmed by PCR-based Sanger sequencing. Given its observed frequency of 0.00076 in the Han Chinese population, we performed both case-control and family-based analyses to evaluate its association with schizophrenia. In the case-control analyses, the chi-square test P-value was 6.80e-12 and the P-value was 0.0008 after one million simulations. In family-based segregation analyses, the segregation P-value was 7.72e-7 and the simulated P-value was 5.70e-6. For both the case-control and family-based analyses, the CA deletion was significantly associated with schizophrenia in the Chinese population. Further investigation of the role of this gene in the development of schizophrenia is warranted, using larger and more ethnically diverse samples.


Subject(s)
Asian/genetics , Chromosomes, Human, Pair 18/genetics , Family , Frameshift Mutation , Polymorphism, Single Nucleotide , Schizophrenia , Sulfotransferases/genetics , Female , Humans , Male , Schizophrenia/ethnology , Schizophrenia/genetics , Whole Genome Sequencing
11.
J Neuroimmune Pharmacol ; 13(4): 532-540, 2018 12.
Article in English | MEDLINE | ID: mdl-30276764

ABSTRACT

Schizophrenia is genetically heterogeneous and comorbid with many conditions. In this study, we explored polygenic scores (PGSs) from genetically related conditions and traits to predict schizophrenia diagnosis using both logistic regression and deep neural network (DNN) models. We used the combined Molecular Genetics of Schizophrenia and Swedish Schizophrenia Case Control Study (MGS + SSCCS) data for training and testing the models, and used the Clinical Antipsychotic Trials for Intervention Effectiveness (CATIE) data as independent validation. We screened 28 conditions and traits comorbid with schizophrenia to identify traits as potential predictors and used LASSO regression to select predictors for model construction. We investigated how PGS calculation influenced model performance. We found that the inclusion of comorbid traits improved model performance and PGSs calculated from two traits were more generalizable in independent validation. With a DNN model using 19 PGS predictors, we achieved a prediction accuracy of 0.813 and an AUC of 0.905 in the MGS + SSCCS data. When this model was validated with the CATIE data, it achieved an accuracy of 0.721 and an AUC of 0.747. Our results indicate that PGSs alone may not be sufficient to predict schizophrenia accurately and the inclusion of behavioral and clinical data may be necessary for a more accurate prediction model.


Subject(s)
Databases, Genetic/trends , Quantitative Trait, Heritable , Schizophrenia/diagnosis , Schizophrenia/genetics , Adult , Case-Control Studies , Female , Humans , Male , Multifactorial Inheritance/genetics , Predictive Value of Tests , Schizophrenia/epidemiology , Sweden/epidemiology