Results 1 - 20 of 74
1.
Comput Biol Med ; 174: 108434, 2024 May.
Article in English | MEDLINE | ID: mdl-38636329

ABSTRACT

In the study of tumor disease pathogenesis, the identification of genes specifically expressed in disease states is pivotal, yet challenges arise from high-dimensional datasets with limited samples. Conventional gene (feature) selection methods often fall short of capturing the complexity of gene-phenotype and gene-gene interactions, necessitating a more robust analysis method. To address these challenges, a gene subset augmentation strategy is proposed in this paper. Our approach introduces diverse perturbation mechanisms to generate distinct gene subsets. The partial least squares-based multiple gene measurement algorithm considers gene-phenotype and gene-gene correlations, identifying differentially expressed genes, including those with weak signals. The constructed gene networks derived from the augmented subsets unveil regulatory patterns, enabling association analysis to explore gene associations comprehensively. Our algorithm excels in identifying small-sized gene subsets with strong discriminative power, surpassing traditional methods that yield a single gene subset. Unlike conventional approaches, our algorithm reveals a spectrum of different gene subsets and their weakly differentially expressed genes. This nuanced perspective aids in unraveling the molecular characteristics and specific expression patterns of tumor genes. The versatility of our approach not only contributes to the advancement of tumor-specific gene identification but also holds promise for addressing challenges in various fields characterized by high-dimensional datasets and limited samples. The Python implementation is available at http://github.com/wenjieyou/PLSGSA.


Subject(s)
Algorithms , Neoplasms , Humans , Neoplasms/genetics , Gene Expression Profiling , Least-Squares Analysis , Gene Regulatory Networks , Gene Expression Regulation, Neoplastic , Databases, Genetic
2.
BMC Bioinformatics ; 24(1): 142, 2023 Apr 11.
Article in English | MEDLINE | ID: mdl-37041460

ABSTRACT

BACKGROUND: Autism spectrum disorder (ASD) is a complex neurodevelopmental disorder that is highly phenotypically and genetically heterogeneous. With the accumulation of biological sequencing data, more and more studies are shifting to a molecular subtype-first approach, which first identifies molecular subtypes based on genetic and molecular data and then links them to clinical manifestations, reducing heterogeneity before phenotypic profiling. RESULTS: In this study, we perform similarity network fusion to integrate gene and gene set expression data of multiple human brain cell types for ASD molecular subtype identification. Then we apply subtype-specific differential gene and gene set expression analyses to study expression patterns specific to molecular subtypes in each cell type. To demonstrate the biological and practical significance, we analyze the molecular subtypes, investigate their correlation with ASD clinical phenotype, and construct ASD molecular subtype prediction models. CONCLUSIONS: The identified molecular subtype-specific gene and gene set expression may be used to differentiate ASD molecular subtypes, facilitating the diagnosis and treatment of ASD. Our method provides an analytical pipeline for the identification of molecular subtypes and even disease subtypes of complex disorders.


Subject(s)
Autism Spectrum Disorder , Autistic Disorder , Humans , Autistic Disorder/genetics , Autism Spectrum Disorder/genetics , Brain/metabolism
3.
Plant Physiol ; 191(4): 2570-2587, 2023 04 03.
Article in English | MEDLINE | ID: mdl-36682816

ABSTRACT

High-salt stress continues to challenge the growth and survival of many plants. Alternative polyadenylation (APA) produces mRNAs with different 3'-untranslated regions (3' UTRs) to regulate gene expression at the post-transcriptional level. However, the roles of alternative 3' UTRs in response to salt stress remain elusive. Here, we report the function of alternative 3' UTRs in response to high-salt stress in S. alterniflora (Spartina alterniflora), a monocotyledonous halophyte tolerant of high-salt environments. We found that high-salt stress induced global APA dynamics, and ∼42% of APA genes responded to salt stress. High-salt stress led to 3' UTR lengthening of 207 transcripts through increasing the usage of distal poly(A) sites. Transcripts with alternative 3' UTRs were mainly enriched in salt stress-related ion transporters. Alternative 3' UTRs of HIGH-AFFINITY K+ TRANSPORTER 1 (SaHKT1) increased RNA stability and protein synthesis in vivo. Regulatory AU-rich elements were identified in alternative 3' UTRs, boosting the protein level of SaHKT1. RNAi-knock-down experiments revealed that the biogenesis of 3' UTR lengthening in SaHKT1 was controlled by the poly(A) factor CLEAVAGE AND POLYADENYLATION SPECIFICITY FACTOR 30 (SaCPSF30). Over-expression of SaHKT1 with an alternative 3' UTR in rice (Oryza sativa) protoplasts increased mRNA accumulation of salt-tolerance genes in an AU-rich element-dependent manner. These results suggest that mRNA 3' UTR lengthening is a potential mechanism in response to high-salt stress. These results also reveal complex regulatory roles of alternative 3' UTRs coupling APA and regulatory elements at the post-transcriptional level in plants.


Subject(s)
Oryza , Salt Tolerance , 3' Untranslated Regions/genetics , Salt Tolerance/genetics , Poaceae/genetics , Oryza/metabolism , RNA, Messenger/genetics , RNA, Messenger/metabolism , Polyadenylation/genetics
4.
Genomics Proteomics Bioinformatics ; 21(3): 601-618, 2023 Jun.
Article in English | MEDLINE | ID: mdl-36669641

ABSTRACT

Alternative polyadenylation (APA) contributes to transcriptome complexity and gene expression regulation and has been implicated in various cellular processes and diseases. Single-cell RNA sequencing (scRNA-seq) has enabled the profiling of APA at the single-cell level; however, the spatial information of cells is not preserved in scRNA-seq. Alternatively, spatial transcriptomics (ST) technologies provide opportunities to decipher the spatial context of the transcriptomic landscape. Pioneering studies have revealed potential spatially variable genes and/or splice isoforms; however, the pattern of APA usage in spatial contexts remains unappreciated. In this study, we developed a toolkit called stAPAminer for mining spatial patterns of APA from spatially barcoded ST data. APA sites were identified and quantified from the ST data. In particular, an imputation model based on the k-nearest neighbors algorithm was designed to recover APA signals, and then APA genes with spatial patterns of APA usage variation were identified. By analyzing well-established ST data of the mouse olfactory bulb (MOB), we presented a detailed view of spatial APA usage across morphological layers of the MOB. We compiled a comprehensive list of genes with spatial APA dynamics and obtained several major spatial expression patterns that represent spatial APA dynamics in different morphological layers. By extending this analysis to two additional replicates of the MOB ST data, we observed that the spatial APA patterns of several genes were reproducible among replicates. stAPAminer employs the power of ST to explore the transcriptional atlas of spatial APA patterns with spatial resolution. This toolkit is available at https://github.com/BMILAB/stAPAminer and https://ngdc.cncb.ac.cn/biocode/tools/BT007320.
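
The k-nearest-neighbors imputation step mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the stAPAminer implementation. The function name `knn_impute`, the distance measure (root-mean-square difference over jointly observed genes) and the toy usage matrix are all assumptions made here for illustration, since the abstract does not describe how neighbors are defined. Missing APA usage values (`None`) in a spot are filled with the mean of the values observed in its k most similar spots.

```python
import math


def knn_impute(matrix, k=2):
    """Fill None entries with the mean of the k nearest rows that have a value.

    matrix: list of rows (spots), each a list of floats or None (genes).
    Distance between two rows is the root-mean-square difference over the
    columns observed in both rows (an assumption of this sketch).
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf  # no common columns, so the rows cannot be compared
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [row[:] for row in matrix]  # leave the input untouched
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows with an observed value in column j.
            donors = [
                (distance(row, other), other[j])
                for h, other in enumerate(matrix)
                if h != i and other[j] is not None
            ]
            donors = [d for d in donors if d[0] < math.inf]
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            if nearest:
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed


# Toy APA usage ratios: 4 spots x 3 genes, one missing value.
usage = [
    [0.10, 0.80, 0.30],
    [0.12, None, 0.28],
    [0.90, 0.20, 0.70],
    [0.11, 0.78, 0.31],
]
filled = knn_impute(usage, k=2)
```

In this toy matrix, the missing value in the second spot is filled from the first and fourth spots, whose observed profiles are closest to it.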


Subject(s)
Polyadenylation , Transcriptome , Animals , Mice , Sequence Analysis, RNA , Gene Expression Profiling , Gene Expression Regulation , 3' Untranslated Regions
5.
BMC Genomics ; 23(1): 782, 2022 Nov 30.
Article in English | MEDLINE | ID: mdl-36451086

ABSTRACT

BACKGROUND: The identification of gene regulatory networks (GRNs) facilitates the understanding of the underlying molecular mechanism of various biological processes and complex diseases. With the availability of single-cell RNA sequencing data, it is essential to infer GRNs from single-cell expression. Although some GRN methods originally developed for bulk expression data can be applied to single-cell data and several single-cell specific GRN algorithms have been developed, recent benchmarking studies have emphasized the need to develop more accurate and robust GRN modeling methods that are compatible with single-cell expression data. RESULTS: We present SRGS, SPLS (sparse partial least squares)-based recursive gene selection, to infer GRNs from bulk or single-cell expression data. SRGS recursively selects and scores the genes which may regulate the considered target gene based on SPLS. When dealing with gene expression data with dropouts, we randomly scramble samples, set some values in the expression matrix to zeroes, and generate multiple copies of data through multiple iterations to make SRGS more robust. We test SRGS on different kinds of expression data, including simulated bulk data, simulated single-cell data without and with dropouts, and experimental single-cell data, and compare it with existing GRN methods, including the ones originally developed for bulk data, the ones developed specifically for single-cell data, and even the ones recommended by recent benchmarking studies. CONCLUSIONS: SRGS is shown to be competitive with existing GRN methods and effective in gene regulatory network inference from bulk or single-cell gene expression data. SRGS is available at: https://github.com/JGuan-lab/SRGS.
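
The data perturbation described in the RESULTS of this abstract (scrambling samples, zeroing some entries, and generating several copies) can be sketched in Python as follows. This is an illustrative reading of the abstract, not the SRGS code. The function name `perturbed_copies`, the `zero_fraction` parameter and its default value are hypothetical, because the abstract gives neither the dropout rate nor the number of copies.

```python
import random


def perturbed_copies(expr, n_copies=3, zero_fraction=0.2, seed=0):
    """Return n_copies of expr with rows shuffled and some entries zeroed.

    expr: list of samples (rows), each a list of gene expression values.
    zero_fraction: share of all matrix entries set to zero in each copy
    (a hypothetical parameter; the abstract does not state the rate).
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(expr), len(expr[0])
    n_zero = round(zero_fraction * n_rows * n_cols)
    cells = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    copies = []
    for _ in range(n_copies):
        rows = [row[:] for row in expr]  # copy so the input stays unchanged
        rng.shuffle(rows)                # scramble the sample order
        for i, j in rng.sample(cells, n_zero):
            rows[i][j] = 0.0             # simulated dropout
        copies.append(rows)
    return copies


# Toy matrix: 4 samples x 5 genes, all values strictly positive.
expression = [[float(10 * i + j + 1) for j in range(5)] for i in range(4)]
copies = perturbed_copies(expression, n_copies=3, zero_fraction=0.2, seed=1)
```

Each copy would then be passed to the network inference step and the resulting scores aggregated. How SRGS combines the copies is not stated in the abstract.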


Subject(s)
Algorithms , Gene Regulatory Networks , Least-Squares Analysis , Benchmarking , Exome Sequencing
6.
Nat Commun ; 13(1): 6467, 2022 10 29.
Article in English | MEDLINE | ID: mdl-36309516

ABSTRACT

Metastatic prostate cancer remains a major clinical challenge and metastatic lesions are highly heterogeneous and difficult to biopsy. Liquid biopsy provides opportunities to gain insights into the underlying biology. Here, using the highly sensitive enrichment-based sequencing technology, we provide analysis of 60 and 175 plasma DNA methylomes from patients with localized and metastatic prostate cancer, respectively. We show that the cell-free DNA methylome can capture variations beyond the tumor. A global hypermethylation in metastatic samples is observed, coupled with hypomethylation in the pericentromeric regions. Hypermethylation at the promoter of a glucocorticoid receptor gene NR3C1 is associated with a decreased immune signature. The cell-free DNA methylome is reflective of clinical outcomes and can distinguish different disease types with 0.989 prediction accuracy. Finally, we show the ability of predicting copy number alterations from the data, providing opportunities for joint genetic and epigenetic analysis on limited biological samples.


Subject(s)
Cell-Free Nucleic Acids , Prostatic Neoplasms , Male , Humans , Epigenome , Cell-Free Nucleic Acids/genetics , Prostatic Neoplasms/pathology , Prostate/pathology , DNA Methylation/genetics
7.
Int J Mol Sci ; 23(15)2022 Jul 23.
Article in English | MEDLINE | ID: mdl-35897701

ABSTRACT

Alternative polyadenylation (APA) is a key layer of gene expression regulation, and APA choice is finely modulated in cells. Advances in single-cell RNA-seq (scRNA-seq) have provided unprecedented opportunities to study APA in cell populations. However, existing studies that investigated APA in single cells were either confined to a few cells or focused on profiling APA dynamics between cell types or identifying APA sites. The diversity and pattern of APA usages on a genomic scale in single cells remains unappreciated. Here, we proposed an analysis framework based on a Gaussian mixture model, scAPAmod, to identify patterns of APA usage from homogeneous or heterogeneous cell populations at the single-cell level. We systematically evaluated the performance of scAPAmod using simulated data and scRNA-seq data. The results show that scAPAmod can accurately identify different patterns of APA usages at the single-cell level. We analyzed the dynamic changes in the pattern of APA usage using scAPAmod in different cell differentiation and developmental stages during mouse spermatogenesis and found that even the same gene has different patterns of APA usages in different differentiation stages. The preference of patterns of usages of APA sites in different genomic regions was also analyzed. We found that patterns of APA usages of the same gene in 3' UTRs (3' untranslated region) and non-3' UTRs are different. Moreover, we analyzed cell-type-specific APA usage patterns and changes in patterns of APA usages across cell types. Different from the conventional analysis of single-cell heterogeneity based on gene expression profiling, this study profiled the heterogeneous pattern of APA isoforms, which contributes to revealing the heterogeneity of single-cell gene expression with higher resolution.
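
A Gaussian mixture model of the kind this framework builds on can be sketched in a few lines of Python. The example is not the scAPAmod implementation. It assumes the input is a per-cell usage ratio of one poly(A) site for a single gene, fixes the number of components at two, and fits the mixture by plain expectation-maximization. The function name `fit_two_component_gmm` and the toy ratios are illustrative assumptions.

```python
import math


def fit_two_component_gmm(values, n_iter=50, var_floor=1e-6):
    """Fit a two-component 1-D Gaussian mixture by expectation-maximization.

    Returns (weights, means, variances), each a list of length 2.
    """
    n = len(values)
    grand_mean = sum(values) / n
    grand_var = max(sum((v - grand_mean) ** 2 for v in values) / n, var_floor)
    # Deterministic start: one component at each extreme of the data.
    means = [min(values), max(values)]
    variances = [grand_var, grand_var]
    weights = [0.5, 0.5]

    def density(x, mean, var):
        return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

    for _ in range(n_iter):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in values:
            joint = [weights[c] * density(x, means[c], variances[c]) for c in range(2)]
            total = sum(joint)
            if total == 0.0:
                resp.append([0.5, 0.5])  # value is far from both components
            else:
                resp.append([p / total for p in joint])
        # M-step: re-estimate weight, mean and variance of each component.
        for c in range(2):
            mass = sum(r[c] for r in resp)
            if mass == 0.0:
                continue  # leave an empty component unchanged
            weights[c] = mass / n
            means[c] = sum(r[c] * x for r, x in zip(resp, values)) / mass
            spread = sum(r[c] * (x - means[c]) ** 2 for r, x in zip(resp, values)) / mass
            variances[c] = max(spread, var_floor)
    return weights, means, variances


# Toy usage ratios of a distal poly(A) site across ten cells.
ratios = [0.08, 0.10, 0.12, 0.09, 0.11, 0.88, 0.90, 0.92, 0.91, 0.89]
weights, means, variances = fit_two_component_gmm(ratios)
```

With these toy ratios the two fitted components separate cells that mostly use the proximal site from cells that mostly use the distal site, which is one simple way to read a "pattern of APA usage" as a bimodal distribution.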


Subject(s)
Gene Expression Profiling , Polyadenylation , 3' Untranslated Regions , Animals , Mice , Polyadenylation/genetics , RNA-Seq , Sequence Analysis, RNA/methods
8.
Front Genet ; 13: 865371, 2022.
Article in English | MEDLINE | ID: mdl-35646047

ABSTRACT

Human brain-related disorders, such as autism spectrum disorder (ASD), are often characterized by cell heterogeneity, as the cell atlas of brains consists of diverse cell types. There are commonality and specificity in gene expression among different cell types of brains; hence, there may also be commonality and specificity in dysregulated gene expression affected by ASD among brain cells. Moreover, as genes interact together, it is important to identify shared and cell-type-specific ASD-related gene modules for studying the cell heterogeneity of ASD. To this end, we propose integrative regularized non-negative matrix factorization (iRNMF) by imposing a new regularization based on integrative non-negative matrix factorization. Using iRNMF, we analyze gene expression data of multiple cell types of the human brain to obtain shared and cell-type-specific gene modules. Based on ASD risk genes, we identify shared and cell-type-specific ASD-associated gene modules. By analyzing these gene modules, we study the commonality and specificity among different cell types in dysregulated gene expression affected by ASD. The shared ASD-associated gene modules are mostly relevant to the functioning of synapses, while in different cell types, different kinds of gene functions may be specifically dysregulated in ASD, such as inhibitory extracellular ligand-gated ion channel activity in GABAergic interneurons and excitatory postsynaptic potential and ionotropic glutamate receptor signaling pathway in glutamatergic neurons. Our results provide new insights into the molecular mechanism and pathogenesis of ASD. The identification of shared and cell-type-specific ASD-related gene modules can facilitate the development of more targeted biomarkers and treatments for ASD.

9.
Nucleic Acids Res ; 50(D1): D365-D370, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34508354

ABSTRACT

Alternative polyadenylation (APA) is a widespread regulatory mechanism of transcript diversification in eukaryotes, which is increasingly recognized as an important layer for eukaryotic gene expression. Recent studies based on single-cell RNA-seq (scRNA-seq) have revealed cell-to-cell heterogeneity in APA usage and APA dynamics across different cell types in various tissues, biological processes and diseases. However, currently available APA databases were all collected from bulk 3'-seq and/or RNA-seq data, and no existing database has provided APA information at single-cell resolution. Here, we present a user-friendly database called scAPAdb (http://www.bmibig.cn/scAPAdb), which provides a comprehensive and manually curated atlas of poly(A) sites, APA events and poly(A) signals at the single-cell level. Currently, scAPAdb collects APA information from >360 scRNA-seq experiments, covering six species including human, mouse and several plant species. scAPAdb also provides batch download of data, and users can query the database through a variety of keywords such as gene identifier, gene function and accession number. scAPAdb would be a valuable and extendable resource for the study of cell-to-cell heterogeneity in APA isoform usage and APA-mediated gene regulation at the single-cell level under diverse cell types, tissues and species.


Subject(s)
3' Untranslated Regions , Databases, Genetic , Polyadenylation , RNA, Messenger/genetics , RNA-Binding Proteins/genetics , User-Computer Interface , Animals , Atlases as Topic , Binding Sites , Cell Lineage/genetics , Chlamydomonas reinhardtii/genetics , Chlamydomonas reinhardtii/metabolism , Eukaryotic Cells/cytology , Eukaryotic Cells/metabolism , Humans , Internet , Mice , MicroRNAs/classification , MicroRNAs/genetics , MicroRNAs/metabolism , Organ Specificity , Plants/genetics , Plants/metabolism , Protein Binding , RNA, Messenger/classification , RNA, Messenger/metabolism , RNA-Binding Proteins/classification , RNA-Binding Proteins/metabolism , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods
10.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34913057

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) allows quantitative analysis of gene expression at the level of single cells, which is beneficial for studying cell heterogeneity. The recognition of cell types facilitates the construction of cell atlases in complex tissues or organisms, which is the basis of almost all downstream scRNA-seq data analyses. Using disease-related scRNA-seq data to predict disease status can facilitate the specific diagnosis and personalized treatment of disease. Since single-cell gene expression data are high-dimensional and sparse with dropouts, we propose scIAE, an integrative autoencoder-based ensemble classification framework, which first performs multiple random projections and applies integrative and devisable autoencoders (integrating stacked, denoising and sparse autoencoders) to obtain compressed representations. Then base classifiers are built on the lower-dimensional representations and the predictions from all base models are integrated. The comparison of scIAE and common feature extraction methods shows that scIAE is effective and robust, independent of the choice of dimension, which is beneficial to subsequent cell classification. By testing scIAE on different types of data and comparing it with existing general and single-cell-specific classification methods, we show that scIAE has strong classification power in cell type annotation within datasets, across batches, across platforms and across species, as well as in disease status prediction. The architecture of scIAE is flexible and devisable, and it is available at https://github.com/JGuan-lab/scIAE.


Subject(s)
Data Analysis , Single-Cell Analysis , Gene Expression Profiling , RNA-Seq , Sequence Analysis, RNA , Single-Cell Analysis/methods , Exome Sequencing
11.
J Biomed Inform ; 122: 103899, 2021 10.
Article in English | MEDLINE | ID: mdl-34481921

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) is fast becoming a powerful technology that revolutionizes biomedical studies related to development, immunology and cancer by providing genome-scale transcriptional profiles at unprecedented throughput and resolution. However, due to the low capture rate and frequent drop-out events in the sequencing process, scRNA-seq data suffer from extremely high sparsity and variability, challenging the data analysis. Here we proposed a novel method called scLINE for learning low dimensional representations of scRNA-seq data. scLINE is based on the network embedding model that jointly considers multiple gene-gene interaction networks, facilitating the incorporation of prior biological knowledge for signal extraction. We comprehensively evaluated scLINE on eight single-cell datasets. Results show that scLINE achieved comparable or higher performance than competing methods, including PCA, t-SNE and Isomap, in terms of internal validation metrics and clustering accuracy. The low dimensional representations learned by scLINE are effective for downstream single-cell analysis, such as visualization, clustering and cell typing. We have implemented scLINE as an easy-to-use R package, which can be incorporated in other existing scRNA-seq analysis pipelines or tools for data preprocessing.


Subject(s)
Gene Regulatory Networks , Single-Cell Analysis , Cluster Analysis , Gene Expression Profiling , RNA-Seq , Sequence Analysis, RNA
12.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34255024

ABSTRACT

The dynamic choice of different polyadenylation sites in a gene is referred to as alternative polyadenylation, which functions in many important biological processes. Large-scale messenger RNA 3' end sequencing has revealed that cleavage sites for polyadenylation exhibit microheterogeneity. To date, the conventional determination of polyadenylation site clusters is subjective and arbitrary, leading to inaccurate annotations. Here, we present a weighted density peak clustering method, QuantifyPoly(A), to accurately quantify genome-wide polyadenylation choices. Applying QuantifyPoly(A) to published 3' end sequencing datasets from both animals and plants reshapes their polyadenylation profiles into myriad novel polyadenylation site clusters. Most of these novel polyadenylation site clusters show significantly dynamic usage across different biological samples or associate with binding sites of trans-acting factors. Upstream sequences of these clusters are enriched with polyadenylation signals UGUA, UAAA and/or AAUAAA in a species-dependent manner. Polyadenylation site clusters also exhibit species specificity, and plant clusters generally show higher microheterogeneity than animal clusters. QuantifyPoly(A) is broadly applicable to any type of 3' end sequencing data and species for accurate quantification and construction of the complex and dynamic polyadenylation landscape, and it enables us to decode, at a much higher resolution, alternative polyadenylation events that are invisible to conventional methods.
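
Density peak clustering groups points around local density maxima: each point gets a local density, and points that are both dense and far from any denser point become cluster centers. The Python sketch below applies a simplified, one-dimensional version of that idea to cleavage positions. It is not the QuantifyPoly(A) algorithm. Using read counts as the weights, the `cutoff` and `min_separation` values, the tie-breaking rule and the function name `density_peak_clusters` are all assumptions made for illustration.

```python
def density_peak_clusters(sites, cutoff=10, min_separation=24):
    """Group cleavage sites around read-weighted density peaks.

    sites: list of (position, read_count) tuples on one strand of one gene.
    cutoff: window (nt) used to compute the weighted local density.
    min_separation: a site farther than this from every denser site
        starts its own cluster (hypothetical threshold).
    Returns a dict mapping each position to the position of its cluster center.
    """
    positions = [p for p, _ in sites]
    # Weighted local density: total reads within `cutoff` nt of each site.
    rho = [sum(c for q, c in sites if abs(p - q) <= cutoff) for p in positions]
    # Visit sites from densest to sparsest; break ties by the site's own
    # read count, then by position, so that the ordering is deterministic.
    order = sorted(
        range(len(sites)),
        key=lambda i: (-rho[i], -sites[i][1], positions[i]),
    )
    center_of = {}
    for rank, i in enumerate(order):
        denser = order[:rank]
        if not denser:
            center_of[i] = i  # the densest site is always a center
            continue
        nearest = min(denser, key=lambda h: abs(positions[i] - positions[h]))
        delta = abs(positions[i] - positions[nearest])
        if delta > min_separation:
            center_of[i] = i  # isolated density peak: new cluster
        else:
            center_of[i] = center_of[nearest]  # join the denser neighbor's cluster
    return {positions[i]: positions[c] for i, c in center_of.items()}


# Toy cleavage sites: two groups of positions with a dominant site in each.
toy_sites = [(100, 5), (102, 40), (105, 8), (160, 3), (163, 30), (166, 4)]
clusters = density_peak_clusters(toy_sites)
```

For the toy input, the sites around position 102 and the sites around position 163 form two clusters, each centered on its most heavily supported cleavage position.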


Subject(s)
Poly A/metabolism , Animals , Arabidopsis/metabolism , Oryza/metabolism , Polyadenylation
13.
Biomedicines ; 9(4)2021 Apr 10.
Article in English | MEDLINE | ID: mdl-33920310

ABSTRACT

Multiple genetic factors contribute to the pathogenesis of autism spectrum disorder (ASD), a kind of neurodevelopmental disorder. Genes were usually studied separately for their associations with ASD. However, genes associated with ASD do not act alone but interact with each other in a network module. The identification of these modules is the basis for the systematic understanding of the pathogenesis of ASD. Moreover, ASD is characterized by high pathogenic heterogeneity, and gene modules associated with ASD are cell-type-specific. In this study, based on the single-nucleus RNA sequencing data of 41 post-mortem tissue samples from the prefrontal cortex and anterior cingulate cortex of 19 ASD patients and 16 control individuals, we applied sparse module activity factorization, a matrix decomposition method consistent with the multi-factor and heterogeneous characteristics of ASD pathogenesis, to identify cell-type-specific gene modules. Then, statistical procedures were performed to detect highly reproducible cell-type-specific ASD-associated gene modules. Through the enrichment analysis of cell markers, 31 cell-type-specific gene modules related to ASD were further screened out. These 31 gene modules are all enriched with curated ASD risk genes. Finally, we utilized the expression patterns of these cell-type-specific ASD-associated gene modules to build predictive models for ASD. The excellent predictive performance also supported the associations between these gene modules and ASD. Our study confirmed the multifactorial and cell-type-specific characteristics of ASD pathogenesis. The results showed that excitatory neurons such as L2/3, L4, and L5/6-CC play essential roles in ASD's pathogenic processes. We identified the potential ASD target genes that act together in cell-type-specific modules, such as NRG3, KCNIP4, BAI3, PTPRD, LRRTM4, and LINGO2 in the L2/3 gene modules. Our study offers new potential genomic targets for ASD and provides a novel method to study gene modules involved in the pathogenesis of ASD.

14.
J Transl Med ; 19(1): 20, 2021 01 06.
Article in English | MEDLINE | ID: mdl-33407556

ABSTRACT

BACKGROUND: Genome-wide association studies have identified genetic variants associated with the risk of brain-related diseases, such as neurological and psychiatric disorders, while the causal variants and the specific vulnerable cell types often remain to be identified. Many disease-associated genes are expressed in multiple cell types of human brains, while the pathologic variants affect primarily specific cell types. We hypothesize a model in which what determines the manifestation of a disease in a cell type is the presence of a disease module composed of disease-associated genes, instead of individual genes. Therefore, it is essential to identify the presence/absence of disease gene modules in cells. METHODS: To characterize the cell type-specificity of brain-related diseases, we construct human brain cell type-specific gene interaction networks integrating human brain nucleus gene expression data with a referenced tissue-specific gene interaction network. Then from the cell type-specific gene interaction networks, we identify significant cell type-specific disease gene modules by performing statistical tests. RESULTS: Between neurons and glial cells, the constructed cell type-specific gene networks and their gene functions are distinct. Then we identify cell type-specific disease gene modules associated with autism spectrum disorder and find that different gene modules are formed and distinct gene functions may be dysregulated in different cells. We also study the similarity and dissimilarity in cell type-specific disease gene modules among autism spectrum disorder, schizophrenia and bipolar disorder. The functions of neuron-specific disease gene modules are associated with synapse for all three diseases, while those in glial cells are different. To facilitate the use of our method, we develop an R package, CtsDGM, for the identification of cell type-specific disease gene modules. CONCLUSIONS: The results support our hypothesis that a disease manifests itself in a cell type through forming a statistically significant disease gene module. The identification of cell type-specific disease gene modules can promote the development of more targeted biomarkers and treatments for the disease. Our method can be applied for depicting the cell type heterogeneity of a given disease, and also for studying the similarity and dissimilarity between different disorders, providing new insights into the molecular mechanisms underlying the pathogenesis and progression of diseases.


Subject(s)
Autism Spectrum Disorder , Gene Regulatory Networks , Autism Spectrum Disorder/genetics , Gene Expression Profiling , Genome-Wide Association Study , Humans , Phenotype
15.
Bioinformatics ; 37(16): 2470-2472, 2021 08 25.
Article in English | MEDLINE | ID: mdl-33258917

ABSTRACT

MOTIVATION: Alternative polyadenylation (APA) has been widely recognized as a widespread mechanism modulated dynamically. Studies based on 3' end sequencing and/or RNA-seq have profiled poly(A) sites in various species with diverse pipelines, yet no unified and easy-to-use toolkit is available for comprehensive APA analyses. RESULTS: We developed an R package called movAPA for modeling and visualization of dynamics of alternative polyadenylation across biological samples. movAPA incorporates rich functions for preprocessing, annotation and statistical analyses of poly(A) sites, identification of poly(A) signals, profiling of APA dynamics and visualization. Particularly, seven metrics are provided for measuring the tissue-specificity or usages of APA sites across samples. Three methods are used for identifying 3' UTR shortening/lengthening events between conditions. APA site switching involving non-3' UTR polyadenylation can also be explored. Using poly(A) site data from rice and mouse sperm cells, we demonstrated the high scalability and flexibility of movAPA in profiling APA dynamics across tissues and single cells. AVAILABILITY AND IMPLEMENTATION: https://github.com/BMILAB/movAPA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Oryza , Polyadenylation , 3' Untranslated Regions , Animals , Mice , Oryza/genetics , Poly A/metabolism , RNA-Seq , Software
16.
Brief Bioinform ; 22(4)2021 07 20.
Article in English | MEDLINE | ID: mdl-33142319

ABSTRACT

Alternative polyadenylation (APA) generates diverse mRNA isoforms, which contributes to transcriptome diversity and gene expression regulation by affecting mRNA stability, translation and localization in cells. The rapid development of 3' tag-based single-cell RNA-sequencing (scRNA-seq) technologies, such as CEL-seq and 10x Genomics, has led to the emergence of computational methods for identifying APA sites and profiling APA dynamics at single-cell resolution. However, existing methods fail to detect the precise location of poly(A) sites or sites with low read coverage. Moreover, they rely on prior genome annotation and can only detect poly(A) sites located within or near annotated genes. Here we proposed a tool called scAPAtrap for detecting poly(A) sites at the whole genome level in individual cells from 3' tag-based scRNA-seq data. scAPAtrap incorporates peak identification and poly(A) read anchoring, enabling the identification of the precise location of poly(A) sites, even for sites with low read coverage. Moreover, scAPAtrap can identify poly(A) sites without using prior genome annotation, which helps locate novel poly(A) sites in previously overlooked regions and improve genome annotation. We compared scAPAtrap with two recent methods, scAPA and Sierra, using scRNA-seq data from different experimental technologies and species. Results show that scAPAtrap identified poly(A) sites with higher accuracy and sensitivity than competing methods and could be used to explore APA dynamics among cell types or the heterogeneous APA isoform expression in individual cells. scAPAtrap is available at https://github.com/BMILAB/scAPAtrap.


Subject(s)
Databases, Nucleic Acid , Genome , RNA 3' Polyadenylation Signals , RNA-Seq , Single-Cell Analysis , Software , Molecular Sequence Annotation
17.
NPJ Schizophr ; 6(1): 9, 2020 Apr 03.
Article in English | MEDLINE | ID: mdl-32245959

ABSTRACT

Schizophrenia (SCZ) is a severe, highly heterogeneous psychiatric disorder with varied clinical presentations. The polygenic genetic architecture of SCZ makes identification of causal variants a daunting task. Gene expression analyses hold the promise of revealing connections between dysregulated transcription and underlying variants in SCZ. However, the most commonly used differential expression analysis often assumes grouped samples are from homogeneous populations and thus cannot be used to detect expression variance differences between samples. Here, we applied the test for equality of variances to normalized expression data, generated by the CommonMind Consortium (CMC), from brains of 212 SCZ and 214 unaffected control (CTL) samples. We identified 87 genes, including VEGFA (vascular endothelial growth factor) and BDNF (brain-derived neurotrophic factor), that showed a significantly higher expression variance among SCZ samples than CTL samples. In contrast, only one gene showed the opposite pattern. To extend our analysis to gene sets, we proposed a Mahalanobis distance-based test for multivariate homogeneity of group dispersions, with which we identified 110 gene sets with a significantly higher expression variability in SCZ, including sets of genes encoding phosphatidylinositol 3-kinase (PI3K) complex and several others involved in cerebellar cortex morphogenesis, neuromuscular junction development, and cerebellar Purkinje cell layer development. Taken together, our results suggest that SCZ brains are characterized by overdispersed gene expression: overall gene expression variability among SCZ samples is significantly higher than that among CTL samples. Our study showcases the application of variability-centric analyses in SCZ research.
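
The per-gene comparison of expression variance between two groups can be illustrated with a short Python sketch. The abstract does not name the specific test for equality of variances that was used. The sketch below computes the Brown-Forsythe statistic (the median-based variant of Levene's test) as one common choice, and that choice is an assumption. The function name and the toy values are illustrative only, and no p-value is computed.

```python
from statistics import mean, median


def brown_forsythe_statistic(*groups):
    """Return the Brown-Forsythe W statistic for two or more groups.

    Each value is replaced by its absolute deviation from its group median,
    and a one-way ANOVA F statistic is computed on those deviations.
    Larger W indicates stronger evidence that the group variances differ.
    """
    deviations = []
    for group in groups:
        center = median(group)
        deviations.append([abs(x - center) for x in group])
    n_total = sum(len(d) for d in deviations)
    k = len(deviations)
    grand_mean = mean(v for d in deviations for v in d)
    group_means = [mean(d) for d in deviations]
    between = sum(len(d) * (m - grand_mean) ** 2 for d, m in zip(deviations, group_means))
    within = sum((v - m) ** 2 for d, m in zip(deviations, group_means) for v in d)
    return (n_total - k) / (k - 1) * between / within


# Toy normalized expression of one gene in control and case samples:
# both groups share the same median, but the case values are more dispersed.
ctl = [5.0, 5.1, 4.9, 5.2, 4.8]
scz = [5.0, 6.0, 4.0, 7.0, 3.0]
w_different = brown_forsythe_statistic(ctl, scz)
w_same = brown_forsythe_statistic([1.0, 2.0, 3.0], [11.0, 12.0, 13.0])
```

Under the null hypothesis W is compared against an F distribution with (k - 1, N - k) degrees of freedom. Note that a shift in mean alone, as in the second call, leaves the statistic at zero, which is why a variance test can flag genes that a standard differential expression analysis would miss and vice versa.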

18.
Front Cell Neurosci ; 14: 59, 2020.
Article in English | MEDLINE | ID: mdl-32265661

ABSTRACT

Autism spectrum disorder (ASD) is a complex neuropsychiatric disorder characterized by substantial heterogeneity. To identify the convergence of disease pathology on common pathways, it is essential to understand the correlations among ASD candidate genes and study shared molecular pathways between them. Investigating functional interactions between ASD candidate genes in different cell types of normal human brains may shed new light on the genetic heterogeneity of ASD. Here we apply cell type-specific gene network-based analysis to human brain nucleus gene expression data and identify cell type-specific ASD-associated gene modules. ASD-associated modules specific to different cell types are relevant to different gene functions. For instance, the astrocyte-specific module is involved in axon and neuron projection guidance, whereas GABAergic interneuron-specific modules are involved in functions of the postsynaptic membrane, extracellular matrix structural constituent, and ion transmembrane transporter activity. Our findings can promote the study of cell type heterogeneity of ASD, providing new insights into the pathogenesis of ASD. Our method has been shown to be effective in discovering cell type-specific disease-associated gene expression patterns and can be applied to other complex diseases.

19.
Plant Cell Physiol ; 61(5): 882-896, 2020 May 01.
Article in English | MEDLINE | ID: mdl-32044993

ABSTRACT

Spartina alterniflora (Spartina) is the only halophyte in the salt marsh. However, the molecular basis of its high salt tolerance remains elusive. In this study, we used Pacific Biosciences (PacBio) full-length single-molecule long-read sequencing and RNA-seq to elucidate the transcriptome dynamics of high salt tolerance in Spartina by salt gradient experiments. High-quality unigenes, transcription factors, non-coding RNA and Spartina-specific transcripts were identified. Co-expression network analysis found that protein kinase-encoding genes (SaOST1, SaCIPK10 and SaLRRs) are hub genes in the salt tolerance regulatory network. High salt stress induced the expression of transcription factors but repressed the expression of long non-coding RNAs. The Spartina transcriptome is closer to that of rice than to that of Arabidopsis, and a higher proportion of transporter and transcription factor-encoding transcripts have been found in Spartina. Transcriptome analysis showed that high salt stress induced the expression of carbohydrate metabolism genes, especially cell-wall biosynthesis-related genes, in Spartina, and repressed their expression in rice. Compared with rice, high salt stress highly induced the expression of stress response, protein modification and redox-related genes and greatly inhibited translation in Spartina. High salt stress also induced alternative splicing in Spartina, while differentially expressed alternative splicing events associated with photosynthesis were overrepresented in Spartina but not in rice. Finally, we built the SAPacBio website for visualizing full-length transcriptome sequences, transcription factors, ncRNAs, salt-tolerant genes and alternative splicing events in Spartina. Overall, this study suggests that the salt tolerance mechanism in Spartina differs from that in rice in many aspects and is far more complex than expected.


Subject(s)
Poaceae/genetics , Poaceae/physiology , Salt Tolerance/genetics , Salt-Tolerant Plants/genetics , Transcriptome/genetics , Alternative Splicing/genetics , Arabidopsis/genetics , Gene Expression Regulation, Plant , Gene Ontology , Gene Regulatory Networks , Genes, Plant , Oryza/genetics , RNA, Messenger/genetics , RNA, Messenger/metabolism , RNA, Untranslated/genetics , RNA, Untranslated/metabolism , Stress, Physiological/genetics , Transcription Factors/metabolism
20.
Front Genet ; 11: 628539, 2020.
Article in English | MEDLINE | ID: mdl-33519924

ABSTRACT

Bulk transcriptomic analyses of autism spectrum disorder (ASD) have revealed dysregulated pathways, while the brain cell type-specific molecular pathology of ASD still needs to be studied. Machine learning-based studies can be conducted for ASD, prioritizing high-confidence gene candidates and promoting the design of effective interventions. Using human brain nucleus gene expression of ASD and controls, we construct cell type-specific predictive models for ASD based on individual genes and gene sets, respectively, to screen cell type-specific ASD-associated genes and gene sets. These two kinds of predictive models can predict the diagnosis of a nucleus with known cell type. Then, we construct a multi-label predictive model for predicting the cell type and diagnosis of a nucleus at the same time. Our findings suggest that layer 2/3 and layer 4 excitatory neurons, layer 5/6 cortico-cortical projection neurons, parvalbumin interneurons, and protoplasmic astrocytes are preferentially affected in ASD. The functions of genes with predictive power for ASD are different and the top important genes are distinct across different cells, highlighting the cell-type heterogeneity of ASD. The constructed predictive models can promote the diagnosis of ASD, and the prioritized cell type-specific ASD-associated genes and gene sets may be used as potential biomarkers of ASD.
