Results 1 - 20 of 24,699
1.
Gigascience ; 12, 2022 Dec 28.
Article in English | MEDLINE | ID: mdl-36691728

ABSTRACT

BACKGROUND: Single-cell RNA sequencing (scRNA-seq) methods have been advantageous for quantifying cell-to-cell variation by profiling the transcriptomes of individual cells. For scRNA-seq data, variability in gene expression reflects the degree of variation in gene expression from one cell to another. Analyses that focus on cell-cell variability therefore are useful for going beyond changes based on average expression and, instead, identifying genes with homogeneous expression versus those that vary widely from cell to cell. RESULTS: We present a novel statistical framework, scShapes, for identifying differential distributions in single-cell RNA-sequencing data using generalized linear models. Most approaches for differential gene expression detect shifts in the mean value. However, as single-cell data are driven by overdispersion and dropouts, moving beyond means and using distributions that can handle excess zeros is critical. scShapes quantifies gene-specific cell-to-cell variability by testing for differences in the expression distribution while flexibly adjusting for covariates if required. We demonstrate that scShapes identifies subtle variations that are independent of altered mean expression and detects biologically relevant genes that were not discovered through standard approaches. CONCLUSIONS: This analysis also draws attention to genes that switch distribution shapes from a unimodal distribution to a zero-inflated distribution and raises open questions about the plausible biological mechanisms that may give rise to this, such as transcriptional bursting. Overall, the results from scShapes help to expand our understanding of the role that gene expression plays in the transcriptional regulation of a specific perturbation or cellular phenotype. Our framework scShapes is incorporated into a Bioconductor R package (https://www.bioconductor.org/packages/release/bioc/html/scShapes.html).


Subject(s)
Software , Transcriptome , Sequence Analysis, RNA/methods , Gene Expression Regulation , RNA/genetics , Single-Cell Analysis/methods , Gene Expression Profiling/methods
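As a rough illustration of the idea in the abstract above (choosing, per gene, between a distribution without and with excess zeros), the following minimal Python sketch fits a Poisson and a zero-inflated Poisson (ZIP) model to one gene's counts and picks the family with the lower BIC. It is not the scShapes API: the function names, the EM fitting routine, the restriction to two candidate families, and the absence of covariates are all simplifying assumptions made here, and the sketch assumes each gene has at least one non-zero count.

```python
import math

def poisson_loglik(counts, lam):
    """Log-likelihood of the counts under Poisson(lam)."""
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in counts)

def zip_loglik(counts, pi, lam):
    """Log-likelihood under a zero-inflated Poisson with zero-inflation pi."""
    total = 0.0
    for x in counts:
        if x == 0:
            total += math.log(pi + (1 - pi) * math.exp(-lam))
        else:
            total += math.log(1 - pi) + x * math.log(lam) - lam - math.lgamma(x + 1)
    return total

def fit_zip(counts, iters=500):
    """Fit (pi, lam) by EM; assumes at least one non-zero count."""
    n = len(counts)
    n_zero = sum(1 for x in counts if x == 0)
    total = sum(counts)
    pi, lam = 0.5, total / n
    for _ in range(iters):
        # E-step: expected number of zeros that are "structural"
        p_zero = pi + (1 - pi) * math.exp(-lam)
        structural = n_zero * pi / p_zero
        # M-step: update mixing weight and Poisson mean
        pi = min(max(structural / n, 1e-9), 1 - 1e-9)
        lam = total / (n - structural)
    return pi, lam

def select_family(counts):
    """Return 'poisson' or 'zip', whichever has the lower BIC."""
    n = len(counts)
    lam = sum(counts) / n
    bic_poisson = 1 * math.log(n) - 2 * poisson_loglik(counts, lam)
    pi, lam_zip = fit_zip(counts)
    bic_zip = 2 * math.log(n) - 2 * zip_loglik(counts, pi, lam_zip)
    return "zip" if bic_zip < bic_poisson else "poisson"

# Hypothetical counts for one gene in 100 cells: 60 dropouts plus expressed cells
gene_with_dropouts = [0] * 60 + [5, 6, 4, 7, 5, 3, 6, 8, 5, 4] * 4
print(select_family(gene_with_dropouts))
```

A gene whose selected family differs between two conditions would be a candidate for a change in distribution shape even when its mean is similar.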
2.
Methods Mol Biol ; 2630: 103-115, 2023.
Article in English | MEDLINE | ID: mdl-36689179

ABSTRACT

Next-generation sequencing (NGS) of small RNA (sRNA) cDNA libraries permits the identification and characterization of sRNA species de novo. However, the method through which these libraries are constructed can often introduce artifacts such as over- or underrepresentation of specific sequences or adapter oligonucleotides due to sequence biases held by the enzymes used. In this chapter we describe a protocol for sRNA library construction making use of high-definition (HD) adapters for the Illumina sequencing platform, which reduce ligation bias. This protocol leads to drastically reduced direct 5'/3' adapter ligation products and can be used for the synthesis of sRNA libraries from total RNA or sRNA of various plant, animal, and fungal samples. This protocol also includes a method for total RNA extraction from plant leaf and cultured cells or body fluids.


Subject(s)
RNA, Small Untranslated , RNA , Animals , Gene Library , Oligonucleotides , High-Throughput Nucleotide Sequencing/methods , Cloning, Molecular , Sequence Analysis, RNA/methods , RNA, Small Untranslated/genetics
3.
Methods Mol Biol ; 2630: 179-213, 2023.
Article in English | MEDLINE | ID: mdl-36689184

ABSTRACT

The current versions of the microRNA databases MiRgeneDB, miRBase, and PmiREN contain annotations for a total of 358 different species. Public repositories, however, host small RNA sequencing data for over 800 species. This discrepancy implies that microRNA research is also very active in species that neither have an available high-quality genome assembly nor annotations for microRNAs or other types of noncoding genes. These cases are particularly challenging to analyze because reference sequences need to be collected from different sources and processed and formatted appropriately so that the dedicated small RNA analysis tools can make use of them. In this protocol we describe how small RNA sequencing data can be easily analyzed by means of a dockerized version of the well-established sRNAtoolbox/sRNAbench small RNA tools. We outline the analysis of two publicly available datasets to demonstrate basic aspects like the preparation of the local database, expression profiling, or differential expression analysis as well as more advanced features such as quantification of exogenous RNA content and data analysis in non-model species.


Subject(s)
MicroRNAs , Software , MicroRNAs/genetics , Sequence Analysis, RNA , Databases, Nucleic Acid , Base Sequence , High-Throughput Nucleotide Sequencing/methods
4.
Sci Rep ; 13(1): 1197, 2023 Jan 21.
Article in English | MEDLINE | ID: mdl-36681709

ABSTRACT

Effective dimension reduction is essential for single cell RNA-seq (scRNAseq) analysis. Principal component analysis (PCA) is widely used, but requires continuous, normally-distributed data; therefore, it is often coupled with log-transformation in scRNAseq applications, which can distort the data and obscure meaningful variation. We describe correspondence analysis (CA), a count-based alternative to PCA. CA is based on decomposition of a chi-squared residual matrix, avoiding distortive log-transformation. To address overdispersion and high sparsity in scRNAseq data, we propose five adaptations of CA that are fast and scalable and that outperform standard CA and glmPCA, yielding cell embeddings with better or comparable clustering accuracy in 8 out of 9 datasets. In particular, we find that CA with Freeman-Tukey residuals performs especially well across diverse datasets. Other advantages of the CA framework include visualization of associations between genes and cell populations in a "CA biplot," and extension to multi-table analysis; we introduce corralm for integrative multi-table dimension reduction of scRNAseq data. We implement CA for scRNAseq data in corral, an R/Bioconductor package which interfaces directly with single cell classes in Bioconductor. Switching from PCA to CA is achieved through a simple pipeline substitution and improves dimension reduction of scRNAseq datasets.


Subject(s)
Single-Cell Analysis , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Principal Component Analysis , Cluster Analysis
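To make the "chi-squared residual matrix" in the abstract above concrete, here is a minimal Python sketch that computes Pearson (chi-squared) residuals and the textbook Freeman-Tukey residuals for a small gene-by-cell count table. It is not the corral API: the function names are hypothetical, the exact scaling used by the package may differ, the table is assumed to have no all-zero rows or columns, and the decomposition step (an SVD of the residual matrix) is omitted.

```python
import math

def expected_counts(counts):
    """Expected counts under independence: row_sum * col_sum / grand_total."""
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    total = sum(row_sums)
    return [[rs * cs / total for cs in col_sums] for rs in row_sums]

def pearson_residuals(counts):
    """(observed - expected) / sqrt(expected); their squares sum to chi-squared."""
    exp = expected_counts(counts)
    return [[(o - e) / math.sqrt(e) for o, e in zip(obs_row, exp_row)]
            for obs_row, exp_row in zip(counts, exp)]

def freeman_tukey_residuals(counts):
    """sqrt(o) + sqrt(o + 1) - sqrt(4e + 1), a variance-stabilizing alternative."""
    exp = expected_counts(counts)
    return [[math.sqrt(o) + math.sqrt(o + 1) - math.sqrt(4 * e + 1)
             for o, e in zip(obs_row, exp_row)]
            for obs_row, exp_row in zip(counts, exp)]

# Hypothetical 2 genes x 2 cells table with perfect gene-cell association
table = [[10, 0], [0, 10]]
chi_squared = sum(r * r for row in pearson_residuals(table) for r in row)
print(chi_squared)
print(freeman_tukey_residuals(table))
```

In a full CA workflow, the residual matrix would then be decomposed, and the leading singular vectors would give the cell and gene embeddings.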
5.
Sci Rep ; 13(1): 1223, 2023 Jan 21.
Article in English | MEDLINE | ID: mdl-36681719

ABSTRACT

We report the generation and analysis of single-cell RNA-Seq data (> 38,000 cells) from mouse native retinae and induced pluripotent stem cell (iPSC)-derived retinal organoids at four matched stages of development spanning the emergence of the major retinal cell types. We combine information from temporal sampling, visualization of 3D UMAP manifolds, pseudo-time and RNA velocity analyses, to show that iPSC-derived 3D retinal organoids broadly recapitulate the native developmental trajectories. However, we observe relaxation of spatial and temporal transcriptome control, premature emergence and dominance of photoreceptor precursor cells, and susceptibility of dynamically regulated pathways and transcription factors to culture conditions in retinal organoids. We demonstrate that genes causing human retinopathies are enriched in cell-type specifying genes and identify a subset of disease-causing genes with expression profiles that are highly conserved between human retinae and murine retinal organoids. This study provides a resource to the community that will be useful to assess and further improve protocols for ex vivo recapitulation and study of retinal development.


Subject(s)
Induced Pluripotent Stem Cells , Mice , Humans , Animals , Transcriptome , Retina/metabolism , Photoreceptor Cells , Organoids/metabolism , Sequence Analysis, RNA , Cell Differentiation/genetics
6.
Biomolecules ; 13(1), 2022 Dec 27.
Article in English | MEDLINE | ID: mdl-36671432

ABSTRACT

The ovary is a female reproductive organ that plays a key role in fertility and the maintenance of endocrine homeostasis, which is of great importance to women's health. It is characterized by a high heterogeneity, with different cellular subpopulations primarily containing oocytes, granulosa cells, stromal cells, endothelial cells, vascular smooth muscle cells, and diverse immune cell types. Each has unique and important functions. From the fetal period to old age, the ovary experiences continuous structural and functional changes, with the gene expression of each cell type undergoing dramatic changes. In addition, ovarian development strongly relies on the communication between germ and somatic cells. Compared to traditional bulk RNA sequencing techniques, the single-cell RNA sequencing (scRNA-seq) approach has substantial advantages in analyzing individual cells within an ever-changing and complicated tissue, classifying them into cell types, characterizing single cells, delineating the cellular developmental trajectory, and studying cell-to-cell interactions. In this review, we present single-cell transcriptome mapping of the ovary, summarize the characteristics of the important constituent cells of the ovary and the critical cellular developmental processes, and describe key signaling pathways for cell-to-cell communication in the ovary, as revealed by scRNA-seq. This review will undoubtedly improve our understanding of the characteristics of ovarian cells and development, thus enabling the identification of novel therapeutic targets for ovarian-related diseases.


Subject(s)
Endothelial Cells , Oocytes , Female , Animals , Oocytes/metabolism , Ovary/metabolism , Granulosa Cells/metabolism , Sequence Analysis, RNA
7.
Biomolecules ; 13(1), 2023 Jan 12.
Article in English | MEDLINE | ID: mdl-36671541

ABSTRACT

Development from single cells to multicellular tissues and organs involves more than just the exact replication of cells: it also requires the process known as differentiation. The primary focus of research into the mechanism of differentiation has been differences in gene expression profiles between individual cells. However, it has predominantly been conducted at low throughput and bulk levels, challenging the efforts to understand molecular mechanisms of differentiation during the developmental process in animals and humans. During the last decades, rapid methodological advancements in genomics facilitated the ability to study developmental processes at a genome-wide level and finer resolution. Particularly, sequencing transcriptomes at single-cell resolution, enabled by single-cell RNA-sequencing (scRNA-seq), was a breath-taking innovation, allowing scientists to gain a better understanding of differentiation and cell lineage during the developmental process. However, single-cell isolation during scRNA-seq results in the loss of the spatial information of individual cells and consequently limits our understanding of the specific functions performed by cells in different spatial regions of tissues or organs. This has greatly encouraged the emergence of the spatial transcriptomic discipline and tools. Here, we summarize the recent application of scRNA-seq and spatial transcriptomic tools for developmental biology. We also discuss the limitations of current spatial transcriptomic tools and approaches, as well as possible solutions and future prospects.


Subject(s)
Single-Cell Analysis , Transcriptome , Humans , Animals , Transcriptome/genetics , Single-Cell Analysis/methods , Gene Expression Profiling/methods , Cell Differentiation/genetics , Sequence Analysis, RNA/methods , Developmental Biology
8.
Biomolecules ; 13(1), 2023 Jan 13.
Article in English | MEDLINE | ID: mdl-36671556

ABSTRACT

The etiology of osteonecrosis of the femoral head (ONFH) is not yet fully understood. However, ONFH is a common disease with high morbidity, and approximately one-third of cases are caused by glucocorticoids. We performed single-cell RNA sequencing of bone marrow to explore the effect of glucocorticoid on ONFH. Bone marrow samples of the proximal femur were extracted from four participants during total hip arthroplasty, including two participants diagnosed with ONFH for systemic lupus erythematosus (SLE) treated with glucocorticoids (the case group) and two participants with femoral neck fracture (the control group). Unbiased transcriptome-wide single-cell RNA sequencing analysis and computational analyses were performed. Seventeen molecularly defined cell types were identified in the studied samples, including significantly dysregulated neutrophils and B cells in the case group. Additionally, fatty acid synthesis and aerobic oxidation were repressed, while fatty acid beta-oxidation was enhanced. Our results also preliminarily clarified the roles of the inflammatory response, substance metabolism, vascular injury, angiogenesis, cell proliferation, apoptosis, and dysregulated coagulation and fibrinolysis in glucocorticoid-induced ONFH. Notably, we list the pathways that were markedly altered in glucocorticoid-induced ONFH with SLE compared with femoral head fracture, as well as their common genes, which are potential early therapeutic targets. Our results provide new insights into the mechanism of glucocorticoid-induced ONFH and present potential clues for effective and functional manipulation of human glucocorticoid-induced ONFH, which could improve patient outcomes.


Subject(s)
Femur Head Necrosis , Lupus Erythematosus, Systemic , Humans , Glucocorticoids/metabolism , Femur Head Necrosis/chemically induced , Femur Head Necrosis/genetics , Femur Head Necrosis/metabolism , Femur Head/metabolism , Lupus Erythematosus, Systemic/metabolism , Sequence Analysis, RNA , Fatty Acids/metabolism
9.
Cells ; 12(2), 2023 Jan 05.
Article in English | MEDLINE | ID: mdl-36672162

ABSTRACT

Colorectal cancer has proven to be difficult to treat as it is the second leading cause of cancer death for both men and women worldwide. Recent work has shown the importance of microRNA (miRNA) in the progression and metastasis of colorectal cancer. Here, we develop a metric based on miRNA-gene target interactions, previously validated to be associated with colorectal cancer. We use this metric with a regularized Cox model to produce a small set of top-performing genes related to colon cancer. We show that using the miRNA metric and a Cox model led to a meaningful improvement in colon cancer survival prediction and correct patient risk stratification. We show that our approach outperforms existing methods and that the top genes identified by our process are implicated in NOTCH3 signaling and general metabolism pathways, which are essential to colon cancer progression.


Subject(s)
Colonic Neoplasms , MicroRNAs , Male , Humans , Female , MicroRNAs/genetics , MicroRNAs/metabolism , Colonic Neoplasms/pathology , Signal Transduction/genetics , Biomarkers, Tumor/genetics , Biomarkers, Tumor/metabolism , Sequence Analysis, RNA
10.
Genes (Basel) ; 14(1), 2023 Jan 13.
Article in English | MEDLINE | ID: mdl-36672949

ABSTRACT

Fuzzless Gossypium hirsutum mutants are ideal materials for investigating cotton fiber initiation and development. In this study, we used the fuzzless G. hirsutum mutant Xinluzao 50 FLM as the research material and combined it with other fuzzless materials for verification by RNA sequencing to explore the gene expression patterns and differences between genes in upland cotton during the fuzz period. A gene ontology (GO) enrichment analysis showed that differentially expressed genes (DEGs) were mainly enriched in the metabolic process, microtubule binding, and other pathways. A weighted gene co-expression network analysis (WGCNA) showed that two modules of Xinluzao 50 and Xinluzao 50 FLM and four modules of CSS386 and Sicala V-2 were highly correlated with fuzz. We selected the hub gene with the highest KME value among the six modules and constructed an interaction network. In addition, we selected some genes with high KME values from the six modules that were highly associated with fuzz in the four materials and found 19 common differential genes produced by the four materials. These 19 genes are likely involved in the formation of fuzz in upland cotton. Several hub genes belong to the arabinogalactan protein and GDSL lipase, which play important roles in fiber development. According to the differences in expression level, 4 genes were selected from the 19 genes and tested for their expression level in some fuzzless materials. The modules, hub genes, and common genes identified in this study can provide new insights into the formation of fiber and fuzz, and provide a reference for molecular design breeding for the genetic improvement of cotton fiber.


Subject(s)
Cotton Fiber , Gossypium , Gene Expression Profiling , Genes, Plant , Sequence Analysis, RNA
11.
Sci Adv ; 9(3): eabq5072, 2023 Jan 20.
Article in English | MEDLINE | ID: mdl-36662851

ABSTRACT

Long-read RNA sequencing (RNA-seq) holds great potential for characterizing transcriptome variation and full-length transcript isoforms, but the relatively high error rate of current long-read sequencing platforms poses a major challenge. We present ESPRESSO, a computational tool for robust discovery and quantification of transcript isoforms from error-prone long reads. ESPRESSO jointly considers alignments of all long reads aligned to a gene and uses error profiles of individual reads to improve the identification of splice junctions and the discovery of their corresponding transcript isoforms. On both a synthetic spike-in RNA sample and human RNA samples, ESPRESSO outperforms multiple contemporary tools in not only transcript isoform discovery but also transcript isoform quantification. In total, we generated and analyzed ~1.1 billion nanopore RNA-seq reads covering 30 human tissue samples and three human cell lines. ESPRESSO and its companion dataset provide a useful resource for studying the RNA repertoire of eukaryotic transcriptomes.


Subject(s)
RNA , Transcriptome , Humans , RNA/genetics , RNA-Seq , Sequence Analysis, RNA , Protein Isoforms/genetics , Gene Expression Profiling
13.
Sci Rep ; 13(1): 807, 2023 Jan 16.
Article in English | MEDLINE | ID: mdl-36646776

ABSTRACT

Autism spectrum disorder (ASD) is a neurodevelopmental condition with onset in early childhood, still diagnosed only through clinical observation due to the lack of laboratory biomarkers. Early detection strategies would be especially useful in screening high-risk newborn siblings of children already diagnosed with ASD. We performed RNA sequencing on peripheral blood, comparing 27 pairs of ASD children vs their sex- and age-matched unaffected siblings. Differential gene expression profiling, performed using an unpaired model, found two immune genes, EGR1 and IGKV3D-15, significantly upregulated in ASD patients (both p adj = 0.037). Weighted gene correlation network analysis identified 18 co-expressed modules. One of these modules was downregulated among autistic individuals (p = 0.035) and a ROC curve using its eigengene values yielded an AUC of 0.62. Genes in this module are primarily involved in transcriptional control and its hub gene, RACK1, encodes a signaling protein critical for neurodevelopment and innate immunity, whose expression is influenced by various hormones and known "endocrine disruptors". These results indicate that transcriptomic biomarkers can contribute to the sensitivity of an intra-familial multimarker panel for ASD and provide further evidence that neurodevelopment, innate immunity and transcriptional regulation are key to ASD pathogenesis.


Subject(s)
Autism Spectrum Disorder , Autistic Disorder , Child , Infant, Newborn , Humans , Child, Preschool , Autism Spectrum Disorder/diagnosis , Siblings , Autistic Disorder/genetics , Biomarkers , Sequence Analysis, RNA
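The abstract above reports an AUC of 0.62 for a ROC curve built from module eigengene values. As a hedged illustration of what that number measures, the Python sketch below computes the AUC as the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. The scores are invented for the example and are not data from the study. Because the module was downregulated in cases, a real analysis might use the negated eigengene as the score.

```python
def roc_auc(case_scores, control_scores):
    """AUC via pairwise comparison (equivalent to the Mann-Whitney U statistic)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Hypothetical scores (for example, negated eigengene values) for cases and controls
cases = [0.30, 0.10, 0.25, -0.05]
controls = [0.20, 0.00, 0.15, -0.10]
print(roc_auc(cases, controls))
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation, so 0.62 indicates a modest signal on its own.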
14.
BMC Genomics ; 24(1): 21, 2023 Jan 14.
Article in English | MEDLINE | ID: mdl-36641451

ABSTRACT

BACKGROUND: Salt-alkali stress represents one of the most stressful events with deleterious consequences for plant growth and crop productivity. Despite studies focusing on the effects of salt-alkali stress on morphology and physiology, its molecular mechanisms remain unclear. Here, we employed RNA-sequencing (RNA-seq) to understand how Na2CO3 stress inhibits rice seedling growth. RESULTS: Na2CO3 stress significantly inhibited the growth of rice seedlings. Through RNA-seq, many differentially expressed genes (DEGs) were shown to be potentially involved in the rice seedling response to salt-alkali stress. After 1-day and 5-day treatments, RNA-seq identified 1780 and 2315 DEGs in the Na2CO3-treated versus -untreated rice seedling shoots, respectively. According to the gene ontology enrichment and the Kyoto Encyclopedia of Genes and Genomes annotation of DEGs, the growth-inhibition processes associated with salt-alkali stress involve a myriad of molecular events, including biosynthesis and metabolism, enzyme activity, and binding. CONCLUSION: Collectively, the transcriptome analyses in the present work revealed several potential key regulators of plant response to salt-alkali stress, and might pave a way to improve salt-alkali stress tolerance in rice.


Subject(s)
Oryza , Seedlings , Oryza/metabolism , Alkalies/pharmacology , Salt Stress/genetics , Gene Expression Profiling , Sequence Analysis, RNA , Gene Expression Regulation, Plant , Stress, Physiological/genetics , Transcriptome
15.
Nat Commun ; 14(1): 223, 2023 Jan 14.
Article in English | MEDLINE | ID: mdl-36641532

ABSTRACT

Consistent annotation transfer from reference dataset to query dataset is fundamental to the development and reproducibility of single-cell research. Compared with traditional annotation methods, deep learning based methods are faster and more automated. A series of useful single cell analysis tools based on autoencoder architecture have been developed but these struggle to strike a balance between depth and interpretability. Here, we present TOSICA, a multi-head self-attention deep learning model based on Transformer that enables interpretable cell type annotation using biologically understandable entities, such as pathways or regulons. We show that TOSICA achieves fast and accurate one-stop annotation and batch-insensitive integration while providing biologically interpretable insights for understanding cellular behavior during development and disease progressions. We demonstrate TOSICA's advantages by applying it to scRNA-seq data of tumor-infiltrating immune cells, and CD14+ monocytes in COVID-19 to reveal rare cell types, heterogeneity and dynamic trajectories associated with disease progression and severity.


Subject(s)
COVID-19 , Humans , Reproducibility of Results , Single-Cell Analysis/methods , Disease Progression , Sequence Analysis, RNA/methods
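The abstract above builds on Transformer self-attention over biologically meaningful tokens such as pathways. The following Python sketch shows only the generic scaled dot-product self-attention step for a single head, under strong simplifying assumptions: queries, keys, and values are the token vectors themselves (no learned projections), and the "pathway tokens" are invented three-dimensional vectors. It is not the TOSICA implementation.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head attention with Q = K = V = tokens (no learned weights).

    Returns (outputs, weights); weights[i][j] is how much token i attends to token j.
    """
    dim = len(tokens[0])
    weights = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        weights.append(softmax(scores))
    outputs = [[sum(w * value[d] for w, value in zip(row, tokens))
                for d in range(dim)]
               for row in weights]
    return outputs, weights

# Hypothetical embeddings for three pathway tokens in one cell
pathway_tokens = [[1.0, 0.0, 0.5], [0.9, 0.1, 0.4], [0.0, 1.0, 0.0]]
outputs, weights = self_attention(pathway_tokens)
print(weights[0])
```

Because each row of `weights` sums to one and is indexed by named tokens, the weights can be read as a per-cell score over pathways, which is one way such a model can be made interpretable.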
17.
Acta Neuropathol Commun ; 11(1): 6, 2023 Jan 11.
Article in English | MEDLINE | ID: mdl-36631900

ABSTRACT

The most common malignant brain tumour in children, medulloblastoma (MB), is subdivided into four clinically relevant molecular subgroups, although targeted therapy options informed by understanding of different cellular features are lacking. Here, by comparing the most aggressive subgroup (Group 3) with the intermediate (SHH) subgroup, we identify crucial differences in tumour heterogeneity, including unique metabolism-driven subpopulations in Group 3 and matrix-producing subpopulations in SHH. To analyse tumour heterogeneity, we profiled individual tumour nodules at the cellular level in 3D MB hydrogel models, which recapitulate subgroup specific phenotypes, by single cell RNA sequencing (scRNAseq) and 3D OrbiTrap Secondary Ion Mass Spectrometry (3D OrbiSIMS) imaging. In addition to identifying known metabolites characteristic of MB, we observed intra- and internodular heterogeneity and identified subgroup-specific tumour subpopulations. We showed that extracellular matrix factors and adhesion pathways defined unique SHH subpopulations, and made up a distinct shell-like structure of sulphur-containing species, comprising a combination of small leucine-rich proteoglycans (SLRPs) including the collagen organiser lumican. In contrast, the Group 3 tumour model was characterized by multiple subpopulations with greatly enhanced oxidative phosphorylation and tricarboxylic acid (TCA) cycle activity. Extensive TCA cycle metabolite measurements revealed very high levels of succinate and fumarate with malate levels almost undetectable particularly in Group 3 tumour models. In patients, high fumarate levels (NMR spectroscopy) alongside activated stress response pathways and high Nuclear Factor Erythroid 2-Related Factor 2 (NRF2; gene expression analyses) were associated with poorer survival. Based on these findings we predicted and confirmed that NRF2 inhibition increased sensitivity to vincristine in a long-term 3D drug treatment assay of Group 3 MB. Thus, by combining scRNAseq and 3D OrbiSIMS in a relevant model system we were able to define MB subgroup heterogeneity at the single cell level and elucidate new druggable biomarkers for aggressive Group 3 and low-risk SHH MB.


Subject(s)
Cerebellar Neoplasms , Medulloblastoma , Humans , Medulloblastoma/metabolism , Hydrogels/therapeutic use , NF-E2-Related Factor 2 , Cerebellar Neoplasms/metabolism , Hedgehog Proteins/metabolism , Biomarkers , Sequence Analysis, RNA
18.
Brief Bioinform ; 24(1), 2023 Jan 19.
Article in English | MEDLINE | ID: mdl-36631401

ABSTRACT

The advances in single-cell ribonucleic acid sequencing (scRNA-seq) allow researchers to explore cellular heterogeneity and human diseases at cell resolution. Cell clustering is a prerequisite in scRNA-seq analysis since it can recognize cell identities. However, the high dimensionality, noise and significant sparsity of scRNA-seq data have made it a big challenge. Although many methods have emerged, they still fail to fully explore the intrinsic properties of cells and the relationship among cells, which seriously affects the downstream clustering performance. Here, we propose a new deep contrastive clustering algorithm called scDCCA. It integrates a denoising auto-encoder and a dual contrastive learning module into a deep clustering framework to extract valuable features and realize cell clustering. Specifically, to better characterize and learn data representations robustly, scDCCA utilizes a denoising Zero-Inflated Negative Binomial model-based auto-encoder to extract low-dimensional features. Meanwhile, scDCCA incorporates a dual contrastive learning module to capture the pairwise proximity of cells. By increasing the similarities between positive pairs and the differences between negative ones, the contrasts at both the instance and the cluster level help the model learn more discriminative features and achieve better cell segregation. Furthermore, scDCCA joins feature learning with clustering, which realizes representation learning and cell clustering in an end-to-end manner. Experimental results on 14 real datasets show that scDCCA outperforms eight state-of-the-art methods in terms of accuracy, generalizability, scalability and efficiency. Cell visualization and biological analysis demonstrate that scDCCA significantly improves clustering and facilitates downstream analysis for scRNA-seq data. The code is available at https://github.com/WJ319/scDCCA.


Subject(s)
Gene Expression Profiling , Humans , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Algorithms , Cluster Analysis
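The abstract above describes contrastive learning as increasing similarity between positive pairs and decreasing it between negative pairs. A common way to express that objective is an InfoNCE-style loss, sketched below in Python for a single anchor cell. The abstract does not give the exact loss used by scDCCA, so the formula, the cosine similarity, the temperature value, and the toy embeddings are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchor, positive, negatives, temperature=0.5):
    """-log( exp(sim(a, p)/t) / (exp(sim(a, p)/t) + sum_n exp(sim(a, n)/t)) )."""
    pos_term = math.exp(cosine(anchor, positive) / temperature)
    neg_term = sum(math.exp(cosine(anchor, neg) / temperature) for neg in negatives)
    return -math.log(pos_term / (pos_term + neg_term))

# Hypothetical 2-D embeddings: an anchor cell, an augmented view of it, another cell
anchor = [1.0, 0.0]
augmented_view = [0.9, 0.1]
other_cell = [-1.0, 0.2]
print(info_nce(anchor, augmented_view, [other_cell]))
```

The loss is small when the anchor is close to its positive and far from the negatives, and it grows when that ordering is reversed, which is the gradient signal that pulls similar cells together in the embedding.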
19.
Brief Bioinform ; 24(1), 2023 Jan 19.
Article in English | MEDLINE | ID: mdl-36631398

ABSTRACT

Computational cell type deconvolution on bulk transcriptomics data can reveal cell type proportion heterogeneity across samples. One critical factor for accurate deconvolution is the reference signature matrix for different cell types. Compared with inferring reference signature matrices from cell lines, rapidly accumulating single-cell RNA-sequencing (scRNA-seq) data provide a richer and less biased resource. However, deriving cell type signatures from scRNA-seq data is challenging due to high biological and technical noise. In this article, we introduce a novel Bayesian framework, tranSig, to improve signature matrix inference from scRNA-seq by leveraging shared cell type-specific expression patterns across different tissues and studies. Our simulations show that tranSig is robust to the number of signature genes and tissues specified in the model. Applications of tranSig to bulk RNA sequencing data from peripheral blood, bronchoalveolar lavage and aorta demonstrate its accuracy and power to characterize biological heterogeneity across groups. In summary, tranSig offers an accurate and robust approach to defining gene expression signatures of different cell types, facilitating improved in silico cell type deconvolutions.


Subject(s)
Gene Expression Profiling , Single-Cell Analysis , Bayes Theorem , Transcriptome , Sequence Analysis, RNA
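To show where a signature matrix enters deconvolution, the Python sketch below models a bulk expression vector as the signature matrix times non-negative cell type proportions and recovers the proportions by projected gradient descent. This illustrates only the downstream deconvolution step, not the Bayesian signature inference that tranSig performs. The solver, the function name, and the numbers are hypothetical.

```python
def deconvolve(signature, bulk, iters=2000):
    """Estimate proportions p >= 0 minimizing ||signature @ p - bulk||^2.

    signature: genes x cell_types list of lists; bulk: list of gene values.
    """
    n_genes, n_types = len(signature), len(signature[0])
    # Conservative step size: 1 / (2 * squared Frobenius norm) guarantees descent
    step = 1.0 / (2.0 * sum(v * v for row in signature for v in row))
    p = [1.0 / n_types] * n_types
    for _ in range(iters):
        residual = [sum(s * w for s, w in zip(row, p)) - b
                    for row, b in zip(signature, bulk)]
        gradient = [2.0 * sum(signature[g][k] * residual[g] for g in range(n_genes))
                    for k in range(n_types)]
        # Gradient step followed by projection onto the non-negative orthant
        p = [max(0.0, w - step * g) for w, g in zip(p, gradient)]
    total = sum(p)
    return [w / total for w in p] if total > 0 else p

# Hypothetical signature: 3 genes x 2 cell types; bulk mixes them 30% / 70%
signature = [[10.0, 1.0], [1.0, 10.0], [5.0, 5.0]]
bulk = [3.7, 7.3, 5.0]
print(deconvolve(signature, bulk))
```

The quality of the recovered proportions depends directly on how well the columns of the signature matrix represent each cell type, which is why signature inference matters.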
20.
Brief Bioinform ; 24(1), 2023 Jan 19.
Article in English | MEDLINE | ID: mdl-36627114

ABSTRACT

Dimension reduction (DR) plays an important role in single-cell RNA sequencing (scRNA-seq) analysis, including data interpretation, visualization and other downstream analyses. A desired DR method should be applicable to various application scenarios, including identifying cell types, preserving the inherent structure of data and handling batch effects. However, most of the existing DR methods fail to accommodate these requirements simultaneously, especially the removal of batch effects. In this paper, we develop a novel structure-preserved dimension reduction (SPDR) method using intra- and inter-batch triplet sampling. The constructed triplets jointly consider each anchor's mutual nearest neighbors from inter-batch, k-nearest neighbors from intra-batch and randomly selected cells from the whole data, which capture higher order structure information and meanwhile account for batch information of the data. Then we minimize a robust loss function for the chosen triplets to obtain a structure-preserved and batch-corrected low-dimensional representation. Comprehensive evaluations show that SPDR outperforms other competing DR methods, such as INSCT, IVIS, Trimap, Scanorama, scVI and UMAP, in removing batch effects, preserving biological variation, facilitating visualization and improving clustering accuracy. Besides, the two-dimensional (2D) embedding of SPDR presents a clear and authentic expression pattern, and can guide researchers to determine how many cell types should be identified. Furthermore, SPDR is robust to complex data characteristics (such as down-sampling, duplicates and outliers) and varying hyperparameter settings. We believe that SPDR will be a valuable tool for characterizing complex cellular heterogeneity.


Subject(s)
Algorithms , Transcriptome , Single-Cell Analysis/methods , Gene Expression Profiling/methods , Cluster Analysis , Sequence Analysis, RNA/methods
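The triplet construction in the abstract above relies on mutual nearest neighbors (MNNs) between batches. As a minimal sketch of that one ingredient, the Python code below finds MNN pairs between two batches using brute-force Euclidean distance. The function names and coordinates are hypothetical, and the sketch does not reproduce the SPDR sampling scheme or its loss function.

```python
import math

def nearest(query, pool, k):
    """Indices of the k points in pool closest to query (Euclidean distance)."""
    order = sorted(range(len(pool)), key=lambda j: math.dist(query, pool[j]))
    return set(order[:k])

def mutual_nearest_neighbors(batch_a, batch_b, k=1):
    """Pairs (i, j) where a_i is among b_j's k nearest in A and vice versa."""
    a_to_b = [nearest(a, batch_b, k) for a in batch_a]
    b_to_a = [nearest(b, batch_a, k) for b in batch_b]
    return sorted((i, j)
                  for i in range(len(batch_a))
                  for j in a_to_b[i]
                  if i in b_to_a[j])

# Hypothetical 2-D cell coordinates from two batches
batch_a = [(0.0, 0.0), (10.0, 10.0)]
batch_b = [(0.5, 0.0), (10.0, 9.0), (4.0, 4.0)]
print(mutual_nearest_neighbors(batch_a, batch_b))
```

In a triplet-based method, each MNN pair could serve as an (anchor, positive) pair across batches, with a randomly drawn cell acting as the negative. The third cell in `batch_b` has no mutual partner and so contributes no cross-batch positive.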