Search | VHL Regional Portal

1.

Systematic evaluation with practical guidelines for single-cell and spatially resolved transcriptomics data simulation under multiple scenarios.

Duo, Hongrui; Li, Yinghong; Lan, Yang; Tao, Jingxin; Yang, Qingxia; Xiao, Yingxue; Sun, Jing; Li, Lei; Nie, Xiner; Zhang, Xiaoxi; Liang, Guizhao; Liu, Mingwei; Hao, Youjin; Li, Bo.

Genome Biol ; 25(1): 145, 2024 Jun 03.

Article in English | MEDLINE | ID: mdl-38831386

ABSTRACT

BACKGROUND: Single-cell RNA sequencing (scRNA-seq) and spatially resolved transcriptomics (SRT) have led to groundbreaking advancements in life sciences. To develop bioinformatics tools for scRNA-seq and SRT data and perform unbiased benchmarks, data simulation has been widely adopted by providing explicit ground truth and generating customized datasets. However, the performance of simulation methods under multiple scenarios has not been comprehensively assessed, making it challenging to choose suitable methods without practical guidelines. RESULTS: We systematically evaluated 49 simulation methods developed for scRNA-seq and/or SRT data in terms of accuracy, functionality, scalability, and usability using 152 reference datasets derived from 24 platforms. SRTsim, scDesign3, ZINB-WaVE, and scDesign2 have the best accuracy performance across various platforms. Unexpectedly, some methods tailored to scRNA-seq data have potential compatibility for simulating SRT data. Lun, SPARSim, and scDesign3-tree outperform other methods under corresponding simulation scenarios. Phenopath, Lun, Simple, and MFA yield high scalability scores but they cannot generate realistic simulated data. Users should consider the trade-offs between method accuracy and scalability (or functionality) when making decisions. Additionally, execution errors are mainly caused by failed parameter estimations and appearance of missing or infinite values in calculations. We provide practical guidelines for method selection, a standard pipeline Simpipe (https://github.com/duohongrui/simpipe; https://doi.org/10.5281/zenodo.11178409), and an online tool Simsite (https://www.ciblab.net/software/simshiny/) for data simulation. CONCLUSIONS: No method performs best on all criteria, thus a good-yet-not-the-best method is recommended if it solves problems effectively and reasonably. Our comprehensive work provides crucial insights for developers on modeling gene expression data and fosters the simulation process for users.

Subject(s)

Gene Expression Profiling , Single-Cell Analysis , Single-Cell Analysis/methods , Gene Expression Profiling/methods , Humans , Software , Computer Simulation , Transcriptome , Computational Biology/methods , Sequence Analysis, RNA/methods , RNA-Seq/methods , RNA-Seq/standards

2.

scCRT: a contrastive-based dimensionality reduction model for scRNA-seq trajectory inference.

Shi, Yuchen; Wan, Jian; Zhang, Xin; Liang, Tingting; Yin, Yuyu.

Brief Bioinform ; 25(3)2024 Mar 27.

Article in English | MEDLINE | ID: mdl-38701412

ABSTRACT

Trajectory inference is a crucial task in single-cell RNA-sequencing downstream analysis, which can reveal the dynamic processes of biological development, including cell differentiation. Dimensionality reduction is an important step in the trajectory inference process. However, most existing trajectory methods rely on cell features derived from traditional dimensionality reduction methods, such as principal component analysis and uniform manifold approximation and projection. These methods are not specifically designed for trajectory inference and fail to fully leverage prior information from upstream analysis, limiting their performance. Here, we introduce scCRT, a novel dimensionality reduction model for trajectory inference. To utilize prior information to learn accurate cell representations, scCRT integrates two feature learning components: a cell-level pairwise module and a cluster-level contrastive module. The cell-level module focuses on learning accurate cell representations in a reduced-dimensionality space while maintaining the cell-cell positional relationships in the original space. The cluster-level contrastive module uses prior cell state information to aggregate similar cells, preventing excessive dispersion in the low-dimensional space. Experimental findings from 54 real and 81 synthetic datasets, totaling 135 datasets, highlighted the superior performance of scCRT compared with commonly used trajectory inference methods. Additionally, an ablation study revealed that both cell-level and cluster-level modules enhance the model's ability to learn accurate cell features, facilitating cell lineage inference. The source code of scCRT is available at https://github.com/yuchen21-web/scCRT-for-scRNA-seq.

Subject(s)

Algorithms , Single-Cell Analysis , Single-Cell Analysis/methods , Humans , RNA-Seq/methods , Computational Biology/methods , Software , Sequence Analysis, RNA/methods , Animals , Single-Cell Gene Expression Analysis

3.

scEWE: high-order element-wise weighted ensemble clustering for heterogeneity analysis of single-cell RNA-sequencing data.

Huang, Yixiang; Jiang, Hao; Ching, Wai-Ki.

Brief Bioinform ; 25(3)2024 Mar 27.

Article in English | MEDLINE | ID: mdl-38701413

ABSTRACT

With the emergence of large amounts of single-cell RNA sequencing (scRNA-seq) data, the exploration of computational methods has become critical in revealing biological mechanisms. Clustering is a representative approach for deciphering cellular heterogeneity embedded in scRNA-seq data. However, due to the diversity of datasets, none of the existing single-cell clustering methods shows overwhelming performance on all datasets. Weighted ensemble methods have been proposed to integrate multiple results to improve heterogeneity analysis performance. These methods are usually weighted by the reliability of each base clustering result as a whole, ignoring the performance differences of the same base clustering on different cells. In this paper, we propose scEWE, a self-representative ensemble learning framework based on a high-order element-wise weighting strategy. By assigning different base clustering weights to individual cells, we construct and optimize the consensus matrix in a fine-grained way. In addition, we extract the high-order information between cells, which enhances the ability to represent the similarity relationships between cells. scEWE is experimentally shown to significantly outperform the state-of-the-art methods, which demonstrates the effectiveness of the method and supports its potential application to complex single-cell data analysis problems.
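
As a rough illustration of the ensemble idea in this abstract, the sketch below builds a weighted consensus (co-association) matrix from several base clusterings, with one weight per base clustering per cell. It is not the scEWE algorithm: the weights are supplied by hand rather than learned, the function name `weighted_consensus` is hypothetical, and combining the two cells' weights by their product is an assumption made only for illustration.

```python
def weighted_consensus(labelings, weights):
    """Build a cell-by-cell consensus matrix.

    labelings[m][i]: cluster label of cell i in base clustering m.
    weights[m][i]:   weight of base clustering m for cell i (assumed given).
    """
    n = len(labelings[0])
    consensus = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            agree = 0.0  # weighted votes that cells i and j share a cluster
            total = 0.0  # all weighted votes for this pair
            for labels, w in zip(labelings, weights):
                pair_weight = w[i] * w[j]  # illustrative choice, not from the paper
                total += pair_weight
                if labels[i] == labels[j]:
                    agree += pair_weight
            consensus[i][j] = agree / total if total > 0 else 0.0
    return consensus


# Three base clusterings of four cells; the second one is down-weighted.
labelings = [[0, 0, 1, 1], [0, 0, 0, 1], [1, 1, 0, 0]]
weights = [[1.0] * 4, [0.5] * 4, [1.0] * 4]
C = weighted_consensus(labelings, weights)
```

Entries near 1 mark cell pairs that the trusted base clusterings place together; a final clustering would then be run on this matrix.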

Subject(s)

Sequence Analysis, RNA , Single-Cell Analysis , Single-Cell Analysis/methods , Cluster Analysis , Sequence Analysis, RNA/methods , Algorithms , Computational Biology/methods , Humans , RNA-Seq/methods

4.

Analysis of Bladder Cancer Staging Prediction Using Deep Residual Neural Network, Radiomics, and RNA-Seq from High-Definition CT Images.

Zhou, Yao; Zheng, Xingju; Sun, Zhucheng; Wang, Bo.

Genet Res (Camb) ; 2024: 4285171, 2024.

Article in English | MEDLINE | ID: mdl-38715622

ABSTRACT

Bladder cancer has recently seen an alarming increase in global diagnoses, ascending as a predominant cause of cancer-related mortalities. Given this pressing scenario, there is a burgeoning need to identify effective biomarkers for both the diagnosis and therapeutic guidance of bladder cancer. This study focuses on evaluating the potential of high-definition computed tomography (CT) imagery coupled with RNA-sequencing analysis to accurately predict bladder tumor stages, utilizing deep residual networks. Data for this study, including CT images and RNA-Seq datasets for 82 high-grade bladder cancer patients, were sourced from the TCIA and TCGA databases. We employed Cox and lasso regression analyses to determine radiomics and gene signatures, leading to the identification of a three-factor radiomics signature and a four-gene signature in our bladder cancer cohort. ROC curve analyses underscored the strong predictive capacities of both these signatures. Furthermore, we formulated a nomogram integrating clinical features, radiomics, and gene signatures. This nomogram's AUC scores stood at 0.870, 0.873, and 0.971 for 1-year, 3-year, and 5-year predictions, respectively. Our model, leveraging radiomics and gene signatures, presents significant promise for enhancing diagnostic precision in bladder cancer prognosis, advocating for its clinical adoption.

Subject(s)

Neoplasm Staging , Neural Networks, Computer , Tomography, X-Ray Computed , Urinary Bladder Neoplasms , Urinary Bladder Neoplasms/genetics , Urinary Bladder Neoplasms/diagnostic imaging , Urinary Bladder Neoplasms/pathology , Humans , Tomography, X-Ray Computed/methods , Male , Female , RNA-Seq/methods , Aged , Nomograms , Middle Aged , Biomarkers, Tumor/genetics , ROC Curve , Prognosis , Transcriptome , Radiomics

5.

A comparison of RNA-Seq data preprocessing pipelines for transcriptomic predictions across independent studies.

Van, Richard; Alvarez, Daniel; Mize, Travis; Gannavarapu, Sravani; Chintham Reddy, Lohitha; Nasoz, Fatma; Han, Mira V.

BMC Bioinformatics ; 25(1): 181, 2024 May 08.

Article in English | MEDLINE | ID: mdl-38720247

ABSTRACT

BACKGROUND: RNA sequencing combined with machine learning techniques has provided a modern approach to the molecular classification of cancer. Class predictors, reflecting the disease class, can be constructed for known tissue types using the gene expression measurements extracted from cancer patients. One challenge of current cancer predictors is that they often have suboptimal performance estimates when integrating molecular datasets generated from different labs. Often, the data are of variable quality, procured differently, and contain unwanted noise that hampers the ability of a predictive model to extract useful information. Data preprocessing methods can be applied in attempts to reduce these systematic variations and harmonize the datasets before they are used to build a machine learning model for resolving tissue of origin. RESULTS: We aimed to investigate the impact of data preprocessing steps (normalization, batch effect correction, and data scaling) through trial and comparison. Our goal was to improve the cross-study predictions of tissue of origin for common cancers on large-scale RNA-Seq datasets derived from thousands of patients and over a dozen tumor types. The results showed that the choice of data preprocessing operations affected the performance of the associated classifier models constructed for tissue of origin predictions in cancer. CONCLUSION: By using TCGA as a training set and applying data preprocessing methods, we demonstrated that batch effect correction improved performance measured by weighted F1-score in resolving tissue of origin against an independent GTEx test dataset. On the other hand, the use of data preprocessing operations worsened classification performance when the independent test dataset was aggregated from separate studies in ICGC and GEO. Therefore, based on our findings with these publicly available large-scale RNA-Seq datasets, the application of data preprocessing techniques to a machine learning pipeline is not always appropriate.
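
The abstract reports performance as a weighted F1-score. For readers unfamiliar with the metric, the sketch below computes it in plain Python: the F1-score of each class is averaged with weights proportional to the number of true samples in that class. The function name `weighted_f1` and the toy tissue labels are illustrative assumptions, not taken from the study.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores."""
    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    score = 0.0
    for cls, n_cls in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n_cls - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n_cls / total) * f1
    return score


# Toy tissue-of-origin predictions (hypothetical labels).
y_true = ["lung", "lung", "lung", "liver"]
y_pred = ["lung", "lung", "liver", "liver"]
score = weighted_f1(y_true, y_pred)
```

Because each class is weighted by its support, frequent tumor types dominate the score, which is worth keeping in mind when test sets from different studies have different class balances.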

Subject(s)

Machine Learning , Neoplasms , RNA-Seq , Humans , RNA-Seq/methods , Neoplasms/genetics , Transcriptome/genetics , Sequence Analysis, RNA/methods , Gene Expression Profiling/methods , Computational Biology/methods

6.

Single-cell RNA sequencing data imputation using bi-level feature propagation.

Lee, Junseok; Yun, Sukwon; Kim, Yeongmin; Chen, Tianlong; Kellis, Manolis; Park, Chanyoung.

Brief Bioinform ; 25(3)2024 Mar 27.

Article in English | MEDLINE | ID: mdl-38706317

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) enables the exploration of cellular heterogeneity by analyzing gene expression profiles in complex tissues. However, scRNA-seq data often suffer from technical noise, dropout events and sparsity, hindering downstream analyses. Although existing works attempt to mitigate these issues by utilizing graph structures for data denoising, they involve the risk of propagating noise and fall short of fully leveraging the inherent data relationships, relying mainly on either cell-cell or gene-gene associations and on graphs constructed from the initial noisy data. To this end, this study presents single-cell bilevel feature propagation (scBFP), a two-step graph-based feature propagation method. It initially imputes zero values using non-zero values, ensuring that the imputation process does not affect the non-zero values due to dropout. Subsequently, it denoises the entire dataset by leveraging gene-gene and cell-cell relationships in the respective steps. Extensive experimental results on scRNA-seq data demonstrate the effectiveness of scBFP in various downstream tasks, uncovering valuable biological insights.
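
The first step described above, filling zeros from non-zero values while leaving observed values untouched, can be sketched with generic graph feature propagation. This is a minimal illustration and not the scBFP implementation: the neighbor graph is assumed to be given, neighbors are averaged with equal weight, the second denoising step is omitted, and the name `propagate_zeros` is hypothetical.

```python
def propagate_zeros(X, neighbors, n_iter=20):
    """Fill zero entries of X (cells x genes) by averaging over graph neighbors.

    neighbors[i] lists the cell indices adjacent to cell i (assumed given).
    Non-zero (observed) entries are reset to their original value every round.
    """
    known = [[v != 0 for v in row] for row in X]
    cur = [[float(v) for v in row] for row in X]
    for _ in range(n_iter):
        nxt = []
        for i, row in enumerate(cur):
            nbrs = neighbors[i]
            new_row = []
            for g in range(len(row)):
                if known[i][g] or not nbrs:
                    new_row.append(float(X[i][g]))  # keep observed value fixed
                else:
                    new_row.append(sum(cur[j][g] for j in nbrs) / len(nbrs))
            nxt.append(new_row)
        cur = nxt
    return cur


# Three cells, two genes; cells 0 and 1 each have one zero entry.
X = [[2, 0], [0, 4], [2, 4]]
neighbors = [[2], [2], [0, 1]]
imputed = propagate_zeros(X, neighbors)
```

Resetting the observed entries on every iteration is what keeps the imputation from overwriting measured expression values.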

Subject(s)

Sequence Analysis, RNA , Single-Cell Analysis , Single-Cell Analysis/methods , Sequence Analysis, RNA/methods , Humans , Algorithms , Gene Expression Profiling/methods , Computational Biology/methods , RNA-Seq/methods

7.

Benchmarking mapping algorithms for cell-type annotating in mouse brain by integrating single-nucleus RNA-seq and Stereo-seq data.

Tao, Quyuan; Xu, Yiheng; He, Youzhe; Luo, Ting; Li, Xiaoming; Han, Lei.

Brief Bioinform ; 25(4)2024 May 23.

Article in English | MEDLINE | ID: mdl-38796691

ABSTRACT

Limited gene capture efficiency and spot size of spatial transcriptome (ST) data pose significant challenges in cell-type characterization. The heterogeneity and complexity of cell composition in the mammalian brain make it more challenging to accurately annotate ST data from the brain. Many algorithms attempt to characterize neuron subtypes by integrating ST data with single-nucleus RNA sequencing (snRNA-seq) or single-cell RNA sequencing. However, the accuracy of these algorithms on Stereo-seq ST data has not yet been assessed. Here, we benchmarked 9 mapping algorithms using 10 ST datasets from four mouse brain regions in two different resolutions and 24 pseudo-ST datasets from snRNA-seq. Both actual ST data and pseudo-ST data were mapped using snRNA-seq datasets from the corresponding brain regions as reference data. After comparing the performance across different areas and resolutions of the mouse brain, we conclude that both Robust Cell Type Decomposition (RCTD) and SpatialDWLS demonstrated superior robustness and accuracy in cell-type annotation. Testing with publicly available snRNA-seq data from another sequencing platform in the cortex region further validated our conclusions. Altogether, we developed a workflow for assessing the suitability of mapping algorithms for ST datasets, which can improve the efficiency and accuracy of spatial data annotation.

Subject(s)

Algorithms , Benchmarking , Brain , Single-Cell Analysis , Animals , Mice , Brain/metabolism , Single-Cell Analysis/methods , RNA-Seq/methods , Transcriptome , Sequence Analysis, RNA/methods , Neurons/metabolism , Gene Expression Profiling/methods

8.

GRouNdGAN: GRN-guided simulation of single-cell RNA-seq data using causal generative adversarial networks.

Zinati, Yazdan; Takiddeen, Abdulrahman; Emad, Amin.

Nat Commun ; 15(1): 4055, 2024 May 14.

Article in English | MEDLINE | ID: mdl-38744843

ABSTRACT

We introduce GRouNdGAN, a gene regulatory network (GRN)-guided reference-based causal implicit generative model for simulating single-cell RNA-seq data, in silico perturbation experiments, and benchmarking GRN inference methods. Through the imposition of a user-defined GRN in its architecture, GRouNdGAN simulates steady-state and transient-state single-cell datasets where genes are causally expressed under the control of their regulating transcription factors (TFs). Training on six experimental reference datasets, we show that our model captures non-linear TF-gene dependencies and preserves gene identities, cell trajectories, pseudo-time ordering, and technical and biological noise, with no user manipulation and only implicit parameterization. GRouNdGAN can synthesize cells under new conditions to perform in silico TF knockout experiments. Benchmarking various GRN inference algorithms reveals that GRouNdGAN effectively bridges the existing gap between simulated and biological data benchmarks of GRN inference algorithms, providing gold standard ground truth GRNs and realistic cells corresponding to the biological system of interest.

Subject(s)

Algorithms , Computer Simulation , Gene Regulatory Networks , RNA-Seq , Single-Cell Analysis , Single-Cell Analysis/methods , RNA-Seq/methods , Humans , Transcription Factors/metabolism , Transcription Factors/genetics , Computational Biology/methods , Benchmarking , Sequence Analysis, RNA/methods , Single-Cell Gene Expression Analysis

9.

Quantifying 3'UTR length from scRNA-seq data reveals changes independent of gene expression.

Fansler, Mervin M; Mitschka, Sibylle; Mayr, Christine.

Nat Commun ; 15(1): 4050, 2024 May 14.

Article in English | MEDLINE | ID: mdl-38744866

ABSTRACT

Although more than half of all genes generate transcripts that differ in 3'UTR length, current analysis pipelines only quantify the amount but not the length of mRNA transcripts. 3'UTR length is determined by 3' end cleavage sites (CS). We map CS in more than 200 primary human and mouse cell types and increase CS annotations relative to the GENCODE database by 40%. Approximately half of all CS are used in few cell types, revealing that most genes only have one or two major 3' ends. We incorporate the CS annotations into a computational pipeline, called scUTRquant, for rapid, accurate, and simultaneous quantification of gene and 3'UTR isoform expression from single-cell RNA sequencing (scRNA-seq) data. When applying scUTRquant to data from 474 cell types and 2134 perturbations, we discover extensive 3'UTR length changes across cell types that are as widespread and coordinately regulated as gene expression changes but affect mostly different genes. Our data indicate that mRNA abundance and mRNA length are two largely independent axes of gene regulation that together determine the amount and spatial organization of protein synthesis.

Subject(s)

3' Untranslated Regions , RNA, Messenger , Single-Cell Analysis , 3' Untranslated Regions/genetics , Humans , Animals , Mice , RNA, Messenger/genetics , RNA, Messenger/metabolism , Single-Cell Analysis/methods , Sequence Analysis, RNA/methods , Gene Expression Regulation , RNA-Seq/methods , Computational Biology/methods , Gene Expression Profiling/methods , Single-Cell Gene Expression Analysis

10.

Multi-omics integration of scRNA-seq time series data predicts new intervention points for Parkinson's disease.

Mihajlovic, Katarina; Ceddia, Gaia; Malod-Dognin, Noël; Novak, Gabriela; Kyriakis, Dimitrios; Skupin, Alexander; Przulj, Natasa.

Sci Rep ; 14(1): 10983, 2024 May 14.

Article in English | MEDLINE | ID: mdl-38744869

ABSTRACT

Parkinson's disease (PD) is a complex neurodegenerative disorder without a cure. The onset of PD symptoms corresponds to 50% loss of midbrain dopaminergic (mDA) neurons, limiting early-stage understanding of PD. To shed light on early PD development, we study time series scRNA-seq datasets of mDA neurons obtained from patient-derived induced pluripotent stem cell differentiation. We develop a new data integration method based on Non-negative Matrix Tri-Factorization that integrates these datasets with molecular interaction networks, producing condition-specific "gene embeddings". By mining these embeddings, we predict 193 PD-related genes that are largely supported (49.7%) in the literature and are specific to the investigated PINK1 mutation. Enrichment analysis in Kyoto Encyclopedia of Genes and Genomes pathways highlights 10 PD-related molecular mechanisms perturbed during early PD development. Finally, investigating the top 20 prioritized genes reveals 12 previously unrecognized genes associated with PD that represent interesting drug targets.

Subject(s)

Dopaminergic Neurons , Parkinson Disease , Parkinson Disease/genetics , Parkinson Disease/pathology , Humans , Dopaminergic Neurons/metabolism , Dopaminergic Neurons/pathology , RNA-Seq/methods , Induced Pluripotent Stem Cells/metabolism , Mesencephalon/metabolism , Mesencephalon/pathology , Gene Regulatory Networks , Mutation , Cell Differentiation/genetics , Multiomics , Single-Cell Gene Expression Analysis

11.

scCDC: a computational method for gene-specific contamination detection and correction in single-cell and single-nucleus RNA-seq data.

Wang, Weijian; Cen, Yihui; Lu, Zezhen; Xu, Yueqing; Sun, Tianyi; Xiao, Ying; Liu, Wanlu; Li, Jingyi Jessica; Wang, Chaochen.

Genome Biol ; 25(1): 136, 2024 May 23.

Article in English | MEDLINE | ID: mdl-38783325

ABSTRACT

In droplet-based single-cell and single-nucleus RNA-seq assays, systematic contamination of ambient RNA molecules biases the quantification of gene expression levels. Existing methods correct the contamination for all genes globally. However, a specific evaluation of correction efficacy across varying contamination levels is lacking. Here, we show that DecontX and CellBender under-correct highly contaminating genes, while SoupX and scAR over-correct lowly/non-contaminating genes. We then develop scCDC as the first method to detect the contamination-causing genes and only correct expression levels of these genes, some of which are cell-type markers. Compared with existing decontamination methods, scCDC excels in decontaminating highly contaminating genes while avoiding over-correction of other genes.

Subject(s)

RNA-Seq , Single-Cell Analysis , Single-Cell Analysis/methods , RNA-Seq/methods , Humans , Computational Biology/methods , Sequence Analysis, RNA/methods , Cell Nucleus/genetics , Software , Animals

12.

Reference-free inferring of transcriptomic events in cancer cells on single-cell data.

Eralp, Batuhan; Sefer, Emre.

BMC Cancer ; 24(1): 607, 2024 May 20.

Article in English | MEDLINE | ID: mdl-38769480

ABSTRACT

BACKGROUND: The identity of cancerous cells is determined by a mixture of multiple factors, such as genomic variations, epigenetics, and the regulatory variations that are involved in transcription. The differences in transcriptome expression as well as abnormal structures in peptides determine phenotypical differences. Thus, bulk RNA-seq and more recent single-cell RNA-seq data (scRNA-seq) are important to identify pathogenic differences. Here, we rely on k-mer decomposition of sequences to identify pathogenic variations in detail. This approach does not need a reference, so it outperforms more traditional Next-Generation Sequencing (NGS) analysis techniques that depend on the alignment of the sequences to a reference. RESULTS: Via our alignment-free analysis of esophageal cancer and glioblastoma patients, high-frequency variations at multiple different locations (repeats, intergenic regions, exons, introns) and in multiple different forms (fusion, polyadenylation, splicing, etc.) could be discovered. Additionally, we systematically analyzed the importance of less-studied events in a classic transcriptome analysis pipeline, where these events are considered as indicators for tumor prognosis, tumor prediction, and tumor neoantigen inference, as well as their connection with the immune microenvironment. CONCLUSIONS: Our results suggest that esophageal cancer (ESCA) and glioblastoma processes can be explained via pathogenic microbial RNA, repeated sequences, novel splicing variants, and long intergenic non-coding RNAs (lincRNAs). We expect our reference-free process and analysis to be helpful in differential scRNA-seq analysis of tumor and normal samples, which in turn offers a more comprehensive scheme for major cancer-associated events.
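
To make the idea of reference-free k-mer decomposition concrete, the sketch below counts k-mers directly from reads and reports those seen repeatedly in tumor reads but never in normal reads, with no alignment step. It is a toy illustration under stated assumptions, not the authors' pipeline: the function names, the `min_count` threshold, and the example reads are all hypothetical.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping substring of length k in one sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def sample_specific_kmers(tumor_reads, normal_reads, k, min_count=2):
    """Return k-mers seen at least min_count times in tumor and never in normal."""
    tumor, normal = Counter(), Counter()
    for read in tumor_reads:
        tumor.update(kmer_counts(read, k))
    for read in normal_reads:
        normal.update(kmer_counts(read, k))
    return {
        kmer: count
        for kmer, count in tumor.items()
        if count >= min_count and kmer not in normal
    }


# Hypothetical toy reads.
tumor_reads = ["ACGTAC", "CGTACG"]
normal_reads = ["ACGTTT"]
specific = sample_specific_kmers(tumor_reads, normal_reads, k=3)
```

Real analyses use much longer k-mers and statistical testing across many samples; the point here is only that the comparison needs no reference genome.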

Subject(s)

Glioblastoma , Single-Cell Analysis , Transcriptome , Humans , Single-Cell Analysis/methods , Glioblastoma/genetics , Glioblastoma/pathology , Gene Expression Profiling/methods , Esophageal Neoplasms/genetics , Esophageal Neoplasms/pathology , High-Throughput Nucleotide Sequencing , RNA-Seq/methods , Sequence Analysis, RNA/methods , Gene Expression Regulation, Neoplastic , Neoplasms/genetics , Neoplasms/pathology

13.

Error modelled gene expression analysis (EMOGEA) provides a superior overview of time course RNA-seq measurements and low count gene expression.

Barra, Jasmine; Taverna, Federico; Bong, Fabian; Ahmed, Ibrahim; Karakach, Tobias K.

Brief Bioinform ; 25(3)2024 Mar 27.

Article in English | MEDLINE | ID: mdl-38770716

ABSTRACT

Temporal RNA-sequencing (RNA-seq) studies of bulk samples provide an opportunity for improved understanding of gene regulation during dynamic phenomena such as development, tumor progression or response to an incremental dose of a pharmacotherapeutic. Moreover, single-cell RNA-seq (scRNA-seq) data implicitly exhibit temporal characteristics because gene expression values recapitulate dynamic processes such as cellular transitions. Unfortunately, temporal RNA-seq data continue to be analyzed by methods that ignore this ordinal structure and yield results that are often difficult to interpret. Here, we present Error Modelled Gene Expression Analysis (EMOGEA), a framework for analyzing RNA-seq data that incorporates measurement uncertainty, while introducing a special formulation for those acquired to monitor dynamic phenomena. This method is specifically suited for RNA-seq studies in which low-count transcripts with small-fold changes lead to significant biological effects. Such transcripts include genes involved in signaling and non-coding RNAs that inherently exhibit low levels of expression. Using simulation studies, we show that this framework down-weights samples that exhibit extreme responses such as batch effects allowing them to be modeled with the rest of the samples and maintain the degrees of freedom originally envisioned for a study. Using temporal experimental data, we demonstrate the framework by extracting a cascade of gene expression waves from a well-designed RNA-seq study of zebrafish embryogenesis and an scRNA-seq study of mouse pre-implantation and provide unique biological insights into the regulation of genes in each wave. For non-ordinal measurements, we show that EMOGEA has a much higher rate of true positive calls and a vanishingly small rate of false negative discoveries compared to common approaches. Finally, we provide two packages in Python and R that are self-contained and easy to use, including test data.

Subject(s)

RNA-Seq , Zebrafish , Animals , Zebrafish/genetics , RNA-Seq/methods , Gene Expression Profiling/methods , Single-Cell Analysis/methods , Mice , Sequence Analysis, RNA/methods , Software

14.

Normalization of RNA-Seq data using adaptive trimmed mean with multi-reference.

Singh, Vikas; Kirtipal, Nikhil; Song, Byeongsop; Lee, Sunjae.

Brief Bioinform ; 25(3)2024 Mar 27.

Article in English | MEDLINE | ID: mdl-38770720

ABSTRACT

The normalization of RNA sequencing data is a primary step for downstream analysis. The most popular methods used for normalization are the trimmed mean of M values (TMM) and DESeq. The TMM trims away extreme log fold changes of the data to normalize the raw read counts based on the remaining non-differentially expressed genes. However, the major problem with the TMM is that the values of trimming factor M are heuristic. This paper estimates an adaptive value of M in TMM based on Jaeckel's Estimator, and each sample acts as a reference to find the scale factor of each sample. The presented approach is validated on SEQC, MAQC2, MAQC3, PICKRELL and two simulated datasets with two-group and three-group conditions by varying the percentage of differential expression and the number of replicates. The performance of the present approach is compared with various state-of-the-art methods, and it is better in terms of area under the receiver operating characteristic curve and differential expression.
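
The trimming step that the abstract refers to can be sketched as follows. This is a deliberately simplified TMM-style calculation with a fixed trimming fraction: it trims only on the log fold changes (M values) and takes an unweighted mean, whereas the standard TMM also trims on average expression and uses precision weights. It does not implement the adaptive, Jaeckel-based trimming proposed in the paper, and the name `tmm_factor` and the default `trim_m=0.3` are assumptions for illustration.

```python
import math


def tmm_factor(sample, ref, trim_m=0.3):
    """Simplified TMM-style scale factor of `sample` relative to `ref`.

    sample, ref: raw read counts per gene, in the same gene order.
    trim_m: fraction of M values trimmed from each tail (fixed, heuristic).
    """
    n_s, n_r = sum(sample), sum(ref)
    # M value: log2 ratio of library-size-scaled counts, for genes seen in both.
    m_values = sorted(
        math.log2((s / n_s) / (r / n_r))
        for s, r in zip(sample, ref)
        if s > 0 and r > 0
    )
    if not m_values:
        return 1.0
    cut = int(len(m_values) * trim_m)
    kept = m_values[cut:len(m_values) - cut] or m_values
    return 2 ** (sum(kept) / len(kept))


# Four unchanged genes sequenced twice as deeply, plus one strongly induced gene.
ref = [10, 20, 30, 40, 100]
sample = [20, 40, 60, 80, 1000]
factor = tmm_factor(sample, ref)
```

In this toy case the induced gene inflates the sample's library size to 1200; multiplying by the factor gives an effective library size of 400, at which the four unchanged genes have the same proportions as in the reference.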

Subject(s)

RNA-Seq , RNA-Seq/methods , Humans , Algorithms , Sequence Analysis, RNA/methods , Computational Biology/methods , Gene Expression Profiling/methods , ROC Curve , Software

15.

Multiplexed bulk and single-cell RNA-seq hybrid enables cost-efficient disease modeling with chimeric organoids.

Cheng, Chen; Wang, Gang; Zhu, Yuqing; Wu, Hangdi; Zhang, Li; Liu, Zhihong; Huang, Yuanhua; Zhang, Jin.

Nat Commun ; 15(1): 3946, 2024 May 10.

Article in English | MEDLINE | ID: mdl-38729950

ABSTRACT

Disease modeling with isogenic Induced Pluripotent Stem Cell (iPSC)-differentiated organoids serves as a powerful technique for studying disease mechanisms. Multiplexed coculture is crucial to mitigate batch effects when studying the genetic effects of disease-causing variants in differentiated iPSCs or organoids, and demultiplexing at the single-cell level can be conveniently achieved by assessing natural genetic barcodes. Here, to enable cost-efficient time-series experimental designs via multiplexed bulk and single-cell RNA-seq of hybrids, we introduce a computational method in our Vireo Suite, Vireo-bulk, to effectively deconvolve pooled bulk RNA-seq data by genotype reference, and thereby quantify donor abundance over the course of differentiation and identify differentially expressed genes among donors. Furthermore, with multiplexed scRNA-seq and bulk RNA-seq, we demonstrate the usefulness and necessity of a pooled design to reveal donor iPSC line heterogeneity during macrophage cell differentiation and to model rare WT1 mutation-driven kidney disease with chimeric organoids. Our work provides an experimental and analytic pipeline for dissecting disease mechanisms with chimeric organoids.

Subject(s)

Cell Differentiation , Induced Pluripotent Stem Cells , Organoids , RNA-Seq , Single-Cell Analysis , Organoids/metabolism , Single-Cell Analysis/methods , Induced Pluripotent Stem Cells/metabolism , Induced Pluripotent Stem Cells/cytology , Humans , Cell Differentiation/genetics , RNA-Seq/methods , Sequence Analysis, RNA/methods , Macrophages/metabolism , Macrophages/cytology , Animals , Single-Cell Gene Expression Analysis

16.

RNA-Seq transcriptome profiling of immature grain wheat is a technique for understanding comparative modeling of baking quality.

Ahmadi-Ochtapeh, Hossein; Soltanloo, Hassan; Ramezanpour, Seyyede Sanaz; Yamchi, Ahad; Shariati, Vahid.

Sci Rep ; 14(1): 10940, 2024 May 13.

Article in English | MEDLINE | ID: mdl-38740888

ABSTRACT

Improving the baking quality is a primary challenge in the wheat flour production value chain, as baking quality represents a crucial factor in determining its overall value. In the present study, we conducted a comparative RNA-Seq analysis on the high baking quality mutant "O-64.1.10" genotype and its low baking quality wild type "Omid" cultivar to recognize potential genes associated with bread quality. The cDNA libraries were constructed from immature grains that were 15 days post-anthesis, with an average of 16.24 and 18.97 million paired-end short-read sequences in the mutant and wild-type, respectively. A total of 733 transcripts with differential expression were identified, 585 genes up-regulated and 188 genes down-regulated in the "O-64.1.10" genotype compared to the "Omid". In addition, the HSF, bZIP, C2C2-Dof, B3-ARF, BES1, C3H, GRF, HB-HD-ZIP, PLATZ, MADS-MIKC, GARP-G2-like, NAC, OFP and TUB families appeared as the key transcription factors with specific expression in the "O-64.1.10" genotype. At the same time, pathways related to baking quality were identified through Kyoto Encyclopedia of Genes and Genomes. Collectively, we found that the endoplasmic network, metabolic pathways, secondary metabolite biosynthesis, hormone signaling pathway, B group vitamins, protein pathways, pathways associated with carbohydrate and fat metabolism, as well as the biosynthesis and metabolism of various amino acids, have a great deal of potential to play a significant role in the baking quality. Ultimately, the RNA-seq results were confirmed using quantitative Reverse Transcription PCR for some hub genes such as alpha-gliadin, low molecular weight glutenin subunit and terpene synthase (gibberellin), and, as a resource for future study, 127 EST-SSR primers were generated using RNA-seq data.

Subject(s)

Gene Expression Profiling , Gene Expression Regulation, Plant , RNA-Seq , Triticum , Triticum/genetics , Triticum/growth & development , Triticum/metabolism , RNA-Seq/methods , Gene Expression Profiling/methods , Transcriptome , Edible Grain/genetics , Edible Grain/metabolism , Cooking , Bread , Plant Proteins/genetics , Plant Proteins/metabolism , Genotype , Flour

17.

RNA-Seq Data Analysis: A Practical Guide for Model and Non-Model Organisms.

Pola-Sánchez, Enrique; Hernández-Martínez, Karen Magdalena; Pérez-Estrada, Rafael; Sélem-Mójica, Nelly; Simpson, June; Abraham-Juárez, María Jazmín; Herrera-Estrella, Alfredo; Villalobos-Escobedo, José Manuel.

Curr Protoc ; 4(5): e1054, 2024 May.

Article in English | MEDLINE | ID: mdl-38808970

ABSTRACT

RNA sequencing (RNA-seq) has emerged as a powerful tool for assessing genome-wide gene expression, revolutionizing various fields of biology. However, analyzing large RNA-seq datasets can be challenging, especially for students or researchers lacking bioinformatics experience. To address these challenges, we present a comprehensive guide that provides step-by-step workflows for analyzing RNA-seq data, from raw reads to functional enrichment analysis, starting with considerations for experimental design. The guide is designed to aid students and researchers working with any organism, irrespective of whether an assembled genome is available. Within this guide, we employ various recognized bioinformatics tools to navigate the landscape of RNA-seq analysis and discuss the advantages and disadvantages of different tools for the same task. Our protocol focuses on clarity, reproducibility, and practicality to enable users to navigate the complexities of RNA-seq data analysis easily and gain valuable biological insights from the datasets. Additionally, all scripts and a sample dataset are available in a GitHub repository to facilitate the implementation of the analysis pipeline. © 2024 The Authors. Current Protocols published by Wiley Periodicals LLC. Basic Protocol 1: Analysis of data from a model plant with an available reference genome. Basic Protocol 2: Gene ontology enrichment analysis. Basic Protocol 3: De novo assembly of data from non-model plants.

Subject(s)

RNA-Seq , RNA-Seq/methods , Computational Biology/methods , Sequence Analysis, RNA/methods , Software

18.

MultiRNAflow: integrated analysis of temporal RNA-seq data with multiple biological conditions.

Loubaton, Rodolphe; Champagnat, Nicolas; Vallois, Pierre; Vallat, Laurent.

Bioinformatics ; 40(5), 2024 May 02.

Article in English | MEDLINE | ID: mdl-38810104

ABSTRACT

MOTIVATION: The dynamic transcriptional mechanisms that govern eukaryotic cell function can now be analyzed by RNA sequencing. However, the packages currently available for the analysis of raw sequencing data do not provide automatic analysis of complex experimental designs with multiple biological conditions and multiple analysis time-points. RESULTS: The MultiRNAflow suite combines several packages in a unified framework allowing exploratory and supervised statistical analyses of temporal data for multiple biological conditions. AVAILABILITY AND IMPLEMENTATION: The R package MultiRNAflow is freely available on Bioconductor (https://bioconductor.org/packages/MultiRNAflow/), and the latest version of the source code is available on a GitHub repository (https://github.com/loubator/MultiRNAflow).

Subject(s)

RNA-Seq , Software , RNA-Seq/methods , Sequence Analysis, RNA/methods , Computational Biology/methods

19.

Data-driven selection of analysis decisions in single-cell RNA-seq trajectory inference.

Dong, Xiaoru; Leary, Jack R; Yang, Chuanhao; Brusko, Maigan A; Brusko, Todd M; Bacher, Rhonda.

Brief Bioinform ; 25(3), 2024 Mar 27.

Article in English | MEDLINE | ID: mdl-38725155

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) experiments have become instrumental in developmental and differentiation studies, enabling the profiling of cells at a single or multiple time-points to uncover subtle variations in expression profiles reflecting underlying biological processes. Benchmarking studies have compared many of the computational methods used to reconstruct cellular dynamics; however, researchers still encounter challenges in their analysis due to uncertainty with respect to selecting the most appropriate methods and parameters. Even among universal data processing steps used by trajectory inference methods such as feature selection and dimension reduction, trajectory methods' performances are highly dataset-specific. To address these challenges, we developed Escort, a novel framework for evaluating a dataset's suitability for trajectory inference and quantifying trajectory properties influenced by analysis decisions. Escort evaluates the suitability of trajectory analysis and the combined effects of processing choices using trajectory-specific metrics. Escort navigates single-cell trajectory analysis through these data-driven assessments, reducing uncertainty and much of the decision burden inherent to trajectory inference analyses. Escort is implemented in an accessible R package and R/Shiny application, providing researchers with the necessary tools to make informed decisions during trajectory analysis and enabling new insights into dynamic biological processes at single-cell resolution.

Subject(s)

RNA-Seq , Single-Cell Analysis , Single-Cell Analysis/methods , RNA-Seq/methods , Humans , Computational Biology/methods , Sequence Analysis, RNA/methods , Software , Algorithms , Gene Expression Profiling/methods , Single-Cell Gene Expression Analysis

20.

Data normalization for addressing the challenges in the analysis of single-cell transcriptomic datasets.

Cuevas-Diaz Duran, Raquel; Wei, Haichao; Wu, Jiaqian.

BMC Genomics ; 25(1): 444, 2024 May 06.

Article in English | MEDLINE | ID: mdl-38711017

ABSTRACT

BACKGROUND: Normalization is a critical step in the analysis of single-cell RNA-sequencing (scRNA-seq) datasets. Its main goal is to make gene counts comparable within and between cells. To do so, normalization methods must account for technical and biological variability. Numerous normalization methods have been developed, addressing different sources of dispersion and making specific assumptions about the count data. MAIN BODY: The selection of a normalization method has a direct impact on downstream analysis, for example differential gene expression and cluster identification. Thus, the objective of this review is to guide the reader in making an informed decision on the most appropriate normalization method to use. To this end, we first give an overview of the different single-cell sequencing platforms and commonly used methods, including isolation and library preparation protocols. Next, we discuss the inherent sources of variability of scRNA-seq datasets. We describe the categories of normalization methods and include examples of each. We also delineate imputation and batch-effect correction methods. Furthermore, we describe data-driven metrics commonly used to evaluate the performance of normalization methods. We also discuss common scRNA-seq methods and toolkits used for integrated data analysis. CONCLUSIONS: According to the correction performed, normalization methods can be broadly classified as within-sample and between-sample algorithms. Moreover, with respect to the mathematical model used, normalization methods can be further classified into global scaling methods, generalized linear models, mixed methods, and machine learning-based methods. Each of these methods has pros and cons and makes different statistical assumptions. However, no single normalization method performs best. Instead, metrics such as silhouette width, K-nearest neighbor batch-effect test, or Highly Variable Genes are recommended to assess the performance of normalization methods.

Subject(s)

Single-Cell Analysis , Animals , Humans , Algorithms , Gene Expression Profiling/methods , Gene Expression Profiling/standards , RNA-Seq/methods , RNA-Seq/standards , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Transcriptome , Datasets as Topic
