Results 1 - 20 of 7,458
1.
Mol Biol Rep ; 51(1): 720, 2024 Jun 01.
Article in English | MEDLINE | ID: mdl-38824268

ABSTRACT

BACKGROUND: Tumor-associated macrophages (TAM) exert a significant influence on the progression and heterogeneity of various subtypes of breast cancer (BRCA). However, the roles of heterogeneous TAM within BRCA subtypes remain unclear. Therefore, this study sought to elucidate the role of TAM across the following three BRCA subtypes: triple-negative breast cancer, luminal, and HER2. MATERIALS AND METHODS: This investigation aimed to delineate the variations in marker genes, drug sensitivity, and cellular communication among TAM across the three BRCA subtypes. We identified specific ligand-receptor (L-R) pairs and downstream mechanisms regulated by VEGFA-VEGFR1, SPP1-CD44, and SPP1-ITGB1 L-R pairs. Experimental verification of these pairs was conducted by co-culturing macrophages with three subtypes of BRCA cells. RESULTS: Our findings reveal the heterogeneity of macrophages within the three BRCA subtypes, evidenced by variations in marker gene expression, composition, and functional characteristics. Notably, heterogeneous TAM were found to promote invasive migration and epithelial-mesenchymal transition (EMT) in MDA-MB-231, MCF-7, and SKBR3 cells, activating the NF-κB pathway via P38 MAPK, TGF-β1, and AKT, respectively, through distinct VEGFA-VEGFR1, SPP1-CD44, and SPP1-ITGB1 L-R pairs. Inhibition of these specific L-R pairs effectively reversed EMT, migration, and invasion in each cancer cell line. Furthermore, we observed a correlation between ligand gene expression and TAM sensitivity to anticancer drugs, suggesting a potential strategy for optimizing personalized treatment guidance. CONCLUSION: Our study highlights the capacity of heterogeneous TAM to modulate biological functions via distinct pathways mediated by specific L-R pairs within diverse BRCA subtypes. This study might provide insights into precision immunotherapy of different subtypes of BRCA.


Subject(s)
Breast Neoplasms , Epithelial-Mesenchymal Transition , Tumor-Associated Macrophages , Humans , Female , Tumor-Associated Macrophages/metabolism , Tumor-Associated Macrophages/immunology , Epithelial-Mesenchymal Transition/genetics , Cell Line, Tumor , Breast Neoplasms/genetics , Breast Neoplasms/pathology , Breast Neoplasms/metabolism , Gene Expression Regulation, Neoplastic , Single-Cell Analysis/methods , MCF-7 Cells , Cell Movement/genetics , Triple Negative Breast Neoplasms/genetics , Triple Negative Breast Neoplasms/pathology , Triple Negative Breast Neoplasms/metabolism , Sequence Analysis, RNA/methods , Vascular Endothelial Growth Factor A/metabolism , Vascular Endothelial Growth Factor A/genetics , Signal Transduction/genetics , Tumor Microenvironment/genetics
2.
Genome Biol ; 25(1): 145, 2024 Jun 03.
Article in English | MEDLINE | ID: mdl-38831386

ABSTRACT

BACKGROUND: Single-cell RNA sequencing (scRNA-seq) and spatially resolved transcriptomics (SRT) have led to groundbreaking advancements in life sciences. To develop bioinformatics tools for scRNA-seq and SRT data and perform unbiased benchmarks, data simulation has been widely adopted by providing explicit ground truth and generating customized datasets. However, the performance of simulation methods under multiple scenarios has not been comprehensively assessed, making it challenging to choose suitable methods without practical guidelines. RESULTS: We systematically evaluated 49 simulation methods developed for scRNA-seq and/or SRT data in terms of accuracy, functionality, scalability, and usability using 152 reference datasets derived from 24 platforms. SRTsim, scDesign3, ZINB-WaVE, and scDesign2 have the best accuracy performance across various platforms. Unexpectedly, some methods tailored to scRNA-seq data have potential compatibility for simulating SRT data. Lun, SPARSim, and scDesign3-tree outperform other methods under corresponding simulation scenarios. Phenopath, Lun, Simple, and MFA yield high scalability scores but they cannot generate realistic simulated data. Users should consider the trade-offs between method accuracy and scalability (or functionality) when making decisions. Additionally, execution errors are mainly caused by failed parameter estimations and appearance of missing or infinite values in calculations. We provide practical guidelines for method selection, a standard pipeline Simpipe ( https://github.com/duohongrui/simpipe ; https://doi.org/10.5281/zenodo.11178409 ), and an online tool Simsite ( https://www.ciblab.net/software/simshiny/ ) for data simulation. CONCLUSIONS: No method performs best on all criteria, thus a good-yet-not-the-best method is recommended if it solves problems effectively and reasonably. Our comprehensive work provides crucial insights for developers on modeling gene expression data and fosters the simulation process for users.


Subject(s)
Gene Expression Profiling , Single-Cell Analysis , Single-Cell Analysis/methods , Gene Expression Profiling/methods , Humans , Software , Computer Simulation , Transcriptome , Computational Biology/methods , Sequence Analysis, RNA/methods , RNA-Seq/methods , RNA-Seq/standards
3.
Folia Biol (Praha) ; 70(1): 62-73, 2024.
Article in English | MEDLINE | ID: mdl-38830124

ABSTRACT

Germline DNA testing using next-generation sequencing (NGS) technology has become the analytical standard for the diagnostics of hereditary diseases, including cancer. Its increasing use places high demands on correct sample identification, independent confirmation of prioritized variants, and their functional and clinical interpretation. To streamline these processes, we introduced parallel DNA and RNA capture-based NGS using the identical capture panel CZECANCA, which is routinely used for DNA analysis of hereditary cancer predisposition. Here, we present the analytical workflow for RNA sample processing and its analytical and diagnostic performance. Parallel DNA/RNA analysis allowed credible sample identification by calculating the kinship coefficient. The RNA capture-based approach enriched transcriptional targets for the majority of clinically relevant cancer predisposition genes to a degree that allowed analysis of the effect of identified DNA variants on mRNA processing. By comparing the panel and whole-exome RNA enrichment, we demonstrated that the tissue-specific gene expression pattern is independent of the capture panel. Moreover, technical replicates confirmed high reproducibility of the tested RNA analysis. We concluded that parallel DNA/RNA NGS using the identical gene panel is a robust and cost-effective diagnostic strategy. In our setting, it allows routine analysis of 48 DNA/RNA pairs using NextSeq 500/550 Mid Output Kit v2.5 (150 cycles) in a single run with sufficient coverage to analyse 226 cancer predisposition and candidate genes. This approach can replace laborious Sanger confirmatory sequencing, shorten testing turnaround, reduce analysis costs, and improve interpretation of the impact of variants by analysing their effect on mRNA processing.
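The abstract says matched DNA and RNA libraries were confirmed to come from the same person via a kinship coefficient, but it does not name the estimator. As an illustration only, the sketch below assumes the KING-robust estimator in its within-family form, computed from genotype calls (0/1/2 alternate alleles) at shared sites; the function name and toy genotypes are hypothetical.

```python
from typing import Optional, Sequence


def king_robust_kinship(g1: Sequence[Optional[int]],
                        g2: Sequence[Optional[int]]) -> float:
    """KING-robust (within-family form) kinship between two genotype vectors.

    Genotypes are alternate-allele counts (0, 1, 2) at the same sites;
    None marks a site without a confident call in that sample.
    Two samples from the same person give 0.5; unrelated people give ~0 or less.
    """
    if len(g1) != len(g2):
        raise ValueError("genotype vectors must cover the same sites")
    het_both = opposite_hom = het1 = het2 = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # compare only sites called in both samples
        het1 += a == 1
        het2 += b == 1
        het_both += a == 1 and b == 1
        opposite_hom += {a, b} == {0, 2}  # AA in one sample, aa in the other
    if het1 + het2 == 0:
        raise ValueError("no heterozygous sites; kinship is undefined")
    return (het_both - 2 * opposite_hom) / (het1 + het2)


# Toy genotypes (hypothetical): DNA calls vs. calls from RNA libraries.
dna = [0, 1, 2, 1, 1, 0, 2, 1]
rna_same_person = [0, 1, 2, 1, 1, None, 2, 1]   # one site not expressed
rna_other_person = [2, 1, 0, 0, 2, 2, 0, 1]
print(king_robust_kinship(dna, rna_same_person))   # 0.5
print(king_robust_kinship(dna, rna_other_person))  # -1.0
```

KING's documentation treats values above roughly 0.354 as duplicates or monozygotic twins, which is the pattern expected for a correctly paired DNA/RNA sample. RNA-derived calls only exist at expressed sites, so in practice far fewer sites are comparable than in DNA-DNA checks.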


Subject(s)
Genetic Predisposition to Disease , High-Throughput Nucleotide Sequencing , Humans , High-Throughput Nucleotide Sequencing/methods , Neoplasms/genetics , Neoplasms/diagnosis , RNA/genetics , Reproducibility of Results , Sequence Analysis, DNA/methods , Sequence Analysis, RNA/methods , DNA/genetics
4.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38701412

ABSTRACT

Trajectory inference is a crucial task in single-cell RNA-sequencing downstream analysis, which can reveal the dynamic processes of biological development, including cell differentiation. Dimensionality reduction is an important step in the trajectory inference process. However, most existing trajectory methods rely on cell features derived from traditional dimensionality reduction methods, such as principal component analysis and uniform manifold approximation and projection. These methods are not specifically designed for trajectory inference and fail to fully leverage prior information from upstream analysis, limiting their performance. Here, we introduce scCRT, a novel dimensionality reduction model for trajectory inference. In order to utilize prior information to learn accurate cell representations, scCRT integrates two feature learning components: a cell-level pairwise module and a cluster-level contrastive module. The cell-level module focuses on learning accurate cell representations in a reduced-dimensionality space while maintaining the cell-cell positional relationships in the original space. The cluster-level contrastive module uses prior cell state information to aggregate similar cells, preventing excessive dispersion in the low-dimensional space. Experimental findings from 54 real and 81 synthetic datasets, totaling 135 datasets, highlighted the superior performance of scCRT compared with commonly used trajectory inference methods. Additionally, an ablation study revealed that both cell-level and cluster-level modules enhance the model's ability to learn accurate cell features, facilitating cell lineage inference. The source code of scCRT is available at https://github.com/yuchen21-web/scCRT-for-scRNA-seq.
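The abstract does not give scCRT's loss functions. To make "maintaining the cell-cell positional relationships in the original space" concrete, here is a generic, stress-style distance-preservation objective; it is an assumption for illustration, not the scCRT cell-level module itself, and the names are hypothetical.

```python
import math
from typing import Sequence

Point = Sequence[float]


def distance_preservation_loss(high: Sequence[Point], low: Sequence[Point]) -> float:
    """Mean squared difference between each pair's distance in the original
    (high-dimensional) space and in the reduced (low-dimensional) space.
    0.0 means every cell-cell distance is preserved exactly."""
    if len(high) != len(low):
        raise ValueError("both embeddings must contain the same cells")
    n = len(high)
    total, n_pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            diff = math.dist(high[i], high[j]) - math.dist(low[i], low[j])
            total += diff * diff
            n_pairs += 1
    return total / n_pairs if n_pairs else 0.0


expr = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (6.0, 8.0, 0.0)]   # toy expression profiles
good = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]                  # geometry kept
collapsed = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]             # geometry lost
print(distance_preservation_loss(expr, good))       # 0.0
print(distance_preservation_loss(expr, collapsed))  # 50.0
```

A trainable model would minimise a term like this alongside a cluster-level term that pulls cells of the same prior state together.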


Subject(s)
Algorithms , Single-Cell Analysis , Single-Cell Analysis/methods , Humans , RNA-Seq/methods , Computational Biology/methods , Software , Sequence Analysis, RNA/methods , Animals , Single-Cell Gene Expression Analysis
5.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38701413

ABSTRACT

With the emergence of large amounts of single-cell RNA sequencing (scRNA-seq) data, computational methods have become critical for revealing biological mechanisms. Clustering is a representative approach for deciphering the cellular heterogeneity embedded in scRNA-seq data. However, owing to the diversity of datasets, none of the existing single-cell clustering methods performs best on all datasets. Weighted ensemble methods have been proposed to integrate multiple clustering results and thereby improve heterogeneity analysis. These methods usually weight each base clustering by its overall reliability, ignoring that the same base clustering can perform differently on different cells. In this paper, we propose scEWE, a self-representative ensemble learning framework based on a high-order element-wise weighting strategy. By assigning different base clustering weights to individual cells, we construct and optimize the consensus matrix at per-cell granularity. In addition, we extract high-order information between cells, which enhances the representation of cell-cell similarity relationships. Experiments show that scEWE significantly outperforms state-of-the-art methods, demonstrating its effectiveness and supporting its potential application to complex single-cell data analysis problems.
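To show what "assigning different base clustering weights to individual cells" changes compared with one weight per clustering, the sketch below builds a weighted co-association (consensus) matrix. The pair-weighting rule (mean of the two cells' weights) and the weights themselves are hypothetical; scEWE's self-representative optimisation and high-order terms are not reproduced.

```python
from typing import List, Sequence


def weighted_consensus(labelings: Sequence[Sequence[int]],
                       cell_weights: Sequence[Sequence[float]]) -> List[List[float]]:
    """Consensus matrix with a separate weight per cell for each base clustering.

    labelings[m][i]    -- cluster label of cell i in base clustering m
    cell_weights[m][i] -- hypothetical reliability of clustering m for cell i
    Entry (i, j) is the weighted fraction of base clusterings that put cells
    i and j together, using pair weight (w[m][i] + w[m][j]) / 2.
    """
    n = len(labelings[0])
    consensus = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            num = den = 0.0
            for labels, w in zip(labelings, cell_weights):
                pair_w = (w[i] + w[j]) / 2
                den += pair_w
                num += pair_w * (labels[i] == labels[j])
            consensus[i][j] = num / den if den > 0 else 0.0
    return consensus


labelings = [[0, 0, 1], [0, 1, 1]]
uniform = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
per_cell = [[1.0, 1.0, 0.2], [0.2, 0.2, 1.0]]
C = weighted_consensus(labelings, uniform)
print(C[0][1], C[1][2], C[0][2])                        # 0.5 0.5 0.0
print(weighted_consensus(labelings, per_cell)[0][1])    # ~0.833
```

With uniform weights this reduces to the ordinary co-association matrix; per-cell weights let a clustering that is reliable for cells 0 and 1 dominate their pairwise entry.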


Subject(s)
Sequence Analysis, RNA , Single-Cell Analysis , Single-Cell Analysis/methods , Cluster Analysis , Sequence Analysis, RNA/methods , Algorithms , Computational Biology/methods , Humans , RNA-Seq/methods
6.
Nat Commun ; 15(1): 3786, 2024 May 06.
Article in English | MEDLINE | ID: mdl-38710690

ABSTRACT

Expression quantitative trait loci (eQTL) studies typically consider exon expression of genes and discard intronic RNA sequencing reads despite their information on RNA metabolism. Here, we quantify genetic effects on exon and intron levels of genes and their ratio in lymphoblastoid cell lines, revealing thousands of cis-QTLs of each type. While genetic effects are often shared between cis-QTL types, 7814 (47%) are not detected as top cis-QTLs at exon levels. We show that exon levels preferentially capture genetic effects on transcriptional regulation, while exon-intron-ratios better detect those on co- and post-transcriptional processes. Considering all cis-QTL types substantially increases (by 71%) the number of colocalizing variants identified by genome-wide association studies (GWAS). It further allows dissecting the potential gene regulatory processes underlying GWAS associations, suggesting comparable contributions by transcriptional (50%) and co- and post-transcriptional regulation (46%) to complex traits. Overall, integrating intronic RNA sequencing reads in eQTL studies expands our understanding of genetic effects on gene regulatory processes.
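The exon-intron-ratio phenotype can be illustrated with a minimal sketch: compute a per-sample log2 ratio of exonic to intronic read counts for one gene, then regress it on genotype dosage. The pseudocount and plain least-squares slope are assumptions for illustration; dedicated cis-QTL mapping also adds covariates, phenotype normalisation and permutation-based significance, none of which is shown.

```python
import math
from typing import List, Sequence


def log_exon_intron_ratio(exon_counts: Sequence[float], intron_counts: Sequence[float],
                          pseudocount: float = 1.0) -> List[float]:
    """Per-sample log2((exon + c) / (intron + c)) for one gene."""
    return [math.log2((e + pseudocount) / (i + pseudocount))
            for e, i in zip(exon_counts, intron_counts)]


def qtl_slope(dosages: Sequence[float], phenotype: Sequence[float]) -> float:
    """Ordinary least-squares slope of a phenotype on genotype dosage (0, 1, 2)."""
    n = len(dosages)
    mx, my = sum(dosages) / n, sum(phenotype) / n
    sxx = sum((x - mx) ** 2 for x in dosages)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, phenotype))
    if sxx == 0:
        raise ValueError("genotype is monomorphic in this sample")
    return sxy / sxx


# Toy data: intronic reads constant, exonic reads rise with the alt allele.
ratio = log_exon_intron_ratio([7, 15, 31], [7, 7, 7])
print(ratio)                         # [0.0, 1.0, 2.0]
print(qtl_slope([0, 1, 2], ratio))   # 1.0
```

In this toy case an exon-level QTL would also be visible; the ratio becomes informative when exon and intron levels move together (a transcriptional effect) versus when only their ratio shifts (a co- or post-transcriptional effect).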


Subject(s)
Exons , Gene Expression Regulation , Genome-Wide Association Study , Introns , Quantitative Trait Loci , Humans , Introns/genetics , Exons/genetics , Transcription, Genetic , Cell Line , Sequence Analysis, RNA/methods , Polymorphism, Single Nucleotide
7.
BMC Genomics ; 25(1): 455, 2024 May 08.
Article in English | MEDLINE | ID: mdl-38720252

ABSTRACT

BACKGROUND: Standard ChIP-seq and RNA-seq processing pipelines typically disregard sequencing reads whose origin is ambiguous ("multimappers"). This usual practice has potentially important consequences for the functional interpretation of the data: genomic elements belonging to clusters composed of highly similar members are left unexplored. RESULTS: In particular, disregarding multimappers leads to the underrepresentation in epigenetic studies of recently active transposable elements, such as AluYa5, L1HS and SVAs. Furthermore, this common strategy also has implications for transcriptomic analysis: members of repetitive gene families, such as those including major histocompatibility complex (MHC) class I and II genes, are under-quantified. CONCLUSION: Revealing inherent biases that permeate routine tasks such as functional enrichment analysis, our results underscore the urgency of broadly adopting multimapper-aware bioinformatic pipelines (currently restricted to specific contexts or communities) to ensure the reliability of genomic and transcriptomic studies.
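One simple multimapper-aware strategy, shown here as an illustration rather than the approach the authors recommend, is fractional counting: each read contributes a total of one count, split evenly across all features it aligns to equally well. EM-based reassignment is a common refinement. The feature names below are toy labels.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence


def fractional_counts(read_alignments: Iterable[Sequence[str]]) -> Dict[str, float]:
    """Split each read's unit count evenly across all features it aligns to.

    read_alignments yields, for every read, the features it hits equally well.
    Unique reads add 1.0; a read hitting n distinct features adds 1/n to each.
    """
    counts: Dict[str, float] = defaultdict(float)
    for loci in read_alignments:
        targets = set(loci)  # ignore duplicate hits to the same feature
        if not targets:
            continue         # unaligned read
        share = 1.0 / len(targets)
        for locus in targets:
            counts[locus] += share
    return dict(counts)


reads = [["HLA-A"],
         ["HLA-A", "HLA-B"],                          # ambiguous between paralogs
         ["L1HS_1", "L1HS_2", "L1HS_3", "L1HS_4"]]    # young repeat copies
result = fractional_counts(reads)
print(result["HLA-A"], result["HLA-B"], result["L1HS_1"])   # 1.5 0.5 0.25
```

Discarding multimappers would give HLA-A 1.0 and drop HLA-B and every L1HS copy to zero, which is the under-quantification the abstract describes.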


Subject(s)
High-Throughput Nucleotide Sequencing , Humans , DNA Transposable Elements/genetics , Computational Biology/methods , Gene Expression Profiling/methods , Genomics/methods , Sequence Analysis, RNA/methods
8.
BMC Bioinformatics ; 25(1): 181, 2024 May 08.
Article in English | MEDLINE | ID: mdl-38720247

ABSTRACT

BACKGROUND: RNA sequencing combined with machine learning techniques has provided a modern approach to the molecular classification of cancer. Class predictors, reflecting the disease class, can be constructed for known tissue types using the gene expression measurements extracted from cancer patients. One challenge of current cancer predictors is that they often have suboptimal performance estimates when integrating molecular datasets generated from different labs. Often, the data vary in quality, are procured differently, and contain unwanted noise, hampering the ability of a predictive model to extract useful information. Data preprocessing methods can be applied in attempts to reduce these systematic variations and harmonize the datasets before they are used to build a machine learning model for resolving tissue of origin. RESULTS: We aimed to investigate the impact of data preprocessing steps (focusing on normalization, batch effect correction, and data scaling) through trial and comparison. Our goal was to improve the cross-study predictions of tissue of origin for common cancers on large-scale RNA-Seq datasets derived from thousands of patients and over a dozen tumor types. The results showed that the choice of data preprocessing operations affected the performance of the associated classifier models constructed for tissue of origin predictions in cancer. CONCLUSION: By using TCGA as a training set and applying data preprocessing methods, we demonstrated that batch effect correction improved performance measured by weighted F1-score in resolving tissue of origin against an independent GTEx test dataset. On the other hand, the use of data preprocessing operations worsened classification performance when the independent test dataset was aggregated from separate studies in ICGC and GEO. Therefore, based on our findings with these publicly available large-scale RNA-Seq datasets, the application of data preprocessing techniques to a machine learning pipeline is not always appropriate.
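The evaluation metric named above, the weighted F1-score, averages per-class F1 scores weighted by each class's number of true samples, which matters when tumor types are unevenly represented. A minimal sketch of that definition (the same one scikit-learn uses for `average="weighted"`); the tissue labels are toy data.

```python
from collections import Counter
from typing import Sequence


def weighted_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Support-weighted mean of per-class F1 scores over the true classes."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n_cls in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = n_cls - tp
        denom = 2 * tp + fp + fn          # F1 = 2TP / (2TP + FP + FN)
        f1 = 2 * tp / denom if denom else 0.0
        score += f1 * n_cls / total
    return score


y_true = ["lung", "lung", "breast", "breast"]
y_pred = ["lung", "breast", "breast", "breast"]
print(weighted_f1(y_true, y_pred))   # 0.5 * 2/3 + 0.5 * 4/5, about 0.733
```

Classes that appear only in the predictions have zero support and therefore zero weight, although their false positives still lower the F1 of the classes they were confused with.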


Subject(s)
Machine Learning , Neoplasms , RNA-Seq , Humans , RNA-Seq/methods , Neoplasms/genetics , Transcriptome/genetics , Sequence Analysis, RNA/methods , Gene Expression Profiling/methods , Computational Biology/methods
9.
BMC Genom Data ; 25(1): 45, 2024 May 07.
Article in English | MEDLINE | ID: mdl-38714942

ABSTRACT

OBJECTIVES: Cellular deconvolution is a valuable computational process that can infer the cellular composition of heterogeneous tissue samples from bulk RNA-sequencing data. Benchmark testing is a crucial step in the development and evaluation of new cellular deconvolution algorithms, and also plays a key role in the process of building and optimizing deconvolution pipelines for specific experimental applications. However, few in vivo benchmarking datasets exist, particularly for whole blood, which is the single most profiled human tissue. Here, we describe a unique dataset containing whole blood gene expression profiles and matched circulating leukocyte counts from a large cohort of human donors, suitable for benchmarking cellular deconvolution pipelines. DATA DESCRIPTION: To produce this dataset, venous whole blood was sampled from a total of 138 donors recruited at an academic medical center. Genome-wide expression profiling was subsequently performed via next-generation RNA sequencing, and white blood cell differentials were collected in parallel using flow cytometry. The resultant final dataset contains donor-level expression data for over 45,000 protein coding and non-protein coding genes, as well as matched neutrophil, lymphocyte, monocyte, and eosinophil counts.
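A dataset like this is typically used by converting the measured leukocyte counts into fractions and comparing them, per cell type across donors, with the fractions a deconvolution method estimates. The sketch below assumes fractions are taken relative to the four measured cell types only and uses Pearson r and RMSE as example metrics; the dataset description does not prescribe either choice.

```python
import math
from typing import Dict, Sequence


def counts_to_fractions(counts: Dict[str, float]) -> Dict[str, float]:
    """Turn one donor's measured counts into fractions of their sum."""
    total = sum(counts.values())
    return {cell_type: c / total for cell_type, c in counts.items()}


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def benchmark_cell_type(estimated: Sequence[float],
                        measured: Sequence[float]) -> Dict[str, float]:
    """Compare estimated and measured fractions of one cell type across donors."""
    rmse = math.sqrt(sum((e - m) ** 2 for e, m in zip(estimated, measured)) / len(measured))
    return {"pearson_r": pearson(estimated, measured), "rmse": rmse}


donor = {"neutrophil": 6.0, "lymphocyte": 3.0, "monocyte": 0.5, "eosinophil": 0.5}
print(counts_to_fractions(donor)["neutrophil"])                 # 0.6
print(benchmark_cell_type([0.55, 0.62, 0.70], [0.50, 0.60, 0.72]))
```

Reporting both metrics is useful because a method can rank donors correctly (high r) while being systematically biased (high RMSE).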


Subject(s)
Benchmarking , Humans , Leukocyte Count , Gene Expression Profiling/methods , Transcriptome , Sequence Analysis, RNA/methods , Leukocytes/metabolism , High-Throughput Nucleotide Sequencing , Algorithms
10.
Wiley Interdiscip Rev RNA ; 15(3): e1852, 2024.
Article in English | MEDLINE | ID: mdl-38715192

ABSTRACT

Small RNAs (sRNAs) with sizes ranging from 15 to 50 nucleotides (nt) are critical regulators of gene expression. Prior studies have shown that sRNAs are involved in a broad range of biological processes, such as organ development, tumorigenesis, and epigenomic regulation; however, emerging evidence unveils a hidden layer of diversity and complexity in the endogenously encoded sRNA profiles of eukaryotic organisms, including novel types of sRNAs and previously unknown post-transcriptional RNA modifications. This underscores the importance of accurate, unbiased detection of sRNAs in various cellular contexts. A multitude of high-throughput methods based on next-generation sequencing (NGS) have been developed to decipher sRNA expression and modifications. Nonetheless, unlike mRNA sequencing data, sRNA sequencing data suffer from frequent inconsistencies and high variation arising from adapter contamination and RNA modifications, which together skew sRNA libraries. Here, we summarize sRNA-sequencing approaches and discuss the considerations and challenges for the strategies and methods of sRNA library construction. The pros and cons of sRNA sequencing have significant implications for implementing RNA fragment footprinting approaches, including CLIP-seq and Ribo-seq. We envision that this review can inspire novel improvements in small RNA sequencing and RNA fragment footprinting in the future. This article is categorized under: RNA Evolution and Genomics > Computational Analyses of RNA; RNA Processing > Processing of Small RNAs; Regulatory RNAs/RNAi/Riboswitches > Biogenesis of Effector Small RNAs.
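Because sRNA inserts (15 to 50 nt) are shorter than a typical read, almost every read runs into the 3' adapter, which must be removed before mapping. The sketch below does exact-match 3' trimming with a minimum overlap. The adapter shown is the commonly used Illumina TruSeq Small RNA 3' adapter, taken here only as an example; production tools such as cutadapt also tolerate mismatches and handle quality trimming, which this sketch does not.

```python
def trim_3prime_adapter(read: str, adapter: str, min_overlap: int = 6) -> str:
    """Remove a 3' adapter from a small-RNA read by exact matching.

    Scans left to right for the first position where the rest of the read
    either starts with the full adapter or is a prefix of it; at least
    min_overlap adapter bases must match, so short chance matches are kept.
    """
    for start in range(len(read)):
        tail = read[start:]
        overlap = min(len(tail), len(adapter))
        if overlap < min_overlap:
            break  # remaining tail is too short to call an adapter
        if tail[:overlap] == adapter[:overlap]:
            return read[:start]
    return read


ADAPTER = "TGGAATTCTCGGGTGCCAAGG"     # example adapter (assumed kit)
let7a = "TGAGGTAGTAGGTTGTATAGTT"     # let-7a-5p as DNA
print(trim_3prime_adapter(let7a + ADAPTER[:10], ADAPTER))    # let-7a only
print(trim_3prime_adapter(let7a + ADAPTER + "AAAA", ADAPTER))
```

Reads with no detectable adapter are returned unchanged; many pipelines flag or discard them, since they may be longer RNA fragments or adapter dimers of other kinds.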


Subject(s)
RNA, Small Untranslated , RNA, Small Untranslated/genetics , RNA, Small Untranslated/metabolism , Gene Library , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, RNA/methods , Humans , Animals
11.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38706317

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) enables the exploration of cellular heterogeneity by analyzing gene expression profiles in complex tissues. However, scRNA-seq data often suffer from technical noise, dropout events and sparsity, hindering downstream analyses. Although existing works attempt to mitigate these issues by utilizing graph structures for data denoising, they risk propagating noise and fall short of fully leveraging the inherent data relationships, relying mainly on either cell-cell or gene-gene associations and on graphs constructed from the initial noisy data. To this end, this study presents single-cell bilevel feature propagation (scBFP), a two-step graph-based feature propagation method. It first imputes zero values, which may arise from dropout, using the non-zero values, while leaving the non-zero values themselves unchanged. Subsequently, it denoises the entire dataset by leveraging gene-gene and cell-cell relationships in the respective steps. Extensive experimental results on scRNA-seq data demonstrate the effectiveness of scBFP in various downstream tasks, uncovering valuable biological insights.
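The first step described above, filling zeros from non-zero values without altering the non-zero values, matches the general feature-propagation idea: repeatedly replace each missing entry with the average of its graph neighbours while resetting observed entries every iteration. The sketch shows that idea for one gene on a given cell-cell graph; the graph, iteration count and function name are assumptions, and scBFP's second denoising step is not shown.

```python
from typing import Dict, List, Sequence


def propagate_zeros(values: Sequence[float], neighbors: Dict[int, List[int]],
                    n_iter: int = 50) -> List[float]:
    """Fill zero entries of one gene's expression vector by repeatedly averaging
    over graph neighbours, resetting observed (non-zero) entries each step.

    values    -- expression of one gene across cells (0 = possible dropout)
    neighbors -- adjacency list of a hypothetical cell-cell similarity graph
    """
    observed = [v != 0 for v in values]
    x = list(values)
    for _ in range(n_iter):
        new_x = []
        for i, v in enumerate(x):
            if observed[i]:
                new_x.append(values[i])  # non-zero values are never altered
            else:
                nbrs = neighbors.get(i, [])
                new_x.append(sum(x[j] for j in nbrs) / len(nbrs) if nbrs else v)
        x = new_x
    return x


chain = {0: [1], 1: [0, 2], 2: [1]}            # cells 0 - 1 - 2
print(propagate_zeros([2.0, 0.0, 4.0], chain))  # [2.0, 3.0, 4.0]
print(propagate_zeros([2.0, 0.0, 0.0], chain))  # both zeros approach 2.0
```

Isolated cells with no neighbours keep their zero, which is one reason the quality of the graph matters so much for this family of methods.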


Subject(s)
Sequence Analysis, RNA , Single-Cell Analysis , Single-Cell Analysis/methods , Sequence Analysis, RNA/methods , Humans , Algorithms , Gene Expression Profiling/methods , Computational Biology/methods , RNA-Seq/methods
12.
Commun Biol ; 7(1): 639, 2024 May 25.
Article in English | MEDLINE | ID: mdl-38796505

ABSTRACT

Efficient mapping of cell types in situ remains a major challenge in spatial transcriptomics. Most spot deconvolution tools ignore spatial coordinate information and are extremely slow on large datasets. Here, we introduce SpatialPrompt, a spatially aware and scalable tool for spot deconvolution and domain identification. SpatialPrompt integrates gene expression, spatial location, and a single-cell RNA sequencing (scRNA-seq) reference dataset to accurately infer the cell-type proportions of spatial spots. SpatialPrompt uses non-negative ridge regression and a graph neural network to efficiently capture local microenvironment information. Our extensive benchmarking analysis on Visium, Slide-seq, and MERFISH datasets demonstrated superior performance of SpatialPrompt over 15 existing tools. On a mouse hippocampus dataset, SpatialPrompt achieves spot deconvolution and domain identification within 2 minutes for 50,000 spots. Overall, domain identification using SpatialPrompt was 44 to 150 times faster than existing methods. We built a database of more than 40 curated scRNA-seq datasets for seamless integration with SpatialPrompt for spot deconvolution.
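Non-negative ridge regression, one of the components named above, can be sketched for a single spot: find cell-type weights w >= 0 minimising ||Rw - y||^2 + lambda ||w||^2, where R holds per-cell-type reference expression from scRNA-seq, then rescale w to proportions. The projected-gradient solver, lambda value and names are assumptions for illustration; SpatialPrompt's graph neural network is not shown.

```python
from typing import List, Sequence


def nn_ridge_proportions(reference: Sequence[Sequence[float]], spot: Sequence[float],
                         lam: float = 0.1, n_iter: int = 2000) -> List[float]:
    """Estimate cell-type proportions for one spot.

    reference[g][k] -- mean expression of gene g in cell type k (scRNA-seq)
    spot[g]         -- observed expression of gene g in the spot
    Minimises ||R w - y||^2 + lam * ||w||^2 subject to w >= 0 by projected
    gradient descent, then rescales w to sum to 1.
    """
    n_genes, n_types = len(reference), len(reference[0])
    frob2 = sum(r * r for row in reference for r in row)
    step = 1.0 / (2.0 * (frob2 + lam))   # 1 / upper bound on the gradient's Lipschitz constant
    w = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        resid = [sum(reference[g][k] * w[k] for k in range(n_types)) - spot[g]
                 for g in range(n_genes)]
        grad = [2.0 * sum(reference[g][k] * resid[g] for g in range(n_genes))
                + 2.0 * lam * w[k] for k in range(n_types)]
        w = [max(0.0, w[k] - step * grad[k]) for k in range(n_types)]  # project onto w >= 0
    total = sum(w)
    return [wk / total for wk in w] if total > 0 else w


ref = [[10.0, 0.0],    # marker gene of cell type A
       [0.0, 10.0]]    # marker gene of cell type B
print(nn_ridge_proportions(ref, [7.0, 3.0]))   # close to [0.7, 0.3]
```

The non-negativity constraint is what keeps an unconstrained fit from assigning negative abundance to a cell type; the ridge term stabilises the fit when reference profiles are correlated.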


Subject(s)
Gene Expression Profiling , Transcriptome , Animals , Mice , Gene Expression Profiling/methods , Single-Cell Analysis/methods , Software , Sequence Analysis, RNA/methods , Hippocampus/metabolism
13.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38796691

ABSTRACT

Limited gene capture efficiency and spot size of spatial transcriptome (ST) data pose significant challenges in cell-type characterization. The heterogeneity and complexity of cell composition in the mammalian brain make it even more challenging to accurately annotate ST data from the brain. Many algorithms attempt to characterize neuronal subtypes by integrating ST data with single-nucleus RNA sequencing (snRNA-seq) or single-cell RNA sequencing data. However, how accurately these algorithms perform on Stereo-seq ST data remains unresolved. Here, we benchmarked 9 mapping algorithms using 10 ST datasets from four mouse brain regions at two different resolutions and 24 pseudo-ST datasets from snRNA-seq. Both actual ST data and pseudo-ST data were mapped using snRNA-seq datasets from the corresponding brain regions as reference data. After comparing performance across different areas and resolutions of the mouse brain, we conclude that robust cell-type decomposition and SpatialDWLS demonstrated superior robustness and accuracy in cell-type annotation. Testing with publicly available snRNA-seq data from another sequencing platform in the cortex region further validated our conclusions. Altogether, we developed a workflow for assessing the suitability of mapping algorithms for ST datasets, which can improve the efficiency and accuracy of spatial data annotation.
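Pseudo-ST datasets provide ground truth because the cell-type makeup of each synthetic spot is known. The abstract does not say how the 24 pseudo-ST datasets were built; the sketch below assumes the simplest scheme, summing the counts of randomly drawn nuclei into each pseudo-spot and recording their cell-type proportions. Function name, parameters and toy data are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple


def make_pseudo_spots(cells: Sequence[Dict[str, float]], cell_types: Sequence[str],
                      cells_per_spot: int, n_spots: int, seed: int = 0
                      ) -> List[Tuple[Dict[str, float], Dict[str, float]]]:
    """Sum the counts of randomly drawn nuclei into pseudo-spots.

    Returns (spot_counts, true_proportions) pairs; the proportions are the
    known ground truth against which a mapping algorithm can be scored.
    """
    rng = random.Random(seed)
    spots = []
    for _ in range(n_spots):
        idx = rng.sample(range(len(cells)), cells_per_spot)  # without replacement
        counts: Dict[str, float] = {}
        props: Dict[str, float] = {}
        for i in idx:
            for gene, c in cells[i].items():
                counts[gene] = counts.get(gene, 0.0) + c
            props[cell_types[i]] = props.get(cell_types[i], 0.0) + 1.0 / cells_per_spot
        spots.append((counts, props))
    return spots


nuclei = [{"Slc17a7": 5}, {"Slc17a7": 3}, {"Gad1": 4}, {"Gad1": 2}]   # toy snRNA-seq
labels = ["Excitatory", "Excitatory", "Inhibitory", "Inhibitory"]
spots = make_pseudo_spots(nuclei, labels, cells_per_spot=4, n_spots=2)
print(spots[0])   # ({'Slc17a7': 8.0, 'Gad1': 6.0}, {'Excitatory': 0.5, 'Inhibitory': 0.5}) in some key order
```

Spatially aware schemes (pooling neighbouring cells from a reference layout) would mimic real tissue more closely than random pooling.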


Subject(s)
Algorithms , Benchmarking , Brain , Single-Cell Analysis , Animals , Mice , Brain/metabolism , Single-Cell Analysis/methods , RNA-Seq/methods , Transcriptome , Sequence Analysis, RNA/methods , Neurons/metabolism , Gene Expression Profiling/methods
14.
Methods Mol Biol ; 2775: 109-126, 2024.
Article in English | MEDLINE | ID: mdl-38758314

ABSTRACT

RNA sequencing is a next-generation sequencing approach that may be used to investigate many aspects of gene expression changes between cells. Analysis of the data is typically a multistep process using several bioinformatics tools. The following protocol utilizes a reliable pipeline for identifying differentially expressed genes among samples of Cryptococcus neoformans that is approachable for the adventurous beginner.
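The abstract does not list the protocol's tools, so the following is only a minimal sketch of the first quantities any such pipeline computes: library-size normalisation to counts per million (CPM) and a log2 fold change between two sample groups. Real differential-expression tools (for example DESeq2 or edgeR) additionally model count dispersion and correct for multiple testing, which this sketch does not attempt; the gene names are placeholders.

```python
import math
from typing import Dict, Sequence


def cpm(counts: Dict[str, float]) -> Dict[str, float]:
    """Counts per million for one library."""
    total = sum(counts.values())
    return {gene: 1e6 * c / total for gene, c in counts.items()}


def log2_fold_changes(group_a: Sequence[Dict[str, float]],
                      group_b: Sequence[Dict[str, float]],
                      prior: float = 1.0) -> Dict[str, float]:
    """log2(mean CPM in B + prior) - log2(mean CPM in A + prior) per gene.

    Assumes every library reports the same set of genes.
    """
    def mean_cpm(libs: Sequence[Dict[str, float]]) -> Dict[str, float]:
        normed = [cpm(lib) for lib in libs]
        return {g: sum(n[g] for n in normed) / len(normed) for g in normed[0]}

    a, b = mean_cpm(group_a), mean_cpm(group_b)
    return {g: math.log2(b[g] + prior) - math.log2(a[g] + prior) for g in a}


# Two control libraries of very different depth, one treated library.
control = [{"gene1": 250_000, "gene2": 750_000}, {"gene1": 25, "gene2": 75}]
treated = [{"gene1": 500, "gene2": 500}]
lfc = log2_fold_changes(control, treated)
print(lfc)   # gene1 close to +1, gene2 close to -0.585
```

The CPM step is what makes the two control libraries comparable despite a 10,000-fold difference in sequencing depth.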


Subject(s)
Computational Biology , Cryptococcus neoformans , Gene Expression Profiling , High-Throughput Nucleotide Sequencing , Transcriptome , Cryptococcus neoformans/genetics , Cryptococcus neoformans/metabolism , Gene Expression Profiling/methods , Computational Biology/methods , Transcriptome/genetics , High-Throughput Nucleotide Sequencing/methods , Gene Expression Regulation, Fungal , Software , Sequence Analysis, RNA/methods
15.
Nat Commun ; 15(1): 4055, 2024 May 14.
Article in English | MEDLINE | ID: mdl-38744843

ABSTRACT

We introduce GRouNdGAN, a gene regulatory network (GRN)-guided reference-based causal implicit generative model for simulating single-cell RNA-seq data, in silico perturbation experiments, and benchmarking GRN inference methods. Through the imposition of a user-defined GRN in its architecture, GRouNdGAN simulates steady-state and transient-state single-cell datasets where genes are causally expressed under the control of their regulating transcription factors (TFs). Training on six experimental reference datasets, we show that our model captures non-linear TF-gene dependencies and preserves gene identities, cell trajectories, pseudo-time ordering, and technical and biological noise, with no user manipulation and only implicit parameterization. GRouNdGAN can synthesize cells under new conditions to perform in silico TF knockout experiments. Benchmarking various GRN inference algorithms reveals that GRouNdGAN effectively bridges the existing gap between simulated and biological data benchmarks of GRN inference algorithms, providing gold standard ground truth GRNs and realistic cells corresponding to the biological system of interest.
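GRouNdGAN imposes a user-defined GRN on a generative model; the abstract gives no further architectural detail. As a far simpler illustration of what "genes causally expressed under the control of their regulating TFs" and an in silico TF knockout mean, the sketch below evaluates a toy linear GRN in causal order and clamps a TF to zero. The linear model, weights and gene names are all hypothetical and unrelated to GRouNdGAN's actual architecture.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical GRN: target -> list of (regulating TF, weight).
GRN: Dict[str, List[Tuple[str, float]]] = {
    "geneX": [("TF1", 2.0)],
    "geneY": [("TF1", 1.0), ("TF2", -0.5)],
}


def simulate_cell(tf_levels: Dict[str, float],
                  knockout: Optional[str] = None) -> Dict[str, float]:
    """Expression of each target as a non-negative linear function of its TFs.

    TFs are exogenous inputs here; a knocked-out TF is clamped to 0 before
    its targets are computed, mimicking an in silico perturbation.
    """
    expr = dict(tf_levels)
    if knockout is not None:
        expr[knockout] = 0.0
    for target, regulators in GRN.items():
        expr[target] = max(0.0, sum(w * expr[tf] for tf, w in regulators))
    return expr


baseline = simulate_cell({"TF1": 1.0, "TF2": 1.0})
ko = simulate_cell({"TF1": 1.0, "TF2": 1.0}, knockout="TF1")
print(baseline["geneX"], baseline["geneY"])   # 2.0 0.5
print(ko["geneX"], ko["geneY"])               # 0.0 0.0
```

Because the network is known, any GRN inference method run on such data can be scored against the exact ground-truth edges, which is the benchmarking role the abstract describes.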


Subject(s)
Algorithms , Computer Simulation , Gene Regulatory Networks , RNA-Seq , Single-Cell Analysis , Single-Cell Analysis/methods , RNA-Seq/methods , Humans , Transcription Factors/metabolism , Transcription Factors/genetics , Computational Biology/methods , Benchmarking , Sequence Analysis, RNA/methods , Single-Cell Gene Expression Analysis
16.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38739758

ABSTRACT

The complicated process of neuronal development is initiated early in life, with the genetic mechanisms governing this process yet to be fully elucidated. Single-cell RNA sequencing (scRNA-seq) is a potent instrument for pinpointing biomarkers that exhibit differential expression across various cell types and developmental stages. By employing scRNA-seq on human embryonic stem cells (hESCs), we aim to identify differentially expressed genes (DEGs) crucial for early-stage neuronal development. Our focus extends beyond simply identifying DEGs. We strive to investigate the functional roles of these genes through enrichment analysis and construct gene regulatory networks to understand their interactions. Ultimately, this comprehensive approach aspires to illuminate the molecular mechanisms and transcriptional dynamics governing early human brain development. By uncovering potential links between these DEGs and intelligence, mental disorders, and neurodevelopmental disorders, we hope to shed light on human neurological health and disease. In this study, we have used scRNA-seq to identify DEGs involved in early-stage neuronal development in hESCs. We analyzed scRNA-seq data collected on days 26 (D26) and 54 (D54) of the in vitro differentiation of hESCs into neurons. Our analysis identified 539 DEGs between D26 and D54. Functional enrichment of those DEG biomarkers indicated that the up-regulated DEGs participated in neurogenesis, while the down-regulated DEGs were linked to synapse regulation. The Reactome pathway analysis revealed that down-regulated DEGs were involved in the interactions between proteins located in synapse pathways. We also discovered interactions between DEGs and miRNAs, between transcription factors (TFs) and DEGs, and between TFs and miRNAs. Our study identified 20 significant transcription factors, shedding light on early brain development genetics. The identified DEGs and gene regulatory networks are valuable resources for future research into human brain development and neurodevelopmental disorders.


Subject(s)
Biomarkers , Brain , Gene Regulatory Networks , Human Embryonic Stem Cells , Single-Cell Analysis , Humans , Single-Cell Analysis/methods , Human Embryonic Stem Cells/metabolism , Human Embryonic Stem Cells/cytology , Brain/metabolism , Brain/embryology , Brain/cytology , Biomarkers/metabolism , Neurons/metabolism , Neurons/cytology , Cell Differentiation/genetics , RNA-Seq , Neurogenesis/genetics , Gene Expression Regulation, Developmental , Gene Expression Profiling , Sequence Analysis, RNA/methods , Single-Cell Gene Expression Analysis
17.
Methods Mol Biol ; 2807: 209-227, 2024.
Article in English | MEDLINE | ID: mdl-38743231

ABSTRACT

The post-transcriptional processing and chemical modification of HIV RNA are understudied aspects of HIV virology, primarily due to the limited ability to accurately map and quantify RNA modifications. Modification-specific antibodies or modification-sensitive endonucleases coupled with short-read RNA sequencing technologies have allowed for low-resolution or limited mapping of important regulatory modifications of HIV RNA such as N6-methyladenosine (m6A). However, a high-resolution map of where these sites occur on HIV transcripts is needed for detailed mechanistic understanding. This has recently become possible with new sequencing technologies. Here, we describe the direct RNA sequencing of HIV transcripts using an Oxford Nanopore Technologies sequencer and the use of this technique to map m6A at near single nucleotide resolution. This technology also provides the ability to identify splice variants with long RNA reads and thus, can provide high-resolution RNA modification maps that distinguish between overlapping splice variants. The protocols outlined here for m6A also provide a powerful paradigm for studying any other RNA modifications that can be detected on the nanopore platform.


Subject(s)
Adenosine , Nanopore Sequencing , RNA, Messenger , RNA, Viral , Nanopore Sequencing/methods , RNA, Viral/genetics , Methylation , Humans , Adenosine/analogs & derivatives , Adenosine/genetics , RNA, Messenger/genetics , Sequence Analysis, RNA/methods , HIV-1/genetics , RNA Processing, Post-Transcriptional , High-Throughput Nucleotide Sequencing/methods , HIV Infections/virology , HIV Infections/genetics , HIV/genetics
18.
Cells ; 13(9)2024 Apr 26.
Article in English | MEDLINE | ID: mdl-38727290

ABSTRACT

Dilated cardiomyopathy (DCM) is the most common cause of heart failure, with a complex aetiology involving multiple cell types. We aimed to detect cell-specific transcriptomic alterations in DCM through analysis that leveraged recent advancements in single-cell analytical tools. Single-cell RNA sequencing (scRNA-seq) data from human DCM cardiac tissue were subjected to an updated bioinformatic workflow in which unsupervised clustering was paired with reference label transfer to more comprehensively annotate the dataset. Differential gene expression was detected primarily in the cardiac fibroblast population. Bulk RNA sequencing was performed on an independent cohort of human cardiac tissue and compared with scRNA-seq gene alterations to generate a stratified list of higher-confidence, fibroblast-specific expression candidates for further validation. Concordant gene dysregulation was confirmed in TGFβ-induced fibroblasts. Functional assessment of gene candidates showed that AEBP1 may play a significant role in fibroblast activation. This unbiased approach enabled improved resolution of cardiac cell-type-specific transcriptomic alterations in DCM.


Subject(s)
Cardiomyopathy, Dilated , Fibroblasts , Sequence Analysis, RNA , Single-Cell Analysis , Transcriptome , Humans , Cardiomyopathy, Dilated/genetics , Cardiomyopathy, Dilated/pathology , Cardiomyopathy, Dilated/metabolism , Fibroblasts/metabolism , Single-Cell Analysis/methods , Transcriptome/genetics , Sequence Analysis, RNA/methods , Myocardium/metabolism , Myocardium/pathology , Gene Expression Profiling
19.
PLoS One ; 19(5): e0302947, 2024.
Article in English | MEDLINE | ID: mdl-38728288

ABSTRACT

In recent years, researchers have demonstrated the effectiveness and speed of machine learning-based cancer diagnosis models. However, the results generated by machine learning models are difficult to explain, especially for models that use complex, high-dimensional data such as RNA sequencing data. In this study, we propose the binarilization technique as a novel way to treat RNA sequencing data and use it to construct explainable cancer prediction models. We tested the proposed data processing technique on five different models (neural network, random forest, XGBoost, support vector machine, and decision tree) using four cancer datasets collected from the National Cancer Institute Genomic Data Commons. Because our datasets are imbalanced, we evaluated all models using metrics designed for imbalanced data: geometric mean, Matthews correlation coefficient, F-measure, and area under the receiver operating characteristic curve. Our approach showed comparable performance while relying on fewer features. Additionally, we demonstrated that data binarilization offers higher explainability by revealing how each feature affects the prediction. These results demonstrate the potential of the data binarilization technique to improve the performance and explainability of RNA sequencing-based cancer prediction models.
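The abstract does not state the binarilization rule, so the sketch below assumes a simple one: each gene is set to 1 when its expression exceeds that gene's median in the training samples, and 0 otherwise. It also computes three of the imbalance-aware metrics named above (geometric mean, Matthews correlation coefficient, F-measure) from a confusion matrix. `fit_thresholds`, `binarize` and `imbalance_metrics` are hypothetical helpers, not the authors' code.

```python
import math
from statistics import median
from typing import Dict, List, Sequence


def fit_thresholds(X: Sequence[Sequence[float]]) -> List[float]:
    """Per-gene cutoff = median expression across training samples (assumed rule)."""
    return [median(row[j] for row in X) for j in range(len(X[0]))]


def binarize(X: Sequence[Sequence[float]], thresholds: Sequence[float]) -> List[List[int]]:
    """Map each expression value to 1 (above its gene's cutoff) or 0."""
    return [[int(v > t) for v, t in zip(row, thresholds)] for row in X]


def imbalance_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """G-mean, MCC and F1 for binary labels (1 = cancer, 0 = normal)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        # Geometric mean of sensitivity and specificity penalizes ignoring the minority class.
        "g_mean": math.sqrt(sensitivity * specificity),
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
        "f1": (2 * precision * sensitivity / (precision + sensitivity)
               if precision + sensitivity else 0.0),
    }
```

Because every binarized feature is simply "high" or "low", a tree or linear model trained on it can be read as statements like "gene j above its median pushes the prediction toward cancer", which is the kind of explainability the abstract describes. The thresholds should be fitted on training data only and then reused on test data.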


Subject(s)
Machine Learning , Neoplasms , Sequence Analysis, RNA , Humans , Neoplasms/genetics , Sequence Analysis, RNA/methods , Neural Networks, Computer , Support Vector Machine , ROC Curve , Decision Trees
20.
Nat Commun ; 15(1): 4050, 2024 May 14.
Article in English | MEDLINE | ID: mdl-38744866

ABSTRACT

Although more than half of all genes generate transcripts that differ in 3'UTR length, current analysis pipelines only quantify the amount but not the length of mRNA transcripts. 3'UTR length is determined by 3' end cleavage sites (CS). We map CS in more than 200 primary human and mouse cell types and increase CS annotations relative to the GENCODE database by 40%. Approximately half of all CS are used in few cell types, revealing that most genes only have one or two major 3' ends. We incorporate the CS annotations into a computational pipeline, called scUTRquant, for rapid, accurate, and simultaneous quantification of gene and 3'UTR isoform expression from single-cell RNA sequencing (scRNA-seq) data. When applying scUTRquant to data from 474 cell types and 2134 perturbations, we discover extensive 3'UTR length changes across cell types that are as widespread and coordinately regulated as gene expression changes but affect mostly different genes. Our data indicate that mRNA abundance and mRNA length are two largely independent axes of gene regulation that together determine the amount and spatial organization of protein synthesis.
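To make the point that mRNA abundance and 3'UTR length are separate axes concrete, the sketch below summarizes one gene from read counts assigned to its cleavage sites: total reads give abundance, and the read-weighted mean 3'UTR length gives length. This is a simplified illustration, not how scUTRquant itself quantifies isoforms; `gene_summary` and the two-site example counts are hypothetical.

```python
from typing import List, Tuple


def gene_summary(cs_counts: List[Tuple[int, int]]) -> Tuple[int, float]:
    """Return (total reads, read-weighted mean 3'UTR length in nt) for one gene.

    cs_counts holds (utr_length_nt, read_count) for each annotated cleavage site,
    where utr_length_nt is the distance from the stop codon to that site.
    """
    total = sum(count for _, count in cs_counts)
    if total == 0:
        raise ValueError("no reads assigned to any cleavage site")
    mean_length = sum(length * count for length, count in cs_counts) / total
    return total, mean_length


# Hypothetical gene with a proximal (200 nt) and a distal (1500 nt) cleavage site.
cell_type_a = [(200, 80), (1500, 20)]
cell_type_b = [(200, 20), (1500, 80)]
```

Here both cell types have 100 reads (same abundance), but the mean 3'UTR length is 460 nt in `cell_type_a` and 1240 nt in `cell_type_b`. A gene-level count matrix alone would miss this shift.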


Subject(s)
3' Untranslated Regions , RNA, Messenger , Single-Cell Analysis , 3' Untranslated Regions/genetics , Humans , Animals , Mice , RNA, Messenger/genetics , RNA, Messenger/metabolism , Single-Cell Analysis/methods , Sequence Analysis, RNA/methods , Gene Expression Regulation , RNA-Seq/methods , Computational Biology/methods , Gene Expression Profiling/methods , Single-Cell Gene Expression Analysis