Results 1 - 20 of 49
1.
Bioinformatics ; 40(4)2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38608194

ABSTRACT

MOTIVATION: Dysregulation of a gene's function, due either to mutations or to impairments in regulatory networks, often triggers pathological states in the affected tissue. Comprehensive mapping of these apparent gene-pathology relationships is a daunting task, primarily because of genetic pleiotropy and the lack of suitable computational approaches. With the advent of high-throughput genomics platforms and community-scale initiatives such as the Human Cell Landscape project, researchers have been able to create gene expression portraits of healthy tissues resolved at the level of single cells. However, a comparable wealth of knowledge is not yet at our fingertips when it comes to diseases, because the genetic manifestation of a disease is often diverse and confounded by several clinical and demographic covariates. RESULTS: To circumvent this, we mined ∼18 million PubMed abstracts published until May 2019 and automatically selected ∼4.5 million of them that describe roles of particular genes in disease pathogenesis. We then fine-tuned the pretrained Bidirectional Encoder Representations from Transformers (BERT) language model from natural language processing to learn vector representations of entities such as genes, diseases, tissues, and cell types, such that their relationships are preserved in the vector space. The repurposed BERT predicted disease-gene associations that are not cited in the training data, highlighting the feasibility of in silico synthesis of hypotheses linking different biological entities such as genes and conditions. AVAILABILITY AND IMPLEMENTATION: PathoBERT pretrained model: https://github.com/Priyadarshini-Rai/Pathomap-Model. BioSentVec-based abstract classification model: https://github.com/Priyadarshini-Rai/Pathomap-Model. Pathomap R package: https://github.com/Priyadarshini-Rai/Pathomap.
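The abstract describes entity embeddings in which gene-disease relationships are preserved in vector space. A minimal sketch of how such embeddings could be queried, assuming cosine similarity as the scoring rule (the abstract does not specify Pathomap's scoring) and using made-up three-dimensional vectors with hypothetical gene names (`GENE_A` and so on):

```python
import numpy as np


def rank_genes_for_disease(disease_vec, gene_vecs, top_k=3):
    """Rank genes by cosine similarity to a disease embedding."""
    names = list(gene_vecs)
    mat = np.array([gene_vecs[n] for n in names], dtype=float)
    d = np.asarray(disease_vec, dtype=float)
    # Cosine similarity: dot product divided by the product of the vector norms
    sims = mat @ d / (np.linalg.norm(mat, axis=1) * np.linalg.norm(d))
    order = np.argsort(-sims)[:top_k]
    return [(names[i], float(sims[i])) for i in order]


# Hypothetical 3-dimensional embeddings; real ones would come from the fine-tuned model
genes = {
    "GENE_A": [0.9, 0.1, 0.0],
    "GENE_B": [0.1, 0.9, 0.2],
    "GENE_C": [0.7, 0.3, 0.1],
}
disease = [1.0, 0.2, 0.0]

for name, score in rank_genes_for_disease(disease, genes, top_k=2):
    print(name, round(score, 3))
```

A real query would use the model's high-dimensional vectors; the toy values only show the ranking mechanics.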


Subject(s)
Data Mining , Humans , Data Mining/methods , Computational Biology/methods , Natural Language Processing
2.
Chembiochem ; 25(1): e202300577, 2024 01 02.
Article in English | MEDLINE | ID: mdl-37874183

ABSTRACT

The cellular genome is considered a dynamic blueprint of a cell since it encodes genetic information that is temporally altered by various endogenous and exogenous insults. Largely, the extent of genomic dynamicity is controlled by the trade-off between DNA repair processes and the genotoxic potential of the causative agent (genotoxins or potential carcinogens). A subset of genotoxins form DNA adducts by covalently binding to cellular DNA, triggering structural or functional changes that lead to significant alterations in cellular processes via genetic (e.g., mutations) or non-genetic (e.g., epigenome) routes. Identification, quantification, and characterization of DNA adducts are indispensable for their comprehensive understanding and could expedite ongoing efforts to predict carcinogenicity and its mode of action. In this review, we elaborate on using artificial intelligence (AI)-based modeling in adduct biology and present multiple computational strategies to advance the decoding of DNA adducts. The proposed AI-based strategies encompass predictive modeling of adduct formation via metabolic activation, identification of novel adducts, prediction of biochemical routes for adduct formation, prediction of adduct half-lives within biological ecosystems, and methods to predict the link between adduct chemistry and its location within genomic DNA. In summary, we discuss some futuristic AI-based approaches in DNA adduct biology.


Subject(s)
DNA Adducts , Ecosystem , Artificial Intelligence , Mutagens , DNA/genetics
3.
Placenta ; 140: 109-116, 2023 09 07.
Article in English | MEDLINE | ID: mdl-37572594

ABSTRACT

INTRODUCTION: The objective was to perform ultrasound placental image texture analysis (UPIA) in the first (T1), second (T2), and third (T3) trimesters of pregnancy using machine learning (ML). METHODS: In this prospective observational study, 2D placental ultrasound (US) images were acquired at 11-14, 20-24, and 28-32 weeks. The image data were divided into training, validation, and test subsets in an 80:10:10 ratio. Three different ML techniques, namely deep learning, transfer learning, and a vision transformer, were used for UPIA. RESULTS: Of the 1008 cases included in the study, 59.5% (600/1008) had a normal outcome. Image texture classification was compared between the T1/T2, T2/T3, and T1/T3 pairs. Classifying T1 versus T2 images with the Inception v3 model gave an accuracy of 83.3% and a Cohen's kappa of 0.662. Classification between T1 and T3 achieved the best results with the EfficientNetB0 model, with an accuracy, Cohen's kappa, sensitivity, and specificity of 87.5%, 0.749, 83.4%, and 88.9%, respectively. Placental image texture was compared between cases with adverse materno-fetal outcomes and controls using EfficientNetB0. The F1 scores were 0.824, 0.820, and 0.892 in T1, T2, and T3, respectively. The sensitivity and specificity of the model were 77.4% and 80.2% at T1 but increased to 81.0% and 93.9%, respectively, at T2 and T3. DISCUSSION: The study presents a novel technique to classify placental ultrasound image texture using ML models, which could differentiate first- and third-trimester normal placentas, and normal and adverse pregnancy outcome images, with good accuracy.
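The metrics reported above (accuracy, Cohen's kappa, sensitivity, specificity, F1) can all be derived from a binary confusion matrix. A minimal sketch, assuming a two-class setting (for example adverse outcome versus control) and using hypothetical counts rather than the study's data:

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, Cohen's kappa, sensitivity, specificity and F1 from a 2x2 confusion matrix."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)          # recall for the positive class
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Chance agreement: product of predicted and actual marginals, summed over both classes
    p_chance = ((tp + fp) / n) * ((tp + fn) / n) + ((fn + tn) / n) * ((fp + tn) / n)
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {"accuracy": accuracy, "kappa": kappa, "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1}


# Hypothetical counts, not taken from the study
print(binary_metrics(tp=40, fp=10, fn=10, tn=40))
```

Kappa corrects raw accuracy for the agreement expected by chance, which is why it is lower than accuracy for the same predictions.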


Subject(s)
Machine Learning , Placenta , Pregnancy , Humans , Female , Placenta/diagnostic imaging , Ultrasonography , Sensitivity and Specificity , Pregnancy Trimester, Third
4.
PLoS Comput Biol ; 19(4): e1010995, 2023 04.
Article in English | MEDLINE | ID: mdl-37068117

ABSTRACT

Our understanding of how the speed and persistence of cell migration affect the growth rate and size of tumors remains incomplete. To address this, we developed a mathematical model wherein cells migrate in two-dimensional space, divide, die, or intravasate into the vasculature. Exploring a wide range of speed and persistence combinations, we find that tumor growth positively correlates with increasing speed and higher persistence. As a biologically relevant example, we focused on Golgi fragmentation, a phenomenon often linked to alterations of cell migration. Golgi fragmentation was induced by depletion of Giantin, a Golgi matrix protein whose downregulation correlates with poor patient survival. Applying the experimentally obtained migration and invasion traits of Giantin-depleted breast cancer cells to our mathematical model, we predict that loss of Giantin increases the number of intravasating cells. This prediction was validated by showing that circulating tumor cells express significantly less Giantin than primary tumor cells. Altogether, our computational model identifies cell migration traits that regulate tumor progression and uncovers a role of Giantin in breast cancer progression.
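Speed and persistence, as explored in the model above, can be illustrated with a minimal 2D persistent random walk. This sketch covers only the migration component (no division, death, or intravasation) and assumes that persistence is controlled by a Gaussian turning-angle noise `turn_sd`; that parameterization is an illustrative choice, not necessarily the authors':

```python
import numpy as np


def simulate_persistent_walk(n_cells, n_steps, speed, turn_sd, dt=1.0, seed=0):
    """Final (x, y) positions of cells doing a 2D persistent random walk from the origin."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, 2 * np.pi, size=n_cells)    # random initial headings
    pos = np.zeros((n_cells, 2))
    for _ in range(n_steps):
        # Small turn_sd -> heading changes slowly -> high persistence
        theta = theta + rng.normal(0.0, turn_sd, size=n_cells)
        pos[:, 0] += speed * dt * np.cos(theta)
        pos[:, 1] += speed * dt * np.sin(theta)
    return pos


def mean_displacement(pos):
    """Mean straight-line distance from the starting point."""
    return float(np.mean(np.linalg.norm(pos, axis=1)))


persistent = simulate_persistent_walk(500, 100, speed=1.0, turn_sd=0.1)
random_like = simulate_persistent_walk(500, 100, speed=1.0, turn_sd=3.0)
print(mean_displacement(persistent), mean_displacement(random_like))
```

With near-zero turning noise the walk is almost ballistic, so cells end up much farther from their origin than with large turning noise, which behaves like an uncorrelated random walk.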


Subject(s)
Breast Neoplasms , Membrane Proteins , Humans , Female , Membrane Proteins/metabolism , Golgi Matrix Proteins/metabolism , Breast Neoplasms/metabolism , Golgi Apparatus/metabolism , Golgi Apparatus/pathology
5.
Genome Res ; 33(2): 218-231, 2023 02.
Article in English | MEDLINE | ID: mdl-36653120

ABSTRACT

The true benefits of large single-cell transcriptome and epigenome data sets can be realized only with the development of new approaches and search tools for annotating individual cells. Matching a single-cell epigenome profile to a large pool of reference cells remains a major challenge. Here, we present scEpiSearch, which enables searching, comparison, and independent classification of single-cell open-chromatin profiles against a large reference of single-cell expression and open-chromatin data sets. Across performance benchmarks, scEpiSearch outperformed multiple methods in search accuracy and in low-dimensional coembedding of single-cell profiles, irrespective of platform and species. We also demonstrate unconventional uses of scEpiSearch by applying it to single-cell epigenome profiles of K562 cells and samples from patients with acute leukaemia to reveal different aspects of their heterogeneity, multipotent behavior, and dedifferentiated states. Applying scEpiSearch to our single-cell open-chromatin profiles from embryonic stem cells (ESCs), we identified ESC subpopulations with more activity and poising for endoplasmic reticulum stress and the unfolded protein response. Thus, scEpiSearch solves the nontrivial problem of amalgamating information from a large pool of single cells to identify and study the regulatory states of cells using their single-cell epigenomes.
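As a simplified illustration of matching a query single-cell profile against a reference pool, the sketch below ranks reference profiles by Pearson correlation over shared features. This is a generic nearest-neighbor search, not scEpiSearch's scoring procedure; the feature values and labels are hypothetical, and constant (zero-variance) profiles are not handled:

```python
import numpy as np


def search_reference(query, reference, labels, top_k=3):
    """Rank reference profiles by Pearson correlation with a query profile."""
    q = np.asarray(query, dtype=float)
    R = np.asarray(reference, dtype=float)           # reference cells x features
    # Z-score each profile; the mean product of z-scores is Pearson's r
    qz = (q - q.mean()) / q.std()
    Rz = (R - R.mean(axis=1, keepdims=True)) / R.std(axis=1, keepdims=True)
    r = Rz @ qz / q.size
    order = np.argsort(-r)[:top_k]
    return [(labels[i], float(r[i])) for i in order]


# Hypothetical accessibility scores over four shared features
reference = [[1, 2, 3, 4], [4, 3, 2, 1], [1, 3, 2, 4]]
labels = ["cell_type_A", "cell_type_B", "cell_type_C"]
print(search_reference([2, 4, 6, 8], reference, labels, top_k=2))
```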


Subject(s)
Chromatin , Transcriptome , Humans , Chromatin/metabolism , Epigenome , Embryonic Stem Cells/metabolism , Single-Cell Analysis
6.
Brief Funct Genomics ; 22(3): 281-290, 2023 05 18.
Article in English | MEDLINE | ID: mdl-36542133

ABSTRACT

Odorant receptors (ORs) obey mutual exclusivity and a monoallelic mode of expression. Efforts are ongoing to decipher the molecular mechanism that drives the 'one-neuron-one-receptor' rule of olfaction. Recently, single-cell profiling of olfactory sensory neurons (OSNs) revealed the expression of multiple ORs in immature neurons, suggesting that the OR gene choice mechanism is much more complex than previously described by the silence-all-and-activate-one model. These results also gave rise to two possible mechanistic models, i.e., winner-takes-all and stochastic selection. We developed Reverse Cell Tracking (RCT), a novel computational framework that facilitates OR-guided cellular backtracking by leveraging Uniform Manifold Approximation and Projection (UMAP) embeddings from the RNA velocity workflow. RCT-based trajectory backtracking, coupled with statistical analysis, revealed an OR gene choice bias toward the transcriptionally advanced (most highly expressed) OR during neuronal differentiation. Interestingly, the observed selection bias was uniform for all ORs, regardless of their spatial zone or relative expression within the olfactory organ. We validated these findings on independent datasets and further confirmed that OR gene selection may be regulated by Upf3b. Lastly, our RNA dynamics-based tracking of the differentiation cascade revealed a transition cell state that harbors the mixed molecular identities of immature and mature OSNs and whose relative abundance is regulated by Upf3b.


Subject(s)
Olfactory Receptor Neurons , Receptors, Odorant , Receptors, Odorant/genetics , Receptors, Odorant/metabolism , Olfactory Receptor Neurons/metabolism , Cell Differentiation/genetics
8.
Genome Res ; 33(1): 80-95, 2023 01.
Article in English | MEDLINE | ID: mdl-36414416

ABSTRACT

The identification and characterization of circulating tumor cells (CTCs) are important for gaining insights into the biology of metastatic cancers, monitoring disease progression, and medical management of the disease. The limiting factors in the enrichment of purified CTC populations are their sparse availability, heterogeneity, and altered phenotypes relative to the primary tumor. Intensive research on both the technical and molecular fronts has led to the development of assays that ease CTC detection and identification from peripheral blood. Most CTC detection methods based on single-cell RNA sequencing (scRNA-seq) use a mix of size selection, marker-based white blood cell (WBC) depletion, and antibodies targeting tumor-associated antigens. However, most of these methods either miss atypical CTCs or suffer from WBC contamination. We present unCTC, an R package for unbiased identification and characterization of CTCs from single-cell transcriptomic data. unCTC features many standard and novel computational and statistical modules for various analyses. These include a novel scRNA-seq clustering method named deep dictionary learning using k-means clustering cost (DDLK), expression-based copy number variation (CNV) inference, and combinatorial, marker-based verification of malignant phenotypes. DDLK enables robust segregation of CTCs and WBCs in pathway space, as opposed to gene expression space. We validated the utility of unCTC on scRNA-seq profiles of breast CTCs from six patients, captured and profiled using an integrated ClearCell FX and Polaris workflow based on size-based separation of CTCs and marker-based WBC depletion.
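Expression-based CNV inference, listed among unCTC's modules, is commonly done by comparing query cells to a normal reference and smoothing the log-ratio along genes ordered by genomic position. A minimal sketch of that general idea, assuming genes are already position-sorted and using a simple moving-average window (both assumptions; unCTC's own implementation may differ), on hypothetical data:

```python
import numpy as np


def infer_cnv_profile(expr, ref_expr, window=5):
    """Smoothed log-ratio of query vs reference expression along genome-ordered genes.

    expr:     query cells x genes (genes sorted by chromosome position)
    ref_expr: reference (e.g. WBC) cells x the same genes
    """
    query_log = np.log2(np.asarray(expr, dtype=float) + 1)
    ref_log = np.log2(np.asarray(ref_expr, dtype=float) + 1).mean(axis=0)
    rel = query_log - ref_log
    kernel = np.ones(window) / window
    # Moving average along the gene axis; mode="same" zero-pads, so the ends are damped
    return np.array([np.convolve(row, kernel, mode="same") for row in rel])


# Hypothetical data: 20 genes, the second half at higher expression in the query cell
ref = np.ones((3, 20))
query = np.array([[1.0] * 10 + [3.0] * 10])
profile = infer_cnv_profile(query, ref)
print(np.round(profile[0], 2))
```

Smoothing over neighboring genes suppresses gene-level noise so that broad shifts, such as a putative gain across a chromosomal segment, stand out.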


Subject(s)
Neoplastic Cells, Circulating , Humans , Neoplastic Cells, Circulating/metabolism , Transcriptome , DNA Copy Number Variations , Gene Expression Profiling , Biomarkers, Tumor
9.
Commun Biol ; 5(1): 1231, 2022 11 12.
Article in English | MEDLINE | ID: mdl-36371461

ABSTRACT

Cell-cell communication and physical interactions play a vital role in cancer initiation, homeostasis, progression, and immune response. Here, we report a system that combines live capture of different cell types, co-incubation, time-lapse imaging, and gene expression profiling of doublets using a microfluidic integrated fluidic circuit, enabling measurement of the physical distances between cells and of the transcriptional profiles associated with cell-cell interactions. We track temporal variations in the distances between natural killer cells and triple-negative breast cancer cells and compare them with terminal cellular transcriptome profiles. The results show time-bound activities of regulatory modules and allude to the existence of transcriptional memory. Our experimental and bioinformatic approaches serve as a proof of concept for interrogating live-cell interactions at doublet resolution. Together, our findings highlight the potential of our approach for use across different cancers and cell types.
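Tracking the distance between paired cells over time, as described above, reduces to computing per-frame Euclidean distances between cell centroids. A minimal sketch, assuming centroids have already been extracted from the time-lapse images; the coordinates and the contact threshold are hypothetical:

```python
import math


def distance_series(track_a, track_b):
    """Euclidean distance between two tracked cell centroids at each time-lapse frame."""
    if len(track_a) != len(track_b):
        raise ValueError("tracks must cover the same frames")
    return [math.dist(a, b) for a, b in zip(track_a, track_b)]


def first_contact_frame(distances, threshold):
    """Index of the first frame where the cells are closer than `threshold`, else None."""
    for frame, d in enumerate(distances):
        if d < threshold:
            return frame
    return None


# Hypothetical centroids (e.g. in micrometres) for an NK cell and a tumor cell
nk = [(0, 0), (1, 0), (2, 0)]
tumor = [(3, 4), (3, 4), (3, 4)]
dists = distance_series(nk, tumor)
print([round(d, 3) for d in dists], first_contact_frame(dists, threshold=4.2))
```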


Subject(s)
Transcriptome , Triple Negative Breast Neoplasms , Humans , Microfluidics , Gene Expression Profiling/methods , Gene Expression Regulation
10.
Nat Commun ; 13(1): 5680, 2022 09 27.
Article in English | MEDLINE | ID: mdl-36167836

ABSTRACT

Inter- and intra-tumoral heterogeneity are major stumbling blocks in the treatment of cancer and are responsible for differential drug responses among cancer patients. Recently, the availability of high-throughput screening datasets has paved the way for machine learning-based personalized therapy recommendations using the molecular profiles of cancer specimens. In this study, we introduce Precily, a predictive modeling approach to infer treatment response in cancers using gene expression data. In this context, we demonstrate the benefits of considering pathway activity estimates in tandem with drug descriptors as features. We apply Precily to single-cell and bulk RNA sequencing data associated with hundreds of cancer cell lines. We then assess the predictability of treatment outcomes using our in-house prostate cancer cell line and xenograft datasets exposed to different treatment conditions. Further, we demonstrate the applicability of our approach to patient drug response data from The Cancer Genome Atlas and an independent clinical study describing the treatment journey of three melanoma patients. Our findings highlight the importance of chemo-transcriptomics approaches in cancer treatment selection.
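One way to combine pathway activity estimates with drug descriptors as model features is to score each pathway per sample and concatenate the drug's descriptor vector. The sketch below assumes a simple mean-of-z-scores pathway score as a stand-in (Precily's actual pathway activity estimator is not specified here), with hypothetical genes, pathways, and descriptor values; the resulting matrix could feed any regression model:

```python
import numpy as np


def pathway_scores(expr, gene_names, pathways):
    """Per-sample pathway activity as the mean z-scored expression of member genes."""
    expr = np.asarray(expr, dtype=float)                  # samples x genes
    z = (expr - expr.mean(axis=0)) / (expr.std(axis=0) + 1e-8)
    index = {g: i for i, g in enumerate(gene_names)}
    columns = []
    for name, members in pathways.items():
        cols = [index[g] for g in members if g in index]
        if not cols:
            raise ValueError(f"no genes of pathway {name!r} found in the data")
        columns.append(z[:, cols].mean(axis=1))
    return np.column_stack(columns)


def build_features(expr, gene_names, pathways, drug_descriptor):
    """Concatenate each sample's pathway scores with one drug's descriptor vector."""
    ps = pathway_scores(expr, gene_names, pathways)
    drug = np.tile(np.asarray(drug_descriptor, dtype=float), (ps.shape[0], 1))
    return np.hstack([ps, drug])


# Hypothetical data: 3 samples x 4 genes, 2 pathways, a 3-value drug descriptor
expr = [[5, 1, 0, 2], [6, 2, 1, 2], [2, 8, 4, 3]]
genes = ["g1", "g2", "g3", "g4"]
pathways = {"P1": ["g1", "g2"], "P2": ["g3", "g4"]}
X = build_features(expr, genes, pathways, drug_descriptor=[0.5, 1.0, 2.0])
print(X.shape)
```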


Subject(s)
Antineoplastic Agents , Melanoma , Antineoplastic Agents/pharmacology , Antineoplastic Agents/therapeutic use , Gene Expression , Humans , Machine Learning , Male , Melanoma/drug therapy , Melanoma/genetics , Sequence Analysis, RNA
11.
Nat Chem Biol ; 18(11): 1204-1213, 2022 11.
Article in English | MEDLINE | ID: mdl-35953549

ABSTRACT

The genome of a eukaryotic cell is often vulnerable to both intrinsic and extrinsic threats owing to its constant exposure to a myriad of heterogeneous compounds. Despite the availability of innate DNA damage responses, some genomic lesions trigger malignant transformation of cells. Accurate prediction of carcinogens is an ever-challenging task owing to the limited information about bona fide (non-)carcinogens. We developed Metabokiller, an ensemble classifier that accurately recognizes carcinogens by quantitatively assessing their electrophilicity and their potential to induce proliferation, oxidative stress, genomic instability, epigenome alterations, and an anti-apoptotic response. In addition to predicting carcinogenicity, Metabokiller is fully interpretable and outperforms existing best-practice methods for carcinogenicity prediction. Metabokiller identified potentially carcinogenic human metabolites. To cross-validate Metabokiller predictions, we performed multiple functional assays using Saccharomyces cerevisiae and human cells with two Metabokiller-flagged human metabolites, namely 4-nitrocatechol and 3,4-dihydroxyphenylacetic acid, and observed high concordance between Metabokiller predictions and experimental validations.
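Metabokiller is described as an ensemble over several carcinogenicity-related properties. A minimal sketch of one possible combination rule, weighted soft voting over per-property probabilities, is shown below; the rule, the 0.5 decision threshold, and all probabilities are assumptions for illustration, not Metabokiller's published method:

```python
def ensemble_score(property_probs, weights=None):
    """Weighted soft vote over per-property carcinogenicity probabilities.

    Returns the combined probability and the property contributing most to it.
    """
    names = list(property_probs)
    if weights is None:
        weights = {n: 1.0 for n in names}
    total = sum(weights[n] for n in names)
    score = sum(weights[n] * property_probs[n] for n in names) / total
    # Largest weighted contribution: a crude, readable explanation of the call
    top = max(names, key=lambda n: weights[n] * property_probs[n])
    return score, top


# Hypothetical outputs of six property-specific classifiers for one molecule
probs = {
    "electrophilicity": 0.9, "proliferation": 0.2, "oxidative_stress": 0.7,
    "genomic_instability": 0.4, "epigenome_alteration": 0.1, "anti_apoptosis": 0.3,
}
score, driver = ensemble_score(probs)
print(round(score, 3), driver, "carcinogen" if score >= 0.5 else "non-carcinogen")
```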


Subject(s)
Artificial Intelligence , Carcinogens , Humans , Carcinogens/toxicity , 3,4-Dihydroxyphenylacetic Acid , Cell Transformation, Neoplastic/genetics , Genomic Instability
12.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-35868454

ABSTRACT

Artificial intelligence (AI)-based computational techniques allow rapid exploration of the chemical space. However, representing compounds as detailed, computation-compatible features is one of the crucial steps in quantitative structure-activity relationship (QSAR) analysis. Recently, graph-based methods have emerged as a powerful alternative to chemistry-restricted fingerprints or descriptors for modeling. Although graph-based modeling offers multiple advantages, its implementation demands in-depth domain knowledge and programming skills. Here we introduce deepGraphh, an end-to-end web service featuring a conglomerate of established graph-based methods for generating classification or regression models. The graphical user interface of deepGraphh offers highly configurable options for model parameter tuning, model generation, cross-validation, and testing of user-supplied query molecules. deepGraphh supports four widely adopted methods for QSAR analysis, namely graph convolution network, graph attention network, directed acyclic graph, and Attentive FP. Comparative analysis revealed that the deepGraphh-supported methods perform comparably to descriptor-based machine learning techniques. Finally, we used deepGraphh models to predict the blood-brain barrier permeability of human and microbiome-generated metabolites. In summary, deepGraphh offers a one-stop web service for graph-based methods in chemoinformatics.
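Graph convolution networks, the first method listed above, treat a molecule as a graph of atoms and bonds. A minimal NumPy sketch of one propagation step using the widely used symmetric-normalized rule ReLU(D^-1/2 (A + I) D^-1/2 H W); this is the generic layer, not deepGraphh's code, and the ethanol graph, one-hot features, and identity weights are illustrative:

```python
import numpy as np


def gcn_layer(adj, features, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])            # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    norm_adj = d_inv_sqrt @ a_hat @ d_inv_sqrt    # symmetric normalization
    return np.maximum(norm_adj @ features @ weight, 0.0)


# Ethanol heavy atoms C-C-O as a path graph; node features are one-hot [is_C, is_O]
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
features = np.array([[1, 0], [1, 0], [0, 1]], dtype=float)
weight = np.eye(2)                                # untrained placeholder weights
hidden = gcn_layer(adj, features, weight)
molecule_vector = hidden.mean(axis=0)             # mean-pool readout for a graph-level model
print(np.round(hidden, 3), np.round(molecule_vector, 3))
```

Each layer mixes an atom's features with those of its bonded neighbors; stacking layers and pooling gives a molecule-level vector for classification or regression.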


Subject(s)
Artificial Intelligence , Quantitative Structure-Activity Relationship , Humans , Machine Learning
13.
J Biol Chem ; 298(8): 102177, 2022 08.
Article in English | MEDLINE | ID: mdl-35753349

ABSTRACT

Cancers are caused by genomic alterations that may be inherited, induced by environmental carcinogens, or caused by random replication errors. After carcinogenesis is initiated, mutations further propagate and drastically alter cancer genomes. Although a subset of driver mutations has been identified and characterized to date, most cancer-related somatic mutations are indistinguishable from germline variants or other noncancerous somatic mutations. This overlap impedes appreciation of many deleterious but previously uncharacterized somatic mutations. The major bottleneck arises from patient-to-patient variability in mutational profiles, which makes it difficult to associate specific mutations with a given disease outcome. Here, we describe a newly developed technique, Continuous Representation of Codon Switches (CRCS), a deep learning-based method that generates numerical vector representations of mutations, thereby enabling numerous machine learning-based tasks. We demonstrate three major applications of CRCS. First, we show how CRCS can help detect cancer-related somatic mutations in the absence of matched normal samples, which has applications in cell-free DNA-based assessment of tumor mutation burden. Second, the proposed approach enables identification and exploration of driver genes; our analyses implicate DMD, RSK4, OFD1, WDR44, and AFF2 as potential cancer drivers. Finally, we used CRCS to score individual mutations in a tumor sample, and these scores were predictive of patient survival in bladder urothelial carcinoma, hepatocellular carcinoma, and lung adenocarcinoma. Taken together, we propose CRCS as a valuable computational tool for analyzing the functional significance of individual cancer mutations.
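Representing mutations as vectors starts from a discrete vocabulary of codon switches. A minimal sketch, assuming each ordered (reference codon, alternative codon) pair is one token mapped into an embedding table; this tokenization scheme is an assumption, and the random embedding matrix is a placeholder for vectors that CRCS would learn:

```python
from itertools import product

import numpy as np

BASES = "TCAG"
CODONS = ["".join(p) for p in product(BASES, repeat=3)]          # 64 codons
# Every ordered pair of distinct codons is one possible switch: 64 * 63 = 4032
SWITCHES = [(ref, alt) for ref in CODONS for alt in CODONS if ref != alt]
SWITCH_INDEX = {s: i for i, s in enumerate(SWITCHES)}


def codon_switch_token(ref_codon, alt_codon):
    """Integer token for a reference->alternative codon switch (direction matters)."""
    ref, alt = ref_codon.upper(), alt_codon.upper()
    if ref == alt:
        raise ValueError("codons are identical, so there is no switch")
    return SWITCH_INDEX[(ref, alt)]


# Placeholder embedding table; a trained model would learn these vectors instead
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(SWITCHES), 8))

token = codon_switch_token("atg", "ata")     # e.g. a Met -> Ile missense change
vector = embeddings[token]
print(token, vector.shape)
```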


Subject(s)
Carcinoma, Transitional Cell , Deep Learning , Neoplasms , Urinary Bladder Neoplasms , Genomics/methods , Humans , Mutation , Neoplasms/genetics
14.
J Matern Fetal Neonatal Med ; 35(25): 5587-5594, 2022 Dec.
Article in English | MEDLINE | ID: mdl-33596762

ABSTRACT

BACKGROUND: Placental pathological changes in hypertensive disorders of pregnancy (HDP) start early in pregnancy, and deep convolutional neural networks (CNNs) can identify these changes before their clinical manifestation. OBJECTIVE: To compare the quantitative placental ultrasound image texture of women with HDP with that of women with a normal outcome. METHODS: Women were enrolled in the first trimester of pregnancy, and good-quality images of the placenta were taken serially in the first, second, and third trimesters. The women were followed until delivery; those with normal outcomes served as controls and those with HDP as cases. The images were processed and classified using validated deep learning tools. RESULTS: A total of 429 women were fully followed up until delivery, of whom 58 (13.5%) had HDP. In the first trimester, there were significant differences between cases and controls in placental length (p = .033), uterine artery PI (p = .019), the biomarkers PAPP-A (p = .001) and PlGF (p = .013), and placental image texture (p = .001). In the second trimester, the uterine artery PI, serum PAPP-A (p = .010), and PlGF (p = .005) levels were significantly low among women who later developed hypertension in pregnancy. The image texture disparity between the two groups was highly significant (p < .001). The model "resnext101_32x8d" had a Cohen's kappa of 0.413 (moderate) and an accuracy of 0.710 (good). In the first trimester, the best sensitivity and specificity were observed for abnormal placental image texture (70.6% and 76.6%, respectively), followed by PlGF (64% and 50%, respectively); in the second trimester, abnormal image texture had the highest sensitivity and specificity (60.4% and 73.3%, respectively), followed by uterine artery PI (58.6% and 54.7%, respectively). Similarly, in the third trimester, uterine artery PI had a sensitivity of 60.3% and a specificity of 50.7%, whereas abnormal image texture had a sensitivity and specificity of 83.5%. CONCLUSION: Ultrasound placental analysis using artificial intelligence (UPAAI) is a promising technique that could open avenues for more research in this field.
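The "(moderate)" label for kappa = 0.413 is consistent with the commonly used Landis and Koch (1977) agreement bands; the abstract does not state which scale was applied, so the mapping below is an assumption:

```python
def landis_koch_label(kappa):
    """Map Cohen's kappa to the Landis and Koch (1977) agreement bands."""
    if kappa < 0:
        return "poor"
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


print(landis_koch_label(0.413))
```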


Subject(s)
Hypertension , Pre-Eclampsia , Female , Pregnancy , Humans , Pregnancy-Associated Plasma Protein-A , Artificial Intelligence , Placenta/diagnostic imaging , Placenta Growth Factor , Uterine Artery , Pregnancy Trimester, First
15.
Article in English | MEDLINE | ID: mdl-32750851

ABSTRACT

Single-cell RNA sequencing has proved advantageous in discerning molecular heterogeneity among seemingly similar cells in a tissue. Owing to the paucity of starting RNA, a large fraction of transcripts fail to amplify during the polymerase chain reaction cycles. This is compounded by trivial biological noise, such as variability in cell cycle-specific genes. As a result, the expression matrix obtained from a single-cell study is highly sparse, with a large number of missing values, which hinders downstream analysis of single-cell expression data. Feature engineering has been observed to significantly improve analysis outcomes. Feature extraction methods such as principal component analysis and zero-inflated factor analysis have been shown to be useful for subsequent steps of data analysis, including clustering. However, little visible effort has gone into developing feature selection techniques, which offer transparency for the analyst. We propose SelfE, a novel l2,0-minimization algorithm that determines an optimal subset of feature vectors that preserves the subspace structures observed in the data. We compared SelfE with the feature selection methods commonly used for single-cell expression data analysis.
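The l2,0 constraint mentioned above limits the number of non-zero rows of a coefficient matrix, so that only a few genes are used to reconstruct all the others. Exact l2,0 minimization is combinatorial; the sketch below swaps in a greedy approximation (simultaneous orthogonal matching pursuit) rather than SelfE's own solver, on hypothetical data:

```python
import numpy as np


def greedy_row_sparse_selection(X, k):
    """Pick k genes (columns of X) whose span best reconstructs every gene.

    Greedy (simultaneous orthogonal matching pursuit) approximation of
    min ||X - X C||_F subject to C having at most k non-zero rows (an l2,0 constraint).
    """
    X = np.asarray(X, dtype=float)                  # cells x genes
    col_norms = np.linalg.norm(X, axis=0) + 1e-12
    selected, residual = [], X.copy()
    for _ in range(k):
        # How strongly each gene explains what is still unexplained, across all genes
        scores = np.linalg.norm(X.T @ residual, axis=1) / col_norms
        scores[selected] = -np.inf                  # never pick the same gene twice
        selected.append(int(np.argmax(scores)))
        basis = X[:, selected]
        coef, *_ = np.linalg.lstsq(basis, X, rcond=None)
        residual = X - basis @ coef
    return selected, residual


# Hypothetical 30 cells x 4 genes lying in a 2-dimensional subspace
rng = np.random.default_rng(0)
b = rng.normal(size=(30, 2))
X = np.column_stack([b[:, 0], b[:, 1], b[:, 0] + b[:, 1], 2 * b[:, 0] - b[:, 1]])
genes, res = greedy_row_sparse_selection(X, k=2)
print(genes, np.linalg.norm(res))
```

Because the four genes span only two dimensions, two well-chosen genes reconstruct the whole matrix, leaving a near-zero residual.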


Subject(s)
Gene Expression Profiling , Single-Cell Analysis , Algorithms , Cluster Analysis , Sequence Analysis, RNA
16.
J Mol Biol ; 433(19): 167179, 2021 09 17.
Article in English | MEDLINE | ID: mdl-34339725

ABSTRACT

Age-dependent dysregulation of the transcription regulatory machinery triggers changes in gene expression levels, leading to a decline in cellular fitness. Tracking these transcripts along the temporal axis in multiple species revealed a spectrum of evolutionarily conserved pathways, such as the electron transport chain, translation regulation, and DNA repair. Recent evidence suggests that aging deteriorates the transcription machinery itself, indicating the hidden complexity of aging transcriptomes. This reinforces the need for novel computational methods to view aging through the lens of transcriptomics. Here, we present the Homeostatic Divergence Score (HDS) to quantify the extent of messenger RNA (mRNA) homeostasis by assessing the balance between the spliced and unspliced mRNA repertoires in single cells. We validated its utility in two independent aging datasets and identified sets of genes undergoing age-related breakdown of transcriptional homeostasis. Moreover, testing our method on subpopulations of human embryonic stem cells revealed a set of differentially processed transcripts segregating these subpopulations. Our preliminary analyses in this direction suggest that the mRNA processing-level information offered by single-cell RNA sequencing (scRNA-seq) data is a better determinant of chronological age than transcriptional noise.
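The abstract does not give the HDS formula. As a hypothetical proxy for the balance between spliced and unspliced mRNA, the sketch below fits a per-gene steady-state ratio (unspliced ≈ gamma × spliced) in reference cells and measures how far other cells depart from it; the reference group, the least-squares fit, and the divergence measure are all assumptions:

```python
import numpy as np


def steady_state_ratio(u_ref, s_ref):
    """Least-squares slope (through the origin) of unspliced vs spliced counts in reference cells."""
    u, s = np.asarray(u_ref, dtype=float), np.asarray(s_ref, dtype=float)
    return float(np.dot(u, s) / np.dot(s, s))


def homeostatic_divergence(u, s, gamma):
    """Mean absolute departure of unspliced counts from the reference ratio u = gamma * s."""
    u, s = np.asarray(u, dtype=float), np.asarray(s, dtype=float)
    return float(np.mean(np.abs(u - gamma * s)))


# Hypothetical per-cell counts for one gene in a reference (e.g. young) group
s_young, u_young = [1, 2, 3, 4], [0.5, 1.0, 1.5, 2.0]
gamma = steady_state_ratio(u_young, s_young)
print(gamma, homeostatic_divergence([1, 2, 3, 4], [1, 2, 3, 4], gamma))
```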


Subject(s)
Aging/genetics , Computational Biology/methods , Gene Expression Profiling/methods , RNA, Messenger/genetics , Cells, Cultured , Embryonic Stem Cells/chemistry , Embryonic Stem Cells/cytology , Gene Expression Regulation , Homeostasis , Humans , RNA Splicing , Sequence Analysis, RNA , Single-Cell Analysis
17.
J Biol Chem ; 297(2): 100956, 2021 08.
Article in English | MEDLINE | ID: mdl-34265305

ABSTRACT

The molecular mechanisms of olfaction, or the sense of smell, are relatively underexplored compared with those of other sensory systems, primarily because of its underlying molecular complexity and the limited availability of dedicated predictive computational tools. Odorant receptors (ORs) allow the detection and discrimination of a myriad of odorant molecules and therefore mediate the first step of the olfactory signaling cascade. To date, odorant (or agonist) information for the majority of these receptors is still unknown, limiting our understanding of their functional relevance in odor-induced behavioral responses. In this study, we introduce OdoriFy, a Web server featuring powerful deep neural network-based prediction engines. OdoriFy enables (1) identification of odorant molecules for wild-type or mutant human ORs (Odor Finder); (2) classification of user-provided chemicals as odorants/nonodorants (Odorant Predictor); (3) identification of responsive ORs for a query odorant (OR Finder); and (4) interaction validation using Odorant-OR Pair Analysis. In addition, OdoriFy provides the rationale behind every prediction it makes by leveraging explainable artificial intelligence. This module highlights the basis of each prediction at atomic resolution for odorants/nonodorants and at the amino acid level for ORs. A key distinguishing feature of OdoriFy is that it is built on a comprehensive repertoire of manually curated information on human ORs and their known agonists and nonagonists, making it a highly interactive and resource-enriched Web server. Moreover, comparative analysis of OdoriFy predictions with an alternative structure-based ligand interaction method revealed comparable results. OdoriFy is freely available as a web service at https://odorify.ahujalab.iiitd.edu.in/olfy/.


Subject(s)
Artificial Intelligence , Odorants , Ligands , Olfactory Receptor Neurons/metabolism , Signal Transduction
18.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34184038

ABSTRACT

Dramatic genomic alterations, either inducible or in a pathological state, dismantle the core regulatory networks, leading to the activation of normally silent genes. Despite their immense therapeutic potential, accurate detection of these transcripts is an ever-challenging task, as it requires prior knowledge of physiological gene expression levels. Here, we introduce EcTracker, an R/Shiny-based single-cell data analysis web server that offers a range of functionalities that collectively enable quantitative and qualitative assessment of bona fide cell-type- or tissue-specific transcripts and, conversely, of ectopically expressed genes in single-cell RNA sequencing datasets. It also supports regulon analysis to identify the key transcription factors regulating user-selected gene signatures. To demonstrate EcTracker's functionality, we reanalyzed the CRISPR interference (CRISPRi) dataset of human embryonic stem cells differentiated into the endoderm lineage and identified the prominent enrichment of a specific gene signature in SMAD2 knockout cells whose identity was ambiguous in the original study. The key distinguishing features of EcTracker lie in its processing speed, availability of multiple add-on modules, interactive graphical user interface, and comprehensiveness. In summary, EcTracker provides an easy-to-use, integrative, end-to-end single-cell data analysis platform that allows decoding of cellular identities and identification of ectopically expressed genes and their regulatory networks, and therefore adds a novel dimension to analyzing single-cell datasets.
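Detecting ectopic expression requires knowing which genes are normally silent in a cell type. A minimal sketch, assuming a reference expression matrix for that cell type and hypothetical thresholds for "silent" and "on" (EcTracker's own criteria are not described here), using made-up counts:

```python
import numpy as np


def flag_ectopic_genes(query_expr, reference_expr, gene_names,
                       on_threshold=1.0, off_threshold=0.1, min_fraction=0.2):
    """Genes silent in the reference cell type but expressed in many query cells.

    query_expr, reference_expr: cells x genes, same gene order as gene_names.
    """
    q = np.asarray(query_expr, dtype=float)
    r = np.asarray(reference_expr, dtype=float)
    silent_in_ref = r.mean(axis=0) <= off_threshold
    frac_expressing = (q >= on_threshold).mean(axis=0)    # fraction of query cells "on"
    mask = silent_in_ref & (frac_expressing >= min_fraction)
    return [g for g, hit in zip(gene_names, mask) if hit]


# Hypothetical normalized counts for three genes
reference = [[5, 0, 0.0], [4, 0, 0.1]]
query = [[5, 2, 0.5], [5, 0, 0.5], [4, 3, 0.2], [6, 0, 0.0]]
print(flag_ectopic_genes(query, reference, ["g1", "g2", "g3"]))
```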


Subject(s)
Computational Biology , Ectopic Gene Expression , RNA-Seq , Single-Cell Analysis , Software , Computational Biology/methods , Databases, Genetic , Gene Expression Profiling , Gene Regulatory Networks , Humans , Organ Specificity , Single-Cell Analysis/methods , Transcription Factors/metabolism , User-Computer Interface , Web Browser
19.
Genome Res ; 31(4): 689-697, 2021 04.
Article in English | MEDLINE | ID: mdl-33674351

ABSTRACT

Systematic delineation of complex biological systems is an ever-challenging and resource-intensive process. Single-cell transcriptomics allows us to study cell-to-cell variability in complex tissues at unprecedented resolution. Accurate modeling of gene expression plays a critical role in the statistical determination of tissue-specific gene expression patterns. In the past few years, considerable efforts have been made to identify appropriate parametric models for single-cell expression data. The zero-inflated versions of the Poisson/negative binomial and log-normal distributions have emerged as the most popular alternatives owing to their ability to accommodate the high dropout rates commonly observed in single-cell data. Whereas most parametric approaches directly model expression estimates, we explore the potential of modeling expression ranks as robust surrogates for transcript abundance. Here we examined the performance of the discrete generalized beta distribution (DGBD) on real data and devised a Wald-type test for comparing gene expression across two phenotypically divergent groups of single cells. We performed a comprehensive assessment of the proposed method to understand its advantages compared with some existing best-practice approaches. We concluded that, besides striking a reasonable balance between Type I and Type II errors, ROSeq, the proposed differential expression test, is exceptionally robust to expression noise and scales rapidly with increasing sample size. For wider dissemination and adoption of the method, we created an R package called ROSeq and made it available on the Bioconductor platform.
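The DGBD models a quantity as a function of its rank r = 1..N as f(r) = A (N + 1 - r)^b / r^a. The sketch below evaluates this distribution and a generic two-sample Wald z-test for one fitted parameter; which parameters ROSeq compares, and how it estimates their standard errors, are not covered here, and the estimates used are hypothetical:

```python
import math


def dgbd_pmf(n_ranks, a, b):
    """Discrete generalized beta distribution over ranks r = 1..N.

    f(r) = A * (N + 1 - r)**b / r**a, with A chosen so the probabilities sum to 1.
    """
    raw = [(n_ranks + 1 - r) ** b / r ** a for r in range(1, n_ranks + 1)]
    total = sum(raw)
    return [x / total for x in raw]


def wald_test(est1, se1, est2, se2):
    """Wald z statistic and two-sided p-value for the difference of two independent estimates."""
    z = (est1 - est2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))      # equals 2 * (1 - Phi(|z|))
    return z, p


pmf = dgbd_pmf(10, a=1.2, b=0.5)
# Hypothetical fitted exponents (with standard errors) for one gene in two cell groups
print(round(sum(pmf), 6), wald_test(2.0, 0.3, 1.0, 0.4))
```

With b = 0 the DGBD reduces to a Zipf-like power law in rank, so the extra exponent lets it bend at the low-abundance tail.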


Subject(s)
Gene Expression Profiling , RNA-Seq , Single-Cell Analysis , Transcriptome