1.
Bioinformatics ; 40(4)2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38608194

ABSTRACT

MOTIVATION: Dysregulation of a gene's function, either due to mutations or impairments in regulatory networks, often triggers pathological states in the affected tissue. Comprehensive mapping of these apparent gene-pathology relationships is an ever-daunting task, primarily due to genetic pleiotropy and lack of suitable computational approaches. With the advent of high-throughput genomics platforms and community-scale initiatives such as the Human Cell Landscape project, researchers have been able to create gene expression portraits of healthy tissues resolved at the level of single cells. However, a similar wealth of knowledge is currently not at our fingertips when it comes to diseases. This is because the genetic manifestation of a disease is often quite diverse and is confounded by several clinical and demographic covariates. RESULTS: To circumvent this, we mined ∼18 million PubMed abstracts published up to May 2019 and automatically selected ∼4.5 million of them that describe roles of particular genes in disease pathogenesis. Further, we fine-tuned the pretrained bidirectional encoder representations from transformers (BERT) for language modeling from the domain of natural language processing to learn vector representations of entities such as genes, diseases, tissues, cell-types, etc., such that their relationships are preserved in a vector space. The repurposed BERT predicted disease-gene associations that are not cited in the training data, thereby highlighting the feasibility of in silico synthesis of hypotheses linking different biological entities such as genes and conditions. AVAILABILITY AND IMPLEMENTATION: PathoBERT pretrained model: https://github.com/Priyadarshini-Rai/Pathomap-Model. BioSentVec-based abstract classification model: https://github.com/Priyadarshini-Rai/Pathomap-Model. Pathomap R package: https://github.com/Priyadarshini-Rai/Pathomap.
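
To illustrate what "relationships preserved in a vector space" can mean in practice, the following minimal sketch ranks genes by cosine similarity to a disease vector. It is not taken from the Pathomap or PathoBERT code: the function names, the entity labels, and the three-dimensional toy vectors are all hypothetical, and cosine similarity is assumed here only as one common way to compare embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_genes(disease_vec, gene_vecs):
    """Return (gene, similarity) pairs sorted from most to least similar."""
    scored = [(gene, cosine(disease_vec, vec)) for gene, vec in gene_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings; real model embeddings would have
# hundreds of dimensions and come from the trained language model.
toy_disease = [1.0, 0.0, 1.0]
toy_genes = {
    "GENE_A": [0.9, 0.1, 0.8],    # points roughly the same way as the disease
    "GENE_B": [0.0, 1.0, 0.0],    # orthogonal to the disease vector
    "GENE_C": [-1.0, 0.0, -1.0],  # points the opposite way
}

ranking = rank_genes(toy_disease, toy_genes)
for gene, score in ranking:
    print(f"{gene}\t{score:.3f}")
```

Under this assumption, a gene whose vector lies close to a disease vector but never co-occurs with it in the training text would be a candidate for a new hypothesis.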


Subject(s)
Data Mining , Humans , Data Mining/methods , Computational Biology/methods , Natural Language Processing
2.
Genome Res ; 33(2): 218-231, 2023 02.
Article in English | MEDLINE | ID: mdl-36653120

ABSTRACT

The true benefits of large single-cell transcriptome and epigenome data sets can be realized only with the development of new approaches and search tools for annotating individual cells. Matching a single-cell epigenome profile to a large pool of reference cells remains a major challenge. Here, we present scEpiSearch, which enables searching, comparison, and independent classification of single-cell open-chromatin profiles against a large reference of single-cell expression and open-chromatin data sets. Across performance benchmarks, scEpiSearch outperformed multiple methods in accuracy of search and low-dimensional coembedding of single-cell profiles, irrespective of platforms and species. Here we also demonstrate the unconventional utilities of scEpiSearch by applying it on single-cell epigenome profiles of K562 cells and samples from patients with acute leukaemia to reveal different aspects of their heterogeneity, multipotent behavior, and dedifferentiated states. Applying scEpiSearch on our single-cell open-chromatin profiles from embryonic stem cells (ESCs), we identified ESC subpopulations with more activity and poising for endoplasmic reticulum stress and unfolded protein response. Thus, scEpiSearch solves the nontrivial problem of amalgamating information from a large pool of single cells to identify and study the regulatory states of cells using their single-cell epigenomes.


Subject(s)
Chromatin , Transcriptome , Humans , Chromatin/metabolism , Epigenome , Embryonic Stem Cells/metabolism , Single-Cell Analysis
3.
Genome Res ; 33(1): 80-95, 2023 01.
Article in English | MEDLINE | ID: mdl-36414416

ABSTRACT

The identification and characterization of circulating tumor cells (CTCs) are important for gaining insights into the biology of metastatic cancers, monitoring disease progression, and medical management of the disease. The limiting factor in the enrichment of purified CTC populations is their sparse availability, heterogeneity, and altered phenotypes relative to the primary tumor. Intensive research on both the technical and molecular fronts has led to the development of assays that ease CTC detection and identification from peripheral blood. Most CTC detection methods based on single-cell RNA sequencing (scRNA-seq) use a mix of size selection, marker-based white blood cell (WBC) depletion, and antibodies targeting tumor-associated antigens. However, the majority of these methods either miss atypical CTCs or suffer from WBC contamination. We present unCTC, an R package for unbiased identification and characterization of CTCs from single-cell transcriptomic data. unCTC features many standard and novel computational and statistical modules for various analyses. These include a novel method of scRNA-seq clustering, named deep dictionary learning using k-means clustering cost (DDLK), expression-based copy number variation (CNV) inference, and combinatorial, marker-based verification of the malignant phenotypes. DDLK enables robust segregation of CTCs and WBCs in the pathway space, as opposed to the gene expression space. We validated the utility of unCTC on scRNA-seq profiles of breast CTCs from six patients, captured and profiled using an integrated ClearCell FX and Polaris workflow that works on the principles of size-based separation of CTCs and marker-based WBC depletion.


Subject(s)
Neoplastic Cells, Circulating , Humans , Neoplastic Cells, Circulating/metabolism , Transcriptome , DNA Copy Number Variations , Gene Expression Profiling , Biomarkers, Tumor
4.
Commun Biol ; 5(1): 1231, 2022 11 12.
Article in English | MEDLINE | ID: mdl-36371461

ABSTRACT

Cell-cell communication and physical interactions play a vital role in cancer initiation, homeostasis, progression, and immune response. Here, we report a system that combines live capture of different cell types, co-incubation, time-lapse imaging, and gene expression profiling of doublets using a microfluidic integrated fluidic circuit that enables measurement of physical distances between cells and the associated transcriptional profiles due to cell-cell interactions. We track the temporal variations in natural killer-triple-negative breast cancer cell distances and compare them with terminal cellular transcriptome profiles. The results show the time-bound activities of regulatory modules and allude to the existence of transcriptional memory. Our experimental and bioinformatic approaches serve as a proof of concept for interrogating live-cell interactions at doublet resolution. Together, our findings highlight the use of our approach across different cancers and cell types.


Subject(s)
Transcriptome , Triple Negative Breast Neoplasms , Humans , Microfluidics , Gene Expression Profiling/methods , Gene Expression Regulation
5.
Nat Commun ; 13(1): 5680, 2022 09 27.
Article in English | MEDLINE | ID: mdl-36167836

ABSTRACT

Inter- and intra-tumoral heterogeneity are major stumbling blocks in the treatment of cancer and are responsible for imparting differential drug responses in cancer patients. Recently, the availability of high-throughput screening datasets has paved the way for machine-learning-based personalized therapy recommendations using the molecular profiles of cancer specimens. In this study, we introduce Precily, a predictive modeling approach to infer treatment response in cancers using gene expression data. In this context, we demonstrate the benefits of considering pathway activity estimates in tandem with drug descriptors as features. We apply Precily to single-cell and bulk RNA sequencing data associated with hundreds of cancer cell lines. We then assess the predictability of treatment outcomes using our in-house prostate cancer cell line and xenograft datasets exposed to differential treatment conditions. Further, we demonstrate the applicability of our approach to patient drug response data from The Cancer Genome Atlas and an independent clinical study describing the treatment journey of three melanoma patients. Our findings highlight the importance of chemo-transcriptomics approaches in cancer treatment selection.


Subject(s)
Antineoplastic Agents , Melanoma , Antineoplastic Agents/pharmacology , Antineoplastic Agents/therapeutic use , Gene Expression , Humans , Machine Learning , Male , Melanoma/drug therapy , Melanoma/genetics , Sequence Analysis, RNA
6.
Brief Bioinform ; 23(4)2022 07 18.
Article in English | MEDLINE | ID: mdl-35772850

ABSTRACT

Finding direct dependencies between genetic pathways and diseases has been the target of multiple studies as it has many applications. However, due to cellular heterogeneity and limitations of the number of samples for bulk expression profiles, such studies have faced hurdles in the past. Here, we propose a method to perform single-cell expression-based inference of association between pathway, disease and cell-type (sci-PDC), which can help to understand their cause and effect and guide precision therapy. Our approach highlighted reliable relationships between a few diseases and pathways. Using the example of diabetes, we have demonstrated how sci-PDC helps in tracking variation of association between pathways and diseases with changes in age and species. The variation in pathways-disease associations in mice and humans revealed critical facts about the suitability of the mouse model for a few pathways in the context of diabetes. The coherence between results from our method and previous reports, including information about the drug target pathways, highlights its reliability for multidimensional utility.


Subject(s)
Disease , Genetic Profile , Animals , Disease/genetics , Humans , Mice
8.
Nucleic Acids Res ; 49(3): e13, 2021 02 22.
Article in English | MEDLINE | ID: mdl-33275158

ABSTRACT

Recent advances in single-cell open-chromatin and transcriptome profiling have created a challenge of exploring novel applications with a meaningful transformation of read-counts, which often have high variability in noise and drop-out among cells. Here, we introduce UniPath, for representing single-cells using pathway and gene-set enrichment scores by a transformation of their open-chromatin or gene-expression profiles. The robust statistical approach of UniPath provides high accuracy, consistency and scalability in estimating gene-set enrichment scores for every cell. Its framework provides an easy solution for handling variability in drop-out rate, which can sometimes create artefacts due to systematic patterns. UniPath provides an alternative approach to dimension reduction of single-cell open-chromatin profiles. UniPath's approach of predicting the temporal order of single-cells using their pathway enrichment scores enables suppression of covariates to achieve the correct order of cells. Analysis of the mouse cell atlas using our approach yielded surprising, albeit biologically meaningful, co-clustering of cell-types from distant organs. By enabling an unconventional method of exploiting pathway co-occurrence to compare two groups of cells, our approach also proves to be useful in inferring context-specific regulations in cancer cells. Available at https://reggenlab.github.io/UniPathWeb/.


Subject(s)
Epigenomics/methods , RNA-Seq/methods , Single-Cell Analysis/methods , Animals , Cell Line, Tumor , Chromatin , Cluster Analysis , Epigenome , Genes , Humans , Mice , Neoplasms/genetics
9.
BMC Genomics ; 21(1): 744, 2020 Oct 27.
Article in English | MEDLINE | ID: mdl-33287695

ABSTRACT

BACKGROUND: Early diagnosis is crucial for effective medical management of cancer patients. Tissue biopsy has been widely used for cancer diagnosis, but its invasive nature limits its application, especially when repeated biopsies are needed. Over the past few years, genomic explorations have led to the discovery of various blood-based biomarkers. Tumor Educated Platelets (TEPs) have, of late, generated considerable interest due to their ability to infer tumor existence and subtype accurately. So far, a majority of the studies involving TEPs have offered marker-panels consisting of several hundreds of genes. Profiling large numbers of genes incurs a significant cost, impeding its diagnostic adoption. As such, it is important to construct minimalistic molecular signatures comprising a small number of genes. RESULTS: To address the aforesaid challenges, we analyzed publicly available TEP expression profiles and identified a panel of 11 platelet-genes that reliably discriminates between cancer and healthy samples. To validate its efficacy, we chose non-small cell lung cancer (NSCLC), the most prevalent type of lung malignancy. When applied to platelet-gene expression data from a published study, our machine learning model could accurately discriminate between non-metastatic NSCLC cases and healthy samples. We further experimentally validated the panel on an in-house cohort of metastatic NSCLC patients and healthy controls via real-time quantitative Polymerase Chain Reaction (RT-qPCR) (AUC = 0.97). Model performance was boosted significantly after artificial data-augmentation using the EigenSample method (AUC = 0.99). Lastly, we demonstrated the cancer-specificity of the proposed gene-panel by benchmarking it on platelet transcriptomes from patients with Myocardial Infarction (MI).
CONCLUSION: We demonstrated an end-to-end bioinformatic plus experimental workflow for identifying a minimal set of TEP associated marker-genes that are predictive of the existence of cancers. We also discussed a strategy for boosting the predictive model performance by artificial augmentation of gene expression data.
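
For readers unfamiliar with the AUC figures quoted above, the sketch below shows one standard way the area under the ROC curve can be computed from classifier scores: as the fraction of (cancer, healthy) pairs in which the cancer sample receives the higher score, with ties counted as half. The function name, labels, and scores are hypothetical toy values, not data or code from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for a positive (e.g. cancer) sample, 0 for a negative one.
    scores: classifier scores, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy scores: three positives, three negatives.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"AUC = {toy_auc:.3f}")
```

An AUC of 1.0 means every positive sample outranks every negative one, and 0.5 corresponds to chance-level ranking.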


Subject(s)
Carcinoma, Non-Small-Cell Lung , Lung Neoplasms , Biomarkers, Tumor/genetics , Blood Platelets , Carcinoma, Non-Small-Cell Lung/diagnosis , Carcinoma, Non-Small-Cell Lung/genetics , Gene Expression Profiling , Humans , Lung Neoplasms/diagnosis , Lung Neoplasms/genetics
10.
BMC Genomics ; 21(1): 877, 2020 Dec 08.
Article in English | MEDLINE | ID: mdl-33292182

ABSTRACT

An amendment to this paper has been published and can be accessed via the original article.
