Search | VHL Regional Portal

Literature mining discerns latent disease-gene relationships.

Rai, Priyadarshini; Jain, Atishay; Kumar, Shivani; Sharma, Divya; Jha, Neha; Chawla, Smriti; Raj, Abhijit; Gupta, Apoorva; Poonia, Sarita; Majumdar, Angshul; Chakraborty, Tanmoy; Ahuja, Gaurav; Sengupta, Debarka.

Bioinformatics ; 40(4), 2024 Mar 29.

Article in English | MEDLINE | ID: mdl-38608194

ABSTRACT

MOTIVATION: Dysregulation of a gene's function, either due to mutations or impairments in regulatory networks, often triggers pathological states in the affected tissue. Comprehensive mapping of these apparent gene-pathology relationships is an ever-daunting task, primarily due to genetic pleiotropy and lack of suitable computational approaches. With the advent of high throughput genomics platforms and community scale initiatives such as the Human Cell Landscape project, researchers have been able to create gene expression portraits of healthy tissues resolved at the level of single cells. However, a similar wealth of knowledge is currently not at our finger-tip when it comes to diseases. This is because the genetic manifestation of a disease is often quite diverse and is confounded by several clinical and demographic covariates. RESULTS: To circumvent this, we mined ~18 million PubMed abstracts published till May 2019 and automatically selected ~4.5 million of them that describe roles of particular genes in disease pathogenesis. Further, we fine-tuned the pretrained bidirectional encoder representations from transformers (BERT) for language modeling from the domain of natural language processing to learn vector representation of entities such as genes, diseases, tissues, cell-types, etc., in a way such that their relationship is preserved in a vector space. The repurposed BERT predicted disease-gene associations that are not cited in the training data, thereby highlighting the feasibility of in silico synthesis of hypotheses linking different biological entities such as genes and conditions. AVAILABILITY AND IMPLEMENTATION: PathoBERT pretrained model: https://github.com/Priyadarshini-Rai/Pathomap-Model. BioSentVec-based abstract classification model: https://github.com/Priyadarshini-Rai/Pathomap-Model. Pathomap R package: https://github.com/Priyadarshini-Rai/Pathomap.

Subject(s)

Data Mining , Humans , Data Mining/methods , Computational Biology/methods , Natural Language Processing

Marker-free characterization of full-length transcriptomes of single live circulating tumor cells.

Poonia, Sarita; Goel, Anurag; Chawla, Smriti; Bhattacharya, Namrata; Rai, Priyadarshini; Lee, Yi Fang; Yap, Yoon Sim; West, Jay; Bhagat, Ali Asgar; Tayal, Juhi; Mehta, Anurag; Ahuja, Gaurav; Majumdar, Angshul; Ramalingam, Naveen; Sengupta, Debarka.

Genome Res ; 33(1): 80-95, 2023 01.

Article in English | MEDLINE | ID: mdl-36414416

ABSTRACT

The identification and characterization of circulating tumor cells (CTCs) are important for gaining insights into the biology of metastatic cancers, monitoring disease progression, and medical management of the disease. The limiting factor in the enrichment of purified CTC populations is their sparse availability, heterogeneity, and altered phenotypes relative to the primary tumor. Intensive research both at the technical and molecular fronts led to the development of assays that ease CTC detection and identification from peripheral blood. Most CTC detection methods based on single-cell RNA sequencing (scRNA-seq) use a mix of size selection, marker-based white blood cell (WBC) depletion, and antibodies targeting tumor-associated antigens. However, the majority of these methods either miss out on atypical CTCs or suffer from WBC contamination. We present unCTC, an R package for unbiased identification and characterization of CTCs from single-cell transcriptomic data. unCTC features many standard and novel computational and statistical modules for various analyses. These include a novel method of scRNA-seq clustering, named deep dictionary learning using k-means clustering cost (DDLK), expression-based copy number variation (CNV) inference, and combinatorial, marker-based verification of the malignant phenotypes. DDLK enables robust segregation of CTCs and WBCs in the pathway space, as opposed to the gene expression space. We validated the utility of unCTC on scRNA-seq profiles of breast CTCs from six patients, captured and profiled using an integrated ClearCell FX and Polaris workflow that works by the principles of size-based separation of CTCs and marker-based WBC depletion.

Subject(s)

Neoplastic Cells, Circulating , Humans , Neoplastic Cells, Circulating/metabolism , Transcriptome , DNA Copy Number Variations , Gene Expression Profiling , Biomarkers, Tumor

Gene expression based inference of cancer drug sensitivity.

Chawla, Smriti; Rockstroh, Anja; Lehman, Melanie; Ratther, Ellca; Jain, Atishay; Anand, Anuneet; Gupta, Apoorva; Bhattacharya, Namrata; Poonia, Sarita; Rai, Priyadarshini; Das, Nirjhar; Majumdar, Angshul; Ahuja, Gaurav; Hollier, Brett G; Nelson, Colleen C; Sengupta, Debarka.

Nat Commun ; 13(1): 5680, 2022 09 27.

Article in English | MEDLINE | ID: mdl-36167836

ABSTRACT

Inter and intra-tumoral heterogeneity are major stumbling blocks in the treatment of cancer and are responsible for imparting differential drug responses in cancer patients. Recently, the availability of high-throughput screening datasets has paved the way for machine learning based personalized therapy recommendations using the molecular profiles of cancer specimens. In this study, we introduce Precily, a predictive modeling approach to infer treatment response in cancers using gene expression data. In this context, we demonstrate the benefits of considering pathway activity estimates in tandem with drug descriptors as features. We apply Precily on single-cell and bulk RNA sequencing data associated with hundreds of cancer cell lines. We then assess the predictability of treatment outcomes using our in-house prostate cancer cell line and xenografts datasets exposed to differential treatment conditions. Further, we demonstrate the applicability of our approach on patient drug response data from The Cancer Genome Atlas and an independent clinical study describing the treatment journey of three melanoma patients. Our findings highlight the importance of chemo-transcriptomics approaches in cancer treatment selection.

Subject(s)

Antineoplastic Agents , Melanoma , Antineoplastic Agents/pharmacology , Antineoplastic Agents/therapeutic use , Gene Expression , Humans , Machine Learning , Male , Melanoma/drug therapy , Melanoma/genetics , Sequence Analysis, RNA

Staging System to Predict the Risk of Relapse in Multiple Myeloma Patients Undergoing Autologous Stem Cell Transplantation.

Goswami, Chitrita; Poonia, Sarita; Kumar, Lalit; Sengupta, Debarka.

Front Oncol ; 9: 633, 2019.

Article in English | MEDLINE | ID: mdl-31355145

ABSTRACT

Over the last decade, autologous stem cell transplantation (ASCT) has emerged as the standard of care in the management of Multiple Myeloma (MM). However, early relapse (within 36 months) after the stem cell rescue remains a significant challenge. For many practical purposes, it is crucial to identify whether a patient undergoing ASCT falls into the high-risk group (likely to relapse within 36 months) or the low-risk one. Our analysis showed that existing MM staging systems (International Staging System or ISS and Durie Salmon Staging or DSS) are not sufficient to discriminate between the risk groups significantly. To address this, we gathered a total of 39 clinical and laboratory parameters of 347 patients from the Department of Medical Oncology of All India Institute of Medical Sciences (AIIMS). We employed a stacked machine learning model consisting of spectral clustering and the Fast and Frugal Tree (FFT) technique to come up with a 3-factor multivariate 2-stage staging scheme, which turns out to be extremely decisive about the outcome of the stem cell rescue. The three factors are: 1. whether the patient has relapsed following remission, 2. response to induction, and 3. pre-transplant Glomerular Filtration Rate (GFR). The resulting model stratifies patients into high-risk and low-risk groups with markedly distinct progression-free survival (median 24 months vs. 91 months) and overall survival (median 51 months vs. 135 months) patterns.
