1.
Bioinformatics ; 40(4)2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38608194

ABSTRACT

MOTIVATION: Dysregulation of a gene's function, either due to mutations or impairments in regulatory networks, often triggers pathological states in the affected tissue. Comprehensive mapping of these apparent gene-pathology relationships is an ever-daunting task, primarily due to genetic pleiotropy and a lack of suitable computational approaches. With the advent of high-throughput genomics platforms and community-scale initiatives such as the Human Cell Landscape project, researchers have been able to create gene expression portraits of healthy tissues resolved at the level of single cells. However, a similar wealth of knowledge is currently not at our fingertips when it comes to diseases. This is because the genetic manifestation of a disease is often quite diverse and is confounded by several clinical and demographic covariates. RESULTS: To circumvent this, we mined ∼18 million PubMed abstracts published up to May 2019 and automatically selected ∼4.5 million of them that describe roles of particular genes in disease pathogenesis. Further, we fine-tuned the pretrained bidirectional encoder representations from transformers (BERT) language model, from the domain of natural language processing, to learn vector representations of entities such as genes, diseases, tissues, cell-types, etc., such that their relationships are preserved in the vector space. The repurposed BERT predicted disease-gene associations that are not cited in the training data, thereby highlighting the feasibility of in silico synthesis of hypotheses linking different biological entities such as genes and conditions. AVAILABILITY AND IMPLEMENTATION: PathoBERT pretrained model: https://github.com/Priyadarshini-Rai/Pathomap-Model. BioSentVec-based abstract classification model: https://github.com/Priyadarshini-Rai/Pathomap-Model. Pathomap R package: https://github.com/Priyadarshini-Rai/Pathomap.
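
The idea of preserving entity relationships in a vector space can be illustrated with a minimal sketch that ranks genes by their similarity to a disease vector. The three-dimensional vectors and the entity names below are made-up placeholders, not outputs of PathoBERT, and cosine similarity is an assumed choice of metric that the abstract does not specify.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_genes(disease_vec, gene_vecs):
    """Return (gene, similarity) pairs sorted from most to least similar."""
    scored = [(gene, cosine_similarity(disease_vec, vec))
              for gene, vec in gene_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings; real embeddings would come from a trained model.
disease_vec = [0.9, 0.1, 0.0]
gene_vecs = {
    "GENE_A": [0.8, 0.2, 0.1],
    "GENE_B": [0.0, 1.0, 0.0],
    "GENE_C": [-0.7, 0.1, 0.3],
}

ranking = rank_genes(disease_vec, gene_vecs)
for gene, score in ranking:
    print(f"{gene}\t{score:.3f}")
```

Under this reading, a high-ranking gene that never co-occurs with the disease in the training abstracts would be a candidate hypothesis, to be checked against the literature and experiments.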


Subject(s)
Data Mining, Humans, Data Mining/methods, Computational Biology/methods, Natural Language Processing
2.
Genome Res ; 33(1): 80-95, 2023 01.
Article in English | MEDLINE | ID: mdl-36414416

ABSTRACT

The identification and characterization of circulating tumor cells (CTCs) are important for gaining insights into the biology of metastatic cancers, monitoring disease progression, and medical management of the disease. The limiting factors in the enrichment of purified CTC populations are their sparse availability, heterogeneity, and altered phenotypes relative to the primary tumor. Intensive research on both the technical and molecular fronts has led to the development of assays that ease CTC detection and identification from peripheral blood. Most CTC detection methods based on single-cell RNA sequencing (scRNA-seq) use a mix of size selection, marker-based white blood cell (WBC) depletion, and antibodies targeting tumor-associated antigens. However, the majority of these methods either miss atypical CTCs or suffer from WBC contamination. We present unCTC, an R package for unbiased identification and characterization of CTCs from single-cell transcriptomic data. unCTC features many standard and novel computational and statistical modules for various analyses. These include a novel method of scRNA-seq clustering, named deep dictionary learning using k-means clustering cost (DDLK), expression-based copy number variation (CNV) inference, and combinatorial, marker-based verification of the malignant phenotypes. DDLK enables robust segregation of CTCs and WBCs in the pathway space, as opposed to the gene expression space. We validated the utility of unCTC on scRNA-seq profiles of breast CTCs from six patients, captured and profiled using an integrated ClearCell FX and Polaris workflow that works on the principles of size-based separation of CTCs and marker-based WBC depletion.
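
Working in the pathway space rather than the gene expression space means that each cell is described by pathway-level scores instead of per-gene values. A minimal sketch follows, assuming the simplest possible scoring (the mean expression of a pathway's member genes). This is not DDLK and not the scoring used in unCTC; the gene names, pathway names, and expression values are hypothetical.

```python
def pathway_scores(cell_expression, pathways):
    """Map one cell's {gene: expression} dict to {pathway: mean member expression}.

    Genes missing from the cell are skipped; a pathway with no measured
    member genes gets a score of 0.0.
    """
    scores = {}
    for name, genes in pathways.items():
        values = [cell_expression[g] for g in genes if g in cell_expression]
        scores[name] = sum(values) / len(values) if values else 0.0
    return scores


# Hypothetical pathway definitions and toy expression values.
pathways = {
    "PATHWAY_EPITHELIAL": ["G1", "G2"],
    "PATHWAY_IMMUNE": ["G3", "G4"],
}
cells = {
    "cell_1": {"G1": 5.0, "G2": 3.0, "G3": 0.0, "G4": 1.0},
    "cell_2": {"G1": 0.0, "G2": 1.0, "G3": 6.0, "G4": 4.0},
}

# Each cell is now a short vector of pathway scores that a clustering
# method could take as input in place of the full gene vector.
pathway_matrix = {cell: pathway_scores(expr, pathways)
                  for cell, expr in cells.items()}
for cell, scores in pathway_matrix.items():
    print(cell, scores)
```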


Subject(s)
Neoplastic Cells, Circulating; Humans; Neoplastic Cells, Circulating/metabolism; Transcriptome; DNA Copy Number Variations; Gene Expression Profiling; Biomarkers, Tumor
3.
Nat Commun ; 13(1): 5680, 2022 09 27.
Article in English | MEDLINE | ID: mdl-36167836

ABSTRACT

Inter- and intra-tumoral heterogeneity are major stumbling blocks in the treatment of cancer and are responsible for differential drug responses in cancer patients. Recently, the availability of high-throughput screening datasets has paved the way for machine learning-based personalized therapy recommendations using the molecular profiles of cancer specimens. In this study, we introduce Precily, a predictive modeling approach to infer treatment response in cancers using gene expression data. In this context, we demonstrate the benefits of considering pathway activity estimates in tandem with drug descriptors as features. We apply Precily to single-cell and bulk RNA sequencing data associated with hundreds of cancer cell lines. We then assess the predictability of treatment outcomes using our in-house prostate cancer cell line and xenograft datasets exposed to differential treatment conditions. Further, we demonstrate the applicability of our approach on patient drug response data from The Cancer Genome Atlas and an independent clinical study describing the treatment journey of three melanoma patients. Our findings highlight the importance of chemo-transcriptomics approaches in cancer treatment selection.
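
Using pathway activity estimates in tandem with drug descriptors amounts to describing each (sample, drug) pair by one combined feature vector. The sketch below shows only that concatenation step, assuming both inputs are already numeric. The feature names and values are hypothetical, and the model that Precily trains on such features is not shown.

```python
def build_feature_vector(pathway_activity, drug_descriptors):
    """Concatenate pathway scores and drug descriptors into one feature vector.

    Keys are sorted within each block so the column order is stable across
    samples. Returns (feature_names, feature_values).
    """
    names, values = [], []
    for prefix, block in (("pathway", pathway_activity), ("drug", drug_descriptors)):
        for key in sorted(block):
            names.append(f"{prefix}:{key}")
            values.append(float(block[key]))
    return names, values


# Hypothetical inputs for one (sample, drug) pair.
sample_pathways = {"P_B": 0.2, "P_A": 1.5}
drug = {"mol_weight": 300.0, "logp": 2.5}

feature_names, feature_values = build_feature_vector(sample_pathways, drug)
print(feature_names)
print(feature_values)
```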


Subject(s)
Antineoplastic Agents, Melanoma, Antineoplastic Agents/pharmacology, Antineoplastic Agents/therapeutic use, Gene Expression, Humans, Machine Learning, Male, Melanoma/drug therapy, Melanoma/genetics, Sequence Analysis, RNA
4.
Article in English | MEDLINE | ID: mdl-32750851

ABSTRACT

Single-cell RNA sequencing has proven advantageous in discerning molecular heterogeneity in seemingly similar cells in a tissue. Due to the paucity of starting RNA, a large fraction of transcripts fail to amplify during the polymerase chain reaction cycle. This is compounded by trivial biological noise such as variability in cell cycle-specific genes. As a result, the expression matrix obtained from a single-cell study is highly sparse, with a large number of missing values. This hinders downstream analysis of single-cell expression data. It has been observed that feature engineering significantly improves the analysis outcomes. Feature extraction methods such as principal component analysis and zero-inflated factor analysis have been shown to be useful for subsequent steps of data analysis, including clustering. However, little or no visible effort has been made to develop feature selection techniques, which offer transparency for the analyst's consumption. We propose SelfE, a novel l2,0-minimization algorithm that determines an optimal subset of feature vectors that preserves sub-space structures as observed in the data. We compared SelfE with the commonly used feature selection methods for single-cell expression data analysis.
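
For readers unfamiliar with the notation, the l2,0 "norm" of a coefficient matrix counts the rows whose l2 norm is nonzero, so bounding it limits how many features are used. The sketch below computes that count and reads off the selected features. It assumes the convention that each row corresponds to one feature, uses a hand-written toy matrix, and does not implement the SelfE optimization itself.

```python
import math


def nonzero_rows(matrix, tol=1e-12):
    """Indices of rows whose Euclidean (l2) norm exceeds tol."""
    return [i for i, row in enumerate(matrix)
            if math.sqrt(sum(x * x for x in row)) > tol]


def l20_norm(matrix, tol=1e-12):
    """l2,0 pseudo-norm: the number of rows with nonzero l2 norm."""
    return len(nonzero_rows(matrix, tol))


# Hypothetical coefficient matrix with one row per candidate feature.
# Rows 1 and 3 are entirely zero, so those features are not selected.
coefficients = [
    [0.5, 0.0, 0.2],
    [0.0, 0.0, 0.0],
    [0.0, 1.0, 0.3],
    [0.0, 0.0, 0.0],
]

selected = nonzero_rows(coefficients)
print("selected feature indices:", selected)
print("l2,0 norm:", l20_norm(coefficients))
```

Because the selected features are original genes rather than derived components, the result stays directly interpretable, which is the transparency argument made in the abstract.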


Subject(s)
Gene Expression Profiling, Single-Cell Analysis, Algorithms, Cluster Analysis, Sequence Analysis, RNA