Results 1 - 20 of 44
1.
Bioinformatics ; 40(4)2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38608194

ABSTRACT

MOTIVATION: Dysregulation of a gene's function, either due to mutations or impairments in regulatory networks, often triggers pathological states in the affected tissue. Comprehensive mapping of these apparent gene-pathology relationships is an ever-daunting task, primarily due to genetic pleiotropy and the lack of suitable computational approaches. With the advent of high-throughput genomics platforms and community-scale initiatives such as the Human Cell Landscape project, researchers have been able to create gene expression portraits of healthy tissues resolved at the level of single cells. However, a similar wealth of knowledge is currently not at our fingertips when it comes to diseases. This is because the genetic manifestation of a disease is often quite diverse and is confounded by several clinical and demographic covariates. RESULTS: To circumvent this, we mined ∼18 million PubMed abstracts published up to May 2019 and automatically selected ∼4.5 million of them that describe roles of particular genes in disease pathogenesis. Further, we fine-tuned the pretrained bidirectional encoder representations from transformers (BERT) model for language modeling, from the domain of natural language processing, to learn vector representations of entities such as genes, diseases, tissues, cell-types, etc., in such a way that their relationships are preserved in a vector space. The repurposed BERT predicted disease-gene associations that are not cited in the training data, thereby highlighting the feasibility of in silico synthesis of hypotheses linking different biological entities such as genes and conditions. AVAILABILITY AND IMPLEMENTATION: PathoBERT pretrained model: https://github.com/Priyadarshini-Rai/Pathomap-Model. BioSentVec-based abstract classification model: https://github.com/Priyadarshini-Rai/Pathomap-Model. Pathomap R package: https://github.com/Priyadarshini-Rai/Pathomap.


Subject(s)
Data Mining , Humans , Data Mining/methods , Computational Biology/methods , Natural Language Processing
2.
IEEE J Biomed Health Inform ; 27(5): 2565-2574, 2023 05.
Article in English | MEDLINE | ID: mdl-37027562

ABSTRACT

Co-administration of two or more drugs can result in adverse drug reactions. Identifying drug-drug interactions (DDIs) is necessary, especially for drug development and for repurposing old drugs. DDI prediction can be viewed as a matrix completion task, for which matrix factorization (MF) appears as a suitable solution. This paper presents a novel Graph Regularized Probabilistic Matrix Factorization (GRPMF) method, which incorporates expert knowledge through a novel graph-based regularization strategy within an MF framework. An efficient and sound optimization algorithm is proposed to solve the resulting non-convex problem in an alternating fashion. The performance of the proposed method is evaluated on the DrugBank dataset, and comparisons are provided against state-of-the-art techniques. The results demonstrate the superior performance of GRPMF when compared to its counterparts.


Subject(s)
Algorithms , Drug-Related Side Effects and Adverse Reactions , Humans , Drug Interactions , Pharmaceutical Preparations
3.
Genome Res ; 33(1): 80-95, 2023 01.
Article in English | MEDLINE | ID: mdl-36414416

ABSTRACT

The identification and characterization of circulating tumor cells (CTCs) are important for gaining insights into the biology of metastatic cancers, monitoring disease progression, and medical management of the disease. The limiting factor in the enrichment of purified CTC populations is their sparse availability, heterogeneity, and altered phenotypes relative to the primary tumor. Intensive research both at the technical and molecular fronts led to the development of assays that ease CTC detection and identification from peripheral blood. Most CTC detection methods based on single-cell RNA sequencing (scRNA-seq) use a mix of size selection, marker-based white blood cell (WBC) depletion, and antibodies targeting tumor-associated antigens. However, the majority of these methods either miss out on atypical CTCs or suffer from WBC contamination. We present unCTC, an R package for unbiased identification and characterization of CTCs from single-cell transcriptomic data. unCTC features many standard and novel computational and statistical modules for various analyses. These include a novel method of scRNA-seq clustering, named deep dictionary learning using k-means clustering cost (DDLK), expression-based copy number variation (CNV) inference, and combinatorial, marker-based verification of the malignant phenotypes. DDLK enables robust segregation of CTCs and WBCs in the pathway space, as opposed to the gene expression space. We validated the utility of unCTC on scRNA-seq profiles of breast CTCs from six patients, captured and profiled using an integrated ClearCell FX and Polaris workflow that works by the principles of size-based separation of CTCs and marker-based WBC depletion.


Subject(s)
Neoplastic Cells, Circulating , Humans , Neoplastic Cells, Circulating/metabolism , Transcriptome , DNA Copy Number Variations , Gene Expression Profiling , Biomarkers, Tumor
4.
Nat Commun ; 13(1): 5680, 2022 09 27.
Article in English | MEDLINE | ID: mdl-36167836

ABSTRACT

Inter- and intra-tumoral heterogeneity are major stumbling blocks in the treatment of cancer and are responsible for imparting differential drug responses in cancer patients. Recently, the availability of high-throughput screening datasets has paved the way for machine learning-based personalized therapy recommendations using the molecular profiles of cancer specimens. In this study, we introduce Precily, a predictive modeling approach to infer treatment response in cancers using gene expression data. In this context, we demonstrate the benefits of considering pathway activity estimates in tandem with drug descriptors as features. We apply Precily to single-cell and bulk RNA sequencing data associated with hundreds of cancer cell lines. We then assess the predictability of treatment outcomes using our in-house prostate cancer cell line and xenograft datasets exposed to differential treatment conditions. Further, we demonstrate the applicability of our approach on patient drug response data from The Cancer Genome Atlas and an independent clinical study describing the treatment journey of three melanoma patients. Our findings highlight the importance of chemo-transcriptomics approaches in cancer treatment selection.


Subject(s)
Antineoplastic Agents , Melanoma , Antineoplastic Agents/pharmacology , Antineoplastic Agents/therapeutic use , Gene Expression , Humans , Machine Learning , Male , Melanoma/drug therapy , Melanoma/genetics , Sequence Analysis, RNA
5.
IEEE/ACM Trans Comput Biol Bioinform ; 19(6): 3332-3339, 2022.
Article in English | MEDLINE | ID: mdl-35816539

ABSTRACT

Investigation of existing drugs is an effective alternative to the discovery of new drugs for treating diseases. This task of drug re-positioning can be assisted by various kinds of computational methods that predict the best indication for a drug given open-source biological datasets. Because similar drugs tend to have common pathways and disease indications, the association matrix is assumed to have a low-rank structure. Hence, the problem of drug-disease association prediction can be modeled as a low-rank matrix completion problem. In this work, we propose a novel matrix completion framework, graph regularized 1-bit matrix completion (GR1BMC), that predicts drug-disease indications using side-information associated with drugs/diseases, modeled as a neighborhood graph. The algorithm is specially designed for binary data and uses a parallel proximal algorithm to solve the resulting minimization problem, taking into account all the constraints, including the incorporation of the neighborhood graph and the restriction of predicted scores to the specified range. The results have been validated on two standard databases by evaluating the AUC across 10-fold cross-validation splits. The usage of the method is also evaluated through a case study where the top 5 indications are predicted for novel drugs and then verified against the CTD database.


Subject(s)
Algorithms , Computational Biology , Computational Biology/methods , Research Design , Databases, Factual , Data Management
6.
J Comput Biol ; 29(5): 441-452, 2022 05.
Article in English | MEDLINE | ID: mdl-35394368

ABSTRACT

This study formulates antiviral repositioning as a matrix completion problem wherein the antiviral drugs are along the rows and the viruses are along the columns. The input matrix is partially filled, with ones in positions where the antiviral drug is known to be effective against a virus. The curated metadata for antivirals (chemical structure and pathways) and viruses (genomic structure and symptoms) are encoded into our matrix completion framework as graph Laplacian regularization. We then frame the resulting multiple graph regularized matrix completion (GRMC) problem as deep matrix factorization. This is solved by using a novel optimization method called HyPALM (Hybrid Proximal Alternating Linearized Minimization). Results on our curated RNA drug-virus association data set show that the proposed approach excels over state-of-the-art GRMC techniques. When applied to in silico prediction of antivirals for COVID-19, our approach returns antivirals that are either used for treating patients or are under trials for the same.
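
As a rough illustration of the formulation described in this abstract, the sketch below completes a small drug-by-virus matrix with graph Laplacian penalties on both sides. It is not the authors' method: it uses a shallow two-factor model fitted by plain gradient descent instead of deep matrix factorization solved with HyPALM, and every name, hyperparameter, and data value in it (`grmf`, `graph_laplacian`, the toy matrices) is a hypothetical choice made for illustration.

```python
import numpy as np

def graph_laplacian(similarity):
    """Unnormalized graph Laplacian L = D - S of a symmetric similarity matrix."""
    return np.diag(similarity.sum(axis=1)) - similarity

def grmf(Y, mask, S_row, S_col, rank=2, lam=0.1, mu=0.1, lr=0.01, iters=2000, seed=0):
    """Fit Y ~ U @ V.T on observed entries (mask == 1) while Laplacian
    penalties pull similar rows/columns toward similar latent factors."""
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((Y.shape[0], rank))
    V = 0.1 * rng.standard_normal((Y.shape[1], rank))
    L_row, L_col = graph_laplacian(S_row), graph_laplacian(S_col)
    for _ in range(iters):
        R = mask * (U @ V.T - Y)                    # residual on observed entries only
        grad_U = R @ V + lam * U + mu * (L_row @ U)
        grad_V = R.T @ U + lam * V + mu * (L_col @ V)
        U -= lr * grad_U
        V -= lr * grad_V
    return U @ V.T

# Toy data (hypothetical): 4 drugs x 3 viruses, 1 = known to be effective.
Y = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 0, 1],
              [0, 0, 1]], dtype=float)
mask = np.ones_like(Y)
mask[1, 1] = 0.0                                    # hide one known association
S_drug = np.array([[0, 1, 0, 0],
                   [1, 0, 0, 0],
                   [0, 0, 0, 1],
                   [0, 0, 1, 0]], dtype=float)      # drugs 0-1 and 2-3 are similar
S_virus = np.array([[0, 1, 0],
                    [1, 0, 0],
                    [0, 0, 0]], dtype=float)        # viruses 0-1 are similar
scores = grmf(Y, mask, S_drug, S_virus)
```

In this toy setting, `scores[1, 1]` is the predicted score for the hidden drug-virus pair; ranking the unobserved entries of a row by score is one simple way to shortlist candidate antivirals for a virus.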


Subject(s)
COVID-19 Drug Treatment , Algorithms , Antiviral Agents/pharmacology , Antiviral Agents/therapeutic use , Humans
7.
Article in English | MEDLINE | ID: mdl-32750851

ABSTRACT

Single-cell RNA sequencing has proven advantageous in discerning molecular heterogeneity in seemingly similar cells in a tissue. Due to the paucity of starting RNA, a large fraction of transcripts fail to amplify during the polymerase chain reaction cycle. This is compounded by trivial biological noise such as variability in cell cycle-specific genes. As a result, the expression matrix obtained from a single-cell study is highly sparse, with a large number of missing values. This hinders downstream analysis of single-cell expression data. It has been observed that feature engineering significantly improves the analysis outcomes. Feature extraction methods such as principal component analysis and zero-inflated factor analysis have been shown to be useful for subsequent steps of data analysis, including clustering. However, little or no visible effort has gone into developing feature selection techniques, which offer transparency to the analyst. We propose SelfE, a novel l2,0-minimization algorithm that determines an optimal subset of feature vectors that preserves sub-space structures as observed in the data. We compared SelfE with the commonly used feature selection methods for single-cell expression data analysis.


Subject(s)
Gene Expression Profiling , Single-Cell Analysis , Algorithms , Cluster Analysis , Sequence Analysis, RNA
8.
Annu Int Conf IEEE Eng Med Biol Soc ; 2021: 886-889, 2021 11.
Article in English | MEDLINE | ID: mdl-34891432

ABSTRACT

The electrocardiogram (ECG) is one of the fundamental markers for detecting different cardiovascular diseases (CVDs). Owing to the widespread availability of ECG sensors (single lead) as well as smartwatches with ECG recording capability, ECG classification using wearable devices to detect different CVDs has become a basic requirement for a smart healthcare ecosystem. In this paper, we propose a novel method of model compression with robust detection capability for CVDs from ECG signals, such that a sophisticated and effective baseline deep neural network model can be optimized for the resource-constrained micro-controller platforms suitable for wearable devices while minimizing the performance loss. We employ a knowledge distillation-based model compression approach in which the baseline (teacher) deep neural network model is compressed to a TinyML (student) model using piecewise linear approximation. Our proposed ECG TinyML achieves a ~156x compression factor to meet the requirement of 100 KB memory availability for model deployment on wearable devices. The proposed model requires an estimated ~5782 times less computational load than the state-of-the-art residual neural network (ResNet) model, with negligible performance loss (less than 1% loss in test accuracy, test sensitivity, test precision and test F1-score). We further believe that ECG TinyML, with its small model footprint (62.3 KB), can be suitably deployed in implantable devices, including implantable loop recorders (ILRs).
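
For readers unfamiliar with the teacher-student setup mentioned in this abstract, the following is a minimal sketch of a commonly used knowledge distillation loss: a temperature-softened KL term between teacher and student outputs, blended with the usual cross-entropy on hard labels. The abstract does not state which loss the authors used, and the piecewise linear approximation step is not shown; the function names, the temperature, and the weighting `alpha` are assumptions for illustration.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Row-wise softmax of logits divided by a temperature."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)       # stabilize the exponentials
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, temperature=4.0, alpha=0.5):
    """alpha * T^2 * KL(teacher || student) on softened outputs
    plus (1 - alpha) * cross-entropy of the student on hard labels."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)), axis=-1).mean()
    hard = softmax(student_logits)
    ce = -np.log(hard[np.arange(len(labels)), labels]).mean()
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce
```

The `temperature ** 2` factor keeps the gradient scale of the soft term comparable across temperatures; with `alpha=1.0` the loss vanishes exactly when the student reproduces the teacher's logits.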


Subject(s)
Cardiovascular Diseases , Data Compression , Wearable Electronic Devices , Ecosystem , Electrocardiography , Humans
9.
Sci Rep ; 11(1): 9047, 2021 04 27.
Article in English | MEDLINE | ID: mdl-33907209

ABSTRACT

The year 2020 witnessed a heavy death toll due to COVID-19, calling for a global emergency. Continuous research and clinical trials paved the way for vaccines. However, the long-run efficacy of vaccines is still questionable because the coronavirus mutates, which makes drug re-positioning a reasonable alternative. COVID-19 has hence accelerated drug re-positioning efforts for treating the disease and its symptoms. This work builds computational models using matrix completion techniques to predict drug-virus association for drug re-positioning. The aim is to assist clinicians with a tool for selecting prospective antiviral treatments. Since the virus is known to mutate fast, the tool is likely to help clinicians in selecting the right set of antivirals for the mutated isolate. The main contribution of this work is a manually curated, publicly shared database comprising existing associations between viruses and their corresponding antivirals. The database gathers similarity information using the chemical structure of drugs and the genomic structure of viruses. Along with this database, we make available a set of state-of-the-art computational drug re-positioning tools based on matrix completion. The tools are first analysed on a standard set of experimental protocols for drug-target interactions. The best performing ones are applied to the task of re-positioning antivirals for COVID-19. These tools select six drugs, of which four are currently under various stages of trial, namely Remdesivir (as a cure), Ribavirin (in combination with others for cure), Umifenovir (as a prophylactic and cure) and Sofosbuvir (as a cure). Another unanimous prediction is Tenofovir alafenamide, a novel Tenofovir prodrug developed to improve renal safety compared with its original counterpart (older version), Tenofovir disoproxil. Both are under trial, the former as a cure and the latter as a prophylactic.
These results establish that the computational methods are in sync with the state of practice. We also demonstrate how the drugs to be used against the virus would vary as SARS-CoV-2 mutates over time by predicting the drugs for the mutated strains, suggesting the importance of such a tool in drug prediction. We believe this work would open up possibilities for applying machine learning models to clinical research for drug-virus association prediction and other similar biological problems.


Subject(s)
Antiviral Agents/therapeutic use , COVID-19 Drug Treatment , Algorithms , Area Under Curve , COVID-19/virology , Databases, Factual , Drug Repositioning , Evolution, Molecular , Humans , Mutation , ROC Curve , SARS-CoV-2/genetics , SARS-CoV-2/isolation & purification
10.
Neural Netw ; 128: 248-253, 2020 Aug.
Article in English | MEDLINE | ID: mdl-32454369

ABSTRACT

Transform learning is a new representation learning framework where we learn an operator/transform that analyses the data to generate the coefficient/representation. We propose a variant of it called the graph transform learning; in this we explicitly account for the correlation in the dataset in terms of graph Laplacian. We will give two variants; in the first one the graph is computed from the data and fixed during the operation. In the second, the graph is learnt iteratively from the data during operation. The first technique will be applied for clustering, and the second one for solving inverse problems.


Subject(s)
Magnetic Resonance Imaging/methods , Unsupervised Machine Learning , Algorithms , Cluster Analysis , Humans , Magnetic Resonance Imaging/trends , Problem Solving , Unsupervised Machine Learning/trends
11.
PLoS One ; 15(1): e0226484, 2020.
Article in English | MEDLINE | ID: mdl-31945078

ABSTRACT

The identification of potential interactions between drugs and target proteins is crucial in pharmaceutical sciences. The experimental validation of interactions in genomic drug discovery is laborious and expensive; hence, there is a need for efficient and accurate in-silico techniques which can predict potential drug-target interactions to narrow down the search space for experimental verification. In this work, we propose a new framework, namely, Multi-Graph Regularized Nuclear Norm Minimization, which predicts the interactions between drugs and target proteins from three inputs: the known drug-target interaction network, similarities over drugs, and similarities over targets. The proposed method focuses on finding a low-rank interaction matrix that is structured by the proximities of drugs and targets encoded by graphs. Previous works on Drug Target Interaction (DTI) prediction have shown that incorporating drug and target similarities helps in learning the data manifold better by preserving the local geometries of the original data. However, there is no clear consensus on which kind and what combination of similarities would best assist the prediction task. Hence, we propose to use multiple drug-drug similarities and target-target similarities as multiple graph Laplacian (over drugs/targets) regularization terms to capture the proximities exhaustively. Extensive cross-validation experiments on four benchmark datasets using standard evaluation metrics (AUPR and AUC) show that the proposed algorithm improves the predictive performance and outperforms recent state-of-the-art computational methods by a large margin. Software is publicly available at https://github.com/aanchalMongia/MGRNNMforDTI.


Subject(s)
Algorithms , Computer Graphics , Drug Development/methods , Drug Discovery/methods , Drug Interactions , Pharmaceutical Preparations/metabolism , Proteins/metabolism , Computer Simulation , Humans , Pharmaceutical Preparations/chemistry , Proteins/chemistry
12.
NAR Genom Bioinform ; 2(4): lqaa091, 2020 Dec.
Article in English | MEDLINE | ID: mdl-33575635

ABSTRACT

The advent of single-cell open-chromatin profiling technology has facilitated the analysis of heterogeneity in the activity of regulatory regions at single-cell resolution. However, stochasticity and the low amount of available relevant DNA cause high drop-out rates and noise in single-cell open-chromatin profiles. We introduce here a robust method called forest of imputation trees (FITs) to recover original signals from highly sparse and noisy single-cell open-chromatin profiles. FITs builds multiple imputation trees to avoid bias during the restoration of read-count matrices. It resolves the challenging issue of recovering open-chromatin signals without blurring out information at genomic sites with cell-type-specific activity. Besides visualization and classification, FITs-based imputation also improved accuracy in the detection of enhancers, the calculation of pathway enrichment scores, and the prediction of chromatin interactions. FITs is generalized for wider applicability, especially for highly sparse read-count matrices. The superiority of FITs in recovering signals of minority cells also makes it highly useful for single-cell open-chromatin profiles from in vivo samples. The software is freely available at https://reggenlab.github.io/FITs/.

13.
J Comput Biol ; 27(7): 1011-1019, 2020 07.
Article in English | MEDLINE | ID: mdl-31657645

ABSTRACT

Single-cell RNA-seq has inspired new discoveries and innovation in the field of developmental and cell biology over the past few years and is useful for studying cellular responses at individual cell resolution. However, due to the paucity of starting RNA, the data acquired have dropouts. To address this, we propose a deep matrix factorization-based method, deepMc, to impute missing values in gene expression data. For the deep architecture of our approach, we draw our motivation from the great success of deep learning in solving various machine learning problems. In this study, we support our method with positive results on several evaluation metrics such as clustering of cell populations, differential expression analysis, and cell type separability.


Subject(s)
Computational Biology/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Animals , Blastocyst/cytology , Deep Learning , HEK293 Cells , Humans , Jurkat Cells , Mice , Sequence Analysis, RNA/statistics & numerical data , Single-Cell Analysis/statistics & numerical data
14.
Neural Netw ; 118: 271-279, 2019 Oct.
Article in English | MEDLINE | ID: mdl-31326661

ABSTRACT

Recurrent neural networks (RNNs) model time series by feeding back the representation from the previous time instant as an input for the current instant, along with exogenous inputs. RNNs have two main shortcomings: (1) the problem of vanishing gradients while backpropagating through time, and (2) the inability to learn in an unsupervised manner. Variants like the long short-term memory (LSTM) network and gated recurrent units (GRU) have partially circumvented the first issue; the second issue still remains. In this work we propose a new variant of RNN based on the transform learning model, named recurrent transform learning (RTL). It can learn in an unsupervised, supervised and semi-supervised fashion; it does not require backpropagation and hence does not suffer from the pitfalls of vanishing gradients. The proposed model is applied to a real-life example of short-term load forecasting, where we show that RTL improves over existing variants of RNN as well as over a state-of-the-art technique in load forecasting based on sparse coding.
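
The recurrence and the vanishing-gradient issue that this abstract refers to can be sketched with a plain tanh RNN. This illustrates the conventional RNN being critiqued, not recurrent transform learning itself; the function names and the tanh nonlinearity are assumptions for illustration.

```python
import numpy as np

def rnn_forward(x_seq, W_h, W_x, b):
    """Plain recurrence: h_t = tanh(W_h h_{t-1} + W_x x_t + b), with h_0 = 0."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x_t in x_seq:
        h = np.tanh(W_h @ h + W_x @ x_t + b)    # previous state fed back with the new input
        states.append(h)
    return np.stack(states)

def state_jacobian_norm(states, W_h):
    """Spectral norm of d h_T / d h_1, the product of per-step Jacobians
    diag(1 - h_t^2) @ W_h that backpropagation through time multiplies."""
    J = np.eye(W_h.shape[0])
    for h in states[1:]:
        J = (np.diag(1.0 - h ** 2) @ W_h) @ J
    return np.linalg.norm(J, 2)
```

Because each per-step Jacobian has norm at most the norm of `W_h`, the product shrinks geometrically with sequence length whenever that norm is below one, which is the vanishing-gradient effect in its simplest form.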


Subject(s)
Machine Learning , Neural Networks, Computer , Forecasting
15.
Front Genet ; 10: 9, 2019.
Article in English | MEDLINE | ID: mdl-30761179

ABSTRACT

Motivation: Single-cell RNA sequencing has proven revolutionary for its potential to zoom into complex biological systems. Genome-wide expression analysis at single-cell resolution provides a window into the dynamics of cellular phenotypes. This facilitates the characterization of transcriptional heterogeneity in normal and diseased tissues under various conditions. It also sheds light on the development or emergence of specific cell populations and phenotypes. However, owing to the paucity of input RNA, a typical single-cell RNA sequencing dataset features a high number of dropout events where transcripts fail to get amplified. Results: We introduce mcImpute, a low-rank matrix completion-based technique to impute dropouts in single-cell expression data. On a number of real datasets, application of mcImpute yields significant improvements in the separation of true zeros from dropouts, cell-clustering, differential expression analysis, cell type separability, the performance of dimensionality reduction techniques for cell visualization, and gene distribution. Availability and Implementation: https://github.com/aanchalMongia/McImpute_scRNAseq.
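
To make the idea of low-rank matrix completion for imputation concrete, here is a minimal sketch based on iterative soft-thresholding of singular values (often called soft-impute). It is a generic stand-in rather than the mcImpute implementation; the function name, the threshold `lam`, and the synthetic rank-1 data are assumptions for illustration, as is the choice of which entries count as unobserved.

```python
import numpy as np

def soft_impute(Y, observed, lam=0.1, iters=200):
    """Low-rank completion: keep measured entries, fill the rest from a
    singular-value-thresholded estimate, and repeat."""
    X = np.zeros_like(Y, dtype=float)
    for _ in range(iters):
        Z = np.where(observed, Y, X)                  # measured values stay fixed
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        X = (U * np.maximum(s - lam, 0.0)) @ Vt       # shrink singular values toward low rank
    return X

# Synthetic example (hypothetical data): a rank-1 "expression" matrix with ~30% of entries hidden.
rng = np.random.default_rng(0)
M = np.outer(rng.uniform(1, 2, 20), rng.uniform(1, 2, 15))
observed = rng.random(M.shape) < 0.7
X_hat = soft_impute(M, observed)
```

In real single-cell data the set of dropouts is not known in advance, so any rule for marking entries as unobserved (for example, treating all zero counts as missing) is a modeling decision that should be checked against the method's own documentation.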

16.
Sci Rep ; 8(1): 16329, 2018 11 05.
Article in English | MEDLINE | ID: mdl-30397240

ABSTRACT

The emergence of single-cell RNA sequencing (scRNA-seq) technologies has enabled us to measure the expression levels of thousands of genes at single-cell resolution. However, insufficient quantities of starting RNA in the individual cells cause significant dropout events, introducing a large number of zero counts in the expression matrix. To circumvent this, we developed an autoencoder-based sparse gene expression matrix imputation method, AutoImpute, which learns the inherent distribution of the input scRNA-seq data and imputes the missing values accordingly with minimal modification to the biologically silent genes. When tested on real scRNA-seq datasets, AutoImpute performed competitively with respect to existing single-cell imputation methods on the grounds of expression recovery from subsampled data, cell-clustering accuracy, variance stabilization and cell-type separability.


Subject(s)
Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Analysis of Variance , Automation , Cluster Analysis , Gene Expression Profiling
17.
Neural Netw ; 106: 271-280, 2018 Oct.
Article in English | MEDLINE | ID: mdl-30099322

ABSTRACT

In this work, we introduce the graph regularized autoencoder. We propose three variants. The first one is the unsupervised version. The second one is tailored for clustering, by incorporating subspace clustering terms into the autoencoder formulation. The third is a supervised label-consistent autoencoder suitable for single-label and multi-label classification problems. Each of these has been compared with the state-of-the-art on benchmark datasets. The problems addressed here are image denoising, clustering and classification. Our proposed methods outperform the existing techniques in all of the problems.


Subject(s)
Pattern Recognition, Automated/methods , Photic Stimulation/methods , Algorithms , Cluster Analysis , Humans
18.
Article in English | MEDLINE | ID: mdl-29994276

ABSTRACT

The term "blind denoising" refers to the fact that the basis used for denoising is learned from the noisy sample itself during denoising. Dictionary learning- and transform learning-based formulations for blind denoising are well known. However, there has been no autoencoder-based solution for the blind denoising approach. So far, autoencoder-based denoising formulations have learned the model on separate training data and have used the learned model to denoise test samples. Such a methodology fails when the test image (to be denoised) is not of the same kind as the images the model was trained on. This is the first work in which the autoencoder is learned from the noisy sample while denoising. Experimental results show that our proposed method performs better than dictionary learning (K-singular value decomposition), transform learning, sparse stacked denoising autoencoder, and the gold standard BM3D algorithm.

19.
Magn Reson Imaging ; 52: 62-68, 2018 10.
Article in English | MEDLINE | ID: mdl-29883751

ABSTRACT

This work proposes a new formulation for image reconstruction based on the autoencoder framework. The work follows the adaptive approach used in prior dictionary and transform learning-based reconstruction techniques. Existing autoencoder-based reconstructions are non-adaptive; they are trained on a separate training set and applied to another. In this work, the autoencoder is learnt from the patches of the image it is reconstructing. Experimental studies on MRI reconstruction show that the proposed method outperforms state-of-the-art methods in dictionary learning, transform learning and (non-adaptive) autoencoder-based approaches.


Subject(s)
Image Processing, Computer-Assisted/methods , Magnetic Resonance Imaging/methods , Spinal Cord/diagnostic imaging , Algorithms , Animals , Data Compression/methods , Female , Models, Animal , Rats , Rats, Sprague-Dawley
20.
Magn Reson Imaging ; 45: 105-112, 2018 01.
Article in English | MEDLINE | ID: mdl-28964843

ABSTRACT

In multi-echo imaging, multiple T1/T2 weighted images of the same cross section are acquired. Acquiring multiple scans is time consuming. To accelerate acquisition, compressed sensing-based techniques have been proposed. In recent times, it has been observed in several areas of traditional compressed sensing that, instead of using a fixed basis (wavelet, DCT etc.), considerably better results can be achieved by learning the basis adaptively from the data. Motivated by these studies, we propose to employ such adaptive learning techniques to improve reconstruction of multi-echo scans. This work is based on two basis learning models: synthesis (better known as dictionary learning) and analysis (known as transform learning). We modify these basic methods by incorporating the structure of the multi-echo scans. Our work shows that we can indeed significantly improve multi-echo imaging over compressed sensing-based techniques and other unstructured adaptive sparse recovery methods.


Subject(s)
Image Interpretation, Computer-Assisted , Image Processing, Computer-Assisted/methods , Magnetic Resonance Imaging/methods , Spine/anatomy & histology , Algorithms , Animals , Female , Humans , Rats , Rats, Sprague-Dawley , Signal-To-Noise Ratio