Results 1 - 5 of 5
1.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38980373

ABSTRACT

Inferring gene regulatory networks (GRNs) allows us to obtain a deeper understanding of cellular function and disease pathogenesis. Recent advances in single-cell RNA sequencing (scRNA-seq) technology have improved the accuracy of GRN inference. However, many methods for inferring individual GRNs from scRNA-seq data are limited because they overlook intercellular heterogeneity and similarities between different cell subpopulations, which are often present in the data. Here, we propose a deep learning-based framework, DeepGRNCS, for jointly inferring GRNs across cell subpopulations. We follow the commonly accepted hypothesis that the expression of a target gene can be predicted based on the expression of transcription factors (TFs) due to underlying regulatory relationships. We initially processed scRNA-seq data by discretizing data scattering using the equal-width method. Then, we trained deep learning models to predict target gene expression from TFs. By individually removing each TF from the expression matrix, we used pre-trained deep model predictions to infer regulatory relationships between TFs and genes, thereby constructing the GRN. Our method outperforms existing GRN inference methods for various simulated and real scRNA-seq datasets. Finally, we applied DeepGRNCS to non-small cell lung cancer scRNA-seq data to identify key genes in each cell subpopulation and analyzed their biological relevance. In conclusion, DeepGRNCS effectively predicts cell subpopulation-specific GRNs. The source code is available at https://github.com/Nastume777/DeepGRNCS.
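The two steps named in the abstract, equal-width discretization and scoring each TF by removing it from the input, can be illustrated with a minimal sketch. This is not the DeepGRNCS implementation: the function names `equal_width_bins` and `ablation_scores`, the choice to "remove" a TF by setting its column to zero, the use of mean squared error, and the toy predictor are all assumptions made for illustration.

```python
def equal_width_bins(values, n_bins):
    """Map each value to a bin index in [0, n_bins - 1] using equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: everything falls in the first bin.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def ablation_scores(predict, tf_matrix, target):
    """Score each TF by the increase in prediction error when its column is zeroed.

    `predict` is any already-trained model exposed as a function from one row of
    TF expression values to a predicted target value (hypothetical interface).
    Zeroing a column is an assumed stand-in for "removing" the TF.
    """
    def mse(rows):
        return sum((predict(r) - t) ** 2 for r, t in zip(rows, target)) / len(target)

    baseline = mse(tf_matrix)
    scores = []
    for j in range(len(tf_matrix[0])):
        ablated = [row[:j] + [0] + row[j + 1:] for row in tf_matrix]
        scores.append(mse(ablated) - baseline)
    return scores


# Toy data: the target depends only on the first TF.
bins = equal_width_bins([0.0, 1.0, 2.5, 5.0, 10.0], n_bins=4)
tf_matrix = [[1, 5], [2, 3], [3, 1]]
target = [2, 4, 6]
scores = ablation_scores(lambda row: 2 * row[0], tf_matrix, target)
```

In this toy case the first TF receives a positive score and the second a score of zero, so only the first would be kept as a candidate regulator of the target. A real pipeline would replace the lambda with a trained model and rank TF-gene pairs by score.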


Subject(s)
Deep Learning , Gene Regulatory Networks , Single-Cell Analysis , Humans , Single-Cell Analysis/methods , Transcription Factors/genetics , Transcription Factors/metabolism , Computational Biology/methods , Sequence Analysis, RNA/methods , RNA-Seq/methods
2.
Front Pharmacol ; 13: 936758, 2022.
Article in English | MEDLINE | ID: mdl-36081949

ABSTRACT

Lung cancer is the leading cause of cancer deaths globally, and lung adenocarcinoma (LUAD) is the most common type of lung cancer. Gene dysregulation plays an essential role in the development of LUAD. Drug repositioning based on associations between drug target genes and LUAD target genes is useful for discovering potential new drugs for the treatment of LUAD, while also reducing the monetary and time costs of new drug discovery and development. Here, we developed a machine learning pipeline to predict potential LUAD-related target genes through established graph attention networks (GATs). We then predicted potential drugs for the treatment of LUAD through gene coincidence-based and gene network distance-based methods. Using data from 535 LUAD tissue samples and 59 precancerous tissue samples from The Cancer Genome Atlas, 48,597 genes were identified and used to build the GAT prediction model. The GAT model achieved good predictive performance, with an area under the receiver operating characteristic curve of 0.90. A total of 1,597 potential LUAD-related genes were identified from the GAT model. These LUAD-related genes were then used for drug repositioning. The gene overlap and network distance with the target genes were calculated for 3,070 drugs and 672 preclinical compounds approved by the US Food and Drug Administration. Among these, bromoethylamine was predicted as a novel potential preclinical compound for the treatment of LUAD, and cimetidine and benzbromarone were predicted as potential therapeutic drugs for LUAD. The pipeline established in this study presents a new approach for developing targeted therapies for LUAD.

3.
Brief Bioinform ; 23(2)2022 03 10.
Article in English | MEDLINE | ID: mdl-35021191

ABSTRACT

Networks consisting of molecular interactions are intrinsically dynamical systems of an organism. The interactions curated in molecular interaction databases are still incomplete and contain false positives introduced by high-throughput screening experiments. In this study, we propose a framework to integrate interactions of functionally associated protein-coding genes from 31 data sources to reconstruct a network with high coverage and quality. For each interaction, 369 features were constructed, including properties of both the interaction and the involved genes. The training and validation sets were built from pathway interactions as positives and from potential negative instances produced by our proposed semi-supervised strategy. A random forest classifier was then trained and applied multiple times to give a score for each interaction. After setting a threshold estimated by a binomial distribution, a Human protein-coding Gene Functional Association Network (HuGFAN) was reconstructed with 20 383 genes and 1 185 429 high-confidence interactions. HuGFAN was then compared with other networks from the data sources with respect to network properties, suggesting that HuGFAN is more function- and pathway-related. Finally, HuGFAN was applied to identify cancer drivers through two well-known network-based methods (DriverNet and HotNet2) to show its performance advantage over other networks. HuGFAN and other supplementary files are freely available at https://github.com/xthuang226/HuGFAN.


Subject(s)
Gene Regulatory Networks , Machine Learning , Databases, Factual , Humans
4.
Bioinformatics ; 33(10): 1528-1535, 2017 May 15.
Article in English | MEDLINE | ID: mdl-28011782

ABSTRACT

MOTIVATION: Cell fate specification plays a key role in generating distinct cell types during metazoan development. However, most of the underlying signaling networks at the cellular level are not well understood. The availability of time-lapse single-cell gene expression data collected throughout Caenorhabditis elegans embryogenesis provides an excellent opportunity for investigating signaling networks underlying cell fate specification at the systems, cellular and molecular levels. RESULTS: We propose a framework to infer signaling networks at the cellular level by exploring the single-cell gene expression data. By analyzing the expression data of nhr-25, a hypodermis-specific transcription factor, in every cell of both wild-type and mutant C. elegans embryos through RNAi against 55 genes, we have inferred a total of 23 genes that regulate (activate or inhibit) nhr-25 expression in a cell-specific fashion. We also infer the signaling pathways consisting of each of these genes and nhr-25 based on a probabilistic graphical model for the five selected founder cells, 'ABarp', 'ABpla', 'ABpra', 'Caa' and 'Cpa', which express nhr-25 and mostly develop into hypodermis. By integrating the inferred pathways, we reconstruct five signaling networks, one for each of the five founder cells. Using RNAi gene knockdown as a validation method, we show that the inferred networks are able to predict the effects of the knocked-down genes. These signaling networks in the five founder cells are likely to ensure faithful hypodermis cell fate specification in C. elegans at the cellular level. AVAILABILITY AND IMPLEMENTATION: All source code and data are available at the GitHub repository https://github.com/xthuang226/Worm_Single_Cell_Data_and_Codes.git . CONTACT: zhuyuan@cug.edu.cn. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Caenorhabditis elegans/growth & development , Embryonic Development/genetics , Gene Expression Profiling/methods , Gene Expression Regulation, Developmental , Signal Transduction/genetics , Single-Cell Analysis/methods , Animals , Caenorhabditis elegans/genetics , Caenorhabditis elegans/metabolism , Caenorhabditis elegans Proteins/genetics , Caenorhabditis elegans Proteins/metabolism , Cell Differentiation/genetics , DNA-Binding Proteins/genetics , RNA Interference , Transcription Factors/genetics
5.
Mol Biosyst ; 12(1): 85-92, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26555698

ABSTRACT

In Caenorhabditis elegans, a large number of protein-protein interactions (PPIs) have been identified by different experiments. However, a comprehensive weighted PPI network, which is essential for signaling pathway inference, is not yet available in this model organism. Therefore, we first construct an integrative PPI network in C. elegans with 12,951 interactions involving 5039 proteins from seven molecular interaction databases. Then, a reliability score based on a probabilistic graphical model (RSPGM) is proposed to assess PPIs. It assumes that the random number of interactions between two proteins comes from the Bernoulli distribution to avoid multi-links. The main parameter of the RSPGM score contains a few latent variables which can be considered as several common properties between two proteins. Validations on high-confidence yeast datasets show that RSPGM provides more accurate evaluation than other approaches, and the PPIs in the reconstructed PPI network have higher biological relevance than those in the original network in terms of gene ontology, gene expression, essentiality and the prediction of known protein complexes. Furthermore, this weighted integrative PPI network in C. elegans is also employed to infer the interaction path of the canonical Wnt/β-catenin pathway. Most genes on the inferred interaction path have been validated to be Wnt pathway components. Therefore, RSPGM is essential and effective for evaluating PPIs and inferring interaction paths. Finally, the PPI network with RSPGM scores can be queried and visualized on a user interactive website, which is freely available at .


Subject(s)
Caenorhabditis elegans Proteins/metabolism , Computational Biology/methods , Models, Biological , Models, Statistical , Protein Interaction Mapping , Protein Interaction Maps , Algorithms , Animals , Databases, Protein , Protein Interaction Mapping/methods , Reproducibility of Results , Web Browser