Results 1 - 20 of 30
1.
Comput Struct Biotechnol J ; 21: 4187-4195, 2023.
Article in English | MEDLINE | ID: mdl-37680266

ABSTRACT

Motivation: Lead identification is a fundamental step to prioritize candidate compounds for downstream drug discovery process. Machine learning (ML) and deep learning (DL) approaches are widely used to identify lead compounds using both chemical property and experimental information. However, ML or DL methods rarely consider compound similarity information directly since ML and DL models use abstract representation of molecules for model construction. Alternatively, data mining approaches are also used to explore chemical space with drug candidates by screening undesirable compounds. A major challenge for data mining approaches is to develop efficient data mining methods that search large chemical space for desirable lead compounds with low false positive rate. Results: In this work, we developed a network propagation (NP) based data mining method for lead identification that performs search on an ensemble of chemical similarity networks. We compiled 14 fingerprint-based similarity networks. Given a target protein of interest, we use a deep learning-based drug target interaction model to narrow down compound candidates and then we use network propagation to prioritize drug candidates that are highly correlated with drug activity score such as IC50. In an extensive experiment with BindingDB, we showed that our approach successfully discovered intentionally unlabeled compounds for given targets. To further demonstrate the prediction power of our approach, we identified 24 candidate leads for CLK1. Two out of five synthesizable candidates were experimentally validated in binding assays. In conclusion, our framework can be very useful for lead identification from very large compound databases such as ZINC.
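The network propagation step mentioned in the abstract can be illustrated with a small random-walk-with-restart sketch. This is a generic textbook formulation, not the authors' implementation: the function name `propagate`, the restart probability, the toy compound names, and the edge weights are all assumptions made for illustration, and the abstract does not say how the 14 similarity networks are combined.

```python
def propagate(adj, seeds, restart=0.3, tol=1e-9, max_iter=1000):
    """Random walk with restart on an undirected weighted graph.

    adj   : dict mapping node -> {neighbor: similarity weight} (symmetric)
    seeds : dict mapping node -> initial score (e.g. a known activity label)
    Returns a dict mapping every node to its propagated score.
    """
    nodes = sorted(adj)
    total = sum(seeds.values())
    # Normalised restart distribution over all nodes.
    p0 = {n: seeds.get(n, 0.0) / total for n in nodes}
    # Total outgoing weight per node, used to normalise transitions.
    out_w = {n: sum(adj[n].values()) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Each step: restart to the seeds, plus mass pushed along edges.
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            if out_w[u] == 0:
                continue  # isolated node: nothing to distribute
            for v, w in adj[u].items():
                nxt[v] += (1 - restart) * p[u] * w / out_w[u]
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical similarity network: a chain of five compounds.
toy_network = {
    "A": {"B": 0.9},
    "B": {"A": 0.9, "C": 0.8},
    "C": {"B": 0.8, "D": 0.2},
    "D": {"C": 0.2, "E": 0.7},
    "E": {"D": 0.7},
}
scores = propagate(toy_network, {"A": 1.0})
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy setting, compounds closer to the labelled seed `A` receive higher scores, which is the intuition behind ranking unlabeled candidates by propagated score.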

2.
Nat Commun ; 14(1): 3570, 2023 06 15.
Article in English | MEDLINE | ID: mdl-37322032

ABSTRACT

Computational drug repurposing aims to identify new indications for existing drugs by utilizing high-throughput data, often in the form of biomedical knowledge graphs. However, learning on biomedical knowledge graphs can be challenging due to the dominance of genes and a small number of drug and disease entities, resulting in less effective representations. To overcome this challenge, we propose a "semantic multi-layer guilt-by-association" approach that leverages the principle of guilt-by-association - "similar genes share similar functions", at the drug-gene-disease level. Using this approach, our model DREAMwalk: Drug Repurposing through Exploring Associations using Multi-layer random walk uses our semantic information-guided random walk to generate drug and disease-populated node sequences, allowing for effective mapping of both drugs and diseases in a unified embedding space. Compared to state-of-the-art link prediction models, our approach improves drug-disease association prediction accuracy by up to 16.8%. Moreover, exploration of the embedding space reveals a well-aligned harmony between biological and semantic contexts. We demonstrate the effectiveness of our approach through repurposing case studies for breast carcinoma and Alzheimer's disease, highlighting the potential of multi-layer guilt-by-association perspective for drug repurposing on biomedical knowledge graphs.


Subject(s)
Drug Repositioning , Pattern Recognition, Automated , Learning
3.
iScience ; 26(1): 105677, 2023 Jan 20.
Article in English | MEDLINE | ID: mdl-36654861

ABSTRACT

Drug-induced liver injury (DILI) is the main cause of drug failure in clinical trials. The characterization of toxic compounds in terms of chemical structure is important because compounds can be metabolized to toxic substances in the liver. Traditional machine learning approaches have had limited success in predicting DILI, and emerging deep graph neural network (GNN) models are not yet powerful enough to predict DILI. In this study, we developed a completely different approach, supervised subgraph mining (SSM), a strategy to mine explicit subgraph features by iteratively updating individual graph transitions to maximize DILI fidelity. Our method outperformed previous methods, including state-of-the-art GNN tools, in classifying DILI on two different datasets: DILIst and TDC-benchmark. We also combined the subgraph features by using SMARTS-based frequent structural pattern matching and associated them with the drugs' ATC codes.

4.
Comput Struct Biotechnol J ; 20: 4288-4304, 2022.
Article in English | MEDLINE | ID: mdl-36051875

ABSTRACT

A large number of chemical compounds are available in databases such as PubChem and ZINC. However, currently known compounds, though numerous, represent only a fraction of all possible compounds, which is known as chemical space. Many of the compounds in the databases are annotated with properties and assay data that can be used for drug discovery efforts. For this goal, a number of machine learning algorithms have been developed, and recent deep learning technologies can be effectively used to navigate chemical space, especially for unknown chemical compounds, in terms of drug-related tasks. In this article, we survey how deep learning technologies can model and utilize chemical compound information in a task-oriented way by exploiting annotated properties and assay data in chemical compound databases. We first compile the kinds of tasks that machine learning methods aim to accomplish. Then, we survey deep learning technologies to show their modeling power and current applications for accomplishing drug-related tasks. Next, we survey deep learning techniques that address the insufficiency of annotated data for more effective navigation of chemical space. Chemical compound information alone may not be powerful enough for drug-related tasks, thus we survey what kinds of information, such as assay and gene expression data, can be used to improve the prediction power of deep learning models. Finally, we conclude this survey with four important newly developed technologies that are yet to be fully incorporated into computational analysis of chemical information.

5.
Cell Metab ; 34(5): 702-718.e5, 2022 05 03.
Article in English | MEDLINE | ID: mdl-35417665

ABSTRACT

Emerging evidence indicates that the accretion of senescent cells is linked to metabolic disorders. However, the underlying mechanisms and metabolic consequences of cellular senescence in obesity remain obscure. In this study, we found that obese adipocytes are senescence-susceptible cells accompanied with genome instability. Additionally, we discovered that SREBP1c may play a key role in genome stability and senescence in adipocytes by modulating DNA-damage responses. Unexpectedly, SREBP1c interacted with PARP1 and potentiated PARP1 activity during DNA repair, independent of its canonical lipogenic function. The genetic depletion of SREBP1c accelerated adipocyte senescence, leading to immune cell recruitment into obese adipose tissue. These deleterious effects provoked unhealthy adipose tissue remodeling and insulin resistance in obesity. In contrast, the elimination of senescent adipocytes alleviated adipose tissue inflammation and improved insulin resistance. These findings revealed distinctive roles of SREBP1c-PARP1 axis in the regulation of adipocyte senescence and will help decipher the metabolic significance of senescence in obesity.


Subject(s)
Insulin Resistance , Adipocytes/metabolism , Adipose Tissue/metabolism , Humans , Insulin Resistance/physiology , Obesity/metabolism , Poly (ADP-Ribose) Polymerase-1/genetics , Poly (ADP-Ribose) Polymerase-1/metabolism , Sterol Regulatory Element Binding Protein 1/genetics , Sterol Regulatory Element Binding Protein 1/metabolism
7.
Sci Rep ; 11(1): 23992, 2021 12 14.
Article in English | MEDLINE | ID: mdl-34907266

ABSTRACT

Cervical lymph node metastasis is the leading cause of poor prognosis in oral tongue squamous cell carcinoma and also occurs in the early stages. The current clinical diagnosis depends on a physical examination that is not enough to determine whether micrometastasis remains. The transcriptome profiling technique has shown great potential for predicting micrometastasis by capturing the dynamic activation state of genes. However, there are several technical challenges in using transcriptome data to model patient conditions: (1) an insufficient number of samples compared to the number of genes, (2) complex dependence between genes that govern the cancer phenotype, and (3) heterogeneity between patients across cohorts that differ geographically and racially. We developed a computational framework to learn the subnetwork representation of the transcriptome to discover network biomarkers and determine the potential of metastasis in early oral tongue squamous cell carcinoma. Our method achieved high accuracy in predicting the potential of metastasis in two geographically and racially different groups of patients. The robustness of the model and the reproducibility of the discovered network biomarkers show great potential as a tool to diagnose lymph node metastasis in early oral cancer.


Subject(s)
Biomarkers, Tumor/biosynthesis , Carcinoma, Squamous Cell/metabolism , Databases, Nucleic Acid , Gene Expression Regulation, Neoplastic , Models, Biological , Mouth Neoplasms/metabolism , Transcriptome , Adult , Aged , Carcinoma, Squamous Cell/pathology , Female , Humans , Lymphatic Metastasis , Male , Middle Aged , Mouth Neoplasms/pathology
8.
Front Genet ; 12: 778490, 2021.
Article in English | MEDLINE | ID: mdl-34759964

ABSTRACT

[This corrects the article DOI: 10.3389/fgene.2021.682841.].

9.
Front Genet ; 12: 682841, 2021.
Article in English | MEDLINE | ID: mdl-34567063

ABSTRACT

Multi-omics data is frequently measured to enrich the comprehension of biological mechanisms underlying certain phenotypes. However, due to the complex relations and high dimension of multi-omics data, it is difficult to associate omics features with certain biological traits of interest. For example, the clinically valuable breast cancer subtypes are well-defined at the molecular level, but are poorly classified using gene expression data. Here, we propose a multi-omics analysis method called MONTI (Multi-Omics Non-negative Tensor decomposition for Integrative analysis), whose goal is to select multi-omics features that are able to represent trait-specific characteristics. We demonstrate the strength of multi-omics integrated analysis in terms of cancer subtyping. The multi-omics data are first integrated in a biologically meaningful manner to form a three-dimensional tensor, which is then decomposed using a non-negative tensor decomposition method. From the result, MONTI selects highly informative subtype-specific multi-omics features. MONTI was applied to three case studies of 597 breast cancer, 314 colon cancer, and 305 stomach cancer cohorts. For all the case studies, we found that the subtype classification accuracy significantly improved when utilizing all available multi-omics data. MONTI was able to detect subtype-specific gene sets that were shown to be strongly regulated by certain omics, from which correlation between omics types could be inferred. Furthermore, various clinical attributes of nine cancer types were analyzed using MONTI, which showed that some clinical attributes could be well explained using multi-omics data. We demonstrated that integrating multi-omics data in a gene-centric manner improves the detection of cancer subtype-specific features and other clinical features, which may be used to further understand the molecular characteristics of interest. The software and data used in this study are available at: https://github.com/inukj/MONTI.

10.
Diabetes ; 70(12): 2756-2770, 2021 12.
Article in English | MEDLINE | ID: mdl-34521642

ABSTRACT

Reactive oxygen species (ROS) are associated with various roles of brown adipocytes. Glucose-6-phosphate dehydrogenase (G6PD) controls cellular redox potentials by producing NADPH. Although G6PD upregulates cellular ROS levels in white adipocytes, the roles of G6PD in brown adipocytes remain elusive. Here, we found that G6PD defect in brown adipocytes impaired thermogenic function through excessive cytosolic ROS accumulation. Upon cold exposure, G6PD-deficient mutant (G6PDmut) mice exhibited cold intolerance and downregulated thermogenic gene expression in brown adipose tissue (BAT). In addition, G6PD-deficient brown adipocytes had increased cytosolic ROS levels, leading to extracellular signal-regulated kinase (ERK) activation. In BAT of G6PDmut mice, administration of antioxidant restored the thermogenic activity by potentiating thermogenic gene expression and relieving ERK activation. Consistently, body temperature and thermogenic execution were rescued by ERK inhibition in cold-exposed G6PDmut mice. Taken together, these data suggest that G6PD in brown adipocytes would protect against cytosolic oxidative stress, leading to cold-induced thermogenesis.


Subject(s)
Adipocytes, Brown/metabolism , Glucosephosphate Dehydrogenase/genetics , Reactive Oxygen Species/metabolism , Thermogenesis/genetics , 3T3-L1 Cells , Adipose Tissue, Brown/metabolism , Animals , Cells, Cultured , Glucosephosphate Dehydrogenase/metabolism , Male , Mice , Mice, Inbred C3H , Mice, Inbred C57BL , Mice, Transgenic
11.
Front Genet ; 12: 652623, 2021.
Article in English | MEDLINE | ID: mdl-34093651

ABSTRACT

A gene expression profile, or transcriptome, can represent cellular states, thus understanding gene regulation mechanisms can help explain how cells respond to external stress. Interaction between a transcription factor (TF) and a target gene (TG) is one of the representative regulatory mechanisms in cells. Regulatory interaction between TFs and TGs is very complex, specifically involving multiple-to-multiple relations. Experimental data from TF Chromatin Immunoprecipitation sequencing is useful but produces one-to-multiple relations between a TF and its TGs. On the other hand, co-expression networks of genes can be useful for constructing condition-specific transcriptional networks, but there are many false positive relations in co-expression networks. In this paper, we propose a novel method to construct a condition-specific and combinatorial transcriptional network from transcriptome data, applying kernel canonical correlation analysis (kernel CCA) to identify multiple-to-multiple TF-TG relations under a given biological condition. Kernel CCA is a well-established statistical method for computing the correlation of a group of features vs. another group of features. We, therefore, employed kernel CCA to embed TFs and TGs into a new space where the correlation of TFs and TGs is reflected. To demonstrate the usefulness of our network construction method, we used a human blood transcriptome data set to investigate the response to a high-fat diet and an Arabidopsis data set to investigate the response to cold/heat stress. Our method detected not only important regulatory interactions reported in previous studies but also novel TF-TG relations where a module of TFs regulates a module of TGs upon specific stress.

12.
Bioinformatics ; 38(1): 275-277, 2021 12 22.
Article in English | MEDLINE | ID: mdl-34185062

ABSTRACT

MOTIVATION: Multi-omics data in molecular biology has accumulated rapidly over the years. Such data contains valuable information for research in medicine and drug discovery. Unfortunately, data-driven research in medicine and drug discovery is challenging for a majority of small research labs due to the large volume of data and the complexity of analysis pipeline. RESULTS: We present BioVLAB-Cancer-Pharmacogenomics, a bioinformatics system that facilitates analysis of multi-omics data from breast cancer to analyze and investigate intratumor heterogeneity and pharmacogenomics on Amazon Web Services. Our system takes multi-omics data as input to perform tumor heterogeneity analysis in terms of TCGA data and deconvolve-and-match the tumor gene expression to cell line data in CCLE using DNA methylation profiles. We believe that our system can help small research labs perform analysis of tumor multi-omics without worrying about computational infrastructure and maintenance of databases and tools. AVAILABILITY AND IMPLEMENTATION: http://biohealth.snu.ac.kr/software/biovlab_cancer_pharmacogenomics. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Breast Neoplasms , Software , Humans , Female , Multiomics , Pharmacogenetics , Breast Neoplasms/drug therapy , Breast Neoplasms/genetics , Databases, Factual
13.
Comput Struct Biotechnol J ; 19: 1541-1556, 2021.
Article in English | MEDLINE | ID: mdl-33841755

ABSTRACT

There has recently been rapid progress in computational methods for determining protein targets of small molecule drugs, a task termed compound protein interaction (CPI) prediction. In this review, we comprehensively cover topics related to computational prediction of CPI. Data for CPI has been accumulated and curated significantly both in quantity and quality. Computational methods have become more powerful than ever for analyzing such complex data. Thus, recent successes in the improved quality of CPI prediction are due to the use of both sophisticated computational techniques and higher quality information in the databases. The goal of this article is to review topics related to CPI, from data, format, and representation to computational models, so that researchers can take full advantage of these resources to develop novel prediction methods. Chemical compound and protein data from various resources are discussed in terms of data formats and encoding schemes. For the CPI methods, we grouped prediction methods into five categories, from traditional machine learning techniques to state-of-the-art deep learning techniques. In closing, we discuss emerging machine learning topics to help both experimental and computational scientists leverage current knowledge and strategies to develop more powerful and accurate CPI prediction methods.

14.
Proc Natl Acad Sci U S A ; 118(11)2021 03 16.
Article in English | MEDLINE | ID: mdl-33836591

ABSTRACT

White adipose tissue (WAT) is a key regulator of systemic energy metabolism, and impaired WAT plasticity characterized by enlargement of preexisting adipocytes associates with WAT dysfunction, obesity, and metabolic complications. However, the mechanisms that retain proper adipose tissue plasticity required for metabolic fitness are unclear. Here, we comprehensively showed that adipocyte-specific DNA methylation, manifested in enhancers and CTCF sites, directs distal enhancer-mediated transcriptomic features required to conserve metabolic functions of white adipocytes. Particularly, genetic ablation of adipocyte Dnmt1, the major methylation writer, led to increased adiposity characterized by increased adipocyte hypertrophy along with reduced expansion of adipocyte precursors (APs). These effects of Dnmt1 deficiency provoked systemic hyperlipidemia and impaired energy metabolism both in lean and obese mice. Mechanistically, Dnmt1 deficiency abrogated mitochondrial bioenergetics by inhibiting mitochondrial fission and promoted aberrant lipid metabolism in adipocytes, rendering adipocyte hypertrophy and WAT dysfunction. Dnmt1-dependent DNA methylation prevented aberrant CTCF binding and, in turn, sustained the proper chromosome architecture to permit interactions between enhancer and dynamin-1-like protein gene Dnm1l (Drp1) in adipocytes. Also, adipose DNMT1 expression inversely correlated with adiposity and markers of metabolic health but positively correlated with AP-specific markers in obese human subjects. Thus, these findings support strategies utilizing Dnmt1 action on mitochondrial bioenergetics in adipocytes to combat obesity and related metabolic pathology.


Subject(s)
Adipocytes/metabolism , DNA (Cytosine-5-)-Methyltransferase 1/metabolism , Epigenesis, Genetic , Mitochondrial Dynamics , Adipocytes/pathology , Adipose Tissue/metabolism , Adipose Tissue/pathology , Adiposity , Animals , CCCTC-Binding Factor/metabolism , Chromosome Structures , DNA (Cytosine-5-)-Methyltransferase 1/deficiency , DNA (Cytosine-5-)-Methyltransferase 1/genetics , DNA Methylation , Dynamins/genetics , Dynamins/metabolism , Energy Metabolism , Enhancer Elements, Genetic , Gene Expression Profiling , Lipid Metabolism , Mice , Mitochondria/metabolism , Obesity/metabolism , Obesity/pathology , Promoter Regions, Genetic , Protein Binding
15.
Front Genet ; 11: 564792, 2020.
Article in English | MEDLINE | ID: mdl-33281870

ABSTRACT

Pharmacogenomics is the study of how genes affect a person's response to drugs. Thus, understanding the effect of a drug at the molecular level can be helpful in both drug discovery and personalized medicine. Over the years, transcriptome data upon drug treatment has been collected, and several databases have compiled either pre-treatment multi-omics data of cancer cells with drug sensitivity measures (IC50, AUC) or time-series transcriptomic data after drug treatment. However, analyzing transcriptome data upon drug treatment is challenging since more than 20,000 genes interact in complex ways. In addition, due to the difficulty of both time-series analysis and multi-omics integration, current methods can hardly perform analysis of databases with different data characteristics. One effective way is to interpret transcriptome data in terms of well-characterized biological pathways. Another way is to leverage state-of-the-art methods for multi-omics data integration. In this paper, we developed Drug Response analysis Integrating Multi-omics and time-series data (DRIM), an integrative multi-omics and time-series data analysis framework that identifies perturbed sub-pathways and regulation mechanisms upon drug treatment. The system takes a drug name and cell line identification numbers, or the user's control/treated time-series gene expression data, as input. Then, analysis of multi-omics data upon drug treatment is performed from two perspectives. For the multi-omics perspective analysis, IC50-related multi-omics potential mediator genes are determined by embedding multi-omics data into a gene-centric vector space using a tensor decomposition method and an autoencoder deep learning model. Then, perturbed pathway analysis of potential mediator genes is performed. For the time-series perspective analysis, time-varying perturbed sub-pathways upon drug treatment are constructed. Additionally, a network involving transcription factors (TFs), multi-omics potential mediator genes, and perturbed sub-pathways is constructed, and paths to perturbed pathways from TFs are determined by an influence maximization method. To demonstrate the utility of our system, we provide analysis results of sub-pathway regulatory mechanisms in breast cancer cell lines of different drug sensitivity. DRIM is available at: http://biohealth.snu.ac.kr/software/DRIM/.

16.
Bioinformatics ; 36(12): 3818-3824, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32207514

ABSTRACT

MOTIVATION: Biological pathway is an important curated knowledge of biological processes. Thus, cancer subtype classification based on pathways will be very useful to understand differences in biological mechanisms among cancer subtypes. However, pathways include only a fraction of the entire gene set, only one-third of human genes in KEGG, and pathways are fragmented. For this reason, there are few computational methods to use pathways for cancer subtype classification. RESULTS: We present an explainable deep-learning model with attention mechanism and network propagation for cancer subtype classification. Each pathway is modeled by a graph convolutional network. Then, a multi-attention-based ensemble model combines several hundreds of pathways in an explainable manner. Lastly, network propagation on pathway-gene network explains why gene expression profiles in subtypes are different. In experiments with five TCGA cancer datasets, our method achieved very good classification accuracies and, additionally, identified subtype-specific pathways and biological functions. AVAILABILITY AND IMPLEMENTATION: The source code is available at http://biohealth.snu.ac.kr/software/GCN_MAE. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Neoplasms , Software , Attention , Humans , Neoplasms/genetics , Transcriptome
17.
Brief Bioinform ; 21(1): 36-46, 2020 Jan 17.
Article in English | MEDLINE | ID: mdl-30462155

ABSTRACT

MOTIVATION: Biological pathways are extensively used for the analysis of transcriptome data to characterize biological mechanisms underlying various phenotypes. There are a number of computational tools that summarize transcriptome data at the pathway level. However, there is no comparative study on how well these tools produce useful information at the cohort level, enabling comparison of many samples or patients. RESULTS: In this study, we systematically compared and evaluated 13 different pathway activity inference tools based on 5 comparison criteria using pan-cancer data set. This study has two major contributions. First, our study provides a comprehensive survey on computational techniques used by existing pathway activity inference tools. The tools use different strategies and assume different requirements on data: input transformation, use of labels, necessity of cohort-level input data, use of gene relations and scoring metric. Second, we performed extensive evaluations on the performance of these tools. Because different tools use different methods to map samples to the pathway dimension, the tools are evaluated at the pathway level using five comparison criteria. Starting from measuring how well a tool maintains the characteristics of original gene expression values, robustness was also investigated by adding noise into gene expression data. Classification tasks on three clinical variables (tumor versus normal, survival and cancer subtypes) were performed to evaluate the utility of tools for their clinical applications. In addition, the inferred activity values were compared between the tools to see how similar they are along with the scoring schemes they use.

18.
PLoS One ; 14(10): e0223520, 2019.
Article in English | MEDLINE | ID: mdl-31644551

ABSTRACT

MOTIVATION: Intratumor heterogeneity (ITH) represents the diversity of cell populations that make up cancer tissue. The level of ITH in a tumor is usually measured by a genomic variation profile, such as copy number variation and somatic mutation. However, a recent study has identified ITH at the transcriptome level and suggested that ITH at gene expression levels is useful for predicting prognosis. Measuring ITH levels at the spliceome level is a natural extension. There are serious technical challenges in measuring spliceomic ITH (sITH) from bulk tumor RNA sequencing (RNA-seq) due to the complex splicing patterns. RESULTS: We propose an information-theoretic method to measure the sITH of bulk tumors to overcome the above challenges. This method has been extensively tested in experiments using synthetic data, xenograft tumor data, and TCGA pan-cancer data. As a result, we showed that sITH is closely related to cancer progression and clonal heterogeneity, along with clinically significant features such as cancer stage, survival outcome and PAM50 subtype. As far as we know, it is the first study to define ITH at the spliceome level. This method can greatly improve the understanding of cancer spliceome and has great potential as a diagnostic and prognostic tool.
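The abstract does not state which information-theoretic quantity is used to score sITH. Purely as an illustration of the general idea, the sketch below computes the Shannon entropy of isoform usage for a single gene: a gene dominated by one isoform scores near zero, while evenly mixed isoform usage scores higher. The function name `shannon_entropy` and the read counts are hypothetical, and this should not be read as the published method.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy (in bits) of a vector of non-negative counts.

    counts : per-isoform read counts for one gene (hypothetical input)
    """
    total = sum(counts)
    if total == 0:
        return 0.0  # no reads observed: define the entropy as zero
    entropy = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            entropy -= p * math.log2(p)
    return entropy


# Hypothetical isoform read counts for two genes in one bulk sample.
single_isoform_gene = [120, 0, 0]   # one dominant isoform
mixed_isoform_gene = [40, 40, 40]   # evenly mixed isoform usage

low = shannon_entropy(single_isoform_gene)
high = shannon_entropy(mixed_isoform_gene)
```

A per-sample score could then be obtained by aggregating such per-gene values, although how the authors aggregate or normalise is not described in the abstract.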


Subject(s)
Biomarkers, Tumor , Genetic Heterogeneity , Neoplasms/genetics , RNA Splicing , Algorithms , Computational Biology/methods , DNA Copy Number Variations , High-Throughput Nucleotide Sequencing , Humans , Neoplasms/metabolism , Reproducibility of Results , Sequence Analysis, RNA , Spliceosomes
19.
Methods ; 166: 48-56, 2019 08 15.
Article in English | MEDLINE | ID: mdl-30905748

ABSTRACT

An enhancer is a DNA sequence of a genome that controls transcription of downstream target genes. Enhancers are known to be associated with certain epigenetic signatures. Machine learning tools, such as CSI-ANN, ChromHMM, and RFECS, were developed for predicting enhancers using various epigenetic features. However, predictions by different tools vary widely, and a substantial portion of enhancer predictions do not agree. Thus, computational methods for enhancer prediction should be further developed. In this paper, a hybrid neural network called Enhancer-CRNN, a convolutional neural network (CNN) followed by a recurrent neural network (RNN), was developed and used to predict enhancer regions with histone modification marks as input. The CNN in our model reflects local characteristics, and the RNN learns sequential dependencies among the histone marks. The hybrid of the two neural networks outperformed existing prediction tools in experiments with GM12878, H1hesc, HeLaS3, and HepG2 cell lines. On average, 13-17 percent of the enhancers predicted by our method were cell type-specific. With the trained model, optimized virtual input histone marks were generated to provide deeper insight into how histone modification marks can represent enhancer regions and which histone marks indicate active or repressed enhancers. In summary, our model produced accurate annotation of enhancers with detailed information on how histone profiles contribute to the presence of putative enhancers.


Subject(s)
Computational Biology/methods , Enhancer Elements, Genetic/genetics , Histone Code/genetics , Neural Networks, Computer , Chromatin/genetics , Epigenomics , Histones/genetics , Humans , Machine Learning
20.
BMC Syst Biol ; 11(Suppl 2): 15, 2017 03 14.
Article in English | MEDLINE | ID: mdl-28361687

ABSTRACT

BACKGROUND: Identifying perturbed pathways in a given condition is crucial in understanding biological phenomena. In addition to identifying perturbed pathways individually, pathway analysis should consider interactions among pathways. Currently available pathway interaction prediction methods are based on the existence of overlapping genes between pathways, protein-protein interaction (PPI) or functional similarities. However, these approaches consider pathways merely as sets of genes, thus they do not take topological features into account. In addition, most of the existing approaches do not handle the explicit gene expression quantity information that is routinely measured by RNA-sequencing. RESULTS: To overcome these technical issues, we developed a new pathway interaction network construction method using PPI, closeness centrality and shortest paths. We tested our approach on three different high-throughput RNA-seq data sets: pregnant mice data to reveal the role of serotonin on beta cell mass, bone-metastatic breast cancer data and autoimmune thyroiditis data to study the role of IFN-α. Our approach successfully identified the pathways reported in the original papers. For the pathways that are not directly mentioned in the original papers, we were able to find evidence of pathway interactions through a literature search. Our method outperformed two existing approaches, the overlapping gene-based approach (OGB) and the protein-protein interaction-based approach (PB), in experiments with the three data sets. CONCLUSION: Our results show that PINTnet successfully identified condition-specific perturbed pathways and the interactions between the pathways. We believe that our method will be very useful in characterizing biological mechanisms at the pathway level. PINTnet is available at http://biohealth.snu.ac.kr/software/PINTnet/.
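The two graph primitives named in the abstract, shortest paths and closeness centrality, can be sketched on a toy unweighted PPI graph as follows. The gene names, the graph, and the function names are hypothetical, and the sketch does not reproduce how PINTnet combines these quantities with expression data, which the abstract does not describe.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness(graph, node):
    """Closeness centrality: reachable nodes divided by total distance to them."""
    dist = bfs_distances(graph, node)
    reachable = len(dist) - 1  # exclude the node itself
    if reachable == 0:
        return 0.0
    return reachable / sum(dist.values())


# Hypothetical PPI graph: g1 - g2 - g3 - g4, with g5 also attached to g3.
ppi = {
    "g1": ["g2"],
    "g2": ["g1", "g3"],
    "g3": ["g2", "g4", "g5"],
    "g4": ["g3"],
    "g5": ["g3"],
}

hub_score = closeness(ppi, "g3")        # central protein
leaf_score = closeness(ppi, "g1")       # peripheral protein
g1_to_g4 = bfs_distances(ppi, "g1")["g4"]
```

Here the central protein `g3` has a higher closeness than the peripheral `g1`, which is the kind of topological signal that gene-set overlap alone cannot capture.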


Subject(s)
Computational Biology/methods , Protein Interaction Mapping/methods , Gene Expression Regulation , Machine Learning