Identifying promoter and enhancer sequences by graph convolutional networks.

Tenekeci, Samet; Tekir, Selma.

Comput Biol Chem ; 110: 108040, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38430611

ABSTRACT

Identifying promoters, enhancers, and their interactions helps in understanding genetic regulation. This study proposes a graph-based semi-supervised learning model (GCN4EPI) for the enhancer-promoter classification problem. We adopt a graph convolutional network (GCN) architecture to integrate interaction information with sequence features. Nodes of the constructed graph hold word embeddings of DNA sequences, while edges hold the Enhancer-Promoter Interaction (EPI) information. Thanks to semi-supervised learning, model training needs much less data (16%) and time. Comparisons on a benchmark dataset of six human cell lines show that the proposed approach outperforms the state-of-the-art methods by a large margin (10% higher F1 score) and trains the fastest (up to 3 times faster). Moreover, GCN4EPI also outperforms the baselines on cross-cell line data (3% higher F1 score). Our qualitative analyses with graph explainability models show that GCN4EPI learns from both text and graph structure. The results suggest that integrating interaction information with sequence features improves predictive performance and compensates for a smaller number of training instances.
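The abstract does not give the propagation rule used by GCN4EPI, so the following is only a minimal sketch of a single generic graph convolution step, assuming the widely used symmetric normalization with self-loops. The toy graph, the feature size, and the names `normalize_adjacency` and `gcn_layer` are illustrative assumptions, not the authors' code. Node rows stand in for sequence embeddings and the adjacency matrix stands in for EPI edges.

```python
import numpy as np

def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a symmetric adjacency matrix."""
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops
    deg = a_hat.sum(axis=1)                 # degree of each node (>= 1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(a_norm, features, weights):
    """One graph convolution: aggregate neighbor features, project, apply ReLU."""
    return np.maximum(a_norm @ features @ weights, 0.0)

# Hypothetical toy graph: nodes 0-1 are enhancers, nodes 2-3 are promoters;
# an edge marks an enhancer-promoter interaction.
adj = np.array([[0, 0, 1, 0],
                [0, 0, 1, 1],
                [1, 1, 0, 0],
                [0, 1, 0, 0]], dtype=float)

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8))   # stand-in for per-node sequence embeddings
weights = rng.normal(size=(8, 2))    # untrained projection to 2 hidden units

a_norm = normalize_adjacency(adj)
hidden = gcn_layer(a_norm, features, weights)
print(hidden.shape)
```

In a semi-supervised setting, a loss would be computed only on the labeled subset of nodes, while the aggregation step still propagates information from unlabeled neighbors.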

Subject(s)

Promoter Regions, Genetic; Humans; Promoter Regions, Genetic/genetics; Enhancer Elements, Genetic/genetics; Neural Networks, Computer

Integrative Biological Network Analysis to Identify Shared Genes in Metabolic Disorders.

Tenekeci, Samet; Isik, Zerrin.

IEEE/ACM Trans Comput Biol Bioinform ; 19(1): 522-530, 2022.

Article in English | MEDLINE | ID: mdl-32396100

ABSTRACT

Identification of common molecular mechanisms in interrelated diseases is essential for better prognoses and targeted therapies. However, the complexity of metabolic pathways makes it difficult to discover common disease genes underlying metabolic disorders, and doing so requires more sophisticated bioinformatics models that combine different types of biological data and computational methods. Accordingly, we built an integrative network analysis model to identify shared disease genes in metabolic syndrome (MS), type 2 diabetes (T2D), and coronary artery disease (CAD). We constructed weighted gene co-expression networks by combining gene expression, protein-protein interaction, and gene ontology data from multiple sources. For 90 different configurations of disease networks, we detected the significant modules by using the MCL, SPICi, and Linkcomm graph clustering algorithms. We also performed a comparative evaluation of the disease modules to determine the method providing the highest biological validity. By overlapping the disease modules, we identified 22 shared genes for MS-CAD and T2D-CAD. Moreover, 19 of these genes were directly or indirectly associated with the relevant diseases in previous medical studies. This study not only demonstrates the performance of different biological data sources and computational methods in disease-gene discovery, but also offers potential insights into common genetic mechanisms of the metabolic disorders.
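The abstract does not detail how disease modules were overlapped, so the snippet below is only a sketch of the simplest reading: pool the genes in each disease's significant modules and take the intersection. The function name `shared_genes` and the placeholder gene identifiers are hypothetical and do not come from the study.

```python
def shared_genes(modules_a, modules_b):
    """Return genes that occur in at least one module of each disease."""
    genes_a = set().union(*modules_a)   # all genes in disease A's modules
    genes_b = set().union(*modules_b)   # all genes in disease B's modules
    return genes_a & genes_b

# Placeholder modules (sets of gene identifiers), not real results.
ms_modules = [{"GENE_A", "GENE_B"}, {"GENE_C"}]
cad_modules = [{"GENE_B", "GENE_D"}, {"GENE_C", "GENE_E"}]

shared = sorted(shared_genes(ms_modules, cad_modules))
print(shared)  # ['GENE_B', 'GENE_C']
```

A stricter variant could require overlap at the module level (for example, a minimum number of common genes per module pair) rather than pooling all modules first.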

Subject(s)

Diabetes Mellitus, Type 2; Cluster Analysis; Computational Biology; Diabetes Mellitus, Type 2/genetics; Gene Expression Profiling; Gene Ontology; Gene Regulatory Networks/genetics; Humans
