Search | VHL Regional Portal

CapsNh-Kcr: Capsule network-based prediction of lysine crotonylation sites in human non-histone proteins.

Khanal, Jhabindra; Kandel, Jeevan; Tayara, Hilal; Chong, Kil To.

Comput Struct Biotechnol J ; 21: 120-127, 2023.

Article in English | MEDLINE | ID: mdl-36544479

ABSTRACT

Lysine crotonylation (Kcr) is one of the most important post-translational modifications (PTMs) and is widely detected in both histone and non-histone proteins. Kcr is reported to be involved in various biological processes, such as metabolism and cell differentiation. However, the available experimental methods for Kcr site identification are laborious and costly. To effectively replace existing experimental approaches, several computational methods have been developed in the last few years. These methods remain limited, as they can identify Kcr sites only on histone proteins alone or on combined histone and non-histone proteins. Although a tool was developed to identify Kcr sites on non-histone proteins only, its performance is inadequate, and it ignores the exploration of hidden Kcr patterns (motifs), which might be significant for detailed Kcr studies. Therefore, algorithms that can more effectively predict Kcr sites on non-histone proteins, together with their biological meaning, need to be designed. Accordingly, we developed a novel deep learning (capsule network)-based model, named CapsNh-Kcr, for Kcr site prediction, particularly focusing on non-histone proteins. Based on the independent test results, the proposed model achieves an AUC of 0.9120, which is approximately 6% higher than that of the previous nhKcr model in the prediction of Kcr sites on non-histone proteins. Further, we revealed, for the first time, that the proposed model can represent an obvious motif distribution across Kcr sites in non-histone proteins. The source code (in Python) is publicly available at https://github.com/Jhabindra-bioinfo/CapsNh-Kcr.
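Site predictors of this kind usually score each lysine from a fixed-length sequence window centered on it. The sketch below shows one way such windows could be extracted. The function name `lysine_windows`, the flank size, and the `X` padding character are illustrative assumptions; the abstract does not state the window length or padding scheme used by CapsNh-Kcr.

```python
def lysine_windows(sequence, flank=3, pad="X"):
    """Return (position, window) pairs for every lysine (K) in a protein sequence.

    Each window holds `flank` residues on either side of the lysine.
    Positions beyond the sequence ends are filled with `pad`.
    `flank` and `pad` are hypothetical defaults, not values from the paper.
    """
    padded = pad * flank + sequence + pad * flank
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "K":
            # Residue i of the original sequence sits at i + flank in `padded`,
            # so the slice below is centered on that lysine.
            windows.append((i, padded[i : i + 2 * flank + 1]))
    return windows


if __name__ == "__main__":
    for position, window in lysine_windows("MKTAYK", flank=2):
        print(position, window)
```

Each window would then be numerically encoded and passed to the classifier; candidate sites near a terminus keep the same length because of the padding.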

DeepCap-Kcr: accurate identification and investigation of protein lysine crotonylation sites based on capsule network.

Khanal, Jhabindra; Tayara, Hilal; Zou, Quan; To Chong, Kil.

Brief Bioinform ; 23(1)2022 01 17.

Article in English | MEDLINE | ID: mdl-34882222

ABSTRACT

Lysine crotonylation (Kcr) is a posttranslational modification widely detected in histone and nonhistone proteins. It plays a vital role in human disease progression and in various cellular processes, including the cell cycle, cell organization, and chromatin remodeling, and it is a key mechanism for increasing proteomic diversity. Thus, accurate information on such sites is beneficial for both drug development and basic research. Existing computational methods can be improved to more effectively identify Kcr sites in proteins. In this study, we proposed a deep learning model, DeepCap-Kcr, a capsule network (CapsNet) based on a convolutional neural network (CNN) and long short-term memory (LSTM) for robust prediction of Kcr sites on histone and nonhistone proteins (mammals). The proposed model outperformed the existing CNN architecture Deep-Kcr and other well-established tools in most cases and provided promising outcomes for practical use; in particular, the proposed model characterized the internal hierarchical representation as well as the important features from multiple levels of abstraction, learned automatically from a small number of samples. The trained model generalized well to another species (papaya). Moreover, we showed that the features and properties generated by the internal capsule layer can be used to explore the internal data distribution related to biological significance (as a motif detector). The source code and data are freely available at https://github.com/Jhabindra-bioinfo/DeepCap-Kcr.

Subject(s)

Lysine , Protein Processing, Post-Translational , Proteomics , Animals , Histones/metabolism , Humans , Lysine/metabolism , Mammals/metabolism , Neural Networks, Computer , Software
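For readers unfamiliar with the capsule layers mentioned in the DeepCap-Kcr abstract above, the following is a minimal sketch of the "squash" nonlinearity from the standard capsule network formulation, which rescales a capsule's output vector to a length between 0 and 1 while keeping its direction. Whether DeepCap-Kcr uses exactly this form is an assumption; the abstract does not say, and the `eps` term is added here only for numerical safety.

```python
import math


def squash(vector, eps=1e-9):
    """Standard capsule squash: shrink a vector's length into [0, 1).

    Short vectors are pushed toward length 0 and long vectors toward
    length 1, so the length can be read as a probability-like score.
    `eps` (an assumption, not from the paper) avoids division by zero.
    """
    norm_sq = sum(x * x for x in vector)
    norm = math.sqrt(norm_sq)
    scale = norm_sq / (1.0 + norm_sq) / (norm + eps)
    return [scale * x for x in vector]


if __name__ == "__main__":
    out = squash([3.0, 4.0])
    print(out, math.sqrt(sum(x * x for x in out)))
```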

Identifying DNA N4-methylcytosine sites in the rosaceae genome with a deep learning model relying on distributed feature representation.

Khanal, Jhabindra; Tayara, Hilal; Zou, Quan; Chong, Kil To.

Comput Struct Biotechnol J ; 19: 1612-1619, 2021.

Article in English | MEDLINE | ID: mdl-33868598

ABSTRACT

DNA N4-methylcytosine (4mC), an epigenetic modification found in prokaryotic and eukaryotic species, is involved in numerous biological functions, including host defense, transcription regulation, gene expression, and DNA replication. To identify 4mC sites, previous computational studies mostly focused on finding hand-crafted features. This area of research, therefore, would benefit from the development of a computational approach that relies on automatic feature selection to identify relevant sites. We here report 4mC-w2vec, a computational method that learned automatic feature discrimination in the Rosaceae genomes, especially in Rosa chinensis (R. chinensis) and Fragaria vesca (F. vesca), based on distributed feature representation and through the word embedding technique 'word2vec'. While a few bioinformatics tools are currently employed to identify 4mC sites in these genomes, their prediction performance is inadequate. Our system processed 4mC and non-4mC sites through a word embedding process, including sub-word information of its biological words through k-mers, which then served as features that were fed into a two-layer convolutional neural network (CNN) to classify the sample sequences as 4mC or non-4mC sites. Our tool demonstrated performance superior to current tools that use the same genomic datasets. Additionally, 4mC-w2vec is effective for balanced and imbalanced class datasets alike, and the online web-server is currently available at: http://nsclbio.jbnu.ac.kr/tools/4mC-w2vec/.
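The "biological words" fed to a word embedding model are typically overlapping k-mers of the DNA sequence. The sketch below illustrates that tokenization step only. The function name `kmer_words` and the default `k=3` are assumptions for illustration; the abstract does not state the k value used by 4mC-w2vec.

```python
def kmer_words(sequence, k=3):
    """Split a DNA sequence into overlapping k-mers with a stride of 1.

    The resulting list of "words" is the kind of input a word2vec-style
    embedding model consumes. `k=3` is a hypothetical default.
    """
    sequence = sequence.upper()
    return [sequence[i : i + k] for i in range(len(sequence) - k + 1)]


if __name__ == "__main__":
    print(kmer_words("ACGTAC", k=3))
```

A sequence shorter than `k` yields an empty list, so callers may want to filter such inputs before embedding.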

i6mA-stack: A stacking ensemble-based computational prediction of DNA N6-methyladenine (6mA) sites in the Rosaceae genome.

Khanal, Jhabindra; Lim, Dae Young; Tayara, Hilal; Chong, Kil To.

Genomics ; 113(1 Pt 2): 582-592, 2021 01.

Article in English | MEDLINE | ID: mdl-33010390

ABSTRACT

DNA N6-methyladenine (6mA) is an epigenetic modification that plays a vital role in a variety of cellular processes in both eukaryotes and prokaryotes. Accurate information on 6mA sites in the Rosaceae genome may assist in understanding genomic 6mA distributions and various biological functions such as epigenetic inheritance. Various studies have shown the possibility of identifying 6mA sites through experiments, but the procedures are time-consuming and costly. To overcome the drawbacks of experimental methods, we propose an accurate computational paradigm based on a machine learning (ML) technique to identify 6mA sites in Rosa chinensis (R. chinensis) and Fragaria vesca (F. vesca). To improve the performance of the proposed model and to avoid overfitting, a recursive feature elimination with cross-validation (RFECV) strategy is used to extract the optimal number of features (ONF) subset from five different DNA sequence encoding schemes, i.e., Binary Encoding (BE), Ring-Function-Hydrogen-Chemical Properties (RFHC), Electron-Ion-Interaction Pseudo Potentials of Nucleotides (EIIP), Dinucleotide Physicochemical Properties (DPCP), and Trinucleotide Physicochemical Properties (TPCP). Subsequently, we use the ONF subset to train a two-layer ML-based stacking model to create a bioinformatics tool named 'i6mA-stack'. This tool outperforms its peer tool in general and is currently available at http://nsclbio.jbnu.ac.kr/tools/i6mA-stack/.

Subject(s)

Adenine/analogs & derivatives , DNA Methylation , Rosaceae/genetics , Sequence Analysis, DNA/methods , Software , Adenine/metabolism
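Of the five encoding schemes named in the i6mA-stack abstract above, Binary Encoding (BE) is the simplest to illustrate. The sketch below maps each nucleotide to a four-bit indicator vector and concatenates them. The specific bit assignment and the all-zero fallback for unknown bases are assumptions; the abstract does not give the exact mapping used by the authors.

```python
# Hypothetical bit assignment; the paper's actual mapping may differ.
BINARY_CODE = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}


def binary_encode(sequence):
    """Concatenate a 4-bit indicator vector for each base of a DNA sequence.

    Bases outside A/C/G/T (for example N) are encoded as four zeros,
    which is an assumption made here for robustness.
    """
    features = []
    for base in sequence.upper():
        features.extend(BINARY_CODE.get(base, (0, 0, 0, 0)))
    return features


if __name__ == "__main__":
    print(binary_encode("ACGT"))
```

A sequence of length n yields 4n features, which a selection step such as RFECV could then prune alongside features from the other schemes.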
