Search | VHL Regional Portal

Deep-WET: a deep learning-based approach for predicting DNA-binding proteins using word embedding techniques with weighted features.

Mahmud, S M Hasan; Goh, Kah Ong Michael; Hosen, Md Faruk; Nandi, Dip; Shoombuatong, Watshara.

Sci Rep ; 14(1): 2961, 2024 02 05.

Article in English | MEDLINE | ID: mdl-38316843

ABSTRACT

DNA-binding proteins (DBPs) play a significant role in all phases of genetic processes, including DNA recombination, repair, and modification. They are often used in drug discovery as fundamental elements of steroids, antibiotics, and anticancer drugs. Predicting them is one of the most challenging tasks in proteomics research. Conventional experimental methods for DBP identification are costly and sometimes biased. Therefore, there is an urgent need for powerful computational methods that can accurately and rapidly identify DBPs from sequence information. In this study, we propose a novel deep learning-based method called Deep-WET to accurately identify DBPs from primary sequence information. In Deep-WET, we employed three word embedding schemes, namely Global Vectors (GloVe), Word2Vec, and fastText, to encode the protein sequence. These three feature sets were then combined sequentially and weighted using weights learned through the differential evolution (DE) algorithm. To enhance the predictive performance of Deep-WET, we applied the SHapley Additive exPlanations (SHAP) approach to remove irrelevant features. Finally, the optimal feature subset was fed into convolutional neural networks (CNNs) to construct the Deep-WET predictor. Both cross-validation and independent tests indicated that Deep-WET achieved superior predictive performance compared to conventional machine learning classifiers. In addition, in an extensive independent test, Deep-WET outperformed several state-of-the-art methods for DBP prediction, with an accuracy of 78.08%, a Matthews correlation coefficient (MCC) of 0.559, and an area under the ROC curve (AUC) of 0.805. This performance indicates that Deep-WET has a strong capacity to predict DBPs. The web server of Deep-WET and the curated datasets used in this study are available at https://deepwet-dna.monarcatechnical.com/. The proposed Deep-WET is anticipated to serve the community-wide effort for large-scale identification of potential DBPs.
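The abstract states that the GloVe, Word2Vec, and fastText feature sets are combined and weighted with DE-learned weights, but it does not give the details. The sketch below assumes one scalar weight per feature block, applied before concatenation, and uses the classic DE/rand/1/bin scheme to search for those weights. The names `weighted_concat` and `differential_evolution`, the toy fitness, and all hyperparameter values are hypothetical and are not taken from Deep-WET. In a real pipeline the fitness would more plausibly be something like a cross-validated classifier error; a stand-in quadratic is used here only so the example runs on its own.

```python
import random
from typing import Callable, List, Sequence


def weighted_concat(blocks: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Scale each feature block by its own weight, then concatenate (serial fusion)."""
    if len(blocks) != len(weights):
        raise ValueError("need exactly one weight per feature block")
    fused: List[float] = []
    for block, w in zip(blocks, weights):
        fused.extend(w * x for x in block)
    return fused


def differential_evolution(
    fitness: Callable[[List[float]], float],
    dim: int,
    pop_size: int = 20,
    generations: int = 200,
    f: float = 0.8,   # differential weight (mutation scale)
    cr: float = 0.9,  # crossover probability
    seed: int = 0,
) -> List[float]:
    """Minimise `fitness` over the box [0, 1]^dim using DE/rand/1/bin."""
    if pop_size < 4:
        raise ValueError("DE/rand/1 needs at least 4 individuals")
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    scores = [fitness(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals, all different from i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            # Binomial crossover; j_rand guarantees at least one mutated gene.
            j_rand = rng.randrange(dim)
            trial: List[float] = []
            for j in range(dim):
                if rng.random() < cr or j == j_rand:
                    v = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    v = min(1.0, max(0.0, v))  # clip back into the [0, 1] search box
                else:
                    v = pop[i][j]
                trial.append(v)
            trial_score = fitness(trial)
            # Greedy selection: the trial replaces its parent if it is no worse.
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=lambda k: scores[k])
    return pop[best]


if __name__ == "__main__":
    # Hypothetical toy fitness: squared distance to an arbitrary target weight vector.
    target = [0.2, 0.5, 0.3]
    weights = differential_evolution(
        lambda w: sum((wi - ti) ** 2 for wi, ti in zip(w, target)), dim=3
    )
    # Three hypothetical embedding-derived blocks (GloVe, Word2Vec, fastText stand-ins).
    fused = weighted_concat([[1.0, 2.0], [3.0], [4.0, 5.0]], weights)
    print(weights, fused)
```

Clipping mutant values to [0, 1] keeps each weight interpretable as a relative block importance; normalizing the weights to sum to one would be an equally plausible design choice.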

Subject(s)

DNA-Binding Proteins , Deep Learning , Neural Networks, Computer , Algorithms , Machine Learning , Computational Biology/methods

DeepDNAbP: A deep learning-based hybrid approach to improve the identification of deoxyribonucleic acid-binding proteins.

Hosen, Md Faruk; Mahmud, S M Hasan; Ahmed, Kawsar; Chen, Wenyu; Moni, Mohammad Ali; Deng, Hong-Wen; Shoombuatong, Watshara; Hasan, Md Mehedi.

Comput Biol Med ; 145: 105433, 2022 06.

Article in English | MEDLINE | ID: mdl-35378437

ABSTRACT

Accurate identification of DNA-binding proteins (DBPs) is critical for both understanding protein function and drug design. DBPs also play essential roles in a variety of biological activities, such as DNA replication, repair, transcription, and splicing. Because experimental identification of DBPs is time-consuming and sometimes biased, effective computational models that can accurately predict potential DBPs from sequence information are urgently needed. In this paper, we develop a novel predictor called DeepDNAbP that accurately predicts DBPs from sequences using a convolutional neural network (CNN) model. First, we apply three feature extraction methods, namely position-specific scoring matrix (PSSM), pseudo-amino acid composition (PseAAC), and tripeptide composition (TPC), to represent protein sequence patterns. Second, SHapley Additive exPlanations (SHAP) is employed to remove redundant and irrelevant features for predicting DBPs. Finally, the best features are fed to the CNN classifier to construct the DeepDNAbP model for identifying DBPs. The final DeepDNAbP predictor achieves superior prediction performance in k-fold cross-validation tests and outperforms existing DBP prediction methods. DeepDNAbP is poised to be a powerful computational resource for the prediction of DBPs. The web application and curated datasets in this study are freely available at: http://deepdbp.sblog360.blog/.
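Of the three DeepDNAbP descriptors, tripeptide composition (TPC) is the simplest to state: the relative frequency of each of the 20³ = 8000 possible tripeptides in a sequence. The sketch below follows common conventions (overlapping windows, normalization by L − 2, only the 20 standard residues). The function name `tripeptide_composition` and the choice to skip windows containing non-standard residues are assumptions for illustration, not details confirmed by the paper.

```python
from itertools import product
from typing import Dict, List

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20**3 = 8000 tripeptides in a fixed lexicographic order.
TRIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=3)]
_INDEX: Dict[str, int] = {tp: i for i, tp in enumerate(TRIPEPTIDES)}


def tripeptide_composition(seq: str) -> List[float]:
    """Return the 8000-dimensional tripeptide frequency vector of a protein sequence.

    Each entry is the count of one tripeptide over overlapping windows divided by
    the number of windows, len(seq) - 2. Windows containing a non-standard residue
    (e.g. X, B, U) are skipped, which is one of several possible conventions.
    """
    seq = seq.upper()
    vec = [0.0] * len(TRIPEPTIDES)
    n_windows = len(seq) - 2
    if n_windows <= 0:
        return vec  # too short to contain any tripeptide
    for i in range(n_windows):
        idx = _INDEX.get(seq[i:i + 3])
        if idx is not None:
            vec[idx] += 1.0
    return [c / n_windows for c in vec]


if __name__ == "__main__":
    vec = tripeptide_composition("MKTAYIAKQR")  # hypothetical example sequence
    nonzero = {TRIPEPTIDES[i]: v for i, v in enumerate(vec) if v > 0}
    print(len(vec), nonzero)
```

Because TPC is sparse and high-dimensional (8000 features for proteins that are often only a few hundred residues long), it is a natural candidate for the kind of SHAP-based feature pruning the abstract describes.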

Subject(s)

Deep Learning , Computational Biology/methods , DNA , DNA-Binding Proteins/chemistry , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , Neural Networks, Computer , Position-Specific Scoring Matrices
