Search | VHL Regional Portal

A Comprehensive Review on Machine Learning Techniques for Protein Family Prediction.

Idhaya, T; Suruliandi, A; Raja, S P.

Protein J ; 43(2): 171-186, 2024 Apr.

Article in English | MEDLINE | ID: mdl-38427271

ABSTRACT

Proteomics is a field dedicated to the analysis of proteins in cells, tissues, and organisms, aiming to gain insights into their structures, functions, and interactions. A crucial aspect within proteomics is protein family prediction, which involves identifying evolutionary relationships between proteins by examining similarities in their sequences or structures. This approach holds great potential for applications such as drug discovery and functional annotation of genomes. However, current methods for protein family prediction have certain limitations, including limited accuracy, high false positive rates, and challenges in handling large datasets. Some methods also rely on homologous sequences or protein structures, which introduce biases and restrict their applicability to specific protein families or structures. To overcome these limitations, researchers have turned to machine learning (ML) approaches that can identify connections between protein features and simplify complex high-dimensional datasets. This paper presents a comprehensive survey of articles that employ various ML techniques for predicting protein families. The primary objective is to explore and improve ML techniques specifically for protein family prediction, thus advancing future research in the field. Through qualitative and quantitative analyses of ML techniques, it is evident that multiple methods utilizing a range of classifiers have been applied for protein family prediction. However, there has been limited focus on developing novel classifiers for protein family classification, highlighting the urgent need for improved approaches in this area. By addressing these challenges, this research aims to enhance the accuracy and effectiveness of protein family prediction, ultimately facilitating advancements in proteomics and its diverse applications.

Subject(s)

Machine Learning , Proteins , Proteins/chemistry , Proteomics/methods , Databases, Protein , Computational Biology/methods , Humans

Drug-Protein Interactions Prediction Models Using Feature Selection and Classification Techniques.

Idhaya, T; Suruliandi, A; Raja, S P.

Curr Drug Metab ; 24(12): 817-834, 2023.

Article in English | MEDLINE | ID: mdl-38270152

ABSTRACT

BACKGROUND: Drug-Protein Interaction (DPI) identification is crucial in drug discovery. The high dimensionality of drug and protein features poses challenges for accurate interaction prediction, necessitating the use of computational techniques. Docking-based methods rely on 3D structures, while ligand-based methods have limitations such as reliance on known ligands and neglecting protein structure. Therefore, the preferred approach is the chemogenomics-based approach using machine learning, which considers both drug and protein characteristics for DPI prediction. METHODS: In machine learning, feature selection plays a vital role in improving model performance, reducing overfitting, enhancing interpretability, and making the learning process more efficient. It helps extract meaningful patterns from drug and protein data while eliminating irrelevant or redundant information, resulting in more effective machine-learning models. On the other hand, classification is of great importance as it enables pattern recognition, decision-making, predictive modeling, anomaly detection, data exploration, and automation. It empowers machines to make accurate predictions and facilitates efficient decision-making in DPI prediction. For this research work, protein data was sourced from the KEGG database, while drug data was obtained from the DrugBank database. RESULTS: To address the issue of imbalanced Drug-Protein Pairs (DPP), different balancing techniques like Random Over Sampling (ROS), Synthetic Minority Over-sampling Technique (SMOTE), and Adaptive SMOTE were employed. Given the large number of features associated with drugs and proteins, feature selection becomes necessary. Various feature selection methods were evaluated: Correlation, Information Gain (IG), Chi-Square (CS), and Relief. Multiple classification methods, including Support Vector Machines (SVM), Random Forest (RF), Adaboost, and Logistic Regression (LR), were used to predict DPI. Finally, this research identifies the best balancing, feature selection, and classification methods for accurate DPI prediction. CONCLUSION: This comprehensive approach aims to overcome the limitations of existing methods and provide more reliable and efficient predictions in drug-protein interaction studies.
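
As an illustration only, two of the steps named in the abstract, Random Over Sampling (ROS) and Information Gain (IG) feature ranking, can be sketched in plain Python. This is not the authors' implementation: the function names, the binary toy features, and the labels below are all hypothetical, and the IG computation assumes discrete feature values.

```python
import math
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(X, y, j):
    """IG of discrete feature j: H(y) - sum_v p(x_j = v) * H(y | x_j = v)."""
    total = len(y)
    groups = {}
    for x, lab in zip(X, y):
        groups.setdefault(x[j], []).append(lab)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(y) - conditional


def rank_features(X, y):
    """Return feature indices sorted from highest to lowest IG."""
    gains = [information_gain(X, y, j) for j in range(len(X[0]))]
    return sorted(range(len(gains)), key=lambda j: gains[j], reverse=True)


# Hypothetical toy data: each row is a made-up drug-protein pair with two
# binary features; label 1 = interacting (minority), 0 = non-interacting.
X = [[1, 0], [1, 1], [0, 0], [0, 1], [0, 0], [0, 1]]
y = [1, 1, 0, 0, 0, 0]

X_bal, y_bal = random_oversample(X, y)
ranking = rank_features(X_bal, y_bal)
```

In this toy data the first feature separates the two classes exactly, so it is ranked first after balancing. A classifier such as SVM, RF, AdaBoost, or LR would then be trained on the selected features; that step is omitted here.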

Subject(s)

Drug Discovery , Machine Learning , Humans , Databases, Factual , Drug Interactions
