Insights into performance evaluation of compound-protein interaction prediction methods.
Yaseen, Adiba; Amin, Imran; Akhter, Naeem; Ben-Hur, Asa; Minhas, Fayyaz.
  • Yaseen A; Department of Computer and Information Sciences (DCIS), Pakistan Institute of Engineering and Applied Sciences (PIEAS), Islamabad 45650, Pakistan.
  • Amin I; National Institute for Biotechnology and Genetic Engineering, Faisalabad 38000, Pakistan.
  • Akhter N; Department of Computer and Information Sciences (DCIS), Pakistan Institute of Engineering and Applied Sciences (PIEAS), Islamabad 45650, Pakistan.
  • Ben-Hur A; Department of Computer Science, Colorado State University, Fort Collins, CO 80523, USA.
  • Minhas F; Department of Computer Science, University of Warwick, Coventry CV4 7AL, UK.
Bioinformatics; 38(Supplement_2): ii75-ii81, 2022 Sep 16.
Article in English | MEDLINE | ID: covidwho-2037396
ABSTRACT
MOTIVATION:

Machine-learning-based prediction of compound-protein interactions (CPIs) is important for drug design, screening and repurposing. Despite numerous recent publications of increasingly sophisticated methods claiming consistent improvements in predictive accuracy, we have observed a number of fundamental issues in experiment design that produce overoptimistic estimates of model performance.

RESULTS:

We systematically analyze the impact of several factors that affect the generalization performance of CPI predictors but are overlooked in existing work: (i) similarity between training and test examples in cross-validation; (ii) the synthesis of negative examples in the absence of experimentally verified negatives; and (iii) the alignment of the evaluation protocol and performance metrics with the real-world use of CPI predictors in screening large compound libraries. Using both state-of-the-art approaches from other researchers and a simple kernel-based baseline, we found that effective assessment of the generalization performance of CPI predictors requires careful control over the similarity between training and test examples. We show that, under stringent performance assessment protocols, a simple kernel-based approach can exceed the predictive performance of existing state-of-the-art methods. We also show that generating synthetic negative examples by random pairing, for both training and performance evaluation, yields models that generalize better than those built with the more sophisticated strategies used in existing studies. Our analyses indicate that the proposed experiment design strategies can offer significant improvements in CPI prediction, enabling effective target compound screening for drug repurposing and the discovery of putative chemical ligands of the SARS-CoV-2 Spike and human ACE2 proteins.

AVAILABILITY AND IMPLEMENTATION:

Code and supplementary material are available at https://github.com/adibayaseen/HKRCPI.

SUPPLEMENTARY INFORMATION:

Supplementary data are available at Bioinformatics online.
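The abstract does not spell out the exact protocols, so the following is only a minimal Python sketch, under stated assumptions, of how factors (i) to (iii) listed under RESULTS might look in code: random pairing of compounds and proteins to synthesize negatives, a split that holds out whole clusters of similar proteins, and precision among the top-k ranked pairs as a screening-oriented metric. The function names, the protein-to-cluster mapping (assumed to come from some external sequence-similarity clustering), the toy IDs and the choice of precision@k are illustrative assumptions, not the authors' HKRCPI implementation.

```python
import random
from typing import Dict, List, Sequence, Set, Tuple

# A compound-protein pair: (compound_id, protein_id). IDs are hypothetical strings.
Pair = Tuple[str, str]


def random_pair_negatives(positives: Set[Pair], n: int, seed: int = 0) -> List[Pair]:
    """Synthesize n negatives by pairing a uniformly random compound with a
    uniformly random protein, rejecting known positives and duplicates."""
    compounds = sorted({c for c, _ in positives})
    proteins = sorted({p for _, p in positives})
    available = len(compounds) * len(proteins) - len(positives)
    if n > available:
        raise ValueError(f"requested {n} negatives but only {available} unlabeled pairs exist")
    rng = random.Random(seed)
    negatives: Set[Pair] = set()
    while len(negatives) < n:
        pair = (rng.choice(compounds), rng.choice(proteins))
        if pair not in positives:  # assumption: unlabeled pairs are treated as negative
            negatives.add(pair)
    return sorted(negatives)


def cluster_split(
    pairs: Sequence[Pair], protein_cluster: Dict[str, int], test_clusters: Set[int]
) -> Tuple[List[Pair], List[Pair]]:
    """Send every pair whose protein lies in a held-out cluster to the test set,
    so no test protein shares a similarity cluster with a training protein."""
    train: List[Pair] = []
    test: List[Pair] = []
    for compound, protein in pairs:
        target = test if protein_cluster[protein] in test_clusters else train
        target.append((compound, protein))
    return train, test


def precision_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Fraction of true interactions among the k highest-scoring pairs, mimicking
    a screen where only the top of a ranked compound library is followed up."""
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and the number of scored pairs")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k


if __name__ == "__main__":
    positives = {("c1", "pA"), ("c2", "pA"), ("c3", "pB"), ("c1", "pC")}
    negatives = random_pair_negatives(positives, n=4, seed=1)
    # Hypothetical clusters: pA and pB are "similar", pC is dissimilar to both.
    clusters = {"pA": 0, "pB": 0, "pC": 1}
    train, test = cluster_split(sorted(positives) + negatives, clusters, test_clusters={0})
    print("negatives:", negatives)
    print("train:", train)
    print("test:", test)
    print("precision@2:", precision_at_k([0.9, 0.1, 0.8, 0.3], [1, 0, 0, 1], k=2))
```

In this sketch, holding out clusters controls only protein-side similarity; a stricter protocol would also keep similar compounds out of the training set. Rejection sampling is the simplest way to draw random pairs and stays fast as long as known interactions cover only a small fraction of the compound-protein grid, which is typical of sparse CPI data.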
Subject(s)

Full text: Available
Collection: International databases
Database: MEDLINE
Main subject: Machine Learning / Angiotensin-Converting Enzyme 2
Type of study: Experimental Studies / Prognostic study / Randomized controlled trials / Reviews
Limits: Humans
Language: English
Journal: Bioinformatics
Journal subject: Medical Informatics
Year: 2022
Document Type: Article
Affiliation country: Bioinformatics