Chemoinformatics and Machine Learning Approaches for Identifying Antiviral Compounds.

John, Lijo; Soujanya, Yarasi; Mahanta, Hridoy Jyoti; Narahari Sastry, G

John L; Centre for Molecular Modeling, CSIR-Indian Institute of Chemical Technology, Tarnaka, Hyderabad, 500 007, India.
Soujanya Y; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, Uttar Pradesh, India.
Mahanta HJ; Centre for Molecular Modeling, CSIR-Indian Institute of Chemical Technology, Tarnaka, Hyderabad, 500 007, India.
Narahari Sastry G; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, Uttar Pradesh, India.

Mol Inform ; 41(4): e2100190, 2022 04.

Article in English | MEDLINE | ID: covidwho-1527453

ABSTRACT

Current pandemics have propelled research efforts in an unprecedented fashion, primarily triggering computational work on new vaccine and drug development as well as drug repurposing. There is an urgent need to design novel drugs with targeted biological activity and minimal adverse reactions that may be useful in managing viral outbreaks. Hence, an attempt has been made to develop Machine Learning based predictive models that can be used to assess whether or not a compound has the potential to be antiviral. To this end, a set of 2358 antiviral compounds, whose activity was reported as IC50 values, was compiled from the CAS COVID-19 antiviral SAR dataset. A total of 1157 two-dimensional molecular descriptors were computed, among which the most highly correlated descriptors were selected using Tree-based, Correlation-based and Mutual information-based feature selection methods. Seven Machine Learning algorithms, i.e., Random Forest, XGBoost, Support Vector Machine, KNN, Decision Tree, MLP Classifier and Logistic Regression, were benchmarked. The best performance was achieved by the models developed using the Random Forest and XGBoost algorithms across all the feature selection methods. The maximum predictive accuracy of both these models was 88% with internal validation, whereas with an external dataset a maximum accuracy of 93.10% for the XGBoost based model and 100% for the Random Forest based model was achieved. Furthermore, the study demonstrated scaffold analysis of the molecules as a pragmatic approach to explore the importance of structurally diverse compounds in data-driven studies.
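The abstract names correlation-based feature selection and reports accuracy, and the keywords list MCC (Matthews correlation coefficient). The following is a minimal, standard-library sketch of those two ideas, not the authors' implementation. It assumes that activity is encoded as a binary label (1 = active, 0 = inactive) and that descriptors are ranked by the absolute Pearson correlation with that label; the abstract states neither detail. The function names, descriptor names and toy values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def top_k_by_correlation(columns, labels, k):
    """Rank descriptor columns by |r| with the 0/1 label and return the top k names."""
    ranked = sorted(
        columns,
        key=lambda name: abs(pearson(columns[name], labels)),
        reverse=True,
    )
    return ranked[:k]


def accuracy_and_mcc(y_true, y_pred):
    """Accuracy and Matthews correlation coefficient for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


# Hypothetical toy data: three descriptors for six compounds.
labels = [1, 1, 1, 0, 0, 0]
descriptors = {
    "logp": [3.1, 2.8, 3.5, 1.0, 0.7, 1.2],
    "n_rings": [2, 3, 2, 3, 2, 3],
    "constant": [1, 1, 1, 1, 1, 1],
}
selected = top_k_by_correlation(descriptors, labels, k=2)
acc, mcc = accuracy_and_mcc(labels, [1, 1, 0, 0, 0, 1])
print(selected, round(acc, 3), round(mcc, 3))
```

MCC is worth reporting alongside accuracy because it uses all four cells of the confusion matrix, so it is less easily inflated by class imbalance than accuracy alone.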

Subject(s)

COVID-19; Cheminformatics; Antiviral Agents/pharmacology; Humans; Machine Learning; Support Vector Machine

Keywords

Antivirals; Chemoinformatics; Feature Selection; MCC; Machine Learning; Molecular Descriptors; SARS-COVID-19

Full text: Available
Collection: International databases
Database: MEDLINE
Main subject: Cheminformatics / COVID-19
Type of study: Prognostic study / Randomized controlled trials
Topics: Vaccines
Limits: Humans
Language: English
Journal: Mol Inform
Year: 2022
Document Type: Article
Affiliation country: Minf.202100190
