Search | VHL Regional Portal

Predicting the cytotoxicity of chemicals using ensemble learning methods and molecular fingerprints.

Yin, Zimo; Ai, Haixin; Zhang, Li; Ren, Guofei; Wang, Yuming; Zhao, Qi; Liu, Hongsheng.

J Appl Toxicol ; 39(10): 1366-1377, 2019 10.

Article in English | MEDLINE | ID: mdl-30763981

ABSTRACT

The prediction of compound cytotoxicity is an important part of the drug discovery process. However, predictive performance is usually poor because the datasets come from high-throughput screening and suffer from class imbalance. In this study, several strategies for performing a structure-activity relationship study of a cytotoxic endpoint in the AID364 dataset were explored to address the class-imbalance problem. Random forest adaboost was used as the base learners for 10 types of molecular fingerprints, and an ensemble method and six data-balancing methods were applied to balance the classes. As a result, the ensemble model using the MACCS fingerprint was found to be the best, giving an area under the curve of 85.2% ± 0.35%, sensitivity of 81.8% ± 0.65%, and specificity of 76.0% ± 0.12% in fivefold cross-validation and an area under the curve of 78.8%, sensitivity of 55.5%, and specificity of 78.5% in external validation. Good performance was also observed on other datasets with different sizes and degrees of imbalance. To explore the structural commonality of cytotoxic compounds, several substructures were identified as an important reference for substructure alerts. The results indicate that the proposed models are helpful in predicting the cytotoxicity of chemicals.
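The abstract does not name the six data-balancing methods, so the following is only an illustrative sketch of one common approach, random undersampling, and is not claimed to be one of the methods the authors used. The function name `random_undersample` and the toy compound identifiers are hypothetical.

```python
import random
from collections import defaultdict


def random_undersample(samples, labels, seed=0):
    """Downsample every class to the size of the smallest class.

    Assumption: plain random undersampling, shown only as an example
    of a data-balancing step; the abstract does not specify its methods.
    """
    rng = random.Random(seed)

    # Group the samples by class label.
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    # Keep the same number of samples from each class.
    n_min = min(len(xs) for xs in by_class.values())
    balanced = []
    for y, xs in by_class.items():
        for x in rng.sample(xs, n_min):
            balanced.append((x, y))

    rng.shuffle(balanced)
    return [x for x, _ in balanced], [y for _, y in balanced]


# Toy data: 2 "cytotoxic" (1) versus 8 "non-cytotoxic" (0) hypothetical compounds.
samples = [f"cmpd_{i}" for i in range(10)]
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
bal_x, bal_y = random_undersample(samples, labels)
```

After the call, `bal_x` holds four compounds: both minority-class compounds and two randomly chosen majority-class compounds. Undersampling discards data, which is one reason a study might compare several balancing strategies.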

Subject(s)

Carcinogens/classification , Carcinogens/toxicity , Drug Discovery/classification , Drug Discovery/methods , Machine Learning , Quantitative Structure-Activity Relationship , Algorithms , Humans

Predicting Drug-Induced Liver Injury Using Ensemble Learning Methods and Molecular Fingerprints.

Ai, Haixin; Chen, Wen; Zhang, Li; Huang, Liangchao; Yin, Zimo; Hu, Huan; Zhao, Qi; Zhao, Jian; Liu, Hongsheng.

Toxicol Sci ; 165(1): 100-107, 2018 09 01.

Article in English | MEDLINE | ID: mdl-29788510

ABSTRACT

Drug-induced liver injury (DILI) is a major safety concern in the drug-development process, and various methods have been proposed to predict the hepatotoxicity of compounds during the early stages of drug trials. In this study, we developed an ensemble model using 3 machine learning algorithms and 12 molecular fingerprints from a dataset containing 1241 diverse compounds. The ensemble model achieved an average accuracy of 71.1 ± 2.6%, sensitivity (SE) of 79.9 ± 3.6%, specificity (SP) of 60.3 ± 4.8%, and area under the receiver-operating characteristic curve (AUC) of 0.764 ± 0.026 in 5-fold cross-validation and an accuracy of 84.3%, SE of 86.9%, SP of 75.4%, and AUC of 0.904 in an external validation dataset of 286 compounds collected from the Liver Toxicity Knowledge Base. Compared with previous methods, the ensemble model achieved relatively high accuracy and SE. We also identified several substructures related to DILI. In addition, we provide a web server offering access to our models (http://ccsipb.lnu.edu.cn/toxicity/HepatoPred-EL/).
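The accuracy, sensitivity (SE), and specificity (SP) figures quoted above follow the standard confusion-matrix definitions. A minimal sketch of those definitions is shown below; the function name `classification_metrics` and the toy labels are hypothetical and are not data from the study.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, sensitivity, and specificity for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")

    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # fraction of positives recovered
        "specificity": tn / (tn + fp),  # fraction of negatives recovered
    }


# Toy example: 4 hypothetical toxic and 6 hypothetical non-toxic compounds.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = classification_metrics(y_true, y_pred)
```

Here three of the four positives and four of the six negatives are classified correctly, so the sensitivity is 3/4, the specificity is 4/6, and the accuracy is 7/10.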

Subject(s)

Chemical and Drug Induced Liver Injury/etiology , Drug Discovery/methods , Pharmaceutical Preparations/chemistry , Algorithms , Animals , Machine Learning , Quantitative Structure-Activity Relationship , ROC Curve , Sensitivity and Specificity

CarcinoPred-EL: Novel models for predicting the carcinogenicity of chemicals using molecular fingerprints and ensemble learning methods.

Zhang, Li; Ai, Haixin; Chen, Wen; Yin, Zimo; Hu, Huan; Zhu, Junfeng; Zhao, Jian; Zhao, Qi; Liu, Hongsheng.

Sci Rep ; 7(1): 2118, 2017 05 18.

Article in English | MEDLINE | ID: mdl-28522849

ABSTRACT

Carcinogenicity refers to a highly toxic end point of certain chemicals, and has become an important issue in the drug development process. In this study, three novel ensemble classification models, namely Ensemble SVM, Ensemble RF, and Ensemble XGBoost, were developed to predict carcinogenicity of chemicals using seven types of molecular fingerprints and three machine learning methods based on a dataset containing 1003 diverse compounds with rat carcinogenicity. Among these three models, Ensemble XGBoost is found to be the best, giving an average accuracy of 70.1 ± 2.9%, sensitivity of 67.0 ± 5.0%, and specificity of 73.1 ± 4.4% in five-fold cross-validation and an accuracy of 70.0%, sensitivity of 65.2%, and specificity of 76.5% in external validation. In comparison with some recent methods, the ensemble models outperform some machine learning-based approaches and yield equal accuracy and higher specificity but lower sensitivity than rule-based expert systems. It is also found that the ensemble models could be further improved if more data were available. As an application, the ensemble models are employed to discover potential carcinogens in the DrugBank database. The results indicate that the proposed models are helpful in predicting the carcinogenicity of chemicals. A web server called CarcinoPred-EL has been built for these models ( http://ccsipb.lnu.edu.cn/toxicity/CarcinoPred-EL/ ).
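The abstract does not state how the fingerprint-specific base models are combined into each ensemble. As a hedged illustration only, the sketch below assumes simple probability averaging (soft voting) with a 0.5 decision threshold; the function name `ensemble_predict` and all probability values are hypothetical.

```python
def ensemble_predict(prob_lists, threshold=0.5):
    """Average per-compound probabilities from several base models.

    Assumption: unweighted averaging (soft voting); the combination rule
    actually used by the ensemble models is not given in the abstract.
    """
    if not prob_lists:
        raise ValueError("at least one base model is required")
    n = len(prob_lists[0])
    if any(len(probs) != n for probs in prob_lists):
        raise ValueError("all base models must score the same compounds")

    # Column-wise mean: one averaged probability per compound.
    averaged = [sum(col) / len(col) for col in zip(*prob_lists)]
    predicted = [1 if p >= threshold else 0 for p in averaged]
    return averaged, predicted


# Hypothetical probabilities of carcinogenicity for three compounds,
# one list per base model (for example, one model per fingerprint type).
base_model_probs = [
    [0.9, 0.2, 0.6],
    [0.7, 0.4, 0.3],
    [0.8, 0.1, 0.5],
]
avg_probs, ensemble_labels = ensemble_predict(base_model_probs)
```

In this toy case only the first compound has an averaged probability above the threshold, so the ensemble labels it positive and the other two negative. Averaging tends to smooth out errors made by any single fingerprint-specific model, which is the usual motivation for such ensembles.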

Subject(s)

Carcinogens/toxicity , Machine Learning , Software , Animals , Carcinogens/chemistry , Quantitative Structure-Activity Relationship , Rats
