1.
RSC Med Chem ; 15(5): 1547-1555, 2024 May 22.
Article in English | MEDLINE | ID: mdl-38784468

ABSTRACT

Most machine learning (ML) methods produce predictions that are hard or impossible to understand. The black box nature of predictive models obscures potential learning bias and makes it difficult to recognize and trace problems. Moreover, the inability to rationalize model decisions causes reluctance to accept predictions for experimental design. For ML, limited trust in predictions presents a substantial problem and continues to limit its impact in interdisciplinary research, including early-phase drug discovery. As a desirable remedy, approaches from explainable artificial intelligence (XAI) are increasingly applied to shed light on the ML black box and help to rationalize predictions. Among these is the concept of counterfactuals (CFs), which are best understood as test cases with small modifications yielding opposing prediction outcomes (such as different class labels in object classification). For ML applications in medicinal chemistry, for example, compound activity predictions, CFs are particularly intuitive because these hypothetical molecules enable immediate comparisons with actual test compounds that do not require expert ML knowledge and are accessible to practicing chemists. Such comparisons often reveal structural moieties in compounds that determine their predictions and can be further investigated. Herein, we adapt and extend a recently introduced concept for the systematic generation of molecular CFs to multi-task predictions of different classes of protein kinase inhibitors, analyze CFs in detail, rationalize the origins of CF formation in multi-task modeling, and present exemplary explanations of predictions.
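
Illustrative note (not part of the cited abstract): the definition of a counterfactual given above, a test case with a small modification that yields the opposite prediction, can be sketched with a toy example. Everything here is an assumption made for illustration: the binary fingerprint, the hand-set weights in `toy_classifier`, and the single-bit-flip search are hypothetical and are not the authors' analogue-based method or their multi-task models.

```python
from typing import Callable, List, Tuple

Fingerprint = Tuple[int, ...]


def toy_classifier(fp: Fingerprint) -> int:
    """Hypothetical stand-in model: class 1 ('active') if the weighted bit sum reaches 3."""
    weights = (2, -1, 3, 0, 1)  # illustrative weights, not learned from any data
    score = sum(w * b for w, b in zip(weights, fp))
    return 1 if score >= 3 else 0


def single_flip_counterfactuals(
    fp: Fingerprint, predict: Callable[[Fingerprint], int]
) -> List[Tuple[int, Fingerprint]]:
    """Return (flipped position, variant) pairs whose predicted class differs from fp's."""
    original = predict(fp)
    found = []
    for i in range(len(fp)):
        # Flip exactly one bit: the smallest possible modification in this toy setting.
        variant = fp[:i] + (1 - fp[i],) + fp[i + 1:]
        if predict(variant) != original:
            found.append((i, variant))
    return found


test_instance = (1, 0, 1, 0, 0)  # weighted sum 5, so predicted class 1
counterfactuals = single_flip_counterfactuals(test_instance, toy_classifier)
print(counterfactuals)
```

In this toy setting, the positions returned point to the features that decide the prediction, which loosely mirrors how comparing a compound with its counterfactual can highlight a determining structural moiety.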

2.
J Cheminform ; 16(1): 55, 2024 May 22.
Article in English | MEDLINE | ID: mdl-38778425

ABSTRACT

Deep learning models adapted from natural language processing offer new opportunities for the prediction of active compounds via machine translation of sequential molecular data representations. For example, chemical language models are often derived for compound string transformation. Moreover, given the principal versatility of language models for translating different types of textual representations, off-the-beaten-path design tasks might be explored. In this work, we have investigated generative design of active compounds with desired potency from target sequence embeddings, representing a rather provoking prediction task. Therefore, a dual-component conditional language model was designed for learning from multimodal data. It comprised a protein language model component for generating target sequence embeddings and a conditional transformer for predicting new active compounds with desired potency. To this end, the designated "biochemical" language model was trained to learn mappings of combined protein sequence and compound potency value embeddings to corresponding compounds, fine-tuned on individual activity classes not encountered during model derivation, and evaluated on compound test sets that were structurally distinct from training sets. The biochemical language model correctly reproduced known compounds with different potency for all activity classes, providing proof-of-concept for the approach. Furthermore, the conditional model consistently reproduced larger numbers of known compounds as well as more potent compounds than an unconditional model, revealing a substantial effect of potency conditioning. The biochemical language model also generated structurally diverse candidate compounds departing from both fine-tuning and test compounds. Overall, generative compound design based on potency value-conditioned target sequence embeddings yielded promising results, rendering the approach attractive for further exploration and practical applications. 
SCIENTIFIC CONTRIBUTION: The approach introduced herein combines protein language model and chemical language model components, representing an advanced architecture, and is the first methodology for predicting compounds with desired potency from conditioned protein sequence data.

3.
Eur J Med Chem ; 273: 116522, 2024 May 23.
Article in English | MEDLINE | ID: mdl-38801799

ABSTRACT

The growing number of scientific papers and document sources underscores the need for methods capable of evaluating the quality of publications. Researchers who are looking for relevant papers for their studies need ways to assess the scientific value of these documents. One approach involves using semantic search engines that can automatically extract important knowledge from the growing body of text. In this study, we introduce a new metric called "MAATrica," which serves as the foundation for an innovative method designed to evaluate research papers. MAATrica offers a new way to analyze and categorize text, focusing on the consistency of research documents in the life sciences, particularly in the fields of medicinal and nutraceutical chemistry. This method utilizes semantic descriptions to cover in silico experiments, as well as in vitro and in vivo assays. Created to aid in evaluation processes like peer review, MAATrica uses toolkits and semantic applications to build the proposed measure, identify scientific entities, and gather information. We have applied MAATrica to roughly 90,000 papers and present our findings here.

4.
Eur J Med Chem ; 271: 116413, 2024 May 05.
Article in English | MEDLINE | ID: mdl-38636127

ABSTRACT

The continued growth of data from biological screening and medicinal chemistry provides opportunities for data-driven experimental design and decision making in early-phase drug discovery. Approaches adopted from data science help to integrate internal and public domain data and extract knowledge from historical in-house data. Protein kinase (PK) drug discovery is an exemplary area where large amounts of data are accumulating, providing a valuable knowledge base for discovery projects. Herein, the evolution of PK drug discovery and development of small molecular PK inhibitors (PKIs) is reviewed, highlighting milestone developments in the field and discussing exemplary studies providing a basis for increasing data orientation of PK discovery efforts.


Subject(s)
Drug Discovery, Protein Kinase Inhibitors, Protein Kinases, Protein Kinase Inhibitors/pharmacology, Protein Kinase Inhibitors/chemistry, Humans, Protein Kinases/metabolism, Protein Kinases/chemistry, Molecular Structure
5.
STAR Protoc ; 5(2): 103010, 2024 Apr 11.
Article in English | MEDLINE | ID: mdl-38607924

ABSTRACT

Shapley values from cooperative game theory are adapted for explaining machine learning predictions. For large feature sets used in machine learning, Shapley values are approximated. We present a protocol for two techniques for explaining support vector machine predictions with exact Shapley value computation. We detail the application of these algorithms and provide ready-to-use Python scripts and custom code. The final output of the protocol includes quantitative feature analysis and mapping of important features for visualization. For complete details on the use and execution of this protocol, please refer to Feldmann and Bajorath (1) and Mastropietro et al. (2).

6.
Biomolecules ; 14(3), 2024 Feb 21.
Article in English | MEDLINE | ID: mdl-38540679

ABSTRACT

Protein kinases (PKs) are involved in many intracellular signal transduction pathways through phosphorylation cascades and have become intensely investigated pharmaceutical targets over the past two decades. Inhibition of PKs using small-molecular inhibitors is a premier strategy for the treatment of diseases in different therapeutic areas that are caused by uncontrolled PK-mediated phosphorylation and aberrant signaling. Most PK inhibitors (PKIs) are directed against the ATP cofactor binding site that is largely conserved across the human kinome comprising 518 wild-type PKs (and many mutant forms). Hence, these PKIs often have varying degrees of multi-PK activity (promiscuity) that is also influenced by factors such as single-site mutations in the cofactor binding region, compound binding kinetics, and residence times. The promiscuity of PKIs is often, but not always, critically important for therapeutic efficacy through polypharmacology. Various in vitro and in vivo studies have also indicated that PKIs have the potential of interacting with additional targets other than PKs, and different secondary cellular targets of individual PKIs have been identified on a case-by-case basis. Given the strong interest in PKs as drug targets, a wealth of PKIs from medicinal chemistry and their activity data from many assays and biological screens have become publicly available over the years. On the basis of these data, for the first time, we conducted a systematic search for non-PK targets of PKIs across the human kinome. Starting from a pool of more than 155,000 curated human PKIs, our large-scale analysis confirmed secondary targets from diverse protein classes for 447 PKIs on the basis of high-confidence activity data. These PKIs were active against 390 human PKs, covering all kinase groups of the kinome, and 210 non-PK targets, which included other popular pharmaceutical targets as well as currently unclassified proteins. The target distribution and promiscuity of the 447 PKIs were determined, and different interaction profiles with PK and non-PK targets were identified. As a part of our study, the collection of PKIs with activity against non-PK targets and the associated information are made freely available.


Subject(s)
Protein Kinases, Signal Transduction, Humans, Protein Kinases/metabolism, Phosphorylation, Binding Sites, Pharmaceutical Preparations, Protein Kinase Inhibitors/pharmacology, Protein Kinase Inhibitors/chemistry
7.
Sci Rep ; 14(1): 6536, 2024 Mar 19.
Article in English | MEDLINE | ID: mdl-38503823

ABSTRACT

The assessment of prediction variance or uncertainty contributes to the evaluation of machine learning models. In molecular machine learning, uncertainty quantification is an evolving area of research where currently no standard approaches or general guidelines are available. We have carried out a detailed analysis of deep neural network variants and simple control models for compound potency prediction to study relationships between prediction accuracy and uncertainty. For comparably accurate predictions obtained with models of different complexity, highly variable prediction uncertainties were detected using different metrics. Furthermore, a strong dependence of prediction characteristics and uncertainties on potency levels of test compounds was observed, often leading to over- or under-confident model decisions with respect to the expected variance of predictions. Moreover, neural network models responded very differently to training set modifications. Taken together, our findings indicate that there is little, if any, correlation between compound potency prediction accuracy and uncertainty, especially for deep neural network models, when predictions are assessed on the basis of currently used metrics for uncertainty quantification.
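
Illustrative note (not part of the cited abstract): one common way to relate prediction accuracy to uncertainty is to take the spread of an ensemble's predictions as the uncertainty estimate and correlate it with the absolute prediction error. The sketch below assumes this ensemble-based estimate, which may differ from the metrics used in the study. The potency values and the helper names `ensemble_summary` and `pearson` are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def ensemble_summary(member_predictions: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """For each compound, return (mean prediction, standard deviation across ensemble members)."""
    per_compound = zip(*member_predictions)  # regroup: one tuple of member outputs per compound
    return [(mean(p), pstdev(p)) for p in per_compound]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences with non-zero variance."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical logarithmic potency predictions: three ensemble members, four test compounds.
members = [
    [6.0, 7.5, 8.0, 5.0],
    [6.2, 7.0, 8.4, 5.1],
    [5.8, 8.0, 7.6, 4.9],
]
observed = [6.5, 7.5, 9.0, 5.0]  # hypothetical experimental values

summary = ensemble_summary(members)
abs_errors = [abs(m - y) for (m, _), y in zip(summary, observed)]
uncertainties = [s for _, s in summary]
r = pearson(abs_errors, uncertainties)
print(f"correlation between absolute error and ensemble spread: {r:.2f}")
```

A coefficient near zero on real data would correspond to the situation described above, where the uncertainty estimate says little about how accurate a prediction is.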

8.
Mol Inform ; 43(1): e202300288, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38010610

ABSTRACT

In drug discovery, chemical language models (CLMs) originating from natural language processing offer new opportunities for molecular design. CLMs have been developed using recurrent neural network (RNN) or transformer architectures. For the predictive performance of RNN-based encoder-decoder frameworks and transformers, attention mechanisms play a central role. Among others, emerging application areas for CLMs include constrained generative modeling and the prediction of chemical reactions or drug-target interactions. Since CLMs are applicable to any compound or target data that can be presented in a sequential format and tokenized, mappings of different types of sequences can be learned. For example, active compounds can be predicted from protein sequence motifs. Novel off-the-beaten-path applications can also be considered. For example, analogue series from medicinal chemistry can be perceived and represented as chemical sequences and extended with new compounds using CLMs. Herein, methodological features of CLMs and different applications are discussed.


Subject(s)
Chemistry, Pharmaceutical; Drug Discovery; Electric Power Supplies; Models, Chemical; Natural Language Processing
9.
ChemMedChem ; 19(3): e202300586, 2024 Feb 01.
Article in English | MEDLINE | ID: mdl-37983655

ABSTRACT

The use of black box machine learning models whose decisions cannot be understood limits the acceptance of predictions in interdisciplinary research and camouflages artificial learning characteristics leading to predictions for other than anticipated reasons. Consequently, there is increasing interest in explainable artificial intelligence to rationalize predictions and uncover potential pitfalls. Among others, relevant approaches include feature attribution methods to identify molecular structures determining predictions and counterfactuals (CFs) or contrastive explanations. CFs are defined as variants of test instances with minimal modifications leading to opposing predictions. In medicinal chemistry, CFs have thus far received little attention, although they are particularly intuitive from a chemical perspective. We introduce a new methodology for the systematic generation of CFs that is centered on well-defined structural analogues of test compounds. The approach is transparent, computationally straightforward, and shown to provide a wealth of CFs for test sets. The method is made freely available.


Subject(s)
Artificial Intelligence; Machine Learning; Chemistry, Pharmaceutical; Recombination, Genetic
10.
Sci Rep ; 13(1): 19561, 2023 Nov 10.
Article in English | MEDLINE | ID: mdl-37949930

ABSTRACT

Machine learning (ML) algorithms are extensively used in pharmaceutical research. Most ML models have black-box character, thus preventing the interpretation of predictions. However, rationalizing model decisions is of critical importance if predictions should aid in experimental design. Accordingly, in interdisciplinary research, there is growing interest in explaining ML models. Methods devised for this purpose are a part of the explainable artificial intelligence (XAI) spectrum of approaches. In XAI, the Shapley value concept originating from cooperative game theory has become popular for identifying features determining predictions. The Shapley value concept has been adapted as a model-agnostic approach for explaining predictions. Since the computational time required for Shapley value calculations scales exponentially with the number of features used, local approximations such as Shapley additive explanations (SHAP) are usually required in ML. The support vector machine (SVM) algorithm is one of the most popular ML methods in pharmaceutical research and beyond. SVM models are often explained using SHAP. However, there is only limited correlation between SHAP and exact Shapley values, as previously demonstrated for SVM calculations using the Tanimoto kernel, which limits SVM model explanation. Since the Tanimoto kernel is a special kernel function mostly applied for assessing chemical similarity, we have developed the Shapley value-expressed radial basis function (SVERAD), a computationally efficient approach for the calculation of exact Shapley values for SVM models based upon radial basis function kernels that are widely applied in different areas. SVERAD is shown to produce meaningful explanations of SVM predictions.
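
Illustrative note (not part of the cited abstract): the exponential scaling mentioned above follows from the definition of the Shapley value, which averages a feature's marginal contribution over all subsets of the remaining features. The brute-force sketch below shows that definition only. It is not SVERAD or SHAP, and `toy_value` is a hypothetical value function chosen so that the result is easy to check by hand.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List


def exact_shapley(n: int, value: Callable[[FrozenSet[int]], float]) -> List[float]:
    """Exact Shapley values for n features by enumerating every coalition.

    Each feature requires 2**(n - 1) coalitions, so the cost grows exponentially with n.
    """
    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_value(coalition: FrozenSet[int]) -> float:
    """Hypothetical value function: features 0 and 1 contribute 4 only together, feature 2 adds 1."""
    v = 0.0
    if 0 in coalition and 1 in coalition:
        v += 4.0
    if 2 in coalition:
        v += 1.0
    return v


shapley = exact_shapley(3, toy_value)
print(shapley)
```

The two interacting features share their joint contribution equally, and the values sum to the difference between the value of the full feature set and that of the empty set. With three features the enumeration is trivial, but a fingerprint with thousands of bits makes it infeasible, which is why approximations or kernel-specific exact methods are needed.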

11.
J Chem Inf Model ; 63(22): 7032-7044, 2023 Nov 27.
Article in English | MEDLINE | ID: mdl-37943257

ABSTRACT

Potency predictions are popular in compound design and optimization but are complicated by intrinsic limitations. Moreover, even for nonlinear methods, activity cliffs (ACs, formed by structural analogues with large potency differences) represent challenging test cases for compound potency predictions. We have devised a new test system for potency predictions, including AC compounds, that is based on partitioned matched molecular pairs (MMP) and makes it possible to monitor prediction accuracy at the level of analogue pairs with increasing potency differences. The results of systematic predictions using different machine learning and control methods on MMP-based data sets revealed increasing prediction errors when potency differences between corresponding training and test compounds increased, including large prediction errors for AC compounds. At the global level, these prediction errors were not apparent due to the statistical dominance of analogue pairs with small potency differences. Test compounds from such pairs were accurately predicted and determined the observed global prediction accuracy. Shapley value analysis, an explainable artificial intelligence approach, was applied to identify structural features determining potency predictions using different methods. The analysis revealed that numerical predictions of different regression models were determined by features that were shared by MMP partner compounds or absent in these compounds, with opposing effects. These findings provided another rationale for accurate predictions of similar potency values for structural analogues and failures in predicting the potency of AC compounds.


Subject(s)
Artificial Intelligence, Machine Learning, Structure-Activity Relationship
12.
Sci Rep ; 13(1): 17816, 2023 Oct 19.
Article in English | MEDLINE | ID: mdl-37857835

ABSTRACT

Compound potency predictions play a major role in computational drug discovery. Predictive methods are typically evaluated and compared in benchmark calculations that are widely applied. Previous studies have revealed intrinsic limitations of potency prediction benchmarks including very similar performance of increasingly complex machine learning methods and simple controls and narrow error margins separating machine learning from randomized predictions. However, origins of these limitations are currently unknown. We have carried out an in-depth analysis of potential reasons leading to artificial outcomes of potency predictions using different methods. Potency predictions on activity classes typically used in benchmark settings were found to be determined by compounds with intermediate potency close to median values of the compound data sets. The potency of these compounds was consistently predicted with high accuracy, without the need for learning, which dominated the results of benchmark calculations, regardless of the activity classes used. Taken together, our findings provide a clear rationale for general limitations of compound potency benchmark predictions and a basis for the design of alternative test systems for methodological comparisons.


Subject(s)
Drug Discovery, Machine Learning
13.
Sci Rep ; 13(1): 16145, 2023 Sep 26.
Article in English | MEDLINE | ID: mdl-37752164

ABSTRACT

For many machine learning applications in drug discovery, only limited amounts of training data are available. This typically applies to compound design and activity prediction and often restricts machine learning, especially deep learning. For low-data applications, specialized learning strategies can be considered to limit required training data. Among these is meta-learning that attempts to enable learning in low-data regimes by combining outputs of different models and utilizing meta-data from these predictions. However, in drug discovery settings, meta-learning is still in its infancy. In this study, we have explored meta-learning for the prediction of potent compounds via generative design using transformer models. For different activity classes, meta-learning models were derived to predict highly potent compounds from weakly potent templates in the presence of varying amounts of fine-tuning data and compared to other transformers developed for this task. Meta-learning consistently led to statistically significant improvements in model performance, in particular, when fine-tuning data were limited. Moreover, meta-learning models generated target compounds with higher potency and larger potency differences between templates and targets than other transformers, indicating their potential for low-data compound design.


Subject(s)
Drug Discovery, Electric Power Supplies, Machine Learning
14.
Future Sci OA ; 9(9): FSO892, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37752915

ABSTRACT

Aim: Generation of high-quality data sets of protein kinase inhibitors (PKIs). Methodology: Publicly available PKIs with reliable activity data were curated. PKIs with very weak activity were classified as inactive. Analogue series and PKIs containing reactive groups (warheads) enabling covalent inhibition were systematically identified. Exemplary results & data: A total of 155,579 human and 3057 mouse PKIs were obtained. Human PKIs were active against 440 kinases and included 13,949 covalent PKIs. The collection of qualifying PKIs and corresponding inactive compounds is made available as an open access deposition. Limitations & next steps: Potential limitations include activity data incompleteness and assay variance. The data set can be used to investigate PKIs with alternative modes of action and calibrate computational methods.


Protein kinases are proteins that play a role in how cells grow. In cancer cells, protein kinases are altered, which can cause abnormal growth. Protein kinase inhibitors (PKIs) specifically target protein kinases and are considered for treating different diseases, like cancer. In this study, we investigated a large number of PKIs that are available to the public to find ones with reliable activity data. We aim to understand how their structure affects their activity, including how these compounds bind to protein kinases. This helps us to identify different types of PKIs. Understanding PKIs is important for both basic research in the protein kinase field and drug discovery.

15.
J Chem Inf Model ; 63(18): 5916-5926, 2023 Sep 25.
Article in English | MEDLINE | ID: mdl-37675493

ABSTRACT

The endocannabinoid system, which includes cannabinoid receptor 1 and 2 subtypes (CB1R and CB2R, respectively), is responsible for the onset of various pathologies including neurodegeneration, cancer, neuropathic and inflammatory pain, obesity, and inflammatory bowel disease. Given the high similarity of CB1R and CB2R, generating subtype-selective ligands is still an open challenge. In this work, the Cannabinoid Iterative Revaluation for Classification and Explanation (CIRCE) compound prediction platform has been generated based on explainable machine learning to support the design of selective CB1R and CB2R ligands. Multilayer classifiers were combined with Shapley value analysis to facilitate explainable predictions. In test calculations, CIRCE predictions reached ∼80% accuracy and structural features determining ligand predictions were rationalized. CIRCE was designed as a web-based prediction platform that is made freely available as a part of our study.


Subject(s)
Internet; Machine Learning; Ligands; Receptors, Cannabinoid
16.
Molecules ; 28(15), 2023 Aug 01.
Article in English | MEDLINE | ID: mdl-37570774

ABSTRACT

In drug discovery, protein kinase inhibitors (PKIs) are intensely investigated as drug candidates in different therapeutic areas. While ATP site-directed, non-covalent PKIs have long been a focal point in protein kinase (PK) drug discovery, in recent years, there has been increasing interest in allosteric PKIs (APKIs), which are expected to have high kinase selectivity. In addition, as compounds acting by covalent mechanisms experience a renaissance in drug discovery, there is also increasing interest in covalent PKIs (CPKIs). There are various reasons for this increasing interest such as the anticipated high potency, prolonged residence times compared to non-competitive PKIs, and other favorable pharmacokinetic properties. Due to the popularity of PKIs for therapeutic intervention, large numbers of PKIs and large volumes of activity data have accumulated in the public domain, providing a basis for large-scale computational analysis. We have systematically searched for CPKIs containing different reactive groups (warheads) and investigated their potency and promiscuity (multi-PK activity) on the basis of carefully curated activity data. For seven different warheads, sufficiently large numbers of CPKIs were available for detailed follow-up analysis. For only three warheads, the median potency of corresponding CPKIs was significantly higher than that of non-covalent PKIs. Nevertheless, for CPKIs with five of seven warheads, there was a significant increase in the median potency of at least 100-fold compared to PKI analogues without warheads. In the analysis of multi-PK activity, however, there was no general increase in the promiscuity of CPKIs compared to non-covalent PKIs. In addition, we have identified 29 new APKIs in X-ray structures of PK-PKI complexes. Among structurally characterized APKIs, 13 covalent APKIs in complexes with five PKs are currently available, enabling structure-based investigation of PK inhibition by covalent-allosteric mechanisms.


Subject(s)
Protein Kinase Inhibitors, Protein Kinases, Protein Kinase Inhibitors/pharmacology, Phosphorylation, Drug Discovery
17.
Molecules ; 28(14), 2023 Jul 24.
Article in English | MEDLINE | ID: mdl-37513472

ABSTRACT

Most machine learning (ML) models produce black box predictions that are difficult, if not impossible, to understand. In pharmaceutical research, black box predictions work against the acceptance of ML models for guiding experimental work. Hence, there is increasing interest in approaches for explainable ML, which is a part of explainable artificial intelligence (XAI), to better understand prediction outcomes. Herein, we have devised a test system for the rationalization of multiclass compound activity prediction models that combines two approaches from XAI for feature relevance or importance analysis, including counterfactuals (CFs) and Shapley additive explanations (SHAP). For compounds with different single- and dual-target activities, we identified small compound modifications that induce feature changes inverting class label predictions. In combination with feature mapping, CFs and SHAP value calculations provide chemically intuitive explanations for model decisions.

18.
Biomolecules ; 13(5), 2023 May 13.
Article in English | MEDLINE | ID: mdl-37238703

ABSTRACT

In drug design, the prediction of new active compounds from protein sequence data has only been attempted in a few studies thus far. This prediction task is principally challenging because global protein sequence similarity has strong evolutionary and structural implications, but is often only vaguely related to ligand binding. Deep language models adapted from natural language processing offer new opportunities to attempt such predictions via machine translation by directly relating amino acid sequences and chemical structures to each other based on textual molecular representations. Herein, we introduce a biochemical language model with transformer architecture for the prediction of new active compounds from sequence motifs of ligand binding sites. In a proof-of-concept application on inhibitors of more than 200 human kinases, the Motif2Mol model revealed promising learning characteristics and an unprecedented ability to consistently reproduce known inhibitors of different kinases.


Subject(s)
Proteins, Humans, Protein Binding, Ligands, Binding Sites, Proteins/chemistry, Amino Acid Sequence
19.
J Med Chem ; 66(11): 7304-7330, 2023 Jun 08.
Article in English | MEDLINE | ID: mdl-37226670

ABSTRACT

The ATM kinase is a promising target in cancer treatment as an important regulator of the cellular response to DNA double-strand breaks. In this work, we present a new class of specific benzimidazole-based ATM inhibitors with picomolar potency against the isolated enzyme and favorable selectivity within related PIKK and PI3K kinases. We could identify two promising inhibitor subgroups with significantly different physicochemical properties, which we developed simultaneously. These efforts led to numerous highly active inhibitors with picomolar enzymatic activities. Furthermore, initial low cellular activities on A549 cells could be increased significantly in numerous examples, resulting in cellular IC50 values in the subnanomolar range. Further characterization of the highly potent inhibitors 90 and 93 revealed promising pharmacokinetic properties and strong activities in organoids in combination with etoposide. Additionally, 93 showed no off-target activities within a kinome-representative mini kinase panel, with favorable selectivities within the PIKK and PI3K families.


Subject(s)
Benzimidazoles, Pyridines, Humans, Phosphoinositide-3 Kinase Inhibitors/pharmacology, Etoposide, Pyridines/pharmacology, Benzimidazoles/pharmacology, Phosphatidylinositol 3-Kinases/metabolism, Protein Kinase Inhibitors/pharmacology, Protein Kinase Inhibitors/chemistry, Ataxia Telangiectasia Mutated Proteins
20.
Sci Rep ; 13(1): 7412, 2023 May 07.
Article in English | MEDLINE | ID: mdl-37150793

ABSTRACT

Compound potency prediction is a major task in medicinal chemistry and drug design. Inspired by the concept of activity cliffs (which encode large differences in potency between similar active compounds), we have devised a new methodology for predicting potent compounds from weakly potent input molecules. Therefore, a chemical language model was implemented consisting of a conditional transformer architecture for compound design guided by observed potency differences. The model was evaluated using a newly generated compound test system enabling a rigorous assessment of its performance. It was shown to predict known potent compounds from different activity classes not encountered during training. Moreover, the model was capable of creating highly potent compounds that were structurally distinct from input molecules. It also produced many novel candidate compounds not included in test sets. Taken together, the findings confirmed the ability of the new methodology to generate structurally diverse highly potent compounds.


Subject(s)
Drug Design; Models, Chemical; Structure-Activity Relationship; Chemistry, Pharmaceutical/methods