Results 1 - 20 of 298
1.
Mol Inform ; : e202400186, 2024 Oct 10.
Article in English | MEDLINE | ID: mdl-39390672

ABSTRACT

Herein we report a virtual library of 1E+60 members, a common estimate for the size of the drug-like chemical space. The library consists of linear or cyclic oligomers forming molecules within the size range of peptide drugs. We demonstrate ligand-based virtual screening using a genetic algorithm.

2.
Toxicol Sci ; 2024 Sep 10.
Article in English | MEDLINE | ID: mdl-39254655

ABSTRACT

Peptides have emerged as promising therapeutic agents. However, their potential is hindered by hemotoxicity. Understanding the hemotoxicity of peptides is crucial for developing safe and effective peptide-based therapeutics. Here, we employed chemical space complex networks (CSNs) to unravel the hemotoxicity tapestry of peptides. CSNs are powerful tools for visualizing and analyzing the relationships between peptides based on their physicochemical properties and structural features. We constructed CSNs from the StarPepDB database, encompassing 2004 hemolytic peptides, and explored the impact of seven different (dis)similarity measures on network topology and cluster (communities) distribution. Our findings revealed that each CSN extracts orthogonal information, enhancing the motif discovery and enrichment process. We identified 12 consensus hemolytic motifs, whose amino acid composition unveiled a high abundance of lysine, leucine, and valine residues, while aspartic acid, methionine, histidine, asparagine and glutamine were depleted. Additionally, physicochemical properties were used to characterize clusters/communities of hemolytic peptides. To predict hemolytic activity directly from peptide sequences, we constructed multi-query similarity searching models (MQSSMs), which outperformed cutting-edge machine learning (ML)-based models, demonstrating robust hemotoxicity prediction capabilities. Overall, this novel in silico approach uses complex network science as its central strategy to develop robust model classifiers, to characterize the chemical space and to discover new motifs from hemolytic peptides. This will help to enhance the design/selection of peptides with potential therapeutic activity and low toxicity.
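As a rough illustration of the two ideas in this abstract, the sketch below builds a similarity network over peptide sequences, takes its connected components as a crude stand-in for communities, and classifies a query by multi-query similarity searching. Everything in it is an assumption made for illustration: the sequences are invented, `difflib.SequenceMatcher` stands in for the seven (dis)similarity measures of the study, and the function names and the 0.8 cutoff are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Sequence similarity in [0, 1]; a stand-in for the study's measures."""
    return SequenceMatcher(None, a, b).ratio()


def build_network(peptides, cutoff):
    """Connect every pair of peptides whose similarity reaches the cutoff."""
    edges = {p: set() for p in peptides}
    for i, p in enumerate(peptides):
        for q in peptides[i + 1:]:
            if similarity(p, q) >= cutoff:
                edges[p].add(q)
                edges[q].add(p)
    return edges


def components(edges):
    """Connected components of the network (depth-first search)."""
    seen, result = set(), []
    for start in edges:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            node = stack.pop()
            comp.append(node)
            for neighbour in edges[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        result.append(sorted(comp))
    return result


def multi_query_predict(query, references, cutoff):
    """Flag the query if it is similar enough to any reference peptide."""
    return any(similarity(query, ref) >= cutoff for ref in references)


# Invented sequences, used only to exercise the functions.
peptides = ["KLLKLLK", "KLLKLLR", "DDEEDDE"]
network = build_network(peptides, cutoff=0.8)
print(components(network))
print(multi_query_predict("KLLKLLKK", ["KLLKLLK"], cutoff=0.8))
```

The pairwise loop is quadratic in the number of peptides, which is acceptable for a few thousand sequences but would need a different strategy for much larger sets.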

3.
Theory Biosci ; 2024 Sep 11.
Article in English | MEDLINE | ID: mdl-39259256

ABSTRACT

In an effort to expand the domain of mathematical chemistry and inspire research beyond the realms of graph theory and quantum chemistry, we explore five mathematical chemistry spaces and their interconnectedness. These spaces comprise the chemical space, which encompasses substances and reactions; the space of reaction conditions, spanning the physical and chemical aspects involved in chemical reactions; the space of reaction grammars, which encapsulates the rules for creating and breaking chemical bonds; the space of substance properties, covering all documented measurements regarding substances; and the space of substance representations, composed of the various ontologies for characterising substances.

4.
Expert Opin Drug Discov ; 19(10): 1173-1183, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39132881

ABSTRACT

INTRODUCTION: For the past two decades, virtual screening (VS) has been an efficient hit finding approach for drug discovery. Today, billions of commercially accessible compounds are routinely screened, and many successful examples of VS have been reported. VS methods continue to evolve, including machine learning and physics-based methods. AREAS COVERED: The authors examine recent examples of VS in drug discovery and discuss prospective hit finding results from the critical assessment of computational hit-finding experiments (CACHE) challenge. The authors also highlight the cost considerations and open-source options for conducting VS and examine chemical space coverage and library selections for VS. EXPERT OPINION: The advancement of sophisticated VS approaches, including the use of machine learning techniques and increased computer resources as well as the ease of access to synthetically available chemical spaces, and commercial and open-source VS platforms allow for interrogating ultra-large libraries (ULL) of billions of molecules. An impressive number of prospective ULL VS campaigns have generated potent and structurally novel hits across many target classes. Nonetheless, many successful contemporary VS approaches still use considerably smaller focused libraries. This apparent dichotomy illustrates that VS is best conducted in a fit-for-purpose way choosing an appropriate chemical space. Better methods need to be developed to tackle more challenging targets.


Subject(s)
Drug Discovery , Machine Learning , Small Molecule Libraries , Drug Discovery/methods , Humans , Drug Design , High-Throughput Screening Assays/methods
5.
J Cheminform ; 16(1): 91, 2024 Aug 02.
Article in English | MEDLINE | ID: mdl-39095893

ABSTRACT

Data scarcity is one of the most critical issues impeding the development of prediction models for chemical effects. Multitask learning algorithms leveraging knowledge from relevant tasks showed potential for dealing with tasks with limited data. However, current multitask methods mainly focus on learning from datasets whose task labels are available for most of the training samples. Since datasets were generated for different purposes with distinct chemical spaces, the conventional multitask learning methods may not be suitable. This study presents a novel multitask learning method MTForestNet that can deal with data scarcity problems and learn from tasks with distinct chemical space. The MTForestNet consists of nodes of random forest classifiers organized in the form of a progressive network, where each node represents a random forest model learned from a specific task. To demonstrate the effectiveness of the MTForestNet, 48 zebrafish toxicity datasets were collected and utilized as an example. Among them, two tasks are very different from other tasks with only 1.3% common chemicals shared with other tasks. In an independent test, MTForestNet with a high area under the receiver operating characteristic curve (AUC) value of 0.911 provided superior performance over compared single-task and multitask methods. The overall toxicity derived from the developed models of zebrafish toxicity is well correlated with the experimentally determined overall toxicity. In addition, the outputs from the developed models of zebrafish toxicity can be utilized as features to boost the prediction of developmental toxicity. 
The developed models are effective for predicting zebrafish toxicity, and the proposed MTForestNet is expected to be useful for other tasks with distinct chemical spaces. Scientific contribution: A novel multitask learning algorithm, MTForestNet, was proposed to address the challenge of developing models from datasets with distinct chemical spaces, a common issue in cheminformatics tasks. As an example, zebrafish toxicity prediction models were developed using the proposed MTForestNet, which provides superior performance over conventional single-task and multitask learning methods. In addition, the developed zebrafish toxicity prediction models can reduce animal testing.

6.
bioRxiv ; 2024 Aug 10.
Article in English | MEDLINE | ID: mdl-39149242

ABSTRACT

The widespread use of Machine Learning (ML) techniques in chemical applications has come with the pressing need to analyze extremely large molecular libraries. In particular, clustering remains one of the most common tools to dissect the chemical space. Unfortunately, most current approaches present unfavorable time and memory scaling, which makes them unsuitable to handle million- and billion-sized sets. Here, we propose to bypass these problems with a time- and memory-efficient clustering algorithm, BitBIRCH. This method uses a tree structure similar to the one found in the Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) algorithm to ensure O(N) time scaling. BitBIRCH leverages the instant similarity (iSIM) formalism to process binary fingerprints, allowing the use of Tanimoto similarity, and reducing memory requirements. Our tests show that BitBIRCH is already > 1,000 times faster than standard implementations of the Taylor-Butina clustering for libraries with 1,500,000 molecules. BitBIRCH increases efficiency without compromising the quality of the resulting clusters. We explore strategies to handle large sets, which we applied in the clustering of one billion molecules under 5 hours using a parallel/iterative BitBIRCH approximation.
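To make the scaling problem concrete, here is a minimal sketch of the baseline the abstract compares against, not of BitBIRCH itself: Tanimoto similarity on binary fingerprints followed by a simplified Taylor-Butina style clustering. The fingerprints are stored as Python integers, the toy bit patterns are invented, and the function names and the tie-breaking rule are assumptions of this sketch.

```python
def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two binary fingerprints stored as ints."""
    union = bin(a | b).count("1")
    if union == 0:
        return 1.0  # assumed convention: two empty fingerprints are identical
    return bin(a & b).count("1") / union


def butina_clusters(fps, threshold):
    """Simplified Taylor-Butina clustering.

    Builds all pairwise neighbour lists (the O(N^2) step in time and
    memory), then repeatedly picks the unassigned fingerprint with the
    most unassigned neighbours as a centroid and removes its cluster.
    """
    n = len(fps)
    neighbours = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if tanimoto(fps[i], fps[j]) >= threshold:
                neighbours[i].add(j)
                neighbours[j].add(i)

    unassigned = set(range(n))
    clusters = []
    while unassigned:
        # Ties are broken by the lowest index (an arbitrary choice here).
        centroid = max(sorted(unassigned),
                       key=lambda i: len(neighbours[i] & unassigned))
        members = [centroid] + sorted(neighbours[centroid] & unassigned)
        clusters.append(members)
        unassigned -= set(members)
    return clusters


# Invented 6-bit fingerprints, used only to exercise the functions.
fingerprints = [0b001111, 0b001110, 0b000111, 0b110000, 0b100000]
print(butina_clusters(fingerprints, threshold=0.7))
```

The nested loop over all pairs is what a tree-based method with O(N) scaling avoids: each new fingerprint is compared with a small number of cluster summaries rather than with every other molecule.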

7.
J Cheminform ; 16(1): 98, 2024 Aug 12.
Article in English | MEDLINE | ID: mdl-39129016

ABSTRACT

The exponential growth of data is challenging for humans because their ability to analyze data is limited. Especially in chemistry, there is a demand for tools that can visualize molecular datasets in a convenient graphical way. We propose a new, ready-to-use, multi-tool, and open-source framework for visualizing and navigating chemical space. This framework adheres to the low-code/no-code (LCNC) paradigm, providing a KNIME node, a web-based tool, and a Python package, making it accessible to a broad cheminformatics community. The core technique of the MolCompass framework employs a pre-trained parametric t-SNE model. We demonstrate how this framework can be adapted for the visualisation of chemical space and visual validation of binary classification QSAR/QSPR models, revealing their weaknesses and identifying model cliffs. All parts of the framework are publicly available on GitHub, providing accessibility to the broad scientific community. Scientific contribution: We provide an open-source, ready-to-use set of tools for the visualization of chemical space. These tools can be insightful for chemists to analyze compound datasets and for the visual validation of QSAR/QSPR models.

8.
ACS Synth Biol ; 13(8): 2271-2275, 2024 Aug 16.
Article in English | MEDLINE | ID: mdl-39148431

ABSTRACT

Protein synthesis methods have been adapted to incorporate an ever-growing level of non-natural components. Meanwhile, design of de novo protein structure and function has rapidly emerged as a viable capability. Yet, these two exciting trends have yet to intersect in a meaningful way. The ability to perform de novo design with non-proteinogenic components requires that synthesis and computation align on common targets and applications. This perspective examines the state of the art in these areas and identifies specific, consequential applications to advance the field toward generalized macromolecule design.


Subject(s)
Macromolecular Substances , Protein Engineering , Proteins , Proteins/chemistry , Proteins/metabolism , Macromolecular Substances/chemistry , Macromolecular Substances/metabolism , Protein Engineering/methods
9.
Drug Discov Today ; 29(9): 104133, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39103144

ABSTRACT

Deep generative models (GMs) have transformed the exploration of drug-like chemical space (CS) by generating novel molecules through complex, nontransparent processes, bypassing direct structural similarity. This review examines five key architectures for CS exploration: recurrent neural networks (RNNs), variational autoencoders (VAEs), generative adversarial networks (GANs), normalizing flows (NF), and Transformers. It discusses molecular representation choices, training strategies for focused CS exploration, evaluation criteria for CS coverage, and related challenges. Future directions include refining models, exploring new notations, improving benchmarks, and enhancing interpretability to better understand biologically relevant molecular properties.


Subject(s)
Neural Networks, Computer , Pharmaceutical Preparations/chemistry , Artificial Intelligence , Drug Discovery/methods , Humans
10.
Environ Sci Technol ; 58(29): 12784-12822, 2024 Jul 23.
Article in English | MEDLINE | ID: mdl-38984754

ABSTRACT

In the modern "omics" era, measurement of the human exposome is a critical missing link between genetic drivers and disease outcomes. High-resolution mass spectrometry (HRMS), routinely used in proteomics and metabolomics, has emerged as a leading technology to broadly profile chemical exposure agents and related biomolecules for accurate mass measurement, high sensitivity, rapid data acquisition, and increased resolution of chemical space. Non-targeted approaches are increasingly accessible, supporting a shift from conventional hypothesis-driven, quantitation-centric targeted analyses toward data-driven, hypothesis-generating chemical exposome-wide profiling. However, HRMS-based exposomics encounters unique challenges. New analytical and computational infrastructures are needed to expand the analysis coverage through streamlined, scalable, and harmonized workflows and data pipelines that permit longitudinal chemical exposome tracking, retrospective validation, and multi-omics integration for meaningful health-oriented inferences. In this article, we survey the literature on state-of-the-art HRMS-based technologies, review current analytical workflows and informatic pipelines, and provide an up-to-date reference on exposomic approaches for chemists, toxicologists, epidemiologists, care providers, and stakeholders in health sciences and medicine. We propose efforts to benchmark fit-for-purpose platforms for expanding coverage of chemical space, including gas/liquid chromatography-HRMS (GC-HRMS and LC-HRMS), and discuss opportunities, challenges, and strategies to advance the burgeoning field of the exposome.


Subject(s)
Mass Spectrometry , Humans , Mass Spectrometry/methods , Exposome , Metabolomics , Proteomics/methods , Environmental Exposure
11.
Brief Bioinform ; 25(4), 2024 May 23.
Article in English | MEDLINE | ID: mdl-38975893

ABSTRACT

The process of drug discovery is widely known to be lengthy and resource-intensive. Artificial Intelligence approaches bring hope for accelerating the identification of molecules with the necessary properties for drug development. Drug-likeness assessment is crucial for the virtual screening of candidate drugs. However, traditional methods like Quantitative Estimation of Drug-likeness (QED) struggle to distinguish between drug and non-drug molecules accurately. Additionally, some deep learning-based binary classification models heavily rely on selecting training negative sets. To address these challenges, we introduce a novel unsupervised learning framework called DrugMetric, an innovative framework for quantitatively assessing drug-likeness based on the chemical space distance. DrugMetric blends the powerful learning ability of variational autoencoders with the discriminative ability of the Gaussian Mixture Model. This synergy enables DrugMetric to identify significant differences in drug-likeness across different datasets effectively. Moreover, DrugMetric incorporates principles of ensemble learning to enhance its predictive capabilities. Upon testing over a variety of tasks and datasets, DrugMetric consistently showcases superior scoring and classification performance. It excels in quantifying drug-likeness and accurately distinguishing candidate drugs from non-drugs, surpassing traditional methods including QED. This work highlights DrugMetric as a practical tool for drug-likeness scoring, facilitating the acceleration of virtual drug screening, and has potential applications in other biochemical fields.


Subject(s)
Drug Discovery , Drug Discovery/methods , Pharmaceutical Preparations/chemistry , Pharmaceutical Preparations/classification , Algorithms , Deep Learning , Artificial Intelligence
12.
Mol Inform ; 43(7): e202400052, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38994633

ABSTRACT

Compound databases of natural products play a crucial role in drug discovery and development projects and have implications in other areas, such as food chemical research, ecology and metabolomics. Recently, we put together the first version of the Latin American Natural Product database (LANaPDB) as a collective effort of researchers from six countries to assemble a public and representative library of natural products in a geographical region with a large biodiversity. The present work aims to conduct a comparative and extensive profiling of the natural product-likeness of an updated version of LANaPDB and the ten individual compound databases that form part of LANaPDB. The natural product-likeness profile of the Latin American compound databases is contrasted with the profile of other major natural product databases in the public domain and a set of small-molecule drugs approved for clinical use. As part of the extensive characterization, we employed several chemoinformatics metrics of natural product likeness. The results of this study will capture the attention of the global community engaged in natural product databases, not only in Latin America but across the world.


Subject(s)
Biological Products , Biological Products/chemistry , Biological Products/pharmacology , Latin America , Small Molecule Libraries/pharmacology , Small Molecule Libraries/chemistry , Drug Discovery , Cheminformatics , Databases, Chemical
13.
Mol Inform ; 43(8): e202300316, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38979783

ABSTRACT

Computational exploration of chemical space is crucial in modern cheminformatics research for accelerating the discovery of new biologically active compounds. In this study, we present a detailed analysis of the chemical library of potential glucocorticoid receptor (GR) ligands generated by the molecular generator, Molpher. To generate the targeted GR library and construct the classification models, structures from the ChEMBL database as well as from the internal IMG library, which was experimentally screened for biological activity in the primary luciferase reporter cell assay, were utilized. The composition of the targeted GR ligand library was compared with a reference library that randomly samples chemical space. A random forest model was used to determine the biological activity of ligands, incorporating its applicability domain using conformal prediction. It was demonstrated that the GR library is significantly enriched with GR ligands compared to the random library. Furthermore, a prospective analysis demonstrated that Molpher successfully designed compounds, which were subsequently experimentally confirmed to be active on the GR. A collection of 34 potential new GR ligands was also identified. Moreover, an important contribution of this study is the establishment of a comprehensive workflow for evaluating computationally generated ligands, particularly those with potential activity against targets that are challenging to dock.


Subject(s)
Receptors, Glucocorticoid , Small Molecule Libraries , Receptors, Glucocorticoid/metabolism , Receptors, Glucocorticoid/chemistry , Ligands , Small Molecule Libraries/pharmacology , Small Molecule Libraries/chemistry , Humans
14.
Mol Divers ; 28(4): 2229-2244, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39020133

ABSTRACT

Helicobacter pylori is the main causative agent of gastric cancer, especially non-cardiac gastric cancers. This bacterium relies on urease, which produces large amounts of ammonia, to colonize the host. Herein, the study provides valuable insights into structural patterns driving urease inhibition for high-activity molecules designed via exploring known inhibitors. Firstly, an ensemble model that combines four machine learning approaches was devised to predict the inhibitory activity of novel compounds in an automated workflow (R² = 0.761). The dataset was characterized in terms of chemical space, including molecular scaffolds, clustering analysis, distribution of physicochemical properties, and activity cliffs. Through these analyses, the hydroxamic acid group and the benzene ring responsible for distinct activity were highlighted. Activity cliff pairs revealed that substituents of the benzene ring on hydroxamic acid derivatives are key structures for substantial activity enhancement. Moreover, 11 hydroxamic acid derivatives were designed, named mol1-11. Molecular dynamics simulations showed that mol9 stabilized the closed conformation of the active-site flap. The designed derivatives are expected to be promising drug candidates for Helicobacter pylori infection, which remains to be demonstrated in future in vitro, in vivo, and clinical studies.


Subject(s)
Drug Design , Enzyme Inhibitors , Helicobacter pylori , Hydroxamic Acids , Molecular Dynamics Simulation , Urease , Helicobacter pylori/enzymology , Helicobacter pylori/drug effects , Urease/antagonists & inhibitors , Urease/chemistry , Hydroxamic Acids/chemistry , Hydroxamic Acids/pharmacology , Enzyme Inhibitors/chemistry , Enzyme Inhibitors/pharmacology , Structure-Activity Relationship , Anti-Bacterial Agents/pharmacology , Anti-Bacterial Agents/chemistry
15.
J Cheminform ; 16(1): 87, 2024 Jul 29.
Article in English | MEDLINE | ID: mdl-39075547

ABSTRACT

MOTIVATION: Chemical space embedding methods are widely utilized in various research settings for dimensional reduction, clustering and effective visualization. The maps generated by the embedding process can provide valuable insight to medicinal chemists in terms of the relationships between structural, physicochemical and biological properties of compounds. However, these maps are known to be difficult to interpret, and the "landscape" on the map is prone to "rearrangement" when embedding different sets of compounds. RESULTS: In this study we present the Hilbert-Curve Assisted Space Embedding (HCASE) method which was designed to create maps by organizing structures according to a logic familiar to medicinal chemists. First, a chemical space is created with the help of a set of "reference scaffolds". These scaffolds are sorted according to the medicinal chemistry inspired Scaffold-Key algorithm found in prior art. Next, the ordered scaffolds are mapped to a line which is folded into a higher dimensional (here: 2D) space. The intricately folded line is referred to as a pseudo-Hilbert-Curve. The embedding of a compound happens by locating its most similar reference scaffold in the pseudo-Hilbert-Curve and assuming the respective position. Through a series of experiments, we demonstrate the properties of the maps generated by the HCASE method. Subjects of embeddings were compounds of the DrugBank and CANVASS libraries, and the chemical spaces were defined by scaffolds extracted from the ChEMBL database. SCIENTIFIC CONTRIBUTION: The novelty of HCASE method lies in generating robust and intuitive chemical space embeddings that are reflective of a medicinal chemist's reasoning, and the precedential use of space filling (Hilbert) curve in the process. AVAILABILITY: https://github.com/ncats/hcase.
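The folding step described above can be sketched with the standard index-to-coordinate conversion for a Hilbert curve. This is a simplified illustration, not the HCASE implementation: the scaffold ordering is taken as given, the similarity function is a caller-supplied placeholder, each scaffold occupies one consecutive curve position, and the names `hilbert_d2xy` and `embed` are hypothetical.

```python
def hilbert_d2xy(order: int, d: int):
    """Map position d along a Hilbert curve to (x, y) on a 2^order grid."""
    side = 1 << order
    if not 0 <= d < side * side:
        raise ValueError("d is outside the curve")
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate or flip the quadrant so consecutive cells stay adjacent.
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def embed(compound, ordered_scaffolds, similarity, order):
    """Place a compound at the grid cell of its most similar scaffold.

    `ordered_scaffolds` is assumed to be already sorted (for example by a
    scaffold key); `similarity(compound, scaffold)` is any scoring function.
    """
    if len(ordered_scaffolds) > 4 ** order:
        raise ValueError("curve too short for this many scaffolds")
    best = max(range(len(ordered_scaffolds)),
               key=lambda i: similarity(compound, ordered_scaffolds[i]))
    return hilbert_d2xy(order, best)


# Walk a 2x2 curve to show how the line is folded into the plane.
print([hilbert_d2xy(1, d) for d in range(4)])
```

Because the scaffold positions depend only on the fixed reference set, adding new compounds does not move existing ones, which is the robustness to "rearrangement" that the abstract emphasizes.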

16.
Mol Inform ; 43(8): e202400050, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38979846

ABSTRACT

The exploration of chemical space is a fundamental aspect of chemoinformatics, particularly when one explores a large compound data set to relate chemical structures with molecular properties. In this study, we extend our previous work on chemical space visualization at the pharmacophoric level. Instead of using conventional binary classification of affinity (active vs inactive), we introduce a refined approach that categorizes compounds into four distinct classes based on their activity levels: super active, very active, active, and inactive. This classification enriches the color scheme applied to pharmacophore space, where the color representation of a pharmacophore hypothesis is driven by the associated compounds. Using the BCR-ABL tyrosine kinase as a case study, we identified intriguing regions corresponding to pharmacophore activity discontinuities, providing valuable insights for structure-activity relationships analysis.


Subject(s)
Fusion Proteins, bcr-abl , Protein Kinase Inhibitors , Fusion Proteins, bcr-abl/antagonists & inhibitors , Fusion Proteins, bcr-abl/chemistry , Protein Kinase Inhibitors/chemistry , Protein Kinase Inhibitors/pharmacology , Structure-Activity Relationship , Humans , Cheminformatics/methods , Pharmacophore
17.
Pharmaceuticals (Basel) ; 17(6), 2024 Jun 06.
Article in English | MEDLINE | ID: mdl-38931408

ABSTRACT

This work examines the current landscape of drug discovery and development, with a particular focus on the chemical and pharmacological spaces. It emphasizes the importance of understanding these spaces to anticipate future trends in drug discovery. The use of cheminformatics and data analysis enabled in silico exploration of these spaces, allowing a perspective of drugs, approved drugs after 2020, and clinical candidates, which were extracted from the newly released ChEMBL34 (March 2024). This perspective on chemical and pharmacological spaces enables the identification of trends and areas to be occupied, thereby creating opportunities for more effective and targeted drug discovery and development strategies in the future.

18.
Environ Sci Technol ; 58(27): 12135-12146, 2024 Jul 09.
Article in English | MEDLINE | ID: mdl-38916220

ABSTRACT

Biosolids are a byproduct of wastewater treatment that can be beneficially applied to agricultural land as a fertilizer. While U.S. regulations limit metals and pathogens in biosolids intended for land applications, no organic contaminants are currently regulated. Novel techniques can aid in detection, evaluation, and prioritization of biosolid-associated organic contaminants (BOCs). For example, nontargeted analysis (NTA) can detect a broad range of chemicals, producing data sets representing thousands of measured analytes that can be combined with computational toxicological tools to support human and ecological hazard assessment and prioritization. We combined NTA with a computer-based tool from the U.S. EPA, the Cheminformatics Hazard Comparison Module (HCM), to identify and prioritize BOCs present in U.S. and Canadian biosolids (n = 16). Four-hundred fifty-one features were detected in at least 80% of samples, with identities of 92 compounds confirmed or assigned probable structures. These compounds were primarily categorized as endogenous compounds, pharmaceuticals, industrial chemicals, and fragrances. Examples of top prioritized compounds were p-cresol and chlorophene, based on human health end points, and fludioxonil and triclocarban, based on ecological health end points. Combining NTA results with hazard comparison data allowed us to prioritize compounds to be included in future studies of the environmental fate and transport of BOCs.


Subject(s)
Wastewater , Wastewater/chemistry , Environmental Monitoring/methods , Humans , Organic Chemicals/analysis
19.
SAR QSAR Environ Res ; 35(4): 325-342, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38690773

ABSTRACT

This study aims to comprehensively characterize 576 inhibitors targeting Spleen Tyrosine Kinase (SYK), a non-receptor tyrosine kinase primarily found in haematopoietic cells, with significant relevance to B-cell receptor function. The objective is to gain insights into the structural requirements essential for potent activity, with implications for various therapeutic applications. Through chemoinformatic analyses, we focus on exploring the chemical space, scaffold diversity, and structure-activity relationships (SAR). By leveraging ECFP4 and MACCS fingerprints, we elucidate the relationship between chemical compounds and visualize the network using RDKit and NetworkX platforms. Additionally, compound clustering and visualization of the associated chemical space aid in understanding overall diversity. The outcomes include identifying consensus diversity patterns to assess global chemical space diversity. Furthermore, incorporating pairwise activity differences enhances the activity landscape visualization, revealing heterogeneous SAR patterns. The dataset analysed in this work has three activity cliff generators (CHEMBL3415598, CHEMBL4780257, and CHEMBL3265037): compounds with high affinity for SYK that are very similar to analogues with reasonable potency differences. Overall, this study provides a critical analysis of SYK inhibitors, uncovering potential scaffolds and chemical moieties crucial for their activity, thereby advancing the understanding of their therapeutic potential.
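The notion of an activity cliff generator can be illustrated with a small pure-Python sketch: a pair of compounds forms a cliff when the two are structurally similar but differ strongly in potency, and a generator is a compound that takes part in several such pairs. The data below are invented, fingerprints are represented as plain sets of feature identifiers rather than ECFP4 or MACCS bits, and the thresholds and function names are assumptions of this sketch, not values from the study.

```python
from collections import Counter
from itertools import combinations


def tanimoto(a, b) -> float:
    """Tanimoto similarity of two fingerprints given as sets of features."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def activity_cliffs(compounds, min_similarity=0.5, min_delta=2.0):
    """Return (name_a, name_b, similarity, potency difference) per cliff.

    `compounds` maps a name to (feature set, potency such as pIC50).
    """
    cliffs = []
    for (name_a, (fp_a, act_a)), (name_b, (fp_b, act_b)) in combinations(
            compounds.items(), 2):
        sim = tanimoto(fp_a, fp_b)
        delta = abs(act_a - act_b)
        if sim >= min_similarity and delta >= min_delta:
            cliffs.append((name_a, name_b, sim, delta))
    return cliffs


def cliff_generators(cliffs):
    """Count how many cliff pairs each compound participates in."""
    counts = Counter()
    for name_a, name_b, _, _ in cliffs:
        counts[name_a] += 1
        counts[name_b] += 1
    return counts


# Invented compounds: (feature set, potency).
toy = {
    "cpd_A": (frozenset({1, 2, 3, 4}), 9.0),
    "cpd_B": (frozenset({1, 2, 3, 5}), 6.0),
    "cpd_C": (frozenset({1, 2, 3, 6}), 6.5),
    "cpd_D": (frozenset({7, 8, 9}), 8.5),
}
cliffs = activity_cliffs(toy)
print(cliff_generators(cliffs).most_common(1))
```

In this toy set the most potent compound has two close but much weaker analogues, so it is the only compound that appears in more than one cliff pair.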


Subject(s)
Protein Kinase Inhibitors , Syk Kinase , Syk Kinase/antagonists & inhibitors , Syk Kinase/metabolism , Protein Kinase Inhibitors/chemistry , Protein Kinase Inhibitors/pharmacology , Structure-Activity Relationship , Quantitative Structure-Activity Relationship
20.
J Cheminform ; 16(1): 53, 2024 May 13.
Article in English | MEDLINE | ID: mdl-38741153

ABSTRACT

Molecular fingerprints are indispensable tools in cheminformatics. However, stereochemistry is generally not considered, which is problematic for large molecules which are almost all chiral. Herein we report MAP4C, a chiral version of our previously reported fingerprint MAP4, which lists MinHashes computed from character strings containing the SMILES of all pairs of circular substructures up to a diameter of four bonds and the shortest topological distance between their central atoms. MAP4C includes the Cahn-Ingold-Prelog (CIP) annotation (R, S, r or s) whenever the chiral atom is the center of a circular substructure, a question mark for undefined stereocenters, and double bond cis-trans information if specified. MAP4C performs slightly better than the achiral MAP4, ECFP and AP fingerprints in non-stereoselective virtual screening benchmarks. Furthermore, MAP4C distinguishes between stereoisomers in chiral molecules from small molecule drugs to large natural products and peptides comprising thousands of diastereomers, with a degree of distinction smaller than between structural isomers and proportional to the number of chirality changes. Due to its excellent performance across diverse molecular classes and its ability to handle stereochemistry, MAP4C is recommended as a generally applicable chiral molecular fingerprint. SCIENTIFIC CONTRIBUTION: The ability of our chiral fingerprint MAP4C to handle stereoisomers from small molecules to large natural products and peptides is unprecedented and opens the way for cheminformatics to include stereochemistry as an important molecular parameter across all fields of molecular design.
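The MinHash idea underlying this family of fingerprints can be sketched in a few lines: each molecule is reduced to a set of strings, every hash function keeps only the minimum hash value over that set, and the fraction of matching minima estimates the Jaccard similarity of the two sets. The strings below are invented placeholders and do not reproduce the actual MAP4C string format; the salted SHA-1 hashing and the function names are assumptions of this sketch.

```python
import hashlib


def _hash(seed: int, item: str) -> int:
    """Deterministic 64-bit hash of a string, salted by the seed."""
    digest = hashlib.sha1(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(items, num_hashes=128):
    """One minimum per hash function over the set of strings."""
    items = set(items)
    if not items:
        raise ValueError("cannot MinHash an empty set")
    return [min(_hash(seed, item) for item in items)
            for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b) -> float:
    """Fraction of positions where the two signatures agree."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


# Invented substructure-pair strings; the "$R$" / "$S$" tags mimic how a
# stereo label changes a string and therefore the resulting signature.
mol_r = {"CC(N)C$R$|1|C(=O)O", "C(=O)O|2|CN", "CN|3|CC"}
mol_s = {"CC(N)C$S$|1|C(=O)O", "C(=O)O|2|CN", "CN|3|CC"}

sig_r = minhash_signature(mol_r)
sig_s = minhash_signature(mol_s)
print(estimated_jaccard(sig_r, sig_r), estimated_jaccard(sig_r, sig_s))
```

Because only the strings that contain a stereo label change between stereoisomers, most of the minima are typically shared, which is consistent with the abstract's observation that stereoisomers differ less than structural isomers.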
