Results 1 - 20 of 48
1.
J Chem Theory Comput ; 20(14): 6303-6315, 2024 Jul 23.
Article in English | MEDLINE | ID: mdl-38978294

ABSTRACT

Molecular dynamics (MD) simulations are ideally suited to describe conformational ensembles of biomolecules such as proteins and nucleic acids. Microsecond-long simulations are now routine, facilitated by the emergence of graphics processing units. Clustering, which groups objects based on structural similarity, is typically used to process ensembles, leading to different states, their populations, and the identification of representative structures. A popular pipeline combines hierarchical clustering with selection of the cluster centroid as the representative of each cluster. Here, we propose to improve on this approach by developing a module, Protein Retrieval via Integrative Molecular Ensembles (PRIME), that consists of tools to improve the prediction of the representative in the most populated cluster using extended continuous similarity. PRIME is integrated with our Molecular Dynamics Analysis with N-ary Clustering Ensembles (MDANCE) package and can be used as a postprocessing tool for arbitrary clustering algorithms, compatible with several MD suites. PRIME predictions produced structures that, when aligned to the experimental structure, were better superposed (lower RMSD). A further benefit of PRIME is its linear scaling, rather than the O(N²) scaling traditionally associated with comparisons of elements in a set.


Subjects
Algorithms, Molecular Dynamics Simulation, Proteins, Proteins/chemistry, Cluster Analysis, Protein Conformation, Software
2.
Digit Discov ; 3(6): 1160-1171, 2024 Jun 12.
Article in English | MEDLINE | ID: mdl-38873032

ABSTRACT

The quantification of molecular similarity has been present since the beginning of cheminformatics. Although several similarity indices and molecular representations have been reported, all of them ultimately reduce to the calculation of molecular similarities of only two objects at a time. Hence, to obtain the average similarity of a set of molecules, all the pairwise comparisons need to be computed, which demands computational resources that scale quadratically with the number of molecules. Here we propose an exact alternative to this problem: iSIM (instant similarity). iSIM performs comparisons of multiple molecules at the same time and yields the same value as the average pairwise comparisons of molecules represented by binary fingerprints and real-value descriptors. In this work, we introduce the mathematical framework and several applications of iSIM in chemical sampling, visualization, diversity selection, and clustering.
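
As a rough illustration of why an average pairwise similarity need not cost a quadratic number of comparisons, the sketch below uses the Russell-Rao index (shared on-bits divided by fingerprint length), for which the identity is easy to verify by hand. It is not the iSIM code or its API: the function names and the toy fingerprints are assumptions made for this example. The idea is that a bit set in c molecules contributes to exactly c(c-1)/2 pairs, so per-bit column sums are enough.

```python
from itertools import combinations
from math import comb


def average_pairwise_rr(fps):
    """Brute force: mean Russell-Rao similarity over all pairs, O(N^2) in the number of molecules."""
    m = len(fps[0])
    pairs = list(combinations(fps, 2))
    total = sum(sum(a & b for a, b in zip(x, y)) / m for x, y in pairs)
    return total / len(pairs)


def column_sum_rr(fps):
    """Same quantity from per-bit column sums: a single pass over the molecules."""
    n, m = len(fps), len(fps[0])
    col = [sum(bits) for bits in zip(*fps)]  # how many molecules set each bit
    # A bit set in c molecules is shared by comb(c, 2) pairs.
    return sum(comb(c, 2) for c in col) / (m * comb(n, 2))


# Hypothetical 6-bit fingerprints for four molecules.
fingerprints = [
    [1, 0, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 1],
    [1, 0, 0, 1, 0, 1],
]

print(average_pairwise_rr(fingerprints), column_sum_rr(fingerprints))
```

Both functions return 11/36 for this toy set. Other indices need their own treatment, so this shows only the counting argument for one index.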

3.
J Chem Theory Comput ; 20(13): 5583-5597, 2024 Jul 09.
Article in English | MEDLINE | ID: mdl-38905589

ABSTRACT

One of the key challenges of k-means clustering is the seed selection or the initial centroid estimation since the clustering result depends heavily on this choice. Alternatives such as k-means++ have mitigated this limitation by estimating the centroids using an empirical probability distribution. However, with high-dimensional and complex data sets such as those obtained from molecular simulation, k-means++ fails to partition the data in an optimal manner. Furthermore, stochastic elements in all flavors of k-means++ will lead to a lack of reproducibility. K-means N-Ary Natural Initiation (NANI) is presented as an alternative to tackle this challenge by using efficient n-ary comparisons to both identify high-density regions in the data and select a diverse set of initial conformations. Centroids generated from NANI are not only representative of the data and different from one another, helping k-means to partition the data accurately, but also deterministic, providing consistent cluster populations across replicates. From peptide and protein folding molecular simulations, NANI was able to create compact and well-separated clusters as well as accurately find the metastable states that agree with the literature. NANI can cluster diverse data sets and be used as a standalone tool or as part of our MDANCE clustering package.
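
To make the contrast with randomized seeding concrete, here is a minimal sketch of a deterministic, diversity-driven seed selection. It is explicitly not the NANI algorithm, which relies on n-ary comparisons: it is a generic farthest-point (maximin) stand-in, and the function name, tie-breaking rule, and toy data are assumptions made for this example. The first seed is the point closest to the data mean, and each later seed is the point farthest from all seeds chosen so far.

```python
from math import dist


def deterministic_seeds(points, k):
    """Return indices of k seeds: the point nearest the mean, then farthest-point picks."""
    dim = len(points[0])
    mean = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    # Ties are broken by the lowest index, so the result never depends on a random state.
    first = min(range(len(points)), key=lambda i: (dist(points[i], mean), i))
    chosen = [first]
    # Distance from every point to its nearest chosen seed.
    nearest = [dist(p, points[first]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=lambda i: (nearest[i], -i))
        chosen.append(nxt)
        nearest = [min(nearest[i], dist(points[i], points[nxt])) for i in range(len(points))]
    return chosen


# Hypothetical 2D data: three well-separated groups of three points each.
points = [
    (0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
    (5.0, 5.0), (5.1, 4.8), (4.9, 5.2),
    (10.0, 0.0), (10.2, 0.1), (9.9, -0.2),
]

seeds = deterministic_seeds(points, 3)
print(seeds)
```

Because there is no random draw, repeated calls return the same seeds, which is the reproducibility property the abstract emphasizes. Pure maximin selection is sensitive to outliers, which is one reason density-aware schemes are attractive.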

4.
J Chem Phys ; 160(14)2024 Apr 14.
Article in English | MEDLINE | ID: mdl-38597308

ABSTRACT

Electron pairs have an illustrious history in chemistry, from powerful concepts to understanding structural stability and reactive changes to the promise of serving as building blocks of quantitative descriptions of the electronic structure of complex molecules and materials. However, traditionally, two-electron wavefunctions (geminals) have not enjoyed the popularity and widespread use of the more standard single-particle methods. This has changed recently, with a renewed interest in the development of geminal wavefunctions as an alternative to describing strongly correlated phenomena. Hence, there is a need to find geminal methods that are accurate, computationally tractable, and do not demand significant input from the user (particularly via cumbersome and often ill-behaved orbital optimization steps). Here, we propose new families of geminal wavefunctions inspired by the pair coupled cluster doubles ansatz. We present a new hierarchy of two-electron wavefunctions that extends the one-reference orbital idea to other geminals. Moreover, we show how to incorporate single-like excitations in this framework without leaving the quasiparticle picture. We explore the role of imposing seniority restrictions on these wavefunctions and benchmark these new methods on model strongly correlated systems.

5.
Digit Discov ; 3(4): 805-817, 2024 Apr 17.
Article in English | MEDLINE | ID: mdl-38638647

ABSTRACT

Imaging mass spectrometry is a label-free imaging modality that allows for the spatial mapping of many compounds directly in tissues. In an imaging mass spectrometry experiment, a raster of the tissue surface produces a mass spectrum at each sampled x, y position, resulting in thousands of individual mass spectra, each comprising a pixel in the resulting ion images. However, efficient analysis of imaging mass spectrometry datasets can be challenging due to the hyperspectral characteristics of the data. Each spectrum contains several thousand unique compounds at discrete m/z values that result in unique ion images, which demands robust and efficient algorithms for searching, statistical analysis, and visualization. Some traditional post-processing techniques are fundamentally ill-equipped to dissect these types of data. For example, while principal component analysis (PCA) has long served as a useful tool for mining imaging mass spectrometry datasets to identify correlated analytes and biological regions of interest, the interpretation of the PCA scores and loadings can be non-trivial. The loadings often contain negative peaks in the PCA-derived pseudo-spectra, which are difficult to ascribe to underlying tissue biology. Herein, we have utilized extended similarity indices to streamline the interpretation of imaging mass spectrometry data. This novel workflow uses PCA as a pixel-selection method to parse out the most and least correlated pixels, which are then compared using the extended similarity indices. The extended similarity indices complement PCA by removing all non-physical artifacts and streamlining the interpretation of large volumes of imaging mass spectrometry spectra simultaneously. The linear complexity, O(N), of these indices suggests that large imaging mass spectrometry datasets can be analyzed in a 1:1 scale of time and space with respect to the size of the input data. The extended similarity indices algorithmic workflow is exemplified here by identifying discrete biological regions of mouse brain tissue.

6.
J Phys Chem A ; 128(17): 3458-3467, 2024 May 02.
Article in English | MEDLINE | ID: mdl-38651558

ABSTRACT

We propose a new perturbation theory framework that can be used to help with the projective solution of the Schrödinger equation for arbitrary wave functions. This Flexible Ansatz for N-body Perturbation Theory (FANPT) is based on our previously proposed Flexible Ansatz for the N-body Configuration Interaction (FANCI). We derive recursive FANPT expressions, including arbitrary orders in the perturbation hierarchy. We show that the FANPT equations are well-behaved across a wide range of conditions, including static correlation-dominated configurations and highly nonlinear wave functions.

7.
bioRxiv ; 2024 Mar 08.
Article in English | MEDLINE | ID: mdl-38496504

ABSTRACT

One of the key challenges of k-means clustering is the seed selection or the initial centroid estimation since the clustering result depends heavily on this choice. Alternatives such as k-means++ have mitigated this limitation by estimating the centroids using an empirical probability distribution. However, with high-dimensional and complex datasets such as those obtained from molecular simulation, k-means++ fails to partition the data in an optimal manner. Furthermore, stochastic elements in all flavors of k-means++ will lead to a lack of reproducibility. K-means N-Ary Natural Initiation (NANI) is presented as an alternative to tackle this challenge by using efficient n-ary comparisons to both identify high-density regions in the data and select a diverse set of initial conformations. Centroids generated from NANI are not only representative of the data and different from one another, helping k-means to partition the data accurately, but also deterministic, providing consistent cluster populations across replicates. From peptide and protein folding molecular simulations, NANI was able to create compact and well-separated clusters as well as accurately find the metastable states that agree with the literature. NANI can cluster diverse datasets and be used as a standalone tool or as part of our MDANCE clustering package.

8.
Chemphyschem ; 25(1): e202300566, 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-37883736

ABSTRACT

We introduce certain concepts and expressions from conceptual density functional theory (DFT) to study the properties of the Hildebrand solubility parameter. The original form of the Hildebrand solubility parameter is used to qualitatively estimate solubilities for various apolar and aprotic substances and solvents and is based on the square root of the cohesive energy density. Our results show that a revised expression allows the replacement of cohesive energy densities by electrophilicity densities, which are numerically accessible by simple DFT calculations. As an extension, the reformulated expression provides a deeper interpretation of the main contributions and, in particular, emphasizes the importance of charge transfer mechanisms. All calculated values of the Hildebrand parameters for a large number of common solvents are compared with experimental values and show good agreement for non- or moderately polar aprotic solvents in agreement with the original formulation of the Hildebrand solubility parameters. The observed deviations for more polar and protic solvents define robust limits from the original formulation which remain valid. Likewise, we show that the use of machine learning methods leads to only slightly better predictability.
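
For orientation, the original Hildebrand definition mentioned in the abstract (the square root of the cohesive energy density) can be sketched as follows. This is the textbook form, delta = sqrt((dH_vap - R*T) / V_m), and not the revised electrophilicity-based expression of the paper. The function name is an assumption, and the n-hexane inputs are approximate literature values used only for illustration.

```python
from math import sqrt

R = 8.314462618  # gas constant, J/(mol K)


def hildebrand_parameter(dh_vap_j_per_mol, molar_volume_cm3_per_mol, temperature_k=298.15):
    """Hildebrand solubility parameter in MPa^0.5 from the cohesive energy density."""
    # Cohesive energy density in J/cm^3, which is numerically equal to MPa.
    ced = (dh_vap_j_per_mol - R * temperature_k) / molar_volume_cm3_per_mol
    return sqrt(ced)


# Approximate values for n-hexane at 25 degrees C (assumed for illustration):
# enthalpy of vaporization about 31.56 kJ/mol, molar volume about 131.6 cm^3/mol.
delta_hexane = hildebrand_parameter(31_560.0, 131.6)
print(f"{delta_hexane:.1f} MPa^0.5")
```

With these inputs the result falls close to the value commonly tabulated for n-hexane, an apolar aprotic solvent of the kind for which the abstract reports good agreement.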

9.
Molecules ; 28(17)2023 Aug 30.
Article in English | MEDLINE | ID: mdl-37687162

ABSTRACT

Visualization of the chemical space is useful in many aspects of chemistry, including compound library design, diversity analysis, and exploring structure-property relationships, to name a few. Examples of notable research areas where the visualization of chemical space has strong applications are drug discovery and natural product research. However, the sheer volume of even comparatively small sub-sections of chemical space implies that we need to use approximations at the time of navigating through chemical space. ChemMaps is a visualization methodology that approximates the distribution of compounds in large datasets based on the selection of satellite compounds that yield a similar mapping of the whole dataset when principal component analysis on a similarity matrix is performed. Here, we show how the recently proposed extended similarity indices can help find regions that are relevant to sample satellites and reduce the amount of high-dimensional data needed to describe a library's chemical space.

10.
bioRxiv ; 2023 Jul 30.
Article in English | MEDLINE | ID: mdl-37546817

ABSTRACT

Imaging mass spectrometry is a label-free imaging modality that allows for the spatial mapping of many compounds directly in tissues. In an imaging mass spectrometry experiment, a raster of the tissue surface produces a mass spectrum at each sampled x, y position, resulting in thousands of individual mass spectra, each comprising a pixel in the resulting ion images. However, efficient analysis of imaging mass spectrometry datasets can be challenging due to the hyperspectral characteristics of the data. Each spectrum contains several thousand unique compounds at discrete m/z values that result in unique ion images, which demands robust and efficient algorithms for searching, statistical analysis, and visualization. Some traditional post-processing techniques are fundamentally ill-equipped to dissect these types of data. For example, while principal component analysis (PCA) has long served as a useful tool for mining imaging mass spectrometry datasets to identify correlated analytes and biological regions of interest, the interpretation of the PCA scores and loadings can be non-trivial. The loadings often contain negative peaks in the PCA-derived pseudo-spectra, which are difficult to ascribe to underlying tissue biology. Herein, we have utilized extended similarity indices to streamline the interpretation of imaging mass spectrometry data. This novel workflow uses PCA as a pixel-selection method to parse out the most and least correlated pixels, which are then compared using the extended similarity indices. The extended similarity indices complement PCA by removing all non-physical artifacts and streamlining the interpretation of large volumes of imaging mass spectrometry spectra simultaneously. The linear complexity, O(N), of these indices suggests that large imaging mass spectrometry datasets can be analyzed in a 1:1 scale of time and space with respect to the size of the input data. The extended similarity indices algorithmic workflow is exemplified here by identifying discrete biological regions of mouse brain tissue.

11.
Mol Inform ; 42(7): e2300056, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37202375

ABSTRACT

Understanding structure-activity landscapes is essential in drug discovery. Similarly, it has been shown that the presence of activity cliffs in compound data sets can have a substantial impact not only on the design progress but also can influence the predictive ability of machine learning models. With the continued expansion of the chemical space and the currently available large and ultra-large libraries, it is imperative to implement efficient tools to analyze the activity landscape of compound data sets rapidly. The goal of this study is to show the applicability of the n-ary indices to quantify the structure-activity landscapes of large compound data sets using different types of structural representation rapidly and efficiently. We also discuss how a recently introduced medoid algorithm provides the foundation to finding optimum correlations between similarity measures and structure-activity rankings. The applicability of the n-ary indices and the medoid algorithm is shown by analyzing the activity landscape of 10 compound data sets with pharmaceutical relevance using three fingerprints of different designs, 16 extended similarity indices, and 11 coincidence thresholds.


Subjects
Algorithms, Drug Discovery, Structure-Activity Relationship, Machine Learning
12.
Phys Chem Chem Phys ; 25(19): 13611-13622, 2023 May 17.
Article in English | MEDLINE | ID: mdl-37144347

ABSTRACT

The hard/soft acid/base (HSAB) principle is a cornerstone in our understanding of chemical reactivity preferences. Motivated by the success of the original ("global") version of this rule, a "local" counterpart was readily proposed to account for regioselectivity preferences, in particular, in ambident reactions. However, ample experimental evidence indicates that the local HSAB principle often fails to provide meaningful predictions. Here we examine the assumptions behind the standard proof of the local HSAB rule, showing that it is based on a flawed premise. By solving this issue, we show that it is critical to consider not only the charge transferred between the different reacting centers but also the charge reorganization within the non-reacting parts of the molecule. We propose different reorganization models and derive the corresponding regioselectivity rules for each.

13.
J Phys Chem B ; 127(11): 2546-2551, 2023 Mar 23.
Article in English | MEDLINE | ID: mdl-36917810

ABSTRACT

We present a first-principles approach for the calculation of solvation energies and enthalpies with respect to different ion pair combinations in various solvents. The method relies on the conceptual density functional theory (DFT) of solvation, from which detailed expressions for the solvation energies can be derived. In addition to fast and straightforward gas phase calculations, we also study the influence of modified chemical reactivity descriptors in terms of electronic perturbations. The corresponding phenomenological changes in molecular energy levels can be interpreted as the influence of continuum solvents. Our approach shows that the introduction of these modified expressions is essential for a quantitative agreement between the calculated and the experimental results.

14.
J Comput Chem ; 44(5): 697-709, 2023 Feb 15.
Article in English | MEDLINE | ID: mdl-36440947

ABSTRACT

Fanpy is a free and open-source Python library for developing and testing multideterminant wavefunctions and related ab initio methods in electronic structure theory. The main use of Fanpy is to quickly prototype new methods by making it easier to convert the mathematical formulation of a new wavefunction ansatz to a working implementation. Fanpy is designed based on our recently introduced Flexible Ansatz for N-electron Configuration Interaction (FANCI) framework, where multideterminant wavefunctions are represented by their overlaps with Slater determinants of orthonormal spin-orbitals. In the simplest case, a new wavefunction ansatz can be implemented by simply writing a function for evaluating its overlap with an arbitrary Slater determinant. Fanpy is modular in both implementation and theory: the wavefunction model, the system's Hamiltonian, and the choice of objective function are all independent modules. This modular structure makes it easy for users to mix and match different methods and for developers to quickly explore new ideas. Fanpy is written purely in Python with standard dependencies, making it accessible for various operating systems. In addition, it adheres to principles of modern software development, including comprehensive documentation, extensive testing, quality assurance, and continuous integration and delivery protocols. This article is considered to be the official release notes for the Fanpy library.


Subjects
Quantum Theory, Software, Electrons
15.
J Cheminform ; 14(1): 82, 2022 Dec 02.
Article in English | MEDLINE | ID: mdl-36461094

ABSTRACT

We report the main conclusions of the first Chemoinformatics and Artificial Intelligence Colloquium, Mexico City, June 15-17, 2022. Fifteen lectures were presented during a virtual public event with speakers from industry, academia, and not-for-profit organizations. Twelve hundred and ninety students and academics from more than 60 countries attended. During the meeting, applications, challenges, and opportunities in drug discovery, de novo drug design, ADME-Tox (absorption, distribution, metabolism, excretion and toxicity) property predictions, organic chemistry, peptides, and antibiotic resistance were discussed. The program, along with the recordings of all sessions, is freely available at https://www.difacquim.com/english/events/2022-colloquium/ .

16.
Phys Chem Chem Phys ; 24(46): 28314-28324, 2022 Nov 30.
Article in English | MEDLINE | ID: mdl-36383178

ABSTRACT

We present explainable machine learning approaches for the accurate prediction and understanding of solvation free energies, enthalpies, and entropies for different salts in various protic and aprotic solvents. As key input features, we use fundamental contributions from the conceptual density functional theory (DFT) of solutions. The most accurate models with the highest prediction accuracy for the experimental validation data set are decision tree-based approaches such as extreme gradient boosting and extra trees, which highlight the non-linear influence of feature values on target predictions. The detailed assessment of the importance of features in terms of Gini importance criteria as well as Shapley Additive Explanations (SHAP) and permutation and reduction approaches underlines the prominent role of anion and cation solvation effects in combination with fundamental electronic properties of the solvents. These results are reasonably consistent with previous assumptions and provide a solid rationale for more recent theoretical approaches.


Subjects
Electronics, Machine Learning, Entropy, Salts, Solvents
17.
J Phys Chem B ; 126(43): 8864-8872, 2022 Nov 03.
Article in English | MEDLINE | ID: mdl-36269164

ABSTRACT

We demonstrate the utility of basic chemical principles like the "|Δµ| big is good" (DMB) rule for the study of solvation interactions between distinct solutes such as ions and solvents. The corresponding approach allows us to define relevant criteria for maximum solvation energies of ion pairs in different solvents in terms of electronegativities and chemical hardnesses. Our findings reveal that the DMB principle culminates in the strong and weak acids and bases concept as recently derived for specific ion effects in various solvents. The further application of the DMB approach highlights a similar condition for the chemical hardnesses, reminiscent of the hard/soft acids and bases principle. Comparable conclusions can also be drawn with regard to the change of the solvent. We show that favorable solvent interactions are mainly driven by low chemical hardnesses as well as high electronegativity differences between the ions and the solvent. Our findings highlight that solvation interactions are governed by basic chemical principles, which demonstrates the close similarity between solvation mechanisms and chemical reactions.


Subjects
Thermodynamics, Solvents, Solutions, Ions
18.
J Chem Phys ; 157(15): 156101, 2022 Oct 21.
Article in English | MEDLINE | ID: mdl-36272807

ABSTRACT

We show that the "|Δµ| big is good" principle holds at temperatures above absolute zero (the so-called "finite-T regime"). We also provide the first conditions hinting at the validity of this reactivity rule in cases where the chemical reactions involved have different signs in their chemical potential variations.

19.
Phys Chem Chem Phys ; 24(37): 22477-22486, 2022 Sep 28.
Article in English | MEDLINE | ID: mdl-36106477

ABSTRACT

We present a new classification scheme for amino acids and nucleobases based on the electronic properties of the individual molecules. Using chemical reactivity indices such as electronegativity, electrophilicity, and chemical hardness, we can identify similarities and differences between each class of amino acids and nucleobases. Notable differences emerge in particular with regard to high, neutral or low electronegativity as well as different combinations of chemical hardness. Our approach allows us to relate these insights to the properties of the side groups in terms of a unique reference scheme. We further show that hydrophobic differences between amino acids are rather negligible in the context of electronic properties. Our classification scheme also rationalizes the occurrence of distinct stable nucleobase pairs and clearly emphasizes certain differences between individual molecules. The stability and abundant occurrence of Watson-Crick nucleobase pairs is further discussed in the context of the minimum electrophilicity principle.


Subjects
Amino Acids, Electronics, Base Pairing, Hydrophobic and Hydrophilic Interactions
20.
Front Chem ; 10: 929464, 2022.
Article in English | MEDLINE | ID: mdl-35936089

ABSTRACT

In the first paper of this series, the authors derived an expression for the interaction energy between two reagents in terms of the chemical reactivity indicators that can be derived from density functional perturbation theory. While negative interaction energies can explain reactivity, reactivity is often more simply explained using the "|dµ| big is good" rule or the maximum hardness principle. Expressions for the change in chemical potential (µ) and hardness when two reagents interact are derived. A partial justification for the maximum hardness principle is that the terms that appear in the interaction energy expression often reappear in the expression for the interaction hardness, but with opposite sign.
