Results 1 - 20 of 131
1.
ACS Cent Sci ; 10(3): 637-648, 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38559300

ABSTRACT

Data-driven techniques are increasingly used to replace electronic-structure calculations of matter. In this context, a relevant question is whether machine learning (ML) should be applied directly to predict the desired properties or combined explicitly with physically grounded operations. We present an example of an integrated modeling approach in which a symmetry-adapted ML model of an effective Hamiltonian is trained to reproduce electronic excitations from a quantum-mechanical calculation. The resulting model can make predictions for molecules that are much larger and more complex than those on which it is trained and allows for dramatic computational savings by indirectly targeting the outputs of well-converged calculations while using a parametrization corresponding to a minimal atom-centered basis. These results emphasize the merits of intertwining data-driven techniques with physical approximations, improving the transferability and interpretability of ML models without affecting their accuracy and computational efficiency and providing a blueprint for developing ML-augmented electronic-structure methods.

2.
Chem Mater ; 36(3): 1482-1496, 2024 Feb 13.
Article in English | MEDLINE | ID: mdl-38370276

ABSTRACT

Lithium ortho-thiophosphate (Li3PS4) has emerged as a promising candidate for solid-state electrolyte batteries, thanks to its highly conductive phases, cheap components, and large electrochemical stability range. Nonetheless, the microscopic mechanisms of Li-ion transport in Li3PS4 are far from being fully understood, the role of PS4 dynamics in charge transport still being controversial. In this work, we build machine learning potentials targeting state-of-the-art DFT references (PBEsol, r2SCAN, and PBE0) to tackle this problem in all known phases of Li3PS4 (α, β, and γ), for large system sizes and time scales. We discuss the physical origin of the observed superionic behavior of Li3PS4: the activation of PS4 flipping drives a structural transition to a highly conductive phase, characterized by an increase in Li-site availability and by a drastic reduction in the activation energy of Li-ion diffusion. We also rule out any paddle-wheel effects of PS4 tetrahedra in the superionic phases (previously claimed to enhance Li-ion diffusion) due to the orders-of-magnitude difference between the rate of PS4 flips and Li-ion hops at all temperatures below melting. We finally elucidate the role of interionic dynamical correlations in charge transport, by highlighting the failure of the Nernst-Einstein approximation to estimate the electrical conductivity. Our results show a strong dependence on the target DFT reference, with PBE0 yielding the best quantitative agreement with experimental measurements not only for the electronic band gap but also for the electrical conductivity of β- and α-Li3PS4.
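
For readers unfamiliar with the Nernst-Einstein approximation mentioned in this abstract, the sketch below shows the textbook relation sigma = n q^2 D / (k_B T), which treats the charge carriers as uncorrelated. It is only an illustration of the approximation the authors report as failing for Li3PS4, not their workflow; the function name and all input values are hypothetical.

```python
# Nernst-Einstein estimate of ionic conductivity (uncorrelated-carrier limit).
K_B = 1.380649e-23          # Boltzmann constant in J/K (exact SI value)
E_CHARGE = 1.602176634e-19  # elementary charge in C (exact SI value)


def nernst_einstein_conductivity(number_density, diffusion_coeff,
                                 temperature, charge_number=1):
    """Return the conductivity in S/m.

    number_density  : carriers per cubic metre
    diffusion_coeff : tracer diffusion coefficient in m^2/s
    temperature     : absolute temperature in K
    charge_number   : carrier charge in units of e (1 for Li+)
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    q = charge_number * E_CHARGE
    return number_density * q**2 * diffusion_coeff / (K_B * temperature)


# Made-up inputs for illustration only; these are NOT values from the paper.
sigma = nernst_einstein_conductivity(number_density=2.0e28,
                                     diffusion_coeff=1.0e-11,
                                     temperature=600.0)
print(f"Nernst-Einstein conductivity: {sigma:.4f} S/m")
```

Because the estimate uses only the single-particle diffusion coefficient, it ignores the interionic dynamical correlations that the abstract identifies as important for charge transport.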

3.
Digit Discov ; 3(1): 23-33, 2024 Jan 17.
Article in English | MEDLINE | ID: mdl-38239898

ABSTRACT

In light of the pressing need for practical materials and molecular solutions to renewable energy and health problems, to name just two examples, one wonders how to accelerate research and development in the chemical sciences, so as to address the time it takes to bring materials from initial discovery to commercialization. Artificial intelligence (AI)-based techniques, in particular, are having a transformative and accelerating impact on many, if not most, technological domains. To shed light on these questions, the authors and participants gathered in person for the ASLLA Symposium on the theme of 'Accelerated Chemical Science with AI' at Gangneung, Republic of Korea. We present the findings, ideas, comments, and often contentious opinions expressed during four panel discussions related to the respective general topics: 'Data', 'New applications', 'Machine learning algorithms', and 'Education'. All discussions were recorded, transcribed into text using OpenAI's Whisper, and summarized using LG AI Research's EXAONE LLM, followed by revision by all authors. For the broader benefit of current researchers, educators in higher education, and academic bodies such as associations, publishers, librarians, and companies, we provide chemistry-specific recommendations and summarize the resulting conclusions.

4.
J Chem Theory Comput ; 19(22): 8020-8031, 2023 Nov 28.
Article in English | MEDLINE | ID: mdl-37948446

ABSTRACT

Machine learning (ML) models for molecules and materials commonly rely on a decomposition of the global target quantity into local, atom-centered contributions. This approach is convenient from a computational perspective, enabling large-scale ML-driven simulations with a linear-scaling cost, and it also allows for the identification and post hoc interpretation of contributions from individual chemical environments and motifs to complicated macroscopic properties. However, even though practical justifications exist for the local decomposition, only the global quantity is rigorously defined. Thus, when the atom-centered contributions are used, their sensitivity to the training strategy or the model architecture should be carefully considered. To this end, we introduce a quantitative metric, which we call the local prediction rigidity (LPR), that allows one to assess how robust the locally decomposed predictions of ML models are. We investigate the dependence of the LPR on the aspects of model training, particularly the composition of the training data set, for a range of different problems from simple toy models to real chemical systems. We present strategies to systematically enhance the LPR, which can be used to improve the robustness, interpretability, and transferability of atomistic ML models.
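
The statement that only the global quantity is rigorously defined can be made concrete with a toy example. The sketch below is not the LPR metric from the paper; it merely shows, with made-up numbers and hypothetical names, that two different sets of atom-centered contributions can sum to exactly the same global target, so a model trained only on totals cannot tell them apart.

```python
import math


def total_from_local(contributions):
    """Global prediction as the sum of atom-centered contributions."""
    return sum(contributions)


# Hypothetical local energies for a three-atom structure (arbitrary units).
local_a = [-1.0, -2.0, -3.0]

# Move 0.5 units from the second atom to the first: the locals change...
shift = 0.5
local_b = [local_a[0] + shift, local_a[1] - shift, local_a[2]]

# ...but the global quantity, the only one fixed by the reference data, does not.
same_total = math.isclose(total_from_local(local_a), total_from_local(local_b))
print("identical totals:", same_total)
print("identical local contributions:", local_a == local_b)
```

This freedom is why the robustness of the local predictions has to be assessed separately from the accuracy of the global ones.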

5.
J Phys Chem Lett ; 14(43): 9612-9618, 2023 Nov 02.
Article in English | MEDLINE | ID: mdl-37862712

ABSTRACT

One essential ingredient in many machine learning (ML) based methods for atomistic modeling of materials and molecules is the use of locality. While allowing better system-size scaling, this systematically neglects long-range (LR) effects such as electrostatic or dispersion interactions. We present an extension of the long distance equivariant (LODE) framework that can handle diverse LR interactions in a consistent way and seamlessly integrates with preexisting methods by building new sets of atom-centered features. We provide a direct physical interpretation of these using the multipole expansion, which allows for simpler and more efficient implementations. The framework is applied to simple toy systems as proof of concept and a heterogeneous set of molecular dimers to push the method to its limits. By generalizing LODE to arbitrary asymptotic behaviors, we provide a coherent approach to treat arbitrary two- and many-body nonbonded interactions in the data-driven modeling of matter.

6.
J Chem Phys ; 159(6), 2023 Aug 14.
Article in English | MEDLINE | ID: mdl-37551818

ABSTRACT

Spherical harmonics provide a smooth, orthogonal, and symmetry-adapted basis to expand functions on a sphere, and they are used routinely in physical and theoretical chemistry as well as in different fields of science and technology, from geology and atmospheric sciences to signal processing and computer graphics. More recently, they have become a key component of rotationally equivariant models in geometric machine learning, including applications to atomic-scale modeling of molecules and materials. We present an elegant and efficient algorithm for the evaluation of the real-valued spherical harmonics. Our construction features many of the desirable properties of existing schemes and allows us to compute Cartesian derivatives in a numerically stable and computationally efficient manner. To facilitate usage, we implement this algorithm in sphericart, a fast C++ library that also provides C bindings, a Python API, and a PyTorch implementation that includes a GPU kernel.
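
As a point of reference for what "real-valued spherical harmonics" evaluated from Cartesian coordinates means, the sketch below hard-codes the standard closed-form expressions up to l = 2. It is neither the algorithm of the paper nor the sphericart API; the function name is hypothetical, and sign conventions for individual components differ between references.

```python
import math


def real_spherical_harmonics_l2(x, y, z):
    """Real spherical harmonics for l = 0, 1, 2 at the direction of (x, y, z).

    Returns a dict mapping l to a list of 2l + 1 values ordered m = -l..l.
    The input vector is normalised first, so only its direction matters.
    """
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        raise ValueError("direction is undefined for the zero vector")
    x, y, z = x / r, y / r, z / r

    c0 = 0.5 * math.sqrt(1.0 / math.pi)
    c1 = math.sqrt(3.0 / (4.0 * math.pi))
    c2 = 0.5 * math.sqrt(15.0 / math.pi)
    return {
        0: [c0],
        1: [c1 * y, c1 * z, c1 * x],
        2: [
            c2 * x * y,
            c2 * y * z,
            0.25 * math.sqrt(5.0 / math.pi) * (3.0 * z * z - 1.0),
            c2 * x * z,
            0.5 * c2 * (x * x - y * y),
        ],
    }


values = real_spherical_harmonics_l2(0.3, -1.2, 0.8)
for l, ylm in values.items():
    # Addition theorem: the squares sum to (2l + 1) / (4 pi) for any direction.
    print(l, sum(v * v for v in ylm), (2 * l + 1) / (4.0 * math.pi))
```

The addition-theorem check printed at the end is a convenient sanity test for any implementation, since it holds independently of the direction and of the sign convention.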

8.
Chem Sci ; 14(5): 1272-1285, 2023 Feb 01.
Article in English | MEDLINE | ID: mdl-36756329

ABSTRACT

Due to the subtle balance of intermolecular interactions that govern structure-property relations, predicting the stability of crystal structures formed from molecular building blocks is a highly non-trivial scientific problem. A particularly active and fruitful approach involves classifying the different combinations of interacting chemical moieties, as understanding the relative energetics of different interactions enables the design of molecular crystals and fine-tuning of their stabilities. While this is usually performed based on the empirical observation of the most commonly encountered motifs in known crystal structures, we propose to apply a combination of supervised and unsupervised machine-learning techniques to automate the construction of an extensive library of molecular building blocks. We introduce a structural descriptor tailored to the prediction of the binding (lattice) energy and apply it to a curated dataset of organic crystals, exploiting its atom-centered nature to obtain a data-driven assessment of the contribution of different chemical groups to the lattice energy of the crystal. We then interpret this library using a low-dimensional representation of the structure-energy landscape and discuss selected examples of the insights into crystal engineering that can be extracted from this analysis, providing a complete database to guide the design of molecular materials.

9.
J Chem Theory Comput ; 19(14): 4451-4460, 2023 Jul 25.
Article in English | MEDLINE | ID: mdl-36453538

ABSTRACT

The electron density of a molecule or material has recently received major attention as a target quantity of machine-learning models. A natural choice to construct a model that yields transferable and linear-scaling predictions is to represent the scalar field using a multicentered atomic basis analogous to that routinely used in density fitting approximations. However, the nonorthogonality of the basis poses challenges for the learning exercise, as it requires accounting for all the atomic density components at once. We devise a gradient-based approach to directly minimize the loss function of the regression problem in an optimized and highly sparse feature space. In so doing, we overcome the limitations associated with adopting an atom-centered model to learn the electron density over arbitrarily complex data sets, obtaining very accurate predictions using a comparatively small training set. The enhanced framework is tested on 32-molecule periodic cells of liquid water, presenting enough complexity to require an optimal balance between accuracy and computational efficiency. We show that starting from the predicted density a single Kohn-Sham diagonalization step can be performed to access total energy components that carry an error of just 0.1 meV/atom with respect to the reference density functional calculations. Finally, we test our method on the highly heterogeneous QM9 benchmark data set, showing that a small fraction of the training data is enough to derive ground-state total energies within chemical accuracy.

10.
Open Res Eur ; 3: 81, 2023.
Article in English | MEDLINE | ID: mdl-38234865

ABSTRACT

Easy-to-use libraries such as scikit-learn have accelerated the adoption and application of machine learning (ML) workflows and data-driven methods. While many of the algorithms implemented in these libraries originated in specific scientific fields, they have gained in popularity in part because of their generalisability across multiple domains. Over the past two decades, researchers in the chemical and materials science community have put forward general-purpose machine learning methods. The deployment of these methods into workflows of other domains, however, is often burdensome due to the entanglement with domain-specific functionalities. We present the Python library scikit-matter that targets domain-agnostic implementations of methods developed in the computational chemical and materials science community, following the scikit-learn API and coding guidelines to promote usability and interoperability with existing workflows.

11.
Digit Discov ; 1(6): 779-789, 2022 Dec 05.
Article in English | MEDLINE | ID: mdl-36561986

ABSTRACT

Zeolites are nanoporous alumino-silicate frameworks widely used as catalysts and adsorbents. Even though millions of siliceous networks can be generated by computer-aided searches, no new hypothetical framework has yet been synthesized. The needle-in-a-haystack problem of finding promising candidates among large databases of predicted structures has intrigued materials scientists for decades; yet, most work to date on the zeolite problem has been limited to intuitive structural descriptors. Here, we tackle this problem through a rigorous data science scheme, the "Zeolite Sorting Hat", that exploits interatomic correlations to discriminate between real and hypothetical zeolites and to partition real zeolites into compositional classes that guide synthetic strategies for a given hypothetical framework. We find that, regardless of the structural descriptor used by the Zeolite Sorting Hat, there remain hypothetical frameworks that are incorrectly classified as real ones, suggesting that they might be good candidates for synthesis. We seek to minimize the number of such misclassified frameworks by using as complete a structural descriptor as possible, thus focusing on truly viable synthetic targets, while discovering structural features that distinguish real and hypothetical frameworks as an output of the Zeolite Sorting Hat. Further ranking of the candidates can be achieved based on thermodynamic stability and/or their suitability for the desired applications. Based on this workflow, we propose three hypothetical frameworks differing in their molar volume range as the top targets for synthesis, each with a composition suggested by the Zeolite Sorting Hat. Finally, we analyze the behavior of the Zeolite Sorting Hat with a hierarchy of structural descriptors including intuitive descriptors reported in previous studies, finding that intuitive descriptors produce significantly more misclassified hypothetical frameworks, and that more rigorous interatomic correlations point to second-neighbor Si-O distances around 3.2-3.4 Å as the key discriminatory factor.

12.
J Chem Phys ; 157(23): 234101, 2022 Dec 21.
Article in English | MEDLINE | ID: mdl-36550032

ABSTRACT

Machine learning frameworks based on correlations of interatomic positions begin with a discretized description of the density of other atoms in the neighborhood of each atom in the system. Symmetry considerations support the use of spherical harmonics to expand the angular dependence of this density, but there is, as of yet, no clear rationale to choose one radial basis over another. Here, we investigate the basis that results from the solution of the Laplacian eigenvalue problem within a sphere around the atom of interest. We show that this generates a basis of controllable smoothness within the sphere (in the same sense as plane waves provide a basis with controllable smoothness for a problem with periodic boundaries) and that a tensor product of Laplacian eigenstates also provides a smooth basis for expanding any higher-order correlation of the atomic density within the appropriate hypersphere. We consider several unsupervised metrics of the quality of a basis for a given dataset and show that the Laplacian eigenstate basis has a performance that is much better than some widely used basis sets and competitive with data-driven bases that numerically optimize each metric. Finally, we investigate the role of the basis in building models of the potential energy. In these tests, we find that a combination of the Laplacian eigenstate basis and target-oriented heuristics leads to equal or improved regression performance when compared to both heuristic and data-driven bases in the literature. We conclude that the smoothness of the basis functions is a key aspect of successful atomic density representations.
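
To give a feel for the radial functions that arise from the Laplacian eigenvalue problem in a sphere, the sketch below treats only the l = 0 channel, where the eigenfunctions are spherical Bessel functions j0(n pi r / r_cut). It assumes that the functions vanish on the sphere surface (a Dirichlet boundary condition, which the abstract does not state), and the function names, cutoff value, and grid size are all hypothetical.

```python
import math


def radial_l0(n, r, r_cut):
    """l = 0 Laplacian eigenfunction j0(k r) with k = n * pi / r_cut.

    Assumes the function vanishes at r = r_cut (Dirichlet boundary).
    """
    if r == 0.0:
        return 1.0  # limit of sin(kr) / (kr) as r -> 0
    k = n * math.pi / r_cut
    return math.sin(k * r) / (k * r)


def radial_overlap(n, m, r_cut, n_grid=2000):
    """Midpoint-rule estimate of the integral of r^2 R_n(r) R_m(r) on [0, r_cut]."""
    dr = r_cut / n_grid
    total = 0.0
    for i in range(n_grid):
        r = (i + 0.5) * dr
        total += r * r * radial_l0(n, r, r_cut) * radial_l0(m, r, r_cut) * dr
    return total


r_cut = 5.0  # hypothetical cutoff radius
print("<1|2> =", radial_overlap(1, 2, r_cut))   # distinct eigenstates: close to 0
print("<1|1> =", radial_overlap(1, 1, r_cut))   # norm: r_cut^3 / (2 pi^2)
```

Increasing n adds one radial node at a time, which is the sense in which the index controls the smoothness of the basis, analogous to the wave vector of a plane wave.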

13.
J Chem Phys ; 157(17): 177101, 2022 Nov 07.
Article in English | MEDLINE | ID: mdl-36347686

ABSTRACT

The "quasi-constant" smooth overlap of atomic position and atom-centered symmetry function fingerprint manifolds recently discovered by Parsaeifard and Goedecker [J. Chem. Phys. 156, 034302 (2022)] are closely related to the degenerate pairs of configurations, which are known shortcomings of all low-body-order atom-density correlation representations of molecular structures. Configurations that are rigorously singular (which we demonstrate can only occur in finite, discrete sets and not as a continuous manifold) determine the complete failure of machine-learning models built on this class of descriptors. The "quasi-constant" manifolds, on the other hand, exhibit low but non-zero sensitivity to atomic displacements. As a consequence, for any such manifold, it is possible to optimize model parameters and the training set to mitigate their impact on learning, even though this is often impractical and it is preferable to use descriptors that avoid both exact singularities and the associated numerical instability.

14.
J Phys Chem C Nanomater Interfaces ; 126(39): 16710-16720, 2022 Oct 06.
Article in English | MEDLINE | ID: mdl-36237276

ABSTRACT

Nuclear magnetic resonance (NMR) chemical shifts are a direct probe of local atomic environments and can be used to determine the structure of solid materials. However, the substantial computational cost required to predict accurate chemical shifts is a key bottleneck for NMR crystallography. We recently introduced ShiftML, a machine-learning model of chemical shifts in molecular solids, trained on minimum-energy geometries of materials composed of C, H, N, O, and S that provides rapid chemical shift predictions with density functional theory (DFT) accuracy. Here, we extend the capabilities of ShiftML to predict chemical shifts for both finite temperature structures and more chemically diverse compounds, while retaining the same speed and accuracy. For a benchmark set of 13 molecular solids, we find a root-mean-squared error of 0.47 ppm with respect to experiment for 1H shift predictions (compared to 0.35 ppm for explicit DFT calculations), while reducing the computational cost by over four orders of magnitude.

15.
J Chem Phys ; 156(20): 204115, 2022 May 28.
Article in English | MEDLINE | ID: mdl-35649823

ABSTRACT

Data-driven schemes that associate molecular and crystal structures with their microscopic properties share the need for a concise, effective description of the arrangement of their atomic constituents. Many types of models rely on descriptions of atom-centered environments, which are associated with an atomic property or with an atomic contribution to an extensive macroscopic quantity. Frameworks in this class can be understood in terms of atom-centered density correlations (ACDC), which are used as a basis for a body-ordered, symmetry-adapted expansion of the targets. Several other schemes that gather information on the relationship between neighboring atoms using "message-passing" ideas cannot be directly mapped to correlations centered around a single atom. We generalize the ACDC framework to include multi-centered information, generating representations that provide a complete linear basis to regress symmetric functions of atomic coordinates, and provide a coherent foundation to systematize our understanding of atom-centered and message-passing, as well as invariant and equivariant, machine-learning schemes.

16.
J Chem Theory Comput ; 18(3): 1467-1479, 2022 Mar 08.
Article in English | MEDLINE | ID: mdl-35179897

ABSTRACT

The application of machine learning to theoretical chemistry has made it possible to combine the accuracy of quantum chemical energetics with the thorough sampling of finite-temperature fluctuations. To reach this goal, a diverse set of methods has been proposed, ranging from simple linear models to kernel regression and highly nonlinear neural networks. Here we apply two widely different approaches to the same, challenging problem: the sampling of the conformational landscape of polypeptides at finite temperature. We develop a local kernel regression (LKR) coupled with a supervised sparsity method and compare it with a more established approach based on Behler-Parrinello type neural networks. In the context of the LKR, we discuss how the supervised selection of the reference pool of environments is crucial to achieve accurate potential energy surfaces at a competitive computational cost and leverage the locality of the model to infer which chemical environments are poorly described by the DFTB baseline. We then discuss the relative merits of the two frameworks and perform Hamiltonian-reservoir replica-exchange Monte Carlo sampling and metadynamics simulations, respectively, to demonstrate that both frameworks can achieve converged and transferable sampling of the conformational landscape of complex and flexible biomolecules with comparable accuracy and computational cost.


Subjects
Molecular Dynamics Simulation; Neural Networks, Computer; Machine Learning; Molecular Conformation; Oligopeptides/chemistry
17.
J Chem Phys ; 156(1): 014115, 2022 Jan 07.
Article in English | MEDLINE | ID: mdl-34998321

ABSTRACT

Symmetry considerations are at the core of the major frameworks used to provide an effective mathematical representation of atomic configurations that is then used in machine-learning models to predict the properties associated with each structure. In most cases, the models rely on a description of atom-centered environments and are suitable to learn atomic properties or global observables that can be decomposed into atomic contributions. Many quantities that are relevant for quantum mechanical calculations, however, most notably the single-particle Hamiltonian matrix when written in an atomic orbital basis, are not associated with a single center, but with two (or more) atoms in the structure. We discuss a family of structural descriptors that generalize the very successful atom-centered density correlation features to the N-center case and show, in particular, how this construction can be applied to efficiently learn the matrix elements of the (effective) single-particle Hamiltonian written in an atom-centered orbital basis. These N-center features are fully equivariant, not only in terms of translations and rotations but also in terms of permutations of the indices associated with the atoms, and are suitable to construct symmetry-adapted machine-learning models of new classes of properties of molecules and materials.

20.
Sci Adv ; 7(48): eabk2341, 2021 Nov 26.
Article in English | MEDLINE | ID: mdl-34826232

ABSTRACT

A prerequisite for NMR studies of organic materials is assigning each experimental chemical shift to a set of geometrically equivalent nuclei. Obtaining the assignment experimentally can be challenging and typically requires time-consuming multidimensional correlation experiments. An alternative solution for determining the assignment involves statistical analysis of experimental chemical shift databases, but no such database exists for molecular solids. Here, by combining the Cambridge Structural Database with a machine learning model of chemical shifts, we construct a statistical basis for probabilistic chemical shift assignment of organic crystals by calculating shifts for more than 200,000 compounds, enabling the probabilistic assignment of organic crystals directly from their two-dimensional chemical structure. The approach is demonstrated with the 13C and 1H assignment of 11 molecular solids with experimental shifts and benchmarked on 100 crystals using predicted shifts. The correct assignment was found among the two most probable assignments in more than 80% of cases.
