Results 1 - 20 of 31
1.
Nat Chem ; 16(5): 727-734, 2024 May.
Article in English | MEDLINE | ID: mdl-38454071

ABSTRACT

Atomistic simulation has a broad range of applications from drug design to materials discovery. Machine learning interatomic potentials (MLIPs) have become an efficient alternative to computationally expensive ab initio simulations. For this reason, chemistry and materials science would greatly benefit from a general reactive MLIP, that is, an MLIP that is applicable to a broad range of reactive chemistry without the need for refitting. Here we develop a general reactive MLIP (ANI-1xnr) through automated sampling of condensed-phase reactions. ANI-1xnr is then applied to study five distinct systems: carbon solid-phase nucleation, graphene ring formation from acetylene, biofuel additives, combustion of methane and the spontaneous formation of glycine from early earth small molecules. In all studies, ANI-1xnr closely matches experiment (when available) and/or previous studies using traditional model chemistry methods. As such, ANI-1xnr proves to be a highly general reactive MLIP for C, H, N and O elements in the condensed phase, enabling high-throughput in silico reactive chemistry experimentation.

2.
J Chem Theory Comput ; 20(3): 1274-1281, 2024 Feb 13.
Article in English | MEDLINE | ID: mdl-38307009

ABSTRACT

Methodologies for training machine learning potentials (MLPs) with quantum-mechanical simulation data have recently seen tremendous progress. Experimental data have a very different character than simulated data, and most MLP training procedures cannot be easily adapted to incorporate both types of data into the training process. We investigate a training procedure based on iterative Boltzmann inversion that produces a pair potential correction to an existing MLP using equilibrium radial distribution function data. By applying these corrections to an MLP for pure aluminum based on density functional theory, we observe that the resulting model largely addresses previous overstructuring in the melt phase. Interestingly, the corrected MLP also exhibits improved performance in predicting experimental diffusion constants, which are not included in the training procedure. The presented method does not require autodifferentiating through a molecular dynamics solver and does not make assumptions about the MLP architecture. Our results suggest a practical framework for incorporating experimental data into machine learning models to improve the accuracy of molecular dynamics simulations.
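As a rough illustration of the iterative Boltzmann inversion idea described in this abstract, the sketch below applies the textbook update rule to a tabulated pair correction: each bin is shifted by `kT * ln(g_sim / g_target)`. The function name, the damping factor `alpha`, and the binning are hypothetical choices for illustration only; the abstract does not specify the authors' implementation.

```python
import math

def ibi_update(correction, g_sim, g_target, kT, alpha=1.0, eps=1e-12):
    """One textbook iterative-Boltzmann-inversion step on a tabulated pair correction.

    correction, g_sim and g_target are lists over the same radial bins.
    kT is the thermal energy in the units of the correction.
    alpha is an assumed damping factor (1.0 gives the undamped update).
    """
    updated = []
    for dv, gs, gt in zip(correction, g_sim, g_target):
        if gs < eps or gt < eps:
            # No sampled pairs in this bin: the logarithm is undefined, keep the old value.
            updated.append(dv)
        else:
            # Where the simulated RDF is too high, the correction becomes more repulsive.
            updated.append(dv + alpha * kT * math.log(gs / gt))
    return updated
```

In a full workflow, this step would be repeated, with a new simulation run after each update to recompute `g_sim`, until the simulated and target radial distribution functions agree.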

3.
J Chem Theory Comput ; 20(2): 891-901, 2024 Jan 23.
Article in English | MEDLINE | ID: mdl-38168674

ABSTRACT

A light-matter hybrid quasiparticle, called a polariton, is formed when molecules are strongly coupled to an optical cavity. Recent experiments have shown that polariton chemistry can manipulate chemical reactions. Polariton chemistry is a collective phenomenon, and its effects increase with the number of molecules in a cavity. However, simulating an ensemble of molecules in the excited state coupled to a cavity mode is theoretically and computationally challenging. Recent advances in machine learning (ML) techniques have shown promising capabilities in modeling ground-state chemical systems. This work presents a general protocol to predict excited-state properties, such as energies, transition dipoles, and nonadiabatic coupling vectors with the hierarchically interacting particle neural network. ML predictions are then applied to compute the potential energy surfaces and electronic spectra of a prototype azomethane molecule in the collective coupling scenario. These computational tools provide a much-needed framework to model and understand many molecules' emerging excited-state polariton chemistry.

4.
BMC Bioinformatics ; 24(1): 441, 2023 Nov 22.
Article in English | MEDLINE | ID: mdl-37990143

ABSTRACT

BACKGROUND: Correlation metrics are widely utilized in genomics analysis and often implemented with little regard to assumptions of normality, homoscedasticity, and independence of values. This is especially true when comparing values between replicated sequencing experiments that probe chromatin accessibility, such as assays for transposase-accessible chromatin via sequencing (ATAC-seq). Such data can possess several regions across the human genome with little to no sequencing depth and are thus non-normal with a large portion of zero values. Despite widespread use in the epigenomics field, few studies have evaluated and benchmarked how correlation and association statistics behave across ATAC-seq experiments with known differences or the effects of removing specific outliers from the data. Here, we developed a computational simulation of ATAC-seq data to elucidate the behavior of correlation statistics and to compare their accuracy under set conditions of reproducibility. RESULTS: Using these simulations, we monitored the behavior of several correlation statistics, including the Pearson's R and Spearman's ρ coefficients as well as Kendall's τ and Top-Down correlation. We also tested the behavior of association measures, including the coefficient of determination R², Kendall's W, and normalized mutual information. Our experiments reveal an insensitivity of most statistics, including Spearman's ρ, Kendall's τ, and Kendall's W, to increasing differences between simulated ATAC-seq replicates. The removal of co-zeros (regions lacking mapped sequenced reads) between simulated experiments greatly improves the estimates of correlation and association. After removing co-zeros, the R² coefficient and normalized mutual information display the best performance, having a closer one-to-one relationship with the known portion of shared, enhanced loci between simulated replicates. When comparing values between experimental ATAC-seq data using a random forest model, mutual information best predicts ATAC-seq replicate relationships. CONCLUSIONS: Collectively, this study demonstrates how measures of correlation and association can behave in epigenomics experiments. We provide improved strategies for quantifying relationships in these increasingly prevalent and important chromatin accessibility assays.
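To make the co-zero effect concrete, here is a minimal sketch on made-up toy counts (not data from the study). Regions where both replicates are zero are dropped before computing Pearson's R. The helper names `pearson_r` and `drop_co_zeros` are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def drop_co_zeros(x, y):
    """Keep only positions where at least one replicate has a nonzero value."""
    kept = [(a, b) for a, b in zip(x, y) if not (a == 0 and b == 0)]
    return [a for a, _ in kept], [b for _, b in kept]

# Toy replicates: four regions with no reads in either replicate,
# followed by three covered regions whose signals are anti-correlated.
rep1 = [0, 0, 0, 0, 1, 2, 3]
rep2 = [0, 0, 0, 0, 3, 2, 1]

r_all = pearson_r(rep1, rep2)                     # shared zeros pull this towards +1
r_filtered = pearson_r(*drop_co_zeros(rep1, rep2))  # correlation of covered regions only
```

In this toy case the shared zeros make the replicates look positively correlated, although the covered regions are perfectly anti-correlated. This is the kind of distortion that filtering co-zeros is meant to remove.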


Subject(s)
Chromatin , High-Throughput Nucleotide Sequencing , Humans , Chromatin/genetics , Reproducibility of Results , Chromatin Immunoprecipitation Sequencing , Transposases/genetics , Sequence Analysis, DNA
5.
J Chem Phys ; 159(11), 2023 Sep 21.
Article in English | MEDLINE | ID: mdl-37712780

ABSTRACT

Catalyzed by enormous success in the industrial sector, many research programs have been exploring data-driven, machine learning approaches. Performance can be poor when the model is extrapolated to new regions of chemical space, e.g., new bonding types, new many-body interactions. Another important limitation is the spatial locality assumption in model architecture, and this limitation cannot be overcome with larger or more diverse datasets. The outlined challenges are primarily associated with the lack of electronic structure information in surrogate models such as interatomic potentials. Given the fast development of machine learning and computational chemistry methods, we expect some limitations of surrogate models to be addressed in the near future; nevertheless, the spatial locality assumption will likely remain a limiting factor for their transferability. Here, we suggest focusing on an equally important effort: the design of physics-informed models that leverage the domain knowledge and employ machine learning only as a corrective tool. In the context of material science, we will focus on semi-empirical quantum mechanics, using machine learning to predict corrections to the reduced-order Hamiltonian model parameters. The resulting models are broadly applicable, retain the speed of semiempirical chemistry, and frequently achieve accuracy on par with much more expensive ab initio calculations. These early results indicate that future work, in which machine learning and quantum chemistry methods are developed jointly, may provide the best of all worlds for chemistry applications that demand both high accuracy and high numerical efficiency.

6.
Sci Rep ; 13(1): 16262, 2023 Sep 27.
Article in English | MEDLINE | ID: mdl-37758757

ABSTRACT

Throughout computational science, there is a growing need to utilize the continual improvements in raw computational horsepower to achieve greater physical fidelity through scale-bridging over brute-force increases in the number of mesh elements. For instance, quantitative predictions of transport in nanoporous media, critical to hydrocarbon extraction from tight shale formations, are impossible without accounting for molecular-level interactions. Similarly, inertial confinement fusion simulations rely on numerical diffusion to simulate molecular effects such as non-local transport and mixing without truly accounting for molecular interactions. With these two disparate applications in mind, we develop a novel capability which uses an active learning approach to optimize the use of local fine-scale simulations for informing coarse-scale hydrodynamics. Our approach addresses three challenges: forecasting continuum coarse-scale trajectory to speculatively execute new fine-scale molecular dynamics calculations, dynamically updating coarse-scale from fine-scale calculations, and quantifying uncertainty in neural network models.

7.
PLoS Comput Biol ; 19(6): e1011075, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37289841

ABSTRACT

Interactions between stressed organisms and their microbiome environments may provide new routes for understanding and controlling biological systems. However, microbiomes are a form of high-dimensional data, with thousands of taxa present in any given sample, which makes untangling the interaction between an organism and its microbial environment a challenge. Here we apply Latent Dirichlet Allocation (LDA), a technique for language modeling, which decomposes the microbial communities into a set of topics (non-mutually-exclusive sub-communities) that compactly represent the distribution of full communities. LDA provides a lens into the microbiome at broad and fine-grained taxonomic levels, which we show on two datasets. In the first dataset, from the literature, we show how LDA topics succinctly recapitulate many results from a previous study on diseased coral species. We then apply LDA to a new dataset of maize soil microbiomes under drought, and find a large number of significant associations between the microbiome topics and plant traits as well as associations between the microbiome and the experimental factors, e.g. watering level. This yields new information on the plant-microbial interactions in maize and shows that LDA technique is useful for studying the coupling between microbiomes and stressed organisms.


Subject(s)
Microbiota , Microbial Interactions , Phenotype
8.
J Chem Theory Comput ; 19(11): 3209-3222, 2023 Jun 13.
Article in English | MEDLINE | ID: mdl-37163680

ABSTRACT

Extended Lagrangian Born-Oppenheimer molecular dynamics (XL-BOMD) in its most recent shadow potential energy version has been implemented in the semiempirical PyTorch-based software PySeQM. The implementation includes finite electronic temperatures, canonical density matrix perturbation theory, and an adaptive Krylov subspace approximation for the integration of the electronic equations of motion within the XL-BOMD approach (KSA-XL-BOMD). The PyTorch implementation leverages the use of GPU and machine learning hardware accelerators for the simulations. The new XL-BOMD formulation allows studying more challenging chemical systems with charge instabilities and low electronic energy gaps. The current public release of PySeQM continues our development of modular architecture for large-scale simulations employing semi-empirical quantum-mechanical treatment. Applied to a molecular dynamics simulation of 840 carbon atoms, one integration time step executes in 4 s on a single Nvidia RTX A6000 GPU.

9.
J Chem Phys ; 158(18), 2023 May 14.
Article in English | MEDLINE | ID: mdl-37158328

ABSTRACT

Atomistic machine learning focuses on the creation of models that obey fundamental symmetries of atomistic configurations, such as permutation, translation, and rotation invariances. In many of these schemes, translation and rotation invariance are achieved by building on scalar invariants, e.g., distances between atom pairs. There is growing interest in molecular representations that work internally with higher rank rotational tensors, e.g., vector displacements between atoms, and tensor products thereof. Here, we present a framework for extending the Hierarchically Interacting Particle Neural Network (HIP-NN) with Tensor Sensitivity information (HIP-NN-TS) from each local atomic environment. Crucially, the method employs a weight tying strategy that allows direct incorporation of many-body information while adding very few model parameters. We show that HIP-NN-TS is more accurate than HIP-NN, with negligible increase in parameter count, for several datasets and network sizes. As the dataset becomes more complex, tensor sensitivities provide greater improvements to model accuracy. In particular, HIP-NN-TS achieves a record mean absolute error of 0.927 kcal/mol for conformational energy variation on the challenging COMP6 benchmark, which includes a broad set of organic molecules. We also compare the computational performance of HIP-NN-TS to HIP-NN and other models in the literature.

10.
J Phys Chem A ; 127(17): 3768-3778, 2023 May 04.
Article in English | MEDLINE | ID: mdl-37078657

ABSTRACT

Highly energetic electron-hole pairs (hot carriers) formed from plasmon decay in metallic nanostructures promise sustainable pathways for energy-harvesting devices. However, efficient collection before thermalization remains an obstacle for realization of their full energy generating potential. Addressing this challenge requires detailed understanding of physical processes from plasmon excitation in the metal to their collection in a molecule or a semiconductor, where atomistic theoretical investigation may be particularly beneficial. Unfortunately, first-principles theoretical modeling of these processes is extremely costly, preventing a detailed analysis over a large number of potential nanostructures and limiting the analysis to systems with a few hundred atoms. Recent advances in machine learned interatomic potentials suggest that dynamics can be accelerated with surrogate models which replace the full solution of the Schrödinger Equation. Here, we modify an existing neural network, Hierarchically Interacting Particle Neural Network (HIP-NN), to predict plasmon dynamics in Ag nanoparticles. The model takes a minimum of three time steps of the reference real-time time-dependent density functional theory (rt-TDDFT) calculated charges as history and predicts trajectories for 5 fs in great agreement with the reference simulation. Further, we show that a multistep training approach in which the loss function includes errors from future time-step predictions can stabilize the model predictions for the entire simulated trajectory (∼25 fs). This extends the model's capability to accurately predict plasmon dynamics in large nanoparticles of up to 561 atoms, not present in the training data set. 
More importantly, with machine learning models on GPUs, we gain a speed-up factor of ∼10^3 as compared with the rt-TDDFT calculations when predicting important physical quantities such as dynamic dipole moments in Ag55 and a factor of ∼10^4 for extended nanoparticles that are 10 times larger. This underscores the promise of future machine learning accelerated electron/nuclear dynamics simulations for understanding fundamental properties of plasmon-driven hot carrier devices.

11.
Nat Comput Sci ; 3(3): 230-239, 2023 Mar.
Article in English | MEDLINE | ID: mdl-38177878

ABSTRACT

Machine learning (ML) models, if trained to data sets of high-fidelity quantum simulations, produce accurate and efficient interatomic potentials. Active learning (AL) is a powerful tool to iteratively generate diverse data sets. In this approach, the ML model provides an uncertainty estimate along with its prediction for each new atomic configuration. If the uncertainty estimate passes a certain threshold, then the configuration is included in the data set. Here we develop a strategy to more rapidly discover configurations that meaningfully augment the training data set. The approach, uncertainty-driven dynamics for active learning (UDD-AL), modifies the potential energy surface used in molecular dynamics simulations to favor regions of configuration space for which there is large model uncertainty. The performance of UDD-AL is demonstrated for two AL tasks: sampling the conformational space of glycine and sampling the promotion of proton transfer in acetylacetone. The method is shown to efficiently explore the chemically relevant configuration space, which may be inaccessible using regular dynamical sampling at target temperature conditions.
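The threshold-based selection step of active learning described in this abstract can be sketched as follows. The abstract does not say how the uncertainty estimate is computed. This sketch assumes ensemble disagreement (the standard deviation of predictions across several models), and all function names are hypothetical.

```python
import statistics

def ensemble_uncertainty(models, config):
    """Assumed uncertainty proxy: spread of predictions across an ensemble."""
    predictions = [model(config) for model in models]
    return statistics.pstdev(predictions)

def select_for_labeling(models, configs, threshold):
    """Return configurations whose uncertainty exceeds the threshold.

    In a real workflow these would be sent to a reference quantum
    calculation and added to the training set.
    """
    return [c for c in configs if ensemble_uncertainty(models, c) > threshold]

# Toy ensemble of two "models" that agree near x = 0 and x = 1
# and disagree strongly elsewhere.
toy_models = [lambda x: x, lambda x: x ** 2]
candidates = [0.0, 1.0, 3.0]
to_label = select_for_labeling(toy_models, candidates, threshold=0.5)
```

UDD-AL goes beyond this passive filter by biasing the dynamics towards high-uncertainty regions. That bias is not reproduced here because the abstract does not give its functional form.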


Subject(s)
Fabaceae , Uncertainty , Glycine , Machine Learning , Molecular Dynamics Simulation
12.
Sci Data ; 9(1): 579, 2022 Oct 03.
Article in English | MEDLINE | ID: mdl-36192410

ABSTRACT

Physical processes that occur within porous materials have wide-ranging applications including - but not limited to - carbon sequestration, battery technology, membranes, oil and gas, geothermal energy, nuclear waste disposal, and water resource management. The equations that describe these physical processes have been studied extensively; however, approximating them numerically requires immense computational resources due to the complex behavior that arises from the geometrically-intricate solid boundary conditions in porous materials. Here, we introduce a new dataset of unprecedented scale and breadth, DRP-372: a catalog of 3D geometries, simulation results, and structural properties of samples hosted on the Digital Rocks Portal. The dataset includes 1736 flow and electrical simulation results on 217 samples, which required more than 500 core years of computation. This data can be used for many purposes, such as constructing empirical models, validating new simulation codes, and developing machine learning algorithms that closely match the extensive purely-physical simulation. This article offers a detailed description of the contents of the dataset including the data collection, simulation schemes, and data validation.

13.
Proc Natl Acad Sci U S A ; 119(27): e2120333119, 2022 Jul 05.
Article in English | MEDLINE | ID: mdl-35776544

ABSTRACT

Conventional machine-learning (ML) models in computational chemistry learn to directly predict molecular properties using quantum chemistry only for reference data. While these heuristic ML methods show quantum-level accuracy with speeds several orders of magnitude faster than traditional quantum chemistry methods, they suffer from poor extensibility and transferability; i.e., their accuracy degrades on large or new chemical systems. Incorporating quantum chemistry frameworks into the ML models directly solves this problem. Here we take the structure of semiempirical quantum mechanics (SEQM) methods to construct dynamically responsive Hamiltonians. SEQM methods use empirical parameters fitted to experimental properties to construct reduced-order Hamiltonians, facilitating much faster calculations than ab initio methods but with compromised accuracy. By replacing these static parameters with machine-learned dynamic values inferred from the local environment, we greatly improve the accuracy of the SEQM methods. Trained on molecular energies and atomic forces, these dynamically generated Hamiltonian parameters show a strong correlation with atomic hybridization and bonding. Trained with only about 60,000 small organic molecular conformers, the resulting model retains interpretability, extensibility, and transferability when testing on much larger chemical systems and predicting various molecular properties. Overall, this work demonstrates the virtues of incorporating physics-based descriptions with ML to develop models that are simultaneously accurate, transferable, and interpretable.

14.
Phys Rev E ; 105(4-2): 045301, 2022 Apr.
Article in English | MEDLINE | ID: mdl-35590626

ABSTRACT

We propose a data-driven method to describe consistent equations of state (EOS) for arbitrary systems. Complex EOS are traditionally obtained by fitting suitable analytical expressions to thermophysical data. A key aspect of EOS is that the relationships between state variables are given by derivatives of the system free energy. In this work, we model the free energy with an artificial neural network and utilize automatic differentiation to directly learn the derivatives of the free energy. We demonstrate this approach on two different systems, the analytic van der Waals EOS and published data for the Lennard-Jones fluid, and we show that it is advantageous over direct learning of thermodynamic properties (i.e., not as derivatives of the free energy but as independent properties), in terms of both accuracy and the exact preservation of the Maxwell relations. Furthermore, the method implicitly provides the free energy of a system without explicit integration.
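The central idea in this abstract, obtaining state variables as derivatives of a free energy model, can be illustrated with the van der Waals case. In the sketch below, an analytic molar Helmholtz free energy stands in for the neural network, and a central finite difference stands in for automatic differentiation. Both are simplifying assumptions, and the CO2-like constants are approximate illustrative values.

```python
import math

R = 8.314  # gas constant, J/(mol K)

def helmholtz_vdw(T, V, a, b):
    """Molar van der Waals free energy, up to a volume-independent function of T."""
    return -R * T * math.log(V - b) - a / V

def pressure_from_free_energy(F, T, V, h=1e-9):
    """P = -(dF/dV)_T, here via a central difference instead of autodiff."""
    return -(F(T, V + h) - F(T, V - h)) / (2.0 * h)

# Approximate, illustrative van der Waals constants (SI units, per mole).
a, b = 0.364, 4.27e-5
T, V = 300.0, 1.0e-3

p_derived = pressure_from_free_energy(lambda t, v: helmholtz_vdw(t, v, a, b), T, V)
p_analytic = R * T / (V - b) - a / V ** 2  # the familiar van der Waals pressure
```

Because every property is derived from the single function F, relations such as the Maxwell relations hold by construction. This is the consistency advantage the abstract highlights.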

15.
Nat Rev Chem ; 6(9): 653-672, 2022 Sep.
Article in English | MEDLINE | ID: mdl-37117713

ABSTRACT

Machine learning (ML) is becoming a method of choice for modelling complex chemical processes and materials. ML provides a surrogate model trained on a reference dataset that can be used to establish a relationship between a molecular structure and its chemical properties. This Review highlights developments in the use of ML to evaluate chemical properties such as partial atomic charges, dipole moments, spin and electron densities, and chemical bonding, as well as to obtain a reduced quantum-mechanical description. We overview several modern neural network architectures, their predictive capabilities, generality and transferability, and illustrate their applicability to various chemical properties. We emphasize that learned molecular representations resemble quantum-mechanical analogues, demonstrating the ability of the models to capture the underlying physics. We also discuss how ML models can describe non-local quantum effects. Finally, we conclude by compiling a list of available ML toolboxes, summarizing the unresolved challenges and presenting an outlook for future development. The observed trends demonstrate that this field is evolving towards physics-based models augmented by ML, which is accompanied by the development of new methods and the rapid growth of user-friendly ML frameworks for chemistry.

16.
J Chem Inf Model ; 61(8): 3846-3857, 2021 Aug 23.
Article in English | MEDLINE | ID: mdl-34347460

ABSTRACT

Machine learning (ML) plays a growing role in the design and discovery of chemicals, aiming to reduce the need to perform expensive experiments and simulations. ML for such applications is promising but difficult, as models must generalize to vast chemical spaces from small training sets and must have reliable uncertainty quantification metrics to identify and prioritize unexplored regions. Ab initio computational chemistry and chemical intuition alike often take advantage of differences between chemical conditions, rather than their absolute structure or state, to generate more reliable results. We have developed an analogous comparison-based approach for ML regression, called pairwise difference regression (PADRE), which is applicable to arbitrary underlying learning models and operates on pairs of input data points. During training, the model learns to predict differences between all possible pairs of input points. During prediction, the test points are paired with all training set points, giving rise to a set of predictions that can be treated as a distribution of which the mean is treated as a final prediction and the dispersion is treated as an uncertainty measure. Pairwise difference regression was shown to reliably improve the performance of the random forest algorithm across five chemical ML tasks. Additionally, the pair-derived dispersion is both well correlated with model error and performs well in active learning. We also show that this method is competitive with state-of-the-art neural network techniques. Thus, pairwise difference regression is a promising tool for candidate selection algorithms used in chemical discovery.
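The training and prediction steps of pairwise difference regression described above can be sketched in a few lines. A toy nearest-neighbor regressor stands in for the base learner (the abstract reports results with random forests). The pair-feature layout (both points plus their difference) and all names here are assumptions made for illustration.

```python
import statistics

def pair_features(a, b):
    """Features for an ordered pair: both points and their elementwise difference."""
    return list(a) + list(b) + [ai - bi for ai, bi in zip(a, b)]

class NearestNeighbor:
    """Toy stand-in for an arbitrary base regressor."""
    def fit(self, X, y):
        self.X, self.y = X, y
        return self

    def predict_one(self, x):
        dists = [sum((u - v) ** 2 for u, v in zip(x, row)) for row in self.X]
        return self.y[dists.index(min(dists))]

def padre_fit(X, y, model):
    """Train the base model to predict y_i - y_j for all ordered training pairs."""
    pairs = [(i, j) for i in range(len(X)) for j in range(len(X))]
    P = [pair_features(X[i], X[j]) for i, j in pairs]
    d = [y[i] - y[j] for i, j in pairs]
    return model.fit(P, d)

def padre_predict(model, X_train, y_train, x):
    """Pair x with every training point; return (mean estimate, dispersion)."""
    estimates = [yj + model.predict_one(pair_features(x, xj))
                 for xj, yj in zip(X_train, y_train)]
    return statistics.mean(estimates), statistics.pstdev(estimates)

X_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [0.0, 2.0, 4.0, 6.0]
model = padre_fit(X_train, y_train, NearestNeighbor())
mean, spread = padre_predict(model, X_train, y_train, [2.0])
```

The mean of the per-anchor estimates serves as the prediction and their dispersion as the uncertainty measure. Note that the number of training pairs grows quadratically with the training set size.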


Subject(s)
Algorithms , Machine Learning , Neural Networks, Computer , Uncertainty
17.
Chem Sci ; 12(30): 10207-10217, 2021 Aug 04.
Article in English | MEDLINE | ID: mdl-34447529

ABSTRACT

Phosphorescence is commonly utilized for applications including light-emitting diodes and photovoltaics. Machine learning (ML) approaches trained on ab initio datasets of singlet-triplet energy gaps may expedite the discovery of phosphorescent compounds with the desired emission energies. However, we show that standard ML approaches for modeling potential energy surfaces inaccurately predict singlet-triplet energy gaps due to the failure to account for spatial localities of spin transitions. To solve this, we introduce localization layers in a neural network model that weight atomic contributions to the energy gap, thereby allowing the model to isolate the most determinative chemical environments. Trained on the singlet-triplet energy gaps of organic molecules, we apply our method to an out-of-sample test set of large phosphorescent compounds and demonstrate the substantial improvement that localization layers have on predicting their phosphorescence energies. Remarkably, the inferred localization weights have a strong relationship with the ab initio spin density of the singlet-triplet transition, and thus infer localities of the molecule that determine the spin transition, despite the fact that no direct electronic information was provided during training. The use of localization layers is expected to improve the modeling of many localized, non-extensive phenomena and could be implemented in any atom-centered neural network model.

18.
J Chem Phys ; 154(24): 244108, 2021 Jun 28.
Article in English | MEDLINE | ID: mdl-34241371

ABSTRACT

The Hückel Hamiltonian is an incredibly simple tight-binding model known for its ability to capture qualitative physics phenomena arising from electron interactions in molecules and materials. Part of its simplicity arises from using only two types of empirically fit physics-motivated parameters: the first describes the orbital energies on each atom and the second describes electronic interactions and bonding between atoms. By replacing these empirical parameters with machine-learned dynamic values, we vastly increase the accuracy of the extended Hückel model. The dynamic values are generated with a deep neural network, which is trained to reproduce orbital energies and densities derived from density functional theory. The resulting model retains interpretability, while the deep neural network parameterization is smooth and accurate and reproduces insightful features of the original empirical parameterization. Overall, this work shows the promise of utilizing machine learning to formulate simple, accurate, and dynamically parameterized physics models.

19.
J Phys Chem Lett ; 12(26): 6227-6243, 2021 Jul 08.
Article in English | MEDLINE | ID: mdl-34196559

ABSTRACT

Machine learning (ML) is quickly becoming a premier tool for modeling chemical processes and materials. ML-based force fields, trained on large data sets of high-quality electronic structure calculations, are particularly attractive due to their unique combination of computational efficiency and physical accuracy. This Perspective summarizes some recent advances in the development of neural network-based interatomic potentials. Designing high-quality training data sets is crucial to overall model accuracy. One strategy is active learning, in which new data are automatically collected for atomic configurations that produce large ML uncertainties. Another strategy is to use the highest levels of quantum theory possible. Transfer learning allows training to a data set of mixed fidelity. A model initially trained to a large data set of density functional theory calculations can be significantly improved by retraining to a relatively small data set of expensive coupled cluster theory calculations. These advances are exemplified by applications to molecules and materials.

20.
Nat Commun ; 12(1): 1257, 2021 Feb 23.
Article in English | MEDLINE | ID: mdl-33623036

ABSTRACT

Machine learning, trained on quantum mechanics (QM) calculations, is a powerful tool for modeling potential energy surfaces. A critical factor is the quality and diversity of the training dataset. Here we present a highly automated approach to dataset construction and demonstrate the method by building a potential for elemental aluminum (ANI-Al). In our active learning scheme, the ML potential under development is used to drive non-equilibrium molecular dynamics simulations with time-varying applied temperatures. Whenever a configuration is reached for which the ML uncertainty is large, new QM data is collected. The ML model is periodically retrained on all available QM data. The final ANI-Al potential makes very accurate predictions of radial distribution function in melt, liquid-solid coexistence curve, and crystal properties such as defect energies and barriers. We perform a 1.3M atom shock simulation and show that ANI-Al force predictions shine in their agreement with new reference DFT calculations.
