Results 1 - 17 of 17
1.
J Chem Theory Comput ; 20(6): 2445-2461, 2024 Mar 26.
Article in English | MEDLINE | ID: mdl-38450638

ABSTRACT

The effective fragment molecular orbital (EFMO) method has been developed to predict the total energy of a very large molecular system accurately (with respect to the underlying quantum mechanical method) and efficiently by taking advantage of the locality of strong chemical interactions and employing a two-level hierarchical parallelism. The accuracy of the EFMO method is partly attributed to the accurate and robust intermolecular interaction prediction between distant fragments, in particular, the many-body polarization and dispersion effects, which require the generation of static and dynamic polarizability tensors by solving the coupled perturbed Hartree-Fock (CPHF) and time-dependent HF (TDHF) equations, respectively. Solving the CPHF and TDHF equations is the main EFMO computational bottleneck due to the inefficient (serial) and I/O-intensive implementation of the CPHF and TDHF solvers. In this work, the efficiency and scalability of the EFMO method are significantly improved with a new CPU memory-based implementation for solving the CPHF and TDHF equations that are parallelized by either message passing interface (MPI) or hybrid MPI/OpenMP. The accuracy of the EFMO method is demonstrated for both covalently bonded systems and noncovalently bound molecular clusters by systematically examining the effects of basis sets and a key distance-related cutoff parameter, Rcut. Rcut determines whether a fragment pair (dimer) is treated by the chosen ab initio method or calculated using the effective fragment potential (EFP) method (separated dimers). Decreasing the value of Rcut increases the number of separated (EFP) dimers, thereby decreasing the computational effort. It is demonstrated that excellent accuracy (<1 kcal/mol error per fragment) can be achieved when using a sufficiently large basis set with diffuse functions coupled with a small Rcut value. 
With the new parallel implementation, the total EFMO wall time is substantially reduced, especially with a high number of MPI ranks. Given a sufficient workload, nearly ideal strong scaling is achieved for the CPHF and TDHF parts of the calculation. For the first time, EFMO calculations with the inclusion of long-range polarization and dispersion interactions on a hydrated mesoporous silica nanoparticle with explicit water solvent molecules (more than 15k atoms) are achieved on a massively parallel supercomputer using nearly 1000 physical nodes. In addition, EFMO calculations on the carbinolamine formation step of an amine-catalyzed aldol reaction at the nanoscale with explicit solvent effects are presented.
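As a rough illustration of how a distance cutoff such as Rcut splits fragment pairs into ab initio dimers and separated (EFP) dimers, consider the minimal sketch below. It assumes a plain Euclidean distance between hypothetical fragment centroids; the abstract does not state the actual EFMO distance metric or typical Rcut values, and the names `classify_dimers` and `centroids` are illustrative only.

```python
from itertools import combinations
from math import dist


def classify_dimers(centroids, r_cut):
    """Split fragment pairs into ab initio and separated (EFP) dimers.

    centroids: hypothetical mapping of fragment id -> (x, y, z) position.
    r_cut: cutoff; pairs farther apart than r_cut are treated as separated.
    """
    ab_initio, separated = [], []
    for (i, a), (j, b) in combinations(sorted(centroids.items()), 2):
        target = ab_initio if dist(a, b) <= r_cut else separated
        target.append((i, j))
    return ab_initio, separated


# Three made-up fragments on a line (arbitrary units).
centroids = {1: (0.0, 0.0, 0.0), 2: (1.0, 0.0, 0.0), 3: (5.0, 0.0, 0.0)}
qm_large, efp_large = classify_dimers(centroids, r_cut=4.5)
qm_small, efp_small = classify_dimers(centroids, r_cut=0.5)
```

With the smaller cutoff, every pair falls into the separated list, which mirrors the statement that decreasing Rcut increases the number of EFP dimers and lowers the computational effort.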

2.
J Chem Theory Comput ; 19(20): 7031-7055, 2023 Oct 24.
Article in English | MEDLINE | ID: mdl-37793073

ABSTRACT

The primary focus of GAMESS over the last 5 years has been the development of new high-performance codes that are able to take effective and efficient advantage of the most advanced computer architectures, both CPU and accelerators. These efforts include employing density fitting and fragmentation methods to reduce the high scaling of well-correlated (e.g., coupled-cluster) methods as well as developing novel codes that can take optimal advantage of graphical processing units and other modern accelerators. Because accurate wave functions can be very complex, an important new functionality in GAMESS is the quasi-atomic orbital analysis, an unbiased approach to the understanding of covalent bonds embedded in the wave function. Best practices for the maintenance and distribution of GAMESS are also discussed.

3.
J Chem Theory Comput ; 19(13): 3798-3805, 2023 Jul 11.
Article in English | MEDLINE | ID: mdl-37343236

ABSTRACT

The performance of Fortran 2008 DO CONCURRENT (DC) relative to OpenACC and OpenMP target offloading (OTO) with different compilers is studied for the GAMESS quantum chemistry application. Specifically, DC and OTO are used to offload the Fock build, which is a computational bottleneck in most quantum chemistry codes, to GPUs. The DC Fock build performance is studied on NVIDIA A100 and V100 accelerators and compared with the OTO versions compiled by the NVIDIA HPC, IBM XL, and Cray Fortran compilers. The results show that DC can speed up the Fock build by 3.0× compared with that of the OTO model. With similar offloading efforts, DC is a compelling programming model for offloading Fortran applications to GPUs.

4.
J Chem Phys ; 158(19)2023 May 21.
Article in English | MEDLINE | ID: mdl-37184015

ABSTRACT

Multiple ERI (Electron Repulsion Integral) tensor contractions (METC) with several matrices are ubiquitous in quantum chemistry. In response theories, the contraction operation, rather than ERI computations, can be the major bottleneck, as its computational demands are proportional to the multiplicatively combined contributions of the number of excited states and the kernel pre-factors. This paper presents several high-performance strategies for METC. Optimal approaches involve data layout reformations of interim density and Fock matrices, the introduction of an intermediate ERI quartet buffer, and loop-reordering optimization for a higher cache hit rate. The combined strategies remarkably improve the performance of the MRSF (mixed reference spin flip)-TDDFT (time-dependent density functional theory) by nearly 300%. The results of this study are not limited to the MRSF-TDDFT method and can be applied to other METC scenarios.

5.
J Chem Phys ; 158(16)2023 Apr 28.
Article in English | MEDLINE | ID: mdl-37114705

ABSTRACT

Using an OpenMP Application Programming Interface, the resolution-of-the-identity second-order Møller-Plesset perturbation (RI-MP2) method has been off-loaded onto graphical processing units (GPUs), both as a standalone method in the GAMESS electronic structure program and as an electron correlation energy component in the effective fragment molecular orbital (EFMO) framework. First, a new scheme has been proposed to maximize data digestion on GPUs that subsequently linearizes data transfer from central processing units (CPUs) to GPUs. Second, the GAMESS Fortran code has been interfaced with GPU numerical libraries (e.g., NVIDIA cuBLAS and cuSOLVER) for efficient matrix operations (e.g., matrix multiplication, matrix decomposition, and matrix inversion). The standalone GPU RI-MP2 code shows an increasing speedup of up to 7.5× using one NVIDIA V100 GPU with one IBM 42-core P9 CPU for calculations on fullerenes of increasing size from 40 to 260 carbon atoms using the 6-31G(d)/cc-pVDZ-RI basis sets. A single Summit node with six V100s can compute the RI-MP2 correlation energy of a cluster of 175 water molecules using the correlation consistent basis sets cc-pVDZ/cc-pVDZ-RI containing 4375 atomic orbitals and 14 700 auxiliary basis functions in ∼0.85 h. In the EFMO framework, the GPU RI-MP2 component shows near linear scaling for a large number of V100s when computing the energy of an 1800-atom mesoporous silica nanoparticle in a bath of 4000 water molecules. The parallel efficiencies of the GPU RI-MP2 component with 2304 and 4608 V100s are 98.0% and 96.1%, respectively.
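Parallel efficiencies such as those quoted above are usually computed relative to a reference run. The sketch below shows one common strong-scaling definition; the abstract does not state which baseline the authors used, and the wall times here are hypothetical numbers chosen only to illustrate the arithmetic.

```python
def parallel_efficiency(t_ref, n_ref, t_n, n):
    """Strong-scaling efficiency relative to a reference run.

    Ideal scaling keeps (wall time * device count) constant, so
    efficiency = (t_ref * n_ref) / (t_n * n).
    """
    return (t_ref * n_ref) / (t_n * n)


# Hypothetical wall times in seconds; NOT taken from the paper.
eff = parallel_efficiency(t_ref=1000.0, n_ref=1152, t_n=260.0, n=4608)
```

Here a fourfold increase in devices would ideally cut 1000 s to 250 s; a measured 260 s corresponds to an efficiency of about 96%.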

6.
J Chem Phys ; 158(16)2023 Apr 28.
Article in English | MEDLINE | ID: mdl-37098765

ABSTRACT

Strategies for multiple-level parallelizations of quantum-mechanical calculations are discussed, with an emphasis on using groups of workers for performing parallel tasks. These parallel programming models can be used for a variety of ab initio quantum chemistry approaches, including the fragment molecular orbital method and replica-exchange molecular dynamics. Strategies for efficient load balancing on problems of increasing granularity are introduced and discussed. A four-level parallelization is developed based on a multi-level hierarchical grouping, and a high parallel efficiency is achieved on the Theta supercomputer using 131 072 OpenMP threads.

7.
J Chem Theory Comput ; 19(8): 2213-2221, 2023 Apr 25.
Article in English | MEDLINE | ID: mdl-37011288

ABSTRACT

A framework to offload four-index two-electron repulsion integrals to graphical processing units (GPUs) using OpenMP is discussed. The method has been applied to the Fock build for low angular momentum s and p functions in both the restricted Hartree-Fock (RHF) and in the effective fragment molecular orbital (EFMO) framework. Benchmark calculations for the GPU code for the pure RHF method show an increasing speedup relative to the existing OpenMP CPU code in GAMESS from 1.04 to 52× for clusters of 70-569 water molecules. The parallel efficiency on 24 NVIDIA V100 GPU boards also increases when increasing the system size: from 75 to 94% for water clusters that contain 303-1120 molecules. In the EFMO framework, the GPU Fock build shows a high linear scalability up to 4608 V100s with a parallel efficiency of 96% for calculations on a solvated mesoporous silica nanoparticle system with ∼67,000 basis functions.

8.
J Phys Chem A ; 127(8): 1874-1882, 2023 Mar 02.
Article in English | MEDLINE | ID: mdl-36791340

ABSTRACT

An ab initio quantum chemical approach for the modeling of propellant degradation is presented. Using state-of-the-art bonding analysis techniques and composite methods, a series of potential degradation reactions are devised for a sample hydroxyl-terminated-polybutadiene (HTPB) type solid fuel. By applying thermochemical procedures and isodesmic reactions, accurate thermochemical quantities are obtained using a modified G3 composite method based on the resolution of the identity. The calculated heats of formation for the different structures produced present an ∼2 kcal/mol average error when compared against experimental values.

9.
J Phys Chem A ; 125(42): 9421-9429, 2021 Oct 28.
Article in English | MEDLINE | ID: mdl-34658243

ABSTRACT

The Gaussian-3 (G3) composite approach for thermochemical properties is revisited in light of the enhanced computational efficiency and reduced memory costs by applying the resolution-of-the-identity (RI) approximation for two-electron repulsion integrals (ERIs) to the computationally demanding component methods in the G3 model: the energy and gradient computations via the second-order Møller-Plesset perturbation theory (MP2) and the energy computations using the coupled-cluster singles-doubles method augmented with noniterative triples corrections [CCSD(T)]. Efficient implementation of the RI-based methods is achieved by employing a hybrid distributed/shared memory model based on MPI and OpenMP. The new variant of the G3 composite approach based on the RI approximation is termed the RI-G3 scheme, or alternatively the PDG method. The accuracy of the new RI-G3/PDG scheme is compared to the "standard" G3 composite approach that employs the memory-expensive four-center ERIs in the MP2 and CCSD(T) calculations. Taking the computation of the heats of formation of the closed-shell molecules in the G3/99 test set as a test case, it is demonstrated that the RI approximation introduces negligible changes to the mean absolute errors relative to the standard G3 model (less than 0.1 kcal/mol), while the standard deviations remain unaltered. The efficiency and memory requirements for the RI-MP2 and RI-CCSD(T) methods are compared to the standard MP2 and CCSD(T) approaches, respectively. The hybrid MPI/OpenMP-based RI-MP2 energy plus gradient computation is found to attain a 7.5× speedup over the standard MP2 calculations. For the most demanding CCSD(T) calculations, the application of the RI approximation is found to nearly halve the memory demand, confer about a 4-5× speedup for the CCSD iterations, and reduce the computational time for the compute-intensive triples correction step by several hours.

10.
J Chem Phys ; 152(15): 154102, 2020 Apr 21.
Article in English | MEDLINE | ID: mdl-32321259

ABSTRACT

A discussion of many of the recently implemented features of GAMESS (General Atomic and Molecular Electronic Structure System) and LibCChem (the C++ CPU/GPU library associated with GAMESS) is presented. These features include fragmentation methods such as the fragment molecular orbital, effective fragment potential and effective fragment molecular orbital methods, hybrid MPI/OpenMP approaches to Hartree-Fock, and resolution of the identity second order perturbation theory. Many new coupled cluster theory methods have been implemented in GAMESS, as have multiple levels of density functional/tight binding theory. The role of accelerators, especially graphical processing units, is discussed in the context of the new features of LibCChem, as is the associated problem of power consumption as the power of computers increases dramatically. The process by which a complex program suite such as GAMESS is maintained and developed is considered. Future developments are briefly summarized.

11.
J Chem Theory Comput ; 16(2): 1039-1054, 2020 Feb 11.
Article in English | MEDLINE | ID: mdl-31899632

ABSTRACT

The fully analytic gradient of the second-order Møller-Plesset perturbation theory (MP2) with the resolution-of-the-identity (RI) approximation in the fragment molecular orbital (FMO) framework is derived and implemented using a hybrid multilevel parallel programming model, a combination of the general distributed data interface (GDDI) and the OpenMP API. The FMO/MP2 analytic gradient contains three parts, i.e., the internal fragment component, the electrostatic potential (ESP) component, and the response terms. The RI approximation is applied to the internal fragment MP2 gradient term, whose MP2 densities and monomer MP2 Lagrangians are shared with the ESP and the response terms. The FMO/RI-MP2 analytic gradient implementation is validated against the numerical gradient (with errors ∼10⁻⁶ to 10⁻⁵ Hartree/Bohr) and the energy conservation in molecular dynamics (MD) simulations using NVE ensembles. The RI approximation introduces an error of ∼10⁻⁵ Hartree/Bohr with a speedup of 4.0-8.0× compared with the currently available GDDI FMO/MP2 gradient. The node linear scaling of the fragmentation framework due to multilevel parallelism is well-preserved and is demonstrated in single-point gradient calculations of large water clusters (e.g., 1120 and 2165 molecules) using 300-800 KNL compute nodes with a parallel efficiency of more than 90%.

12.
J Chem Theory Comput ; 15(10): 5252-5258, 2019 Oct 08.
Article in English | MEDLINE | ID: mdl-31509402

ABSTRACT

The general distributed data interface (GDDI) that was developed for the fragment molecular orbital (FMO) method is combined with the shared memory OpenMP parallel middleware to support a threading multilevel parallelism. First, GDDI partitions [logical] compute nodes into groups, which are statically or dynamically assigned to different fragments. A small number of processes are created on each compute node. Each process subsequently spawns multiple threads for the actual computation. The performance of the hybrid GDDI/OpenMP approach relative to the pure GDDI model was examined in terms of the FMO/RI-MP2 method; that is, the second-order Møller-Plesset (MP2) correlation energy was evaluated using the resolution-of-the-identity (RI) and the FMO approximations. The GDDI and OpenMP workload balances are handled by an arithmetic progression and a loop fusion, respectively. Other OpenMP properties, such as threadprivate or shared memory, are combined with the low memory demand of the RI two-electron integrals to enhance the performance. Benchmark calculations demonstrate that because the hybrid parallel model can make use of multiprocessor resources more efficiently than the regular distributed memory-based GDDI model, calculations for small to large water clusters containing 139-2165 molecules and an ionic liquid cluster exhibit node linear scaling and speedups of 10×.
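The idea of dynamically assigning fragments to groups of compute nodes can be pictured with a simple greedy scheduler. The sketch below is not the GDDI algorithm (it does not implement the arithmetic-progression balancing mentioned in the abstract); it is a generic largest-first heuristic, and the per-fragment cost estimates and the name `assign_fragments` are hypothetical.

```python
import heapq


def assign_fragments(costs, n_groups):
    """Greedy dynamic assignment: each fragment goes to the group that frees up first.

    costs: hypothetical per-fragment work estimates, processed largest first.
    Returns (per-group lists of fragment indices, makespan).
    """
    # Heap entries are (time at which the group becomes free, group id).
    heap = [(0.0, g) for g in range(n_groups)]
    heapq.heapify(heap)
    plan = [[] for _ in range(n_groups)]
    for frag, cost in sorted(enumerate(costs), key=lambda fc: -fc[1]):
        busy_until, group = heapq.heappop(heap)
        plan[group].append(frag)
        heapq.heappush(heap, (busy_until + cost, group))
    makespan = max(t for t, _ in heap)
    return plan, makespan


# Five made-up fragments shared between two groups.
plan, makespan = assign_fragments([8.0, 3.0, 5.0, 2.0, 2.0], n_groups=2)
```

In a hybrid model of the kind described above, each group would then spread the work for its current fragment over its own processes and threads; that inner level is omitted here.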

13.
J Chem Theory Comput ; 15(4): 2254-2264, 2019 Apr 09.
Article in English | MEDLINE | ID: mdl-30811187

ABSTRACT

The four-index two-electron repulsion integral (4-2ERI) matrix is compressed using the resolution-of-the-identity (RI) approximation combined with the rank factorization approximation (RFA). The 4-2ERI is first approximated by the RI product. Then, the singular value decomposition (SVD) approximation is used to eliminate low-weighted singular vectors. The SVD RI approximation maintains the canonical form of the RI approximation and introduces a tunable compression factor. The characteristics of the SVD RI approximation along with the stochastic RI and natural auxiliary function approximation were numerically examined by applying these methods to the closed-shell second-order Møller-Plesset perturbation theory (MP2). The results show that, while the SVD RI approximation yields large errors for absolute properties (e.g., the correlation energy), it provides accurate relative properties (potential energy surface, binding energy) of the applied ab initio method (e.g., RHF, MP2).
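The two approximation steps described above can be written compactly as follows. This is a sketch in common RI notation, not necessarily the paper's own: the symbols B, J, U, Σ, V, and the retained rank k are assumptions introduced here for illustration.

```latex
% Step 1: RI product (P, Q index auxiliary basis functions)
(\mu\nu|\lambda\sigma) \approx \sum_{P} B^{P}_{\mu\nu} B^{P}_{\lambda\sigma},
\qquad
B^{P}_{\mu\nu} = \sum_{Q} (\mu\nu|Q)\,\bigl[\mathbf{J}^{-1/2}\bigr]_{QP},
\qquad
J_{PQ} = (P|Q)

% Step 2: truncated SVD of the RI factor, viewed as an
% N_aux x N_pair matrix, keeping the k largest singular values
\mathbf{B} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^{T}
\approx \mathbf{U}_{k}\boldsymbol{\Sigma}_{k}\mathbf{V}_{k}^{T}

% The product keeps the RI form with a compressed factor
\mathbf{B}^{T}\mathbf{B}
\approx \mathbf{V}_{k}\boldsymbol{\Sigma}_{k}^{2}\mathbf{V}_{k}^{T}
= \tilde{\mathbf{B}}^{T}\tilde{\mathbf{B}},
\qquad
\tilde{\mathbf{B}} = \boldsymbol{\Sigma}_{k}\mathbf{V}_{k}^{T}
```

Because the compressed factor has only k rows, the ratio of k to the number of auxiliary functions plays the role of a tunable compression factor, while the approximate integrals retain the same product form as in the plain RI approximation.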

14.
J Phys Chem A ; 121(26): 4851-4852, 2017 Jul 06.
Article in English | MEDLINE | ID: mdl-28679210

15.
Phys Chem Chem Phys ; 18(48): 33274-33281, 2016 Dec 07.
Article in English | MEDLINE | ID: mdl-27896344

ABSTRACT

The thermodynamic and kinetic controls of graphene chemistry are studied computationally using a graphene hydrogenation reaction and polyaromatic hydrocarbons to represent the graphene surface. Hydrogen atoms are concertedly chemisorbed onto the surface of graphene models of different shapes (i.e., all-zigzag, all-armchair, zigzag-armchair mixed edges) and sizes (i.e., from 16 to 42 carbon atoms). The second-order Z-averaged perturbation theory (ZAPT2) method combined with Pople double and triple zeta basis sets is used for all calculations. It is found that both the net enthalpy change and the barrier height of graphene hydrogenation at graphene edges are lower than at their interior surfaces. While the thermodynamic product distribution is mainly determined by the remaining π-islands of functionalized graphenes (Phys. Chem. Chem. Phys., 2013, 15, 3725-3735), the kinetics of the reaction is primarily correlated with the localization of the electrostatic potential of the graphene surface.

16.
J Comput Chem ; 35(22): 1630-40, 2014 Aug 15.
Article in English | MEDLINE | ID: mdl-24935159

ABSTRACT

A comprehensive picture of the mechanism of the epoxy-phenol curing reactions is presented using the density functional theory B3LYP/6-31G(d,p) and simplified physical molecular models to examine all possible reaction pathways. Phenol can act as its own promoter by using an additional phenol molecule to stabilize the transition states, and thus lower the rate-limiting barriers by 27.0-48.9 kJ/mol. In the uncatalyzed reaction, an epoxy ring is opened by a phenol with an apparent barrier of about 129.6 kJ/mol. In the catalyzed reaction, catalysts facilitate the epoxy ring opening prior to curing, which lowers the apparent barriers by 48.9-50.6 kJ/mol. However, this pathway faces competition with highly basic catalysts such as amine-based catalysts, where the catalysts are trapped in the form of hydrogen-bonded complexes with phenol. Our theoretical results predict the activation energy in the range of 79.0-80.7 kJ/mol in phosphine-based catalyzed reactions, which agrees well with the reported experimental range of 54-86 kJ/mol.

17.
Phys Chem Chem Phys ; 15(11): 3725-35, 2013 Mar 21.
Article in English | MEDLINE | ID: mdl-23388654

ABSTRACT

We present a detailed analysis of the factors influencing the formation of epoxide and ether groups in graphene nanoflakes using conventional density functional theory (DFT), the density-functional tight-binding (DFTB) method, π-Hückel theory, and graph theoretical invariants. The relative thermodynamic stability associated with the chemisorption of oxygen atoms at various positions on hexagonal graphene flakes (HGFs) of D6h symmetry is determined by two factors - viz. the disruption of the π-conjugation of the HGF and the geometrical deformation of the HGF structure. The thermodynamically most stable structure is achieved when the former factor is minimized, and the latter factor is simultaneously maximized. Infrared (IR) spectra computed using DFT and DFTB reveal a close correlation between the relative thermodynamic stabilities of the oxidized HGF structures and their IR spectral activities. The most stable oxidized structures exhibit significant IR activity between 600 and 1800 cm⁻¹, whereas less stable oxidized structures exhibit little to no activity in this region. In contrast, Raman spectra are found to be less informative in this respect.
