Search | VHL Regional Portal

1.

AI-accelerated protein-ligand docking for SARS-CoV-2 is 100-fold faster with no significant change in detection.

Clyde, Austin; Liu, Xuefeng; Brettin, Thomas; Yoo, Hyunseung; Partin, Alexander; Babuji, Yadu; Blaiszik, Ben; Mohd-Yusof, Jamaludin; Merzky, Andre; Turilli, Matteo; Jha, Shantenu; Ramanathan, Arvind; Stevens, Rick.

Sci Rep ; 13(1): 2105, 2023 02 06.

Article in English | MEDLINE | ID: mdl-36747041

ABSTRACT

Protein-ligand docking is a computational method for identifying drug leads. The method is capable of narrowing a vast library of compounds down to a tractable size for downstream simulation or experimental testing and is widely used in drug discovery. While there has been progress in accelerating scoring of compounds with artificial intelligence, few works have bridged these successes back to the virtual screening community in terms of utility and forward-looking development. We demonstrate the power of high-speed ML models by scoring 1 billion molecules in under a day (50 k predictions per GPU second). We showcase a workflow for docking utilizing surrogate AI-based models as a pre-filter to a standard docking workflow. Our workflow is ten times faster at screening a library of compounds than the standard technique, with an error rate of less than 0.01% in detecting the underlying best-scoring 0.1% of compounds. Our analysis of the speedup explains that another order of magnitude speedup must come from model accuracy rather than computing speed. In order to drive another order of magnitude of acceleration, we share a benchmark dataset consisting of 200 million 3D complex structures and 2D structure scores across a consistent set of 13 million "in-stock" molecules over 15 receptors, or binding sites, across the SARS-CoV-2 proteome. We believe this is strong evidence for the community to begin focusing on improving the accuracy of surrogate models to improve the ability to screen massive compound libraries 100× or even 1000× faster than current techniques and reduce missing top hits. The technique outlined aims to be a fast drop-in replacement for docking for screening billion-scale molecular libraries.

Subject(s)

COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/metabolism , Artificial Intelligence , Molecular Docking Simulation , Ligands , Proteins/metabolism
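
The surrogate pre-filter idea in the abstract of entry 1 can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy random scores, the 10% keep fraction, and the convention that a lower score is better are all assumptions made for illustration. The last lines reproduce the throughput arithmetic implied by the abstract (1 billion molecules at 50 k predictions per GPU second).

```python
import random


def prefilter_then_dock(molecules, surrogate_score, dock_score, keep_fraction):
    """Rank every molecule with the cheap surrogate, then dock only the top fraction.

    Assumption: lower scores are better, as is common for docking scores.
    """
    ranked = sorted(molecules, key=surrogate_score)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    shortlist = ranked[:n_keep]
    # Only the shortlist pays the cost of the expensive docking call.
    return {m: dock_score(m) for m in shortlist}


def recall_of_top(molecules, dock_score, docked, top_fraction):
    """Fraction of the truly best-scoring molecules that survived the pre-filter."""
    n_top = max(1, int(len(molecules) * top_fraction))
    true_top = set(sorted(molecules, key=dock_score)[:n_top])
    return len(true_top & set(docked)) / n_top


# Toy stand-ins (hypothetical): integer IDs as molecules, random "docking" scores,
# and a surrogate that equals the docking score plus noise.
rng = random.Random(0)
library = list(range(10_000))
true_scores = {m: rng.gauss(-6.0, 1.5) for m in library}
noisy_scores = {m: true_scores[m] + rng.gauss(0.0, 0.5) for m in library}

docked = prefilter_then_dock(
    library, noisy_scores.__getitem__, true_scores.__getitem__, keep_fraction=0.10
)
recall = recall_of_top(library, true_scores.__getitem__, docked, top_fraction=0.001)

# Throughput arithmetic from the abstract: 1e9 molecules at 50,000 predictions
# per GPU second is 20,000 GPU seconds, i.e. roughly 5.6 GPU hours.
gpu_seconds = 1_000_000_000 / 50_000
gpu_hours = gpu_seconds / 3600
```

In this sketch the speedup comes from docking only `keep_fraction` of the library, while `recall` measures how many true top hits the surrogate lets through. That trade-off is consistent with the abstract's point that further gains depend on surrogate accuracy rather than raw computing speed.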

2.

A new hourly dataset for photovoltaic energy production for the continental USA.

Hu, Weiming; Cervone, Guido; Merzky, Andre; Turilli, Matteo; Jha, Shantenu.

Data Brief ; 40: 107824, 2022 Feb.

Article in English | MEDLINE | ID: mdl-35141367

ABSTRACT

This new dataset is an ensemble of solar photovoltaic energy production simulations over the continental US. The simulations are carried out in three steps. First, a weather forecast system is used to predict incoming insolation; then, forecast ensembles with 21 members are generated using the Analog Ensemble technique; finally, each ensemble member is used to simulate 13 different solar panels. In total, there are 21 × 13 = 273 simulated scenarios. Simulations are carried out for the entire year 2019, with a temporal resolution of one hour and a spatial resolution of 12 km. The data provide a high-resolution spatio-temporal analysis of the power production under different weather and engineering scenarios. The size of the entire dataset is about 1 TB, but it can be openly accessed by day and scenario. Details on how to access and use the dataset are provided in this article.
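
The scenario count and hourly coverage stated in the abstract of entry 2 can be checked with a short sketch. The index ranges used for ensemble members and panel types are hypothetical placeholders. The dataset's actual naming and file layout are not given here.

```python
from datetime import datetime, timedelta
from itertools import product

N_ENSEMBLE_MEMBERS = 21  # Analog Ensemble members, from the abstract
N_PANEL_TYPES = 13       # simulated solar panel types, from the abstract

# Every (ensemble member, panel type) pair is one simulated scenario.
# Integer indices are placeholders, not the dataset's real identifiers.
scenarios = list(product(range(N_ENSEMBLE_MEMBERS), range(N_PANEL_TYPES)))

# Hourly time steps covering all of 2019 (not a leap year).
start = datetime(2019, 1, 1)
end = datetime(2020, 1, 1)
n_hours = int((end - start) / timedelta(hours=1))

# Total scenario-hours per grid cell at the 12 km resolution.
values_per_grid_cell = len(scenarios) * n_hours
```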

3.

High-Throughput Virtual Screening and Validation of a SARS-CoV-2 Main Protease Noncovalent Inhibitor.

Clyde, Austin; Galanie, Stephanie; Kneller, Daniel W; Ma, Heng; Babuji, Yadu; Blaiszik, Ben; Brace, Alexander; Brettin, Thomas; Chard, Kyle; Chard, Ryan; Coates, Leighton; Foster, Ian; Hauner, Darin; Kertesz, Vilmos; Kumar, Neeraj; Lee, Hyungro; Li, Zhuozhao; Merzky, Andre; Schmidt, Jurgen G; Tan, Li; Titov, Mikhail; Trifan, Anda; Turilli, Matteo; Van Dam, Hubertus; Chennubhotla, Srinivas C; Jha, Shantenu; Kovalevsky, Andrey; Ramanathan, Arvind; Head, Martha S; Stevens, Rick.

J Chem Inf Model ; 62(1): 116-128, 2022 01 10.

Article in English | MEDLINE | ID: mdl-34793155

ABSTRACT

Despite the recent availability of vaccines against severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the search for inhibitory therapeutic agents has assumed importance, especially in the context of emerging new viral variants. In this paper, we describe the discovery of a novel noncovalent small-molecule inhibitor, MCULE-5948770040, that binds to and inhibits the SARS-CoV-2 main protease (Mpro) by employing a scalable high-throughput virtual screening (HTVS) framework and a targeted compound library of over 6.5 million molecules that could be readily ordered and purchased. Our HTVS framework leverages the U.S. supercomputing infrastructure, achieving nearly 91% resource utilization and nearly 126 million docking calculations per hour. Downstream biochemical assays validate this Mpro inhibitor with an inhibition constant (Ki) of 2.9 µM (95% CI 2.2, 4.0). Furthermore, using room-temperature X-ray crystallography, we show that MCULE-5948770040 binds to a cleft in the primary binding site of Mpro, forming stable hydrogen-bond and hydrophobic interactions. We then used multiple µs-time scale molecular dynamics (MD) simulations and machine learning (ML) techniques to elucidate how the bound ligand alters the conformational states accessed by Mpro, involving motions both proximal and distal to the binding site. Together, our results demonstrate how MCULE-5948770040 inhibits Mpro and offer a springboard for further therapeutic design.

Subject(s)

COVID-19 , Protease Inhibitors , Antiviral Agents , Coronavirus 3C Proteases , Humans , Molecular Docking Simulation , Molecular Dynamics Simulation , Orotic Acid/analogs & derivatives , Piperazines , SARS-CoV-2

4.

Pandemic drugs at pandemic speed: infrastructure for accelerating COVID-19 drug discovery with hybrid machine learning- and physics-based simulations on high-performance computers.

Bhati, Agastya P; Wan, Shunzhou; Alfè, Dario; Clyde, Austin R; Bode, Mathis; Tan, Li; Titov, Mikhail; Merzky, Andre; Turilli, Matteo; Jha, Shantenu; Highfield, Roger R; Rocchia, Walter; Scafuri, Nicola; Succi, Sauro; Kranzlmüller, Dieter; Mathias, Gerald; Wifling, David; Donon, Yann; Di Meglio, Alberto; Vallecorsa, Sofia; Ma, Heng; Trifan, Anda; Ramanathan, Arvind; Brettin, Tom; Partin, Alexander; Xia, Fangfang; Duan, Xiaotan; Stevens, Rick; Coveney, Peter V.

Interface Focus ; 11(6): 20210018, 2021 Dec 06.

Article in English | MEDLINE | ID: mdl-34956592

ABSTRACT

The race to meet the challenges of the global pandemic has served as a reminder that the existing drug discovery process is expensive, inefficient and slow. A major bottleneck is screening the vast number of potential small molecules to shortlist lead compounds for antiviral drug development. New opportunities to accelerate drug discovery lie at the interface between machine learning methods, in this case developed for linear accelerators, and physics-based methods. The two in silico methods each have their own advantages and limitations which, interestingly, complement each other. Here, we present an innovative infrastructural development that combines both approaches to accelerate drug discovery. The scale of the potential resulting workflow is such that it is dependent on supercomputing to achieve extremely high throughput. We have demonstrated the viability of this workflow for the study of inhibitors for four COVID-19 target proteins and our ability to perform the required large-scale calculations to identify lead antiviral compounds through repurposing on a variety of supercomputers.

5.

Adaptive distributed replica-exchange simulations.

Luckow, Andre; Jha, Shantenu; Kim, Joohyun; Merzky, Andre; Schnor, Bettina.

Philos Trans A Math Phys Eng Sci ; 367(1897): 2595-606, 2009 Jun 28.

Article in English | MEDLINE | ID: mdl-19451113

ABSTRACT

Owing to the loose coupling between replicas, the replica-exchange (RE) class of algorithms should be able to benefit greatly from using as many resources as available. However, the ability to effectively use multiple distributed resources to reduce the time to completion remains a challenge at many levels. Additionally, an implementation of a pleasingly distributed algorithm such as replica-exchange, which is independent of infrastructural details, does not exist. This paper proposes an extensible and scalable framework based on Simple API for Grid Applications that provides a general-purpose, opportunistic mechanism to effectively use multiple resources in an infrastructure-independent way. By analysing the requirements of the RE algorithm and the challenges of implementing it on real production systems, we propose a new abstraction (BigJob), which forms the basis of the adaptive redistribution and effective scheduling of replicas.
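
The loose coupling mentioned in the abstract of entry 5 can be illustrated with a short sketch: replicas run independently and interact only when a swap is attempted. The code shows the textbook Metropolis acceptance rule for temperature replica exchange. It is not the paper's SAGA-based framework or its BigJob abstraction, and the energies, temperatures, and units (kcal/mol and kelvin) are assumptions.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K); the unit choice is an assumption


def swap_probability(e_i, e_j, t_i, t_j, k_b=K_B):
    """Textbook acceptance probability for exchanging two replicas.

    Uses min(1, exp((beta_i - beta_j) * (e_i - e_j))) with beta = 1 / (k_b * T),
    where e_i and e_j are the current potential energies of the two replicas.
    """
    beta_i = 1.0 / (k_b * t_i)
    beta_j = 1.0 / (k_b * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def attempt_swap(e_i, e_j, t_i, t_j, rng):
    """Return True if the exchange is accepted under the Metropolis criterion."""
    return rng.random() < swap_probability(e_i, e_j, t_i, t_j)


# Hypothetical energies (kcal/mol) for neighbouring replicas at 300 K and 320 K.
rng = random.Random(42)
p_favourable = swap_probability(-100.0, -120.0, 300.0, 320.0)
p_unfavourable = swap_probability(-120.0, -100.0, 300.0, 320.0)
accepted = attempt_swap(-100.0, -120.0, 300.0, 320.0, rng)
```

Because the only data exchanged at a swap are two energies and two temperatures, replicas can in principle run on separate resources between exchange attempts.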
