Running ahead of evolution - AI based simulation for predicting future high-risk SARS-CoV-2 variants

Jie Chen; Zhiwei Nie; Yu Wang; Kai Wang; Fan Xu; Zhiheng Hu; Bin Zheng; Zhennan Wang; Guoli Song; Jingyi Zhang; Jie Fu; Xiansong Huang; Zhongqi Wang; Zhixiang Ren; Qiankun Wang; Daixi Li; Dongqing Wei; Bin Zhou; Chao Yang; Yonghong Tian

This article is a Preprint

Preprints are preliminary research reports that have not been certified by peer review. They should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Preprints posted online allow authors to receive rapid feedback, and the entire scientific community can appraise the work and respond appropriately. Those comments are posted alongside the preprints for anyone to read and serve as a post-publication assessment.

Affiliation

Jie Chen; Peng Cheng Laboratory
Zhiwei Nie; Peng Cheng Laboratory
Yu Wang; Peng Cheng Laboratory
Kai Wang; Peng Cheng Laboratory
Fan Xu; Peng Cheng Laboratory
Zhiheng Hu; Peng Cheng Laboratory
Bin Zheng; Peng Cheng Laboratory
Zhennan Wang; Peng Cheng Laboratory
Guoli Song; Peng Cheng Laboratory
Jingyi Zhang; Peng Cheng Laboratory
Jie Fu; Beijing Academy of Artificial Intelligence
Xiansong Huang; Peng Cheng Laboratory
Zhongqi Wang; Peng Cheng Laboratory
Zhixiang Ren; Peng Cheng Laboratory
Qiankun Wang; Peng Cheng Laboratory
Daixi Li; Peng Cheng Laboratory
Dongqing Wei; Peng Cheng Laboratory
Bin Zhou; Shandong University
Chao Yang; Peking University
Yonghong Tian; Peng Cheng Laboratory

Preprint in English | bioRxiv | ID: ppbiorxiv-516989

ABSTRACT

The continual emergence of SARS-CoV-2 variants of concern (VOCs) has challenged pandemic control worldwide. To develop effective drugs and vaccines, one needs to efficiently simulate SARS-CoV-2 spike receptor binding domain (RBD) mutations and identify high-risk variants. We pretrain a large protein language model on approximately 408 million protein sequences and construct a high-throughput screening workflow for predicting binding affinity and antibody escape. As the first work on SARS-CoV-2 RBD mutation simulation, we successfully identify mutations in the RBD regions of 5 VOCs and can screen millions of potential variants in seconds. Our workflow scales to 4096 NPUs with 96.5% scalability and a 493.9x speedup in mixed-precision computing, while achieving a peak performance of 366.8 PFLOPS (34.9% of the theoretical peak) on Pengcheng Cloudbrain-II. Our method paves the way for simulating coronavirus evolution in order to prepare for a future pandemic that will inevitably take place. Our models are released at https://github.com/ZhiweiNiepku/SARS-CoV-2_mutation_simulation to facilitate future related work.
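
The scaling figures quoted in the abstract can be cross-checked with a little arithmetic. The sketch below is not taken from the preprint. It assumes that "scalability" means parallel efficiency (measured speedup divided by ideal speedup) and that the speedup is measured against a smaller multi-NPU baseline rather than a single NPU. The abstract does not state a baseline size, so the value derived here is only an inference. All function and variable names are hypothetical.

```python
# Cross-check of the scaling numbers quoted in the abstract.
# Assumption: "scalability" = parallel efficiency = speedup / ideal speedup.
# Assumption: the baseline run used n_baseline NPUs (not stated in the abstract).

def parallel_efficiency(speedup, n_devices, n_baseline=1):
    """Measured speedup divided by the ideal speedup n_devices / n_baseline."""
    return speedup / (n_devices / n_baseline)


def implied_theoretical_peak(achieved, fraction_of_peak):
    """Theoretical peak implied by an achieved rate and its fraction of peak."""
    return achieved / fraction_of_peak


# Figures as reported in the abstract.
N_NPUS = 4096
SPEEDUP = 493.9
SCALABILITY = 0.965
ACHIEVED_PFLOPS = 366.8
FRACTION_OF_PEAK = 0.349

# Baseline size that would make the speedup and scalability consistent.
implied_baseline = N_NPUS * SCALABILITY / SPEEDUP

# Theoretical peak of the full system and per NPU implied by the abstract.
peak_pflops = implied_theoretical_peak(ACHIEVED_PFLOPS, FRACTION_OF_PEAK)
per_npu_tflops = peak_pflops * 1000 / N_NPUS

print(f"implied baseline size: {implied_baseline:.2f} NPUs")
print(f"efficiency with an 8-NPU baseline: {parallel_efficiency(SPEEDUP, N_NPUS, 8):.3f}")
print(f"implied theoretical peak: {peak_pflops:.1f} PFLOPS")
print(f"implied per-NPU peak: {per_npu_tflops:.1f} TFLOPS")
```

Under these assumptions, the reported speedup and scalability are mutually consistent for a baseline of about 8 NPUs, and the reported 34.9% figure implies a system peak of roughly 1051 PFLOPS. Both values are derived from the abstract's numbers, not reported by the authors.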

License

CC BY-NC-ND

Full text: Available Collection: Preprints Database: bioRxiv Type of study: Prognostic study Language: English Year: 2022 Document type: Preprint
