Running ahead of evolution - AI based simulation for predicting future high-risk SARS-CoV-2 variants
Jie Chen; Zhiwei Nie; Yu Wang; Kai Wang; Fan Xu; Zhiheng Hu; Bin Zheng; Zhennan Wang; Guoli Song; Jingyi Zhang; Jie Fu; Xiansong Huang; Zhongqi Wang; Zhixiang Ren; Qiankun Wang; Daixi Li; Dongqing Wei; Bin Zhou; Chao Yang; Yonghong Tian.
Affiliation
  • Jie Chen; Peng Cheng Laboratory
  • Zhiwei Nie; Peng Cheng Laboratory
  • Yu Wang; Peng Cheng Laboratory
  • Kai Wang; Peng Cheng Laboratory
  • Fan Xu; Peng Cheng Laboratory
  • Zhiheng Hu; Peng Cheng Laboratory
  • Bin Zheng; Peng Cheng Laboratory
  • Zhennan Wang; Peng Cheng Laboratory
  • Guoli Song; Peng Cheng Laboratory
  • Jingyi Zhang; Peng Cheng Laboratory
  • Jie Fu; Beijing Academy of Artificial Intelligence
  • Xiansong Huang; Peng Cheng Laboratory
  • Zhongqi Wang; Peng Cheng Laboratory
  • Zhixiang Ren; Peng Cheng Laboratory
  • Qiankun Wang; Peng Cheng Laboratory
  • Daixi Li; Peng Cheng Laboratory
  • Dongqing Wei; Peng Cheng Laboratory
  • Bin Zhou; Shandong University
  • Chao Yang; Peking University
  • Yonghong Tian; Peng Cheng Laboratory
Preprint in English | bioRxiv | ID: ppbiorxiv-516989
ABSTRACT
The continual emergence of SARS-CoV-2 variants of concern (VOCs) has challenged pandemic control worldwide. Developing effective drugs and vaccines requires efficiently simulating mutations in the SARS-CoV-2 spike receptor-binding domain (RBD) and identifying high-risk variants. We pretrain a large protein language model on approximately 408 million protein sequences and build a high-throughput screening workflow that predicts binding affinity and antibody escape. As the first work on SARS-CoV-2 RBD mutation simulation, we successfully identify mutations in the RBD regions of 5 VOCs and can screen millions of potential variants in seconds. Our workflow scales to 4096 NPUs with 96.5% scalability and a 493.9x speedup in mixed-precision computing, achieving a peak performance of 366.8 PFLOPS (34.9% of the theoretical peak) on Pengcheng Cloudbrain-II. Our method paves the way for simulating coronavirus evolution in order to prepare for a future pandemic, which will inevitably occur. Our models are released at https://github.com/ZhiweiNiepku/SARS-CoV-2_mutation_simulation to facilitate future related work.
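The scaling figures quoted in the abstract can be sanity-checked with a little arithmetic. The sketch below is illustrative only: the abstract does not state the baseline against which the 493.9x speedup was measured, so the 8-NPU baseline is an assumption, chosen because it makes the speedup and the 96.5% scalability figure agree. The function names are hypothetical and do not come from the released repository.

```python
# Figures quoted in the abstract.
NPUS = 4096            # largest configuration reported
SPEEDUP = 493.9        # reported speedup at 4096 NPUs
PEAK_PFLOPS = 366.8    # reported peak performance
PEAK_FRACTION = 0.349  # reported fraction of theoretical peak

# ASSUMPTION (not stated in the abstract): the speedup baseline is 8 NPUs.
BASELINE_NPUS = 8


def scaling_efficiency(speedup: float, npus: int, baseline_npus: int) -> float:
    """Measured speedup divided by the ideal (linear) speedup."""
    ideal_speedup = npus / baseline_npus
    return speedup / ideal_speedup


def implied_theoretical_peak(achieved: float, fraction: float) -> float:
    """Theoretical peak implied by an achieved rate and its fraction of peak."""
    return achieved / fraction


efficiency = scaling_efficiency(SPEEDUP, NPUS, BASELINE_NPUS)
theoretical_pflops = implied_theoretical_peak(PEAK_PFLOPS, PEAK_FRACTION)
# 1 PFLOPS = 1000 TFLOPS; divide evenly across all NPUs.
per_npu_tflops = theoretical_pflops * 1000 / NPUS

print(f"Scaling efficiency (assumed 8-NPU baseline): {efficiency:.1%}")
print(f"Implied theoretical peak: {theoretical_pflops:.0f} PFLOPS")
print(f"Implied per-NPU peak: {per_npu_tflops:.1f} TFLOPS")
```

Under the assumed baseline, the ideal speedup is 4096 / 8 = 512, so 493.9 / 512 comes to roughly 96.5%, which matches the reported scalability. A different baseline would give a different efficiency, so this should be read as a consistency check and not as a statement of how the authors measured it.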
License
CC BY-NC-ND
Full text: Available | Collection: Preprints | Database: bioRxiv | Type of study: Prognostic study | Language: English | Year: 2022 | Document type: Preprint