Comprehensive annotations of the mutational spectra of SARS-CoV-2 spike protein: a fast and accurate pipeline

M. Shaminur Rahman; Md. Rafiul Islam; M. Nazmul Hoque; A. S. M. Rubayet Ul Alam; Masuda Akther; J. Akter Puspo; Salma Akter; Azraf Anwar; Munawar Sultana; Md. Anwar Hossain

This article is a Preprint

Affiliation

M. Shaminur Rahman; Department of Microbiology, University of Dhaka
Md. Rafiul Islam; University of Dhaka
M. Nazmul Hoque; Bangabandhu Sheikh Mujibur Rahman Agricultural University Gazipur-1706, Bangladesh
A. S. M. Rubayet Ul Alam; Department of Microbiology, University of Dhaka
Masuda Akther; Department of Microbiology, University of Dhaka
J. Akter Puspo; Department of Microbiology, University of Dhaka
Salma Akter; Department of Microbiology, University of Dhaka
Azraf Anwar; British Columbia University, Vancouver, Canada
Munawar Sultana; Department of Microbiology, University of Dhaka
Md. Anwar Hossain; University of Dhaka

Preprint in English | bioRxiv | ID: ppbiorxiv-177238

ABSTRACT

To explore nonsynonymous mutations and deletions in the spike (S) protein of SARS-CoV-2, we comprehensively analyzed 35,750 complete S protein gene sequences from six continents and five climate zones, as documented in the GISAID database as of June 24, 2020. Using a custom Python-based pipeline for analyzing mutations, we identified 27,801 mutated strains (77.77% of spike sequences) relative to the Wuhan-Hu-1 reference strain. Of these strains, 84.40% carried only a single amino-acid (aa) substitution, but an outlier strain from Bosnia and Herzegovina (EPI_ISL_463893) was found to possess six aa substitutions. The D614G variant of the major G clade was predominant across circulating strains in all climates. We also identified 988 unique aa substitutions distributed across 660 positions within the spike protein, with eleven sites showing high variability: each of these sites had four types of aa variation. In addition, 17 in-frame deletions in four major regions (three in the N-terminal domain and one just downstream of the RBD) may have an impact on attenuation. Moreover, the mutational frequency differed significantly (p = 0.003, Kruskal-Wallis test) among the SARS-CoV-2 strains worldwide. This study presents a fast and accurate pipeline for identifying nonsynonymous mutations and deletions from large datasets for any particular protein-coding sequence, and presents the S protein data as a representative analysis. By combining a separate multiple sequence alignment with MAFFT, removal of ambiguous sequences and in-frame stop codons, and pairwise alignment, the method derives nonsynonymous mutations in Reference-Position-Strain form (for example, D614G). We believe this will aid in the surveillance of any protein encoded by SARS-CoV-2, and will prove crucial in tracking the ever-increasing variation of many other divergent RNA viruses in the future.
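
The filtering and mutation-calling steps named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline: it assumes the sequences have already been translated and aligned (for example with MAFFT), and the function names `is_clean` and `call_mutations`, the set of ambiguity codes, the toy sequences, and the `del` suffix for deletions are all hypothetical choices made for illustration.

```python
# Illustrative sketch only; not the published pipeline.
# Assumed ambiguity codes for translated protein sequences.
AMBIGUOUS = set("XBZJ")


def is_clean(protein):
    """Return True if a translated sequence has no ambiguous residue
    and no stop codon ('*') before its final position."""
    body = protein[:-1] if protein.endswith("*") else protein
    return "*" not in body and not (set(body) & AMBIGUOUS)


def call_mutations(ref_aln, strain_aln):
    """Compare two gapped protein sequences of equal length.

    Returns (substitutions, deletions), each a list of strings written as
    reference residue, reference position, strain residue (e.g. 'D614G').
    Deletions use a hypothetical 'del' suffix (e.g. 'D3del').
    """
    if len(ref_aln) != len(strain_aln):
        raise ValueError("aligned sequences must have equal length")
    substitutions, deletions = [], []
    pos = 0  # 1-based position in the ungapped reference
    for ref_res, strain_res in zip(ref_aln, strain_aln):
        if ref_res == "-":
            continue  # insertion relative to the reference; ignored here
        pos += 1
        if strain_res == "-":
            deletions.append(f"{ref_res}{pos}del")
        elif strain_res != ref_res:
            substitutions.append(f"{ref_res}{pos}{strain_res}")
    return substitutions, deletions


if __name__ == "__main__":
    # Toy sequences, not real spike protein data.
    reference = "MKDLA"
    strains = {"toy_1": "MKGLA", "toy_2": "MK-LA", "toy_3": "MKXLA"}
    for name, seq in strains.items():
        if not is_clean(seq.replace("-", "")):
            print(name, "skipped (ambiguous residue or in-frame stop)")
            continue
        print(name, call_mutations(reference, seq))
```

Positions are counted along the ungapped reference, so a gap in the reference row does not shift the numbering of later residues; this is what lets a label such as D614G refer to the same site in every strain.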

License

cc_by_nc

Full text: Available Collection: Preprints Database: bioRxiv Language: English Year: 2020 Document type: Preprint
