Search | VHL Regional Portal

Author Correction: Genome-wide analysis of SARS-CoV-2 virus strains circulating worldwide implicates heterogeneity.

Islam, M Rafiul; Hoque, M Nazmul; Rahman, M Shaminur; Alam, A S M Rubayet Ul; Akther, Masuda; Puspo, J Akter; Akter, Salma; Sultana, Munawar; Crandall, Keith A; Hossain, M Anwar.

Sci Rep ; 11(1): 20568, 2021 Oct 12.

Article in English | MEDLINE | ID: mdl-34642347

Comprehensive annotations of the mutational spectra of SARS-CoV-2 spike protein: a fast and accurate pipeline.

Rahman, Mohammad Shaminur; Islam, Mohammad Rafiul; Hoque, Mohammad Nazmul; Alam, Abu Sayed Mohammad Rubayet Ul; Akther, Masuda; Puspo, Joynob Akter; Akter, Salma; Anwar, Azraf; Sultana, Munawar; Hossain, Mohammad Anwar.

Transbound Emerg Dis ; 68(3): 1625-1638, 2021 May.

Article in English | MEDLINE | ID: mdl-32954666

ABSTRACT

Infecting millions of people, SARS-CoV-2 is evolving at an unprecedented rate, demanding an advanced and specialized analytic pipeline to capture its mutational spectra. To explore mutations and deletions in the spike (S) protein, the most-discussed protein of SARS-CoV-2, we comprehensively analyzed 35,750 complete S protein-coding sequences through a custom Python-based pipeline. This dataset, collected from GISAID up to 24 June 2020, covered six continents and five major climate zones. We identified 27,801 mutated strains (77.77% of sequences) compared to the reference Wuhan-Hu-1, of which 84.40% were mutated by only a single amino acid (aa). An outlier strain (EPI_ISL_463893) from Bosnia and Herzegovina possessed six aa substitutions. We also identified 11 residues with high aa mutation frequency, each containing four types of aa variations. The infamous D614G variant has spread worldwide with ever-rising dominance and across regions with different climatic conditions alongside L5F and D936Y mutants, which have been documented throughout all regions and climate zones, respectively. We also found 988 unique aa substitutions spanning 660 residues, which differed significantly among continents (p = .003) and climatic zones (p = .021) as inferred with the Kruskal-Wallis test. In addition, 17 in-frame deletions at four sites adjacent to the receptor-binding domain were identified that may have an impact on attenuation. This study provides a fast and accurate pipeline for identifying mutations and deletions in large datasets of coding and also non-coding sequences, as evidenced by the representative analysis on existing S protein data. By using separate multi-sequence alignment, removing ambiguous sequences and in-frame stop codons, and utilizing pairwise alignment, this method can derive both synonymous and non-synonymous mutations (strain_ID reference aa:mutation position:strain aa). We suggest that the pipeline will aid in the evolutionary surveillance of any SARS-CoV-2 encoded proteins and will prove to be crucial in tracking the ever-increasing variation of many other divergent RNA viruses in the future. The code is available at https://github.com/SShaminur/Mutation-Analysis.
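
The reporting format named in the abstract (strain_ID reference aa:mutation position:strain aa) can be illustrated with a minimal sketch. This is not the authors' code from the linked repository. It assumes the reference and strain protein sequences have already been pairwise aligned to equal length with '-' as the gap character. The function name call_substitutions and the toy sequences are hypothetical. It reports only amino acid substitutions; synonymous changes would need a nucleotide-level comparison.

```python
def call_substitutions(ref_aln: str, strain_aln: str, strain_id: str) -> list[str]:
    """List aa substitutions between two pairwise-aligned protein sequences.

    Assumption: both strings come from the same pairwise alignment, so they
    have equal length and use '-' for gaps. Positions are 1-based and counted
    on the ungapped reference.
    """
    if len(ref_aln) != len(strain_aln):
        raise ValueError("aligned sequences must have equal length")

    calls = []
    ref_pos = 0
    for ref_aa, strain_aa in zip(ref_aln, strain_aln):
        if ref_aa == "-":
            continue  # insertion relative to the reference: no reference position
        ref_pos += 1
        if strain_aa in ("-", "X"):
            continue  # deletion or ambiguous residue: not a substitution
        if strain_aa != ref_aa:
            calls.append(f"{strain_id} {ref_aa}:{ref_pos}:{strain_aa}")
    return calls


# Toy example: a single L-to-F change at reference position 5.
print(call_substitutions("MFVFLVLLPL", "MFVFFVLLPL", "toy_strain"))
```

Counting reference positions only on non-gap columns keeps the numbering tied to the reference even when the strain carries an insertion.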

Subject(s)

COVID-19/virology , SARS-CoV-2/genetics , Spike Glycoprotein, Coronavirus/genetics , Amino Acid Substitution , Databases, Factual , Humans , Mutation , Open Reading Frames , Sequence Alignment/veterinary , Sequence Deletion

Genome-wide analysis of SARS-CoV-2 virus strains circulating worldwide implicates heterogeneity.

Islam, M Rafiul; Hoque, M Nazmul; Rahman, M Shaminur; Alam, A S M Rubayet Ul; Akther, Masuda; Puspo, J Akter; Akter, Salma; Sultana, Munawar; Crandall, Keith A; Hossain, M Anwar.

Sci Rep ; 10(1): 14004, 2020 Aug 19.

Article in English | MEDLINE | ID: mdl-32814791

ABSTRACT

Severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2), a novel, evolutionarily divergent RNA virus, is responsible for the present devastating COVID-19 pandemic. To explore the genomic signatures, we comprehensively analyzed 2,492 complete and/or near-complete genome sequences of SARS-CoV-2 strains reported from across the globe to the GISAID database up to 30 March 2020. Genome-wide annotations revealed 1,516 nucleotide-level variations at different positions throughout the entire genome of SARS-CoV-2. Moreover, nucleotide (nt) deletion analysis found twelve deletion sites throughout the genome other than previously reported deletions at the coding sequences of the ORF8 (open reading frame), spike, and ORF7a proteins, specifically in polyprotein ORF1ab (n = 9), ORF10 (n = 1), and 3′-UTR (n = 2). Evidence from the systematic gene-level mutational and protein profile analyses revealed a large number of amino acid (aa) substitutions (n = 744), demonstrating that the viral proteins are heterogeneous. Notably, residues of the receptor-binding domain (RBD) showing crucial interactions with angiotensin-converting enzyme 2 (ACE2) and cross-reacting neutralizing antibody were found to be conserved among the analyzed virus strains, except for replacement of lysine with arginine at the 378th position of the cryptic epitope of a Shanghai isolate, hCoV-19/Shanghai/SH0007/2020 (EPI_ISL_416320). Furthermore, our results from the preliminary epidemiological data on SARS-CoV-2 infections revealed that the frequency of aa mutations was relatively higher in the SARS-CoV-2 genome sequences of Europe (43.07%), followed by Asia (38.09%) and North America (29.64%), while case fatality rates remained higher in the European temperate countries, such as Italy, Spain, Netherlands, France, England and Belgium. Thus, the present method of genome annotation employed at this early pandemic stage could be a promising tool for monitoring and tracking the continuously evolving pandemic situation, the associated genetic variants, and their implications for the development of effective control and prophylaxis strategies.
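
As a rough illustration of the nucleotide deletion analysis mentioned in the abstract, the sketch below scans a pairwise-aligned reference and strain sequence for runs of gaps in the strain. It is not the authors' method. The function name find_deletions and the toy sequences are hypothetical. Treating a deletion as in-frame when its length is a multiple of three is a simplifying assumption that ignores the codon phase of the enclosing coding sequence.

```python
def find_deletions(ref_aln: str, strain_aln: str) -> list[tuple[int, int, bool]]:
    """Return (start, length, length_multiple_of_3) for each deletion run.

    Assumption: the two strings come from one pairwise alignment (equal
    length, '-' for gaps). 'start' is the 1-based position of the first
    deleted nucleotide on the ungapped reference.
    """
    if len(ref_aln) != len(strain_aln):
        raise ValueError("aligned sequences must have equal length")

    deletions = []
    ref_pos = 0
    run_start, run_len = None, 0
    for ref_nt, strain_nt in zip(ref_aln, strain_aln):
        if ref_nt != "-":
            ref_pos += 1
        if strain_nt == "-" and ref_nt != "-":
            # Reference base with no counterpart in the strain: extend the run.
            if run_start is None:
                run_start = ref_pos
            run_len += 1
        elif run_start is not None:
            # Run ended: record it and reset.
            deletions.append((run_start, run_len, run_len % 3 == 0))
            run_start, run_len = None, 0
    if run_start is not None:
        deletions.append((run_start, run_len, run_len % 3 == 0))
    return deletions


# Toy example: a 3-nt deletion at position 4 and a 1-nt deletion at position 11.
print(find_deletions("ATGGCCTTTAAA", "ATG---TTTA-A"))
```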

Subject(s)

Betacoronavirus/classification , Betacoronavirus/genetics , Coronavirus Infections/epidemiology , Genetic Heterogeneity , Genome, Viral/genetics , Genome-Wide Association Study/methods , Global Health , Pneumonia, Viral/epidemiology , Amino Acid Sequence/genetics , Angiotensin-Converting Enzyme 2 , Antibodies, Neutralizing/immunology , Base Pair Mismatch , Base Sequence/genetics , COVID-19 , Climate , Coronavirus Infections/virology , Humans , Open Reading Frames/genetics , Pandemics , Peptidyl-Dipeptidase A/metabolism , Phylogeny , Pneumonia, Viral/virology , Protein Domains/genetics , Protein Domains/immunology , SARS-CoV-2 , Sequence Deletion , Spike Glycoprotein, Coronavirus/chemistry , Spike Glycoprotein, Coronavirus/genetics , Spike Glycoprotein, Coronavirus/metabolism
