This article is a Preprint
Preprints are preliminary research reports that have not been certified by peer review. They should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Preprints posted online allow authors to receive rapid feedback, and the entire scientific community can appraise the work for themselves and respond appropriately. Those comments are posted alongside the preprints for anyone to read and serve as a post-publication assessment.
Variation in synonymous nucleotide composition among genomes of sarbecoviruses and consequences for the origin of COVID-19
Preprint in English | bioRxiv | ID: ppbiorxiv-457807
ABSTRACT
The subgenus Sarbecovirus includes two human viruses, SARS-CoV and SARS-CoV-2, respectively responsible for the SARS epidemic and the COVID-19 pandemic, as well as many bat viruses and two pangolin viruses. Here, the synonymous nucleotide composition (SNC) of Sarbecovirus genomes was analysed by examining third codon-positions, dinucleotides, and degenerate codons. The results show evidence for the following eight groups: (i) SARS-CoV related coronaviruses (SCoVrC; including many bat viruses from China), (ii) SARS-CoV-2 related coronaviruses (SCoV2rC; including five bat viruses from Cambodia, Thailand and Yunnan), (iii) pangolin viruses, (iv) three bat viruses showing evidence of recombination between SCoVrC and SCoV2rC genomes, (v) two highly divergent bat viruses from Yunnan, (vi) the bat virus from Japan, (vii) the bat virus from Bulgaria, and (viii) the bat virus from Kenya. All these groups can be diagnosed by specific nucleotide compositional features except the one affected by recombination between SCoVrC and SCoV2rC. In particular, SCoV2rC genomes are characterised by the lowest percentages of cytosine and the highest percentages of uracil at third codon-positions, whereas the genomes of pangolin viruses exhibit the highest percentages of adenine at third codon-positions. I suggest that latitudinal and taxonomic differences in the imbalanced nucleotide pools available in host cells during viral replication can explain the seven groups of SNC detected here among Sarbecovirus genomes. A related effect due to hibernating bats is also considered. I conclude that the two independent host switches from Rhinolophus bats to pangolins resulted in convergent mutational constraints and that SARS-CoV-2 emerged directly from a horseshoe bat virus.
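The kind of measurement the abstract refers to, the percentage of each nucleotide at third codon-positions, can be illustrated with a minimal Python sketch. This is not the author's pipeline: the function name `third_position_composition`, the treatment of T as U, and the decision to skip ambiguous bases and any trailing incomplete codon are assumptions made only for illustration.

```python
from collections import Counter


def third_position_composition(cds):
    """Percentage of A, C, G and U at third codon positions of an in-frame coding sequence.

    Hypothetical helper for illustration only; assumes the sequence starts in frame.
    """
    # Work in RNA notation so DNA and RNA input give the same result.
    seq = cds.upper().replace("T", "U")
    # Ignore a trailing incomplete codon, if any.
    usable = len(seq) - len(seq) % 3
    # Third codon positions are at indices 2, 5, 8, ...
    thirds = [seq[i + 2] for i in range(0, usable, 3)]
    # Count only unambiguous bases (skips N, gaps, and so on).
    counts = Counter(base for base in thirds if base in "ACGU")
    total = sum(counts.values())
    if total == 0:
        return {base: 0.0 for base in "ACGU"}
    return {base: 100.0 * counts[base] / total for base in "ACGU"}


# Toy sequence with codons AUG, GCU, GCA, UUU (third positions G, U, A, U).
composition = third_position_composition("ATGGCTGCATTT")
print(composition)
```

Applied to each coding sequence of a genome and pooled, such percentages would allow genomes to be compared on, for example, cytosine and uracil content at third codon-positions, the features the abstract uses to characterise SCoV2rC genomes.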
cc_by_nc_nd
Full text: Available
Collection: Preprints
Database: bioRxiv
Type of study: Experimental_studies / Rct
Language: English
Year: 2021
Document type: Preprint