This article is a Preprint
Preprints are preliminary research reports that have not been certified by peer review. They should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Preprints posted online allow authors to receive rapid feedback, and the entire scientific community can appraise the work and respond appropriately. Comments are posted alongside the preprints for anyone to read and serve as a post-publication assessment.
Unsupervised genome-wide cluster analysis: nucleotide sequences of the omicron variant of SARS-CoV-2 are similar to sequences from early 2020 (preprint)
bioRxiv; 2021. Preprint in English | bioRxiv | ID: ppzbmed-10.1101.2021.12.29.474469
ABSTRACT
The GISAID database contains more than 100,000 SARS-CoV-2 genomes, including sequences of the recently discovered omicron variant and of earlier SARS-CoV-2 strains collected from patients around the world since the beginning of the pandemic. We applied unsupervised cluster analysis to these genomes, assessing their similarity at a genome-wide level using the Jaccard index and principal component analysis. Our analysis shows that the omicron variant sequences are most similar to sequences submitted early in the pandemic, around January 2020. Furthermore, the omicron sequences in GISAID are spread across the entire range of the first principal component, suggesting that the strain has been in circulation for some time. This observation supports the hypothesis that the omicron strain originated from a long-term infection.
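As an illustration of the kind of analysis the abstract describes, the following minimal sketch computes pairwise Jaccard indices between genomes and projects them onto principal components. It rests on assumptions not stated in the abstract: each genome is encoded as a binary vector marking the presence or absence of a variant at each site, and the principal components are obtained by eigendecomposition of the double-centered similarity matrix. The function names and the toy data are hypothetical, and this is not necessarily the authors' procedure.

```python
import numpy as np


def jaccard_matrix(X: np.ndarray) -> np.ndarray:
    """Pairwise Jaccard index between the rows of a binary matrix.

    X has shape (genomes, variant sites); a 1 means the variant is present.
    This encoding is an assumption made for illustration.
    """
    X = X.astype(float)
    inter = X @ X.T                       # size of each pairwise intersection
    counts = X.sum(axis=1)                # number of variants per genome
    union = counts[:, None] + counts[None, :] - inter
    # Two genomes with no variants at all are treated as identical.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(union > 0, inter / union, 1.0)


def principal_components(S: np.ndarray, k: int = 2) -> np.ndarray:
    """Top-k principal coordinates of a symmetric similarity matrix S."""
    n = S.shape[0]
    C = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = C @ S @ C                         # double-centered similarities
    vals, vecs = np.linalg.eigh(B)        # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]    # indices of the k largest
    # Scale eigenvectors by the square root of the (non-negative) eigenvalues.
    return vecs[:, order] * np.sqrt(np.clip(vals[order], 0.0, None))


# Toy data (fabricated for illustration only): 4 genomes, 5 variant sites.
X = np.array([
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
])

J = jaccard_matrix(X)
pcs = principal_components(J, k=2)
print(np.round(J, 3))
print(np.round(pcs, 3))
```

In this toy example the first two genomes share most of their variants, as do the last two, so the first principal component places the two pairs on opposite sides of zero. With real data, the spread of a group of sequences along the first component is the quantity the abstract refers to.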
Full text: Available
Collection: Preprints
Database: bioRxiv
Language: English
Year: 2021
Document Type: Preprint