Your browser doesn't support javascript.
ABSTRACT
With an increasing number of SARS-CoV-2 sequences available day by day, new genomic information is getting revealed to us. As SARS-CoV-2 sequences highlight wide changes across the samples, we aim to explore whether these changes reveal the geographical origin of the corresponding samples. The k-mer distributions, denoting normalized frequency counts of all possible combinations of nucleotide of size upto k, are often helpful to explore sequence level patterns. Given the SARS-CoV-2 sequences are highly imbalanced by its geographical origin (relatively with a higher number samples collected from the USA), we observe that with proper under-sampling k-mer distributions in the SARS-CoV-2 sequences predict its geographical origin with more than 90% accuracy. The experiments are performed on the samples collected from six countries with maximum number of sequences available till July 07, 2020. This comprises SARS-CoV-2 sequences from Australia, USA, China, India, Greece and France. Moreover, we demonstrate that the changes of genomic sequences characterize the continents as a whole. We also highlight that the network motifs present in the sequence similarity networks have a significant difference across the said countries. This, as a whole, is capable of predicting the geographical shift of SARS-CoV-2.

Full text: Available Collection: Preprints Database: bioRxiv Language: English Year: 2020 Document Type: Preprint

Similar

MEDLINE

...
LILACS

LIS


Full text: Available Collection: Preprints Database: bioRxiv Language: English Year: 2020 Document Type: Preprint