Clustering analysis of single nucleotide polymorphism data reveals population structure of SARS-CoV-2 worldwide (preprint)
biorxiv; 2020.
Preprint in English | bioRxiv | ID: ppzbmed-10.1101.2020.09.04.283358
ABSTRACT
The SARS-CoV-2 virus has been spreading rapidly across the globe since it was first reported in December 2019. To understand the evolutionary trajectory of the coronavirus, phylogenetic analysis is needed to study the population structure of SARS-CoV-2. As sequencing data accrue rapidly worldwide, grouping the sequences into clusters helps to organize the landscape of population structures. Grouping these data effectively requires computational methodologies that provide more productive and robust solutions for clustering. In this study, using the single nucleotide polymorphisms of the viral sequences as input features, we applied three clustering algorithms, namely K-means, hierarchical clustering, and balanced iterative reducing and clustering using hierarchies (BIRCH), to partition the viral sequences into six major clusters. A comparison shows that the three methods produced highly consistent results, but K-means performed best, yielding the smallest intra-cluster pairwise genetic distances of the three. The partition of the viral sequences revealed that the six clusters differed in their geographical distributions. Using comprehensive approaches to compare diversity and selective pressure across the clusters, we discovered high genetic diversity between the clusters. Based on the characteristics of the mutation profiles in each cluster, along with their geographical distributions and evolutionary histories, we identified the extent of molecular divergence within and between the clusters. The identification of mutations that are strongly associated with clusters has potential implications for the diagnosis and pathogenesis of COVID-19. In addition, the clustering method will enable further study of variant population structures in specific regions of these fast-growing viruses.
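The evaluation idea in the abstract (cluster sequences on SNP features, then compare methods by intra-cluster pairwise genetic distance) can be illustrated with a minimal sketch. This is not the authors' pipeline: the toy SNP matrix, the deterministic centroid initialization, the use of Hamming distance as the "genetic distance", and the helper names `kmeans` and `mean_intra_cluster_distance` are all assumptions made for illustration. A real analysis would more likely use a library implementation of K-means, hierarchical clustering, and BIRCH on thousands of sequences.

```python
from itertools import combinations


def hamming(a, b):
    """Number of SNP sites at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def sq_dist(a, b):
    """Squared Euclidean distance, used for K-means assignment."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=50):
    """Plain K-means on 0/1 SNP vectors (illustrative, not optimized).

    Assumption: centroids start at the first k distinct points so the
    toy run is deterministic; real code would use random restarts.
    """
    centroids = []
    for p in points:
        if list(p) not in centroids:
            centroids.append(list(p))
        if len(centroids) == k:
            break
    if len(centroids) < k:
        raise ValueError("need at least k distinct points")

    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid for every sequence.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def mean_intra_cluster_distance(points, labels):
    """Average Hamming distance over all same-cluster sequence pairs."""
    pairs = combinations(list(zip(points, labels)), 2)
    dists = [hamming(a, b) for (a, la), (b, lb) in pairs if la == lb]
    return sum(dists) / len(dists) if dists else 0.0


# Hypothetical toy data: 6 sequences x 8 SNP sites (1 = variant allele).
snps = [
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 1, 1, 1],
]

labels = kmeans(snps, k=2)
score = mean_intra_cluster_distance(snps, labels)
print(labels, round(score, 3))
```

Running the same `mean_intra_cluster_distance` on label vectors from several clustering methods gives a like-for-like comparison, where a lower value indicates tighter clusters.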
Full text: Available Collection: Preprints Database: bioRxiv Main subject: COVID-19 Language: English Year: 2020 Document Type: Preprint
