Entropy Based Clustering of Viral Sequences
18th International Symposium on Bioinformatics Research and Applications, ISBRA 2022; 13760 LNBI:369-380, 2022.
Article in English | Scopus | ID: covidwho-2265112
ABSTRACT
Clustering viral sequences allows us to characterize the composition and structure of intrahost and interhost viral populations, which play a crucial role in disease progression and epidemic spread. In this paper, we propose and validate a new entropy-based method for clustering aligned viral sequences treated as categorical data. The method finds a homogeneous clustering by minimizing information entropy, rather than the distance between sequences in the same cluster. We have applied our entropy-based clustering method to SARS-CoV-2 viral sequencing data, and we report the information content extracted from the sequences by this clustering. Our method converges to similar minimum-entropy clusterings across different runs and limited permutations of the data. We also show that a parallelized version of our tool is scalable to very large SARS-CoV-2 datasets. © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
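The abstract does not state the exact objective function, so the following is only a minimal sketch of what "minimizing information entropy" over aligned sequences treated as categorical data could look like. It assumes the entropy of a cluster is the sum of per-column Shannon entropies and that the clustering score is the size-weighted sum over clusters. The function names (`column_entropy`, `cluster_entropy`, `clustering_entropy`) and the weighting are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols in one alignment column."""
    n = len(column)
    counts = Counter(column)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def cluster_entropy(seqs):
    """Sum of per-column entropies for equal-length aligned sequences."""
    if not seqs:
        return 0.0
    # zip(*seqs) iterates over alignment columns
    return sum(column_entropy(col) for col in zip(*seqs))


def clustering_entropy(seqs, labels):
    """Size-weighted total entropy of a clustering.

    Assumed objective for illustration only; lower values mean
    more homogeneous clusters.
    """
    clusters = {}
    for seq, label in zip(seqs, labels):
        clusters.setdefault(label, []).append(seq)
    return sum(len(members) * cluster_entropy(members)
               for members in clusters.values())
```

Under these assumptions, for the four aligned sequences `AAAA`, `AAAA`, `CCCC`, `CCCC`, grouping identical sequences together (labels `[0, 0, 1, 1]`) scores 0 bits, while mixing them (labels `[0, 1, 0, 1]`) scores 16 bits, so a minimizer would prefer the homogeneous grouping.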