Time-series trend of pandemic SARS-CoV-2 variants visualized using batch-learning self-organizing map for oligonucleotide compositions
Takashi Abe; Ryuki Furukawa; Yuki Iwasaki; Toshimichi Ikemura.
Affiliation
  • Takashi Abe; Smart Information Systems, Faculty of Engineering, Niigata University
  • Ryuki Furukawa; Smart Information Systems, Faculty of Engineering, Niigata University
  • Yuki Iwasaki; Nagahama Institute of Bio-Science and Technology
  • Toshimichi Ikemura; Nagahama Institute of Bio-Science and Technology
Preprint in English | bioRxiv | ID: ppbiorxiv-439956
Journal article
A published journal article is available and is probably based on this preprint. It was identified by a machine matching algorithm; human confirmation is still pending.
ABSTRACT
To confront the global threat of coronavirus disease 2019, a massive number of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genome sequences have been decoded, with the results promptly released through the GISAID database. Based on variant types, eight clades have already been defined in GISAID, but the actual diversity may be far greater. Owing to the explosive increase in available sequences, it is important to develop new technologies that make it easy to grasp the whole picture of such big sequence data and that support efficient knowledge discovery. The ability to efficiently clarify detailed time-series changes in genome-wide mutation patterns will enable prompt identification and characterization of dangerous variants that are rapidly increasing in population frequency. Here, we collectively analyzed over 150,000 SARS-CoV-2 genomes to understand their overall features and time-dependent changes using a batch-learning self-organizing map (BLSOM) for oligonucleotide composition, an unsupervised machine learning method. BLSOM separates the clades defined by GISAID with high precision, and each clade is subdivided into clusters that show differential patterns of increase and decrease depending on geographic region and time. This allowed us to identify the prevalent strains in each region and to show the commonality and diversity among them. Comprehensive characterization of the oligonucleotide composition of SARS-CoV-2 and elucidation of time-series trends in the population frequency of variants can clarify the viral adaptation processes after invasion into the human population and the time-dependent trends of prevalent epidemic strains across regions such as continents.
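As a rough illustration of the approach the abstract describes, the sketch below computes an oligonucleotide (k-mer) composition vector for each sequence and trains a small self-organizing map in batch mode, where every input is first assigned to its best-matching node and all nodes are then updated together. It is a minimal sketch, not the authors' implementation: the oligonucleotide length `k=3`, the grid size, the number of epochs, the neighborhood schedule, and the data-cycling initialization are all assumptions, since the abstract specifies none of them, and the function names are hypothetical.

```python
import itertools
import math


def kmer_composition(seq, k=3):
    """Return normalized k-mer frequencies over A/C/G/T, in lexicographic order."""
    kmers = ["".join(p) for p in itertools.product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # windows with N or other ambiguity codes are skipped
            counts[word] += 1
    total = sum(counts.values())
    return [counts[w] / total if total else 0.0 for w in kmers]


def best_matching_unit(weights, vec):
    """Index of the node whose weight vector is nearest to vec (squared Euclidean)."""
    return min(
        range(len(weights)),
        key=lambda j: sum((w - v) ** 2 for w, v in zip(weights[j], vec)),
    )


def train_batch_som(data, rows=2, cols=2, epochs=10, sigma0=1.0):
    """Train a rows x cols map in batch mode and return the node weight vectors.

    Assumed simplifications: nodes are seeded by cycling through the data, and
    the Gaussian neighborhood width shrinks linearly down to a floor of 0.3.
    """
    dim = len(data[0])
    coords = [(r, c) for r in range(rows) for c in range(cols)]
    weights = [list(data[j % len(data)]) for j in range(len(coords))]
    for epoch in range(epochs):
        sigma = max(sigma0 * (1 - epoch / epochs), 0.3)
        # Step 1: assign every input to its best-matching node (no updates yet).
        bmus = [best_matching_unit(weights, x) for x in data]
        # Step 2: recompute each node as a neighborhood-weighted mean of all inputs.
        new_weights = []
        for j, (rj, cj) in enumerate(coords):
            num = [0.0] * dim
            den = 0.0
            for x, b in zip(data, bmus):
                rb, cb = coords[b]
                dist2 = (rj - rb) ** 2 + (cj - cb) ** 2
                h = math.exp(-dist2 / (2 * sigma ** 2))
                den += h
                for d in range(dim):
                    num[d] += h * x[d]
            # Keep the old weights if every neighborhood value underflowed to zero.
            new_weights.append([n / den for n in num] if den > 0 else weights[j])
        weights = new_weights
    return weights
```

To use the sketch, build one composition vector per genome with `kmer_composition`, pass the list of vectors to `train_batch_som`, and then call `best_matching_unit` on each vector to find the map node it falls on. Counting how many sequences from a given region and month land on each node would give the kind of time-series view the abstract refers to.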
License
CC BY-NC
Full text: Available | Collection: Preprints | Database: bioRxiv | Type of study: Experimental_studies | Language: English | Year: 2021 | Document type: Preprint
...