Unsupervised explainable AI for simultaneous molecular evolutionary study of forty thousand SARS-CoV-2 genomes
Toshimichi Ikemura; Kennosuke Wada; Yoshiko Wada; Yuki Iwasaki; Takashi Abe.
Affiliation
  • Toshimichi Ikemura; Nagahama Institute of Bio-Science and Technology
  • Kennosuke Wada; Nagahama Institute of Bio-Science and Technology
  • Yoshiko Wada; Nagahama Institute of Bio-Science and Technology
  • Yuki Iwasaki; Nagahama Institute of Bio-Science and Technology
  • Takashi Abe; Faculty of Engineering, Niigata University
Preprint in En | PREPRINT-BIORXIV | ID: ppbiorxiv-335406
ABSTRACT
Unsupervised AI (artificial intelligence) can obtain novel knowledge from big data without particular models or prior knowledge and is highly desirable for unveiling hidden features in big data. SARS-CoV-2 poses a serious threat to public health, and one important issue in characterizing this fast-evolving virus is to elucidate various aspects of its genome sequence changes. We previously established an unsupervised AI method, the BLSOM (batch-learning self-organizing map), which can analyze five million genomic sequences simultaneously. The present study applied the BLSOM to the oligonucleotide compositions of forty thousand SARS-CoV-2 genomes. Although only the oligonucleotide composition was given, the obtained clusters of genomes corresponded primarily to the known main clades and to internal divisions within them. Because the BLSOM is an explainable AI, it reveals which features of the oligonucleotide composition are responsible for clade clustering. The BLSOM has powerful image display capabilities and enables efficient knowledge discovery about viral evolutionary processes.
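As a rough illustration of the input the abstract describes, the oligonucleotide composition of a genome can be represented as a vector of normalized k-mer frequencies. The sketch below is not the authors' implementation: the function name `oligo_composition`, the default oligonucleotide length `k=3`, and the choice to skip windows containing ambiguous bases are all assumptions made for the example. The abstract does not state which oligonucleotide lengths were used.

```python
from collections import Counter
from itertools import product


def oligo_composition(seq, k=3):
    """Return normalized frequencies of all 4**k oligonucleotides in seq.

    Hypothetical helper for illustration; k=3 is an assumed default.
    """
    seq = seq.upper().replace("U", "T")
    # Fixed ordering of all possible k-mers so that vectors are comparable
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        # Assumption: windows with ambiguous bases (e.g. N) are skipped
        if set(word) <= set("ACGT"):
            counts[word] += 1
    total = sum(counts.values())
    return [counts[w] / total if total else 0.0 for w in kmers]


# Toy sequence, not real SARS-CoV-2 data
vector = oligo_composition("ATGGCGTACGTTAGC", k=2)
```

Each genome would yield one such vector, and a set of these vectors is the kind of input a self-organizing map could cluster without any clade labels.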
License
cc_by_nc
Full text: 1 Collection: 09-preprints Database: PREPRINT-BIORXIV Language: En Year: 2020 Document type: Preprint