Efficient compression of SARS-CoV-2 genome data using Nucleotide Archival Format.
Kryukov, Kirill; Jin, Lihua; Nakagawa, So.
  • Kryukov K; Department of Informatics, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan.
  • Jin L; Genomus Co., Ltd., Sagamihara, Kanagawa 252-0226, Japan.
  • Nakagawa S; Department of Molecular Life Science, Tokai University School of Medicine, Isehara, Kanagawa 259-1193, Japan.
Patterns (N Y) ; 3(9): 100562, 2022 Sep 09.
Article in English | MEDLINE | ID: covidwho-1914886
ABSTRACT
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genome data are essential for epidemiology, vaccine development, and tracking emerging variants. Millions of SARS-CoV-2 genomes have been sequenced during the pandemic. However, downloading SARS-CoV-2 genomes from databases is slow and unreliable, largely due to a suboptimal choice of compression method. We evaluated the available compressors and found that Nucleotide Archival Format (NAF) would provide a drastic improvement compared with current methods. For the Global Initiative on Sharing Avian Flu Data's (GISAID) pre-compressed datasets, NAF would increase efficiency 52.2 times for gzip-compressed data and 3.7 times for xz-compressed data. For the DNA DataBank of Japan (DDBJ), NAF would improve throughput 40 times for gzip-compressed data. For GenBank and the European Nucleotide Archive (ENA), NAF would accelerate data distribution by a factor of 29.3 compared with uncompressed FASTA. This article provides a tutorial for installing and using NAF. Offering a NAF download option in sequence databases would save significant time, bandwidth, and disk space and accelerate biological and medical research worldwide.

Full text: Available | Collection: International databases | Database: MEDLINE | Type of study: Experimental Studies | Topics: Vaccines / Variants | Language: English | Journal: Patterns (N Y) | Year: 2022 | Document Type: Article | Affiliation country: J.patter.2022.100562
