Your browser doesn't support javascript.
loading
Show: 20 | 50 | 100
Results 1 - 6 de 6
Filter
Add more filters










Database
Language
Publication year range
1.
Article in English | MEDLINE | ID: mdl-38512749

ABSTRACT

In this paper, we propose a new coding scheme for DNA storage using low-density parity-check (LDPC) codes and interleaving techniques. While conventional coding schemes generally employ error correcting codes in both inter and intra-oligo directions, we show that inter-oligo LDPC codes, optimized by differential evolution, are sufficient in ensuring the reliability of DNA storage due to the powerful soft decoding of LDPC codes. In addition, we apply interleaving techniques for handling non-uniform error characteristics of DNA storage to enhance the decoding performance. Consequently, the proposed coding scheme reduces the required number of oligo reads for perfect recovery by 26.25% ~ 38.5% compared to existing state-of-the-art coding schemes. Moreover, we develop an analytical DNA channel model in terms of non-uniform binary symmetric channels. This mathematical model allows us to demonstrate the superiority of the proposed coding scheme while isolating the experimental variation, as well as confirm the independent effects of LDPC codes and interleaving techniques.

2.
IEEE Trans Nanobioscience ; 23(1): 81-90, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37294652

ABSTRACT

Ever since deoxyribonucleic acid (DNA) was considered as a next-generation data-storage medium, lots of research efforts have been made to correct errors occurred during the synthesis, storage, and sequencing processes using error correcting codes (ECCs). Previous works on recovering the data from the sequenced DNA pool with errors have utilized hard decoding algorithms based on a majority decision rule. To improve the correction capability of ECCs and robustness of the DNA storage system, we propose a new iterative soft decoding algorithm, where soft information is obtained from FASTQ files and channel statistics. In particular, we propose a new formula for log-likelihood ratio (LLR) calculation using quality scores (Q-scores) and a redecoding method which may be suitable for the error correction and detection in the DNA sequencing area. Based on the widely adopted encoding scheme of the fountain code structure proposed by Erlich et al., we use three different sets of sequenced data to show consistency for the performance evaluation. The proposed soft decoding algorithm gives 2.3%  âˆ¼  7.0% improvement of the reading number reduction compared to the state-of-the-art decoding method and it is shown that it can deal with erroneous sequenced oligo reads with insertion and deletion errors.


Subject(s)
Algorithms , High-Throughput Nucleotide Sequencing , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Information Storage and Retrieval , DNA/genetics , DNA/chemistry
3.
Bioinformatics ; 39(9)2023 09 02.
Article in English | MEDLINE | ID: mdl-37669160

ABSTRACT

MOTIVATION: DNA-based data storage is one of the most attractive research areas for future archival storage. However, it faces the problems of high writing and reading costs for practical use. There have been many efforts to resolve this problem, but existing schemes are not fully suitable for DNA-based data storage, and more cost reduction is needed. RESULTS: We propose whole encoding and decoding procedures for DNA storage. The encoding procedure consists of a carefully designed single low-density parity-check code as an inter-oligo code, which corrects errors and dropouts efficiently. We apply new clustering and alignment methods that operate on variable-length reads to aid the decoding performance. We use edit distance and quality scores during the sequence analysis-aided decoding procedure, which can discard abnormal reads and utilize high-quality soft information. We store 548.83 KB of an image file in DNA oligos and achieve a writing cost reduction of 7.46% and a significant reading cost reduction of 26.57% and 19.41% compared with the two previous works. AVAILABILITY AND IMPLEMENTATION: Data and codes for all the algorithms proposed in this study are available at: https://github.com/sjpark0905/DNA-LDPC-codes.


Subject(s)
Algorithms , Reading , Female , Pregnancy , Humans , Cluster Analysis , DNA
4.
Bioinformatics ; 37(19): 3136-3143, 2021 Oct 11.
Article in English | MEDLINE | ID: mdl-33904574

ABSTRACT

MOTIVATION: In DNA storage systems, there are tradeoffs between writing and reading costs. Increasing the code rate of error-correcting codes may save writing cost, but it will need more sequence reads for data retrieval. There is potentially a way to improve sequencing and decoding processes in such a way that the reading cost induced by this tradeoff is reduced without increasing the writing cost. In past researches, clustering, alignment and decoding processes were considered as separate stages but we believe that using the information from all these processes together may improve decoding performance. Actual experiments of DNA synthesis and sequencing should be performed because simulations cannot be relied on to cover all error possibilities in practical circumstances. RESULTS: For DNA storage systems using fountain code and Reed-Solomon (RS) code, we introduce several techniques to improve the decoding performance. We designed the decoding process focusing on the cooperation of key components: Hamming-distance based clustering, discarding of abnormal sequence reads, RS error correction as well as detection and quality score-based ordering of sequences. We synthesized 513.6 KB data into DNA oligo pools and sequenced this data successfully with Illumina MiSeq instrument. Compared to Erlich's research, the proposed decoding method additionally incorporates sequence reads with minor errors which had been discarded before, and thus was able to make use of 10.6-11.9% more sequence reads from the same sequencing environment, this resulted in 6.5-8.9% reduction in the reading cost. Channel characteristics including sequence coverage and read-length distributions are provided as well. AVAILABILITY AND IMPLEMENTATION: The raw data files and the source codes of our experiments are available at: https://github.com/jhjeong0702/dna-storage.

5.
Entropy (Basel) ; 21(2)2019 Feb 23.
Article in English | MEDLINE | ID: mdl-33266927

ABSTRACT

A new attack algorithm is proposed for a secure key generation and management method introduced by Yang and Wu. It was previously claimed that the key generation method of Yang and Wu using a keystore seed was information-theoretically secure and could solve the long-term key storage problem in cloud systems, thanks to the huge number of secure keys that the keystone seed can generate. Their key generation method, however, is considered to be broken if an attacker can recover the keystore seed. The proposed attack algorithm in this paper reconstructs the keystore seed of the Yang-Wu key generation method from a small number of collected keys. For example, when t = 5 and l = 2 7 , it was previously claimed that more than 2 53 secure keys could be generated, but the proposed attack algorithm can reconstruct the keystone seed based on only 84 collected keys. Hence, the Yang-Wu key generation method is not information-theoretically secure when the attacker can gather multiple keys and a critical amount of information about the keystone seed is leaked.

6.
Entropy (Basel) ; 20(9)2018 Aug 25.
Article in English | MEDLINE | ID: mdl-33265725

ABSTRACT

In this paper, a new family of binary LRCs (BLRCs) with locality 2 and uneven availabilities for hot data is proposed, which has a high information symbol availability and low parity symbol availabilities for the local repair of distributed storage systems. The local repair of each information symbol for the proposed codes can be done not by accessing other information symbols but only by accessing parity symbols. The proposed BLRCs with k = 4 achieve the optimality on the information length for their given code length, minimum Hamming distance, locality, and availability in terms of the well-known theoretical upper bound.

SELECTION OF CITATIONS
SEARCH DETAIL
...