Results 1 - 20 of 37
1.
Mater Today Bio ; 28: 101221, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39309163

ABSTRACT

The exponential growth and particular characteristics of medical data drive the need for secure medical data archiving. DNA data storage shows promise for storing sensitive and important data such as medical records because of its high density and durability. Nevertheless, current DNA data storage schemes generally do not fully consider data encryption, which creates a risk that stored data could be compromised through routine DNA sequencing. Here, we designed a "multi-layer" encryption pipeline for medical data archiving. First, digital information was encrypted with the Blowfish algorithm at the information technology (IT) layer. This was followed by two layers of encryption at the biotechnology (BT) layer. The first BT layer exploited the molecular weight of synthetic DNA or nucleosides to encrypt the key, while the second BT layer encrypted the digital information within DNA sequences. Consequently, decryption requires layer-by-layer interpretation of the data, including mass spectrometry, sequencing, and Blowfish decryption, which significantly enhances data security. Using mass spectrometry to retrieve information allows both natural and unnatural nucleosides, as well as their synthetic oligonucleotides, to be used for data storage, which considerably boosts scalability. Our work points to expanded flexibility for DNA-based data storage and highlights the potential of leveraging various physical and chemical characteristics of DNA molecules to encode and access digital information.
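In the second BT layer, encrypted bytes are stored as nucleotides. The sketch below shows a conversion of ciphertext bytes to a strand and back. It assumes a plain 2-bit-per-nucleotide mapping (00→A, 01→C, 10→G, 11→T), which is not taken from the study. Blowfish itself is not shown, and the input bytes are a stand-in for its output.

```python
# Assumed 2-bit mapping (illustrative only, not the study's scheme).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four nucleotides (two bits per nucleotide)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(strand: str) -> bytes:
    """Invert bytes_to_dna: nucleotides -> bit string -> bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


# Stand-in for Blowfish ciphertext; the encryption step itself is not shown.
ciphertext = b"\x8f\x1a"
strand = bytes_to_dna(ciphertext)  # "GATTACGG"
assert dna_to_bytes(strand) == ciphertext
```

A mapping this direct does not enforce GC-content or homopolymer limits. A practical encoder would add such constraints on top.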

2.
Biomed Eng Lett ; 14(5): 993-1009, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39220021

ABSTRACT

DNA data storage has emerged as a solution for storing massive volumes of data by using nucleic acids as a digital information medium. Compared with conventional storage media such as flash memory and hard disk drives, DNA offers exceptionally high storage density, long durability, and low maintenance costs. DNA data storage consists of the following steps: encoding, DNA synthesis (i.e., writing), preservation, retrieval, DNA sequencing (i.e., reading), and decoding. Of these steps, DNA synthesis is a bottleneck because of imperfect coupling efficiency, low throughput, and excessive use of organic solvents. Overcoming these challenges is essential to establish DNA as a viable data storage medium. In this review, we outline the overall process of DNA data storage and present recent progress in each step. Next, we provide a detailed overview of DNA synthesis methods, with an emphasis on their limitations. Lastly, we discuss efforts to overcome the constraints of each method and their prospects.

3.
Imeta ; 3(2): e168, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38882485

ABSTRACT

In recent years, deoxyribonucleic acid (DNA) has been suggested as a very promising medium for data storage. Although numerous studies have advocated DNA data storage, its practical application remains unclear, and there is no user-oriented platform. Here, we developed a DNA data storage platform, named Storage-D. It allows users to convert their data into DNA sequences of any length, and back again, by choosing their own algorithm, error-correction, random-access, and codec pin strategies. It incorporates a newly designed "Wukong" algorithm, which provides over 20 trillion codec pins for data privacy. This algorithm can also hold GC content to a selected standard and limit homopolymer run length to a defined level. It does so while maintaining a high coding potential of ~1.98 bits/nt, which allows it to outperform previous algorithms. By connecting "Storage-D" to a commercial DNA synthesis and sequencing platform, we successfully stored the "Diagnosis and treatment protocol for COVID-19 patients" in two forms: in 200 nt oligo pools in vitro, and in 500 bp genes in vivo, which were replicated in both normal and extremophilic bacteria. Together, these results show that the platform enables practical and personalized DNA data storage, with a potentially wide range of applications.
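The two sequence constraints named above, GC content and homopolymer run length, are simple to check. The sketch below assumes default thresholds of 40-60% GC and a maximum run of 3. These values are placeholders, since Storage-D lets users select their own.

```python
import math


def gc_content(seq: str) -> float:
    """Fraction of G or C bases; assumes a non-empty sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def max_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated base (non-empty input)."""
    longest = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest


def passes_constraints(seq: str, gc_min: float = 0.4, gc_max: float = 0.6,
                       max_run: int = 3) -> bool:
    """Hypothetical default thresholds; Storage-D lets users choose their own."""
    return gc_min <= gc_content(seq) <= gc_max and max_homopolymer(seq) <= max_run


# With four bases, one nucleotide carries at most log2(4) = 2 bits.
MAX_BITS_PER_NT = math.log2(4)
```

Because the theoretical ceiling is 2 bits/nt, the reported ~1.98 bits/nt corresponds to about 99% of the maximum for a four-letter alphabet.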

4.
Small Methods ; : e2301585, 2024 May 29.
Article in English | MEDLINE | ID: mdl-38807543

ABSTRACT

DNA-based data storage is a new technology in computational and synthetic biology that offers a solution for long-term, high-density data archiving. Given the critical importance of medical data in advancing human health, there is growing interest in developing an effective DNA-based medical data storage system. Data integrity, accuracy, reliability, and efficient retrieval are all significant concerns. This study therefore proposes an Effective DNA Storage (EDS) approach for archiving medical MRI data. The EDS approach incorporates three key components:

(i) a novel fraction strategy that addresses a critical issue of rotating encoding, in which a single base error propagates and often leads to data loss;
(ii) a novel rule-based quaternary transcoding method that satisfies bio-constraints and ensures reliable mapping;
(iii) an indexing technique designed to simplify random search and access.

The effectiveness of the approach is validated through computer simulations and biological experiments, which confirm its practicality. The EDS approach outperforms existing methods, providing superior control over bio-constraints and reducing computational time. The results and code provided in this study open new avenues for practical DNA storage of medical MRI data and offer promising prospects for medical data archiving and retrieval.
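Rotating encoding is the problem the EDS fraction strategy targets. It is commonly illustrated with a base-3 scheme: each trit selects one of the three nucleotides that differ from the previous base, so homopolymers never occur. The sketch below illustrates that generic idea and is not the EDS method. The six-trits-per-byte packing and the starting base "A" are assumptions.

```python
NUCLEOTIDES = "ACGT"
TRITS_PER_BYTE = 6  # 3**6 = 729 >= 256, so six base-3 digits hold any byte


def byte_to_trits(byte: int) -> list:
    trits = []
    for _ in range(TRITS_PER_BYTE):
        trits.append(byte % 3)
        byte //= 3
    return trits[::-1]  # most significant trit first


def trits_to_byte(trits) -> int:
    value = 0
    for trit in trits:
        value = value * 3 + trit
    return value


def rotate_encode(data: bytes, start: str = "A") -> str:
    """Each trit picks one of the three bases that differ from the previous base."""
    prev, out = start, []
    for byte in data:
        for trit in byte_to_trits(byte):
            prev = [b for b in NUCLEOTIDES if b != prev][trit]
            out.append(prev)
    return "".join(out)


def rotate_decode(strand: str, start: str = "A") -> bytes:
    """Inverse of rotate_encode; raises ValueError if a base repeats its predecessor."""
    prev, trits = start, []
    for base in strand:
        trits.append([b for b in NUCLEOTIDES if b != prev].index(base))
        prev = base
    return bytes(trits_to_byte(trits[i:i + TRITS_PER_BYTE])
                 for i in range(0, len(trits), TRITS_PER_BYTE))
```

Each base is decoded relative to its predecessor. A single substitution therefore changes its own trit and can also change the next one. It makes the strand undecodable if it creates a repeated base. This is the kind of error propagation the abstract describes.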

5.
Adv Mater ; : e2403071, 2024 May 23.
Article in English | MEDLINE | ID: mdl-38779945

ABSTRACT

This study develops two deoxyribonucleic acid (DNA) lossy compression models, Models A and B. They encode grayscale images into DNA sequences, enhance information density, and enable high-fidelity image recovery. The models, which differ in their handling of pixel domains and interpolation methods, offer a novel approach to DNA data storage. Model A processes pixels in overlapped domains using linear interpolation (LI), whereas Model B uses non-overlapped domains with nearest-neighbor interpolation (NNI). In a comparative analysis with Joint Photographic Experts Group (JPEG) compression, the DNA lossy compression models show competitive advantages in information density and image quality restoration. Applying these models to the Modified National Institute of Standards and Technology (MNIST) dataset demonstrates their efficiency and the recognizability of the decompressed images, as validated by convolutional neural network (CNN) performance. In particular, Model B2, a variant of Model B, emerges as an effective method for balancing high information density (more than 20 times the typical density of two bits per nucleotide) with reasonably good image quality. These findings highlight the potential of DNA-based data storage systems for high-density and efficient compression, indicating a promising future for biological data storage solutions.
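The two interpolation methods named above can be compared on one row of pixels. The sketch below assumes plain 2× decimation and does not model the overlapped or non-overlapped domains of Models A and B. On this smooth toy gradient LI reconstructs more closely than NNI. That says nothing about which model performs better on real images, which the study assesses.

```python
def downsample(row, factor):
    """Keep every `factor`-th pixel (plain decimation, assumed here)."""
    return row[::factor]


def upsample_nearest(samples, factor):
    """Nearest-neighbour interpolation: repeat each kept pixel `factor` times."""
    return [s for s in samples for _ in range(factor)]


def upsample_linear(samples, factor):
    """Linear interpolation between kept pixels; the tail after the last kept pixel repeats it."""
    out = []
    for a, b in zip(samples, samples[1:]):
        out.extend(a + (b - a) * j / factor for j in range(factor))
    out.extend([samples[-1]] * factor)
    return out


def mean_abs_error(original, restored):
    return sum(abs(o - r) for o, r in zip(original, restored)) / len(original)


row = [0, 10, 20, 30, 40, 50]     # a smooth toy gradient
kept = downsample(row, 2)          # [0, 20, 40]
nni = upsample_nearest(kept, 2)    # [0, 0, 20, 20, 40, 40]
li = upsample_linear(kept, 2)      # [0.0, 10.0, 20.0, 30.0, 40, 40]
```

Storing only `kept` halves the pixel payload. The interpolation method then determines how much of the discarded detail is recovered.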

6.
Biosensors (Basel) ; 14(4)2024 Apr 06.
Article in English | MEDLINE | ID: mdl-38667170

ABSTRACT

Using DNA as the next-generation data storage medium offers unparalleled advantages in data density, storage duration, and power consumption compared with existing data storage technologies. To meet the high-speed data writing requirements of DNA data storage, this paper proposes a novel design for an ultra-high-density, high-throughput DNA synthesis platform. The design mainly leverages two functional modules. The first is a dynamic random-access memory (DRAM)-like integrated circuit (IC) responsible for electrode addressing and voltage supply. The second is a static droplet array (SDA)-based microfluidic structure that removes concerns about diffusion of reaction species in electrochemical DNA synthesis. Through theoretical analysis and simulation studies, we validate effective addressing of 10 million electrodes and a stable, adjustable voltage supply by the integrated circuit. We also demonstrate a reaction unit size down to 3.16 × 3.16 µm², equivalent to 10 million/cm², that can rapidly and stably generate static droplets at each site, effectively constraining proton diffusion. Finally, we conducted a synthesis cycle experiment by incorporating fluorescent beacons on a microfabricated electrode array to examine the feasibility of our design.


Subject(s)
DNA , Electrodes , Microfluidics , Biosensing Techniques
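The stated site density follows from the unit size. The calculation below assumes one reaction unit per 3.16 µm × 3.16 µm square and uses 1 cm = 10⁴ µm:

$$
(3.16\,\mu\mathrm{m})^2 \approx 9.99\,\mu\mathrm{m}^2, \qquad
\frac{1\,\mathrm{cm}^2}{9.99\,\mu\mathrm{m}^2} = \frac{10^{8}\,\mu\mathrm{m}^2}{9.99\,\mu\mathrm{m}^2} \approx 1.0 \times 10^{7}\ \text{units per cm}^2
$$

This matches the 10 million/cm² figure. The 3.16 µm pitch is essentially √10 µm, chosen so that each unit occupies 10 µm².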
7.
Micromachines (Basel) ; 15(4)2024 Mar 30.
Article in English | MEDLINE | ID: mdl-38675287

ABSTRACT

DNA data storage based on synthetic oligonucleotides is highly attractive because it allows data to be stored over long periods. The quantity of data generated has been growing exponentially, and storage capacity must keep pace with the growth driven by new technologies and globalization. DNA can hold a large amount of information at high density and remains stable for hundreds of years. The technology therefore offers a solution for current long-term data centers by reducing energy consumption and physical storage space. Research institutes, technology companies, and universities are currently making significant efforts to meet the growing need for data storage. DNA data storage is a promising field, especially given advances in sequencing techniques and equipment. These advances now make it possible to read genomes (i.e., to retrieve the information) and to process the resulting data easily. To address the challenges of developing new technologies for DNA data storage, a message encoding and decoding exercise was conducted at a Brazilian research center. The exercise consisted of synthesizing oligonucleotides by the phosphoramidite route. A message encoded with a scheme that adheres to DNA sequence constraints was synthesized. After synthesis, the oligonucleotide was sequenced and decoded, and the information was fully recovered.

8.
Methods Mol Biol ; 2760: 133-145, 2024.
Article in English | MEDLINE | ID: mdl-38468086

ABSTRACT

Unnatural nucleobases (UBs) can pair with their cognates to form unnatural base pairs (UBPs). Efficient preparation of DNA oligonucleotides containing UBs is an essential prerequisite for applying UBPs in vitro and in vivo. Traditionally, such oligonucleotides are prepared largely by solid-phase synthesis. This method requires unstable nucleoside phosphoramidites and a DNA synthesizer, is environmentally unfriendly, and is limited in product length. To overcome these limitations, we developed enzymatic methods for routine laboratory preparation of DNA oligonucleotides containing the unnatural nucleobase dNaM, dTPT3, or one of the functionalized dTPT3 derivatives. The methods are based on the template-independent polymerase terminal deoxynucleotidyl transferase (TdT). The resulting oligonucleotides can be used for orthogonal DNA labeling or for preparing DNAs containing the UBP dNaM-dTPT3, one of the most successful UBPs to date. Here, we first provide a detailed procedure for the TdT-based preparation of DNA oligonucleotides containing 3'-nucleotides of dNaM, dTPT3, or one of the dTPT3 derivatives. We then present procedures for enzyme-linked oligonucleotide assay (ELONA) and for imaging bacterial cells using DNA oligonucleotides containing 3'-nucleotides of dTPT3 derivatives with different functional groups. The procedure for enzymatic synthesis of DNAs containing an internal UBP dNaM-dTPT3 is also described. We hope these methods will greatly facilitate the application of UBPs and the construction of semi-synthetic organisms with an expanded genetic alphabet.


Subject(s)
DNA Nucleotidylexotransferase , Synthetic Biology , DNA Nucleotidylexotransferase/genetics , Synthetic Biology/methods , DNA/genetics , DNA-Directed DNA Polymerase , Nucleotides/genetics , Oligonucleotides/genetics
9.
Adv Sci (Weinh) ; 11(15): e2305921, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38332565

ABSTRACT

DNA has emerged as an appealing material for information storage because of its high storage density and durability. Random reading and rewriting are essential tasks for practical large-scale data storage. However, they are currently difficult to implement simultaneously in a single DNA-based storage system, which strongly limits practicability. Here, a "Cell Disk" storage system is presented that achieves high-density in vivo DNA data storage with both random reading and rewriting. In this system, each yeast cell serves as a chamber for storing information, similar to a "disk block" but able to self-replicate. Specifically, a customized CRISPR/Cas9-based "lock-and-key" module is inserted into the genome of each yeast cell. The module allows selective retrieval, erasure, or rewriting of the targeted cell "block" from a pool of cells (the "disk"). Additionally, a codec algorithm with lossless compression is developed to improve the information density of each cell "block". As a proof of concept, the study demonstrates target-specific reading and rewriting of compressed data from a mimic cell "disk" comprising up to 10⁵ "blocks", with high specificity and reliability. The "Cell Disk" system described here supports random reading and rewriting concurrently and should scale well for practical data storage.


Subject(s)
Reading , Saccharomyces cerevisiae , Reproducibility of Results , Saccharomyces cerevisiae/genetics , DNA/genetics , Information Storage and Retrieval
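The abstract does not describe the codec. Here zlib's DEFLATE from the Python standard library stands in for it. This sketch only shows why lossless compression raises the information density per cell: fewer payload bytes means fewer nucleotides, assuming 2 bits/nt.

```python
import zlib


def compress_for_storage(text: str) -> bytes:
    """Lossless DEFLATE compression (a stand-in for the study's codec)."""
    return zlib.compress(text.encode("utf-8"), 9)


def restore(payload: bytes) -> str:
    return zlib.decompress(payload).decode("utf-8")


def nucleotides_needed(payload: bytes, bits_per_nt: float = 2.0) -> float:
    """Nucleotides required at an assumed 2 bits per nucleotide."""
    return len(payload) * 8 / bits_per_nt


message = "ATTAGC-RECORD;" * 100  # highly repetitive toy data
packed = compress_for_storage(message)
assert restore(packed) == message
```

The gain depends entirely on how redundant the input is. Already-compressed formats such as JPEG or MP4 shrink little further.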
10.
Biosystems ; 237: 105136, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38316169

ABSTRACT

DNA data storage has gained increasing attention over recent decades. DNA molecules can encode non-biological information. They are promising carriers because of their greater data capacity, longer storage duration, and better stability against technical failures. Here we propose a new method for encoding notes and music in DNA. The encoding technique accounts for the duration and tonality of each note. It can encode all seven octaves by assigning a nucleotide sequence to each key. A specific set of short sequences is proposed to define the duration of each note. The proposed method can encode more complicated melodies than the approach based on the Huffman algorithm.


Subject(s)
Music , Sound , Algorithms , DNA/genetics
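The abstract does not give the actual codewords, so the sketch below uses assumed conventions. If each of the seven octaves has 12 chromatic keys, the 84 keys need codewords of 4 nt, since 4³ = 64 < 84 ≤ 4⁴ = 256. The 2-nt entries in `DURATION_CODES` are hypothetical placeholders for the study's duration set.

```python
# Hypothetical conventions: 12 chromatic keys per octave, octaves 1-7,
# 4-nt base-4 key codes, and placeholder 2-nt duration codes.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
DURATION_CODES = {"whole": "AA", "half": "AC", "quarter": "AG", "eighth": "AT"}


def to_base4(n: int, width: int) -> str:
    """Write n as `width` base-4 digits using A=0, C=1, G=2, T=3."""
    digits = []
    for _ in range(width):
        digits.append("ACGT"[n % 4])
        n //= 4
    return "".join(reversed(digits))


def encode_note(name: str, octave: int, duration: str) -> str:
    """4-nt key code (84 keys fit in 4**4 = 256 codewords) + 2-nt duration code."""
    if not 1 <= octave <= 7:
        raise ValueError("octave must be between 1 and 7")
    key_index = (octave - 1) * 12 + NOTE_NAMES.index(name)  # 0..83
    return to_base4(key_index, 4) + DURATION_CODES[duration]


def encode_melody(notes) -> str:
    """Concatenate (name, octave, duration) tuples into one strand."""
    return "".join(encode_note(*note) for note in notes)
```

For example, `encode_note("C", 1, "quarter")` gives `"AAAAAG"`. Because every note occupies exactly 6 nt, a decoder can split the strand into notes by position alone.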
11.
Mater Today Bio ; 24: 100900, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38234463

ABSTRACT

Incorporating biomolecules as integral parts of computational systems represents a frontier challenge in bio- and nanotechnology. Using DNA to store digital data is an attractive alternative to conventional information technologies due to its high information density and long lifetime. However, developing an adequate DNA storage medium remains a significant challenge in permitting the safe archiving and retrieval of oligonucleotides. This work introduces composite nucleic acid-polymer fibers as matrix materials for digital information-bearing oligonucleotides. We devised a complete workflow for the stable storage of DNA in PEO, PVA, and PCL fibers by employing electrohydrodynamic processes to produce electrospun nanofibers with embedded oligonucleotides. The on-demand retrieval of messages is afforded by non-hazardous chemical treatment and subsequent PCR amplification and DNA sequencing. Finally, we develop a platform for melt-electrowriting of polymer-DNA composites to produce microfiber meshes of programmable patterns and geometries.

12.
Trends Biotechnol ; 42(2): 156-167, 2024 02.
Article in English | MEDLINE | ID: mdl-37673693

ABSTRACT

DNA is an intelligent data storage medium because of its stability and high density. Nature has used it for over 3.5 billion years. Compared with traditional methods, DNA offers better compression and physical density, and it can retain information for thousands of years. However, challenges remain in scalability, standardization, metadata gathering, biocybersecurity, and specialized tools. Addressing these challenges is crucial for widespread implementation. Unlocking the full potential of DNA data storage requires collaboration among experts and a long-term perspective. The technology promises low energy costs, high-density storage, and long-term stability.


Subject(s)
DNA , Information Storage and Retrieval , DNA/genetics
13.
Comput Struct Biotechnol J ; 23: 140-147, 2024 Dec.
Article in English | MEDLINE | ID: mdl-38146435

ABSTRACT

A secondary structure in single-stranded DNA refers to its propensity to undergo self-folding, leading to functional inactivity and irreparable failures within DNA storage systems. Consequently, the property of secondary structure avoidance (SSA) becomes a crucial criterion in the design of single-stranded DNA sequences for DNA storage, as it prohibits the inclusion of reverse-complement subsequences that contribute to such structures. This work is specifically focused on addressing the avoidance of secondary structures in single-stranded DNA sequences. We propose a novel sequence replacement approach, which successfully resolves the SSA problem under conditions where the stem exceeds a length of $2\log_2 n + 2$, and the loop is of length $k \geq 4$. These parameters have been carefully chosen to closely resemble the real-world scenarios encountered in biochemical processes, enhancing the practical relevance of our study.
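A stem-loop forms when a subsequence is followed, after a loop, by its own reverse complement. The checker below flags such patterns for a given stem length and minimum loop length. It detects the problem and does not implement the paper's sequence replacement approach. `stem_bound` evaluates the quoted threshold, assuming that n is the sequence length.

```python
import math

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def has_hairpin(seq: str, stem_len: int, min_loop: int) -> bool:
    """True if a length-`stem_len` substring is followed, at least `min_loop`
    bases later, by its reverse complement (a potential stem-loop)."""
    for i in range(len(seq) - stem_len + 1):
        stem = seq[i:i + stem_len]
        if seq.find(reverse_complement(stem), i + stem_len + min_loop) != -1:
            return True
    return False


def stem_bound(n: int) -> float:
    """The stem-length threshold 2*log2(n) + 2 quoted in the abstract."""
    return 2 * math.log2(n) + 2
```

For instance, `"GGGGAAAACCCC"` contains a 4-nt stem (`GGGG`/`CCCC`) around a 4-nt loop and is flagged. `"GGGGAACCCC"` has only a 2-nt loop, so it passes when `min_loop=4`.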

14.
ACS Synth Biol ; 12(12): 3567-3577, 2023 Dec 15.
Article in English | MEDLINE | ID: mdl-37961855

ABSTRACT

Reliable DNA data storage requires a comprehensive analysis of the errors introduced into DNA-stored data during processing, such as DNA synthesis and sequencing. Both synthesis and sequencing errors depend on the sequence and on base transitions between nucleotides. Ignoring either error source makes it difficult to minimize the error rate. Here, we present a methodology and toolkit that profiles DNA synthesis and sequencing errors simultaneously. It uses an oligonucleotide library generated from a 10-base-shifted sequence array, with each member labeled by a unique molecular identifier. This methodology enables position- and sequence-independent error profiling of both DNA synthesis and sequencing. Using this toolkit, we report base transitional errors in both synthesis and sequencing, in general DNA data storage as well as in degenerate-base-augmented DNA data storage. The methodology and data presented will contribute to the development of DNA sequence designs with minimal error.


Subject(s)
DNA , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods , DNA/genetics , DNA Replication , Nucleotides/genetics
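One common way to use unique molecular identifiers (UMIs) is to group reads by UMI. This approach is assumed here rather than taken from the study. An error shared by every read of a molecule was likely introduced during synthesis. A deviation in a single read was likely introduced by sequencing. The sketch handles substitutions only and assumes reads are already aligned to the designed sequence.

```python
from collections import Counter, defaultdict


def consensus(reads):
    """Per-position majority base across equal-length reads (ties: first seen)."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


def split_errors(designed: str, tagged_reads):
    """tagged_reads: iterable of (umi, read) pairs, all aligned to `designed`.

    Differences between `designed` and a UMI family's consensus are counted as
    synthesis errors; differences between each read and its family consensus
    are counted as sequencing errors. Substitutions only."""
    families = defaultdict(list)
    for umi, read in tagged_reads:
        families[umi].append(read)
    synthesis, sequencing = Counter(), Counter()
    for reads in families.values():
        cons = consensus(reads)
        synthesis.update((d, c) for d, c in zip(designed, cons) if d != c)
        for read in reads:
            sequencing.update((c, r) for c, r in zip(cons, read) if c != r)
    return synthesis, sequencing
```

The counters are keyed by (expected base, observed base). They can therefore be read directly as base-transition profiles, for example how often a designed G appears as T.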
15.
Comput Biol Med ; 165: 107404, 2023 10.
Article in English | MEDLINE | ID: mdl-37666064

ABSTRACT

DNA data storage is a promising technology that uses computer simulation and synthetic biology to offer high-density, reliable digital information storage. Storing massive data in a small amount of DNA without losing the original data is challenging. Nonspecific hybridization errors occur frequently and severely affect the reliability of stored data. This study proposes a novel biologically optimized encoding model for DNA data storage (BO-DNA) to overcome this reliability problem. The BO-DNA model is built on a new rule-based mapping method that avoids data loss when transcoding binary data into nucleotides. A customized optimization algorithm based on a tent chaotic map is applied to maximize the lower bounds, which helps minimize nonspecific hybridization errors. The robustness of BO-DNA is assessed against four bio-constraints to confirm the reliability of the newly generated DNA sequences. Experimentally, different medical images are encoded and decoded successfully. The lower bounds improve by 12%-59%, and the optimally constrained DNA sequences reach an average density of 1.77 bits/nt. These results demonstrate substantial advantages of BO-DNA for constructing reliable DNA data storage.


Subject(s)
Algorithms , DNA , Computer Simulation , Reproducibility of Results , DNA/genetics
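A tent map is a piecewise-linear chaotic map, x → μx for x < 0.5 and x → μ(1 − x) otherwise. Optimizers use it as a deterministic, well-spread source of candidate values. The sketch below only generates a tent-map sequence and bins it into nucleotides. It is not BO-DNA's optimizer, and the choice μ = 1.99 and the four-bin mapping are assumptions.

```python
def tent_map(x: float, mu: float = 1.99) -> float:
    """One tent-map step on [0, 1]; mu slightly below 2 avoids the floating-point
    collapse to 0 that mu = 2 tends to show after a few dozen iterations."""
    return mu * x if x < 0.5 else mu * (1.0 - x)


def chaotic_sequence(x0: float, length: int, mu: float = 1.99) -> list:
    """Iterate the tent map `length` times from seed x0."""
    values, x = [], x0
    for _ in range(length):
        x = tent_map(x, mu)
        values.append(x)
    return values


def chaotic_strand(x0: float, length: int) -> str:
    """Map each chaotic value to a base by splitting [0, 1) into four bins."""
    return "".join("ACGT"[min(int(v * 4), 3)] for v in chaotic_sequence(x0, length))
```

The same seed always reproduces the same strand, so candidate sequences can be regenerated from a single number. In an optimizer, such candidates would then be filtered against the bio-constraints.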
16.
Adv Sci (Weinh) ; 10(32): e2303197, 2023 11.
Article in English | MEDLINE | ID: mdl-37755129

ABSTRACT

DNA can be used to store digital data, and synthetic short-sequence DNA pools have been developed to store large quantities of digital data. However, synthetic DNA data cannot be actively processed in DNA pools. Here, an active DNA data editing process is developed using splint ligation in a droplet-controlled fluidics (DCF) system. DNA fragments of discrete sizes (100-500 bp) are synthesized for droplet assembly, and programmed exchange of sequence information takes place. The encoded DNA sequences are processed in series and in parallel to synthesize the designated DNA pools, enabling random access through polymerase chain reaction amplification. The sequencing results of the assembled DNA data pools can be aligned in order for decoding, and address primer scanning shows high fidelity. Furthermore, eight 90 bp DNA pools carrying pixel information (png: 0.27-0.28 kB), encoded by codons, are synthesized in the DCF system. They are combined into eight 270 bp DNA pools carrying an animated movie clip file (mp4: 12 kB).


Subject(s)
DNA , DNA/genetics , Polymerase Chain Reaction/methods
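In the lab, random access works by PCR with a file-specific address primer. The in-silico analogue below filters sequenced reads by primer and reassembles them in index order. The read layout (primer, then a 2-nt base-4 index, then payload) is a hypothetical convention and not the study's format.

```python
def decode_index(code: str) -> int:
    """Read a base-4 index written with A=0, C=1, G=2, T=3 (assumed convention)."""
    value = 0
    for base in code:
        value = value * 4 + "ACGT".index(base)
    return value


def random_access(pool, primer: str, index_len: int = 2) -> str:
    """In-silico stand-in for PCR random access: keep reads that start with
    `primer`, order them by their index field, and join their payloads."""
    hits = [read[len(primer):] for read in pool if read.startswith(primer)]
    hits.sort(key=lambda rest: decode_index(rest[:index_len]))
    return "".join(rest[index_len:] for rest in hits)


pool = [
    "TTGC" + "AC" + "GGG",  # file TTGC, fragment 1
    "TTGC" + "AA" + "CCC",  # file TTGC, fragment 0
    "AAAA" + "AA" + "TTT",  # a different file
]
```

Here `random_access(pool, "TTGC")` returns the fragments of that file in order, and the read from the other file is ignored.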
17.
Small Methods ; 7(9): e2201610, 2023 09.
Article in English | MEDLINE | ID: mdl-37263984

ABSTRACT

DNA is a promising material for high-density, long-term archival data storage. Sustainable, long-term, large-scale DNA data storage depends on more than algorithms for encoding digital information into DNA sequences, DNA writing (chemical synthesis), and DNA reading (sequencing). Preserving DNA mixtures with high sequence diversity is another critical issue. Here, this work demonstrates a low-cost, convenient, and sustainable method for DNA data storage on cellulose paper. A DNA pool comprising thousands of sequences that encode archival data is conveniently stored on cellulose paper through electrostatic adsorption. The calculated density is as high as 15 TB per mm³. This work shows that these digitally encoded DNA pools can remain stable on the dried cellulose paper for years, even when directly exposed to air. Furthermore, the reversible electrostatic adsorption enables repeated loading and retrieval of DNA on and off the cellulose paper. This sustainable DNA preservation on cellulose paper therefore offers a great advantage in storage capacity and cost, which is crucial for practical large-scale, long-term data storage systems.


Subject(s)
Cellulose , Information Storage and Retrieval , DNA/genetics , Sequence Analysis, DNA/methods , Algorithms
18.
ACS Appl Mater Interfaces ; 15(20): 24097-24108, 2023 May 24.
Article in English | MEDLINE | ID: mdl-37184884

ABSTRACT

Due to its high coding density and longevity, DNA is a compelling data storage alternative. However, current DNA data storage systems rely on the de novo synthesis of enormous DNA molecules, resulting in low data editability, high synthesis costs, and restrictions on further applications. Here, we demonstrate the programmable assembly of reusable DNA blocks for versatile data storage using the ancient movable type printing principle. Digital data are first encoded into nucleotide sequences in DNA hairpins, which are then synthesized and immobilized on solid beads as modular DNA blocks. Using DNA polymerase-catalyzed primer exchange reaction, data can be continuously replicated from hairpins on DNA blocks and attached to a primer in tandem to produce new information. The assembly of DNA blocks is highly programmable, producing various data by reusing a finite number of DNA blocks and reducing synthesis costs (∼1718 versus 3000 to 30,000 US$ per megabyte using conventional methods). We demonstrate the flexible assembly of texts, images, and random numbers using DNA blocks and the integration with DNA logic circuits to manipulate data synthesis. This work suggests a flexible paradigm by recombining already synthesized DNA to build cost-effective and intelligent DNA data storage systems.


Subject(s)
DNA , Information Storage and Retrieval , DNA/genetics , DNA Primers , Printing , Printing, Three-Dimensional
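The quoted costs can be put in proportion. Dividing the conventional range by the reported block-assembly cost gives:

$$
\frac{3000\ \text{US\$/MB}}{1718\ \text{US\$/MB}} \approx 1.7, \qquad
\frac{30\,000\ \text{US\$/MB}}{1718\ \text{US\$/MB}} \approx 17.5
$$

The reported ∼1718 US$ per megabyte is therefore roughly 1.7- to 17.5-fold lower than the conventional range, depending on which end of that range applies.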
19.
Front Genet ; 14: 1158337, 2023.
Article in English | MEDLINE | ID: mdl-37021008

ABSTRACT

DNA is a practical storage medium with high density, durability, and the capacity to accommodate exponentially growing data volumes. Designing DNA sequences is a biocomputing problem that requires satisfying bioconstraints to obtain robust sequences. Existing evolutionary approaches to DNA sequence design introduce errors during encoding, which reduce the lower bounds of the DNA coding sets used for molecular hybridization. Additionally, disordered DNA strands form secondary structures, which are susceptible to errors during decoding. This paper proposes a computational evolutionary approach based on a synergistic moth-flame optimizer (MFOS) with Levy flight and opposition-based learning mutation strategies. The approach addresses these problems by constructing reverse-complement constraints. The MFOS aims to attain optimal global solutions with robust convergence and balanced search capabilities, improving the lower bounds and coding rates of DNA codes for storage. The ability of the MFOS to construct DNA coding sets is demonstrated through various experiments using 19 state-of-the-art functions. Compared with existing studies, the proposed approach with three different bioconstraints improves the lower bounds of the DNA codes by 12-28% and significantly reduces errors.
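Constructing a DNA coding set means collecting codewords that satisfy the bioconstraints. The size of any valid set is a lower bound on the best achievable size, and optimizers such as MFOS try to raise it. The sketch below is a plain greedy baseline, not MFOS: it uses no moth-flame search, Levy flight, or opposition-based learning. It applies three assumed constraints: a fixed GC count of n // 2, a minimum Hamming distance d between codewords, and the same minimum distance between every codeword and every reverse complement.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def gc_count(seq: str) -> int:
    return sum(base in "GC" for base in seq)


def greedy_code(n: int, d: int) -> list:
    """Scan all 4**n words in lexicographic order and keep each one that meets
    the GC, Hamming-distance and reverse-complement constraints so far."""
    code = []
    for letters in product("ACGT", repeat=n):
        word = "".join(letters)
        if gc_count(word) != n // 2:
            continue
        if hamming(word, reverse_complement(word)) < d:
            continue
        # H(w, rc(c)) == H(rc(w), c), so one check covers both directions.
        if all(hamming(word, c) >= d and hamming(word, reverse_complement(c)) >= d
               for c in code):
            code.append(word)
    return code
```

Greedy scanning is fast for short words but usually leaves room for improvement. Heuristic optimizers search for larger sets under the same constraints.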

20.
Adv Sci (Weinh) ; 10(10): e2206201, 2023 04.
Article in English | MEDLINE | ID: mdl-36737843

ABSTRACT

DNA has been pursued as a novel biomaterial for digital data storage. Large-scale data storage and random access have been achieved in DNA oligonucleotide pools. However, repeated data access requires constant data replenishment, and these implementations are confined to professional facilities. Here, a mobile data storage system is developed in the genome of the extremophile Halomonas bluephagenesis. It enables dual-mode storage, dynamic data maintenance, rapid readout, and robust recovery. The system relies on two key components: a versatile genetic toolbox for integrating 10-100 kb-scale synthetic DNA into the H. bluephagenesis genome, and an efficient error-correction coding scheme targeting noisy nanopore sequencing reads. The storage and repeated retrieval of 5 KB of data under non-laboratory conditions are demonstrated. The work highlights the potential of DNA data storage in domestic and field scenarios and expands its application domain from archival data to frequently accessed data.


Subject(s)
Extremophiles , Sequence Analysis, DNA , Information Storage and Retrieval , DNA/genetics , Genomics