Storing Information Inside DNA

A new way to store all of our data; but make it biological

Dec 11, 2020

Every few months I have to delete a bunch of my videos in order to have space on my phone to make more. I don’t know about you, but this is hard for me because I am so worried I am going to delete something important. As humans, we produce about 2.5 quintillion bytes of data EVERY day! 😶 That’s a whole lot of data! But our current methods of storage can’t keep up with all this data, which is forcing us to get rid of some of it. This is unfortunate, since every piece of data could be important to future generations. But what if we didn’t have to do this? Introducing… storing information in DNA! 🎉

How do we store information in DNA?

On the basic level, storing information in DNA is actually quite simple. First, we take the binary code of the information we want to store, the 0s and 1s that computers use, and translate it into A, T, C, and G’s. Once we have the binary code written as nucleotides (the A, T, C, and G’s), we synthesize a strand of DNA with that sequence. The DNA is then stored in a small vial of water until someone wants to access the information. When someone does, they sequence the DNA to read off the nucleotides, turn them back into 0s and 1s, and the computer does the rest. See, I told you it was simple! But in reality, there is a lot more that goes into storing information in DNA. It all starts with ✨Nested Primer Molecular Memory!✨
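To make the 0s-and-1s-to-nucleotides step concrete, here is a minimal sketch in Python. The mapping of two bits per nucleotide (00 to A, 01 to C, 10 to G, 11 to T) is an assumption picked for illustration, and the function names `encode` and `decode` are made up. Real encoding schemes are more involved.

```python
# Assumed mapping for illustration only: two bits per nucleotide.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of nucleotides, two bits at a time."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(dna: str) -> bytes:
    """Turn a string of nucleotides back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```

Every byte becomes four nucleotides, so decoding is just the same lookup run backwards.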

https://www.sciencealert.com/microsoft-could-be-storing-data-on-dna-within-the-next-three-years

Nested Primer Molecular Memory

Nested Primer Molecular Memory, or NPMM, is one way of organizing information that is stored in DNA. In NPMM, the DNA is kept in an aqueous solution. Aqueous solution just means that the solvent is water. So basically, the DNA sits in water. Each strand of DNA in NPMM carries both the data and a file address.

So, how does the DNA work in NPMM? Well, there are three parts of the DNA sequence in NPMM: the data block, the address block, and the Reblock. The data block is the site on the DNA where the information we want to store is encoded in our nucleotides. Then, we have the address block. The address block is the name of the file, so we can extract the right information when we want to. The address block is split up into several smaller blocks, called sub-blocks. Finally, we have the Reblock. The Reblock is the site where the reverse primer in PCR is hybridized. This basically means it is the place where the PCR primer sticks to the DNA strand. The Reblock is also the only sequence that all of the DNA strands have in common.

Now, let's get a little bit more complicated. How do we specify an address in NPMM?

Select Re from P. Re stands for the reverse primer in the PCR, and P = {Ai, Bj, Ck, Re | i, j, k ∈ {0, 1, 2}} is the set of all the primers we can choose from: one for each address sub-block, plus Re.
For i = 1 to L (where L is the number of sub-blocks in the address), select p from P and perform PCR using p and Re, sequentially. This is the part that lets us pull out the target data that we want.

By following this process, we can remove the specific file we want from a big group of files.
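The steps above can be sketched as a small simulation. This is only a toy model under loud assumptions: the pool contents are made up, the function name `nested_select` is hypothetical, and real PCR does not delete the other strands. It copies the matching ones so many times that they dominate. Here we simply keep the matches in each round.

```python
from itertools import product

# Hypothetical pool: every combination of sub-blocks A_i, B_j, C_k
# (i, j, k in {0, 1, 2}) names one file. The payloads are made up.
pool = {
    (f"A{i}", f"B{j}", f"C{k}"): f"data for file {i}{j}{k}"
    for i, j, k in product(range(3), repeat=3)
}


def nested_select(pool, address):
    """Narrow the pool one sub-block at a time, one PCR round per sub-block."""
    selected = pool
    for layer, primer in enumerate(address):
        # Simplification: strands carrying this sub-block are kept,
        # everything else is dropped.
        selected = {
            addr: data for addr, data in selected.items() if addr[layer] == primer
        }
        print(f"PCR with {primer} and Re: {len(selected)} file(s) left")
    return selected


result = nested_select(pool, ("A1", "B0", "C2"))
print(result)
```

With three sub-blocks of three choices each, the pool of 27 files shrinks to 9, then 3, then 1, which is why a few sub-blocks are enough to name a lot of files.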

Potential for enlarging DNA memory: the validity of experimental operations of scaled-up nested primer molecular memory

Why do we use NPMM? Well, for one, NPMM can recognize a HUGE file address by just looking at a few sub-blocks. This is great because it allows us to pick out the specific file we want quickly and efficiently. Also, NPMM allows for a high capacity of data storage, which is great for the applications we are looking at. NPMM has great potential to scale up and is easy to reproduce, which allows us to share our data with people across the world. As awesome as NPMM is, it only covers where our data is stored and how it is addressed. There are still two methods to look at for retrieving and reading this data.

The Traditional Method: PCR

Ok, so now let’s talk about how we can retrieve our data from the DNA. The most common method for that right now is PCR. What is PCR? Well, PCR stands for polymerase chain reaction. The polymerase chain reaction is an in vitro (which means in a test tube) process for making huge numbers of copies of a piece of DNA, like we need to do with our information. So basically, in order to store our information in DNA, we first turn the binary code into nucleotides, then make DNA with that sequence, and then use PCR to retrieve it.

So, how does that really all work? Well, once you have turned your binary code into nucleotides, we can create a specific DNA sequence that matches the nucleotides. To do this, we build a strand of DNA out of the specific nucleotide bases we want to use. Then, we store our specific DNA information in a small vial of water. But what do we do when we want to retrieve it? Well, we put the DNA through PCR, which amplifies the part of the DNA with our information on it. PCR copies this information over and over, and we can then read it out as nucleotide bases. Once we know these bases, we can turn them back into binary code and read the information that we had stored. Not too bad, right?
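Here is a rough sketch of why PCR makes the file we want stand out. It assumes an idealized PCR where every matching strand doubles each cycle, and it treats the primer as a simple prefix match, which is a big simplification of how primers really bind. The strands, copy numbers, and the function name `pcr` are all made up for illustration.

```python
def pcr(pool: dict[str, int], primer: str, cycles: int) -> dict[str, int]:
    """Idealized PCR: strands that start with the primer double every cycle."""
    return {
        strand: copies * 2 ** cycles if strand.startswith(primer) else copies
        for strand, copies in pool.items()
    }


# Hypothetical vial: strand sequence -> number of copies.
pool = {
    "ACGT" + "CAGACGGC": 10,  # the file we want ("ACGT" is its made-up address)
    "TTGA" + "CATACGCA": 10,  # some other file
}

amplified = pcr(pool, primer="ACGT", cycles=20)
target = amplified["ACGTCAGACGGC"]
share = target / sum(amplified.values())
print(f"target copies: {target}, share of the vial: {share:.6f}")
```

After 20 ideal cycles the target has about a million times more copies than it started with, so almost everything we read afterwards is the file we asked for.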

https://www.twistbioscience.com/blog/perspectives/advances-dna-data-storage-random-access-memory

But, currently, this system doesn’t work very well, because PCR requires us to raise and lower the temperature of our DNA over and over. This causes a loss in the capacity of the information and makes the system less efficient. So, what do we do about this?

Introducing ✨DORIS✨

What is DORIS? Well, DORIS is a technology that allows us to access our information much more efficiently than the traditional PCR method. DORIS is inspired by the natural ways that cells access information in their genome. This allows DORIS to reuse information as much as possible, pack information in at maximum density, and be highly scalable. Its fundamental unit is a piece of double-stranded DNA with a single-stranded overhang. The real key to DORIS is the single-stranded overhang! The overhang acts as a file address for the information, but also provides a handle for separating the file from the rest.
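To picture how an overhang can work as both an address and a handle, here is a minimal sketch. It assumes that a file is pulled out by a probe that is the exact reverse complement of its overhang. The database, the sequences, and the function names are made up for illustration, and real hybridization is not an all-or-nothing exact match.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Sequence of the strand that pairs with seq (A with T, C with G)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


# Hypothetical database: (single-stranded overhang, double-stranded data).
# Both overhangs are made-up 20 nt sequences.
database = [
    ("ACGTTGCAAGCTTAGGCATC", "CAGACGGC"),
    ("TTGACCGATAGGCTAACGTA", "CATACGCA"),
]


def pull_out(database, probe):
    """Return the data of every strand whose overhang pairs with the probe."""
    return [
        data for overhang, data in database
        if reverse_complement(overhang) == probe
    ]


probe = reverse_complement("ACGTTGCAAGCTTAGGCATC")
print(pull_out(database, probe))  # ['CAGACGGC']
```

Because the overhang is already single-stranded, nothing in this picture needs the double-stranded data part to be opened up in order to find the file.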

So, what is so great about DORIS? Well, DORIS has four main advantages.

Efficiently creating DNA strands in one “pot”
Increasing Density and Capacity Limits
Repeatable file access
The ability to change the file in DNA

So, let’s take a look at the first awesome thing, creating all the DNA strands in one “pot!” To understand how awesome this is, let’s take a look at the future🎇! In the future, in order to store all our data on DNA, we are going to be making DNA databases that are composed of upwards of 10¹⁵ distinct strands, which is a whole lot more than we can do now. And this is where DORIS comes in! With DORIS, each sequence contains a T7 RNA polymerase promoter, and a common primer binds to the strands there, so all of our data can be turned into the DNA strands we need together. This results in each strand having a 20 nt (nucleotide) overhang. The overhang is used to determine which file we need in a quicker, more efficient manner.

Another super awesome thing about DORIS is its ability to increase the density and capacity limits of the information. The really cool thing about DORIS is that it lets us do separations at room temperature, without pulling apart the double-stranded portions of the DNA! I know that might sound boring, but it dramatically increases the chance that we will be able to use DNA for storage. As DNA databases increase in size, the chance of similar files increases. This is where PCR falls behind, but DORIS thrives. With PCR, similar files would be hard to tell apart, so their information could not be decoded. But DORIS, by using computational code words in the data while encoding, would be able to easily separate the files, allowing more information to be decoded, and more accurately.

A third great thing about DORIS is the ability to repeatedly access the file. In order to do this, we look to nature for inspiration! Just as a cell repeatedly accesses genetic information in DNA, we can too! A cell keeps a single permanent copy of its genomic DNA and uses the process of transcription to read it, and DORIS works the same way. The DNA is transcribed into RNA, the DNA is returned to the database, and the RNA is reverse transcribed so it can be read, which keeps as much of the stored information as possible. In tests, as much as 90% of the information remained after accessing the file multiple times! That is great! See, in PCR, a file can’t be accessed more than once, which would mean you could only see your data once, but with this system, you can see it multiple times with the same information.

And finally, the most wonderful, awesome thing about DORIS: the ability to change the file in the DNA! Isn’t that crazy! This means that you would be able to delete, insert, and edit the file in the DNA, just like you would on a computer. Due to the overhang that DORIS provides, we are able to execute computations without taking all the data out of the DNA. In a lot of tests, this method was majorly successful, being able to execute the desired function 50% to 100% of the time! This is great because it allows us to deal with information the same way we would if it were stored on an inorganic information system. It is a great step towards making a functional, efficient, and realistic database to store information through DNA!

Putting it all together

Ok, I know that was a TON of information! But now you know the two different methods for accessing information stored in DNA and how that DNA is stored. Let’s review. NPMM is the system in which the DNA is stored and addressed. It allows us to easily and efficiently access the specific file we need from just a few sub-blocks. Next, we talked about the two main methods of accessing our DNA. The first one, PCR, is the most common method, but it is not very scalable, and it limits the capacity of information we can store. Next, we talked about DORIS. DORIS does not use PCR and allows us to retrieve information at room temperature. DORIS is efficient, scalable, and allows for a high capacity of information. So now, we know that the combination of NPMM and DORIS allows for a realistic and efficient way to store information in DNA!

What’s next?

Well, despite the efficiency of DORIS, it is still limited, especially when we look at all the information we need to store. So our next steps are going to be looking at how we can 10X DORIS in order to achieve maximum storage potential. But, for right now we can work on turning our data into DNA and saving our information for future generations.

Bonus Content

Watch this super awesome video about how this company is using plants to store data in DNA!

Grow Your Own Cloud — The Data Garden — YouTube


Another super awesome video about a bunny that was 3D printed with its own instructions inside of it. The DNA of things.

Plastic Bunny 3D Printed From Its Own DNA — YouTube

And finally, this Netflix preview video for a series about “Biohackers”

Biohackers | First Original Series stored in DNA | Netflix — YouTube

If you’ve made it this far, thank you! I am a 15-year-old who is interested in regenerative medicine, biocomputing, and public health. If you want to see me continue to grow and 10X myself, sign up for my newsletter here!