Bacteria Box – Future of storage?

By James Whitcroft

In recent years, scientists have been exploring ways of storing data on, or rather in, living media. They have had success storing bytes of data in Escherichia coli (E. coli). Although this is not a brand-new idea, a new method has been developed that allows data to be written directly into the genome of a bacterial cell, so that it is passed down to subsequent generations. That’s self-replicating data we’re talking about here. The innovators of this technique claim that bacteria cannot be hacked, have nearly limitless storage potential, and fit into an incredibly small space. Is this the future of data storage or just a sci-fi wet dream? And it raises another question: how safe and private could your data really be when stored on self-replicating, microscopic media?


In 2007, a team at Keio University in Japan successfully encoded data in the DNA of a common soil bacterium. In 2010, a team from the Chinese University of Hong Kong presented its gold-medal-winning method of storing compressed data in E. coli at MIT’s iGEM competition. Now, in 2016, a team of Harvard scientists has made large advances in the field, expanding potential storage capacity to 10x that of past methods.

How does it work?

The idea behind storing data in living bacteria is quite simple. Certain bacteria use a system known to geneticists as the CRISPR/Cas system. This is simply a way for bacteria to protect themselves against viral infections. When bacteria carrying the CRISPR/Cas system encounter a virus, they store a chunk of the virus’ DNA in their own DNA. The important piece is that, as the bacteria encounter viruses, they store these DNA chunks sequentially. This lets the bacteria remember the viruses and fend off future invasions. It is also important to note that this genetic memory is passed on to future generations. If it hasn’t yet clicked with you how this relates to storing data, hold tight.

Scientists, armed with this knowledge, were then able to introduce chunks of data into bacteria. They achieved this by disguising the data to look like viral DNA and introducing it to the bacteria, which then essentially eat the “viral DNA” data. This has the effect of storing the data in an array-like fashion, making it easy to retrieve.

Some science behind it

The CRISPR/Cas system is a form of immunological memory in which foreign DNA sequences originating from viral infections are stored within genome-based arrays. These short sequences, once stored in the array, are known as spacers. The arrays both preserve each spacer sequence and record the order in which the sequences were acquired.
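To make the "ordered array of spacers" idea concrete, here is a toy model in Python. It is only a sketch: the two-bits-per-base mapping (A, C, G, T for 00, 01, 10, 11) and the `SpacerArray` class are assumptions made for illustration, not the encoding used by any of the teams mentioned above.

```python
# Toy model of a CRISPR array as an ordered list of spacers.
# ASSUMPTION: the 2-bits-per-base mapping below is illustrative only;
# it is not the scheme used by the research teams.
BASES = "ACGT"  # A=00, C=01, G=10, T=11


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def dna_to_bytes(seq: str) -> bytes:
    """Decode a base string produced by bytes_to_dna back into bytes."""
    if len(seq) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


class SpacerArray:
    """Hypothetical stand-in for a CRISPR array: spacers kept in acquisition order."""

    def __init__(self):
        self.spacers = []

    def acquire(self, spacer: str) -> None:
        # Each newly "encountered" sequence is stored after the previous ones.
        self.spacers.append(spacer)

    def read(self) -> bytes:
        # Reading the spacers in order recovers the original byte stream.
        return b"".join(dna_to_bytes(s) for s in self.spacers)


array = SpacerArray()
for chunk in (b"he", b"ll", b"o!"):
    array.acquire(bytes_to_dna(chunk))

print(array.spacers)
print(array.read())
```

Because the array keeps the order of acquisition, feeding the chunks in sequence is enough to get the message back out in the right order.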


The Chinese University team applied a compression step before DNA sequence synthesis, choosing the DEFLATE algorithm to do the compression.
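As a rough illustration of compressing before synthesis, the sketch below runs a message through DEFLATE using Python's standard `zlib` module and then maps the compressed bytes to bases. The base mapping and the helper names (`deflate`, `inflate`, `bytes_to_dna`, `dna_to_bytes`) are assumptions for this example; the team's actual pipeline is documented on its iGEM site.

```python
import zlib

# ASSUMPTION: illustrative 2-bits-per-base mapping, not the team's encoding.
BASES = "ACGT"


def deflate(data: bytes) -> bytes:
    """Compress with raw DEFLATE (negative wbits means no zlib header or trailer)."""
    compressor = zlib.compressobj(9, zlib.DEFLATED, -15)
    return compressor.compress(data) + compressor.flush()


def inflate(data: bytes) -> bytes:
    """Reverse of deflate(): decompress a raw DEFLATE stream."""
    return zlib.decompress(data, -15)


def bytes_to_dna(data: bytes) -> str:
    return "".join(
        BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0)
    )


def dna_to_bytes(seq: str) -> bytes:
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


message = b"The quick brown fox jumps over the lazy dog. " * 20
compressed = deflate(message)
sequence = bytes_to_dna(compressed)

print("bases needed without compression:", len(message) * 4)
print("bases needed with compression:   ", len(sequence))

# Round trip: decode the bases, then decompress.
assert inflate(dna_to_bytes(sequence)) == message
```

The saving depends entirely on how repetitive the input is; the sample text here is deliberately redundant, so it compresses far better than typical data would.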

What about security?

A new term has been coined amid all of this: biocryptography. Scientists working on these projects praise them for being “unhackable” and boast of their immunity to electrical failures and data theft. They use encoding mechanisms with built-in checks to ensure that data is not corrupted by mutations in the bacterial cells.

Encryption and Decryption

As it turns out, you need to be an expert in biology to encrypt or decrypt biological media. To get a better understanding of these processes, please refer to the Chinese University’s iGEM site.


The space/storage tradeoff is immense. With the new technique being practiced by the team at Harvard, roughly 100 bytes of data can be stored in a single bacterium. This may not seem like much, but when you think about the size of a single bacterium, the math works out to roughly 450 hard drives of 2,000 gigabytes each, about 900 terabytes, per gram of bacteria. The bacteria are self-replicating, allowing for automatic, seemingly limitless backups.
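The arithmetic behind that figure is easy to check. The sketch below takes the article's two numbers at face value (450 drives of 2,000 GB per gram, and 100 bytes per bacterium) and assumes decimal gigabytes. It then works out how many cells a gram would have to contain for both figures to hold at once; that cell count is derived here, not a measured value.

```python
# Figures quoted in the article, taken at face value.
DRIVES_PER_GRAM = 450
DRIVE_SIZE_GB = 2_000
BYTES_PER_CELL = 100

# ASSUMPTION: decimal units (1 GB = 10**9 bytes, 1 TB = 10**12 bytes).
GB = 10**9
TB = 10**12

bytes_per_gram = DRIVES_PER_GRAM * DRIVE_SIZE_GB * GB
cells_needed = bytes_per_gram // BYTES_PER_CELL

print(f"{bytes_per_gram / TB:.0f} TB per gram")        # 900 TB per gram
print(f"{cells_needed:.1e} cells at 100 bytes each")   # 9.0e+12 cells
```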

This all sounds promising…

Up to this point it all sounds good, maybe too good. Let’s consider some flaws with storing our data on living media.

  1. Large chunks of data cannot be stored within a single piece of DNA.
    • Fragmentation has been the approach to solving this issue
      1. Fragments are composed of three sectors: header, message, and checksum
  2. It’s… living bacteria. Won’t I get sick?
    • The bacteria being used are an altered form of E. coli that cannot exist outside of its synthetic medium
  3. Will I need a degree in biology to study computer science?
    • Yes! A major obstacle is that this is complicated stuff
    • Data retrieval currently requires an expert and a laboratory
  4. How do we ensure the bacteria eat up the message?
    • For one reason or another, not every bacterium in the colony will eat up all of the message
    • So far this is not an issue, because rapid genotyping can be done on a few thousand bacteria to deduce the entirety of the message with certainty
  5. What if the bacteria die?
    • Bacteria live in colonies or clusters, giving you literally millions of backups
    • Data retrieval will still be possible even on dead bacteria
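The header/message/checksum layout in item 1, and the idea in item 4 of piecing the message together from many partial copies, can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the header holds a fragment index and a fragment count, the checksum is CRC-32 from the standard `zlib` module, and the function names are hypothetical. The real fragment format may differ.

```python
import struct
import zlib

# ASSUMPTION: header = (fragment index, total fragments), checksum = CRC-32.
HEADER = struct.Struct(">HH")
CHECKSUM = struct.Struct(">I")


def make_fragments(message: bytes, size: int = 8) -> list:
    """Split a message into fragments of header + message chunk + checksum."""
    chunks = [message[i:i + size] for i in range(0, len(message), size)] or [b""]
    fragments = []
    for index, chunk in enumerate(chunks):
        body = HEADER.pack(index, len(chunks)) + chunk
        fragments.append(body + CHECKSUM.pack(zlib.crc32(body)))
    return fragments


def parse_fragment(fragment: bytes):
    """Return (index, total, chunk), or None if the checksum does not match."""
    if len(fragment) < HEADER.size + CHECKSUM.size:
        return None
    body, tail = fragment[:-CHECKSUM.size], fragment[-CHECKSUM.size:]
    if zlib.crc32(body) != CHECKSUM.unpack(tail)[0]:
        return None  # treated as a mutated or misread fragment
    index, total = HEADER.unpack(body[:HEADER.size])
    return index, total, body[HEADER.size:]


def reassemble(reads):
    """Combine reads from many cells; return None while fragments are missing."""
    seen = {}
    total = None
    for read in reads:
        parsed = parse_fragment(read)
        if parsed is None:
            continue
        index, total, chunk = parsed
        seen[index] = chunk
    if total is None or len(seen) < total:
        return None
    return b"".join(seen[i] for i in range(total))


message = b"data stored in a bacterial colony"
fragments = make_fragments(message)

# Each simulated cell took up only some fragments; one read carries a mutation.
cell_a = fragments[0::2]
cell_b = fragments[1::2]
mutated = bytes([fragments[0][0] ^ 0xFF]) + fragments[0][1:]

print(reassemble(cell_a))                       # incomplete: None
print(reassemble([mutated] + cell_a + cell_b))  # full message recovered
```

No single simulated cell holds the whole message, and the corrupted read is discarded by its checksum, yet pooling the reads still recovers everything. That is the same reason sampling a few thousand bacteria is enough in practice.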

Final thoughts

As of now, biological media is not a feasible option. Too much expertise is required simply to store a few bytes of data, let alone encrypt or decrypt it. As far as security goes, if this product were ever widely available, it could only mean that the retrieval process is widely understood, which I believe would render the data just as vulnerable as any other data is today. The amount of time it takes to store and retrieve data, due to the specialty required, is another reason this idea is not practical. The cost of storing data this way would also be incredible compared to that of current data storage. This is directly related, again, to the expertise required to sequence DNA.

