Growing Data: DNA Data Storage
By Mike Cobb, Director of Engineering
Advancements in data storage technology have not ended with flash. Have you heard of storing data in DNA?
Growing Amount of Data
As computers, smartphones and tablets become more and more widespread, the amount of data being produced is growing at a staggering, exponential rate. It seems that every two years there’s a new report saying 90% of the world’s data was created in the last two years. Today, 3.7 billion people use the internet, and together we create 2.5 million terabytes of data every single day.
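As a quick sanity check on that figure: 2.5 million terabytes a day is 2.5 quintillion bytes daily, or roughly 912.5 exabytes over a year. The minimal Python sketch below assumes decimal (SI) units and a flat 365-day year with no growth, which understates the real yearly total if the daily rate keeps climbing.

```python
# Sanity check on the daily data figure quoted above.
# ASSUMPTIONS: decimal (SI) units and a flat 365-day year with no growth.
TB = 10**12  # bytes in a terabyte
EB = 10**18  # bytes in an exabyte

daily_tb = 2_500_000             # 2.5 million terabytes per day (figure from the text)
daily_bytes = daily_tb * TB      # the same amount expressed in bytes
yearly_tb = daily_tb * 365       # one year at a constant daily rate
yearly_eb = yearly_tb * TB / EB  # that year expressed in exabytes

print(f"Per day:  {daily_bytes:,} bytes")               # 2,500,000,000,000,000,000 bytes
print(f"Per year: {yearly_tb:,} TB, about {yearly_eb} EB")  # 912,500,000 TB, about 912.5 EB
```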
It goes without saying that as more data is created each year, more storage is needed to hold it. Cloud storage companies, corporations and government entities are perpetually expanding their existing servers, adding new ones and otherwise increasing their storage capacity. So far, storage device manufacturers have kept up with demand.
But with the exponentially increasing need for storage devices and data storage space, will production continue to outrun demand? According to scientists at data storage company Catalog, it will not.
Hyunjun Park, CEO and co-founder of Catalog, told Digital Trends that he expects the amount of data produced to far outstrip worldwide storage capacity as early as 2025: if we continue with current storage technology, we will be able to store less than 15% of the world’s data by then. Catalog’s answer? DNA.
What Is DNA Storage and How Can It Be Used?
However much this concept may sound like science fiction, nobody is taking blood from a person or an animal and storing data in their DNA. Instead, synthetic DNA will be manufactured specifically for data storage.
Several companies already produce synthetic DNA that never belonged to any living thing. Its applications are far-ranging, from developing antiviral medications to making fabric for clothing, and data storage is just one more practical use.
If you’re thinking that in a few years you’ll be buying DNA pools instead of external hard drives to back up your home computer, you may have the wrong idea. Writing data to DNA is a lengthy and expensive process, and reading it back is even more so. This form of data storage will likely be limited to huge archival data centers, where it can save space and keep data in cold storage for years at a time without being accessed.
Benefits of DNA Data Storage
So why DNA? Three reasons:
- Data density
- Longevity
- Timeless technology
Thanks to electronic components such as monolithic flash, storage devices already hold more and more data in less and less space. However, the amount of data being stored is outpacing even this miniaturization, and data centers still need hundreds of thousands of square feet (millions, in some cases) to house it all.
Add the prediction above, that the data produced will outpace the storage devices manufactured, and the problem gets even bigger: we will eventually run out of both data storage devices and the physical space to house them.
Catalog claims that it can store up to a terabyte of data on a single DNA pellet weighing about a gram. With this technology, the data held in a million-square-foot data center today might fit in a single household refrigerator. Space issue solved… for now.
With humans producing and storing data at an ever-increasing rate, how long until we’re storing refrigerators full of synthetic DNA in space? That’s a question for another time.
Currently, the longest-lasting data storage technology in regular use is tape, which can last up to 30 years in ideal conditions. Optical storage has seen some incredible advancements in the last few years, with some discs theoretically able to last up to 10,000 years. However, optical media is not widely used in large data centers because it takes up far more space than other technologies to store the same amount of data.
DNA, on the other hand, has been shown to survive for thousands of years and, as mentioned above, is extremely compact. In fact, some estimates suggest that under optimal conditions DNA could last as long as 6.8 million years.
When was the last time you used a 3.5” floppy? When I brought up floppy disks to a 23-year-old recently, he had never heard of any such thing. Showing him pictures did not ring any bells either.
Although a few storage technologies, such as tape and HDD, have stood the test of time, most have come and gone. For archival storage, this creates a constant fear: we might store irreplaceable data only to find one day that we cannot retrieve it because the technology used to preserve it has become obsolete.
With DNA, this should theoretically never be an issue. Even as data storage technologies advance further, there will always be a need to read DNA for other purposes, such as genetic testing or criminal investigations, so there will always be a way to read it for data storage purposes as well.
History of DNA Data Storage
The first snippet of data, a 23-character text-based message, was stored on synthetic DNA in 1999. Since then, several universities and technology companies have been working together to improve the technology. As recently as 2016, Microsoft Research and the University of Washington (UW) released a study showing significant advancements: storage of 200 MB of data on DNA in a space the size of a pencil tip.
So why didn’t it immediately go mainstream? Two reasons: cost and length of process.
The process of DNA data storage and retrieval, as used by Microsoft and UW, required three separate components: a DNA synthesizer to translate digital data into DNA’s building blocks, storage for the DNA, and a DNA sequencer to translate those building blocks back into readable data. The entire process takes an enormous amount of time and costs thousands of dollars per MB of data.
Boston-based Catalog is keeping most details of its improvements to DNA data storage technology secret. However, the company claims to have dramatically reduced both the time and the cost of the process by taking a different approach to translating data into DNA. It intends to begin providing its technology to the entertainment industry and other big-data industries by next year (2019).
A typical 2 TB hard drive currently weighs a few hundred grams. Catalog claims to be able to store up to 250 petabytes on a pellet of DNA weighing only one gram! One petabyte, by the way, is equal to 1,000 terabytes, so a single tiny pellet may one day replace more than 100,000 2 TB hard drives (125,000, to be exact).
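To make the “digital data into DNA and back” step more concrete, here is a minimal Python sketch of the translation idea only. It assumes a naive mapping of two bits per base (00 = A, 01 = C, 10 = G, 11 = T), chosen purely for illustration: it is not the scheme used by Microsoft and UW or by Catalog, and the `encode`/`decode` names are hypothetical. Real systems typically add addressing, error correction and constraints such as avoiding long runs of the same base, and of course the physical synthesis and sequencing happen in the lab, not in software.

```python
# Illustrative round trip: bytes -> nucleotide letters -> bytes.
# ASSUMPTION: a naive, hypothetical 2-bits-per-base mapping for illustration only.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a sequence of bases (what gets sent to the synthesizer)."""
    bits = "".join(f"{byte:08b}" for byte in data)  # 8 bits per byte
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn a sequenced strand of bases back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```

Under this mapping every byte becomes exactly four bases, which is the theoretical maximum of two bits per base; practical encodings usually give up some of that density in exchange for reliability.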
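The arithmetic behind the pellet-versus-hard-drive comparison can be checked in a few lines. This sketch simply reuses the figures quoted above (250 PB per one-gram pellet, 2 TB per drive) and assumes decimal units, where 1 PB = 1,000 TB.

```python
# Pellet-versus-hard-drive comparison using the figures quoted in the text.
# ASSUMPTION: decimal units, 1 PB = 1,000 TB.
TB_PER_PB = 1_000

pellet_pb = 250  # Catalog's claimed capacity for a one-gram DNA pellet
drive_tb = 2     # capacity of the hard drive used for comparison

pellet_tb = pellet_pb * TB_PER_PB      # pellet capacity in terabytes
drives_replaced = pellet_tb // drive_tb

print(f"{pellet_tb:,} TB per pellet = {drives_replaced:,} drives of {drive_tb} TB")
# 250,000 TB per pellet = 125,000 drives of 2 TB
```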
Data Loss in DNA Data Storage
A handful of data loss scenarios have already been identified for DNA data storage. Data loss can be caused by logical corruption, decay, oxidation, high temperature, truncation, errors in the conversion process and other factors.
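Some of these failures, such as errors in the conversion process, show up as wrong bases in individual reads of a strand. DNA storage usually involves many physical copies of each strand, so one common mitigation is the same idea behind keeping multiple backups: redundancy. The minimal Python sketch below takes a majority vote at each position across several reads of one strand. It assumes, as a simplification, that every read has the same length and contains only substitution errors; real pipelines must also cope with insertions, deletions and truncated strands, which this does not handle. The `consensus` function name is hypothetical.

```python
from collections import Counter

# ASSUMPTION: equal-length reads with substitution errors only
# (no insertions, deletions or truncation).


def consensus(reads: list[str]) -> str:
    """Rebuild a strand by taking the most common base at each position."""
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("all reads must have the same length in this sketch")
    return "".join(
        Counter(read[i] for read in reads).most_common(1)[0][0]
        for i in range(length)
    )


reads = [
    "CAGACGGC",  # error-free copy
    "CAGTCGGC",  # substitution at position 3
    "CAGACGAC",  # substitution at position 6
]
print(consensus(reads))  # CAGACGGC
```

With only three copies, two reads that happen to share the same error at the same position would outvote the correct base, which is one reason more copies (and proper error-correcting codes) are used in practice.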
Data may one day be recoverable in some of these situations. At DriveSavers, we always enjoy a challenge and eagerly await the opportunity to explore data recovery techniques specific to this new technology.
Whatever new data storage innovation arises, it is always important to back up, back up, back up! Even with this amazing advancement, anyone who archives data in only one place is vulnerable to losing it. This is true of HDD, SSD, DNA and any other technology that may be developed. If you or your company one day choose to take advantage of DNA data storage, we recommend multiple backups, or else DriveSavers will have to retool for recovering data from DNA strands!