Encoding Shakespeare into DNA

It’s time to look at the language of life itself: DNA. As you might remember from 7th-grade science, DNA stands for deoxyribonucleic acid, the molecular structure that stores the genetic code for all life forms.

Scientists continue to wonder if this living blueprint is all that DNA can hold. Researcher Nick Goldman of the European Bioinformatics Institute (EBI) has recently stored all of Shakespeare’s 154 sonnets in DNA, and his synthetic double helix didn’t miss a line.

Because the alphabet of DNA is only four letters long (with A, T, G, C representing the nucleic bases adenine, thymine, guanine, and cytosine), English was not the best fit for the translation. Instead, researchers focused on binary code (the two-character mathematical system of 1s and 0s used by computers). With this technique Goldman’s team encoded text as well as audio files and images in the DNA macromolecule, including a 26-second audio clip of Dr. Martin Luther King’s “I have a dream” speech, and a photo of the EBI facility. (Get the full story here.)

Molecular geneticist George M. Church first attempted a translation of binary code into DNA at Harvard Medical School by encoding an HTML draft of his recent book Regenesis: How Synthetic Biology Will Reinvent Nature and Ourselves. But Church’s cipher was too simple (bases G and T represented 1s while A and C represented 0s), and it resulted in glitches when translating the DNA back into binary code.
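
To make the idea concrete, here is a minimal Python sketch of a one-bit-per-base cipher of the kind described above, where A or C stands for 0 and G or T stands for 1. The function names and the rule for choosing between the two candidate letters (never repeat the previous letter) are assumptions made for illustration, not details of Church's actual pipeline.

```python
ZERO_BASES = "AC"  # either letter stands for a 0 bit
ONE_BASES = "GT"   # either letter stands for a 1 bit


def text_to_bits(text: str) -> str:
    """Turn text into a string of 0s and 1s, eight bits per UTF-8 byte."""
    return "".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def bits_to_text(bits: str) -> str:
    """Inverse of text_to_bits."""
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


def encode_bits(bits: str) -> str:
    """Map each bit to a base, avoiding two identical letters in a row.

    The no-repeat rule is an illustrative assumption.
    """
    dna = []
    for bit in bits:
        options = ONE_BASES if bit == "1" else ZERO_BASES
        prev = dna[-1] if dna else ""
        dna.append(options[0] if options[0] != prev else options[1])
    return "".join(dna)


def decode_bases(dna: str) -> str:
    """Map each base back to the bit it stands for."""
    return "".join("1" if base in ONE_BASES else "0" for base in dna)


strand = encode_bits(text_to_bits("Hi"))
recovered = bits_to_text(decode_bases(strand))
```

Each letter carries exactly one bit in this sketch, so a single misread base flips a bit and there is nothing else to check it against.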

Learning from Church’s mistake, Goldman developed a more complex code adapted to DNA’s natural tendency toward genetic variation. Goldman’s code makes every byte (an eight-bit binary unit) correspond to a five-letter word made of As, Cs, Gs, and Ts. These words combine to form strings of 117 letters. The DNA “sentences” overlap so that decoders can check against other strings if inconsistencies arise. Thus far the method has resulted in 100% accuracy.
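
The overlap check can be illustrated with a short Python sketch. It assumes fragments of 100 letters that start every 25 letters, so most positions are covered by several fragments, and it tracks each fragment's starting position as a plain number rather than encoding it in the DNA itself. Those numbers and the function names are illustrative assumptions, not a description of Goldman's actual format.

```python
from collections import Counter


def fragment(seq: str, length: int = 100, step: int = 25):
    """Cut seq into overlapping windows, each tagged with its start offset."""
    last_start = max(len(seq) - length, 0)
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)  # make sure the tail is covered
    return [(s, seq[s:s + length]) for s in starts]


def reassemble(fragments):
    """Rebuild the sequence by majority vote at every position.

    Assumes the fragments together cover every position at least once.
    """
    votes = {}
    for start, piece in fragments:
        for i, base in enumerate(piece):
            votes.setdefault(start + i, Counter())[base] += 1
    return "".join(votes[pos].most_common(1)[0][0] for pos in sorted(votes))


original = "ACGT" * 50          # a 200-letter stand-in for encoded data
pieces = fragment(original)

# Simulate one misread letter in the middle fragment.
start, piece = pieces[2]
wrong = "A" if piece[30] != "A" else "C"
pieces[2] = (start, piece[:30] + wrong + piece[31:])

repaired = reassemble(pieces)
```

With these settings a position in the middle of the sequence is covered by four fragments, so a single misread letter is outvoted three to one. Positions near the two ends are covered by fewer fragments and get less protection in this sketch.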

So why bother with all this? What’s the advantage of converting data into DNA when it already exists in books and microchips? The answer is clear when you think about the massive amount of data our society is quickly amassing. If 46 microscopic chromosomes can carry all the information necessary to make a human, think of how many libraries could fit in that space. Goldman’s sequencing of Shakespeare’s Sonnets was microscopically minuscule, and his team believes that they could store all the data from the CERN Particle Physics Laboratory, nearly 90 petabytes (a petabyte is one quadrillion bytes), in just 41 grams of DNA.
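
For a sense of scale, the figures above work out to a little over two petabytes per gram. Here is a quick Python check, assuming decimal petabytes (10^15 bytes); the variable names are just for illustration.

```python
PETABYTE = 10**15            # bytes, decimal definition (an assumption)

cern_bytes = 90 * PETABYTE   # "nearly 90 petabytes"
dna_grams = 41               # "just 41 grams of DNA"

bytes_per_gram = cern_bytes / dna_grams
petabytes_per_gram = bytes_per_gram / PETABYTE  # about 2.2
```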

But the best case for DNA data storage is just how long it lasts. “The experiment was done 60,000 years ago,” Goldman told Nature, “when a mammoth died and lay there in the ice.” With this in mind, the extreme longevity of DNA has spectacular implications for the “apocalypse-proof” preservation of data, in that our culture and literature could be not only archived but fossilized. And though sequencing and decoding methods will undoubtedly change, the code in which all life on earth is written probably won’t go out of style in the way cassette tapes gave way to CDs, which in turn gave way to MP3s.

Most importantly, thanks to DNA data storage, Shakespeare may now actually be able to keep the promise he made to his mysterious mistress centuries ago in Sonnet 60:


And yet to times in hope, my verse shall stand
Praising thy worth, despite his cruel hand.


  1. RedLeafRenegade -  October 20, 2015 - 6:20 am

    Doing this, could they make it so that you could play video games in your head by “downloading” the DNA it is encoded on into yourself? :) :o :|

  2. Ear4 -  March 28, 2013 - 12:08 am

    DNA…magic things..haha..

  3. gowshiga -  March 6, 2013 - 5:52 am

    great article …enchanting facts about DNA and its implications..thank you guys

  4. Nishtha -  February 27, 2013 - 1:21 am


  5. Chime Gochan -  February 25, 2013 - 10:25 pm

    Huh???? So, scientists are making libraries out of a human body????

  6. Noel Robles -  February 25, 2013 - 3:28 pm

    Hooray. Now what? I have spent too many minutes trying to comprehend what has been said here.. And all I get is that If mankind was able to make synthetic DNA we could store massive amounts of data in this microscopic apparatus. HOWEVER then what will we do? Analyze the double helix and get our information out of an organic construct? What about the synthetic factor like was said? Computerized DNA? Would that be like “qua-nary” code? Sounds fun but impractical. Maybe when we can construct microscopic technology that dwarfs modern day tech! I can’t wait! The sad part is that I will probably be aged and senile by the time it arrives. By the way, the first critical comment on this article kind of…Well…ran about wildly.. Just correct the erroneous number. YES WE KNOW they erred. We as developed humans have 23 homologous pairs of chromosomes. NOT 2. But no need to beat a dead horse!

  7. Talmid -  February 25, 2013 - 9:25 am

    @Brucker: Yep, that’s right on about the reading and writing process. This isn’t the kind of thing we’ll be seeing in our homes any time soon. Also, the technique is not exactly “apocalypse proof” because one needs the technology to read the DNA and interpret the binary encoding—not to mention, someone has to be around to read it! Nevertheless, it is clearly a superior data storage method and Goldman’s innovations are impressive.

    As for Christian Bök, I’d be surprised if Deinococcus radiodurans would maintain the integrity of his encoded poetry over successive generations.

  8. amptramp -  February 25, 2013 - 8:09 am

    It would be interesting to see what encoded DNA could do when transplanted into undifferentiated cells. For example, you could encode the works of Ernest Hemingway in DNA, add it to undifferentiated cells, allow the cells to grow and differentiate and you would get his works transformed into a living @sshole.

  9. Judy -  February 25, 2013 - 4:32 am

    Here’s a cool prank. Insert the DNA into a monkey, give it a typewriter, and watch all the probability theorists run for their calculators.

  10. Brucker -  February 24, 2013 - 4:51 pm

    To really make this hit home, there needs to be a basis for comparison. By itself, “nearly 90 petabytes (one quadrillion bytes), in just 41 grams of DNA” is nearly meaningless.

    Hopefully I’ve got this correct: Probably the most compact memory currently available on the market is a 64GB microSD, which weighs half a gram. If DNA can store nearly 90PB in 41 grams, that is about 1PB per half gram of DNA, or roughly 16,000 times the storage capacity of the microSD.

    Of course what is also not discussed is the nature of the coding/decoding process, which I assume is not as simple as plugging a strand of DNA into a USB port.

  11. Veronica -  February 23, 2013 - 10:31 pm

    I agree with FSB, stating that only 2 chromosomes carry all the genetic information necessary to make a human is a very false statement. It is indeed 46 chromosomes that carry all the instructions needed to create a human, which is pretty fundamental biology. Please edit that simply for the sake of not misinforming readers. Otherwise, GREAT article. I am in awe.

    Just please *twitches* fix the number of chromosomes. Please.

  12. alex -  February 23, 2013 - 3:24 pm

    Wow just wow

  13. The Warped Vinyl Junkie -  February 23, 2013 - 12:54 pm

    Thanks to FSB, above, for pointing out the inaccuracy in the article, which inaccuracy seemed a bit odd when I read it, even if I couldn’t put my finger on it immediately.

    And in response to “Anoymous” [sic], I don’t think the idea is for the newly created DNA to be incorporated into a human, but rather for the information simply to be stored at a truly molecular level. Note in this regard the following reference from the above-linked article at nature.com.

    “For example, CERN, the European particle-physics lab near Geneva, currently stores around 90 petabytes of data on some 100 tape drives. Goldman’s method could fit all of those data into 41 grams of DNA. This information should last for millennia under cold, dry and dark conditions….”

  14. KitKat -  February 23, 2013 - 4:47 am

    I am impressed by how science and technology have developed… I mean, look at us a hundred years ago… wow, just wow.

  15. Ghu -  February 23, 2013 - 4:03 am


  16. DJW -  February 23, 2013 - 3:55 am

    FSB on Feb. 22, 2013 comment would be wise to consider – more logical, sensible and accurately described. Thank you FSB !!

  17. Jordan -  February 22, 2013 - 11:47 pm

    I’m gonna store 134234GB on 1 DNA….. in the future or now.

  18. charles -  February 22, 2013 - 7:40 pm

    ok how about a sonnet in Greek? why a Sonnet (it is Italian isn’t it?)

  19. The Derp -  February 22, 2013 - 7:04 pm


  21. Anoymous -  February 22, 2013 - 12:54 pm

    That is amazing! Only question now is: how do you get the augmented DNA to not be destroyed in a person?

  22. Bubba -  February 22, 2013 - 9:55 am

    Set the controls for the galactic core Commander Spock, my library books are overdue!

  23. Ole TBoy -  February 22, 2013 - 8:59 am

    If or when the apocalypse comes it better leave behind a few humans capable of decoding info from DNA or all that data still will be lost to time. Unless brainy aliens out of Sci-Fi drop by our 3rd rock with their own set of advanced skills.

  24. Neil -  February 22, 2013 - 8:43 am

    I think the editor has got smarter. I’m certainly enjoying the Hot Word a lot more than I used to! Language is so much more than people’s favourite word. Excellent article, extremely thought provoking. Thanks guys!

  25. FSB -  February 22, 2013 - 8:19 am

    There is a glaring and unacceptable mistake in this article regarding the number of chromosomes in each human cell. The sixth paragraph states that “…If two microscopic chromosomes can carry all the information necessary to make a human…”. In fact, each somatic human cell (all cells in a human body except reproductive cells, that is eggs and sperm, which are called “gametic”) has 23 pairs of chromosomes of different sizes. One member of each pair comes from the father; the other comes from the mother. Therefore, each human somatic cell has 46 (forty-six) chromosomes.

    This means that the statement should read “…If FORTY-SIX microscopic chromosomes IN EACH SOMATIC HUMAN CELL can carry all OF the information necessary to make a human…”. Please fix this error, which is unworthy of your website.

  26. ed -  February 22, 2013 - 7:57 am

    Great article. Given the example of tapes to CDs to MP3s, I wonder how soon the step beyond DNA will occur. And of course, the implications of that.

