June 09, 2020

From cracking the Enigma to decoding living systems: Turing's gifts

Codebreaking, computing, technological development, mathematics, interdisciplinarity. All of these make Earlham Institute a powerful catalyst for life science research, and would perhaps not have been so swiftly brought to bear if it weren’t for Alan Turing, whose birthday, life and achievements we celebrate today.

A decade prior to the discovery of DNA’s double helix structure - a breakthrough that revealed the code of life on earth - scientists, linguists, and many others besides were hard at work cracking the military codes that would help turn the tide of the Second World War.

At Bletchley Park, Alan Turing and colleagues modified a Polish machine that could decrypt the famous Enigma code, and the intercepted messages they worked on were a matter of life and death. Enemy plans were divulged, torpedo attacks thwarted, and vital supplies continued to reach Britain.

75 years on, those code-breaking skills are being applied to life’s myriad forms to answer global challenges of no less importance, from health to biodiversity and food security. The code of life - what’s written in our DNA - holds the answers to how we can improve livelihoods worldwide.

From Bletchley Park to Earlham Institute, the pioneering work of Turing and others like him has contributed massively to the modern research amenities we employ today.

Bletchley Park in Buckinghamshire, the home of the WW2 Codebreakers.

Codebreaking

Enigma should have been unbreakable, given the sheer number of permutations enabled by regularly changing the settings.

But the linguists who cracked Enigma were able to spot certain characteristic phrases, or names, used at the start of a transmission. One was ‘Rosa’, the girlfriend of an Enigma operator. Other common phrases were ‘nothing to report’ or ‘to the group’. From those small hints, and the knowledge that the Enigma machine would never encode a letter as itself, it was possible to work out the rest of a message.
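To make the "never a letter as itself" rule concrete, here is a minimal sketch in Python. It is an illustration only, not a reconstruction of the Bletchley method: the function name and the sample ciphertext are made up for this example. Given a guessed phrase, it lists the positions in an intercepted message where that phrase could sit, discarding any position where a guessed letter would line up with the same letter in the ciphertext.

```python
def possible_crib_positions(ciphertext: str, crib: str) -> list[int]:
    """Return offsets where the guessed phrase (crib) could align.

    An offset is ruled out if any crib letter lines up with the same
    letter in the ciphertext, because Enigma never encoded a letter
    as itself. (Hypothetical helper, written for illustration.)
    """
    ciphertext = ciphertext.upper()
    crib = crib.upper()
    positions = []
    for offset in range(len(ciphertext) - len(crib) + 1):
        window = ciphertext[offset:offset + len(crib)]
        # Keep this offset only if no letter matches its counterpart.
        if all(c != p for c, p in zip(window, crib)):
            positions.append(offset)
    return positions


# Made-up ciphertext: offsets 0 and 5 are excluded because a letter
# of "ROSA" would coincide with itself there.
print(possible_crib_positions("RXSAQROSA", "ROSA"))  # [1, 2, 3, 4]
```

Each excluded position is one fewer possibility to test, which is how a small hint could shrink an enormous search.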

ATGC - the four bases that are the letters of DNA. Only four letters, but they combine to produce a language that can describe much of what we see in the natural world. We can rightly analogise that those letters spell words and that those words combine to produce sentences, which we now know as genes.

In DNA, we find the same complexity and sheer scale of numbers as in Enigma. We also see the same patterns of regularity. In DNA, our ‘Rosa’ is made of three bases with the specific order ATG. Each set of three bases is known as a codon, which codes for a specific amino acid. ATG tells us that we want the amino acid methionine, and this is the first word in almost every sentence of the DNA code. An uncanny parallel.
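The idea of reading DNA three letters at a time can be sketched in a few lines of Python. This is a toy example under stated assumptions: the table below holds only a handful of entries from the standard genetic code, the function name is hypothetical, and real gene finding is far more involved than looking for the first ATG.

```python
# A small subset of the standard genetic code (DNA codons), enough
# for the example. Codons not listed here are reported as "?".
CODON_TABLE_SUBSET = {
    "ATG": "Met",   # methionine, the usual start signal
    "GCT": "Ala",
    "AAA": "Lys",
    "TGG": "Trp",
    "TAA": "Stop",
    "TAG": "Stop",
    "TGA": "Stop",
}


def translate_from_first_atg(dna: str) -> list[str]:
    """Read codons from the first ATG until a stop codon or the end.

    Hypothetical helper for illustration; returns amino acid names.
    """
    dna = dna.upper()
    start = dna.find("ATG")
    if start == -1:
        return []  # no start codon found
    amino_acids = []
    # Step through the sequence three bases at a time.
    for i in range(start, len(dna) - 2, 3):
        residue = CODON_TABLE_SUBSET.get(dna[i:i + 3], "?")
        if residue == "Stop":
            break
        amino_acids.append(residue)
    return amino_acids


print(translate_from_first_atg("CCATGGCTAAATGGTAACC"))
# ['Met', 'Ala', 'Lys', 'Trp']
```

Just as ‘Rosa’ gave the codebreakers a place to start, ATG gives the reader of a genome a place to begin reading.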

We use this code to understand life’s very essence, from its beginnings long ago in the primordial broth of a youthful Earth, tracing mutations, evolution and speciation throughout four billion or so years to give us the complex web of living wonders that we thrive amongst today.

We can compare the code of humans with that of algae and find similarities shared with lifeforms that look strange and yet are, in many ways, intimately related to us. We can find differences between more closely-related species that offer tantalising hints to how we can treat diseases. We can even begin to search the DNA of all of life to unlock secrets to making better medicines, or healthier and more sustainable food.

None of this, however, would be possible at scale without another gift of Turing and Bletchley Park.

 

The birth of modern computing.

Talented as the Bletchley Park codebreakers were - narrowing down the potential options and garnering breakthroughs through shrewd problem solving and leaps of imagination - speed was of the essence, and decoding the many thousands of messages intercepted every day was impossible for humans to manage alone.

Turing was responsible for building a machine at Bletchley - the Bombe - which was based on an earlier Polish Enigma-cracking strategy and significantly increased the capacity to decode German messages. Later, the world’s first programmable, digital computer - the Colossus - would join the ranks to help decrypt the Lorenz cipher, a separate and more complex system used for high-level German communications. After the war, while working in Manchester - where you can find a statue in his honour standing proudly in Sackville Gardens - Turing would contribute to the development of the Manchester Computers, the world’s first stored program computers.

Today, EI is home to one of the most powerful supercomputers of its type in Europe that can process many billions of letters of DNA code in a matter of hours, or days depending on how complex the task is. From those codebreaking beginnings, we’ve come an incredibly long way but the usage remains strikingly familiar.

Cracking the code of life is possible without a computer, but we’d need an impractical number of people working on it. People are fantastic - and far better than computers - at spotting patterns. Manual curation by scientists is still absolutely required to fine tune what we discover through computer algorithms, much like the linguistic insights required to crack the Enigma.

Yet, as with the code-breaking at Bletchley, there’s nothing like a computer to help speed the process along. It takes long enough to read through The Lord of the Rings

Mathematics

Alan Turing was primarily a mathematician and it was his statistical insights based on probability theory that helped really ramp up the Bletchley code-breaking operation.

Without statistics, we’d also be hard pressed to deliver the significant results of modern bioscience. One of the tenets of science these days is the importance of showing your results aren’t due to chance, or placebo. Statistics are also crucial for fine tuning and improving how we assemble genomes, as scientists in EI’s Clavijo Group are intimately aware.

Take the K-mer Analysis Tool, or KAT for short, a quality control tool which helps scientists understand the flaws in their DNA sequence data. Through applying mathematics, it’s possible to discern the quality of the information we glean from our DNA sequencing machines, and therefore help the computer algorithms down the pipeline to assemble something as close as possible to the likely truth. Good data in, good data out.
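For readers curious about what a k-mer is, here is a minimal sketch in Python. It does not reproduce KAT's implementation or interface; the function names are made up for this illustration. A k-mer is simply every run of k consecutive letters in a sequence, and counting how often each one appears gives a "spectrum" that hints at data quality. As a rough rule of thumb, k-mers seen only once across many reads are often sequencing errors, while genuine sequence tends to be seen repeatedly.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every overlapping run of k letters in one sequence."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def kmer_spectrum(reads: list[str], k: int) -> Counter:
    """Map 'times seen' -> 'number of distinct k-mers seen that often'.

    Hypothetical helper for illustration, pooling counts over all reads.
    """
    totals = Counter()
    for read in reads:
        totals.update(count_kmers(read, k))
    return Counter(totals.values())


# Toy reads: two agree with each other, one carries a single-letter change.
reads = ["ATGGCTAAA", "ATGGCTAAA", "ATGGCAAAA"]
spectrum = kmer_spectrum(reads, k=4)
for times_seen in sorted(spectrum):
    print(times_seen, "x :", spectrum[times_seen], "distinct k-mers")
```

Plotting such a spectrum for real sequencing data is one way to see, before any assembly is attempted, how much of the data looks like signal and how much looks like noise.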

One particularly useful feature of KAT offers another parallel with the use of statistics in decoding messages.

For the wheat genome - an enormously complex stretch of DNA totalling around 17 billion letters - KAT made it possible to narrow down how to assemble the genome in one run on the computer, rather than twenty or more, which saved a lot of time and money (and provided the most complete and accurate version at the time).

As with KAT and genomes, Turing’s Banburismus technique - now known as sequential analysis - did a similar job, ruling out certain parameters of the Enigma codes and freeing up lots of time for the computers to crunch the more likely combinations. Statistics helped speed up the Bletchley decrypting operation much like they speed up the Earlham Institute DNA decoding operation.
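The general idea of sequential analysis can be sketched in Python: gather evidence one observation at a time, keep a running score, and stop as soon as the score is convincing either way. This is a generic illustration, not a reconstruction of Banburismus. The function names, the probabilities and the thresholds are all assumptions chosen for the example. The score is measured in decibans, ten times the base-10 logarithm of how much more likely an observation is if a guess is right than if it is wrong.

```python
import math


def decibans(p_if_right: float, p_if_wrong: float) -> float:
    """Weight of evidence for one observation, in decibans."""
    return 10 * math.log10(p_if_right / p_if_wrong)


def sequential_test(observations, p_match_right, p_match_wrong,
                    accept_at, reject_at):
    """Accumulate evidence until a threshold is crossed.

    observations: iterable of booleans (True = a 'match' was seen).
    Returns (verdict, observations_used, final_score).
    Hypothetical helper; all parameters are illustrative assumptions.
    """
    score = 0.0
    used = 0
    for matched in observations:
        used += 1
        if matched:
            score += decibans(p_match_right, p_match_wrong)
        else:
            score += decibans(1 - p_match_right, 1 - p_match_wrong)
        if score >= accept_at:
            return "accept", used, score
        if score <= reject_at:
            return "reject", used, score
    return "undecided", used, score


# Illustrative numbers only: matches are twice as likely if the guess is right.
verdict, used, score = sequential_test(
    [True, False, True, True, True, True],
    p_match_right=0.5, p_match_wrong=0.25,
    accept_at=10, reject_at=-10,
)
print(verdict, used, round(score, 2))
```

The saving comes from stopping early: unpromising guesses are dropped after a few observations, leaving machine time for the candidates that remain.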

 


Equality

Alan Turing was a great man whose contributions helped turn the tide of a gruelling and gruesome war. He should have been regarded as a hero in his lifetime.

Yet, because of his sexuality, Turing was forced to choose between jail and chemical castration. In 1954, at the age of 41, he died after consuming a fatal dose of cyanide and was found by his housekeeper.

Imagine how much more he could have advanced not only the field of computing, but also his other work relating to biology and chemistry. Imagine the role he could have played in shaping the current research landscape of data-driven discovery. Imagine how much better off we will be once we have a truly equal society, where great minds of all persuasions are able to come together and tackle the grandest of global challenges.

We have come a long way since Turing’s day. At the Earlham Institute, we have built a culture that promotes equality and diversity, which is valued just as highly as the suite of powerful tools that we have, many of which owe their existence to Turing’s work in the mid-twentieth century.

With the capacity to sequence the DNA of every animal, plant, fungus and protist on Earth, it will be exciting to discover what secrets we can decrypt from the code of life for the good of societies now and far into the future.