
The machines are learning, and they’re coming for bioscience

From self-driving cars to bioinformatics, we’re helping the machines work out better ways to manage our data and our science.

06 June 2017

The life sciences, especially genomics research, increasingly rely on computing expertise to extract information from huge datasets. But, compared to people, computers aren’t great at spotting patterns.

Unless, of course, we teach them how to learn for themselves...

Machine learning is everywhere. From telling you the best route from home to work and how long the journey will take, to eventually driving your car there for you, it is becoming ever more prevalent in our everyday lives.

This sort of advanced computing technology is also becoming more widely used in the biological sciences, and there are plenty of reasons why.


Data, complexity and patterns.

IBM estimate that every single day we produce 2.5 quintillion bytes of data. To be clear, a quintillion has 18 zeros. That’s an incredible amount of data, far too much for people, or even computers, to process easily.
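
For a sense of scale, here is that figure converted using decimal (SI) prefixes, where one exabyte (EB) is 10^18 bytes and one terabyte (TB) is 10^12 bytes. This is simply a unit conversion of IBM’s estimate, not an additional measurement:

```latex
2.5 \times 10^{18}\,\text{bytes} = 2.5\,\text{EB}
  = \frac{2.5 \times 10^{18}}{10^{12}}\,\text{TB}
  = 2.5 \times 10^{6}\,\text{TB}
```

In other words, roughly 2.5 million one-terabyte hard drives’ worth of data every day.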

That’s one of the many reasons we now rely on computers for so many of our tasks, from the simplest, such as pocket calculators, through to the assembly software that helps us piece together the complex puzzle of a wheat genome.

Computers first began beating strong human chess players in the 1980s, and today even the best human player has almost no chance against a top chess program. Where humans excel, however, is in lateral thinking.

Humans can draw on information from across their whole experience of being alive and bring it to bear on the problem at hand. Computers have great difficulty doing this; they are better at deep but very narrow problems, such as calculating the best chess move ten moves ahead.

An interesting example of this comes from pitting a computer against another computer that has a human sidekick. In these contests, the human-plus-computer team has often proved the stronger.

The problem for computers, especially in a game such as chess, which is estimated to allow more possible games than there are atoms in the observable universe, is that under the sheer weight of options it is hard even for them to decide on the best move.

That is, unless they can learn.


What is machine learning?

Many of the most difficult problems in computing, such as vision, language, speech and translation, are likely to be solved by mimicking the human capacity for lateral thinking and context switching.

I asked EI’s Felix Shaw and Toni Etuk, both of whom have PhDs in machine learning, to help shed some light on how we make computers learn.

Felix explained, “Machine learning tries to find patterns. At first, you give it a training set of data, get the program to find patterns, and then you test it.”
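
The train-then-test loop Felix describes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not EI’s code: a very simple nearest-centroid classifier plays the part of “the program”, and the two-feature data points and labels are invented for the example.

```python
"""Toy supervised-learning workflow: train on labelled data, then test.

Assumption: a nearest-centroid classifier stands in for "the program";
the data points and labels are hypothetical, for illustration only.
"""
from math import dist
from statistics import fmean


def train(examples):
    """Learn one centroid (average point) per label from (features, label) pairs."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(fmean(column) for column in zip(*points))
        for label, points in by_label.items()
    }


def predict(centroids, features):
    """Assign the label whose centroid lies closest to the given features."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


def accuracy(centroids, examples):
    """Fraction of held-out examples whose label is predicted correctly."""
    correct = sum(predict(centroids, f) == label for f, label in examples)
    return correct / len(examples)


# Hypothetical two-feature measurements with known labels.
training_set = [((1.0, 1.2), "A"), ((0.8, 1.0), "A"), ((4.0, 4.2), "B"), ((4.1, 3.9), "B")]
test_set = [((1.1, 0.9), "A"), ((3.8, 4.0), "B")]

model = train(training_set)
print(accuracy(model, test_set))  # 1.0
```

Keeping the test set separate from the training set is the point of the exercise: it checks whether the learned patterns hold for data the program has not already seen.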

Chess computers, for example, can be set to play as many games of chess as possible. With each game they gather more data on which moves lead to a win, and their skill increases. The computer can learn which moves are useless, easing the bottleneck caused by the huge number of possible permutations in a game of chess.
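
As a toy illustration of learning which moves are useless from the results of many games, the sketch below tallies how often each move appears in a won game and discards moves that rarely pay off. It is a deliberate simplification and an assumption on our part: moves are treated as plain strings without regard to the position they were played in, the game records and the 0.2 threshold are hypothetical, and real chess programs rely on search combined with far richer learned evaluations.

```python
"""Toy sketch: learn from many played games which moves rarely pay off.

Assumptions: each game is recorded as (moves played, 1 if won else 0);
moves are plain strings, ignoring the position they were played in.
"""
from collections import defaultdict


def win_rates(games):
    """Return {move: fraction of games won in which that move was played}."""
    played = defaultdict(int)
    won = defaultdict(int)
    for moves, result in games:
        for move in set(moves):  # count each move at most once per game
            played[move] += 1
            won[move] += result
    return {move: won[move] / played[move] for move in played}


def prune(candidate_moves, rates, threshold=0.2):
    """Keep moves whose learned win rate is at least `threshold`.

    Unseen moves are kept, because there is no evidence against them yet.
    """
    return [m for m in candidate_moves if rates.get(m, 1.0) >= threshold]


# Hypothetical game records.
games = [
    (["e4", "Nf3"], 1),
    (["e4", "h4"], 0),
    (["h4", "a3"], 0),
    (["Nf3", "a3"], 1),
]

rates = win_rates(games)
print(prune(["e4", "h4", "Nf3", "d4"], rates))  # ['e4', 'Nf3', 'd4']
```

The more games are recorded, the more reliable these tallies become, which is why playing as many games as possible matters.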

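Importantly, however, as Felix and Toni point out, machine learning can be either supervised, as in Felix’s description above, or unsupervised. In supervised learning, the training data come with the right answers attached; in unsupervised learning, the computer has to find structure in the data by itself.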
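
To contrast with supervised learning, here is a minimal sketch of an unsupervised method, k-means clustering, chosen as a common example rather than a method named in the interview. No labels are supplied; the algorithm groups points purely by how close they are to one another. The choice of k = 2, the sample points and the use of the first k points as starting centres are assumptions made for illustration.

```python
"""Unsupervised learning sketch: k-means clustering of unlabelled points.

Assumptions: numeric points, k chosen by hand, and the first k points used
as starting centres (a simple deterministic choice, not the only one).
"""
from math import dist
from statistics import fmean


def assign(points, centres):
    """Index of the nearest centre for every point."""
    return [min(range(len(centres)), key=lambda i: dist(centres[i], p)) for p in points]


def kmeans(points, k, iterations=20):
    """Group points into k clusters; returns (centres, labels)."""
    centres = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        labels = assign(points, centres)
        new_centres = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            # Move the centre to the mean of its members (keep it if the cluster is empty).
            new_centres.append(tuple(fmean(c) for c in zip(*members)) if members else centres[i])
        if new_centres == centres:
            break  # converged: the centres no longer move
        centres = new_centres
    return centres, assign(points, centres)


# Hypothetical unlabelled measurements.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.9, 1.1), (7.9, 8.1)]
centres, labels = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

Unlike the supervised case, nothing tells the algorithm what the groups mean; working that out is left to the researcher.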

Recently, a short novel written with the help of a computer program (amusingly, a story about a computer writing a novel) made it through the first round of a Japanese literary competition, although humans set out much of its plot and characters.

Soon we may be sharing our roads with cars that drive themselves (we already fly regularly on aeroplanes that spend much of the journey on autopilot), leaving us free to sit back with a cappuccino, read the amazing articles on the Earlham Institute website and not worry about the zebra crossing coming up.

In fact, the biggest difficulty with driverless cars may be convincing humans to relinquish control to a machine.

We can give the car all the information it needs to learn to identify patterns; this is the training phase of machine learning. Essentially, the computer observes us drive for thousands of hours and, in that time, learns what the driver does given the image of the road ahead.
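
The training phase described here, learning what a driver does in a given situation, is often called learning from demonstration or behavioural cloning. A minimal sketch follows, with heavy assumptions: each camera image is presumed to have already been reduced to two hypothetical numbers (distance to the vehicle ahead and offset from the lane centre, both in metres), and a one-nearest-neighbour rule stands in for the neural networks that real self-driving systems use.

```python
"""Learning from demonstration: copy what the driver did in the most similar situation.

Assumptions (all hypothetical): each image is already summarised as
(distance to vehicle ahead in metres, lane offset in metres, positive = right),
and a 1-nearest-neighbour rule replaces a real neural network.
"""
from math import dist

# Observed (situation, driver action) pairs collected during training.
demonstrations = [
    ((50.0, 0.0), "keep going"),
    ((8.0, 0.0), "brake"),
    ((40.0, 0.8), "steer left"),
    ((40.0, -0.8), "steer right"),
]


def policy(situation):
    """Return the action the driver took in the closest recorded situation."""
    _, action = min(demonstrations, key=lambda pair: dist(pair[0], situation))
    return action


print(policy((10.0, 0.1)))  # brake
```

A practical system would also need to scale its features so that one measurement (here, distance in metres) does not swamp the others.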

However, once the computer is in control, what it does with the neural network it has trained on billions of high-resolution images, actions and eventualities is down to the car itself.


Machine learning in the life sciences.

As we have already emphasised, we have so much data that we, mere humans, simply haven’t got the time to spot the patterns in it. This is especially true for the life sciences, an area increasingly ruled by truly enormous datasets.

And biology is a complex beast.

As if the billions of letters of genetic code that make up genomes such as ours weren’t enough, within those letters lie levels of complexity that we are still trying to make sense of.

We know that there are things called “genes” that seem to govern certain characteristics of the body. However, some genes control several things at the same time, depending on further levels of complexity: for example, microRNAs, which help regulate which genes are active in which cells, or the proteins that help package our DNA into the nucleus, which can fold DNA in different ways.

Essentially, the more we look, the more we find. Yet what we find doesn’t always make sense, or follow a pattern that human brains can readily grasp. This is especially where machine learning can help us.

“Finally,” Toni said, “the question we’ve been waiting for you to ask us. How is machine learning useful in the life sciences?”

In many ways.

There are obvious uses, such as highlighting mutations in a gene sequence, or revealing networks of interactions between different genes, proteins, organisms and ecosystems. With the right algorithm, computers can spot these quickly and efficiently.
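
Spotting mutations does not always need machine learning: at its simplest it is a comparison, and the sketch below shows that baseline by listing point mutations between a reference sequence and a sample. It assumes the two sequences are already aligned to the same length with no insertions or deletions, and the sequences are made up. Real variant calling must also cope with sequencing errors, which is where statistical and machine-learning models are often used.

```python
"""Baseline sketch: list point mutations between two aligned DNA sequences.

Assumptions: sequences are already aligned, have equal length and contain
no insertions or deletions; positions are reported 1-based.
"""


def point_mutations(reference, sample):
    """Return (position, reference base, sample base) for every mismatch."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref, alt)
        for pos, (ref, alt) in enumerate(zip(reference, sample), start=1)
        if ref != alt
    ]


print(point_mutations("ATGGCCTTA", "ATGACCTTG"))  # [(4, 'G', 'A'), (9, 'A', 'G')]
```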


Pulling the rabbit out of the hat.

However, it’s the unknown that is most exciting.

If we can teach cars to drive themselves, imagine the possibilities with biological data. As Felix says, “There is so much data, we can use data-driven, unsupervised approaches to identify new features that we would not have picked ourselves.”
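
One way to picture a data-driven, unsupervised approach finding a new feature is principal component analysis (PCA), a standard technique chosen here as an example; the source does not say which methods EI uses. The sketch finds the first principal component, the single combination of measurements along which the data vary most, using power iteration in plain Python. Assumptions: small numeric data held in lists, data that are not all identical (otherwise the normalising step divides by zero), and hypothetical measurements.

```python
"""Unsupervised feature discovery sketch: first principal component via power iteration.

Assumptions: small dense numeric data in lists; no labels are used.
"""
from math import sqrt
from statistics import fmean


def first_principal_component(rows, iterations=100):
    """Return the unit vector along which the rows vary most."""
    n = len(rows[0])
    means = [fmean(column) for column in zip(*rows)]
    centred = [[x - m for x, m in zip(row, means)] for row in rows]
    # Scatter matrix: the covariance matrix up to a constant factor,
    # which changes the eigenvalues but not the eigenvectors.
    scatter = [[sum(r[i] * r[j] for r in centred) for j in range(n)] for i in range(n)]
    # Starting guess; assumed not to be orthogonal to the answer.
    vector = [1.0] * n
    for _ in range(iterations):
        # Multiply by the matrix and renormalise; this converges towards
        # the eigenvector with the largest eigenvalue.
        product = [sum(scatter[i][j] * vector[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(v * v for v in product))
        vector = [v / norm for v in product]
    return vector


def scores(rows, component):
    """Project each centred row onto the component: its value on the new feature."""
    means = [fmean(column) for column in zip(*rows)]
    return [sum((x - m) * c for x, m, c in zip(row, means, component)) for row in rows]


# Hypothetical measurements: the second is always twice the first.
data = [[1, 2], [2, 4], [3, 6], [4, 8]]
pc = first_principal_component(data)
print([round(v, 3) for v in pc])  # [0.447, 0.894]
```

Here the method discovers, without being told, that the two measurements move together in a 1:2 ratio, so a single combined score captures almost everything about each sample.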

We can say to computers, “Hey, here’s all of this data; there is something of interest here somewhere. What do you make of it?” We are already at the stage where we can ask computers to write their own programs, which might help them find connections we never asked them to look for in the first place.

Toni put it well when he told me: “You can give a robot rules and data; what it will do with this is unknown. What it will learn will be very complex. You can’t really predict what the machine will put out, but there might be interesting results that you would never have thought of.”