Biology is going large. With the advent of next-generation (and third-generation) genome sequencing, and with it an explosion in the amount of biological data produced every day, scientists are now playing with terabytes or even petabytes of data at a time (a petabyte is a one followed by fifteen zeros, counted in bytes). It would take a single person many lifetimes to analyse this digital abundance.
Even with the biggest and best supercomputers in the world, there isn’t enough time to analyse all of this data - and anyway, just what are we looking for and why?
This is where bioinformaticians come in - and we have loads at EI, so we asked them what bioinformatics actually means, what it is and what it isn’t...
Digital biology requires large HPC infrastructure for big data analysis
What is bioinformatics?
"The use of informatics to answer biological questions."
Alice Minotto kicked us off with the broadest of descriptions, but one which opens up the term nicely. Information is key.
Indeed, Jose De Vega added a little more…
"Bioinformatics is a research field that uses computational experiments to find answers to biological hypotheses and observations relying - typically - on a lot of data [“big data”, if you will] for enough power."
Another of our colleagues, who chose not to be named lest their comment be deemed far too controversial, told us that “‘big data’ is misleading. What farmers, industry and breeders want is information.”
Maybe it’s hard to prise apart “data” and “information.” However, data can be described as “facts and statistics collected together for reference or analysis”, whereas information is more like “facts provided or learned about something or someone”.
So information means we’ve learned something: it is the meaning among the data.
Finding meaning in big data is one of the many roles of a bioinformatician. Using this meaningful information to answer biological questions, or even to ask new ones, from assembling genomes and understanding the role of genes and other genetic elements in developmental biology through to the interactions within and between cells, tissues, organisms and ecosystems, seems a pretty good description of what bioinformatics is.
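To make the difference between data and information a little more concrete, here is a minimal Python sketch. It assumes sequences stored in the common FASTA text format and boils each one down to two simple facts: its length and its GC content (the fraction of bases that are G or C). The records, the function names `read_fasta` and `gc_fraction`, and the choice of GC content as the summary are all illustrative assumptions, not anything the bioinformaticians quoted here described.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (name, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header line closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def gc_fraction(seq):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


# Made-up toy records for illustration only, not real genomic data.
EXAMPLE = """>seq1 toy record
ATGCGC
GGCC
>seq2 toy record
ATATATAT
"""

# The raw letters are the data; the per-sequence summary is the information.
summary = {
    name: (len(seq), gc_fraction(seq))
    for name, seq in read_fasta(StringIO(EXAMPLE))
}

for name, (length, gc) in summary.items():
    print(f"{name}: length={length}, GC={gc:.2f}")
```

The input is a string of letters; the output is something a person can reason about. Real analyses work on far larger files and ask far richer questions, but the step from raw data to a summary that means something is the same idea.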
Bioinformatics uses high performance computing to assemble genomic data and understand gene function
When you say bioinformatics do you include computational biology or do you separate this?
How do we even analyse all that data though?
It’s all well and good having all of this data and the computers to store it, but hardware is just the start. To actually find the answers to those biological questions, we need software. Ben Ward chipped in with a more detailed description of bioinformatics, which nicely summarises the different approaches that are taken in institutes such as EI.
"The quickest answer is to quote the Wikipedia article first paragraph:"
Bioinformatics is an interdisciplinary field that develops methods and software tools for understanding biological data. As an interdisciplinary field of science, bioinformatics combines biology, computer science, mathematics and statistics to analyze and interpret biological data.
"However! Not everyone who calls themselves a bioinformatician necessarily fits that description."
"Some of us, like those in the Clavijo group, fit the definition where it states 'that develops methods and software tools for understanding biological data' - we develop and test methods, release tools and try to push these forward. The aim being to increase our capacity to analyse ever more complex sets of data, and more complex processes."
"But there are other people who would self-describe as a bioinformatician, and what they do is more accurately described as applying said tools and methods to some dataset(s) of interest, in order to learn something about their biological system of interest. Maybe they write some novel scripts that chain tools together like some kind of glue code, but they are not necessarily developing new methods."
"Then there’s the people somewhere in between those two."
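For readers wondering what "glue code" might look like, here is a rough Python sketch. In practice the steps would usually be existing command-line tools; here each step is a small stand-in function so the example runs on its own. The function names (`keep_long_reads`, `count_kmers`, `run_pipeline`), the toy reads and the parameter values are all hypothetical and chosen purely for illustration.

```python
from collections import Counter
from functools import partial


def keep_long_reads(reads, min_length):
    """Stand-in for a filtering tool: drop reads shorter than min_length."""
    return [read for read in reads if len(read) >= min_length]


def count_kmers(reads, k):
    """Stand-in for an analysis tool: count every substring of length k."""
    counts = Counter()
    for read in reads:
        for start in range(len(read) - k + 1):
            counts[read[start:start + k]] += 1
    return counts


def run_pipeline(data, steps):
    """The glue: feed the output of each step into the next one."""
    for step in steps:
        data = step(data)
    return data


# Made-up toy reads, not real sequencing output.
reads = ["ACGTACGT", "ACG", "TTACGTT"]

# Bind each step's settings up front so every step takes a single input.
steps = [
    partial(keep_long_reads, min_length=5),
    partial(count_kmers, k=3),
]

kmer_counts = run_pipeline(reads, steps)
print(kmer_counts.most_common(3))
```

Nothing in the glue is a new method: the value lies in choosing the steps, ordering them and setting their parameters so that the result answers a biological question.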
Bioinformatics combines biology, computer science, mathematics and statistics to analyse and interpret biological data
The latest buzz
We’ll leave the last word to Paul Bailey. Machine learning and artificial intelligence are, as Paul says, “the latest buzz.” But let’s face it, even the speediest of computers need a little help from squishy brains, and will long into our cyborg-laden future.
"Bioinformatics IS the analysis of biological data with computer software programs. The programs use one or more algorithms to analyse a particular data type in a procedural manner."
"Bioinformatics is NOT biological information – use Wikipedia for that, or now, much information can be found in the learned literature as open access articles - it’s FREE, as are online courses on how to analyse all that biological information and data - who needs university?!"
"And if you want to throw in artificial intelligence into the mix – the latest buzz – it might be able to generate all those hypotheses you will need to analyse all that data – just pop into https://iris.ai to see – BUT watch out. It doesn’t start analysing all the data for you. That IS bioinformatics."