Metadata matters: introducing a big data broker for life science

Metadata matters in the life sciences. COPO helps researchers track the context of their data so that it can be used many decades later.

With big data taking up petabyte upon petabyte of storage space in our supercomputers, there is great promise that we can use it to tackle many of our global challenges. However, unless that data is put in context, it might well be practically useless many years on.

Earlham Institute’s COPO is helping life scientists to understand that metadata matters, providing an easy-to-use platform to ensure that data is easily searchable, reusable, and properly attributed.

We sat down with Dr Rob Davey, Dr Felix Shaw and Dr Toni Etuk to discuss why metadata matters in a world of big data science.

Covering a range of topics, from the importance of putting data into context through to how that data might be used to tackle large projects such as the Darwin Tree of Life Project, we had a light-hearted but informative discussion about how better metadata management can help science and scientists.

As Dr Etuk says:

“We must solve the problem of metadata. Solving the problem of metadata will never be easy.”

Why metadata matters

Metadata is anything about a bit of data that isn’t the data itself.

But what sort of data do we work with? We work with sequence data, what we’d call ‘raw reads’, but these are just files to the computer. These files are being stored on your computer without any context.

What context?

It could be sample information, for example, as the team explains in the podcast. “What organism is this? Where was the original material collected from? Was it the DNA from some particular organ, the leaf or root of a plant? What growth stage was it at? Was it stressed in any way? Did you give it some sort of treatment?”

“That’s all metadata. The important thing about metadata is that you can never have too much, it’s all useful information.”
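To make those questions concrete, here is a minimal sketch in Python of how the answers might be recorded alongside a raw reads file. It is an illustration only: the field names, the file name and the example values are all hypothetical and are not COPO’s actual schema or interface.

```python
import json
from dataclasses import dataclass, asdict
from typing import List, Optional


@dataclass
class SampleMetadata:
    """Context for one raw reads file. All field names are hypothetical."""
    data_file: str                    # the file this record describes
    organism: str                     # what organism is this?
    collection_location: str          # where was the material collected?
    tissue: str                       # which organ, e.g. leaf or root?
    growth_stage: str                 # what growth stage was it at?
    stress: Optional[str] = None      # was it stressed in any way?
    treatment: Optional[str] = None   # was any treatment applied?


def to_sidecar_json(record: SampleMetadata) -> str:
    """Serialise the record so it can be stored next to the data file."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


def missing_fields(record: SampleMetadata) -> List[str]:
    """Return the names of fields that were left empty."""
    return [name for name, value in asdict(record).items()
            if value in (None, "")]


# Invented example values, for illustration only.
example = SampleMetadata(
    data_file="sample_reads.fastq.gz",
    organism="Triticum aestivum",
    collection_location="example field site",
    tissue="leaf",
    growth_stage="seedling",
    stress="drought",
)

print(to_sidecar_json(example))
print("Still to fill in:", missing_fields(example))
```

Even a simple record like this keeps the context attached to the file, and a check for empty fields shows at a glance which questions still need answering.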

Introducing COPO

One of the things that COPO does is collect the metadata associated with a data file, because that context is important: it can keep a data file relevant many years down the line.

Importantly, it also makes this very easy for the user through an intuitive wizard.

Furthermore, COPO provides a platform for users to get recognition for their work, which is especially important for researchers who do a lot of bioinformatics and coding, contributing hugely to research outcomes, but who perhaps don’t get the recognition they deserve.

There’s much more to the story, and a hugely interesting conversation to boot, so join the team as they explore why metadata matters and the importance of COPO.

You can access COPO here.

Article author

Peter Bickerton

Scientific Communications & Outreach Manager