Coordinating bioinformatics data, tools, standards and training in the UK, for Europe and beyond.
Biology is becoming a big data science. Life scientists are generating large data sets, for example of gene sequence, gene expression and genome modification, from experimental populations of plants or animals and from microbial communities. Making the best sense of these data sets is a key objective of the Earlham Institute.
Life science big data vary greatly in type, ranging from multiple genome sequences to geo-located data from field trials. One of the key challenges in the next decade will be fusing this massive range of data so that new knowledge can be inferred from disparate data sets. The challenges of biological big data go beyond the usual big data challenges (volume, variety, velocity) because of the wide range of tools and data sources that are available.
ELIXIR is working towards an “internet of biological data” that will allow data from these many different sources to be analysed using standard data descriptions and well-characterised tools. Ultimately this will give rise to an integrated platform upon which new rich analyses will be built.
ELIXIR operates across a wide range of life science data, from human genomics to large-scale agricultural and environmental data. Its impact will come from making these data more widely available and providing them in more standardised forms.
This will facilitate basic and applied academic research and industry innovation - from more efficient identification of genes and processes involved in rare diseases to improved crop traits - by linking disparate data from a wide range of sources.