Machine learning tech that hunts for plant biomarkers awarded UKRI funding

26 July 2022
Field of wheat with yellow and brown rust

An Earlham Institute team that has developed machine learning technology to find genetic markers for important traits in plants has been awarded £25,000 of funding from UKRI. The funding will support market discovery and skills development for the project, helping to commercialise the BBSRC-funded research.

TraitSeq, developed by Josh Colmer during his PhD at the Earlham Institute, is an end-to-end laboratory and computational pipeline that uses cutting-edge machine learning (ML) methods to generate biomarkers using transcriptomic data. 

These biomarkers have the potential to predict useful physiological, biochemical, or metabolic traits and changes. 

The technology is the culmination of Colmer’s involvement in a number of projects during his PhD in the Anthony Hall Group at the Earlham Institute.

Colmer and the team behind TraitSeq have received funding from UKRI’s Innovation to Commercialisation of University Research (ICURe) pilot programme, which helps research teams shorten the time it normally takes to move a promising idea from the lab to the point of commercialisation.

The team behind TraitSeq will now spend 12 weeks carrying out market discovery activities to build a clearer picture of how the technology could be applied and the areas of industry with the greatest potential to benefit from it.

Josh Colmer, TraitSeq entrepreneurial lead and PhD student based at the Earlham Institute, said: “TraitSeq was born out of a few projects where we realised how valuable it’d be to have a diagnostic tool for spotting biomarkers. 

“These could flag a range of important traits for plant breeders, from obvious benefits such as climate resilience or yield through to more subtle things like taste!”

TraitSeq involves both laboratory and computational approaches, which the Earlham Institute’s facilities are uniquely placed to support. 

“The lab component consists of a low-cost, high throughput RNA extraction and sequencing pipeline optimised for plant material,” says Colmer. “The computational aspect consists of bespoke ML algorithms and bioinformatics tools for detecting biomarkers and producing trait prediction models using the resulting high-dimensional RNA-Seq datasets.”

Dr Liliya Serazetdinova, Head of Business Development and Impact at the Earlham Institute, said: “What makes TraitSeq so innovative is the computational component. This is how we’re able to accurately and robustly predict measurable targets that relate to changes in phenotype, physiology or metabolism under varying environmental conditions.

“The Earlham Institute works to bridge the gap between biology and data science, and this innovation is a perfect example of how data-intensive bioscience could deliver significant impact.”

TraitSeq uses a bespoke set of gene selection algorithms and machine learning models developed by Colmer and colleagues at the Earlham Institute. These are able to identify a specified number of transcriptomic biomarkers for prediction (trait measurement) and inference (gene regulation understanding) purposes.
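The general idea of choosing a specified number of biomarkers and then predicting a trait from them can be sketched in a few lines of Python. This is an illustrative toy only, not TraitSeq's own algorithms, which are not described in this release. The scoring rule (difference in class means divided by spread), the nearest-centroid classifier, the function names and the gene names and expression values are all assumptions made for the example.

```python
from statistics import mean, pstdev


def select_biomarkers(expression, labels, k):
    """Rank genes by a simple two-class separation score and keep the top k.

    expression: dict mapping gene name -> list of expression values (one per sample)
    labels: list of 0/1 trait labels, one per sample
    """
    scores = {}
    for gene, values in expression.items():
        group0 = [v for v, y in zip(values, labels) if y == 0]
        group1 = [v for v, y in zip(values, labels) if y == 1]
        spread = pstdev(group0) + pstdev(group1) + 1e-9  # avoid division by zero
        scores[gene] = abs(mean(group1) - mean(group0)) / spread
    return sorted(scores, key=scores.get, reverse=True)[:k]


def fit_centroids(expression, labels, genes):
    """Compute the mean expression profile of each class over the chosen genes."""
    centroids = {}
    for cls in set(labels):
        centroids[cls] = [
            mean(v for v, y in zip(expression[g], labels) if y == cls)
            for g in genes
        ]
    return centroids


def predict(sample, genes, centroids):
    """Assign a sample (dict gene -> value) to the nearest class centroid."""
    def distance(cls):
        return sum((sample[g] - c) ** 2 for g, c in zip(genes, centroids[cls]))
    return min(centroids, key=distance)


# Toy data, invented purely for illustration (hypothetical genes and values).
expression = {
    "gene_a": [1.0, 1.2, 0.9, 5.0, 5.3, 4.8],  # separates the two classes
    "gene_b": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],  # uninformative
    "gene_c": [3.0, 0.5, 2.0, 1.0, 2.5, 0.7],  # noisy
}
labels = [0, 0, 0, 1, 1, 1]  # e.g. 0 = trait absent, 1 = trait present

markers = select_biomarkers(expression, labels, k=1)
centroids = fit_centroids(expression, labels, markers)
prediction = predict({"gene_a": 4.9, "gene_b": 2.0, "gene_c": 1.0}, markers, centroids)
```

Here `k` plays the role of the "specified number" of biomarkers. The selected genes serve both purposes mentioned above: they feed the prediction step, and the ranking itself hints at which genes are associated with the trait.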

With support from ICURe and Earlham Enterprises Ltd, the commercial arm of the Earlham Institute, Colmer will now dedicate time to developing new skills: translating research into a commercial venture, testing it in the market, and pitching it to potential investors.

Professor Anthony Hall, Group Leader at the Earlham Institute and science lead on this project, said: “Biomarker-based diagnostics has significantly advanced in precision medicine, yet this approach represents an opportunity for the agriculture sector. 

“TraitSeq was initially designed to predict the presence of plant diseases and the circadian clock in plants, but we’ve shown it’s also applicable to human or even livestock trait prediction. The pipeline has already been used in a trial project to predict cancer subtypes in humans at unmatched levels of accuracy.”

Colmer is actively seeking input on the potential of TraitSeq from anyone who thinks the technology may have an application in their area of work. Please use the ‘contact us’ page if you’d like to find out more.

Notes to editors.

Josh Colmer’s PhD is supported by the UKRI Biotechnology and Biological Sciences Research Council (BBSRC) Norwich Research Park Biosciences Doctoral Training Partnership (NRPDTP), as a National Productivity Investment Fund CASE Award, in collaboration with Syngenta.


For more information, please contact:

Greg Bowker, Head of Communications, Earlham Institute (EI)

+44 (0)1603 450 895

About the Earlham Institute

The Earlham Institute (EI) is a world-leading research institute focusing on the development of genomics and computational biology. EI is based within the Norwich Research Park and is one of eight institutes that receive strategic funding from the Biotechnology and Biological Sciences Research Council (BBSRC), which provided £6.45M in 2015/2016, as well as support from other research funders. EI operates a National Capability to promote the application of genomics and bioinformatics to advance bioscience research and innovation.

EI offers a state-of-the-art DNA sequencing facility, unique in operating multiple complementary technologies for data generation. The Institute is a UK hub for innovative bioinformatics through research, analysis and interpretation of multiple, complex data sets. It hosts one of the largest computing hardware facilities dedicated to life science research in Europe. It is also actively involved in developing novel platforms that give multiple academic and industrial users access to computational tools and processing capacity, and in promoting applications of computational bioscience. Additionally, the Institute offers a training programme through courses and workshops, and an outreach programme that targets key stakeholders and wider public audiences through dialogue and science communication activities.