Earlham Institute helps to finally crack the wheat code

16 August 2018

As part of an international research effort led by the International Wheat Genome Sequencing Consortium (IWGSC) and published today in Science, Earlham Institute (EI) has played a vital role in generating the most complete and accurately annotated reference genome available to date for bread wheat. The sequence of the 21 wheat chromosomes covers 94% of the genome, with the location of nearly 108,000 genes identified. This resource is expected to help accelerate wheat research over the coming decades.

The result of 13 years of collaborative international research, the reference genome of the bread wheat variety Chinese Spring is the highest quality wheat genome assembly produced to date. Sequencing the bread wheat genome was long considered an impossible task, due to its enormous size – five times larger than the human genome – and complexity – bread wheat has three sub-genomes and more than 85% of the genome is composed of repeated elements.

Genome annotation is the process of identifying functional DNA sequences – locating coding regions is a vital starting point for interpreting the genome and assigning functions to genes. Researchers in the consortium applied a variety of tools and bioinformatic approaches to define the structure of genes, but in a large, complicated genome such as wheat, annotation is challenging and the resulting gene models can look very different.

Utilising their recently published tool Mikado, the Swarbreck Group (EI) integrated the independently generated gene models to deliver the most comprehensive annotation of the 21 wheat chromosomes to date.

Recipient of this year’s IWGSC Leadership Award, Group Leader at Earlham Institute Dr David Swarbreck, said: “My Group had a special role in assessing the accuracy of the predicted gene structures, we developed metrics to examine how well supported a gene model was by the different types of evidence (proteins, RNA-Seq etc). Using these metrics, we were able to identify potential errors in the annotation allowing us to cherry pick the best most accurate gene models from across a large pool of alternative annotations provided by other groups.”

Mikado was developed to support earlier wheat annotation projects at EI, enabling our researchers to leverage different methods for assembling the transcripts of genes. The tool was refined for the IWGSC project and used as a framework for integrating alternative gene models. As such, Mikado was an important part of ensuring a high quality final annotation.

Paving the way for the production of wheat varieties better adapted to climate challenges, with higher yields, enhanced nutritional quality and improved durability, the IWGSC research project involved more than 200 scientists from 73 research institutions in 20 countries.

With the reference genome sequence now completed, breeders have new tools at their disposal to address these challenges. They will be able to more rapidly identify the genes and regulatory elements underlying complex agronomic traits such as yield, grain quality, resistance to fungal diseases, and tolerance to abiotic stress, and so produce hardier wheat varieties.

The high-quality reference genome sequence is predicted to boost wheat improvement over the coming decades, with benefits similar to those observed with the maize and rice reference genomes.

“The publication of the wheat reference genome is the culmination of the work of many individuals who came together under the banner of the IWGSC to do what was considered impossible,” explained Kellye Eversole, Executive Director of the IWGSC. “The method of producing the reference sequence and the principles and policies of the consortium provide a model for sequencing large, complex plant genomes and reaffirms the importance of international collaborations for advancing food security.”

“The tools and experience gained from assembling and annotating the Chinese Spring wheat cultivar can now be applied to our efforts to do the same for other wheat cultivars. It is only by sequencing multiple wheat genomes that we can identify the full complement of wheat genes and provide the best resource for wheat researchers and breeders to continue to improve wheat quality and production. Mikado and related tools will go on to help our future wheat project that uses a pan-genomic approach exploring multiple wheat genomes - focusing on ten different wheat varieties,” added Dr Swarbreck.

The Science article is entitled "Shifting the limits in wheat research and breeding using a fully annotated reference genome".

Notes to editors

A key crop for food security, wheat is the staple food of more than a third of the global human population and accounts for almost 20% of the total calories and protein consumed by humans worldwide, more than any other single food source. To meet future demands of a projected world population of 9.6 billion by 2050, wheat productivity needs to increase by 1.6 per cent each year. In order to preserve biodiversity, water, and nutrient resources, the main way this can be achieved is via crop and trait improvement.

The IWGSC sequence data was assembled and ordered along the 21 chromosomes using highly efficient algorithms, and genes were identified with dedicated software programmes. The study also presents the precise location of 107,891 genes and of more than 4 million molecular markers, as well as sequence information between the genes and markers containing the regulatory elements influencing the expression of genes.

All IWGSC reference sequence resources are publicly available at the IWGSC data repository at URGI-INRA Versailles and at other international scientific databases such as GrainGenes and Ensembl Plants.

For more information, please contact:

Hayley London

Marketing & Communications Officer, Earlham Institute (EI)

+44 (0)1603 450 107


The Earlham Institute (EI) is a world-leading research institute focusing on the development of genomics and computational biology. EI is based within the Norwich Research Park and is one of eight institutes that receive strategic funding from the Biotechnology and Biological Sciences Research Council (BBSRC) - £5.43m in 2017/18 - as well as support from other research funders. EI operates a National Capability to promote the application of genomics and bioinformatics to advance bioscience research and innovation.

EI offers a state-of-the-art DNA sequencing facility, unique in its operation of multiple complementary technologies for data generation. The Institute is a UK hub for innovative bioinformatics through research, analysis and interpretation of multiple, complex data sets. It hosts one of the largest computing hardware facilities dedicated to life science research in Europe. It is also actively involved in developing novel platforms that give multiple academic and industrial users access to computational tools and processing capacity, and in promoting applications of computational bioscience. Additionally, the Institute offers a training programme through courses and workshops, and an outreach programme targeting key stakeholders and wider public audiences through dialogue and science communication activities.


The IWGSC, with 2,400 members in 68 countries, is an international, collaborative consortium, established in 2005 by a group of wheat growers, plant scientists, and public and private breeders. The goal of the IWGSC is to make a high-quality genome sequence of bread wheat publicly available, in order to lay a foundation for basic research that will enable breeders to develop improved varieties. The IWGSC is a U.S. 501(c)(3) non-profit organization. www.wheatgenome.org


Isabelle Caugant

IWGSC Communications Director


+1 916 840 8801