
Darwin Tree of Life Project

Sequencing the genome of every eukaryotic species in Britain and Ireland

Project summary.

Led by: Neil Hall & Seanna McTaggart

Start date: 2019

Funding: Wellcome Trust

The world is faced with the daunting prospect of a sixth mass extinction event, as human activity pushes biodiversity into ever-thinning corridors. From the protists that produce most of the oxygen we breathe to the animals and plants that provide us with abundant food, building materials, and medicines, we are reliant on biodiversity to survive. It’s absolutely crucial that we learn to protect it before it’s too late.

The Darwin Tree of Life Project (DTOL) is part of the global Earth BioGenome Project, which aims to document and understand all eukaryotic life - every animal, plant, fungus and protist - on Earth. As part of DTOL, the Earlham Institute is applying expertise in single-cell genomics, bioinformatics, and data management to develop robust pipelines for both sequencing novel organisms and enabling the DTOL community to share genomic data in a findable, accessible, interoperable and reusable (FAIR) way.

This project will generate an invaluable open-source catalogue of data for research into how organisms develop and respond to pathogens, parasites, environmental change, and intra-species interactions. Revealing the evolutionary underpinnings of the human genome, our food sources and parasites will help to unearth processes that generate genomic diversity.

Above: The single-celled flagellate eukaryote Euglena under the microscope


The Earlham Institute is integral to the Darwin Tree of Life Project, focusing on three main areas:

1. Collaborative OPen Omics (COPO)

Experts in research software engineering and data science at the Earlham Institute are providing COPO to the Darwin Tree of Life community - a resource that helps to manage the crucial metadata associated with genome sequencing projects, ensuring that it is accessible and shareable long into the future. 

COPO is central to the DTOL project and is being extended to the European Reference Genome Atlas (ERGA), another initiative under the Earth BioGenome Project umbrella.

2. Sequencing protist genomes

Protists defy convention. They are eukaryotic microorganisms (their cells contain a nucleus) that are not animals, plants, or fungi. Yet almost all life depends on protists and their critical position at the bottom of our food chains.

Our understanding of biodiversity and evolution is heavily biased towards species that can be collected in large quantities from the environment and microbes that can be readily cultured in the lab. However, the vast majority of microbial diversity is unculturable. Protists, in particular, are massively understudied, underappreciated, and underrepresented in genomic databases despite their importance.

Protists, along with bacteria, produce most of the oxygen on our planet through photosynthesis. They also play a central role in our environment's nutrient cycle - nature's recycling system - where they maintain the health of the soils that grow our food, supporting our forests, farms, and aquatic life. Protists also include some of the most destructive pathogens of animals and plants, ranging from the lethal "brain-eating amoeba" to devastating plant-pathogenic oomycetes that have the potential to wipe out crops and forests.

Above: Diatom Sampler Pack from Connecticut River. Credit: Dr Sally Warring

Researchers at the Earlham Institute are investigating these diverse and understudied organisms as part of the Darwin Tree of Life Project, working to establish a new way of sorting and documenting protists from environmental samples. 

Of the 60,000 species being sequenced as part of DTOL, only around 500 are described protist species. Yet there are thousands of undescribed protist species in the UK, which means this project is both sequencing the described species and discovering new ones.

As part of DTOL, we are developing a high-throughput wet-lab and bioinformatics pipeline to sequence, assemble, and annotate the genomes of single-celled protists. By combining the latest techniques in single-cell genomics and transcriptomics with novel metagenomics methods, we sidestep the problem of culturing, allowing us to generate high-quality annotated genomes from single cells and discover new species in the process. This work is only possible because of the Institute’s unique, cutting-edge pipeline for the analysis of single cells.

3. Barcoding the Broads

Barcoding the Broads is a public engagement project that enables schools and nature groups to document and understand the biodiversity on their doorstep. Using a tried and tested DNA barcoding method developed at the Cold Spring Harbor Laboratory DNA Learning Center, New York, this project will help to foster an appreciation of the importance of genomics research in recording, understanding, protecting, and ultimately benefiting from biodiversity.


Collaborative OPen Omics (COPO)

COPO is a portal for plant scientists to describe, store and retrieve data more easily. Data description is critical to increase the value of the data itself, allowing scientists (and online search tools) to better understand its relevance.


The Wellcome Sanger Institute serves as the genomics hub in the UK and, alongside the Earlham Institute, will collaborate with the Natural History Museum in London, Royal Botanic Gardens, Kew, University of Edinburgh, EMBL-EBI, University of Cambridge, Royal Botanic Garden Edinburgh, Wytham Woods, University of Oxford, and the Marine Biological Association on sample collection, DNA sequencing, genome assembly and annotation, and data storage.


Impact statement.

Within the genomes of the animals, plants, fungi, and protists of the United Kingdom are untold and untapped solutions to many of our greatest global challenges. We need to protect, understand, and learn how to benefit from biodiversity before it’s wiped out forever. At the Earlham Institute, we are using our expertise in genomics, bioinformatics, and data science to help make the global moonshot of sequencing the DNA of every single eukaryote on Earth a reality.