• Research group

Clavijo Group

Tackling biological complexity through simple and significant properties, algorithms, and methods.

Group Leader - Bioinformatic Algorithms
Bernardo Clavijo

Group activities.

Assembly algorithms: we develop assembly tools and pipelines that combine different data types, including short, long and linked reads, optical maps and Hi-C data. We are also working on integrating genomic and genetic data from markers and genetic maps. We collaborate with the Di Palma Group and the G10K initiative on vertebrate genome assembly and analysis.

Wheat genome sequencing: we published our CS42 reference in 2017 and have since assembled another five wheat genomes, which are publicly available under the Toronto Agreement. We are working on improving our methods and providing wheat sequencing services to industry.

OctoSeq: sequencing the octoploid strawberry. This project will provide a reference-quality, haplotype-phased assembly of the complex octoploid strawberry genome, and a set of second-tier references to enable multi-genome analyses.

Single and multi-genome graph analyses: we are developing tools for the analysis of genome assembly graphs, enabling richer and more precise analysis than was previously possible with linear references. Our aim is to move beyond the reference for complex genomes.

Our impact.

We achieve impact through our open methods, our scientific results and our training programme. We produced short-read assembly methods that can now robustly generate genome sequences for complex crops. We have delivered a leap-forward assembly of the wheat genome, with unprecedented access to genic content, and are working towards the adoption of multi-genome methods for crops, with a full vision running from genomics, through genetics, to breeding and population analyses. Through regular training on genome assembly and a strong commitment to disseminating scientific knowledge, we are helping to shape the next generation of genomics experts.

Tools.

KAT

The K-mer Analysis Toolkit, a tool to analyse k-mer spectra of NGS datasets and assemblies. Highly recommended for fast, at-a-glance QC of how completely a genome assembly includes the content of Illumina datasets.

http://github.com/TGAC/KAT
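As a rough illustration of what a k-mer spectrum is, the sketch below counts canonical k-mers in a set of reads, tabulates how many distinct k-mers occur at each multiplicity, and then splits each multiplicity by copy number in an assembly. This is a toy pure-Python example, not KAT's implementation or interface: the function names (canonical, kmer_counts, spectrum, copy_number_breakdown) and the in-memory sequences are assumptions made for illustration only.

```python
from collections import Counter, defaultdict

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_counts(sequences, k: int) -> Counter:
    """Count canonical k-mers across sequences, skipping k-mers with non-ACGT bases."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return counts


def spectrum(counts: Counter) -> dict:
    """Map each multiplicity to the number of distinct k-mers seen that many times."""
    return dict(sorted(Counter(counts.values()).items()))


def copy_number_breakdown(read_counts: Counter, assembly_counts: Counter) -> dict:
    """For each read multiplicity, count distinct k-mers by their assembly copy number.

    A copy number of 0 means the k-mer is in the reads but absent from the assembly.
    """
    breakdown = defaultdict(Counter)
    for kmer, multiplicity in read_counts.items():
        breakdown[multiplicity][assembly_counts.get(kmer, 0)] += 1
    return {m: dict(c) for m, c in sorted(breakdown.items())}


# Hypothetical toy data: two identical reads and a partial "assembly".
reads = ["ACGTAC", "ACGTAC"]
assembly = ["ACG"]

read_counts = kmer_counts(reads, k=3)
assembly_counts = kmer_counts(assembly, k=3)

print(spectrum(read_counts))
print(copy_number_breakdown(read_counts, assembly_counts))
```

In a real dataset, a peak of read k-mers at the expected coverage that has copy number 0 in the assembly would suggest missing content, while low-multiplicity k-mers absent from the assembly are typically sequencing errors.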

w2rap

The assembly pipeline that is routinely and robustly delivering crop genomes from cost-effective short-read data.

http://github.com/bioinfologics/w2rap

w2rap-contigger

The contigger part of our assembly pipeline, forked from DISCOVAR denovo and improved to better handle complex and repetitive genomes, with more flexible parameters and improved results representation including standard graph outputs.

http://github.com/bioinfologics/w2rap-contigger

Data.

Triticum aestivum CS42 assembly and annotation

The wheat reference assembly described in our Genome Research publication, produced with our open w2rap method.

http://opendata.earlham.ac.uk/Triticum_aestivum/TGAC/v1/

5 more wheat genomes (pre-publication access)

w2rap-based assemblies of four hexaploid and one tetraploid wheat genomes. Open access under Toronto Agreement conditions.

http://opendata.earlham.ac.uk/Triticum_aestivum/EI/v1.1/

http://opendata.earlham.ac.uk/Triticum_turgidum/EI/v1.1/