
Genome Annotation Workshop 2022

FULLY BOOKED - This 3-day course gives scientists an overview of eukaryotic genome annotation approaches. It covers advances in Next Generation Sequencing (NGS) technologies, transcriptome assembly, best-practice guidance for building gene models using short- and long-read sequencing data or cross-species proteins, and how to integrate and assess different gene models to create a publication- or release-ready gene set.

Start date: 17 May 2022

End date: 19 May 2022

Time: 08h30 - 15h00

Venue: Online (via Zoom)

Organiser: David Swarbreck

Enquiries: training@earlham.ac.uk

Registration deadline: 17 April 2022

Cost: £150

About the event

Please Note: This event is now fully booked. Please contact training@earlham.ac.uk to be notified of future dates.

The value of a genome assembly depends heavily on the quality of its accompanying genome annotation.

Automated annotation procedures are therefore required that are robust to different genome characteristics and can efficiently and accurately annotate not only high-quality reference genomes but also large, fragmented “draft” genomes. The benefits of high-quality annotated genomes are well known: more accurate and complete annotation enables better interpretation of genome information within and across species, aiding researchers in their work to understand biological processes.

Earlham Institute (EI) has developed a number of tools to support high-quality annotation. These include Mikado (PMID: 30052957), an open-source Python 3 and Cython program that provides a framework for integrating transcripts from multiple sources into a consolidated set of gene annotations, and Portcullis (PMID: 30418570), a tool to aid accurate splice junction detection.

Both tools were used to aid the annotation of the wheat genome as part of the International Wheat Genome Sequencing Consortium (IWGSC, PMID: 30115783) and are being used to support EI's work in large collaborative genome sequencing projects such as the Wheat 10+ and Darwin Tree of Life. For a full list of tools featured in the workshop, see the Further information section.

In this 3-day virtual course, you will:

Get an overview of Next Generation Sequencing technologies relevant to genome annotation, gaining a deeper understanding of the benefits of each platform.
Understand what to look for in a sample that is likely to pass quality control and produce viable sequencing data, including how to assess the quality of RNA-Seq data.
Learn about de novo and reference-guided transcriptome assembly and the steps for processing short- and long-read data.
Learn about alternative approaches for annotating protein-coding genes in eukaryotic species using transcriptome and homology data, via projection and evidence-guided gene prediction. Discuss the challenges of annotation in different contexts.
Gain hands-on experience of annotation tools, including tools and pipelines developed at the Earlham Institute. You will build gene models from transcriptome data, explore the use of combiners/choosers for integrating alternative gene predictions (EVM / Minos) and assess the accuracy of different annotation tools.

Who is this training for?

Advanced PhD students and post-doctoral researchers who are undertaking projects that involve annotating a genome assembly or generating transcriptome assemblies, and who are looking to improve their awareness of different approaches and pipelines.

Prerequisites

You are expected to have experience of using the command line and should be comfortable with the commands covered in the Software Carpentry lesson, The Unix Shell. We suggest you refresh your memory of this lesson if needed.

Organisers and trainers

Emily Angiolini (Organiser)

Roanne Ephithite (Organiser)

Programme

Day 1

09:30 - 10:15

Welcome, course overview and flash presentations from participants

10:15 - 11:00

Introduction to NGS technologies and practical considerations for isolating high quality RNA

11:00 - 11:15

Break

11:15 - 11:45

Data QC

11:45 - 12:30

Reference-Guided Transcriptome Assembly

12:30 - 13:30

Lunch Break

13:30 - 14:00

A practical guide to de novo transcriptome assembly

14:00 - 15:00

Hands-on: Introduction to working with transcriptome data

Day 2

09:30 - 10:15

Long read data processing

10:15 - 11:00

Demo: Reference-Guided Transcriptome Assembly

11:00 - 11:15

Break

11:15 - 12:30

Hands-on: Reference-Guided Transcriptome Assembly

12:30 - 13:30

Lunch Break

13:30 - 14:00

Demo: High-quality gene models with long reads

14:00 - 15:00

Genome Annotation - Building high-quality gene models

15:00 - 16:00

Optional Q&A Session

Day 3

09:30 - 10:30

Demo: Evidence-guided gene prediction using Augustus

10:30 - 11:00

Hands-on: Using gene prediction combiners (EVM) Pt I

11:00 - 11:15

Break

11:15 - 11:45

Hands-on: Using gene prediction combiners (EVM) Pt II

11:45 - 12:30

Demo: Selecting gene models with Minos

12:30 - 13:30

Lunch Break

13:30 - 14:00

Hands-on: Minos

14:00 - 14:45

Assessing the quality of a genome annotation

14:45 - 15:00

Wrap up and Feedback

Further information

Tools featured on the course

HISAT2 - http://daehwankimlab.github.io/hisat2/manual/

StringTie - https://ccb.jhu.edu/software/stringtie/

Scallop - https://github.com/Kingsford-Group/scallop

Minimap2 - https://github.com/lh3/minimap2

Portcullis - https://portcullis.readthedocs.io/en/latest/

2passtools - https://github.com/bartongroup/2passtools

Mikado - https://mikado.readthedocs.io/en/stable/

Spaln - https://github.com/ogotoh/spaln

Augustus - https://github.com/Gaius-Augustus/Augustus

EVidenceModeler - http://evidencemodeler.github.io/

Minos - https://github.com/EI-CoreBioinformatics/minos

REAT - https://github.com/EI-CoreBioinformatics/reat

IGV - http://software.broadinstitute.org/software/igv/home

Apollo - https://genomearchitect.readthedocs.io/en/latest/

BUSCO - https://busco.ezlab.org/

Terms and Conditions

Please carefully review our standard online event booking terms and conditions prior to registering for this event. Completing an online registration and associated payment process will mean that you are bound by these terms and conditions. Any supplemental terms or changes to these conditions on a per event basis will be included on this page. If you have any queries regarding our events or in relation to your booking, please contact us at training@earlham.ac.uk


Participation: Closed

