PAG XXIX: Life sciences on a global platform

20 December 2021

Researchers from the Earlham Institute will be at the largest plant and animal genomics conference in the world, based in San Diego, US, from 8-12 January 2022.

The largest international plant and animal genome conference (PAG) is taking place in San Diego, US, from 8-12 January 2022. With over 3,000 attendees from more than 60 countries, the most influential genomics experts in the field of biotechnology will be coming together to share recent developments and future plans for plant and animal genome research.

In 2022, the Earlham Institute will be attending online and in person to share our latest scientific advancements in wheat pan-genomics for crop improvement; artificial intelligence and plant health; pathogen genomics in the fight against potato blight; and conservation genomics protecting the Andean bear.

Presenting the latest techniques and tools in genomics and bioinformatics research, PAG brings EI’s pioneering research into the spotlight - working with the scientific community and industry to drive forward life science applications on a global platform.

The wheat pan-transcriptome: Identifying consensus gene networks and developing de novo annotations

Rachel Rusholme-Pilcher, Postdoctoral Scientist, Anthony Hall Group

Date/Time: Saturday 8 January / 10.55

Session: Triticeae Genetics and Genomics, Session 1

Abstract: Wheat is the most widely cultivated crop in the world with over 215 million hectares grown annually. To meet the demands of a growing global population, breeders face the challenge of increasing wheat production by around 60% within the next 40 years.

The international wheat community, as part of the 10+ Wheat Genomes Project, has recently sequenced and assembled an additional 15 wheat cultivars to develop a pan-genome of wheat, quantify available diversity and identify genomic regions under selection. The pan-transcriptome initiative builds on this work, conducting de novo annotation and differential expression analysis on these diverse assemblies.

These de novo annotations, available soon on Ensembl Plants, represent the most complete and uniform structural annotations to date for wheat. They have enabled the computation of a comprehensive set of orthogroups, which has facilitated the discovery of genes novel to specific cultivars, high-resolution tandem arrays and genes absent from the existing Chinese Spring reference. Detailed co-expression analysis of these varieties has revealed a core set of consensus network modules. These modules demonstrate patterns of expression that are conserved across both cultivars and tissue types. Analysis of the genes within these modules has revealed triads that are split between divergent expression modules and genes that show cultivar-specific expression profiles.

Through interaction with the wider wheat community, presence/absence variation, changes in gene models and patterns of expression will be associated with local adaptations and cultivar-specific traits giving valuable insight into the diversity of wheat beyond a single reference genome.

This joint talk, given by Dr Rachel Rusholme-Pilcher (EI) and Dr Thomas Lux (PGSB), will describe the construction of pan-transcriptome consensus gene networks and the development of de novo annotations for nine wheat cultivars.

*Dr Ben White will be presenting a poster on this piece of work as well. See below for abstract.

ChronoGauge is an open-source AI-based approach to telling the circadian time of a plant by using an artificial neural network and a set of rhythmically-expressed biomarker genes.

Joshua Colmer, PhD Student, Anthony Hall Group

Date/Time: Saturday 8 January / 18:40

Session: Systems Biology and Machine Learning

Abstract: The circadian clock is an endogenous cycle entrained by external cues such as light and temperature that enables the plant to synchronise its behaviour and processes with the environment. To predict the endogenous time of the plant, we first identify genes that are rhythmically expressed and cluster them by their phases. Using a customised sequential feature selection method, we further reduce the number of rhythmic genes to use in our model by minimising the error between the actual time and predicted time.

With a feature set containing the best-performing rhythmic genes, we train a circular variant of a linear regression model as well as an artificial neural network in TensorFlow with a custom loss function. Mapping our input (gene expression) to our output (time) requires a regression loss function that considers the circular nature of time and how it wraps around its modulus (24). For this, we created a loss function that quantifies the angular error between a prediction and label when transformed onto a unit circle.
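To illustrate why a circular loss is needed, the sketch below computes the angular error between a predicted and a labelled time on a 24-hour clock. It is a minimal illustration in plain Python, not ChronoGauge's actual TensorFlow loss; the function names (`angular_error_hours`, `mean_circular_loss`) and the choice to report the error in hours are assumptions made for this example.

```python
import math

PERIOD_HOURS = 24.0  # modulus of circadian time, as stated in the abstract


def angular_error_hours(predicted, label):
    """Shortest distance around the 24-hour clock between two times.

    Both times are mapped to angles on a unit circle; the signed angular
    difference is wrapped into [-pi, pi] with atan2, then converted back
    to hours. (Hypothetical helper, not the authors' implementation.)
    """
    delta = 2.0 * math.pi * (predicted - label) / PERIOD_HOURS
    wrapped = math.atan2(math.sin(delta), math.cos(delta))
    return abs(wrapped) * PERIOD_HOURS / (2.0 * math.pi)


def mean_circular_loss(predictions, labels):
    """Average angular error (in hours) over paired predictions and labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    errors = [angular_error_hours(p, t) for p, t in zip(predictions, labels)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # A plain absolute difference would report 22 hours here;
    # the circular error is the 2 hours that separate 23:00 and 01:00.
    print(angular_error_hours(23.0, 1.0))
```

With an ordinary absolute or squared error, a prediction of 23:00 against a label of 01:00 would be penalised as if it were 22 hours off, whereas the wrapped error treats the two times as 2 hours apart.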

We trained ChronoGauge on labelled, publicly available circadian Arabidopsis thaliana RNA-Seq datasets and tested it on additional circadian datasets as well as datasets that include clock mutants and temperature variations. By training an ensemble of models using multiple biomarker feature sets, we identify in which mutant and varied environmental datasets certain biomarker sets are effective and ineffective predictors – enabling biological inference regarding the effects of clock mutations and environmental changes on gene expression and regulation.

In the future, we plan to apply ChronoGauge to transcriptome data from the 1001 Genomes Project to investigate circadian clock variation in Arabidopsis thaliana accessions. In addition to this, we will apply ChronoGauge to different Brassica species as well as wheat cultivars – generating circadian clock biomarkers for each of these crops.

Alien introgressions underlie genetic associations in CIMMYT wheat germplasm

Ben Coombes, PhD Student, Anthony Hall Group

Date/Time: Saturday 8 January / 11.45

Session: Triticeae Genetics and Genomics, Session 1

Abstract: CIMMYT has developed a collection of germplasm to represent the diversity found in its 75,000-strong spring wheat collection. The panel contains exotic-derived lines, with introgression lines, Mexican landraces and primary synthetics in their pedigree history. Several genome-wide association studies have been conducted on this panel, revealing marker-trait associations (MTAs) for traits including heat tolerance, photosynthetic capacity, biomass accumulation, and radiation use efficiency.

Enrichment capture sequencing data combined with whole genome sequencing data from wheat wild relatives has enabled us to characterise exotic material across this panel, and search for introgressions responsible for MTAs, of which we found many examples. An Aegilops ventricosa introgression, also seen in the recently assembled pangenome cultivars, is responsible for an MTA with chlorophyll content, which may explain the positive impact on yield of this introgression and its increasing frequency in pre-breeding material.

We explore the structure and expression of genes within this segment in the pangenome cultivars to help us understand both heterogeneity of introgressions after exposure to selection, and how genes within this region are expressed.

In introgressions from the primary genepool, recombination occurs readily, allowing us to narrow down the region responsible for the trait through in silico mapping. For example, we identified a 32Mbp Ae. tauschii introgression underlying a heat tolerance association. This was narrowed down to a 1.34Mbp region common in lines with the favourable haplotype. The corresponding region from the Ae. tauschii genome was extracted, revealing a set of gene candidates, including novel Ae. tauschii genes and a type B two component response regulator gene with a myb binding domain missing in the wheat orthologue. This gene family is involved in response to abiotic stress such as drought and heat. This work demonstrates a novel approach to employ downstream of GWAS, emphasising in silico mapping to reduce the size of intervals and using non-reference genomes to explore gene candidates.

Whole genome sequencing uncovers the structural and transcriptomic landscape of hexaploid wheat/Amblyopyrum muticum introgression lines

Ben Coombes, PhD Student, Anthony Hall Group

Date/Time: Sunday 9 January / 08:00 - 10:00 (TBC)

Session: (TBC)

Abstract: Wheat is a globally vital crop, but its limited genetic variation creates a challenge for breeders aiming to maintain or accelerate agricultural improvements over time. Introducing novel genes and alleles from wheat’s wild relatives into the wheat breeding pool via introgression lines is an important component of overcoming this low variation. However, a lack of genomic resolution and understanding of genomic impact limits their optimal utilisation.

Through whole genome sequencing data of 17 hexaploid wheat/Amblyopyrum muticum introgression lines and their parent lines, we have precisely pinpointed the size and position of introgressed segments, including previously unreported segments. We report a genome assembly and annotation of Am. muticum that has facilitated the identification of Am. muticum resistance genes commonly introgressed in lines resistant to stripe rust.

Our analysis has identified an abundance of structural disruption and homoeologous pairing across the introgression lines, likely caused by the suppressed Ph1 locus. mRNAseq analysis of six of these introgression lines revealed that introgressed genes tend to be downregulated or silenced when compared to the wheat orthologue they replace. This shifts the expression balance of triads towards suppression of the introgressed region, with little compensation in the expression of the homoeologous copies.

This is the highest resolution analysis of a set of introgression lines and will contribute to untangling the genomic impact of introgression breeding. Our outlined approach will provide an affordable way for breeders to better characterise introgression lines and more effectively deploy wild relative variation.

Circadian Regulation of the Transcriptome in Polyploid Wheat

Hannah Rees, PhD Student, Anthony Hall Group

Date/Time: Sunday 9 January / 13:30

Session: Functional genomics

Abstract: The circadian clock is a finely balanced time-keeping mechanism that coordinates programmes of gene expression. In polyploids, this regulation must be coordinated over multiple subgenomes. Here, we generate and analyse a high-resolution time-course dataset to investigate the circadian balance between sets of three homoeologous genes (triads) from hexaploid bread wheat. We find a large proportion of circadian triads exhibit unbalanced rhythmic expression patterns, with no specific subgenome favoured.

In wheat, period lengths of rhythmic transcripts are found to be longer and have a higher level of variance than in other plant species. Biological processes under circadian control are largely conserved between wheat and Arabidopsis; however, striking differences are seen in agriculturally critical processes such as starch metabolism. Together, this work highlights the ongoing selection for balance versus diversification in circadian homoeologs, and identifies clock-controlled pathways that might provide important targets for future wheat breeding.

The Wheat Pan-Transcriptome

Ben White (poster presentation), Postdoctoral Scientist, Anthony Hall Group

Date/Time: Monday 10 January / 15:00 - 16:30

Abstract: Wheat is the most widely cultivated crop in the world with over 215 million hectares grown annually. However, to meet the demands of a growing global population, breeders face the challenge of increasing wheat production by around 60% within the next 40 years. The 10+ Wheat Genomes Project recently sequenced and assembled an additional 15 wheat cultivars to develop our understanding of genetic diversity and selection within the pan-genome of wheat.

In this work, we provide a wheat pan-transcriptome with de novo annotation and differential expression analysis for 14 wheat cultivars over five different tissues and whole seedlings sampled at dusk/dawn. Analysis of these de novo annotations, available in Ensembl Plants release 52, facilitated the discovery of novel genes specific to particular cultivars, including genes absent from the existing Chinese Spring reference. Through our gene expression analysis, we improve the definition of the core and dispensable genomes, highlight tissue-specific genes, and reveal changes in bias of sub-genome homoeolog expression between cultivars.

Additionally, we performed a network-level analysis which establishes a core set of consensus network modules conserved across all cultivars. Conversely, we also find divergence of sub-genome homoeolog expression, and show cultivar-specific expression profiles. Finally, we develop a novel visualisation tool for sub-genome expression changes, which highlights changes in balance of expression and could be used as a valuable predictive tool by breeders. In summary, this work provides a valuable resource for the wider wheat community and reveals several important findings about the transcriptional diversity of wheat.

Notes to editors.

For more information, please contact:

Hayley London - Media and Communications Officer, Earlham Institute

hayley.london@earlham.ac.uk
07760 438218

About Earlham Institute

The Earlham Institute (EI) is a world-leading research institute focusing on the development of genomics and computational biology. EI is based within the Norwich Research Park and is one of eight institutes that receive strategic funding from the Biotechnology and Biological Sciences Research Council (BBSRC) - £5.43m in 2017/18 - as well as support from other research funders. EI operates a National Capability to promote the application of genomics and bioinformatics to advance bioscience research and innovation.

EI offers a state-of-the-art DNA sequencing facility, unique in its operation of multiple complementary technologies for data generation. The Institute is a UK hub for innovative bioinformatics through research, analysis and interpretation of multiple, complex data sets. It hosts one of the largest computing hardware facilities dedicated to life science research in Europe. It is also actively involved in developing novel platforms to provide access to computational tools and processing capacity for multiple academic and industrial users, and in promoting applications of computational bioscience. Additionally, the Institute offers a training programme through courses and workshops, and an outreach programme targeting key stakeholders and wider public audiences through dialogue and science communication activities.

www.earlham.ac.uk

@EarlhamInst