Research

Felix Shaw

Research Software Engineer

Papatheodorou Group

Biography

Contact details:

+44 (0) 1603 450 090

felix.shaw@earlham.ac.uk

I'm a researcher working on the COPO (Collaborative Open Plant Omics) platform, a tool for creating, depositing and searching for bundles of research objects.

I hold a First Class Honours degree in Computer Science and gained my PhD from the University of East Anglia in 2015, studying facial animation, computer graphics, signal processing and machine learning.

My current interests are in the fields of data science, open source and how to apply machine learning techniques in these areas to improve the tools available to researchers. I have industry experience working as a web developer in several start-ups and three years' experience teaching at undergraduate level.

Projects

Publications

ELIXIR-UK

Coordinating data, tools, training and standards across the UK

Neil Hall

Director

Activities

Group activities.

ELIXIR is a Europe-wide data infrastructure for the life sciences. It aims to facilitate sharing of life science data by integrating data sources and tools under a unified set of standards, and by providing the computing infrastructure and training needed to make these resources accessible to bioinformaticians and life scientists.

EI is the Lead Institute for ELIXIR in the UK. As such it forms the bridge between the UK’s data infrastructure falling under ELIXIR’s platforms (Data, Tools, Compute, Standards and Training) and ELIXIR Hub.

ELIXIR operates a hub-and-nodes model, with the Hub located at Hinxton. EI hosts the office coordinating the UK Node, bringing UK-based resources into the wider project and providing a nucleating centre for UK data, tools, standards and training provision.

EI’s ELIXIR activities are currently headed by Professor Neil Hall, who is supported by Ariadna Miquel Clopés (Research Administrator), Xenia Perez Sitja (Communications and Community Manager) and Nicola Soranzo (Technical Co-ordinator).

ELIXIR-UK is a distributed network co-led by Prof. Neil Hall and Prof. Carole Goble at the University of Manchester. ELIXIR-UK is supported by three Research Councils: BBSRC, MRC and NERC. It was funded in 2014 to provide training coordination both across ELIXIR and in the UK.

As well as playing an active leading role in the UK’s ELIXIR contributions, the ELIXIR group at EI also contributes to the development of UK data infrastructure through participation in strategic initiatives and by making contributions in infrastructure development, for example by hosting major activity in BioJS development.

Projects

Team

Tools and data

Publications

Jon Wright

Bioinformatician

Anthony Hall Group

Biography

Personal Pronouns: He/Him

Contact details:

+44 (0) 1603 450 986

jon.wright@earlham.ac.uk

@jonwright99

ORCiD

As a Bioinformatician in Anthony Hall’s group, I’m particularly interested in genome assembly and am currently contributing to the Delivering Sustainable Wheat and Decoding Biodiversity Institute Strategic Programmes.

As part of these programmes, I am assembling the genomes of important UK wheat varieties and historical lines from the Watkins bread wheat landrace collection utilising the latest sequencing technologies available at the Earlham Institute such as PacBio, Nanopore and HiC. These high-quality genomes will allow us to identify structural rearrangements and introgressions that underlie important agronomic traits in these lines, and will also contribute to the wheat pangenome and pangenome-related tools for wheat.

I also collaborate with researchers at Rothamsted Research and the European Reference Genome Atlas (ERGA) to generate a genome assembly of Alopecurus aequalis, a problematic agricultural weed, to understand the genetic basis of weediness.

I joined the Earlham Institute in 2010 to generate assemblies of the 42 chromosome arms of hexaploid wheat for the chromosome-based draft sequence within the International Wheat Genome Sequencing Consortium. Since then, I have been involved in several collaborations to generate genome assemblies for important wheat lines from the UK and around the world using our w2rap pipeline.

My first degree was in Chemistry, after which I completed an MSc in Information Technology before working in the commercial software industry for several years.

I received my PhD in Computational Biology from the John Innes Centre in 2010, where I worked on the model grass Brachypodium distachyon and comparative genomics in the grass family.

Projects

Publications

Tilapia Genomic Resources

Developing genomic resources in wild and farmed tilapia for the benefit of the aquaculture industry globally and in Africa.

Summary

Project Summary.

Previous Funders:

UKRI Natural Environment Research Council (NERC)
UKRI Biotechnology and Biological Sciences Research Council (BBSRC) GCRF Data and Resources

Current Funders:

UKRI Biotechnology and Biological Sciences Research Council (BBSRC)
EU Development Fund

Most of the world's fisheries are overexploited and it seems inevitable that fish-farming will largely replace fishing, just as livestock farming has replaced hunting as the main source of food from land animals. Unfortunately, many farmed fish, such as salmon, are themselves predators and need to be fed on fish meal.

Global tilapia production hit 6 million tonnes for the first time in 2020 and is now a $7.6 billion industry, highlighting the huge growth in the freshwater aquaculture industry. However, that growth is threatened by a number of factors, including disease and emerging pathogens, unwanted hybridization with wild fish, and climate change.

Most tilapia production is based on a handful of strains, but there are more than 50 wild species throughout Africa which could harbour valuable genes for traits such as growth, disease resistance, and temperature and salt tolerance. Many tilapia hybridize readily, so these natural genetic traits could easily be bred into farmed strains without the need for GM technology. However, this feature also renders them vulnerable to genetic swamping when alien farmed strains are stocked into natural water bodies, a practice now widespread in Africa.

In this project, we are establishing the genomic resources needed to study the genomic diversity of wild tilapia populations, including past and ongoing gene flow between populations and across species. These species show different degrees of environmental adaptation to salinity, temperature, and oxygen content. Using comparative genomics and systems biology approaches, we aim to investigate, characterise, and understand the genomic bases of those traits of interest. In close collaboration with WorldFish, we are generating similar high-quality genomic resources for elite farmed strains of Nile tilapia, with the aim of enabling future selective breeding for traits such as pathogen and environmental resilience.

Image above courtesy of WorldFish.

Detail

Details.

Genetic diversity of wild populations in Tanzania (BBSRC, NERC)

In this project, we are surveying the natural tilapia diversity of Tanzania, a particular hotspot for wild tilapia strains. We applied low-coverage genome sequencing to nearly 600 tilapia sampled across water bodies in Tanzania. We are investigating population structure, including past and contemporary gene flow between populations and species, taking water basins into account.

We have developed quick molecular diagnostic tests of hybridization for the benefit of fishery managers in other locations, and use these to calibrate simple visual methods to identify hybrids in the field.

The genome sequence information of all of these tilapia strains will be deposited in online public databases, where it will provide a major resource for future studies in tilapia strain development.

Genetic diversity of tilapia stocks in Lake Victoria Basin (EU Development Fund, WorldFish)

Supported by an EU Development Fund project (TrueFish), in close collaboration with WorldFish, the Lake Victoria Fisheries Organization (LVFO), the Kenya Marine and Fisheries Research Institute (KMFRI), the Uganda National Fisheries Resources Research Institute (NaFIRRI), and the Tanzania Fisheries Research Institute (TAFIRI), we are investigating the genetic diversity of tilapia stocks in the water bodies constituting the Lake Victoria Basin. We aim to apply low-coverage genome sequencing to assess the quality of the stocks and so enable information-based policy for the preservation and exploitation of these populations. In close collaboration with our partners, we will deliver active training to local staff and students.

High quality genomic resources for strain improvement (BBSRC, WorldFish)

The success of tilapia aquaculture is associated with the development of elite strains, often selected for improved growth rate. Because of the origin of those strains, current reference genomes are not adequate to enable genomic selection. In close collaboration with WorldFish, we are producing high-quality genome resources for two of those strains by applying long-read technologies to deliver chromosome-scale genome references and associated high-quality annotations.

Publications

Technology

Collaborators

Impact statement.

Tilapia are now the third most important freshwater species in aquaculture. Our work, in close collaboration with national fisheries organisations in Uganda, Kenya, and Tanzania, aims to identify the genetic status of existing stocks in the water bodies of East African countries.

We will disseminate our research and deliver relevant bioinformatics training for local staff and research students through our partners in Uganda, Kenya, and Tanzania. Our work in close collaboration with WorldFish will lead to enhanced genomic resources for elite strains enabling further selective breeding.

Our activities will lead to improved research and monitoring and ultimately to enhancement of the aquaculture industry. We believe that this research will be of great interest to students and the general public in the UK and internationally and we will carry out a variety of dissemination activities through a range of media. Furthermore, the genome sequence information will be publicly available to future researchers, benefiting the wider academic community interested in research themes as diverse as selective breeding, fish health, and evolutionary biology.

People working on the project.

EI Lead

Wilfried Haerty

Group Leader

Graham Etherington

Senior computational biologist

Sam Speak

Postdoctoral Research Scientist

Felix Shaw

Biography

Projects

Data Science for Cellular Genomics

Collaborative OPen Omics (COPO)

Earth Biogenome Project

Darwin Tree of Life Project

Characterising gene function, biosynthetic pathways, and variation in agri/aquacultural traits

Publications

Related reading.

From idea to innovation: inspiring entrepreneurship at Earlham Institute

Pangenome annotation opens up a multiverse of genes

Every cell tells a story: single-cell analysis in forensic science

AI and life sciences: why FAIR data is essential

Analysing cell imaging data at scale with AI

LITE2 takes affordable genome sequencing to a new level

Mapping cell diversity with long-read sequencing and single-cell genomics

ELIXIR-UK

Neil Hall

Group activities.

Projects.

ELIXIR-UK

Team members.

Nicola Soranzo

Neil Hall

Xenia Perez Sitja

Ariadna Miquel Clopés

Data.

De novo assembly of Dioscorea alata, Guinea yam

Publications.

Single-cell sequencing reveals unexpected genetic diversity among Bodo spp. flagellates and their bacterial endosymbionts

Comparative single-cell genomics of two uncultivated Naegleria species harboring Legionella cobionts

Developing a crop-wild-reservoir pathogen system to understand pathogen evolution and emergence

Leveraging existing data to maximise quality and consistency across gene model annotations: a Fusarium pan-annotation

Evolutionary genomics reveals variation in structure and genetic content implicated in virulence and lifestyle in the genus Gaeumannomyces

Applying the FAIR Principles to computational workflows

Potential links between human bloodstream infection by Salmonella enterica serovar Typhimurium and international transmission to Colombia

Nuclear and Mitochondrial Genome Assemblies for the Endangered Wood-Decaying Fungus Somion occarium

Improved Extraction Methods to Isolate High Molecular Weight DNA From Magnaporthaceae and Other Grass Root Fungi for Long-Read Whole Genome Sequencing

Microbiology Galaxy Lab: The first community-driven gateway for reproducible and FAIR analysis of microbial data

Multiple independent genetic code reassignments of the UAG stop codon in phyllopharyngean ciliates

Multiple Independent Genetic Code Reassignments of the UAG Stop Codon in Phyllopharyngean Ciliates

The Galaxy platform for accessible, reproducible, and collaborative data analyses: 2024 update

How to increase the findability, visibility, and impact of Galaxy tools for your scientific community

A research data management (RDM) community for ELIXIR

Data files for Hill et al. (2024) Evolutionary genomics reveals variation in structure and genetic content implicated in virulence and lifestyle in the genus Gaeumannomyces.

The genomic epidemiology of shigellosis in South Africa

Identification of a non-canonical ciliate nuclear genetic code where UAA and UAG code for different amino acids

The Planemo toolkit for developing, deploying, and executing scientific data analyses in Galaxy and beyond

Galaxy Training: A powerful framework for teaching!

Identification of a Non-Canonical Ciliate Nuclear Genetic Code Where UAA and UAG Code for Different Amino Acids

Extracellular vesicles produced by the human gut commensal bacterium Bacteroides thetaiotaomicron elicit anti-inflammatory responses from innate immune cells

Euglena International Network (EIN): Driving euglenoid biotechnology for the benefit of a challenged world

LotuS2: an ultrafast and highly accurate tool for amplicon sequencing analysis

Single-cell genomics reveals population structures from in vitro evolutionary studies of Salmonella

Flaviviruses hijack the host microbiota to facilitate their transmission

Galaxy Training: A Powerful Framework for Teaching!

How to sequence 10,000 bacterial genomes and retain your sanity: an accessible, efficient and global approach

The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2022 update

Planemo: a command-line toolkit for developing, deploying, and executing scientific data analyses

Pathogenomic analyses of Shigella isolates inform factors limiting shigellosis prevention and control across LMICs

The Earth BioGenome Project 2020: Starting the clock

Standards recommendations for the Earth BioGenome Project

Why sequence all eukaryotes?

Expanding the Galaxy’s reference data

Mitochondrial RNA editing in Trypanoplasma borreli: New tools, new revelations

LotuS2: An ultrafast and highly accurate tool for amplicon sequencing analysis

A Bioinformatics Approach to Identifying Potential Biomarkers for Cryptosporidium parvum: A Coccidian Parasite Associated with Fetal Diarrhea

An accessible, efficient and global approach for the large-scale sequencing of bacterial genomes

Production, Isolation, and Characterization of Bioengineered Bacterial Extracellular Membrane Vesicles Derived from Bacteroides thetaiotaomicron and Their Use in Vaccine Development

Analysis of wild plant pathogen populations reveals a signal of adaptation in genes evolving for survival in agriculture in the beet rust pathogen (Uromyces beticola)

COVID-19 adaptive evolution during the pandemic – Implications of new SARS-CoV-2 variants on public health policies

Informing shigellosis prevention and control through pathogen genomics

The COVID-19 Data Portal: accelerating SARS-CoV-2 and COVID-19 research through rapid open access data sharing

Demonstrating public value to funders and other stakeholders—the journey of ELIXIR, a virtual and distributed research infrastructure for life science data

Subtelomeric assembly of a multi-gene pathway for antimicrobial defense compounds in cereals

Ecological niche adaptation of Salmonella Typhimurium U288 is associated with altered pathogenicity and reduced zoonotic potential

Phase 2 of the Norwich COVID-19 testing initiative: an evaluation