Accessible single-cell RNA-sequencing bioinformatics training using Galaxy

Going from the wet lab to bioinformatics isn’t always easy, but it’s part and parcel of the job for many working on single-cell genomics. Earlham Institute scientist Dr Graham Etherington thinks an accessible new training and analysis resource for single-cell RNA sequencing can help both learners and trainers.

Dr Graham Etherington is an expert in conservation genomics. Among his various projects, he has contributed to decoding the koala genome and designating a new species of harrier. He’s also interested in mustelid species, looking at how ferrets and polecats are hybridising in the UK, as well as how we can use genomics to protect the threatened black-footed ferret from the perils of inbreeding as it comes back from the brink of extinction.

When he was asked to teach the bioinformatics portion of a single-cell RNA sequencing (scRNA-seq) course, he didn’t quite know where to begin. But, for the trainees and trainers alike, this would prove to be an advantage.

Dr Graham Etherington showcasing EI science at our Open Day in 2019

Starting from scratch

“At the beginning I didn’t really know where to start with scRNA-seq,” says Etherington. “There are a number of different courses available, but they all run off the command line. That really isn’t very easy for a wet lab scientist.”

The Earlham Institute offers the scRNA-seq training course as part of its BBSRC National Capability in Advanced Training. For many who attend the course, it’s a route into understanding how to apply bioinformatics to analyse data. However, if you’ve never coded in your life, jumping straight in at the deep end isn’t helpful for either you (the trainee) or the person trying to teach you.

Etherington was determined to make his portion of the course more accessible and less demanding on those less familiar with bioinformatics, empowering them to take that step towards being able to confidently analyse their data.

“I wanted to step away from the command line,” Etherington explains. “It takes a long time and it causes confusion, whereas everybody knows how to use a web browser.

“I thought ‘if we want people to understand what’s going on, it’s better in an environment they’re used to’. In that way, students could spend more time understanding what they’re doing and why, rather than worrying about how.”

That’s where Galaxy, and more of EI’s expertise, comes in.

Developing skills for the future

We are passionate about skills development and proud to offer high quality training in a variety of subjects for the life sciences. We hold the BBSRC National Capability in Advanced Training, working with EI experts and industry to deliver hands-on training courses and workshops in cutting edge genomics, bioinformatics and high-performance computing.

Try Galaxy, it’s accessible and reproducible

Galaxy is an open, web-based platform for accessible, reproducible, and transparent data-intensive biomedical research. It’s an invaluable platform for making bioinformatics analyses FAIR (findable, accessible, interoperable and reusable) - a cornerstone of what EI is all about.

Essentially, Galaxy works by taking a command line tool and wrapping it, so a user can perform tasks via a simple web browser rather than typing in commands using code - exactly what Etherington wanted to achieve with his training course.
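As a rough illustration of the wrapping idea, the sketch below turns the values a user might enter in a web form into the command line that would otherwise be typed by hand. It is not how Galaxy itself is implemented (Galaxy tools are described in XML wrapper files), and the tool name `filter-cells` and its options are hypothetical, invented purely for this example.

```python
import shlex


def build_command(tool, form_values):
    """Turn web-form values into an argument list for a command-line tool.

    Boolean fields become bare flags; everything else becomes "--name value".
    """
    argv = [tool]
    for name, value in form_values.items():
        if isinstance(value, bool):
            if value:
                argv.append(f"--{name}")
        else:
            argv.extend([f"--{name}", str(value)])
    return argv


# Hypothetical tool and options, standing in for a real wrapped tool.
form_values = {"input": "counts.tsv", "max-mito-percent": 20, "verbose": True}
argv = build_command("filter-cells", form_values)
command_line = shlex.join(argv)
print(command_line)
```

The user only ever sees the form; the command is assembled and run on their behalf, which is also what makes the same analysis easy to repeat elsewhere.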

Dr Nicola Soranzo runs a Galaxy instance at EI, and was therefore perfectly placed to help make scRNA-seq training FAIR, too. The result is a totally free, readily available and standalone Galaxy pipeline that can be used by anyone.

“It was a perfect fit, really,” says Etherington of Galaxy. “You can share everything you use, and it’s reproducible. A trainee can go back to their institute knowing that they’ll be using the same tools, and if they use the same data they should get the same result. It’s also incredibly easy to use, and therefore easy to teach.”

Developing leading digital infrastructure

In addition to Galaxy, EI offers a number of tools and applications for bioinformatics. As part of our BBSRC National Capability in e-Infrastructure, our digital infrastructure is supporting data research advances across the life sciences.

The workflow - bioinformatics in a nutshell

The specific workflow Etherington has developed is built around an R package called scater, combined with some visualisation tools he built himself. The paradigm is visualise, filter, visualise. A lot is down to quality control.

“You get your data and you look at it,” explains Etherington. “Imagine you have a 96 well plate, 96 cells, 96 reads. Some of the cells might break, the nucleus might escape. Some might show signs of degradation, if they have really high mitochondrial DNA for example. So you need to filter that data.”

The first step in the workflow is therefore to visualise data using an array of different charts and graphs, where you can compare gene expression levels in each cell. At this stage, if you notice irregularities - perhaps cells showing little or no data - you can filter them out. First visualise, then filter out.
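To make the first pass concrete, here is a minimal Python sketch of the kind of per-cell summary you would look at before filtering. The workflow itself uses the R package scater inside Galaxy, so this is only an illustration of the idea; the count matrix, gene names and cut-offs are all invented for the example.

```python
# Toy count matrix: one row per gene, one column per cell (values invented).
counts = {
    "GeneA": [10, 0, 25, 0],
    "GeneB": [5, 0, 30, 1],
    "GeneC": [0, 0, 12, 0],
    "MT-CO1": [2, 1, 3, 0],
}
cell_ids = ["cell1", "cell2", "cell3", "cell4"]


def per_cell_qc(counts, cell_ids):
    """Return total counts and number of detected genes for each cell."""
    qc = {}
    for i, cell in enumerate(cell_ids):
        column = [row[i] for row in counts.values()]
        qc[cell] = {
            "total_counts": sum(column),
            "detected_genes": sum(1 for value in column if value > 0),
        }
    return qc


def filter_cells(qc, min_counts, min_genes):
    """Keep cells that meet both minimum thresholds (assumed cut-offs)."""
    return [
        cell
        for cell, metrics in qc.items()
        if metrics["total_counts"] >= min_counts
        and metrics["detected_genes"] >= min_genes
    ]


qc = per_cell_qc(counts, cell_ids)
kept = filter_cells(qc, min_counts=10, min_genes=2)
print(qc)
print(kept)
```

Here cell2 and cell4 show almost no data and are dropped, which is the "cells showing little or no data" case described above.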

You can then set some parameters. In the case of mitochondrial content, you might set a threshold of 20% of a cell’s reads and filter out any cell above it. You then visualise the data again. Maybe you’ve removed too much, perhaps not enough. There’s also an auto-filter, a PCA-based tool that looks for anomalies in the data.
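The threshold step can be sketched in a few lines of Python. This is an illustration only, not the Galaxy tool itself: the counts are invented, and identifying mitochondrial genes by an "MT-" name prefix is an assumption (a common convention for human gene symbols) that will not hold for every organism or annotation.

```python
# Toy count matrix: one row per gene, one column per cell (values invented).
counts = {
    "GeneA": [40, 5, 30],
    "GeneB": [50, 5, 45],
    "MT-CO1": [6, 20, 15],
    "MT-ND1": [4, 10, 10],
}
cell_ids = ["cellA", "cellB", "cellC"]


def mito_percent(counts, cell_ids, mito_prefix="MT-"):
    """Percentage of each cell's counts that come from mitochondrial genes."""
    percentages = {}
    for i, cell in enumerate(cell_ids):
        total = sum(row[i] for row in counts.values())
        mito = sum(
            row[i] for gene, row in counts.items() if gene.startswith(mito_prefix)
        )
        percentages[cell] = 100.0 * mito / total if total else 0.0
    return percentages


def keep_below(percentages, max_percent):
    """Keep cells whose mitochondrial percentage does not exceed the threshold."""
    return [cell for cell, pct in percentages.items() if pct <= max_percent]


pct = mito_percent(counts, cell_ids)
kept_at_20 = keep_below(pct, max_percent=20)
kept_at_30 = keep_below(pct, max_percent=30)
print(pct)
print(kept_at_20, kept_at_30)
```

Trying more than one threshold and comparing what survives mirrors the visualise, filter, visualise loop: at 20% only cellA remains, while at 30% cellC is kept as well.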

“The point being that good data in = good information out,” Etherington stresses. “It’s best to optimise.”

The next set of tools then looks at confounding effects, mainly biological factors that affect the appearance or quality of the data. That could also include metadata, such as where, when or how the sample was treated. It’s another vital piece of the jigsaw when analysing data.

“You have to consider everything,” says Etherington. “Is treatment A slightly different from treatment B, or has nurse A done something different than nurse B, or was the technology just working better on Monday? Confounding factors are paramount.”
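One simple way to put a number on a suspected confounder is to ask how much of the variation in a measurement is explained by a piece of metadata. The Python sketch below does this for a single gene using the between-group share of the total sum of squares (an R-squared). It is a minimal illustration under invented data, with hypothetical "day" and "treatment" labels, and is not taken from the workflow's own tools.

```python
from collections import defaultdict


def variance_explained(values, labels):
    """Fraction of the total variance in `values` explained by group `labels`."""
    grand_mean = sum(values) / len(values)
    total_ss = sum((value - grand_mean) ** 2 for value in values)
    if total_ss == 0:
        return 0.0
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    between_ss = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups.values()
    )
    return between_ss / total_ss


# Invented expression values for one gene across eight cells.
expression = [1.0, 2.0, 1.0, 2.0, 5.0, 6.0, 5.0, 6.0]
day = ["Mon", "Mon", "Mon", "Mon", "Tue", "Tue", "Tue", "Tue"]
treatment = ["A", "B", "A", "B", "A", "B", "A", "B"]

r2_day = variance_explained(expression, day)
r2_treatment = variance_explained(expression, treatment)
print(f"day: {r2_day:.2f}, treatment: {r2_treatment:.2f}")
```

In this toy data the processing day accounts for roughly 0.94 of the variance and the treatment for roughly 0.06, the sort of result that would prompt a closer look at what differed between Monday and Tuesday.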

How to access the training suite

Accessing the workflow, along with several others developed by the Galaxy community, is as simple as following this link: singlecell.usegalaxy.eu

You can also read the paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7574357/
