
Data exploration with Python 2022

This workshop is ideal for researchers and technical workers with a background in biology and a basic knowledge of Python who want to work with large, complex datasets, mine them for biological insights, and create visualizations to display the results.

Start date:

21 November 2022

End date:

02 December 2022

Time:

09h00 - 12h30

Venue:

Online (via Zoom)

Registration deadline:

16 October 2022

Cost:

£450

About the event.

Much of the popularity of Python stems from the availability of high-quality libraries of existing code that we can use for our own projects. Libraries ("packages", in Python terminology) are even more useful when they are designed to work together.

For scientific programming, we are lucky to have a collection of mature packages which work together to form a stack:

  • numpy for numerical processing
  • pandas for reading, cleaning and processing tabular data files
  • matplotlib as a low-level charting library
  • seaborn as a high-level charting library for rapid dataset exploration through visualization

In this course we will learn how to use these packages together to quickly explore large biological datasets, find meaningful patterns in the data, and present our results clearly. We will focus on the high-level packages - pandas and seaborn - as this will allow us to do the most work with the smallest amount of code. By concentrating on just two packages for an entire course, we will be able to cover a large part of what these tools can do.
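As a rough illustration of how the two high-level packages fit together, the sketch below summarises a small table with pandas and then plots the same comparison with seaborn. It is not taken from the course materials: the column names and all the values are invented for illustration.

```python
import pandas as pd
import seaborn as sns

# Hypothetical measurements, invented purely for illustration
data = pd.DataFrame({
    "species": ["A", "A", "A", "B", "B", "B"],
    "gene_length": [1200, 1500, 900, 3000, 2800, 3100],
    "gc_content": [0.41, 0.44, 0.39, 0.55, 0.53, 0.57],
})

# pandas: compute the mean of one column for each group
mean_length = data.groupby("species")["gene_length"].mean()
print(mean_length)

# seaborn: draw the same per-species comparison directly from the DataFrame
ax = sns.boxplot(data=data, x="species", y="gene_length")
```

In a notebook the chart appears below the cell; in a plain script, `ax.figure.savefig("plot.png")` writes it to a file instead (the file name is arbitrary).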

This event will be delivered virtually via Zoom and Slack; see the further information tab for more details.

Please see here for a detailed syllabus of the course.

Who is this event for

The course is intended for anyone interested in using Python for analysis and visualization of biological datasets. Some previous experience of Python is required: we won't cover the absolute basics of the language, so you'll need to know the basic syntax. The Introduction to Python for Biologists course gives a suitable background.

If you would like to attend the course, but have no Python experience, get in touch with Martin Jones and he will be able to suggest resources to get up to speed. If you are unsure about the suitability of this course for your needs, questions can also be directed to Martin.

This course includes plenty of practical time, including opportunities to work on your own datasets, so it might be particularly suitable for people at the start of the data analysis stage of a project.

About the trainer

Martin started his programming career by learning Perl during the course of his PhD in evolutionary biology, and started teaching other people to program soon after. Since then he has taught introductory programming to hundreds of biologists, from undergraduates to PIs, and has maintained a philosophy that programming courses must be friendly, approachable, and practical.

In his academic career, Martin mixed research and teaching at the University of Edinburgh, culminating in a two-year stint as Lecturer in Bioinformatics. He now runs programming courses for biological researchers as a full-time freelancer.

"Fantastic course - excellent organisation and course content. Martin is a great teacher. Learnt a lot, especially coming from a programming-naive background."

"This course exceeded all my expectations. Martin was a great instructor, who clearly knows how to frame any programming topic into a biology question. Now I feel very confident to keep improving my Python skills (after a couple of failed attempts with other courses in the past)."

Introduction to Python for Biologists virtual course attendees, July 2020

This training forms part of our BBSRC National Capability in Advanced Training Programme.


See the further information tab for full details on how this will be delivered virtually. 

  • Day 1 - 21 November 2022, 09:30 - 12:30: Environment, packages, data files and data model
  • Day 2 - 22 November 2022, 09:30 - 12:30: Series objects and thinking in columns
  • Day 3 - 23 November 2022, 09:30 - 12:30: Introducing seaborn
  • Day 4 - 24 November 2022, 09:30 - 12:30: Categorical axes with seaborn
  • Day 5 - 25 November 2022, 09:30 - 12:30: Grouping and categories with pandas
  • Day 6 - 28 November 2022, 09:30 - 12:30: Long vs. wide form data and heatmaps
  • Day 7 - 29 November 2022, 09:30 - 12:30: Complex data files with pandas
  • Day 8 - 30 November 2022, 09:30 - 12:30: High performance pandas
  • Day 9 - 01 December 2022, 09:30 - 12:30: Data workshop
  • Day 10 - 02 December 2022, 09:30 - 12:30: Data workshop

Further information.

This event will be delivered virtually, in the following format:

  • The programme will be delivered over ten days, from Monday 21 November to Friday 2 December 2022, weekdays only.
  • On each day there will be 3.5 hours of live input (via Zoom) from the trainer (9:00-12:30 GMT, including breaks).
  • Training will consist of lectures, demonstrations and practical exercises, with the trainer on hand to assist and offer 1-1 support.
  • Slack will be used to share important updates and for asking questions.
  • Lectures/input will be recorded and made available to participants as soon as possible for anyone who needs to catch up.
  • You will need to have accounts for Zoom and Slack. Please download the clients for these rather than using the browser versions.

Hardware

To follow along with the live programming examples, you'll need to have two windows open: one for the Zoom video and one for your own code. The best way to do this is to have either a single large monitor or two small ones.

If you're using a laptop, an external monitor is a good idea. Working on just a single laptop screen is possible, but it will involve a lot of switching between windows. Remember that your Zoom window will need to be big enough for you to see code, so a small window that works fine for chatting will probably not be big enough.

Software

To run the Python code and follow the interactive notebooks, you'll need to download and install Anaconda from this link: https://www.anaconda.com/products/individual

Make sure you get the right installer for your operating system (Windows, Mac or Linux) and make sure you get the Python 3.7 version. Please install this even if you already have a version of Python on your system, as we will all need to be running the same environment for the course to go smoothly.

The Anaconda package takes a while to download and install, so please do this well in advance of the course and get in touch if you have problems - don't leave it until the last minute. We won't have time during the class to stop and troubleshoot problems with your installation, but we can help you get it set up in advance.
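Once Anaconda is installed, one quick way to confirm that your environment is ready is to print the Python version and check that the four packages named in the course description can be imported. The script below is a minimal sketch and not part of the official course materials; the function name `installed_versions` is made up for this example.

```python
import importlib
import sys

# Package names taken from the stack listed in the course description
PACKAGES = ["numpy", "pandas", "matplotlib", "seaborn"]


def installed_versions(names):
    """Map each package name to its version string, or None if it cannot be imported."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            # Not every module exposes __version__, so fall back to a placeholder
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


print("Python", sys.version.split()[0])
for name, version in installed_versions(PACKAGES).items():
    print(name, version if version is not None else "NOT INSTALLED")
```

If any package is reported as not installed, or the Python version is not the one you expected, that is a good reason to get in touch before the course starts.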

Terms and Conditions

Please carefully review our standard online event booking terms and conditions prior to registering for this event. Completing an online registration and associated payment process will mean that you are bound by these terms and conditions. Any supplemental terms or changes to these conditions on a per event basis will be included on this page. If you have any queries regarding our events or in relation to your booking, please contact us at training@earlham.ac.uk


Registration deadline: 16 Oct 2022 - 23:45

Participation: First come, first served