Liberators of knowledge: COPO launches for Darwin Tree of Life
28 October 2020
A big data broker for life science, Collaborative OPen Omics (COPO) is now available for users on the ambitious Darwin Tree of Life Project, which aims to sequence the DNA of more than 60,000 species from across the British Isles.
Data is almost meaningless without metadata - the essential information about where, how and why the data has been collected. These bits of metadata are often in scientific papers, or lab notebooks, and are rarely shared in a way that’s reusable.
Methods for effectively recording and managing metadata have historically been lacking. That’s where COPO, a big data broker for life sciences, comes in. COPO, developed by Earlham Institute’s Davey Group, allows users to easily upload all of the metadata associated with an experiment, ensuring that important data is found along with crucial context. It’s also readily accessible for anyone who needs it.
Today, COPO launches in support of the Darwin Tree of Life Project (DToL), which aims to understand our biodiversity by sequencing the DNA of all the animals, plants, fungi and protists in the British Isles. It's open to a small set of users before training is rolled out to the wider community.
“When you do a sequencing experiment, the DNA comes from an organism which has to be collected either out in the wild, in a herbarium, from a seed bank, or even grown up in culture in a lab,” says Dr Rob Davey, Head of e-Infrastructure at Earlham Institute.
“There’s valuable information about the sample - where, when, how and why it was collected, for example, and perhaps the specific body part of an insect, or the salinity of the water where a single-celled organism was found. When you put all that together, there’s a lot of metadata to record.
“COPO is being used to track those collection events so that, when an organism’s DNA has been sequenced, the associated metadata is also available as this might be vital information for someone else’s experiments.”
DToL is a collaboration of 10 research institutes in the UK, which between them are working to record Britain’s biodiversity at an unprecedented scale. Such a collaboration requires an agreement on how data is stored and managed, so that it is easily accessible and searchable for anyone working on the project. COPO provides this: users upload a single spreadsheet, and COPO automatically submits all the data to the European Nucleotide Archive (ENA).
“COPO ensures that metadata is validated,” says EI Research Software Engineer Alice Minotto. “This could be metadata such as taxonomies, which can be tricky, as identifying organisms is not a fixed process. Names and species identification can change over time, and even within specific communities.
“Instead of having to check and submit this information manually, which would take a very long time for each row in the spreadsheet, COPO automates the process. This makes it far less time-consuming, easier, and eliminates errors.”
Importantly, COPO is open to any life scientist who wants to annotate and submit data more quickly and easily.
“Anyone who wants to submit data to a repository that we support can use COPO,” says Minotto. “It makes the process of annotating your data easier.
“It can be expanded to other communities, similar to DToL. Since it’s open source, people could do that themselves.”
To find out more about COPO, you can visit the website.
To speak to the team, contact Dr Felix Shaw and Alice Minotto.
Notes to editors
The Darwin Tree of Life project is part of the Earth BioGenome Project.
The consortium of ten research institutes, museums and associated organisations ultimately aims to sequence the genetic code of 60,000 species that live in the British Isles:
- Earlham Institute (EI)
- University of Cambridge
- University of Edinburgh
- EMBL’s European Bioinformatics Institute (EMBL-EBI)
- The Marine Biological Association (Plymouth)
- Natural History Museum
- Royal Botanic Gardens Kew
- Royal Botanic Garden Edinburgh
- University of Oxford
- Wellcome Sanger Institute
For further enquiries please contact Dr Peter Bickerton, Scientific Communications and Outreach Manager - email@example.com
The Earlham Institute (EI) is a world-leading research institute focusing on the development of genomics and computational biology. EI is based within the Norwich Research Park and is one of eight institutes that receive strategic funding from the Biotechnology and Biological Sciences Research Council (BBSRC) - £5.43m in 2017/18 - as well as support from other research funders. EI operates a National Capability to promote the application of genomics and bioinformatics to advance bioscience research and innovation.
EI offers a state-of-the-art DNA sequencing facility, unique in its operation of multiple complementary technologies for data generation. The Institute is a UK hub for innovative bioinformatics through research, analysis and interpretation of multiple, complex data sets. It hosts one of the largest computing hardware facilities dedicated to life science research in Europe. It is also actively involved in developing novel platforms to provide access to computational tools and processing capacity for multiple academic and industrial users, and in promoting applications of computational bioscience. Additionally, the Institute offers a training programme through courses and workshops, and an outreach programme targeting key stakeholders and wider public audiences through dialogue and science communication activities.