Brassica Information Portal
Towards integrating phenotypic and genotypic Brassica data.
Start date: March 2015
Overall grant value: £410,000
Value of activities related to BIP: £50,000
Funders: BBSRC, DEFRA
RIPR BBSRC Renewable Industrial Products from Rapeseed Programme
We are developing a resource that collects, integrates and analyses phenotypic data from pre-breeding experiments on Brassica crop species.
The Brassica genus includes many of the crops that we eat, use as fuel or feed to livestock: rapeseed, turnips, broccoli, cabbage and swede all belong to this genus. They are also very healthy, as they are rich in vitamins, minerals and polyphenols.
Brassica crops are important for providing healthy and nutritious food to a growing population at a time when climate change is affecting food security. At the same time, farmers are under pressure to use fewer chemicals and less fertiliser and water. It is therefore important to further develop today’s varieties in response to these new challenges.
The Brassica Information Portal developed by us aims to facilitate Brassica crop breeding with a data-driven approach: the data collected from phenotyping experiments are made accessible and available in a standardised manner. These data can be integrated with genotypic information. In the future, associative analysis tools will enable on-the-fly analysis between traits hosted in the Portal and external genotypic information.
Brassica pre-breeding research is focused on findings that can be applied to generating new varieties for commercialisation. This includes the search for traits that enhance yield, pest resilience or micro-nutrient content. Phenotyping of crops comes with several difficulties:
Crop phenotyping experiments are expensive as they need to be done on a large scale to achieve statistically significant results. Re-use of existing datasets is needed to optimise resources spent on field trials.
Phenotyping data formats are diverse, ranging from image data and single data points recorded in spreadsheets to time-course images or measurement records.
Methods to measure the same trait are diverse and often not sufficiently documented, which inhibits comparisons between datasets that measured supposedly similar traits.
BIP aims to address these problems by providing adequate standardisation and organisation of the data. As a result, reuse of high-throughput phenotyping data will yield insights beyond their intended use and increase the visibility of existing research.
Integrating phenotype-genotype data is another aim of the Brassica Information Portal. As the Portal provides the phenotypic data, it will draw sequence data from external sequence read archives to perform associative analysis (see image below). In the future, it will be able to perform such analysis on the fly.
How the Brassica Information Portal works.
Image information: Associative analysis integrates genotypic and phenotypic information by associating measurements of phenotypic traits with a specific region in the genetic code of these very same plants. This can be a region associated with the trait but not necessarily causing the trait, or even a trait-defining polymorphism within a specific gene. During selective breeding, it is assumed that the presence of this region in the genome ensures the presence of this trait in the plants.
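To make the idea of associating a trait measurement with a genomic region more concrete, here is a minimal sketch, not the method used by the Portal. It assumes each plant has been scored for one marker as an allele dosage (0, 1 or 2 copies) and for one trait, and it uses a plain Pearson correlation as the association measure. The function name and the input layout are hypothetical.

```python
from statistics import mean


def marker_trait_association(dosages, trait_values):
    """Pearson correlation between allele dosage and a trait measurement.

    dosages: allele counts (0, 1 or 2) for one marker, one entry per plant.
    trait_values: trait measurements for the same plants, in the same order.
    Both names are illustrative assumptions, not part of the Portal.
    """
    if len(dosages) != len(trait_values) or len(dosages) < 2:
        raise ValueError("need two equally long lists with at least 2 plants")

    mean_d = mean(dosages)
    mean_t = mean(trait_values)
    cov = sum((d - mean_d) * (t - mean_t) for d, t in zip(dosages, trait_values))
    var_d = sum((d - mean_d) ** 2 for d in dosages)
    var_t = sum((t - mean_t) ** 2 for t in trait_values)

    # A marker or trait without any variation cannot show an association.
    if var_d == 0 or var_t == 0:
        return 0.0
    return cov / (var_d * var_t) ** 0.5
```

A value close to 1 or -1 suggests that the marker region travels with the trait, which, as noted above, does not by itself show that the region causes the trait. Real association studies additionally correct for population structure and multiple testing.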
The database is built on the FAIRDOM principles for good data management, which aim to make data Findable, Accessible, Interoperable and Re-usable.
The BIP is a much-needed storage facility for phenotype data. It aims to make it common practice to store Brassica trait data systematically, in the way that has become routine for sequencing and other scientific data. It is an open-access, open-source community resource that makes it possible for users to store and publish their data. Datasets can be assigned DOIs for convenient referencing in associated publications.
The Portal and its published content are accessible to everyone (bip.earham.ac.uk). Scientists registered with an ORCID can submit data to the database. Uploaded data can be kept private until the user decides to publish it.
With meaningful metadata annotations, it is possible to keep datasets alive after the original project is finished and researchers have left the institute. This is ensured by making certain fields compulsory during the submission process to BIP.
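As an illustration of how compulsory fields can be enforced at submission time, the following sketch checks a metadata record against a list of required fields. The field names are hypothetical placeholders and are not taken from the BIP submission form.

```python
# Hypothetical list of compulsory metadata fields (an assumption for illustration).
REQUIRED_FIELDS = ("title", "description", "submitter_orcid", "trait", "unit")


def missing_fields(metadata, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or blank in a metadata dict."""
    return [
        field
        for field in required
        if not str(metadata.get(field) or "").strip()
    ]
```

A submission would only be accepted when the returned list is empty, so that every stored dataset carries the annotations needed to interpret it later.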
Use of universal and standard formats such as .csv and .json facilitates handling of the data downloaded from BIP. Application of ontologies (Trait-, Plant-, Crop- and Taxonomy ontology) supports consistent data handling across projects and in turn enables community-wide reuse and global data interpretation beyond the single experiment.
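Because .csv and .json can be read with standard tooling, downloaded data can be processed with only a few lines of code. The sketch below uses the Python standard library to parse a CSV export and convert it to JSON. The column names (`plant_line`, `trait`, `value`) are assumptions for the example and do not describe the actual BIP export layout.

```python
import csv
import io
import json


def csv_to_records(csv_text):
    """Parse CSV text into a list of dicts, converting 'value' to a float."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = dict(row)
        record["value"] = float(record["value"])
        records.append(record)
    return records


def records_to_json(records):
    """Serialise the records as a JSON string with stable key order."""
    return json.dumps(records, sort_keys=True)


# Example input with hypothetical columns.
EXAMPLE_CSV = "plant_line,trait,value\nline_A,oil content,39.5\nline_B,oil content,41.0\n"
```

Calling `records_to_json(csv_to_records(EXAMPLE_CSV))` gives a JSON array with one object per measurement, which can then be passed to other tools.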
Data re-use also leads to new knowledge. For example, available data from the BIP can serve 1) as evidence for new claims, 2) to identify knowledge gaps, 3) to recap existing knowledge, 4) to bridge between subfields or 5) to help with selecting research directions. Additionally, our database can serve as a tool for collaboration across countries, fields or areas of expertise.
The core of this tool is a relational database that stores and manages the data, using the PostgreSQL RDBMS. The second element is the web application, which acts as a data browsing interface, facilitates wizard-based data submissions and makes it possible to manage data through the API. The third element is the data indexing and full-text search engine, based on Elasticsearch.
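The division of labour between a relational store and a separate search index can be shown with a toy model. This sketch is not the BIP implementation: it swaps in SQLite (from the Python standard library) for PostgreSQL and a small in-memory inverted index for Elasticsearch, and the table and column names are hypothetical.

```python
import sqlite3
from collections import defaultdict


def build_store(rows):
    """Create an in-memory relational table of (plant_line, trait, value) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE trait_scores ("
        "id INTEGER PRIMARY KEY, plant_line TEXT, trait TEXT, value REAL)"
    )
    conn.executemany(
        "INSERT INTO trait_scores (plant_line, trait, value) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn


def build_index(conn):
    """Map each lower-cased word in plant_line and trait to the matching row ids."""
    index = defaultdict(set)
    query = "SELECT id, plant_line, trait FROM trait_scores"
    for row_id, plant_line, trait in conn.execute(query):
        for token in f"{plant_line} {trait}".lower().split():
            index[token].add(row_id)
    return index


def search(conn, index, term):
    """Look the term up in the index, then fetch the full rows from the store."""
    ids = sorted(index.get(term.lower(), ()))
    if not ids:
        return []
    placeholders = ",".join("?" * len(ids))
    sql = (
        "SELECT plant_line, trait, value FROM trait_scores "
        f"WHERE id IN ({placeholders}) ORDER BY id"
    )
    return conn.execute(sql, ids).fetchall()
```

The index answers the question "which records mention this word?" quickly, while the relational store remains the single source of the actual values. A web application or API layer would sit on top of functions like `search`.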
BIP runs on a server hosted by EI. It currently uses very little computational power, but we envision future links with the EI HPC to run more costly queries.
Rachel Wells, Judith Irwin
Feedback on usability from a user’s perspective, help with curation issues in legacy data.
Development of web-based analytics.
Dan Bolser, Guy Namarati
Future collaborators for cross-linking with EnsemblPlants resource.
Provision of new traits and data, feedback on development and on usability from a user’s perspective.
Provision of new traits and data.
Tomasz Gubała, Tomasz Szymczyszyn, Piotr Nowakowski
Ben Ward, Anil Thanki, Rob Davey, Matt Clark, Manuel Corpas, Sarah Ayling, Carlos Horro
Web-based analytics for the BIP are being developed in collaboration with the Clark Group (EI) and Ian Bancroft (University of York). Anil Thanki (Davey Group, EI) and Dan Bolser (EBI) are helping to cross-link BIP with TGAC Browser and Ensembl Plants. The Davey Group will also help with integrating BIP with iPlant and COPO, projects supporting collaborative and open plant science. Finally, Manuel Corpas and his team will contribute to further development of the phenotyping API and oversee implementation of bioinformatics standards developed within the frame of ELIXIR.
As a new community resource, the Brassica Information Portal will prove a valuable tool for efficient storage and use of standardised phenotypic data. It facilitates crop improvement by filling the gap for standardised trait data, which makes the database content readily comparable and cross-linkable to that of other tools and resources. The Portal can be integrated into all aspects of Brassica research: from generating research questions and collaborations, through selecting or defining new traits to measure and data handling and storage, to associative analysis.