Project GENESYS: Genetic Search System
Energy-efficient DNA sequence searches using a revolutionary optical processor.
Led by: Scientific Computing Group
Start date: 1 March 2015
End date: 30 April 2017
Duration: 24 months
Biomedical and bioscience researchers worldwide routinely query DNA sequences against large databases of already sequenced genomes. Comparing DNA from individuals of the same species can give many valuable insights, e.g. into cancer genes in humans or disease immunity in vital agricultural crops. But advances in sequencing devices have resulted in output doubling every 18 months or less: a single device can sequence up to 1 trillion bases per run. Analysing this in reasonable time requires large high-performance computers (HPCs) that can be prohibitively expensive due to initial capital costs and ongoing power and cooling costs. At the Earlham Institute (EI) these computers consume 130 kW+ of power, while PetaFLOP systems at the largest research labs can draw 1 MW+, costing millions of dollars.
In this project, Earlham Institute's leading genomics research expertise and Optalysys's radical optical processing technology (potential ExaFLOP processing from a standard mains supply) are combined to build the basis of a new system for large-scale, energy-efficient DNA sequence searching that could reduce energy costs by over 95% while significantly reducing the environmental impact of running traditional HPC.
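As a rough illustration of what a 95% saving would mean, the arithmetic below applies it to the benchmark task described in this project (one HPC node, 28 hours, 11.2 kWh, 500k reads). It is a back-of-envelope sketch, not a project result: the function names are hypothetical, "over 95%" is treated as exactly 95%, and the per-read figure assumes the energy is spread evenly across all reads.

```python
# Figures quoted in the project description for the benchmark task.
RUNTIME_HOURS = 28.0      # wall-clock time on one HPC node
ENERGY_KWH = 11.2         # energy consumed by that run
NUM_READS = 500_000       # reads in the test scenario
TARGET_REDUCTION = 0.95   # assumption: "over 95%" treated as exactly 95%

JOULES_PER_KWH = 3.6e6    # 1 kWh = 3.6 million joules


def average_power_watts(energy_kwh: float, hours: float) -> float:
    """Average power draw in watts implied by an energy total and a runtime."""
    return energy_kwh / hours * 1000.0


def energy_per_read_joules(energy_kwh: float, reads: int) -> float:
    """Energy per read in joules, assuming an even split across reads."""
    return energy_kwh * JOULES_PER_KWH / reads


def energy_after_reduction(energy_kwh: float, reduction: float) -> float:
    """Energy remaining (kWh) after a fractional reduction."""
    return energy_kwh * (1.0 - reduction)


if __name__ == "__main__":
    power = average_power_watts(ENERGY_KWH, RUNTIME_HOURS)
    per_read = energy_per_read_joules(ENERGY_KWH, NUM_READS)
    reduced = energy_after_reduction(ENERGY_KWH, TARGET_REDUCTION)
    print(f"Average node power:   {power:.0f} W")
    print(f"Energy per read:      {per_read:.2f} J")
    print(f"Energy at 95% saving: {reduced:.2f} kWh")
```

On these assumptions the node averages about 400 W over the run, each read costs roughly 81 J, and a 95% reduction would bring the task down to about 0.56 kWh.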
The technical aim of the project is to demonstrate that
The test scenario uses a "BLAST"-type process that takes 500k reads (input samples) from the Human Microbiome Project Mock Community, each consisting of 301 base pairs, and compares them against a sequence database of 20 types of gut bacteria (64 million base pairs). The task currently runs on a single EI HPC node (16 cores and 128 GB RAM), taking 28 hours and consuming 11.2 kWh of energy. Thousands of these BLAST-type tasks can be run at EI (or other genomics institutes and labs) in a given year. A sequence match in the database is determined by direct comparison (and some statistical scoring), allowing each read to be associated with one or more locations in the genomes of the bacteria types.
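The idea of matching by direct comparison can be sketched in a few lines. This is a toy illustration only: it is neither the BLAST algorithm nor the Optalysys method, the function names are hypothetical, and a simple identity threshold (fraction of matching bases) stands in for the statistical scoring mentioned above.

```python
from typing import List, Tuple


def count_matches(read: str, window: str) -> int:
    """Number of positions at which two equal-length sequences agree."""
    return sum(1 for a, b in zip(read, window) if a == b)


def search(read: str, reference: str,
           min_identity: float = 0.9) -> List[Tuple[int, float]]:
    """Slide `read` along `reference` one base at a time and return
    (offset, identity) pairs whose fraction of matching bases is at
    least `min_identity`. The threshold is an illustrative assumption."""
    hits: List[Tuple[int, float]] = []
    n = len(read)
    if n == 0:
        return hits
    for offset in range(len(reference) - n + 1):
        identity = count_matches(read, reference[offset:offset + n]) / n
        if identity >= min_identity:
            hits.append((offset, identity))
    return hits


if __name__ == "__main__":
    reference = "ACGTACGTTTGACCA"   # made-up sequence, not real data
    print(search("TTGAC", reference, min_identity=1.0))  # exact match at offset 8
    print(search("TTGAG", reference, min_identity=0.8))  # one mismatch tolerated
```

A brute-force scan like this performs on the order of (read length × database length) comparisons per read, which hints at why a workload of 500k reads against 64 million base pairs is computationally demanding.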
This process will be adapted to run on a custom-built version of the Optalysys optical processing technology, based
Environmental - HPC systems consume vast amounts of power and generate significant heat, e.g. the world's fastest supercomputer, the Tianhe-2, uses 24 MW of power and costs $21m/year to run. A comparable optical "supercomputer" based on Optalysys technology will run from a standard mains supply at 4 orders of magnitude less power, as it is based on low-power liquid crystal and laser devices whilst
Sequence searches (such as BLAST) are used in numerous projects that provide an early response to plant and tree pathogens. We have been heavily involved in recent national efforts to respond quickly to the dieback threat that is severely damaging UK Ash trees. A successful project will result in an energy-efficient BLAST-like searching technology that opens up analysis to many more researchers who do not necessarily have access to large HPC resources or
Following a successful project and subsequent product development and commercial launch, Optalysys expects to create significant UK job opportunities by 2020.