RAMPART is a configurable pipeline for de novo assembly of DNA sequence data. RAMPART is not a de novo assembler.
There are already many freely available assembly tools. However, few will produce a good assembly, suitable for annotation and downstream analysis, first time around. The reason is that genome assembly of non-model organisms is often complex and involves tuning parameters and, potentially, pre- and post-processing of the assembly. There are many combinations of tools that could be tried and no clear way of knowing a priori which will work best.
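To give a rough sense of how quickly the number of candidate runs grows, the sketch below counts the combinations for a small, made-up set of choices. The stage names, tool names and k-mer values are hypothetical placeholders chosen for illustration; they are not RAMPART options or defaults.

```python
from itertools import product

# Hypothetical choices at each stage (placeholders, not RAMPART settings).
preprocessing = ["none", "quality_trim", "error_correct"]
assemblers = ["assembler_a", "assembler_b", "assembler_c"]
kmer_sizes = [31, 41, 51, 61, 71]
postprocessing = ["none", "scaffold", "scaffold_and_gap_close"]


def candidate_runs():
    """Return every combination of one choice per stage."""
    return list(product(preprocessing, assemblers, kmer_sizes, postprocessing))


runs = candidate_runs()
# 3 pre-processing options x 3 assemblers x 5 k-mer sizes x 3 post-processing options
print(len(runs))  # 135
```

Even with only a handful of options per stage, there are far more combinations than is practical to run and compare by hand.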
RAMPART makes use of tried and tested tools for read pre-processing, assembly and assembly improvement, and allows the user to configure these tools and specify how they should be executed in a single configuration file. RAMPART also provides options for comparing and analysing sequence data and assemblies.
This functionality means that RAMPART can be used for at least four different purposes:
- Analysing sequencing data and understanding novel genomes
- Comparing and testing different assemblers and related tools on known datasets
- Running an automated pipeline for de novo assembly projects
- Providing a single common interface to a number of different assembly tools
The intention is that RAMPART gives the user the possibility of producing a decent assembly that is suitable for distribution and downstream analysis. In practice, most assembly projects are complicated by the following variables:
- sequencing quality
- sequencing depth
- read length
- read insert size
- genome size
- genome ploidy
- genome repetitiveness
RAMPART enables a bioinformatician to get a reasonable assembly, given the constraints just mentioned, with minimal effort. In many cases, particularly for organisms with haploid genomes or relatively simple (i.e. not too heterozygous and not too repetitive) diploid genomes, where appropriate sequencing has been conducted, RAMPART can produce an assembly that is suitable for annotation and downstream analysis.
RAMPART is designed with High Performance Computing (HPC) resources in mind. Currently, the LSF and PBS schedulers are supported, and RAMPART can execute jobs in parallel over many nodes if requested. That said, RAMPART can also be told to run all parts of the pipeline in sequence on a regular server, provided enough memory is available for the job in question.