How our tools can help you: Mikado
Producing scientific data on our scale also requires a host of software tools to decipher the results and pinpoint the information needed to answer ambitious biological questions.
Have you often wanted to know the best approach before you start
The open-source bioinformatics tool can remove redundancies and select the best transcript models according to user-defined metrics while solving common genetic
What are Mikado's top three features?
Lightweight – designed to use as few computing resources as possible. Mikado scales with the size of the dataset, an essential property given that it was created for heavy-duty analysis of one of the most complex genomes: wheat.
Flexible – can integrate data from multiple sources. As long as the source is formatted in one of the three most common file formats used in genomics (GTF, GFF3
Why the name, 'Mikado'?
The name references the eponymous game, 'Mikado', where the goal is to find and pick up the highest-scoring sticks without touching or moving the others in the bunch. This is conceptually similar to the job done by my programme: given a bunch of transcripts (the 'sticks'), it assesses their quality (the coloured bands on the sticks), selects the highest-scoring ones and picks them up while discarding the rest, just as a Mikado player picks up the highest-scoring stick and leaves the others on the table.
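The stick-picking idea can be sketched in a few lines of Python. This is a simplified illustration, not Mikado's actual implementation: it assumes each transcript already carries a single precomputed quality score (the 'coloured band'), and it defines a locus simply as a cluster of transcripts overlapping on the same chromosome and strand, whereas Mikado's own locus definition and scoring are configurable and more elaborate. The `Transcript` class and the `cluster_loci` and `pick_sticks` functions are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    """Hypothetical minimal transcript record: just enough to illustrate selection."""
    tid: str
    chrom: str
    strand: str
    start: int     # 1-based, inclusive
    end: int       # 1-based, inclusive
    score: float   # precomputed quality score (the stick's 'coloured band')


def cluster_loci(transcripts):
    """Group transcripts that overlap on the same chromosome and strand into loci."""
    loci = []
    for t in sorted(transcripts, key=lambda x: (x.chrom, x.strand, x.start, x.end)):
        last = loci[-1] if loci else None
        if (last is not None
                and (t.chrom, t.strand) == (last[0].chrom, last[0].strand)
                and t.start <= max(x.end for x in last)):
            last.append(t)    # overlaps the current locus: same bunch of sticks
        else:
            loci.append([t])  # no overlap: start a new locus
    return loci


def pick_sticks(transcripts):
    """Keep the highest-scoring transcript of each locus and discard the rest."""
    return [max(locus, key=lambda x: x.score) for locus in cluster_loci(transcripts)]


if __name__ == "__main__":
    models = [
        Transcript("a1", "chr1", "+", 100, 900, 0.72),
        Transcript("a2", "chr1", "+", 150, 880, 0.91),
        Transcript("b1", "chr1", "+", 2000, 2600, 0.40),
    ]
    print([t.tid for t in pick_sticks(models)])  # ['a2', 'b1']
```

Here `a1` and `a2` overlap, so only the better of the two survives, while `b1` sits alone in its own locus and is kept.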
Also, similar to the Mikado game, it is possible to use sticks you have already retrieved as levers to obtain others still on the table; in the
How is Mikado invaluable to scientific research?
Mikado was used in the wheat genome project to provide the transcriptomic backbone of the wheat genome annotation. It was essential for cutting through the noise of twelve million potential transcripts before the ab initio genome annotation process began.
This helped in two ways: firstly, the accuracy of the dataset was increased by removing many fragments, chimeras, and assorted
Secondly, the final Mikado dataset was more than 30 times smaller, shrinking from 12 million to approximately 350,000 transcripts. This massive reduction in dataset size was instrumental in making the subsequent steps computationally feasible.
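Using the figures above, the size of the reduction works out as:

```latex
\frac{12\,000\,000\ \text{transcripts}}{350\,000\ \text{transcripts}} \approx 34
```

that is, roughly a 34-fold reduction, in line with the approximate figure of 30 quoted above.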
Mikado was also used in the parallel sequencing and assembly effort coordinated by the International Wheat Genome Sequencing Consortium (IWGSC). In this project, we exploited Mikado's flexibility to perform a different task: rather than integrating multiple transcriptome assemblies, we used it to compare and select the best gene models from different genome annotation pipelines.
For each gene locus, Mikado compared the two annotations and assessed how well each fitted the external validating data, such as the degree of support from Illumina or PacBio data, or homology to known genes. The resulting integrated annotation is the backbone of the current gene annotation for the IWGSC genome assembly.
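The per-locus comparison can be pictured as scoring each competing model against the external evidence and keeping the better fit. The sketch below assumes the evidence has already been summarised as three fractions per model (intron support from Illumina, coverage by PacBio reads, homology to known proteins) and combines them with illustrative weights. The `GeneModel` fields, the `WEIGHTS` values and the function names are all hypothetical; Mikado's real scoring is defined by the user and uses its own metrics.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class GeneModel:
    """One candidate model at a locus, with precomputed evidence (hypothetical fields)."""
    model_id: str
    pipeline: str
    intron_support: float      # fraction of introns confirmed by Illumina junctions (0-1)
    long_read_support: float   # fraction of the model covered by PacBio reads (0-1)
    homology: float            # e.g. coverage of the best hit to a known protein (0-1)


# Illustrative weights only, chosen for this example.
WEIGHTS = {"intron_support": 0.4, "long_read_support": 0.3, "homology": 0.3}


def evidence_score(model: GeneModel) -> float:
    """Weighted sum of the evidence metrics for one model."""
    return sum(weight * getattr(model, metric) for metric, weight in WEIGHTS.items())


def choose_per_locus(candidates: dict[str, list[GeneModel]]) -> dict[str, GeneModel]:
    """For each locus, keep whichever pipeline's model best fits the external evidence."""
    return {locus: max(models, key=evidence_score) for locus, models in candidates.items()}


if __name__ == "__main__":
    candidates = {
        "locus_1": [
            GeneModel("p1.g1", "pipeline_A", 1.0, 0.8, 0.9),
            GeneModel("p2.g7", "pipeline_B", 0.5, 0.9, 0.6),
        ],
    }
    print(choose_per_locus(candidates)["locus_1"].model_id)  # p1.g1
```

A weighted sum keeps the trade-offs explicit: raising the homology weight, for instance, would favour models with strong protein matches even where RNA-Seq support is patchy.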
What impact has it had on the bioinformatics community?
Apart from the above-mentioned wheat projects, Mikado has been used in the Fraxinus project, which was published in Nature, and we are currently supporting other researchers through our GitHub page.
For example, we helped Dr Torresen, based at the University of Oslo, to use Mikado on a fish species of interest, cod, as well as to annotate the wild relative of potato Solanum
How does Mikado differ from other similar bioinformatics software?
There are not many tools that do what Mikado does. The operations performed by Mikado are usually carried out later, in conjunction with ab initio predictors, whereas we act earlier, directly on the transcript assemblies.
A key feature of Mikado is that we don’t rely on transcriptomic data alone, which is the approach favoured by current tools, but instead integrate data from multiple sources: homology, junction analysis and ORF calling.
This additional information allows us to define loci correctly at little additional computational cost. It also allows us to identify and remove blatant artefacts that are present in the output of many tools, such as chimeric transcripts that contain more than one ORF.
These artefacts are common in the output of many RNA-Seq-centric tools, which typically ignore the coding information of the transcripts they assemble, and they can wreak havoc on downstream analyses.
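To make the 'more than one ORF' signal concrete, here is a deliberately naive sketch that scans a transcript sequence for ORFs and flags it when two or more non-overlapping ORFs are found. It is not how Mikado calls ORFs: it looks only at the forward strand, requires an ATG start and an in-frame stop, and the 100-codon minimum is an arbitrary assumption. In practice ORF calling is usually delegated to a dedicated ORF predictor. The function names `find_orfs` and `looks_chimeric` are hypothetical.

```python
from __future__ import annotations

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int]]:
    """Naive forward-strand ORF scan: ATG to the first in-frame stop, stop codon included.

    Returns 0-based, half-open (start, end) intervals sorted by start.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            j = i + 3
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                j += 3
            if j + 3 > len(seq):
                break  # no in-frame stop left in this frame: ignore the open ORF
            if (j + 3 - i) // 3 >= min_codons:
                orfs.append((i, j + 3))
            i = j + 3  # resume scanning after this ORF's stop codon
    return sorted(orfs)


def looks_chimeric(seq: str, min_codons: int = 100) -> bool:
    """Flag a transcript with two or more non-overlapping ORFs as a possible chimera."""
    count, last_end = 0, -1
    # Greedy interval scheduling by end coordinate finds the maximum number of disjoint ORFs.
    for start, end in sorted(find_orfs(seq, min_codons), key=lambda orf: orf[1]):
        if start >= last_end:
            count += 1
            last_end = end
    return count >= 2
```

For a transcript built by fusing two short coding sequences, `looks_chimeric(seq, min_codons=5)` returns `True`, while either coding sequence on its own returns `False`. A real pipeline would then split such a transcript or discard it, ideally after checking the candidate ORFs against homology data.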
How does Mikado work closely with the RNA-Seq bioinformatics tool Portcullis?
In all the projects cited above, Portcullis was used as a companion to Mikado. Our transcript assembly and evaluation pipeline, ‘Daijin’, which is bundled with Mikado, explicitly includes Portcullis.
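Portcullis filters the splice junctions found in RNA-Seq alignments, separating genuine junctions from spurious ones. One natural way to use that output when judging transcripts is to ask what fraction of each transcript's introns appears in the validated set. The sketch below illustrates that idea under stated assumptions and is not the Daijin or Mikado code: the `Junction` type, the function names and the 1-based inclusive coordinate convention are hypothetical, and loading the junctions from a Portcullis output file is not shown.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Junction:
    """An intron, in assumed 1-based inclusive coordinates (hashable so it fits in a set)."""
    chrom: str
    strand: str
    start: int
    end: int


def introns_of(chrom: str, strand: str, exons: list[tuple[int, int]]) -> list[Junction]:
    """Derive introns from 1-based, inclusive exon coordinates."""
    ordered = sorted(exons)
    return [
        Junction(chrom, strand, left_end + 1, right_start - 1)
        for (_, left_end), (right_start, _) in zip(ordered, ordered[1:])
    ]


def proportion_verified(chrom: str, strand: str, exons: list[tuple[int, int]],
                        verified: set[Junction]) -> float:
    """Fraction of a transcript's introns found in the validated junction set."""
    introns = introns_of(chrom, strand, exons)
    if not introns:
        return 0.0  # single-exon transcript: nothing to verify (a choice made for this sketch)
    return sum(j in verified for j in introns) / len(introns)
```

A transcript with exons at 100-200, 301-400 and 501-600 has two introns; if only the first is in the validated set, `proportion_verified` returns 0.5. A score like this can then feed into the kind of ranking used to pick the best transcript at each locus.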
How would you sum Mikado up in one sentence?
“Hunt for the transcripts you need and want, in a transparent and reproducible way.”