Pipeline to automate bacterial genome assembly

22 September 2021

A QCIF bioinformatician and University of Queensland researchers have developed a workflow to automate high-quality complete bacterial genome construction.


MicroPIPE is an easy-access, reproducible, end-to-end bacterial genome assembly pipeline using sequence data from Oxford Nanopore Technologies (ONT) in combination with Illumina.


ONT long-read sequencing has become a popular platform for microbial researchers worldwide due to its accessibility and affordability. MicroPIPE aims to help researchers by streamlining the more challenging aspects of analysing ONT data.


QCIF’s Valentine Murigneux co-led the microPIPE project alongside Dr Leah Roberts, who was a postdoctoral researcher in UQ’s Centre for Clinical Research (CCR) when the project began and is now at the European Bioinformatics Institute (UK). The project also involved other collaborators from CCR and UQ’s School of Chemistry and Molecular Biosciences (SCMB).


A paper about their work was published last month in the journal BMC Genomics.


Valentine said: “MicroPIPE incorporates the best-performing bioinformatics tools at each step of the genome reconstruction.


“A lot of tools for genome assembly have been developed and are regularly updated, which makes it difficult for researchers to decide which ones to use. MicroPIPE reduces indecision during that process.”


Valentine worked on the project as part of her embedded position with Associate Professor Scott Beatson’s lab at SCMB.


“When we began there was no simple-to-use, end-to-end assembly software optimised for bacterial genome assembly,” said Scott.


Testing of microPIPE on publicly available data demonstrated that complete, circularised chromosomes and plasmids could be reconstructed without manual intervention.


“The improvement in ONT data quality over the last few years has been nothing short of remarkable,” said Scott. “Although we found the best assemblies were achieved by combining ONT and Illumina data, ONT data alone will be sufficient for high-quality complete genomes in the near future.”


Another important feature of the pipeline is its modularity: microPIPE was built in modules using Singularity container images and the bioinformatics workflow manager Nextflow, allowing changes and adjustments to be made in response to future tool development.


“In the paper, we show that we can rerun the pipeline on previous data sets after updating certain software, and that the results were improved,” said Valentine.

 

The pipeline is suitable for both GPU- and CPU-based high-performance computers.


MicroPIPE was part of a larger Queensland Genomics project on using whole genome sequencing to track, treat and prevent hospital-acquired infections.


The microPIPE project was supported by funding from Queensland Genomics (formerly Queensland Genomics Health Alliance).

