Title: Ultraplexing – a method for more efficient bacterial whole-genome sequencing
ID: 19/MSV
Type: Abstract author
Session: Workshop 04
Molecular Epidemiology of Infectious Diseases (StAG RK, FG MS)

Presenter: Sebastian Meyer (Düsseldorf)

Abstract - Text

Background: Accurate and comprehensive whole-genome sequences are a prerequisite for studying bacterial phenotype/genotype associations. The current gold standard is the combination of short-read (e.g. Illumina technology) and long-read (e.g. Oxford Nanopore Technologies, ONT) sequencing data. Generating long-read data with ONT for a large number of samples is expensive, since a maximum of 12 samples can be multiplexed by barcoding, which does not use flow-cell capacity efficiently.

Aim: We aimed to overcome the limits of ONT barcoding in order to generate cost-efficient hybrid assemblies. We developed a new bioinformatic tool, called ultraplexer, that matches non-barcoded long reads to the corresponding barcoded short reads based on k-mer frequencies. This allows flow-cell capacity to be used more efficiently.
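To illustrate the general idea of k-mer based read matching, the following is a minimal sketch and not the ultraplexer implementation. It assumes a simplified scoring rule (the fraction of a long read's k-mers that also occur in a sample's short reads) and ignores sequencing errors, reverse-complement strands and k-mer abundance. The function names `build_sample_index` and `assign_long_read`, the toy sequences and the choice of k are hypothetical.

```python
def kmers(seq, k):
    """Yield all overlapping k-mers of a sequence."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_sample_index(short_reads_by_sample, k):
    """Map each sample ID to the set of k-mers seen in its short reads."""
    return {
        sample: {km for read in reads for km in kmers(read, k)}
        for sample, reads in short_reads_by_sample.items()
    }


def assign_long_read(long_read, sample_index, k):
    """Return the sample whose short-read k-mers cover the long read best.

    Returns None if the read is shorter than k or shares no k-mer
    with any sample. Ties are broken by dictionary order (assumption).
    """
    read_kmers = list(kmers(long_read, k))
    if not read_kmers:
        return None
    scores = {
        sample: sum(km in sample_kmers for km in read_kmers) / len(read_kmers)
        for sample, sample_kmers in sample_index.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


# Toy data: two barcoded short-read samples and non-barcoded long reads.
short_reads = {
    "sample_A": ["ACGTACGTAC", "GTACGGATCC"],
    "sample_B": ["TTGACCATGA", "CCATGAGGTT"],
}
index = build_sample_index(short_reads, k=5)
assignment_a = assign_long_read("ACGTACGTACGG", index, k=5)
assignment_b = assign_long_read("TTGACCATGAGG", index, k=5)
```

On real data, k would be much larger and the score would need to tolerate the higher error rate of long reads; closely related genomes share most k-mers, which makes this kind of assignment harder.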

Material and Methods: The performance of the ultraplexer was evaluated on two simulated datasets and one experimental dataset, each containing long- and short-read sequencing data. Datasets were simulated for ten different bacterial species of relevance to clinical microbiology and for five sets of 10-50 S. aureus genomes (NCBI RefSeq Database). The experimental dataset contained 10 S. aureus isolates. Following read allocation by ultraplexing, a hybrid assembly was performed with Unicycler. For each experiment, we assessed accuracy at the level of correctly classified reads and at the level of assemblies (number of contigs; mean base-pair accuracy; coverage).
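The two kinds of accuracy measure mentioned above can be sketched as follows. This is an illustrative calculation only, assuming that read-level accuracy is the fraction of long reads assigned to their true sample and that base-pair accuracy is reported as SNPs per megabase of aligned sequence; the function names and the toy values are hypothetical and are not taken from the study.

```python
def read_classification_accuracy(true_labels, predicted_labels):
    """Fraction of reads whose predicted sample equals the true sample."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one read is required")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def snps_per_megabase(snp_count, aligned_bases):
    """Number of SNPs normalized to one million aligned bases."""
    if aligned_bases <= 0:
        raise ValueError("aligned_bases must be positive")
    return snp_count * 1_000_000 / aligned_bases


# Toy example: four reads, one of them assigned to the wrong sample.
accuracy = read_classification_accuracy(
    ["A", "A", "B", "B"],
    ["A", "B", "B", "B"],
)
# Toy example: 5 SNPs over 2,000,000 aligned bases.
snp_rate = snps_per_megabase(5, 2_000_000)
```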

Results: Using the simulated dataset of ten different bacterial species, 100% of the reads were classified correctly. Each assembly consisted of one contig, had a base-pair accuracy of 13.58 single-nucleotide polymorphisms (SNPs) per megabase (Mb), and completely covered the original genome. When using the simulated set of ten S. aureus genomes, 96% of the reads (mean) were classified correctly. Each assembly had one contig, a base-pair accuracy of 12.85 SNPs per Mb, and completely covered the original genome. In the datasets with up to 50 different S. aureus genomes, about 40% of the reads were classified correctly, and a single contig was obtained in 97% of assemblies. Base-pair accuracy was 12.44 SNPs per Mb, with a coverage of nearly 100%. The assemblies of the experimental dataset had 1.8 contigs (mean), and the longest contig reached an average length of 2.84 Mb.

Conclusion: We developed a method (ultraplexing) that allows more efficient sequencing for hybrid assemblies and maximizes the use of ONT flow-cell capacity. Sequencing thus becomes faster and less expensive.