Background

Long sequencing reads are information-rich: they aid de novo assembly and reference mapping, and consequently have great potential for the study of microbial communities. However, the best approaches for analysing long-read metagenomic data are unknown. Additionally, rigorous evaluation of bioinformatics tools is hindered by a lack of long-read data from validated samples of known composition.

Methods

We sequenced two commercially available mock communities containing ten microbial species (ZymoBIOMICS Microbial Community Standards) with Oxford Nanopore GridION and PromethION. Both communities and the ten individual species isolates were also sequenced with Illumina technology.

Data

We generated 14 Gbp and 16 Gbp from two GridION flowcells, and 150 Gbp and 153 Gbp from two PromethION flowcells, for the evenly distributed and log-distributed communities respectively. Read length N50 ranged between 5.3 Kbp and 5.4 Kbp over the four sequencing runs. Basecalls and corresponding signal data are made available (4.2 TB in total).
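For readers who want to recompute the read length N50 quoted above from a downloaded FASTQ file, the following is a minimal sketch. It assumes the common definition of N50 (the length L such that reads of length L or longer contain at least half of all sequenced bases) and a standard four-line FASTQ layout. The function names are illustrative and are not part of any pipeline described here.

```python
import gzip


def fastq_read_lengths(path):
    """Yield the sequence length of each record in a 4-line FASTQ file.

    Assumes each record occupies exactly four lines (no wrapped sequences).
    Files ending in .gz are read through gzip.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # second line of each record is the sequence
                yield len(line.rstrip("\r\n"))


def read_length_n50(lengths):
    """Return the N50 of an iterable of read lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no read lengths given")
    half_of_bases = sum(ordered) / 2
    running_total = 0
    for length in ordered:
        running_total += length
        if running_total >= half_of_bases:
            return length
```

A call such as `read_length_n50(fastq_read_lengths("reads.fastq.gz"))` (with a hypothetical file name) would give the N50 in bases. Sorting all lengths in memory is simple but can be heavy for the PromethION runs, where a histogram of lengths would be a lighter alternative.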

Results

Alignment to Illumina-sequenced isolates demonstrated the expected microbial species at anticipated abundances, with the limit of detection for the lowest abundance species below 50 cells (GridION). De novo assembly of metagenomes recovered long contiguous sequences without the need for pre-processing techniques such as binning.

Conclusions

We present ultra-deep, long-read nanopore datasets from a well-defined mock community. These datasets will be useful for those developing bioinformatics methods for long-read metagenomics and for the validation and comparison of current laboratory and software pipelines.

Illumina and PacBio draft assemblies

	Nicholls et al.		McIntyre et al.
Organism	Illumina FASTQ	SPAdes Assembly	PacBio RSII Reads	PacBio Sequel Reads	PacBio Assembly
Bacillus subtilis	ERR2935851	Zymo-Isolates-SPAdes-Illumina.fasta (69 MB, `e7b60972c37869e0e91513a06bdf9d33`)	SRR7498042	SRR7415629	bsubtilis_pb.fasta (GitHub)
Cryptococcus neoformans x Cryptococcus deneoformans	ERR2935856		-	-	-
Enterococcus faecalis	ERR2935850		SRR7415622	SRR7415630	efaecalis_pb.fasta (GitHub)
Escherichia coli	ERR2935852		SRR7498041	-	ecoli_pb.fasta (GitHub)
Lactobacillus fermentum	ERR2935857		-	-	-
Listeria monocytogenes	ERR2935854		SRR7415624	SRR7415635	lmonocytogenes_pb.fasta (GitHub)
Pseudomonas aeruginosa	ERR2935853		SRR7498043	-	paeruginosa_pb.fasta (GitHub)
Saccharomyces cerevisiae	ERR2935855		SRR7498048	SRR7415638	scerevisiae_pb.fasta (GitHub)
Salmonella enterica	ERR2935848		SRR7415626	SRR7415636	senterica_pb.fasta (GitHub)
Staphylococcus aureus	ERR2935849		SRR7415627	SRR7415637	saureus_pb.fasta (GitHub)
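The SPAdes assembly file in the table above is listed with an MD5 checksum, so a download can be verified before use. The snippet below is a minimal sketch of that check using only the Python standard library. The helper name `md5_of_file` and the assumption that the file sits in the current working directory are illustrative.

```python
import hashlib

# Checksum as listed in the table for Zymo-Isolates-SPAdes-Illumina.fasta
EXPECTED_MD5 = "e7b60972c37869e0e91513a06bdf9d33"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """True if the file's MD5 digest equals the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

For example, `matches_expected("Zymo-Isolates-SPAdes-Illumina.fasta")` should return `True` for an intact download. A mismatch usually indicates a truncated or corrupted transfer.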

Nanopore sequencing data sets

FAST5 Signal Accession	FASTQ Accession	Sequencer	Community Standard (Lot)	Time (h)	Reads (M)	N50 (Kbp)	Quality (Median Q)
ERR2887847	ERR3152364	GridION	Zymo CS Even (ZRC190633)	48	3.49	5.3	10.3
ERR2887850	ERR3152366	GridION	Zymo CSII Log (ZRC190842)	48	3.67	5.4	9.8
ERR2887848	ERR3152365	PromethION	Zymo CS Even (ZRC190633)	64	35.7	5.4	10.5
ERR2887849	ERR3152365	PromethION	Zymo CS Even (ZRC190633)	Restarts	35.7	5.4	10.5
ERR2887851	ERR3152367	PromethION	Zymo CSII Log (ZRC190842)	64	34.5	5.4	10.7
ERR2887852	ERR3152367	PromethION	Zymo CSII Log (ZRC190842)	Restarts	34.5	5.4	10.7
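As a quick sanity check, the read counts in the table can be combined with the per-run yields quoted in the Data section to estimate mean read length. The sketch below assumes the yields pair with the runs in the order stated there (even community first, then log) and uses the rounded figures as published, so the results are approximate. The dictionary and function names are illustrative.

```python
# (yield in bases, number of reads) per sequencing run, taken from the
# Data section (Gbp) and the table above (reads, in millions).
RUNS = {
    "GridION, Zymo CS Even": (14e9, 3.49e6),
    "GridION, Zymo CSII Log": (16e9, 3.67e6),
    "PromethION, Zymo CS Even": (150e9, 35.7e6),
    "PromethION, Zymo CSII Log": (153e9, 34.5e6),
}


def mean_read_length_kbp(yield_bases, read_count):
    """Mean read length in Kbp from total yield and read count."""
    return yield_bases / read_count / 1e3


if __name__ == "__main__":
    for run, (yield_bases, read_count) in RUNS.items():
        print(f"{run}: {mean_read_length_kbp(yield_bases, read_count):.2f} Kbp")
```

With these rounded inputs the means fall between roughly 4.0 and 4.4 Kbp, below the reported N50 of 5.3 to 5.4 Kbp. That ordering is expected, because N50 is weighted by bases rather than by read count.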

Further information

Please refer to Josh Quick's talk at Genome Science 2018.

License

Data are available under a Creative Commons Attribution 4.0 International (CC BY 4.0) license, i.e. you are free to use the data with attribution.

Acknowledgements

We are grateful to Radoslaw Poplawski (University of Birmingham) for assistance with CLIMB virtual machines and file systems to support this research. We thank Divya Mirrington (Oxford Nanopore Technologies) for advice on PromethION library preparation and sequencing. We thank Hannah McDonnell at Cambridge Biosciences for providing the ZymoBIOMICS Microbial Community Standards. We thank Jared Simpson (Ontario Institute for Cancer Research), Matt Loose (University of Nottingham) and John Tyson (University of British Columbia) for useful discussions and advice. We thank Christopher Mason and Alexa McIntyre (Cornell University) for making PacBio data available ahead of publication.