Loman Lab Mock Community Experiments

Background

Long sequencing reads are information-rich: they aid de novo assembly and reference mapping, and consequently have great potential for the study of microbial communities. However, the best approaches for analysing long-read metagenomic data are unknown. Additionally, rigorous evaluation of bioinformatics tools is hindered by a lack of long-read data from validated samples of known composition.

Methods

We sequenced two commercially available mock communities, each containing ten microbial species (ZymoBIOMICS Microbial Community Standards), on the Oxford Nanopore GridION and PromethION. Both communities and the ten individual species isolates were also sequenced with Illumina technology.

Data

We generated 14 and 16 Gbp from two GridION flowcells and 150 and 153 Gbp from two PromethION flowcells for the evenly distributed and log-distributed communities, respectively. Read length N50 ranged between 5.3 Kbp and 5.4 Kbp over the four sequencing runs. Basecalls and corresponding signal data are made available (4.2 TB in total).
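
For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of how it is commonly computed from a list of read lengths. The function name `n50` and the toy lengths are illustrative assumptions; they are not taken from the pipeline used to produce these datasets.

```python
def n50(lengths):
    """Return the N50 of a collection of read lengths.

    N50 is the length L such that reads of length >= L together
    contain at least half of all sequenced bases.
    """
    if not lengths:
        raise ValueError("no read lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest read downwards, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy example (made-up lengths, not real data): 30 bases in total,
# and the two longest reads (10 + 6) already cover more than half.
toy_lengths = [2, 3, 4, 5, 6, 10]
print(n50(toy_lengths))  # prints 6
```

In practice the lengths would be read from the FASTQ files; this sketch only shows the definition.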

Results

Alignment to Illumina-sequenced isolates demonstrated the expected microbial species at anticipated abundances, with the limit of detection for the lowest abundance species below 50 cells (GridION). De novo assembly of metagenomes recovered long contiguous sequences without the need for pre-processing techniques such as binning.
We present ultra-deep, long-read nanopore datasets from a well-defined mock community. These datasets will be useful for those developing bioinformatics methods for long-read metagenomics and for the validation and comparison of current laboratory and software pipelines.

Illumina and PacBio draft assemblies

| Organism | Illumina FASTQ (Nicholls et al.) | SPAdes Assembly (Nicholls et al.) | PacBio RSII Reads (McIntyre et al.) | PacBio Sequel Reads (McIntyre et al.) | PacBio Assembly (McIntyre et al.) |
|---|---|---|---|---|---|
| Bacillus subtilis | ERR2935851 | Zymo-Isolates-SPAdes-Illumina.fasta (69 MB, e7b60972c37869e0e91513a06bdf9d33) | SRR7498042 | SRR7415629 | bsubtilis_pb.fasta (GitHub) |
| Cryptococcus neoformans x Cryptococcus deneoformans | ERR2935856 | | - | - | - |
| Enterococcus faecalis | ERR2935850 | | SRR7415622 | SRR7415630 | efaecalis_pb.fasta (GitHub) |
| Escherichia coli | ERR2935852 | | SRR7498041 | - | ecoli_pb.fasta (GitHub) |
| Lactobacillus fermentum | ERR2935857 | | - | - | - |
| Listeria monocytogenes | ERR2935854 | | SRR7415624 | SRR7415635 | lmonocytogenes_pb.fasta (GitHub) |
| Pseudomonas aeruginosa | ERR2935853 | | SRR7498043 | - | paeruginosa_pb.fasta (GitHub) |
| Saccharomyces cerevisiae | ERR2935855 | | SRR7498048 | SRR7415638 | scerevisiae_pb.fasta (GitHub) |
| Salmonella enterica | ERR2935848 | | SRR7415626 | SRR7415636 | senterica_pb.fasta (GitHub) |
| Staphylococcus aureus | ERR2935849 | | SRR7415627 | SRR7415637 | saureus_pb.fasta (GitHub) |
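
The 32-character hexadecimal string listed next to the SPAdes assembly looks like an MD5 checksum, although the page does not say so explicitly. Assuming that is the case, a downloaded copy could be checked along the lines below. The helper names `md5_of_file` and `matches_expected` are hypothetical, and the local file path is whatever you saved the download as.

```python
import hashlib

# Value shown next to Zymo-Isolates-SPAdes-Illumina.fasta in the table;
# treating it as an MD5 digest is an assumption.
EXPECTED_DIGEST = "e7b60972c37869e0e91513a06bdf9d33"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_DIGEST):
    """Return True if the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected.lower()
```

For example, `matches_expected("Zymo-Isolates-SPAdes-Illumina.fasta")` would return `True` for an intact download if the listed value is indeed the file's MD5 digest.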

Nanopore sequencing data sets

| FAST5 Signal Accession | FASTQ Accession | Sequencer | Community Standard (Lot) | Time (h) | Reads (M) | N50 (Kbp) | Quality (Median Q) |
|---|---|---|---|---|---|---|---|
| ERR2887847 | ERR3152364 | GridION | Zymo CS Even (ZRC190633) | 48 | 3.49 | 5.3 | 10.3 |
| ERR2887850 | ERR3152366 | GridION | Zymo CSII Log (ZRC190842) | 48 | 3.67 | 5.4 | 9.8 |
| ERR2887848 | ERR3152365 | PromethION | Zymo CS Even (ZRC190633) | 64 | 35.7 | 5.4 | 10.5 |
| ERR2887849 | Restarts | | | | | | |
| ERR2887851 | ERR3152367 | PromethION | Zymo CSII Log (ZRC190842) | 64 | 34.5 | 5.4 | 10.7 |
| ERR2887852 | Restarts | | | | | | |

Further information

Please refer to Josh Quick's talk at Genome Science 2018.

License

Data are available under a Creative Commons Attribution 4.0 International (CC BY 4.0) license, i.e. you are free to use the data with attribution.

Acknowledgements

We are grateful to Radoslaw Poplawski (University of Birmingham) for assistance with CLIMB virtual machines and file systems to support this research. We thank Divya Mirrington (Oxford Nanopore Technologies) for advice on PromethION library preparation and sequencing. We thank Hannah McDonnell at Cambridge Biosciences for providing the ZymoBIOMICS Microbial Community Standards. We thank Jared Simpson (Ontario Institute for Cancer Research), Matt Loose (University of Nottingham) and John Tyson (University of British Columbia) for useful discussions and advice. We thank Christopher Mason and Alexa McIntyre (Cornell University) for making PacBio data available ahead of publication.