Background
Long sequencing reads are information-rich, aiding de novo assembly and reference mapping, and consequently have great potential for the study of microbial communities. However, the best approaches for analysis of long-read metagenomic data are unknown. Additionally, rigorous evaluation of bioinformatics tools is hindered by a lack of long-read data from validated samples with known composition.
Methods
We sequenced two commercially available mock communities containing ten microbial species (ZymoBIOMICS Microbial Community Standards) with Oxford Nanopore GridION and PromethION. Both communities and the ten individual species isolates were also sequenced with Illumina technology.
Data
We generated 14 and 16 Gbp from two GridION flowcells and 150 and 153 Gbp from two PromethION flowcells for the evenly distributed and log-distributed communities, respectively. Read length N50 ranged between 5.3 Kbp and 5.4 Kbp over the four sequencing runs. Basecalls and corresponding signal data are made available (4.2 TB in total).
Results
Alignment to Illumina-sequenced isolates demonstrated the expected microbial species at anticipated abundances, with the limit of detection for the lowest-abundance species below 50 cells (GridION). De novo assembly of metagenomes recovered long contiguous sequences without the need for pre-processing techniques such as binning.
We present ultra-deep, long-read nanopore datasets from a well-defined mock community. These datasets will be useful for those developing bioinformatics methods for long-read metagenomics and for the validation and comparison of current laboratory and software pipelines.
Illumina and PacBio draft assemblies
Organism | Illumina FASTQ (Nicholls et al.) | SPAdes Assembly (Nicholls et al.) | PacBio RSII Reads (McIntyre et al.) | PacBio Sequel Reads (McIntyre et al.) | PacBio Assembly (McIntyre et al.) |
---|---|---|---|---|---|
Bacillus subtilis | ERR2935851 | Zymo-Isolates-SPAdes-Illumina.fasta (69 MB, e7b60972c37869e0e91513a06bdf9d33) | SRR7498042 | SRR7415629 | bsubtilis_pb.fasta (GitHub) |
Cryptococcus neoformans x Cryptococcus deneoformans | ERR2935856 | as above | - | - | - |
Enterococcus faecalis | ERR2935850 | as above | SRR7415622 | SRR7415630 | efaecalis_pb.fasta (GitHub) |
Escherichia coli | ERR2935852 | as above | SRR7498041 | - | ecoli_pb.fasta (GitHub) |
Lactobacillus fermentum | ERR2935857 | as above | - | - | - |
Listeria monocytogenes | ERR2935854 | as above | SRR7415624 | SRR7415635 | lmonocytogenes_pb.fasta (GitHub) |
Pseudomonas aeruginosa | ERR2935853 | as above | SRR7498043 | - | paeruginosa_pb.fasta (GitHub) |
Saccharomyces cerevisiae | ERR2935855 | as above | SRR7498048 | SRR7415638 | scerevisiae_pb.fasta (GitHub) |
Salmonella enterica | ERR2935848 | as above | SRR7415626 | SRR7415636 | senterica_pb.fasta (GitHub) |
Staphylococcus aureus | ERR2935849 | as above | SRR7415627 | SRR7415637 | saureus_pb.fasta (GitHub) |
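The SPAdes assembly is listed with a 32-character hexadecimal checksum, so a downloaded copy can be checked against it before use. The following is a minimal sketch that assumes the checksum is an MD5 digest (the table does not name the algorithm) and that the file has been saved locally under its listed name; the function names and the local path are illustrative, not part of the dataset.

```python
import hashlib
from pathlib import Path

# Checksum as printed in the table above.
# Assumption: it is an MD5 digest (32 hex characters); the source does not say so.
EXPECTED_DIGEST = "e7b60972c37869e0e91513a06bdf9d33"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_DIGEST):
    """Compare a file's digest with the expected one, ignoring case and whitespace."""
    return md5_of_file(path) == expected.strip().lower()


if __name__ == "__main__":
    # Hypothetical local path: adjust to wherever the assembly was downloaded.
    local_copy = Path("Zymo-Isolates-SPAdes-Illumina.fasta")
    if local_copy.exists():
        print("checksum OK" if matches_expected(local_copy) else "checksum MISMATCH")
    else:
        print(f"{local_copy} not found; download it first")
```

Reading in chunks keeps memory use flat, which matters little for a 69 MB assembly but would for the much larger read files.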
Nanopore sequencing data sets
FAST5 Signal Accession | FASTQ Accession | Sequencer | Community Standard (Lot) | Time (h) | Reads (M) | N50 (Kbp) | Quality (Median Q) |
---|---|---|---|---|---|---|---|
ERR2887847 | ERR3152364 | GridION | Zymo CS Even (ZRC190633) | 48 | 3.49 | 5.3 | 10.3 |
ERR2887850 | ERR3152366 | GridION | Zymo CSII Log (ZRC190842) | 48 | 3.67 | 5.4 | 9.8 |
ERR2887848 | ERR3152365 | PromethION | Zymo CS Even (ZRC190633) | 64 | 35.7 | 5.4 | 10.5 |
ERR2887849 | Restarts | | | | | | |
ERR2887851 | ERR3152367 | PromethION | Zymo CSII Log (ZRC190842) | 64 | 34.5 | 5.4 | 10.7 |
ERR2887852 | Restarts | | | | | | |
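For readers unfamiliar with the N50 and median Q columns, the sketch below shows one common way to compute a read length N50 and to convert a Phred-scaled quality into an error probability. It illustrates the standard definitions only; the source does not state which tool or exact procedure produced the values in the table, and the function names are illustrative.

```python
def n50(lengths):
    """Read length N50: the length L such that reads of length >= L
    together contain at least half of all sequenced bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("n50 is undefined for an empty list of read lengths")


def phred_to_error(q):
    """Convert a Phred-scaled quality score Q into an error probability 10^(-Q/10)."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    # Toy read lengths in bp (not taken from the datasets).
    toy_lengths = [2, 3, 4, 5, 6, 10]
    print("N50:", n50(toy_lengths))
    # Median Q values from the table above, converted to error probabilities.
    for q in (10.3, 9.8, 10.5, 10.7):
        print(f"Q{q}: error probability ~ {phred_to_error(q):.3f}")
```

On this scale a median Q near 10 corresponds to an error probability of about 0.1. That is only a rough guide, because a median of quality scores is not the same as a mean error rate.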