This TechNote compares key quality parameters of three commonly used sequencing library preparation protocols for shotgun metagenomics: Protocol A, Illumina Nextera DNA Flex (input: 100 ng); Protocol B, Illumina Nextera XT DNA (input: 1 ng); and Protocol C, down-scaled Illumina Nextera XT DNA (input: 0.1 ng). In a recent publication, Hillmann et al. (2018) recommended the down-scaled Nextera XT DNA approach as a cost-efficient protocol for shotgun metagenomics.
Sample and analysis details
To test how well each protocol recovers the theoretical composition of a microbial community standard, we performed three library preparations per protocol using the same standard. We used a commercially available microbial community standard that consists of DNA from ten microbial species (five gram-positive bacteria, three gram-negative bacteria, two fungi) at a known composition. The standard was designed to cover a wide range of genome sizes and GC contents and thus mimics the conditions in more complex communities. In addition, we investigated the repeatability and detectable diversity of all protocols using a complex microbial DNA pool generated from a fecal sample containing stool from ten individuals.
All libraries were sequenced on the Illumina NovaSeq with a read length of 2×100 bp and a sequencing depth of 10 million clusters. Raw data from the standard samples were analyzed by mapping reads against a database containing only the ten microbial species expected to be present in the standard. For the analysis of the complex microbial DNA pool from stool samples, we used CeGaT’s standard bioinformatics pipeline, which maps all reads against an extensive database covering thousands of species (NCBI RefSeq).
As shown in figure 1, all protocols captured every microbial species present in the standard. However, the protocols differed in how accurately they represented the community composition (i.e. the theoretical relative sequence abundance of the detected microbial species). The theoretical composition was determined most accurately with protocol A (figure 1 and table 1), which showed the lowest Bray-Curtis dissimilarity values and the highest correlation coefficients (Pearson’s r and Lin’s concordance correlation coefficient).
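For readers who want to apply the same three metrics to their own data, the following is a minimal sketch in plain Python. It is not the code used for this TechNote. The function names are our own, the abundance vectors are made-up placeholder values (not the composition of the standard and not results from table 1), and Lin's coefficient is computed with population (1/n) variance estimators, which is an assumption about the convention.

```python
from math import sqrt


def bray_curtis(p, q):
    """Bray-Curtis dissimilarity: 0 = identical profiles, 1 = no overlap."""
    numerator = sum(abs(a - b) for a, b in zip(p, q))
    denominator = sum(a + b for a, b in zip(p, q))
    return numerator / denominator


def _mean(values):
    return sum(values) / len(values)


def pearson_r(x, y):
    """Pearson correlation coefficient (measures linear association only)."""
    mx, my = _mean(x), _mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def lins_ccc(x, y):
    """Lin's concordance correlation coefficient.

    Unlike Pearson's r, it also penalizes shifts and scale differences
    between observed and expected values. Uses 1/n estimators (assumption).
    """
    n = len(x)
    mx, my = _mean(x), _mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    var_x = sum((a - mx) ** 2 for a in x) / n
    var_y = sum((b - my) ** 2 for b in y) / n
    return 2 * cov / (var_x + var_y + (mx - my) ** 2)


# Hypothetical relative abundances for ten species (placeholder values).
theoretical = [0.12] * 8 + [0.02] * 2
observed = [0.14, 0.10, 0.13, 0.11, 0.12, 0.15, 0.09, 0.12, 0.02, 0.02]

print("Bray-Curtis:", bray_curtis(theoretical, observed))
print("Pearson r:  ", pearson_r(theoretical, observed))
print("Lin's CCC:  ", lins_ccc(theoretical, observed))
```

A lower Bray-Curtis value and correlation coefficients closer to 1 indicate that the measured profile is closer to the expected one. Reporting Lin's coefficient alongside Pearson's r is useful because a protocol with a systematic bias can still show a high Pearson correlation.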