Tech Note
Recent scientific publications demonstrate the need for standardized microbiome analysis workflows that accurately detect all microbes in a sample and thereby improve comparability between studies. Although the two most commonly applied analysis techniques, 16S rRNA gene analysis and shotgun metagenomics, have been optimized and applied for years, no scientific “best practice” consensus has been reached so far. Accordingly, data quality, analysis accuracy, and repeatability depend strongly on the wet lab protocol and analysis workflow chosen by the scientist.
Metagenomic Profiling (shotgun metagenomics)
This Tech Note compares key quality parameters of three commonly used sequencing library protocols for shotgun metagenomics. Protocol A: Illumina Nextera DNA Flex (input: 100 ng). Protocol B: Illumina Nextera XT DNA (input: 1 ng). Protocol C: down-scaled Illumina Nextera XT DNA (input: 0.1 ng). In a recent publication, Hillmann et al. (2018) recommended the down-scaled Nextera XT DNA approach as a cost-efficient protocol for shotgun metagenomics.
Sample and analysis details
To test how well the protocols reproduce the theoretical composition of a microbial community standard, we performed three library preparations per protocol using the same standard. We used a commercially available microbial community standard consisting of DNA from ten microbial species (five gram-positive bacteria, three gram-negative bacteria, two fungi) in a known composition. The standard was explicitly designed to cover a wide range of genome sizes and GC contents and thus mimics the conditions in more complex communities. In addition, we investigated the repeatability and detectable diversity of all protocols using a complex microbial DNA pool generated from a fecal sample containing stool from ten individuals.
All libraries were sequenced on the Illumina NovaSeq with a read length of 2×100 bp and a sequencing depth of 10 million clusters. Raw data of the standard samples were analyzed by mapping against a database containing only the ten microbial species expected in the standard. For the analysis of the complex microbial DNA pool from stool samples, we used CeGaT’s standard bioinformatics pipeline, which maps all reads against an extensive database with thousands of species (NCBI RefSeq).
Results
As shown in figure 1, all protocols captured the microbial species present in the standard. However, the protocols differed in how correctly they represented the community composition (i.e., the theoretical relative sequence abundance of the detected microbial species). Protocol A determined the theoretical composition most accurately (figure 1 and table 1): it showed the lowest Bray-Curtis dissimilarity values and the highest correlation coefficients (Pearson-R and Lin’s Concordance Correlation Coefficient).
Figure 1: Comparison of the results from three different protocols with the theoretical composition in the microbial community standard (Theory). Left: Relative sequence abundances in Theory and Protocols A, B, C. Bars of the protocols A, B, C represent means of three library preparations (n=3). Right: Neighbor-joining tree showing the similarity (Bray-Curtis) between Protocols A, B, C, and the theoretical values (Theory).
Table 1: Correlation coefficients (Pearson-R and Lin’s CCC) originating from correlating the theoretical relative sequence abundances with the relative sequence abundances determined in the protocols A, B, C. Values represent means ± standard deviations of three library preparations (n=3).
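The three agreement measures named above (Bray-Curtis dissimilarity, Pearson-R, and Lin's CCC) can be sketched in a few lines of Python. This is a minimal illustration using the textbook definitions, not the pipeline used for this Tech Note. The abundance values and the variable names theory and observed are hypothetical, and Lin's CCC is assumed to use population (1/n) variances, as in its original definition.

```python
import math

def bray_curtis(p, q):
    """Bray-Curtis dissimilarity between two abundance vectors (0 = identical)."""
    numerator = sum(abs(a - b) for a, b in zip(p, q))
    denominator = sum(a + b for a, b in zip(p, q))
    return numerator / denominator

def _moments(x, y):
    """Means, population variances and population covariance of x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return mx, my, sxx, syy, sxy

def pearson_r(x, y):
    """Pearson correlation: measures linear association only."""
    _, _, sxx, syy, sxy = _moments(x, y)
    return sxy / math.sqrt(sxx * syy)

def lins_ccc(x, y):
    """Lin's concordance correlation: also penalizes shifts in location and scale."""
    mx, my, sxx, syy, sxy = _moments(x, y)
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)

# Hypothetical relative abundances (fractions) for a ten-species standard.
theory = [0.14, 0.14, 0.14, 0.14, 0.12, 0.12, 0.08, 0.08, 0.02, 0.02]
observed = [0.16, 0.13, 0.15, 0.12, 0.11, 0.13, 0.09, 0.07, 0.02, 0.02]

print("Bray-Curtis:", bray_curtis(theory, observed))
print("Pearson-R:  ", pearson_r(theory, observed))
print("Lin's CCC:  ", lins_ccc(theory, observed))
```

Pearson-R stays at 1 when the measured abundances are a linear transformation of the expected ones, whereas Lin's CCC drops as soon as the values deviate from the identity line, which is one reason to report both.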
To determine the repeatability and detectable diversity of the different protocols, we performed three library preparations for each protocol using a complex microbial DNA pool from stool samples. As shown in figure 2, all protocols were able to detect the most common gut microbiome representatives (various Firmicutes and Bacteroidetes members) and covered a wide range of different microbes.
Figure 2: Comparison of three protocols using a complex microbial DNA pool from stool samples. This graph shows the taxonomic composition of all detected microbial taxa with a relative sequence abundance >0.1%. Each node represents a taxonomic unit (e.g. species at the tips), and the size of the circle indicates the mean relative sequence abundance of the respective taxonomic unit across all samples and protocols. Circles highlighted with small letters represent the most abundant genera and species (annotated on the right). Outer rings represent relative sequence abundances for each sample separately. Darker colors indicate higher relative sequence abundance.
To estimate how well the three protocols detected the diversity within a sample (alpha diversity), we calculated the species richness and evenness as well as the Shannon Diversity Index. All of these parameters were slightly higher with protocol A (table 2). Protocol A also showed slightly higher repeatability. Compared with protocols B and C, protocol A had the lowest variation between replicates among the most abundant taxonomic units and also showed the lowest coefficient of variation (CV) when considering all microbial species with a relative sequence abundance >0.01% (figure 3).
Table 2: Alpha diversity values determined from datasets generated with the three different protocols (Protocol A, B, C). Values represent means ± standard deviations of three library preparations (n=3).
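As an illustration of the alpha diversity parameters listed above, the following Python sketch computes species richness, the Shannon Diversity Index, and evenness from a vector of species abundances. It rests on assumptions not stated in this Tech Note: the natural logarithm is used for the Shannon index, evenness is taken to be Pielou's evenness (Shannon index divided by the logarithm of the richness), and the function name alpha_diversity and the example counts are hypothetical.

```python
import math

def alpha_diversity(abundances):
    """Return (richness, Shannon index, Pielou's evenness) for one sample.

    abundances: per-species read counts or relative abundances.
    Species with zero abundance are treated as not detected.
    """
    present = [a for a in abundances if a > 0]
    total = sum(present)
    proportions = [a / total for a in present]

    richness = len(proportions)
    # Shannon index with natural logarithm (assumption).
    shannon = -sum(p * math.log(p) for p in proportions)
    # Pielou's evenness (assumption); defined as 0 for a single species.
    evenness = shannon / math.log(richness) if richness > 1 else 0.0
    return richness, shannon, evenness

# Hypothetical read counts for one library.
counts = [5200, 3100, 900, 450, 200, 100, 40, 10, 0, 0]
richness, shannon, evenness = alpha_diversity(counts)
print(richness, shannon, evenness)
```

Richness counts every detected species equally, so it responds most strongly to low-abundance taxa, while the Shannon index and evenness also reflect how abundances are distributed across species.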
Figure 3: Comparison of three protocols using a complex microbial DNA pool from stool. This heatmap shows the variation between replicates within each protocol for the most abundant taxonomic units at the phylum, class, order, family, genus, and species level. Colors are scaled row-wise to allow each taxonomic unit to be evaluated separately. The darker the color of a field, the higher the deviation of this value from the mean of all three samples of the respective protocol. Numbers in the fields represent the relative sequence abundance. Coefficients of variation (CV) are mean values of all microbial species with a relative sequence abundance >0.01%.
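The mean coefficient of variation described in the caption above could be computed along the following lines. This is a sketch under stated assumptions, not the method used for figure 3: the CV is taken as the sample standard deviation divided by the mean across the replicates of one protocol, the >0.01% threshold is applied to the mean relative abundance across replicates, and the function name mean_cv and the example values are hypothetical.

```python
import statistics

def mean_cv(replicates, min_abundance=0.0001):
    """Mean coefficient of variation across species for one protocol.

    replicates: dict mapping species name -> list of relative abundances
                (fractions, one value per library preparation).
    min_abundance: species at or below this mean abundance are ignored
                   (0.0001 corresponds to 0.01%).
    """
    cvs = []
    for values in replicates.values():
        mean = statistics.mean(values)
        if mean > min_abundance:
            # Sample standard deviation (n - 1) relative to the mean.
            cvs.append(statistics.stdev(values) / mean)
    return statistics.mean(cvs) if cvs else float("nan")

# Hypothetical relative abundances from three library preparations.
protocol = {
    "species_1": [0.210, 0.205, 0.215],
    "species_2": [0.050, 0.048, 0.053],
    "species_3": [0.00002, 0.00001, 0.00003],  # below threshold, ignored
}
print(mean_cv(protocol))
```

Filtering by a minimum abundance keeps very rare species, whose counts fluctuate strongly between libraries, from dominating the average.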