The different analysis pipelines, run on the different read directions, reported markedly different bacterial community compositions (Figure 1). We point out three examples.

First, when using Qiime 1’s closed-reference calling on the H. bilis-positive samples, the choice of read direction led the H. bilis sequences to be identified three different ways: the reverse reads were identified as Helicobacter, the merged reads were identified as Flexispira, and the forward reads were discarded because they did not match any sequence in the reference database. If we had only used the forward reads, which may have been necessary depending on the sequencing platform and chosen primers, we might have concluded that Helicobacter did not engraft in these mice. (The black bars representing discarded sequences in our figure are not present in standard Qiime output files; we added them for emphasis.) We may not have even been aware that a significant portion of sequences were discarded, as it is a common practice for analytical pipelines to silently discard these reads.

The forward and reverse reads are both derived from the same piece of DNA, so why is one assigned to Helicobacter and the other discarded? The answer is subtle. […] H. bilis in the Greengenes reference database, so the sequence similarity between the sample sequence and the most similar Helicobacter database sequence varies depending on the read region. The forward read is only 96.2% similar to the corresponding region of the database sequence. This identity falls below the default 97% cutoff, and the read is discarded. The reverse sequence, on the other hand, is a perfect match to the corresponding region of the database sequence.

Second, when using Qiime 2’s closed-reference calling on the same sequences, the merged read was classified as Helicobacter, rather than the Flexispira classification made by Qiime 1. Although both Qiime 1 and Qiime 2 implement conceptually identical closed-reference calling, Qiime 1 uses one program (uclust) while Qiime 2 uses another (vsearch), each of which utilizes a slightly different search algorithm, leading to different results.

Third, when using Qiime 1’s closed-reference calling on the H. bilis-negative samples, we again observed that the three read regions identified three different taxonomies for the same sequence: forward reads were classified as Lachnospiraceae, reverse reads as Bacillaceae, and merged reads were discarded. In fact, all three read sections are 100% identical to a database entry for Turicibacter, an experimental contaminant correctly identified by most of the other calling methods. Why, if the sequences in the sample perfectly match a database entry for Turicibacter, were those sequences identified as anything else? This is because the database search algorithms used by both Qiime 1 and Qiime 2 (uclust and vsearch) are heuristic, which means that they aim to find some database match better than 97% identity, but not necessarily the best match [8, 9]. (This is also why, in the example above, the merged Helicobacter sequence was identified as Flexispira.) Had we not known this subtlety, we might have erroneously concluded that the experimental contaminant, Turicibacter, was one of the Lachnospiraceae sequences we expected in these mice as part of their defined ASF flora.

Here, we show that implementing different 16S analysis algorithms to profile a commonly used standardized microbial community can lead to drastically different biological interpretations. We illustrate these differences using the popular pipelines, Qiime 1 and Qiime 2, but these general concepts hold true with any 16S analysis pipeline. Rather than say that one analysis approach is better, we merely intend to show that decisions about the bioinformatic pipeline, decisions that might appear innocuous to a novice, can have an enormous impact on the biological interpretation of the results. This variability has two important ramifications.

First, researchers must clearly report their analysis protocols in published work so that individual analyses can be reproduced and the results of different studies can be appropriately compared. This should include information about primer trimming, read merging/joining, denoising, OTU picking, database selection, and taxonomy assignment, specifying software version numbers and when default parameters were used or modified.

Second, wet bench scientists and computational biologists should work together to ensure optimal design of microbiome experiments and interpretation of the resulting sequencing data.
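The article's point about heuristic database searches (accepting some match above the 97% identity cutoff rather than the best match) can be illustrated with a small toy sketch in Python. This is not how uclust or vsearch are implemented; it only contrasts an "accept the first hit above the cutoff" strategy with an exhaustive "best hit" strategy. The function names, the ungapped identity calculation, and the placeholder taxa `taxon_A` and `taxon_B` are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Reference = List[Tuple[str, str]]  # (hypothetical taxon label, reference sequence)


def percent_identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences.

    Toy assumption: sequences are already aligned with no gaps.
    """
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def first_acceptable_hit(read: str, reference: Reference,
                         threshold: float = 0.97) -> Optional[str]:
    """Return the first reference entry at or above the cutoff (heuristic-style)."""
    for taxon, ref_seq in reference:
        if percent_identity(read, ref_seq) >= threshold:
            return taxon
    return None  # read is discarded, as in closed-reference calling


def best_hit(read: str, reference: Reference,
             threshold: float = 0.97) -> Optional[str]:
    """Return the highest-identity reference entry at or above the cutoff."""
    best_taxon, best_score = None, 0.0
    for taxon, ref_seq in reference:
        score = percent_identity(read, ref_seq)
        if score >= threshold and score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon


def mutate(seq: str, positions: List[int]) -> str:
    """Substitute a different base at each given position (test helper)."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    chars = list(seq)
    for i in positions:
        chars[i] = swap[chars[i]]
    return "".join(chars)


# A 100-nt toy read; taxon_A differs at 2 positions (98%), taxon_B is identical.
read = "ACGT" * 25
reference = [("taxon_A", mutate(read, [0, 50])), ("taxon_B", read)]

print(first_acceptable_hit(read, reference))  # taxon_A: good enough, found first
print(best_hit(read, reference))              # taxon_B: the perfect match

# A read with 4 mismatches to its closest entry (96%) falls below the cutoff.
distant_read = mutate(read, [1, 2, 3, 4])
print(first_acceptable_hit(distant_read, reference))  # None: discarded
```

In this sketch the two strategies disagree even though a perfect match exists in the reference, and a read at 96% identity is dropped without any label, which mirrors the kinds of outcomes described for the Turicibacter and forward-read examples.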