Methods

Sequencing workflow

Sample preparation
For library construction, DNA or RNA is extracted from the sample. After quality control (QC), samples that pass proceed to library construction.

Library construction
The sequencing library is prepared by random fragmentation of the DNA or cDNA sample, followed by 5' and 3' adapter ligation. Alternatively, "tagmentation" combines the fragmentation and ligation reactions into a single step, which greatly increases the efficiency of library preparation. Adapter-ligated fragments are then PCR amplified and gel purified.

Sequencing
For cluster generation, the library is loaded onto a flow cell, where fragments are captured on a lawn of surface-bound oligos complementary to the library adapters. Each fragment is then amplified into a distinct, clonal cluster through bridge amplification. When cluster generation is complete, the templates are ready for sequencing. Illumina SBS technology uses a proprietary reversible terminator-based method that detects single bases as they are incorporated into DNA template strands. Because all four reversible terminator-bound dNTPs are present during each sequencing cycle, natural competition minimizes incorporation bias and greatly reduces raw error rates compared to other technologies. The result is highly accurate base-by-base sequencing that virtually eliminates sequence-context-specific errors, even within repetitive sequence regions and homopolymers.

Analysis workflow

Preprocessing

Citation

Taxonomy assignment for reads

Centrifuge v1.0.4 was used to assign taxonomy to the trimmed reads, with the number of primary assignments set to 1. Using in-house scripts, the count and proportion of reads assigned to each taxon were calculated at each taxonomic rank (Kingdom, Phylum, Class, Order, Family, Genus, Species and Subspecies), and the normalized abundance of each taxon at the subspecies, species and genus levels was calculated using the following equation. Relative abundance was then calculated as the ratio of each taxon's abundance to the sum of all abundances.

$Abundance = \frac{Ra}{Rt*G}$

· Ra : Read count assigned to the taxon
· Rt : Total read count
· G : Average genome size of the taxon
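
The normalization above can be sketched as follows. This is a minimal illustration, not the in-house scripts: the function names, taxon labels and numbers are hypothetical, and it assumes per-taxon read counts and average genome sizes are already available.

```python
def normalized_abundance(read_count, total_reads, genome_size):
    """Abundance = Ra / (Rt * G), as in the equation above."""
    return read_count / (total_reads * genome_size)


def relative_abundance(abundances):
    """Divide each taxon's abundance by the sum of all abundances."""
    total = sum(abundances.values())
    return {taxon: value / total for taxon, value in abundances.items()}


# Hypothetical input: taxon -> (assigned reads Ra, average genome size G in bp)
taxa = {
    "taxon_A": (6000, 5_000_000),
    "taxon_B": (2000, 2_500_000),
}
total_reads = 10_000  # Rt (hypothetical)

abundances = {
    name: normalized_abundance(ra, total_reads, g)
    for name, (ra, g) in taxa.items()
}
relative = relative_abundance(abundances)
```

In this made-up example, taxon_A receives three times as many reads as taxon_B but has a genome twice as large, so its normalized abundance is only 1.5 times higher.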

Citation

Kim, D., Song, L., Breitwieser, F. P., & Salzberg, S. L. (2016). Centrifuge: rapid and sensitive classification of metagenomic sequences. Genome research, 26(12), 1721-1729.

Pathway analysis

Fun4Me was used for pathway analysis of the trimmed reads. The Enzyme Commission (EC) numbers retrieved by Fun4Me were collected, and their enrichment was visualized using an in-house script.

Citation

Sharifi, F., & Ye, Y. (2017). From gene annotation to function prediction for metagenomics. In Protein Function Prediction (pp. 27-34). Humana Press, New York, NY.

De novo assembly

Taxonomy assignment for contigs

The assembled contigs were subjected to taxonomy assignment using Centrifuge v1.0.4, with the number of primary assignments set to 1. Using in-house scripts, the count and proportion of contigs assigned to each taxon were calculated at each taxonomic rank (Kingdom, Phylum, Class, Order, Family, Genus, Species and Subspecies), and the normalized abundance of each taxon at the subspecies, species and genus levels was calculated using the following equation.

$Abundance = \frac{Lc*D}{Rt*Rl*G}$

· Lc : Length of the contig
· D : Average mapping depth of the contig
· Rt : Total read count
· Rl : Average read length
· G : Average genome size of the taxon

In detail, the read count assigned to a taxon was estimated as the contig length multiplied by the average mapping depth of the contig and divided by the average read length (Lc*D/Rl). This estimate was then normalized by the average genome size of the taxon and the total read count. Relative abundance was then calculated as the ratio of each taxon's abundance to the sum of all abundances.
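
A minimal sketch of the contig-based calculation for a single contig, using hypothetical function names and example values (not the in-house scripts):

```python
def estimated_read_count(contig_length, mean_depth, mean_read_length):
    """Estimate the reads behind a contig as Lc * D / Rl."""
    return contig_length * mean_depth / mean_read_length


def contig_abundance(contig_length, mean_depth, total_reads,
                     mean_read_length, genome_size):
    """Abundance = (Lc * D) / (Rt * Rl * G), as in the equation above."""
    reads = estimated_read_count(contig_length, mean_depth, mean_read_length)
    return reads / (total_reads * genome_size)


# Hypothetical contig: 30 kb long, 20x average depth, 150 bp average reads
reads = estimated_read_count(30_000, 20, 150)
abundance = contig_abundance(
    contig_length=30_000,
    mean_depth=20,
    total_reads=1_000_000,   # Rt (hypothetical)
    mean_read_length=150,    # Rl (hypothetical)
    genome_size=4_000_000,   # G in bp (hypothetical)
)
```

With these values the contig corresponds to an estimated 4,000 reads, which is then divided by the total read count and the genome size in the same way as the read-based abundance.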

Citation

Kim, D., Song, L., Breitwieser, F. P., & Salzberg, S. L. (2016). Centrifuge: rapid and sensitive classification of metagenomic sequences. Genome research, 26(12), 1721-1729.

Binning for unclassified contigs

Unclassified contigs, which were not assigned to a taxon by Centrifuge, were binned based on their sequence characteristics. CONCOCT v1.1.0 was used to bin the unclassified contigs without a length threshold.

Citation

Alneberg, J., Bjarnason, B. S., De Bruijn, I., Schirmer, M., Quick, J., Ijaz, U. Z., ... & Quince, C. (2014). Binning metagenomic contigs by coverage and composition. Nature methods, 11(11), 1144-1146.

Gene prediction and annotation

Prokka v1.13 was used to predict genes on the contigs with the following options: --compliant --force --rnammer --addgenes --gcode 11 --metagenome. The predicted genes were annotated using eggNOG-mapper v2.0.1 with the eggNOG database v5.0.

Citation

Seemann, T. (2014). Prokka: rapid prokaryotic genome annotation. Bioinformatics, 30(14), 2068-2069.
Huerta-Cepas, J., Forslund, K., Coelho, L. P., Szklarczyk, D., Jensen, L. J., Von Mering, C., & Bork, P. (2017). Fast genome-wide functional annotation through orthology assignment by eggNOG-mapper. Molecular biology and evolution, 34(8), 2115-2122.
Huerta-Cepas, J., Szklarczyk, D., Heller, D., Hernández-Plaza, A., Forslund, S. K., Cook, H., ... & von Mering, C. (2019). eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic acids research, 47(D1), D309-D314.

Pan-genome analysis

Roary v1.007001 was used for pan-genome analysis with the MAFFT v7.427 aligner. The result was visualized as an UpSet plot to show how many genes are shared among, or unique to, the samples.
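
The quantity shown in an UpSet plot is the number of genes found in each exact combination of samples. A minimal sketch of that counting step is given below; it assumes the gene sets per sample have already been extracted from the pan-genome result, and the sample names, gene names and function name are hypothetical.

```python
from collections import Counter


def exclusive_intersections(genes_by_sample):
    """Count genes by the exact set of samples in which they occur."""
    membership = {}
    for sample, genes in genes_by_sample.items():
        for gene in genes:
            membership.setdefault(gene, set()).add(sample)
    # Each gene contributes to exactly one combination of samples.
    return Counter(frozenset(samples) for samples in membership.values())


# Hypothetical gene content of three samples
genes_by_sample = {
    "S1": {"geneA", "geneB", "geneC"},
    "S2": {"geneA", "geneB", "geneD"},
    "S3": {"geneA", "geneE"},
}
counts = exclusive_intersections(genes_by_sample)
```

Here one gene is common to all three samples, one is shared only by S1 and S2, and each sample has one gene of its own; these counts correspond to the bar heights of an UpSet plot.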

Citation

Page, A. J., Cummins, C. A., Hunt, M., Wong, V. K., Reuter, S., Holden, M. T., ... & Parkhill, J. (2015). Roary: rapid large-scale prokaryote pan genome analysis. Bioinformatics, 31(22), 3691-3693.
Katoh, K., & Standley, D. M. (2013). MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Molecular biology and evolution, 30(4), 772-780.
Lex, A., Gehlenborg, N., Strobelt, H., Vuillemot, R., & Pfister, H. (2014). UpSet: visualization of intersecting sets. IEEE transactions on visualization and computer graphics, 20(12), 1983-1992.