Sample Preparation
For library construction, DNA or RNA is extracted from the sample. After quality control (QC), samples that pass proceed to library construction.
Library Construction
The sequencing library is prepared by random fragmentation of the DNA or cDNA sample, followed by 5' and 3' adapter ligation. Alternatively, "tagmentation" combines the fragmentation and ligation reactions into a single step, which greatly increases the efficiency of library preparation. Adapter-ligated fragments are then PCR amplified and gel purified.
Sequencing
For cluster generation, the library is loaded onto a flow cell, where fragments are captured on a lawn of surface-bound oligos complementary to the library adapters. Each fragment is then amplified into a distinct, clonal cluster through bridge amplification. When cluster generation is complete, the templates are ready for sequencing. Illumina SBS technology uses a proprietary reversible terminator-based method that detects single bases as they are incorporated into DNA template strands. Because all 4 reversible terminator-bound dNTPs are present during each sequencing cycle, natural competition minimizes incorporation bias and greatly reduces raw error rates compared to other technologies. The result is highly accurate base-by-base sequencing that virtually eliminates sequence-context-specific errors, even within repetitive sequence regions and homopolymers.
Centrifuge v1.0.4 was used for taxonomic assignment of the trimmed reads, with the number of primary assignments set to 1. The count and proportion of reads assigned to each taxon were calculated at each taxonomic rank (Kingdom, Phylum, Class, Order, Family, Genus, Species and Subspecies) using in-house scripts, and the normalized abundance of each taxon at the subspecies, species and genus levels was calculated using the following equation. After calculating abundance, the relative abundance of each taxon was calculated as the ratio of its abundance to the sum of all abundances.
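The relative abundance step can be illustrated with a minimal Python sketch. It is not the in-house script; the function name and the input dictionary (taxon name to normalized abundance) are hypothetical and chosen only for illustration.

```python
def relative_abundance(abundance):
    """Return each taxon's share of the total abundance.

    `abundance` maps a taxon name to its normalized abundance
    (hypothetical input layout, not the in-house script's format).
    """
    total = sum(abundance.values())
    if total == 0:
        # No signal at this rank: report zero for every taxon.
        return {taxon: 0.0 for taxon in abundance}
    return {taxon: value / total for taxon, value in abundance.items()}


if __name__ == "__main__":
    example = {"taxon_A": 1.0, "taxon_B": 3.0}
    print(relative_abundance(example))
```

The returned values sum to 1 within a rank, so they can be compared across samples with different sequencing depths.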
Citation
Fun4Me was used for pathway analysis of the trimmed reads. The Enzyme Commission (EC) numbers retrieved by Fun4Me were collected, and EC number enrichment was visualized using an in-house script.
Citation
Assembled contigs were subjected to taxonomic assignment using Centrifuge v1.0.4, with the number of primary assignments set to 1. The count and proportion of contigs assigned to each taxon were calculated at each taxonomic rank (Kingdom, Phylum, Class, Order, Family, Genus, Species and Subspecies) using in-house scripts, and the normalized abundance of each taxon at the subspecies, species and genus levels was calculated using the following equation.
Specifically, for each contig, contig length * average mapping depth of the contig / average read length was taken as the number of reads assigned to its taxon. These read counts were normalized by the average genome size of the taxon and the total read count. After calculating abundance, the relative abundance of each taxon was calculated as the ratio of its abundance to the sum of all abundances.
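The contig-based estimate can be sketched in Python as follows. The read-count formula is taken from the description above. The exact form of the normalization is not given in the text, so dividing by the average genome size and then by the total read count is an assumption, and both function names are hypothetical.

```python
def estimated_read_count(contig_length, mean_depth, mean_read_length):
    """Approximate the number of reads behind one contig.

    Follows the description in the text:
    contig length * average mapping depth / average read length.
    """
    return contig_length * mean_depth / mean_read_length


def normalized_abundance(read_count, mean_genome_size, total_reads):
    """Normalize a read count by genome size and sequencing depth.

    ASSUMPTION: the source only says the counts were normalized "using"
    these two quantities; plain division by each is one plausible form.
    """
    return read_count / mean_genome_size / total_reads


if __name__ == "__main__":
    # A 1,000 bp contig at 30x depth with 150 bp reads.
    reads = estimated_read_count(1000, 30, 150)
    print(reads)
    print(normalized_abundance(reads, 4_000_000, 10_000_000))
```

Dividing by genome size keeps taxa with large genomes from appearing more abundant simply because they attract more reads per cell.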
Citation
Unclassified contigs, which were not assigned a taxon by Centrifuge, were binned based on the characteristics of their sequences. CONCOCT v1.1.0 was used to bin the unclassified contigs without a length threshold.
Citation
Prokka v1.13 was used to predict genes on the contigs with the following options: --compliant --force --rnammer --addgenes --gcode 11 --metagenome. The predicted genes were annotated using eggNOG-mapper v2.0.1 based on the eggNOG database v5.0.
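As an illustration, the Prokka invocation could be assembled in Python roughly as below. The option flags are those listed above; the input file name, the output directory, the use of --outdir and the function name are assumptions added for the example, not details from the original analysis.

```python
import shlex


def prokka_command(contigs_fasta, outdir):
    """Build the Prokka argument list with the options given in the text.

    `contigs_fasta` and `outdir` are hypothetical paths; --outdir is
    added here only so the example has a defined output location.
    """
    return [
        "prokka",
        "--compliant",
        "--force",
        "--rnammer",
        "--addgenes",
        "--gcode", "11",
        "--metagenome",
        "--outdir", outdir,
        contigs_fasta,
    ]


if __name__ == "__main__":
    cmd = prokka_command("contigs.fasta", "prokka_out")
    # Print a shell-safe version of the command for inspection.
    print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) would execute it on a system where Prokka is installed.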
Citation
Roary v1.007001 was used for pan-genome analysis with the MAFFT v7.427 aligner. The results of the pan-genome analysis were visualized as an UpSet plot showing how many genes are shared among the samples and how many are specific to individual samples.
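The counts behind an UpSet plot can be sketched in Python as below: each gene is assigned to the exact combination of samples in which it occurs, and the genes per combination are tallied. The input layout (gene name to set of sample names) and the function name are hypothetical, and this is not the script used for the report.

```python
from collections import Counter


def upset_counts(presence):
    """Count genes per exact combination of samples.

    `presence` maps a gene name to the set of samples containing it
    (hypothetical layout). Genes found in no sample are skipped.
    """
    counts = Counter()
    for samples in presence.values():
        if samples:
            # Sort so that the same combination always yields the same key.
            counts[tuple(sorted(samples))] += 1
    return counts


if __name__ == "__main__":
    presence = {
        "geneA": {"S1", "S2"},
        "geneB": {"S1"},
        "geneC": {"S2", "S1"},
    }
    for combination, n_genes in upset_counts(presence).items():
        print(combination, n_genes)
```

Each key corresponds to one bar of the UpSet plot, and the combination containing all samples gives the number of genes common to every sample.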
Citation