Folder Structure


File tree

Data file information

Summary - ASVs (data)
    • QC_statistics.xlsx: Summarizes the results of the analysis, including statistics for the raw data, quality filtering, and chimeric read removal.
    • ASVs_rep.fasta: A FASTA file containing the representative nucleotide sequence of each ASV, labeled with its unique ASV identifier.
    • ASVs_phylo.tre: A phylogenetic tree file (e.g., Newick format) representing the phylogenetic relationships among ASVs.
    • ASV_table.biom: An ASV table in BIOM format, containing the abundance data used for diversity analyses.
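
As an illustration of how the representative-sequence file can be read, the following is a minimal sketch of a FASTA parser that uses only the Python standard library. The record names and sequences in the example are hypothetical placeholders, not values from an actual ASVs_rep.fasta, and the sketch assumes a conventional FASTA layout (a ">" header line followed by one or more sequence lines).

```python
from typing import Dict, Iterable, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each record identifier to its (possibly multi-line) sequence."""
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else ""
            records[current_id] = ""
        elif current_id is not None:
            # Sequence lines are concatenated until the next header.
            records[current_id] += line
    return records


# Hypothetical two-record example; real identifiers and sequences will differ.
example = [">ASV_1", "ACGTAC", "GGTA", ">ASV_2 extra header text", "TTGACC"]
asvs = parse_fasta(example)
for asv_id, seq in asvs.items():
    print(asv_id, len(seq))
```

To read a real file, the same function can be given an open file handle, for example `parse_fasta(open("ASVs_rep.fasta"))`.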

Taxonomy_analysis (*_DB)
    • TAXONOMY_Assignment.xlsx: Taxonomy profile Excel file. See the glossary sheet for a detailed explanation of each item.
    • Taxonomy_Top20_Input.xlsx: This input file gives the abundance ratio of the top 20 taxa used in the graph, with each taxonomy level on its own sheet.
    • ASVs_{Alignment tool}_{Database}.biom: A BIOM table that includes taxonomic assignments for each ASV, based on the alignment tool and database used.

Taxonomy_analysis (Count, Ratio)
    • ASVs_table.*_L*.txt: These files give the ASV counts and ratios separately for each database and taxonomy level.
    • Taxonomy_abundance_count.xlsx: One sheet per taxonomy level, showing the read count of each taxon in each sample.
    • Taxonomy_abundance_ratio.xlsx: One sheet per taxonomy level, showing the relative abundance of each taxon in each sample.
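
The relationship between the count and ratio tables can be sketched as follows: the relative abundance of a taxon in a sample is its read count divided by the total read count of that sample. This is a minimal sketch with hypothetical taxon names and counts. It returns fractions between 0 and 1; whether the delivered files express ratios as fractions or percentages is not stated here.

```python
from typing import Dict


def to_relative_abundance(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert per-taxon read counts for one sample into fractions of the total."""
    total = sum(counts.values())
    if total == 0:
        # A sample with no reads has no defined composition; report zeros.
        return {taxon: 0.0 for taxon in counts}
    return {taxon: n / total for taxon, n in counts.items()}


# Hypothetical counts for a single sample at one taxonomy level.
sample_counts = {"Bacteroides": 600, "Prevotella": 300, "Other": 100}
ratios = to_relative_abundance(sample_counts)
for taxon, ratio in ratios.items():
    print(f"{taxon}\t{ratio:.3f}")
```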

Alpha_Diversity (Community_diversity)
    • Diversity_Index.xlsx: This file provides a single table of the Shannon, Simpson, PD_whole_tree, and ASV values for each sample.
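
For readers who want to see how two of these indices are defined, the following is a minimal sketch of the Shannon and Simpson calculations from the ASV counts of one sample. It assumes one common convention (Shannon entropy with a base-2 logarithm, and Simpson reported as 1 minus the sum of squared proportions); the exact variant used to produce Diversity_Index.xlsx is not stated here, so the values may differ by convention. The counts are hypothetical.

```python
import math
from typing import Sequence


def shannon(counts: Sequence[int], base: float = 2.0) -> float:
    """Shannon entropy H = -sum(p_i * log(p_i)) over ASVs with non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for n in counts:
        if n > 0:
            p = n / total
            h -= p * math.log(p, base)
    return h


def simpson(counts: Sequence[int]) -> float:
    """Simpson index in the form 1 - sum(p_i ** 2)."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in counts)


# Hypothetical sample with four equally abundant ASVs.
even_sample = [25, 25, 25, 25]
print(shannon(even_sample), simpson(even_sample))
```

A perfectly even community gives the maximum Shannon value for its number of ASVs, while a sample dominated by a single ASV gives values close to zero for both indices.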

Beta_Diversity (DistanceMatrix, PCoA, UPGMA_tree)
    • *_PC.xlsx: This file contains the results of the PCoA analysis. It includes the explained variance for each principal coordinate (PC) and the corresponding PC values.
    • *_DistanceMatrix.xlsx: This file contains the distance matrix used in the PCoA analysis. It stores the pairwise distances between samples.
    • UPGMA_*_*.tre: The UPGMA trees (Bray Curtis, Weighted UniFrac, Unweighted UniFrac) are provided in .tre format.
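
To make the distance matrix concrete, the following is a minimal sketch of the Bray Curtis dissimilarity between samples, computed from per-sample abundance vectors. Only Bray Curtis is shown; the UniFrac distances additionally require the phylogenetic tree and are not covered. The sample names and counts are hypothetical, and the sketch is not the pipeline's own implementation.

```python
from typing import Dict, List, Sequence


def bray_curtis(u: Sequence[float], v: Sequence[float]) -> float:
    """Bray Curtis dissimilarity: sum|u_i - v_i| / sum(u_i + v_i)."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    # Two empty samples are treated as identical in this sketch.
    return numerator / denominator if denominator else 0.0


def distance_matrix(samples: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Pairwise dissimilarities between all samples, as a nested dictionary."""
    names = list(samples)
    return {
        a: {b: bray_curtis(samples[a], samples[b]) for b in names}
        for a in names
    }


# Hypothetical abundance vectors over the same three ASVs.
samples = {"S1": [10, 0, 5], "S2": [5, 5, 5], "S3": [0, 10, 0]}
matrix = distance_matrix(samples)
for name, row in matrix.items():
    print(name, [round(row[other], 3) for other in samples])
```

The resulting matrix is symmetric with zeros on the diagonal, which is the form that both PCoA and UPGMA clustering take as input.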

Analysis Workflow



Analysis Tools


  Cutadapt is a software tool that removes unwanted sequences like adapters, primers, and poly-A tails from high-throughput sequencing reads. These sequences can interfere with downstream analysis and are commonly found in small-RNA sequencing and amplicon reads. Cutadapt helps in trimming tasks by identifying and removing these sequences in an error-tolerant way. It can also modify and filter single-end and paired-end reads and demultiplex reads. Cutadapt is available under the MIT license.



  More information can be found here:

https://cutadapt.readthedocs.io/en/v3.2/
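
The idea of error-tolerant trimming can be illustrated with a deliberately simplified sketch. The function below removes a 3' adapter by scanning for the leftmost position where the adapter (or a prefix of it, if the read ends first) matches with at most a given fraction of mismatches. This is not Cutadapt's algorithm: it counts substitutions only and ignores insertions and deletions. The function name, parameter names, and default values are assumptions made for this example, and the reads are hypothetical.

```python
def trim_3prime_adapter(
    read: str,
    adapter: str,
    max_error_rate: float = 0.1,
    min_overlap: int = 3,
) -> str:
    """Return the read with a mismatch-tolerant 3' adapter match removed."""
    for start in range(len(read)):
        # The adapter may run off the end of the read (partial adapter).
        overlap = min(len(adapter), len(read) - start)
        if overlap < min_overlap:
            break  # remaining suffix is too short to call a match
        window = read[start:start + overlap]
        mismatches = sum(1 for a, b in zip(window, adapter) if a != b)
        if mismatches <= int(max_error_rate * overlap):
            return read[:start]
    return read


# Hypothetical reads: a 13-nt insert followed by a full or partial adapter.
adapter = "AGATCGGAAGAGC"
full = trim_3prime_adapter("ACGTACGTTTGAC" + adapter, adapter)
one_error = trim_3prime_adapter("ACGTACGTTTGAC" + "AGATCGGTAGAGC", adapter)
partial = trim_3prime_adapter("ACGTACGTTTGAC" + "AGATC", adapter)
untouched = trim_3prime_adapter("ACGTACGT", adapter)
print(full, one_error, partial, untouched)
```

For real data, Cutadapt itself should be used; the sketch only shows why a small number of sequencing errors in the adapter does not prevent it from being found.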

  DADA2 is an R package for the processing of amplicon sequencing data. It implements a complete pipeline for removing errors from amplicon sequencing data, including denoising and chimera removal. The package has features for quality filtering, sequence trimming, merging paired-end reads, and taxonomic assignment of amplicon sequence variants. DADA2 is designed to provide high-resolution taxonomic information with minimum error rates. It is a powerful tool for accurately profiling microbial communities and is widely used in various fields of microbial ecology and microbiome research.



  More information can be found here:

https://benjjneb.github.io/dada2/index.html

  QIIME is an open-source bioinformatics pipeline for performing microbiome analysis from raw DNA sequencing data. QIIME is designed to take users from raw sequencing data generated on Illumina or other platforms through to publication-quality graphics and statistics. This includes demultiplexing and quality filtering, OTU picking, taxonomic assignment, phylogenetic reconstruction, and diversity analyses and visualizations. QIIME has been applied to studies based on billions of sequences from tens of thousands of samples.



  More information can be found here:

http://qiime.org/index-qiime1.html

  MAFFT is a multiple sequence alignment program for Unix-like operating systems. It offers a range of multiple alignment methods, including L-INS-i (accurate; for alignment of <∼200 sequences) and FFT-NS-2 (fast; for alignment of <∼30,000 sequences).



  More information can be found here:

https://mafft.cbrc.jp/alignment/software/

  FastTree and FastTreeMP are programs for constructing phylogenetic trees from nucleotide or amino acid sequence data. FastTree is the single-threaded version, while FastTreeMP uses OpenMP to parallelize computation across multiple CPUs. FastTreeMP can provide a speedup of 1.5-1.7x with three CPUs, but using more CPUs will not speed up the maximum-likelihood phase. FastTreeMP may not give exactly the same results as FastTree, but the results are of similar quality.



  More information can be found here:

http://www.microbesonline.org/fasttree/


Analysis Methods



Data Preprocessing and ASV generation


Taxonomy analysis and Community diversity



Publication