Folder Structure

Data file information



Summary - ASVs (data)

QC_statistics.xlsx: A summary of the read statistics at each processing step, including raw data, quality filtering, and chimeric read removal.
ASVs_rep.fasta: A FASTA file containing the representative nucleotide sequence of each ASV, labeled with a unique ASV identifier.
ASVs_phylo.tre: A phylogenetic tree file (e.g., Newick format) representing the phylogenetic relationships among ASVs.
ASV_table.biom: An ASV table in BIOM format, containing abundance data used for diversity analyses.
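
As a quick way to inspect ASVs_rep.fasta, the sketch below reads a FASTA file into a dictionary keyed by sequence identifier. It uses only the Python standard library; the function name read_fasta and the in-memory example records are hypothetical and are not part of the delivered results.

```python
from io import StringIO


def read_fasta(handle):
    """Return a dict mapping each sequence identifier to its full sequence."""
    chunks = {}
    seq_id = None
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            parts = line[1:].split()
            seq_id = parts[0] if parts else ""
            chunks[seq_id] = []
        elif seq_id is not None:
            # Sequences may be wrapped over several lines.
            chunks[seq_id].append(line)
    return {name: "".join(parts) for name, parts in chunks.items()}


# Hypothetical records standing in for the real file contents.
example = StringIO(">ASV1\nACGT\nTTGA\n>ASV2\nGGCC\n")
asvs = read_fasta(example)
```

For the real file, the same function can be called as read_fasta(open("ASVs_rep.fasta")).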

Taxonomy_analysis (*_DB)

TAXONOMY_Assignment.xlsx: The taxonomy profile in Excel format. See the glossary sheet for a detailed explanation of each item.
Taxonomy_Top20_Input.xlsx: The input data for the graph: the abundance ratios of the top 20 taxa, with each taxonomy level on its own sheet.
ASVs_{Alignment tool}_{Database}.biom: A BIOM table that includes taxonomic assignments for each ASV, based on the alignment tool and database used.

Taxonomy_analysis (Count, Ratio)

ASVs_table.*_L*.txt: These files give the ASV counts and ratios separately for each database and taxonomy level.
Taxonomy_abundance_count.xlsx: The read count of each taxon per sample, with one sheet per taxonomy level.
Taxonomy_abundance_ratio.xlsx: The relative abundance of each taxon per sample, with one sheet per taxonomy level.
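
The relationship between the count and ratio files can be sketched as follows: each read count is divided by the total read count of its sample. This is a minimal illustration that assumes the ratio is a fraction of the sample total (the delivered files may express it as a percentage); the function name to_ratio and the example counts are hypothetical.

```python
def to_ratio(counts):
    """Convert per-taxon read counts for one sample into relative abundances."""
    total = sum(counts.values())
    if total == 0:
        # An empty sample has no meaningful proportions.
        return {taxon: 0.0 for taxon in counts}
    return {taxon: n / total for taxon, n in counts.items()}


# Hypothetical phylum-level counts for a single sample.
sample_counts = {"Firmicutes": 600, "Bacteroidetes": 300, "Proteobacteria": 100}
ratios = to_ratio(sample_counts)
```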

Alpha_Diversity (Community_diversity)

Diversity_Index.xlsx: A single table containing the shannon, simpson, PD_whole_tree, and ASV values for each sample.
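
For reference, the two count-based indices in this table can be computed from a sample's ASV counts as sketched below. This is an illustration under stated assumptions, not necessarily the exact calculation used in this analysis: the Shannon index is taken with a base-2 logarithm and the Simpson index as 1 minus the sum of squared proportions. The function names and example counts are hypothetical.

```python
import math


def shannon(counts, base=2.0):
    """Shannon index: -sum(p * log(p)) over ASVs with non-zero counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    props = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p, base) for p in props)


def simpson(counts):
    """Simpson index, assumed here to be 1 - sum(p^2)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return 1.0 - sum((c / total) ** 2 for c in counts)


# Hypothetical sample with four equally abundant ASVs.
even_sample = [25, 25, 25, 25]
h = shannon(even_sample)
d = simpson(even_sample)
```

Both indices increase as the community becomes richer and more even; a sample dominated by a single ASV scores close to zero on both.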

Beta_Diversity (DistanceMatrix, PCoA, UPGMA_tree)

*_PC.xlsx: The results of the PCoA, including the variance explained by each principal coordinate (PC) and the corresponding PC values.
*_DistanceMatrix.xlsx: The distance matrix used for the PCoA. It stores the pairwise distances between samples.
UPGMA_*_*.tre: The UPGMA trees (Bray-Curtis, Weighted UniFrac, Unweighted UniFrac) in .tre format.
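
To show what a distance matrix of this kind contains, the sketch below computes pairwise Bray-Curtis dissimilarities between samples from their count vectors. It covers only the Bray-Curtis case (the UniFrac distances also need the phylogenetic tree), and the function names, sample names, and counts are hypothetical.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum|u_i - v_i| / sum(u_i + v_i)."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator if denominator else 0.0


def distance_matrix(samples):
    """Return a nested dict of pairwise distances between named samples."""
    names = list(samples)
    return {
        a: {b: bray_curtis(samples[a], samples[b]) for b in names}
        for a in names
    }


# Hypothetical counts of three ASVs in two samples.
samples = {"S1": [10, 0, 5], "S2": [5, 5, 5]}
matrix = distance_matrix(samples)
```

The matrix is symmetric with zeros on the diagonal; values range from 0 (identical composition) to 1 (no shared ASVs).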

Analysis Workflow

Analysis Tools

Cutadapt is a software tool that removes unwanted sequences such as adapters, primers, and poly-A tails from high-throughput sequencing reads. These sequences are common in small-RNA sequencing and amplicon reads and can interfere with downstream analysis. Cutadapt finds and removes them in an error-tolerant way. It can also modify and filter single-end and paired-end reads, and demultiplex reads. Cutadapt is available under the MIT license.
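
As an illustration of how primer trimming with Cutadapt is typically invoked for paired-end amplicon reads, the sketch below assembles a command line in Python without running it. The options shown (-g and -G for 5' sequences on read 1 and read 2, -e for the maximum error rate, --discard-untrimmed, and -o/-p for the outputs) are standard Cutadapt options, but the primer sequences, file names, error rate, and the helper name build_cutadapt_command are placeholders and do not reflect the settings used in this analysis.

```python
import shlex


def build_cutadapt_command(fwd_primer, rev_primer, r1_in, r2_in,
                           r1_out, r2_out, error_rate=0.1):
    """Assemble a paired-end Cutadapt command as an argument list."""
    return [
        "cutadapt",
        "-g", fwd_primer,        # 5' primer expected on read 1
        "-G", rev_primer,        # 5' primer expected on read 2
        "-e", str(error_rate),   # tolerated mismatch rate when matching
        "--discard-untrimmed",   # drop read pairs where no primer was found
        "-o", r1_out,            # trimmed read 1 output
        "-p", r2_out,            # trimmed read 2 output
        r1_in, r2_in,
    ]


# Placeholder primers and file names, for illustration only.
cmd = build_cutadapt_command(
    "CCTACGGGNGGCWGCAG", "GACTACHVGGGTATCTAATCC",
    "sample_R1.fastq.gz", "sample_R2.fastq.gz",
    "trimmed_R1.fastq.gz", "trimmed_R2.fastq.gz",
)
command_line = " ".join(shlex.quote(arg) for arg in cmd)
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a system where Cutadapt is installed.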