Under the hood
Snakemake, environments and containers
Snakemake is the centerpiece of this pipeline. Snakemake is a Python-based workflow manager that processes a large set of amplicon-based metagenomics sequencing reads into actionable outputs. Each step is defined as a rule that specifies its input and output files, software dependencies (Conda environments or containers), scripts and command lines (see the Snakemake documentation for details).
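Snakemake decides which rules to run by matching each rule's declared inputs against the outputs of other rules, which yields a directed acyclic graph (DAG) of jobs. The toy model below is a minimal sketch of that idea in plain Python, not Snakemake's API: the rule names and file paths are hypothetical, and real Snakemake additionally handles wildcards, file timestamps and much more.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical rules: name -> (input files, output files).
# Names and paths are illustrative, not zAMP's real rules.
RULES = {
    "fastqc": (["reads/S1.fastq.gz"], ["qc/S1_fastqc.zip"]),
    "multiqc": (["qc/S1_fastqc.zip"], ["qc/multiqc_report.html"]),
    "trim": (["reads/S1.fastq.gz"], ["trimmed/S1.fastq.gz"]),
}


def plan(rules, target):
    """Return the rules needed to build `target`, in a valid execution order."""
    # Map every output file to the rule that produces it.
    producer = {out: name for name, (_, outs) in rules.items() for out in outs}
    graph = {}  # rule -> set of rules it depends on
    stack = [producer[target]]
    while stack:
        rule = stack.pop()
        if rule in graph:
            continue
        inputs, _ = rules[rule]
        # An input produced by another rule becomes a dependency;
        # any other input is assumed to be a raw file already on disk.
        deps = {producer[f] for f in inputs if f in producer}
        graph[rule] = deps
        stack.extend(deps)
    # Dependencies are emitted before the rules that need them.
    return list(TopologicalSorter(graph).static_order())


print(plan(RULES, "qc/multiqc_report.html"))  # ['fastqc', 'multiqc']
```

Note that `trim` is never scheduled for the MultiQC target: only rules on the path from raw inputs to the requested output are run.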
Conda is a language-independent package and environment management tool. A Conda environment is a collection of installed Conda packages. For example, an ongoing research project might require VSEARCH 2.20.0 and its dependencies, whereas an environment associated with a completed project might require VSEARCH 2.15. Changing one environment has no effect on the others, and switching between environments is simple because each one can be activated or deactivated with a single command.
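The VSEARCH example can be made concrete as two environment files in Conda's `environment.yml` format (`name`, `channels`, `dependencies`). The helper below is a hypothetical sketch that renders such files; the environment names are illustrative, and the channel order (`conda-forge` before `bioconda`) is an assumption following common Bioconda practice.

```python
def conda_env_yaml(name, packages, channels=("conda-forge", "bioconda")):
    """Render a minimal Conda environment file in environment.yml format."""
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {ch}" for ch in channels]
    lines.append("dependencies:")
    # "pkg=version" pins a version; "vsearch=2.15" matches any 2.15.x build.
    lines += [f"  - {pkg}={ver}" for pkg, ver in packages.items()]
    return "\n".join(lines) + "\n"


# Two isolated environments pinning different VSEARCH versions
# (hypothetical environment names).
print(conda_env_yaml("project_current", {"vsearch": "2.20.0"}))
print(conda_env_yaml("project_archived", {"vsearch": "2.15"}))
```

Each file could then be created with `conda env create -f <file>` and entered with `conda activate <name>`; in a Snakemake rule, the `conda:` directive points to such a file instead.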
The concept of reproducible analysis in bioinformatics extends beyond good documentation and code sharing. Analyses typically depend on an entire environment of tools, libraries, and settings. Storing, reusing, and sharing such environments with container software such as Docker and Singularity can improve reproducibility and productivity. With containers (apptainer, docker, podman …), users can package every aspect of their environment into a single executable image and run it safely on a variety of computing resources without requiring privileged access.
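The practical difference between container engines is mostly in how a command is launched. The sketch below builds the argument list for running a tool inside an image with Apptainer, Docker or Podman; it only constructs commands and does not execute them. The image reference is a placeholder, and mounting the current directory at `/data` is an illustrative choice, not zAMP's configuration.

```python
import os
import shlex


def container_cmd(engine, image, command):
    """Build the argument list that runs `command` inside a container image."""
    if engine == "apptainer":
        # Apptainer runs as the calling user and, by default, binds the
        # current working directory, so no mount flags are added here.
        return ["apptainer", "exec", image, *command]
    if engine in ("docker", "podman"):
        # --rm deletes the container on exit; the current directory is
        # mounted at /data and used as the working directory.
        return [engine, "run", "--rm", "-v", f"{os.getcwd()}:/data",
                "-w", "/data", image, *command]
    raise ValueError(f"unsupported container engine: {engine!r}")


# Placeholder image reference; replace with the image a rule actually uses.
IMAGE = "docker://example.org/vsearch:2.20.0"
print(shlex.join(container_cmd("apptainer", IMAGE, ["vsearch", "--version"])))
```

Returning a list rather than a shell string keeps arguments intact when passed to `subprocess.run`, and `shlex.join` is used only to display it.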
Logging and traceability
logs
Upon each execution, zAMP automatically creates a log file where all the standard output is recorded:
zamp_out/zamp.log
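Because every run writes to the same log path, a quick way to inspect a finished or failed run is to look at the end of the log or search it for a keyword. The helpers below are a minimal sketch; the default keyword `"Error"` is an assumption about what failure lines contain, not a documented zAMP log format.

```python
from collections import deque
from pathlib import Path

LOG = Path("zamp_out/zamp.log")


def tail_log(path=LOG, n=20):
    """Return the last `n` lines of the log without loading it all into memory."""
    with Path(path).open(encoding="utf-8", errors="replace") as fh:
        return [line.rstrip("\n") for line in deque(fh, maxlen=n)]


def grep_log(path=LOG, keyword="Error"):
    """Return (line_number, line) pairs containing `keyword` (case-sensitive)."""
    with Path(path).open(encoding="utf-8", errors="replace") as fh:
        return [(i, line.rstrip("\n"))
                for i, line in enumerate(fh, start=1) if keyword in line]


if LOG.exists():
    print("\n".join(tail_log()))
```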
config file
In addition to logs, zAMP saves a copy of the config file, listing all the parameters used in the run, under:
zamp_out/config.yaml
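Since each run keeps its parameters, two runs can be compared to explain why their outputs differ. The sketch below assumes both `config.yaml` files have already been parsed into dictionaries (for example with PyYAML's `yaml.safe_load`, assumed to be installed) and reports every parameter whose value changed. The parameter names in the example are hypothetical.

```python
def flatten(cfg, prefix=""):
    """Flatten nested mappings into {"section.key": value} pairs."""
    flat = {}
    for key, value in cfg.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat


def diff_configs(old, new):
    """Map each parameter whose value differs to (old_value, new_value).

    A parameter missing from one run is reported as None on that side.
    """
    a, b = flatten(old), flatten(new)
    return {k: (a.get(k), b.get(k))
            for k in sorted(a.keys() | b.keys())
            if a.get(k) != b.get(k)}


# Hypothetical parameter names, for illustration only.
run_a = {"threads": 4, "trim": {"min_length": 100}}
run_b = {"threads": 8, "trim": {"min_length": 100}}
print(diff_configs(run_a, run_b))  # {'threads': (4, 8)}
```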
Sequencing reads QC
QC rules assess the sequencing quality of each sample with FastQC [1]. MultiQC [2] then aggregates these results into one report per sequencing run (based on the “run” column of the sample sheet). A global MultiQC report covering all samples is generated as well, but without interactive features, because of the large number of samples it includes.
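The per-run reports rely on grouping samples by the sample sheet's “run” column. The sketch below shows that grouping with Python's standard `csv` module; it assumes a tab-separated sheet and a `sample` column holding sample names, both of which are illustrative assumptions rather than zAMP's documented sheet layout.

```python
import csv
import io
from collections import defaultdict


def samples_by_run(sheet, sample_col="sample", run_col="run"):
    """Group sample names by the sample sheet's "run" column."""
    groups = defaultdict(list)
    for row in csv.DictReader(sheet, delimiter="\t"):
        groups[row[run_col]].append(row[sample_col])
    return dict(groups)


# Inline example sheet (hypothetical sample and run names).
sheet = io.StringIO("sample\trun\nS1\trun1\nS2\trun1\nS3\trun2\n")
print(samples_by_run(sheet))  # {'run1': ['S1', 'S2'], 'run2': ['S3']}
```

Each group corresponds to one per-run MultiQC report, while the global report would take the FastQC results of every sample at once.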