How to run locally¶
Run snpArcher on a single machine: a laptop, workstation, or interactive cluster session.
Prerequisites¶
You need:
- snpArcher installed and the `snparcher` conda environment activated
- A sample sheet
- Your run configured with a `config.yaml`
Set up your project directory¶
Organize each analysis as a self-contained project directory.
Copy the default config from the snpArcher repository as a starting point.
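A minimal sketch, assuming the default config sits at `config/config.yaml` inside the snpArcher clone (both paths here are placeholders; adjust them to your setup):

```bash
# Create the project skeleton and seed it with the default config
# (/path/to/snpArcher is a placeholder for your clone's location)
mkdir -p my_project/config
cp /path/to/snpArcher/config/config.yaml my_project/config/config.yaml
```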
Edit `my_project/config/config.yaml` to point to your sample sheet and reference genome.
Tip
A single snpArcher clone can serve many projects. You do not need a copy of the repository in each project directory.
Dry run first¶
Always perform a dry run before committing compute resources:
```bash
conda activate snparcher

snakemake \
  --snakefile /path/to/snpArcher/workflow/Snakefile \
  --directory /path/to/my_project \
  --use-conda \
  --dry-run
```
Check the output for:
- **Sample IDs match your sample sheet.** Sample identifiers in the job list should correspond to your `sample_id` values.
- **Expected rules are listed.** If you enabled the QC module, you should see QC rules; if postprocessing is disabled, those rules should be absent.
- **No errors** about missing inputs or invalid config.
Run the pipeline¶
Once the dry run looks correct, launch the full run:
```bash
snakemake \
  --snakefile /path/to/snpArcher/workflow/Snakefile \
  --directory /path/to/my_project \
  --use-conda \
  --cores 8  # <-- change to the number of cores available
```
Key flags¶
| Flag | Purpose |
|---|---|
| `--snakefile` / `-s` | Path to snpArcher's `workflow/Snakefile`. Not needed if you run from inside the snpArcher directory. |
| `--directory` / `-d` | Path to your project directory (where `config/config.yaml` lives). |
| `--use-conda` | Required. Tells Snakemake to create isolated environments for each pipeline step. |
| `--cores` | Maximum number of CPU cores to use concurrently. Set this to the number of available cores. |
| `--workflow-profile` | Path to a workflow profile directory for resource settings. Defaults to `workflow-profiles/default` in the snpArcher repo. |
Share conda environments across projects¶
By default, Snakemake creates per-step conda environments inside the project directory under `.snakemake/conda/`, so each project gets its own copy of every tool.
To share environments across projects, set a central conda prefix:
```bash
snakemake \
  --snakefile /path/to/snpArcher/workflow/Snakefile \
  --directory /path/to/my_project \
  --use-conda \
  --conda-prefix ~/snparcher_envs \
  --cores 8
```
Any subsequent run using the same `--conda-prefix` will reuse existing environments instead of rebuilding them.
Use a custom workflow profile¶
The workflow profile controls per-rule resource allocation (threads, memory).
snpArcher ships with a default profile at `workflow-profiles/default/config.yaml`.
To use a custom profile, copy it to your project and pass the directory path:
```bash
cp -r /path/to/snpArcher/workflow-profiles/default my_project/workflow-profile
# Edit my_project/workflow-profile/config.yaml as needed

snakemake \
  --snakefile /path/to/snpArcher/workflow/Snakefile \
  --directory /path/to/my_project \
  --use-conda \
  --workflow-profile my_project/workflow-profile \
  --cores 8
```
Prevent disconnection from killing your run¶
Local runs can take hours. Use a terminal multiplexer to keep the process alive after disconnecting:
```bash
tmux new -s snparcher

snakemake \
  --snakefile /path/to/snpArcher/workflow/Snakefile \
  --directory /path/to/my_project \
  --use-conda \
  --conda-prefix ~/snparcher_envs \
  --cores 8

# Detach: Ctrl-b, then d
# Reattach later: tmux attach -t snparcher
```
Resume after interruption¶
Snakemake tracks which steps have completed. If the pipeline is interrupted, re-run the same command and it will pick up where it left off.
```bash
# Same command as before; Snakemake resumes automatically
snakemake \
  --snakefile /path/to/snpArcher/workflow/Snakefile \
  --directory /path/to/my_project \
  --use-conda \
  --cores 8
```
Tip
If a job failed partway through and left behind incomplete output files, add `--rerun-incomplete` to force those jobs to re-run.
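For example, the resume command with the flag added:

```bash
snakemake \
  --snakefile /path/to/snpArcher/workflow/Snakefile \
  --directory /path/to/my_project \
  --use-conda \
  --rerun-incomplete \
  --cores 8
```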
Where to find outputs¶
All outputs are written to the `results/` directory inside your project.
Key output files include:
| Output | Path |
|---|---|
| Hard-filtered VCF | `results/vcfs/filtered.vcf.gz` |
| Raw (unfiltered) VCF | `results/vcfs/raw.vcf.gz` |
| Per-sample BAMs | `results/bams/markdup/{sample}.bam` |
| Per-sample gVCFs | `results/gvcfs/{sample}.g.vcf.gz` |
| QC dashboard | `results/qc/qc_dashboard.html` |
| Callable sites BED | `results/callable_sites/callable_sites.bed` |
For a complete listing, see the outputs reference.
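As an optional sanity check on the filtered call set, you can summarize it with `bcftools` (assumed to be installed separately; it is not one of snpArcher's own commands):

```bash
# Summarize the filtered VCF: record counts, ts/tv ratio, per-sample stats, etc.
bcftools stats results/vcfs/filtered.vcf.gz | head -n 30
```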
Next steps¶
- Interpret QC reports to check for problems
- Filter and postprocess to prepare a clean VCF
- For larger datasets, consider running on HPC