How to run locally¶
Run snpArcher on a single machine: a laptop, workstation, or interactive cluster session.
Prerequisites¶
You need:
- snpArcher installed and the `snparcher` conda environment activated
- A sample sheet
- Your run configured with a `config.yaml`
Set up your project directory¶
Organize each analysis as a self-contained project directory.
Copy the default config from the snpArcher repository as a starting point.
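A minimal sketch, assuming the default config sits at `config/config.yaml` inside the snpArcher clone (both paths here are placeholders; adjust them to your setup):

```bash
# Create the project skeleton and seed it with the default config
# (/path/to/snpArcher is a placeholder for your clone's location)
mkdir -p my_project/config
cp /path/to/snpArcher/config/config.yaml my_project/config/config.yaml
```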
Edit `my_project/config/config.yaml` to point to your sample sheet and reference genome.
Tip
A single snpArcher clone can serve many projects. You do not need a copy of the repository in each project directory.
Dry run first¶
Always perform a dry run before committing compute resources:
```bash
conda activate snparcher

snakemake \
  --snakefile /path/to/snpArcher/workflow/Snakefile \
  --directory /path/to/my_project \
  --use-conda \
  --dry-run
```
Check the output for:
- **Sample IDs match your sample sheet.** Sample identifiers in the job list should correspond to your `sample_id` values.
- **Expected rules are listed.** If you enabled the QC module, you should see QC rules; if postprocessing is disabled, those rules should be absent.
- **No errors** about missing inputs or invalid config.
Run the pipeline¶
Once the dry run looks correct, launch the full run:
```bash
snakemake \
  --snakefile /path/to/snpArcher/workflow/Snakefile \
  --directory /path/to/my_project \
  --use-conda \
  --cores 8  # <-- change to the number of cores available
```
Key flags¶
| Flag | Purpose |
|---|---|
| `--snakefile` / `-s` | Path to snpArcher's `workflow/Snakefile`. Not needed if you run from inside the snpArcher directory. |
| `--directory` / `-d` | Path to your project directory (where `config/config.yaml` lives). |
| `--use-conda` | Required. Tells Snakemake to create isolated environments for each pipeline step. |
| `--cores` | Maximum number of CPU cores to use concurrently. Set this to the number of available cores. |
| `--workflow-profile` | Path to a workflow profile directory for resource settings. Defaults to `workflow-profiles/default` in the snpArcher repo. |
Share conda environments across projects¶
By default, Snakemake creates per-step conda environments inside the project directory under `.snakemake/conda/`, so each project gets its own copy of every tool.
To share environments across projects, set a central conda prefix:
```bash
snakemake \
  --snakefile /path/to/snpArcher/workflow/Snakefile \
  --directory /path/to/my_project \
  --use-conda \
  --conda-prefix ~/snparcher_envs \
  --cores 8
```
Any subsequent run using the same `--conda-prefix` will reuse existing environments instead of rebuilding them.
Use a custom workflow profile¶
The workflow profile controls per-rule resource allocation (threads, memory).
snpArcher ships with a default profile at `workflow-profiles/default/config.yaml`.
To use a custom profile, copy it to your project and pass the directory path:
```bash
cp -r /path/to/snpArcher/workflow-profiles/default my_project/workflow-profile
# Edit my_project/workflow-profile/config.yaml as needed

snakemake \
  --snakefile /path/to/snpArcher/workflow/Snakefile \
  --directory /path/to/my_project \
  --use-conda \
  --workflow-profile my_project/workflow-profile \
  --cores 8
```
Prevent disconnection from killing your run¶
Local runs can take hours. Use a terminal multiplexer to keep the process alive after disconnecting:
```bash
tmux new -s snparcher

snakemake \
  --snakefile /path/to/snpArcher/workflow/Snakefile \
  --directory /path/to/my_project \
  --use-conda \
  --conda-prefix ~/snparcher_envs \
  --cores 8

# Detach: Ctrl-b, then d
# Reattach later: tmux attach -t snparcher
```
Resume after interruption¶
Snakemake tracks which steps have completed. If the pipeline is interrupted, re-run the same command and it will pick up where it left off.
```bash
# Same command as before; Snakemake resumes automatically
snakemake \
  --snakefile /path/to/snpArcher/workflow/Snakefile \
  --directory /path/to/my_project \
  --use-conda \
  --cores 8
```
Tip
If a job failed partway through and left behind incomplete output files, add `--rerun-incomplete` to force those jobs to re-run.
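For example, the resume command with the flag added:

```bash
snakemake \
  --snakefile /path/to/snpArcher/workflow/Snakefile \
  --directory /path/to/my_project \
  --use-conda \
  --rerun-incomplete \
  --cores 8
```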
Where to find outputs¶
All outputs are written to the `results/` directory inside your project.
Key output files include:
| Output | Path |
|---|---|
| Hard-filtered VCF | `results/vcfs/filtered.vcf.gz` |
| Raw (unfiltered) VCF | `results/vcfs/raw.vcf.gz` |
| Per-sample BAMs | `results/bams/markdup/{sample}.bam` |
| Per-sample gVCFs | `results/gvcfs/{sample}.g.vcf.gz` |
| QC dashboard | `results/qc/qc_dashboard.html` |
| Callable sites BED | `results/callable_sites/callable_sites.bed` |
For a complete listing, see the outputs reference.
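As an optional sanity check on the filtered call set, you can summarize it with `bcftools` (assumed to be installed separately; it is not one of snpArcher's own commands):

```bash
# Summarize the filtered VCF: record counts, ts/tv ratio, per-sample stats, etc.
bcftools stats results/vcfs/filtered.vcf.gz | head -n 30
```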
Next steps¶
- Interpret QC reports to check for problems
- Filter and postprocess to prepare a clean VCF
- For larger datasets, consider running on HPC