Slurm issue
Created by: AlcaArctica
I want to try out your pipeline to see whether we could use it for routine assembly and QC in our lab. However, I cannot get the test data set to run. I am sure it's a simple issue. Would you kindly guide me through it?
We are using a Unix HPC cluster and I have set up your pipeline in the following way:
conda create --prefix=/lustre/projects/dazzler/uelze/conda_envs/culebrONT
conda activate /lustre/projects/dazzler/uelze/conda_envs/culebrONT
conda install python=3.7
python3 -m pip install culebrONT
culebrONT install_cluster --scheduler slurm --env modules --bash_completion --create_envmodule --modules_dir /lustre/projects/dazzler/uelze/conda_envs
When running the test data set, the pipeline cannot complete the jobs that require Slurm. The error message is always the same:
[Wed Sep 13 10:00:12 2023]
Job 19:
making dag ...
snakemake -s /lustre/projects/dazzler/uelze/conda_envs/culebrONT/lib/python3.7/site-packages/culebrONT/snakefiles/Snakefile --use-envmodules --rulegraph --configfile /lustre/projects/dazzlerAssembly/test_culebrONT/culebrONT_OUTPUT/config_corrected.yaml > /lustre/projects/dazzlerAssembly/test_culebrONT/culebrONT_OUTPUT/FINAL_REPORT/dag.tmp
dot -Tpng /lustre/projects/dazzlerAssembly/test_culebrONT/culebrONT_OUTPUT/FINAL_REPORT/dag.tmp > /lustre/projects/dazzlerAssembly/test_culebrONT/culebrONT_OUTPUT/FINAL_REPORT/dag.png
Reason: Missing output files: /lustre/projects/dazzlerAssembly/test_culebrONT/culebrONT_OUTPUT/FINAL_REPORT/dag.png
(snakemake -s /lustre/projects/dazzler/uelze/conda_envs/culebrONT/lib/python3.7/site-packages/culebrONT/snakefiles/Snakefile --use-envmodules --rulegraph --configfile /lustre/projects/dazzlerAssembly/test_culebrONT/culebrONT_OUTPUT/config_corrected.yaml > /lustre/projects/dazzlerAssembly/test_culebrONT/culebrONT_OUTPUT/FINAL_REPORT/dag.tmp
dot -Tpng /lustre/projects/dazzlerAssembly/test_culebrONT/culebrONT_OUTPUT/FINAL_REPORT/dag.tmp > /lustre/projects/dazzlerAssembly/test_culebrONT/culebrONT_OUTPUT/FINAL_REPORT/dag.png) 1>/lustre/projects/dazzlerAssembly/test_culebrONT/culebrONT_OUTPUT/FINAL_REPORT/LOGS/GRAPH.o 2>/lustre/projects/dazzlerAssembly/test_culebrONT/culebrONT_OUTPUT/FINAL_REPORT/LOGS/GRAPH.e
sbatch: fatal: --mem, --mem-per-cpu, and --mem-per-gpu are mutually exclusive.
Traceback (most recent call last):
File "/lustre/projects/dazzler/uelze/conda_envs/culebrONT/lib/python3.7/site-packages/culebrONT/default_profile/slurm-submit.py", line 102, in <module>
jobid = slurm_utils.submit_job(jobscript, **sbatch_options)
File "/lustre/projects/dazzler/uelze/conda_envs/culebrONT/lib/python3.7/site-packages/culebrONT/default_profile/slurm_utils.py", line 199, in submit_job
raise e
File "/lustre/projects/dazzler/uelze/conda_envs/culebrONT/lib/python3.7/site-packages/culebrONT/default_profile/slurm_utils.py", line 197, in submit_job
res = sp.check_output(cmd)
File "/lustre/projects/dazzler/uelze/conda_envs/culebrONT/lib/python3.7/subprocess.py", line 411, in check_output
**kwargs).stdout
File "/lustre/projects/dazzler/uelze/conda_envs/culebrONT/lib/python3.7/subprocess.py", line 512, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['sbatch', '--parsable', '--export=ALL', '--cpus-per-task=1', '--mem-per-cpu=10G', '--partition=batch', '--output=/lustre/projects/dazzlerAssembly/test_culebrONT/culebrONT_OUTPUT/FINAL_REPORT/LOGS/GRAPH.o_cluster', '--error=/lustre/projects/dazzlerAssembly/test_culebrONT/culebrONT_OUTPUT/FINAL_REPORT/LOGS/GRAPH.e_cluster', '--job-name=rule_graph', '--mem=1000', '/lustre/projects/dazzlerAssembly/test_culebrONT/culebrONT_OUTPUT/.snakemake/tmp.5klg583h/snakejob.rule_graph.19.sh']' returned non-zero exit status 1.
Error submitting jobscript (exit code 1):
I am new to Slurm, but I understand that two mutually exclusive flags are being supplied to sbatch, and this is what causes the error (--mem, --mem-per-cpu, and --mem-per-gpu are mutually exclusive).
In my example this happens here:
['sbatch', '--parsable', '--export=ALL', '--cpus-per-task=1', '--mem-per-cpu=10G', '--partition=batch', '--output=//GRAPH.o_cluster', '--error=//GRAPH.e_cluster', '--job-name=rule_graph', '--mem=1000', '//snakejob.rule_graph.19.sh']
Here we have both --mem-per-cpu=10G and --mem=1000.
Why does this happen? How can I fix this?
PS: Perhaps this is related to the section
RESOURCE_MAPPING = {
"time": ("time", "runtime", "walltime"),
"mem": ("mem", "mem_mb", "ram", "memory"),
"mem-per-cpu": ("mem-per-cpu", "mem_per_cpu", "mem_per_thread"),
"nodes": ("nodes", "nnodes"),
"nodelist" : ("w", "nodelist"),
"partition": ("partition", "queue"),
}
in the file slurm-submit.py? Here, both mem and mem-per-cpu appear.
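To make sure I understand that mapping correctly, here is a minimal sketch of how I imagine such a resource-to-sbatch translation works. This is my own simplified Python illustration, not the actual slurm-submit.py code; the function name and the example values are made up:

# My own simplified sketch of a resource-to-sbatch mapping; NOT the real
# slurm-submit.py logic, just an illustration of how both flags could survive.
RESOURCE_MAPPING = {
    "mem": ("mem", "mem_mb", "ram", "memory"),
    "mem-per-cpu": ("mem-per-cpu", "mem_per_cpu", "mem_per_thread"),
}

def map_resources(job_settings):
    """Translate rule/cluster-config keys into sbatch option names."""
    options = {}
    for sbatch_key, aliases in RESOURCE_MAPPING.items():
        for alias in aliases:
            if alias in job_settings:
                options[sbatch_key] = job_settings[alias]
                break
    return options

# Hypothetical merged settings: a rule resource (mem_mb) plus a cluster_config
# entry (mem-per-cpu). Both map to separate sbatch options.
settings = {"mem_mb": 1000, "mem-per-cpu": "10G"}
print(["sbatch"] + [f"--{key}={value}" for key, value in map_resources(settings).items()])
# prints: ['sbatch', '--mem=1000', '--mem-per-cpu=10G']  <- mutually exclusive in Slurm

If that is roughly what happens, then any job that carries a plain mem/mem_mb value in addition to the mem-per-cpu entry from cluster_config.yaml would produce exactly the conflicting sbatch call above. But this is just my guess.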
PS2: "mem-per-cpu: 10G" appears twice in the file cluster_config.yaml. I could not figure out where the "mem=1000" originates.