
Slurm issue

Created by: AlcaArctica

I want to try out your pipeline to see whether we could use it for routine assembly and QC in our lab. However, I cannot get the test data set to run. I am sure it's a simple issue. Would you kindly guide me through it?

We are using a Unix HPC cluster, and I have set up your pipeline in the following way:

conda create --prefix=/lustre/projects/dazzler/uelze/conda_envs/culebrONT
conda activate /lustre/projects/dazzler/uelze/conda_envs/culebrONT
conda install  python=3.7
python3 -m pip install culebrONT
culebrONT install_cluster --scheduler slurm --env modules --bash_completion --create_envmodule --modules_dir /lustre/projects/dazzler/uelze/conda_envs

When running the test data set, the pipeline cannot complete the jobs that require Slurm. The error message is always the same:

[Wed Sep 13 10:00:12 2023]
Job 19: 
        making dag ...
        snakemake -s /lustre/projects/dazzler/uelze/conda_envs/culebrONT/lib/python3.7/site-packages/culebrONT/snakefiles/Snakefile  --use-envmodules  --rulegraph --configfile /lustre/projects/dazzlerAssembly/test_culebrONT/culebrONT_OUTPUT/config_corrected.yaml > /lustre/projects/dazzlerAssembly/test_culebrONT/culebrONT_OUTPUT/FINAL_REPORT/dag.tmp
        dot -Tpng /lustre/projects/dazzlerAssembly/test_culebrONT/culebrONT_OUTPUT/FINAL_REPORT/dag.tmp > /lustre/projects/dazzlerAssembly/test_culebrONT/culebrONT_OUTPUT/FINAL_REPORT/dag.png
        
Reason: Missing output files: /lustre/projects/dazzlerAssembly/test_culebrONT/culebrONT_OUTPUT/FINAL_REPORT/dag.png


        (snakemake -s /lustre/projects/dazzler/uelze/conda_envs/culebrONT/lib/python3.7/site-packages/culebrONT/snakefiles/Snakefile  --use-envmodules  --rulegraph --configfile /lustre/projects/dazzlerAssembly/test_culebrONT/culebrONT_OUTPUT/config_corrected.yaml > /lustre/projects/dazzlerAssembly/test_culebrONT/culebrONT_OUTPUT/FINAL_REPORT/dag.tmp
        dot -Tpng /lustre/projects/dazzlerAssembly/test_culebrONT/culebrONT_OUTPUT/FINAL_REPORT/dag.tmp > /lustre/projects/dazzlerAssembly/test_culebrONT/culebrONT_OUTPUT/FINAL_REPORT/dag.png) 1>/lustre/projects/dazzlerAssembly/test_culebrONT/culebrONT_OUTPUT/FINAL_REPORT/LOGS/GRAPH.o 2>/lustre/projects/dazzlerAssembly/test_culebrONT/culebrONT_OUTPUT/FINAL_REPORT/LOGS/GRAPH.e
        
sbatch: fatal: --mem, --mem-per-cpu, and --mem-per-gpu are mutually exclusive.
Traceback (most recent call last):
  File "/lustre/projects/dazzler/uelze/conda_envs/culebrONT/lib/python3.7/site-packages/culebrONT/default_profile/slurm-submit.py", line 102, in <module>
    jobid = slurm_utils.submit_job(jobscript, **sbatch_options)
  File "/lustre/projects/dazzler/uelze/conda_envs/culebrONT/lib/python3.7/site-packages/culebrONT/default_profile/slurm_utils.py", line 199, in submit_job
    raise e
  File "/lustre/projects/dazzler/uelze/conda_envs/culebrONT/lib/python3.7/site-packages/culebrONT/default_profile/slurm_utils.py", line 197, in submit_job
    res = sp.check_output(cmd)
  File "/lustre/projects/dazzler/uelze/conda_envs/culebrONT/lib/python3.7/subprocess.py", line 411, in check_output
    **kwargs).stdout
  File "/lustre/projects/dazzler/uelze/conda_envs/culebrONT/lib/python3.7/subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['sbatch', '--parsable', '--export=ALL', '--cpus-per-task=1', '--mem-per-cpu=10G', '--partition=batch', '--output=/lustre/projects/dazzlerAssembly/test_culebrONT/culebrONT_OUTPUT/FINAL_REPORT/LOGS/GRAPH.o_cluster', '--error=/lustre/projects/dazzlerAssembly/test_culebrONT/culebrONT_OUTPUT/FINAL_REPORT/LOGS/GRAPH.e_cluster', '--job-name=rule_graph', '--mem=1000', '/lustre/projects/dazzlerAssembly/test_culebrONT/culebrONT_OUTPUT/.snakemake/tmp.5klg583h/snakejob.rule_graph.19.sh']' returned non-zero exit status 1.
Error submitting jobscript (exit code 1):

I am new to Slurm, but I understand that two of the mutually exclusive flags (--mem, --mem-per-cpu, and --mem-per-gpu) are being supplied at the same time, and this is what causes the error.

In my example this happens in the submitted command (paths shortened here): ['sbatch', '--parsable', '--export=ALL', '--cpus-per-task=1', '--mem-per-cpu=10G', '--partition=batch', '--output=//GRAPH.o_cluster', '--error=//GRAPH.e_cluster', '--job-name=rule_graph', '--mem=1000', '//snakejob.rule_graph.19.sh']. It contains both --mem-per-cpu=10G and --mem=1000.
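
To check that it really is the flag combination, and not something specific to the jobscript, I think the error can be reproduced with a bare sbatch call like the sketch below (the --wrap payload is a placeholder and I left out the partition; this is not the pipeline's actual command):

# Minimal reproduction sketch, placeholders only, not the pipeline's actual call:
# any job submitted with both --mem and --mem-per-cpu should hit the same
# "mutually exclusive" fatal error, regardless of the jobscript contents.
import subprocess

cmd = [
    "sbatch", "--parsable",
    "--cpus-per-task=1",
    "--mem-per-cpu=10G",   # value from cluster_config.yaml
    "--mem=1000",          # origin unclear, see PS2 below
    "--wrap=hostname",     # trivial payload instead of snakejob.rule_graph.19.sh
]
try:
    print(subprocess.check_output(cmd, stderr=subprocess.STDOUT, text=True))
except subprocess.CalledProcessError as err:
    # expecting: sbatch: fatal: --mem, --mem-per-cpu, and --mem-per-gpu are mutually exclusive.
    print(err.output)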

Why does this happen? How can I fix this?

PS: Perhaps this is related to the section

RESOURCE_MAPPING = {
    "time": ("time", "runtime", "walltime"),
    "mem": ("mem", "mem_mb", "ram", "memory"),
    "mem-per-cpu": ("mem-per-cpu", "mem_per_cpu", "mem_per_thread"),
    "nodes": ("nodes", "nnodes"),
    "nodelist" : ("w", "nodelist"),
    "partition": ("partition", "queue"),
}

in the file slurm-submit.py? Both mem and mem-per-cpu appear there.
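
If I read the script correctly (this is only my reading, and the function below is a simplified sketch, not the actual code), the mapping is applied to each source of options independently, so a memory resource coming from the Snakefile and a mem-per-cpu entry coming from cluster_config.yaml each map to their own sbatch flag and both survive into the final command:

# Simplified sketch of how I understand the option merging, using the
# RESOURCE_MAPPING quoted above; this is not the actual slurm-submit.py code.
def build_sbatch_options(job_resources, cluster_config):
    options = {}
    for target, aliases in RESOURCE_MAPPING.items():
        for source in (job_resources, cluster_config):
            for alias in aliases:
                if alias in source:
                    options[target] = source[alias]
    return options

# e.g. job_resources = {"mem_mb": 1000} and cluster_config = {"mem-per-cpu": "10G"}
# gives {"mem": 1000, "mem-per-cpu": "10G"}, i.e. both flags end up on the sbatch line.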

PS2: "mem-per-cpu: 10G" appears twice in the file cluster_config.yaml, but I could not figure out where the --mem=1000 originates.
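
In case it helps, the workaround I was considering (untested, and I do not know whether dropping the value breaks anything else in the pipeline) would be to discard the generic mem option whenever a per-CPU or per-GPU value is already set, just before submission:

# Untested workaround sketch: prefer the more specific per-CPU/per-GPU value
# and drop the generic --mem so sbatch no longer sees mutually exclusive flags.
def drop_conflicting_mem(sbatch_options):
    if "mem-per-cpu" in sbatch_options or "mem-per-gpu" in sbatch_options:
        sbatch_options.pop("mem", None)
    return sbatch_options

# e.g. {"mem": "1000", "mem-per-cpu": "10G"} -> {"mem-per-cpu": "10G"}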