
I have a Snakemake (7.22.0) workflow whose jobs stall after they start. I have rules that run on a cluster (through PBS) and execute an external Python script. I've noticed that some rules now stall for a very long time before executing the script: the job starts, and Snakemake logs that it has started running, but the actual script only begins about two hours later. The output I get from the job looks like this:

[Tue Oct 15 23:13:13 2024]
rule ...:
    input: ...
    output: ...
    jobid: 0
    reason: Missing output files: ...
    wildcards: ...
    resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=/var/tmp/pbs.<job id>.<cluster name>

2024-10-16 01:21:37.393620 log from first line of the script
...
2024-10-16 01:21:41.212192 log from last line of the script (after reading large files) 
Not cleaning up <tmp script path>
[Wed Oct 16 01:21:41 2024]
Finished job 0.
1 of 1 steps (100%) done

Has anyone experienced something like this? What might Snakemake be doing that could cause this delay? I'm generating lots of files in the workflow (only one in this job), so that's a suspect, but I don't entirely see how it would cause this. Also, the top-level "all" rule triggers many other rules (thousands, though I limit the number of jobs submitted to PBS), and executing that takes ~20 minutes, but that is not the rule executing here. Other instances of the same rule execute normally, too.

These are the statistics from pbs at some point during the job's execution, from a time before the external script started:

Job Id: ...
    Job_Name = snakejob....
    Job_Owner = ...
    resources_used.cpupercent = 4
    resources_used.cput = 00:00:44
    resources_used.mem = 231660kb
    resources_used.ncpus = 1
    resources_used.vmem = 977976kb
    resources_used.walltime = 00:54:14

The memory consumption seems excessive to me. Is there something Snakemake does on startup that could use this much memory (under extreme conditions, whatever they may be)?

  • Which version of Snakemake are you using? How have you configured Snakemake to submit rules to the cluster via PBS? When you say "the top-level 'all' triggers many other rules", how many is many? Maybe you have a job limit on PBS and everything is working as expected, so it might not be Snakemake's fault? Commented Oct 16, 2024 at 16:47
  • Thanks for your help! It's Snakemake 7.22.0. Yes, rules are submitted to PBS. The number of rules is in the thousands, but I limit the number of submitted jobs. The job starts executing, Snakemake logs that it has started, then there is a long delay, and then the external script executes within the job. Commented Oct 17, 2024 at 16:29
  • 1
    Thanks! Maybe you can add that to the question so others don't have to dig into the comments. Maybe you can try snakemake 8, it's been out for quite a while now, but would likely require quite some changes to how PBS support works. Maybe some bugs have been fixed, worth a try! It looks like there's no PBS plugin, so you might have to use the generic one: snakemake.github.io/snakemake-plugin-catalog/plugins/executor/… Commented Oct 17, 2024 at 16:43

1 Answer


The problem turned out to be that the directory workdir/.snakemake/scripts had become clogged with many files (~600,000) accumulated from previous runs of the workflow. Deleting the old scripts there solved the problem.
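If you want to prune that directory periodically rather than delete everything by hand, here is a minimal sketch. It assumes the working directory is named `workdir` and that no jobs are currently running (Snakemake copies the wrapper scripts there while jobs execute, so only delete files you know are stale); the `prune_stale_scripts` helper name and the 7-day cutoff are my own choices, not anything from Snakemake itself.

```python
import time
from pathlib import Path

def prune_stale_scripts(directory: Path, max_age_s: float) -> int:
    """Delete regular files in `directory` older than `max_age_s` seconds.

    Returns the number of files removed; returns 0 if the directory
    does not exist (e.g. on a fresh checkout).
    """
    if not directory.is_dir():
        return 0
    now = time.time()
    removed = 0
    for f in directory.iterdir():
        if f.is_file() and now - f.stat().st_mtime > max_age_s:
            f.unlink()
            removed += 1
    return removed

if __name__ == "__main__":
    # Hypothetical path: adjust to your workflow's working directory.
    scripts_dir = Path("workdir/.snakemake/scripts")
    week = 7 * 24 * 3600  # prune anything untouched for a week
    print(f"Removed {prune_stale_scripts(scripts_dir, week)} stale scripts")
```

Running this (or an equivalent `find workdir/.snakemake/scripts -type f -mtime +7 -delete`) before large workflow runs should keep the directory from growing into the hundreds of thousands of files again.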
