I am trying to run prokka with Snakemake and a rule all. In rule all I list all the output folders that prokka will create to write its results. Prokka requires a folder, rather than a file, to be supplied as its output.

A simplified version of what I have is here:

PATIENTID_ls = range(2)
rule all:
    input:
        expand("results_{subjectID}_outputfolder",subjectID=PATIENTID_ls),

rule prokka:
    input:
        "contigs/subject_{subjectID}/contigs.fasta",
    output:
        "results/subject_{subjectID}_outputfolder",
    shell:
        "prokka --cpus 1 --proteins ../GCF_000009645.1_ASM964v1_genomic.gbff --outdir {output} --prefix contigs500_anno9ref {input} "

When running:

$snakemake -p
Building DAG of jobs...
MissingInputException in line 2 of Snakefile:
Missing input files for rule all:
results_1_outputfolder
results_0_outputfolder

It works, however, when I specify the output explicitly:

snakemake -p results/subject_1_outputfolder

I am sure this is a noob mistake on my side, but after hours of playing around I could not solve the issue. Help is highly appreciated. Thank you.

3 Answers

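Your example has an issue: the files requested as input in rule all (results_{subjectID}_outputfolder) do not match the output of rule prokka (results/subject_{subjectID}_outputfolder), so Snakemake cannot find a rule that produces them.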
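To make the mismatch concrete, here is a minimal sketch in plain Python (not Snakemake itself) that expands the pattern from rule all and checks each target against the output pattern of rule prokka. The helper names expand_targets and producible are hypothetical, and the matching is a simplification of how Snakemake resolves wildcards.

```python
import re

# Patterns copied from the Snakefile in the question.
ALL_PATTERN = "results_{subjectID}_outputfolder"
PROKKA_OUTPUT = "results/subject_{subjectID}_outputfolder"


def expand_targets(pattern, subject_ids):
    """Fill the subjectID placeholder, roughly like expand() does."""
    return [pattern.format(subjectID=s) for s in subject_ids]


def producible(target, output_pattern):
    """Return True if target matches output_pattern with some wildcard value.

    Simplified assumption: the wildcard matches one or more characters.
    """
    regex = re.escape(output_pattern).replace(
        re.escape("{subjectID}"), r"(?P<subjectID>.+)"
    )
    return re.fullmatch(regex, target) is not None


if __name__ == "__main__":
    # Targets as requested by the original rule all: none can be produced.
    for target in expand_targets(ALL_PATTERN, range(2)):
        print(target, producible(target, PROKKA_OUTPUT))
    # Targets built from the same pattern as the prokka output: all match.
    for target in expand_targets(PROKKA_OUTPUT, range(2)):
        print(target, producible(target, PROKKA_OUTPUT))
```

Under these assumptions, the targets from the original rule all never match the prokka output pattern, which is why they are reported as missing input files, while targets built from the same pattern as the rule's output do match.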

However, one way to implement what you want to do is to use params to specify the output directory and use that as argument to the flag --outdir {params.outdir}.

A similar example is shown below:

PATIENTID_ls = [1,2]
PREFIX = "contigs500_anno9ref"

rule all:
    input:
        expand("results_{subjectID}_outputfolder/{prefix}.gff",subjectID=PATIENTID_ls, prefix=PREFIX), 

rule prokka:
    input:
        "contigs/contigs.fasta",
    params:
        outdir= "results_{subjectID}_outputfolder",
        prefix= PREFIX,
    output:
        "results_{subjectID}_outputfolder/{prefix}.gff",
    shell:
        "echo '{params.prefix}' > {params.outdir}/{PREFIX}.gff"

You should still specify a file as the output of rule prokka and as the input of rule all. Based on the example in the prokka repo, one of the output files is {outdir}/{prefix}.gff. You can use that path as the input of rule all and the output of rule prokka without ever referring to it directly when invoking the command.


Alternatively, even though there does not seem to be a reason for it here, you could use a mock file to signal completion of the rule.

An example would be:

PATIENTID_ls = [1,2]
rule all:
    input:
        expand("results_{subjectID}_outputfolder/mockfile.txt",subjectID=PATIENTID_ls), 

rule prokka:
    input:
        "contigs/contigs.fasta",
    params:
        outdir= "results_{subjectID}_outputfolder",
        prefix= "contigs500_anno9ref",
    output:
        "results_{subjectID}_outputfolder/mockfile.txt",
    shell:
        "echo '{params.prefix}' && touch {params.outdir}/mockfile.txt"

1 Comment

Thx JohnnyBD...the params did the trick. I prefer the upper solution, as a mock output file lacks specificity if a run was truly successful.

As @JohnnyBD mentioned, your major problem appears to be that the input of rule all does not match the output of rule prokka. If you still need to use a directory as output instead of a file, you may want to mark it with directory(), as it handles edge cases better.
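A minimal sketch of that approach, reusing the paths and the prokka command from the question (untested here, and assuming a Snakemake version that supports directory()):

```snakemake
PATIENTID_ls = range(2)

rule all:
    input:
        # Same pattern as the output of rule prokka below.
        expand("results/subject_{subjectID}_outputfolder", subjectID=PATIENTID_ls)

rule prokka:
    input:
        "contigs/subject_{subjectID}/contigs.fasta"
    output:
        # directory() tells Snakemake that this output is a folder, not a file.
        directory("results/subject_{subjectID}_outputfolder")
    shell:
        "prokka --cpus 1 --proteins ../GCF_000009645.1_ASM964v1_genomic.gbff "
        "--outdir {output} --prefix contigs500_anno9ref {input}"
```

Note that directory() is used only in the output section; rule all simply lists the folder paths as input. Prokka refuses to write into an existing folder unless --force is given, so that flag may be needed if the output folder already exists when the job starts.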

1 Comment

Thx JeeYem! That indeed would have solved my problem too (and provides exactly what I was looking for). Thx also for the link to the documentation...I have missed it there.

You can create a variable referring to your output directory and use it in the rule:

outputdir = "your_output_directory"

rule Align_Sequences:
    input: "sequence.fasta"
    output: outputdir + "/sequence_aligned.fasta"
    shell: "mafft {input} > {output}"

1 Comment

How do you declare the rule all? Or don't you need one?
