Using multiple config files in Snakemake results in a "not the same wildcards" error

Question

I previously asked about running the same Snakemake pipeline on multiple datasets, and one of the solutions, suggested by @bli, was to use multiple config files. I am trying to implement it, but I get an error when I read in the file that holds the sample information. The error is:

SyntaxError:
Not all output, log and benchmark files of rule fastqc contain the same wildcards. This is crucial though, in order to avoid that two or more jobs write to the same file.
  File "Snakefile", line 64, in <module>

I have seen this error before, but I cannot figure out why it occurs here, when every input and output has sample as a wildcard. Any help is much appreciated!

My Snakefile looks like this:

import os
import pandas as pd
import yaml
configfile: "main_config.yaml"

all_keys = list(config.keys())
print(all_keys)



datasets  =config["datasets"]

print(datasets.items())


for p_id, p_info in datasets.items():
    for key in p_info:
     print(key + '------',p_info[key])
     conf_file=p_info["conf"]
     conf_fh=open(conf_file)
     dat_conf = yaml.safe_load(conf_fh)
     output = dat_conf["output_dir"]
     samples = dat_conf["sampletable"]
     R1 = dat_conf["R1"]
     R2 = dat_conf["R2"]
     print(samples)
     print(R1)


SampleTable = pd.read_table(samples,index_col=0)
SAMPLES = list(SampleTable.index)
print(SAMPLES)
PAIRED_END= ('R2' in SampleTable.columns)
FRACTIONS= ['R1']
if PAIRED_END: FRACTIONS+= ['R2']

qc = config["qc_only"]

def all_input_reads(qc):
    if config["qc_only"]:
        return expand("{output}/fastqc/{sample}" + config["R1"] + "_fastqc.html", sample=SAMPLES)
    else:
        return expand("{output}/fastqc/{sample}" + config["R1"] + "_fastqc.html", sample=SAMPLES)


rule all:
    input:
         all_input_reads


rule fastqc:
    input:
      unpack( lambda wc: dict(SampleTable.loc[wc.sample]))
    output:
      R1= "{output}/fastqc/{sample}{R1}" +"_fastqc.html",
      R2 ="{output}/fastqc/{sample}{R2}"  +"_fastqc.html"
    conda:
      "../envs/fastqc.yaml"
    log:
       "{output}/logs/qc/fastqc_{sample}_unfilt.log"
    shell: "fastqc -o {output}/fastqc {input.R1} {input.R2} >> {log}"

The main config file is:

  datasets:
    
     dat1:
          conf: "config_files/data1_config.yaml"
     dat2:
          conf: "config_files/data2_config.yaml"
     qc_only: FALSE

and the individual config files look like this (data1_config.yaml):

# List of files

sampletable: "samples_data1.tsv"
output_dir: "data1"
## Cutadapt
## IMPORTANT ****** If you want to remove primers uncomment line 51  in utils/rules/qc_cutadapt.smk which will allow for primers to be removed

primers:
# Illumina V3V4 protocol primers
fwd_primer: "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGNGGCWGCAG"
rev_primer: "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACHVGGGTATCTAATCC"
fwd_primer_rc: "CTGCWGCCNCCCGTAGGCTGTCTCTTATACACATCTGACGCTGCCGACGA"
rev_primer_rc: "GGATTAGATACCCBDGTAGTCCTGTCTCTTATACACATCTCCGAGCCCACGAGAC"


R1: "_R1"
R2: "_R2"

maxEE:
  - 2
  - 2
truncQ: 2

Dmitry Kuzminov · Accepted Answer · 2022-08-30 08:22:36Z


Here is the output section of your fastqc rule, lightly reformatted:

rule fastqc:
    output:
        R1 = "{output}/fastqc/{sample}{R1}" + "_fastqc.html",
        R2 = "{output}/fastqc/{sample}{R2}" + "_fastqc.html"

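The {R1} and {R2} wildcards are interchangeable: each can match any text, so there is no way for Snakemake to differentiate them. They are also two different wildcard names, so the two output patterns and the log pattern do not share the same set of wildcards, which is exactly what the error message reports. For example, imagine that the rule all requires this file to be created: output_dir/fastqc/sample_aR1_fastqc.html. Which variable should Snakemake assign this file to, output.R1 or output.R2?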
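The ambiguity can be reproduced outside Snakemake. The sketch below is only an illustration, not Snakemake's actual matching code: it assumes each wildcard behaves like the regular expression `.+` (the documented default when no constraint is given), and `pattern_to_regex` is a hypothetical helper written for this example.

```python
import re

def pattern_to_regex(pattern):
    """Turn a path pattern with {name} placeholders into a compiled regex.

    Assumption: every wildcard matches '.+' and has no constraint.
    """
    pieces = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, piece in enumerate(pieces):
        if i % 2 == 0:
            regex += re.escape(piece)      # literal text between wildcards
        else:
            regex += f"(?P<{piece}>.+)"    # wildcard becomes a named group
    return re.compile(regex)

# The two output patterns from the fastqc rule in the question.
r1_pattern = "{output}/fastqc/{sample}{R1}_fastqc.html"
r2_pattern = "{output}/fastqc/{sample}{R2}_fastqc.html"

# The example file requested by rule all.
target = "output_dir/fastqc/sample_aR1_fastqc.html"

match_r1 = pattern_to_regex(r1_pattern).fullmatch(target)
match_r2 = pattern_to_regex(r2_pattern).fullmatch(target)

# Both patterns match the same file with the same split of the name;
# only the label on the last group differs.
print(match_r1.groupdict())
print(match_r2.groupdict())
```

Because the two patterns differ only in the name of the last group, nothing in the file name tells them apart.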

You need to separate these parameters with a non-wildcard, like this:

rule fastqc:
    output:
        R1 = "{output}/fastqc/{sample}R1" + "_fastqc.html",
        R2 = "{output}/fastqc/{sample}R2" + "_fastqc.html"
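To see why the rewritten patterns satisfy the "same wildcards" requirement, it may help to list the wildcard names in each pattern. This is a rough stand-in for the check Snakemake performs, not its real implementation; `wildcard_names` and `same_wildcards` are hypothetical helpers, and the log pattern is copied from the rule in the question.

```python
from string import Formatter

def wildcard_names(pattern):
    """Return the set of {name} placeholders found in a path pattern."""
    return {field for _, field, _, _ in Formatter().parse(pattern) if field}

def same_wildcards(patterns):
    """True if every pattern uses exactly the same set of wildcard names."""
    names = [wildcard_names(p) for p in patterns.values()]
    return all(n == names[0] for n in names)

# Patterns as written in the question: {R1} and {R2} are wildcards.
question_patterns = {
    "output.R1": "{output}/fastqc/{sample}{R1}_fastqc.html",
    "output.R2": "{output}/fastqc/{sample}{R2}_fastqc.html",
    "log": "{output}/logs/qc/fastqc_{sample}_unfilt.log",
}

# Patterns after the fix: R1 and R2 are plain text.
fixed_patterns = {
    "output.R1": "{output}/fastqc/{sample}R1_fastqc.html",
    "output.R2": "{output}/fastqc/{sample}R2_fastqc.html",
    "log": "{output}/logs/qc/fastqc_{sample}_unfilt.log",
}

for label, patterns in (("question", question_patterns), ("fixed", fixed_patterns)):
    for key, pattern in patterns.items():
        print(label, key, sorted(wildcard_names(pattern)))
    print(label, "consistent:", same_wildcards(patterns))
```

If the suffix has to come from a config file, as with the "_R1" and "_R2" values in data1_config.yaml, concatenating the Python string into the pattern (the way all_input_reads does with config["R1"]) also keeps it out of the wildcard set.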
