I asked an earlier question about running the same Snakemake pipeline on multiple datasets, and one of the solutions, suggested by @bli, was to use multiple config files. I am trying to implement it, but I get an error when I read in the file that holds the sample information. The error is:
SyntaxError:
Not all output, log and benchmark files of rule fastqc contain the same wildcards. This is crucial though, in order to avoid that two or more jobs write to the same file.
File "Snakefile", line 64, in <module>
I have seen this error before, but I cannot figure out why it occurs in this case, when every input and output has sample as a wildcard. Any help is much appreciated!
My Snakefile looks like this:
import os
import pandas as pd
import yaml
configfile: "main_config.yaml"
all_keys = list(config.keys())
print(all_keys)
datasets =config["datasets"]
print(datasets.items())
for p_id, p_info in datasets.items():
    for key in p_info:
        print(key + '------', p_info[key])
conf_file=p_info["conf"]
conf_fh=open(conf_file)
dat_conf = yaml.safe_load(conf_fh)
output = dat_conf["output_dir"]
samples = dat_conf["sampletable"]
R1 = dat_conf["R1"]
R2 = dat_conf["R2"]
print(samples)
print(R1)
SampleTable = pd.read_table(samples,index_col=0)
SAMPLES = list(SampleTable.index)
print(SAMPLES)
PAIRED_END= ('R2' in SampleTable.columns)
FRACTIONS= ['R1']
if PAIRED_END: FRACTIONS+= ['R2']
qc = config["qc_only"]
def all_input_reads(qc):
    if config["qc_only"]:
        return expand("{output}/fastqc/{sample}" + config["R1"] + "_fastqc.html", sample=SAMPLES)
    else:
        return expand("{output}/fastqc/{sample}" + config["R1"] + "_fastqc.html", sample=SAMPLES)
rule all:
    input:
        all_input_reads

rule fastqc:
    input:
        unpack(lambda wc: dict(SampleTable.loc[wc.sample]))
    output:
        R1 = "{output}/fastqc/{sample}{R1}" + "_fastqc.html",
        R2 = "{output}/fastqc/{sample}{R2}" + "_fastqc.html"
    conda:
        "../envs/fastqc.yaml"
    log:
        "{output}/logs/qc/fastqc_{sample}_unfilt.log"
    shell: "fastqc -o {output}/fastqc {input.R1} {input.R2} >> {log}"
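As a way to inspect what the error message is complaining about, the placeholder names in each output and log pattern of rule fastqc can be listed with plain Python. This is only a rough sketch: the regular expression is a simplification that ignores wildcard constraints, and it assumes that every `{name}` in an output or log string is treated by Snakemake as a wildcard rather than filled in from the Python variables `output`, `R1` and `R2`.

```python
import re

# Patterns copied from rule fastqc (the "+" concatenation already applied).
patterns = {
    "output.R1": "{output}/fastqc/{sample}{R1}_fastqc.html",
    "output.R2": "{output}/fastqc/{sample}{R2}_fastqc.html",
    "log": "{output}/logs/qc/fastqc_{sample}_unfilt.log",
}

def wildcard_names(pattern):
    """Return the set of names found inside {...} placeholders.

    Simplified: stops at the first '}' or ',' so constraints are ignored.
    """
    return set(re.findall(r"\{([^{},]+)", pattern))

# Map each pattern label to the set of placeholder names it contains.
names = {label: wildcard_names(p) for label, p in patterns.items()}

for label, found in names.items():
    print(label, sorted(found))

# True only if every pattern carries exactly the same set of names.
all_equal = len({frozenset(found) for found in names.values()}) == 1
print("same wildcards everywhere:", all_equal)
```

Under that assumption the three patterns carry three different sets of names (`R1` appears only in the first, `R2` only in the second, and the log has neither), which would be consistent with the message about output and log files not containing the same wildcards.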
The main config file is:
datasets:
  dat1:
    conf: "config_files/data1_config.yaml"
  dat2:
    conf: "config_files/data2_config.yaml"
qc_only: FALSE
The individual config files look like this (data1_config.yaml):
# List of files
sampletable: "samples_data1.tsv"
output_dir: "data1"
## Cutadapt
## IMPORTANT ****** If you want to remove primers uncomment line 51 in utils/rules/qc_cutadapt.smk which will allow for primers to be removed
primers:
  # Illumina V3V4 protocol primers
  fwd_primer: "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGNGGCWGCAG"
  rev_primer: "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACHVGGGTATCTAATCC"
  fwd_primer_rc: "CTGCWGCCNCCCGTAGGCTGTCTCTTATACACATCTGACGCTGCCGACGA"
  rev_primer_rc: "GGATTAGATACCCBDGTAGTCCTGTCTCTTATACACATCTCCGAGCCCACGAGAC"
R1: "_R1"
R2: "_R2"
maxEE:
- 2
- 2
truncQ: 2
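For reference, here is a minimal plain-Python sketch of the concrete FastQC report paths I would expect for one dataset, built from the `output_dir`, `R1` and `R2` values of a per-dataset config like the one above. The dictionaries stand in for the parsed YAML and the sample table, and the sample names `sampleA` and `sampleB` as well as the helper `fastqc_targets` are hypothetical, not taken from my real data.

```python
# Stand-in for yaml.safe_load() on data1_config.yaml (only the keys used here).
dataset_confs = {
    "dat1": {"output_dir": "data1", "R1": "_R1", "R2": "_R2"},
}

# Hypothetical sample names; in the Snakefile these come from the sample table index.
dataset_samples = {"dat1": ["sampleA", "sampleB"]}

def fastqc_targets(conf, samples):
    """Build the FastQC HTML report paths for every sample and read suffix."""
    targets = []
    for sample in samples:
        for suffix in (conf["R1"], conf["R2"]):
            targets.append(f"{conf['output_dir']}/fastqc/{sample}{suffix}_fastqc.html")
    return targets

# Keep the targets per dataset instead of overwriting one set of variables in a loop.
all_targets = {
    name: fastqc_targets(conf, dataset_samples[name])
    for name, conf in dataset_confs.items()
}

for name, targets in all_targets.items():
    print(name, targets)
```

Keeping the values in dictionaries keyed by dataset name is just one possible layout; it avoids the last dataset in the loop silently replacing the values of the earlier ones.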