1

First off, I'm sorry if I'm not explaining my problem clearly, English is not my native language.

I'm trying to make a snakemake rule that takes a fastq file and filters it with a program called Filtlong. I have multiple fastq files on which I want to run this rule and it should output a filtered file per fastq file but apparently it takes all of the fastq files as input for a single Filtlong command.

The fastq files are in separate directories and the snakemake rule should write the filtered files to separate directories aswell.

This is how my code looks right now:

from os import listdir

configfile: "config.yaml"

DATA = config["DATA"]
SAMPLES = listdir(config["RAW_DATA"])
RAW_DATA = config["RAW_DATA"]
FILT_DIR = config["FILTERED_DIR"]

rule all:
    input:
        expand("{FILT_DIR}/{sample}/{sample}_filtered.fastq.gz", FILT_DIR=FILT_DIR, sample=SAMPLES)

rule filter_reads:
    input:
        expand("{RAW_DATA}/{sample}/{sample}.fastq", sample=SAMPLES, RAW_DATA=RAW_DATA)
    output:
        "{FILT_DIR}/{sample}/{sample}_filtered.fastq.gz"
    shell:
        "filtlong --keep_percent 90 --target_bases 300000000 {input} | gzip > {output}"

And this is the config file:

DATA:
  all_samples

RAW_DATA:
  all_samples/raw_samples

FILTERED_DIR:
  all_samples/filtered_samples

The separate directories with the fastq files are in RAW_DATA and the directories with the filtered files should be in FILTERED_DIR,

When I try to run this, I get an error that looks something like this:

Error in rule filter_reads:
    jobid: 30
    output: all_samples/filtered_samples/cell_18-07-19_barcode10/cell_18-07-19_barcode10_filtered.fastq.gz
    shell:
        filtlong --keep_percent 90 --target_bases 300000000 all_samples/raw_samples/cell3_barcode11/cell3_barcode11.fastq all_samples/raw_samples/barcode01/barcode01.fastq all_samples/raw_samples/barcode03/barcode03.fastq all_samples/raw_samples/barcode04/barcode04.fastq all_samples/raw_samples/barcode05/barcode05.fastq all_samples/raw_samples/barcode06/barcode06.fastq all_samples/raw_samples/barcode07/barcode07.fastq all_samples/raw_samples/barcode08/barcode08.fastq all_samples/raw_samples/barcode09/barcode09.fastq all_samples/raw_samples/cell3_barcode01/cell3_barcode01.fastq all_samples/raw_samples/cell3_barcode02/cell3_barcode02.fastq all_samples/raw_samples/cell3_barcode03/cell3_barcode03.fastq all_samples/raw_samples/cell3_barcode04/cell3_barcode04.fastq all_samples/raw_samples/cell3_barcode05/cell3_barcode05.fastq all_samples/raw_samples/cell3_barcode06/cell3_barcode06.fastq all_samples/raw_samples/cell3_barcode07/cell3_barcode07.fastq all_samples/raw_samples/cell3_barcode08/cell3_barcode08.fastq all_samples/raw_samples/cell3_barcode09/cell3_barcode09.fastq all_samples/raw_samples/cell3_barcode10/cell3_barcode10.fastq all_samples/raw_samples/cell3_barcode12/cell3_barcode12.fastq all_samples/raw_samples/cell_18-07-19_barcode01/cell_18-07-19_barcode01.fastq all_samples/raw_samples/cell_18-07-19_barcode02/cell_18-07-19_barcode02.fastq all_samples/raw_samples/cell_18-07-19_barcode03/cell_18-07-19_barcode03.fastq all_samples/raw_samples/cell_18-07-19_barcode04/cell_18-07-19_barcode04.fastq all_samples/raw_samples/cell_18-07-19_barcode05/cell_18-07-19_barcode05.fastq all_samples/raw_samples/cell_18-07-19_barcode06/cell_18-07-19_barcode06.fastq all_samples/raw_samples/cell_18-07-19_barcode07/cell_18-07-19_barcode07.fastq all_samples/raw_samples/cell_18-07-19_barcode08/cell_18-07-19_barcode08.fastq all_samples/raw_samples/cell_18-07-19_barcode09/cell_18-07-19_barcode09.fastq all_samples/raw_samples/cell_18-07-19_barcode10/cell_18-07-19_barcode10.fastq all_samples/raw_samples/cell_18-07-19_barcode11/cell_18-07-19_barcode11.fastq all_samples/raw_samples/cell_18-07-19_barcode12/cell_18-07-19_barcode12.fastq all_samples/raw_samples/cell_18-07-19_barcode13/cell_18-07-19_barcode13.fastq all_samples/raw_samples/cell_18-07-19_barcode14/cell_18-07-19_barcode14.fastq all_samples/raw_samples/cell_18-07-19_barcode15/cell_18-07-19_barcode15.fastq all_samples/raw_samples/cell_18-07-19_barcode16/cell_18-07-19_barcode16.fastq all_samples/raw_samples/cell_18-07-19_barcode17/cell_18-07-19_barcode17.fastq all_samples/raw_samples/cell_18-07-19_barcode18/cell_18-07-19_barcode18.fastq all_samples/raw_samples/cell_18-07-19_barcode19/cell_18-07-19_barcode19.fastq | gzip > all_samples/filtered_samples/cell_18-07-19_barcode10/cell_18-07-19_barcode10_filtered.fastq.gz
        (exited with non-zero exit code)

As far as I can tell, the rule takes all of the fastq files as input for a single Filtlong command, but I don't quite understand why

1 Answer 1

2

You shouldn't use the expand function in your input section of the filter_reads rule. What you are doing now is requiring all your samples to be the input of each filtered file: that is what you can observe in your error message.

There is another complication that you introduce out of nothing: you mix both wildcards and variables. In your example the {FILT_DIR} is just a predefined value while the {sample} is a wildcard that Snakemake uses to match the rules. Try the following (pay special attention on single/double brackets and on the formatted string (the one that has the form f"")):

rule filter_reads:
    input:
        f"{RAW_DATA}/{{sample}}/{{sample}}.fastq"
    output:
        f"{FILT_DIR}/{{sample}}/{{sample}}_filtered.fastq.gz"
    shell:
        "filtlong --keep_percent 90 --target_bases 300000000 {input} | gzip > {output}"
Sign up to request clarification or add additional context in comments.

1 Comment

Thanks, this worked! I didn't realize that the expand function was causing all my samples to be the input. I see that now.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.