
I have 22 .vcf.gz files named like this:

ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz
ALL.chr2.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz
ALL.chr3.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz
...
ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz

I also have a script, filter.sh, that processes a single file. It looks like this. How would I loop through all 22 files?

filter_and_convert ()
{
    # Header: "varID" followed by the selected sample names, tab-separated.
    printf 'varID\t'
    bcftools view "$1" -S "$2" --force-samples -Ou | bcftools query -l | tr '\n' '\t' | sed 's/\t$/\n/'

    # Progress messages go to stderr so they stay out of the data output.
    NOW=$(date +%Y-%m-%d/%H:%M:%S)
    echo "Starting at $NOW" >&2

    # The inline awk script turns each line "ID GT GT ..." into
    # "ID dosage dosage ...": a phased genotype such as 0|1 becomes the
    # sum of its alleles, and a missing genotype becomes NA.
    bcftools view -S "$2" --force-samples "$1" -Ou | \
    bcftools query -f '%ID[\t%GT]\n' | \
    awk '
    {
        for (i = 1; i <= NF; i++) {
            if (i == 1) {
                printf("%s", $i)                        # column 1 is the variant ID
            } else if (substr($i, 1, 1) == ".") {
                printf("\tNA")                          # missing genotype
            } else if ($i ~ /[0-9][|][0-9]/) {
                split($i, array, "|")                   # literal pipe, e.g. 0|1
                printf("\t%d", array[1] + array[2])
            } else {
                printf("Unexpected: %s\n", $i) > "/dev/stderr"
                exit 1
            }
        }
        printf("\n")
    }
    '

    NOW=$(date +%Y-%m-%d/%H:%M:%S)
    echo "Ending at $NOW" >&2
}

filter_and_convert ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz samples.txt
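The awk step can be checked on its own, without bcftools or a real VCF, by piping a hand-made line through it. The sketch below is a stand-alone copy of that step in which column 1 is always treated as the variant ID and the pipe in genotypes like 0|1 is matched literally. In the original, "[0-9]|[0-9]" is an alternation that matches any field containing a digit, and substr($i,0,1) starts at position 0 even though awk strings are indexed from 1, which different awk implementations handle differently. The function name to_dosage and the ID rs123 are made up for illustration.

```shell
#!/bin/sh
# Stand-alone copy of the genotype-to-dosage awk step (hypothetical name
# to_dosage). Column 1 is the variant ID; every other column is a GT value.
to_dosage() {
    awk '
    {
        for (i = 1; i <= NF; i++) {
            if (i == 1) {
                printf("%s", $i)                    # variant ID
            } else if (substr($i, 1, 1) == ".") {
                printf("\tNA")                      # missing genotype (. or .|.)
            } else if ($i ~ /[0-9][|][0-9]/) {
                split($i, allele, "|")              # phased genotype, e.g. 0|1
                printf("\t%d", allele[1] + allele[2])
            } else {
                printf("Unexpected: %s\n", $i) > "/dev/stderr"
                exit 1
            }
        }
        printf("\n")
    }'
}

# One made-up line in the '%ID[\t%GT]\n' layout: an ID, then four samples.
printf 'rs123\t0|0\t0|1\t1|1\t.|.\n' | to_dosage
```

This prints rs123 followed by 0, 1, 2 and NA, tab-separated. An unphased genotype such as 0/1 hits the "Unexpected" branch and makes awk exit with status 1.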
  • for file in *.vcf.gz; do filter_and_convert "$file"; done Commented Jan 31, 2020 at 16:52
  • Do you mean I should put this line in my filter.sh script, or run it on my filter.sh script, like: for file in *.vcf.gz; do filter_and_convert "$file"; done, where I would replace the file name in my filter.sh script with $file? Commented Jan 31, 2020 at 17:02
  • You replace the last line of the script with that. Commented Jan 31, 2020 at 17:04
  • Oh, it also needs samples.txt after "$file". I didn't see that before. Commented Jan 31, 2020 at 17:05
  • What is $3 supposed to be in the function? You only give two arguments. Commented Jan 31, 2020 at 17:06

2 Answers


Replace

filter_and_convert ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz  samples.txt

with a for loop that calls the function on all the files that match a wildcard.

for file in ALL.*.vcf.gz; do
    filter_and_convert "$file"  samples.txt
done
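A sketch of the same idea that also sends all 22 iterations into one file, assuming it replaces the last line of filter.sh (after the filter_and_convert definition) and that samples.txt is in the current directory. Everything inside the loop writes to the loop's standard output, so a single redirect after done collects every iteration. It iterates over seq 1 22 rather than a glob, because a glob expands alphabetically and puts chr10 to chr19 before chr2.

```shell
#!/bin/sh
# Replaces the last line of filter.sh; assumes filter_and_convert is
# defined above it and samples.txt is in the current directory.
for chr in $(seq 1 22); do
    file="ALL.chr${chr}.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz"
    if [ ! -e "$file" ]; then
        echo "Skipping missing file: $file" >&2
        continue
    fi
    filter_and_convert "$file" samples.txt
done > output.txt   # one redirect collects the output of all iterations
```

With the redirect inside the script, running sh filter.sh is enough; >> would instead append to whatever output.txt already contains. filter_and_convert prints its varID header once per file, so output.txt will hold 22 header lines, and unless the Starting/Ending messages are sent to stderr (>&2) they end up in output.txt as well.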

Comments

I put your code at the end of my script and ran it like this: sh filter.sh > output.txt. Does this make sense if I want the output of all 22 files processed in the loop concatenated in output.txt? Also, this seems to run forever: I started it probably 10 hours ago and it still hasn't finished. Is there a more efficient way to run this?
Put set -x at the beginning of the script so you can see what it's doing.
I did; it hangs after this line: } printf("\n") } '
Is there any input on this, or do you need more info?
I tried running this for one file and it completed. My question is: to run this using the loop you provided, do I run it like this: sh filter.sh > output.txt, or like this: sh filter.sh >> output.txt? In other words, how do I ensure that the output of all 22 loop iterations ends up in this one file, output.txt?
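On the efficiency question: the 22 chromosomes are independent, so one option is to run them in parallel and merge afterwards. The sketch below assumes filter_and_convert is defined above it in filter.sh and sends its Starting/Ending messages to stderr; the per-chromosome file names dosage.chrN.txt and the function merge_parts are hypothetical. It starts all 22 jobs at once, which may be too many for a small machine, and the total run time is still bounded by the slowest chromosome.

```shell
#!/bin/sh
# Assumes filter_and_convert is defined above in filter.sh and writes its
# progress messages to stderr so they stay out of the data files.

# Start one background job per chromosome, each writing its own file.
for chr in $(seq 1 22); do
    file="ALL.chr${chr}.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz"
    [ -e "$file" ] || continue
    filter_and_convert "$file" samples.txt > "dosage.chr${chr}.txt" &
done
wait   # block until every background job has finished

# Concatenate the pieces in chromosome order, keeping only one header line.
merge_parts() {
    first=1
    for chr in $(seq 1 22); do
        part="dosage.chr${chr}.txt"
        [ -e "$part" ] || continue
        if [ "$first" -eq 1 ]; then
            cat "$part"            # first piece: keep its header line
            first=0
        else
            tail -n +2 "$part"     # later pieces: skip the repeated header
        fi
    done
}

merge_parts > output.txt
```

output.txt then starts with a single varID header followed by chromosomes 1 to 22 in numeric order.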
v="ALL.chr"
p=".phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz"
# {1..22} is bash brace expansion, so run this loop with bash, not sh.
for i in {1..22}; do
        file="$v$i$p"
        bash filter.sh "$file" samples.txt
done

Use this file variable with your script; it should work. I am assuming the first argument to your filter.sh is the file name. You can add the rest of the arguments.
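A minimal sketch of how the end of filter.sh could look under this answer's assumption that the file name is the first argument and the sample list the second. It replaces the hard-coded call on the last line of the question's script; run_filter is a hypothetical helper name, and filter_and_convert is the function from the question, defined above it.

```shell
#!/bin/sh
# Hypothetical ending for filter.sh: process the file named on the command
# line instead of a hard-coded chr22 file. filter_and_convert must be
# defined above these lines.
run_filter() {
    if [ "$#" -ne 2 ]; then
        echo "usage: filter.sh FILE.vcf.gz SAMPLES.txt" >&2
        return 2
    fi
    filter_and_convert "$1" "$2"
}

run_filter "$@"
```

With this ending, each iteration of the loop above processes exactly the file passed in $file, and calling filter.sh with the wrong number of arguments prints a usage line instead of running bcftools.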

Comment

Thanks for getting back to me. Can you please edit your answer to show exactly what my filter.sh script would look like?
