
I have a directory data in which there are several fastqs like below:

SRR13456784_1.fastq
SRR13456784_2.fastq
SRR13456784_3.fastq

SRR13456785_1.fastq
SRR13456785_2.fastq
SRR13456785_3.fastq

SRR13456786_1.fastq
SRR13456786_2.fastq
SRR13456786_3.fastq

SRR19876543_1.fastq
SRR19876543_2.fastq
SRR19876543_3.fastq

SRR19876544_1.fastq
SRR19876544_2.fastq
SRR19876544_3.fastq

I have a whitespace-delimited file, details.txt, with two columns: ID and Sample. I want to concatenate the fastq files of all IDs that belong to the same sample, and name the output after the Sample.

ID          Sample
SRR13456784 GJK1234567
SRR13456785 GJK1234567
SRR13456786 GJK1234567
SRR19876543 GJK2444103
SRR19876544 GJK2444103

For one of the samples, I concatenated the files like this:

cat SRR13456784_1.fastq SRR13456785_1.fastq SRR13456786_1.fastq > GJK1234567_1.fastq

cat SRR13456784_2.fastq SRR13456785_2.fastq SRR13456786_2.fastq > GJK1234567_2.fastq

cat SRR13456784_3.fastq SRR13456785_3.fastq SRR13456786_3.fastq > GJK1234567_3.fastq

The details.txt above is only an example; my real file has 300 IDs that map to 50 samples.

Can anyone tell me how to do this concatenation and give Sample name for the output in a single script? Thank you.

  • Are the IDs for a given Sample always in increasing numerical order, as in your example? Do the numbers always have the same number of digits (8 for SRRx, 1 for the _x suffix, 7 for GJKx)? Commented Apr 27, 2023 at 11:55
  • @StéphaneChazelas Sample is not always in increasing order. Yes: 8 digits for SRRx, 7 for GJKx, and 1 for the _x.fastq suffix. Commented Apr 27, 2023 at 12:10

1 Answer

You can do something like this:

$ tail -n +2 details.txt | 
   while read -r id sample; do 
     for i in {1..3}; do 
       cat < "${id}_${i}".fastq >> "${sample}_${i}".fastq
     done
   done

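The tail -n +2 is needed to skip your header line (ID Sample). We then iterate over the remaining lines, reading each line's two fields into the variables id and sample. An inner loop iterates over the numbers 1 through 3 and appends each of the ID's files to the matching sample file. Because the output is written with >>, running the loop a second time appends the same data again, so delete any existing output files before rerunning. The commands that would be run for your example input are: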

$ tail -n +2 details.txt | while read -r id sample; do for i in {1..3}; do echo "cat \"${id}_${i}\".fastq >> \"${sample}_${i}\".fastq"; done; done
cat "SRR13456784_1".fastq >> "GJK1234567_1".fastq
cat "SRR13456784_2".fastq >> "GJK1234567_2".fastq
cat "SRR13456784_3".fastq >> "GJK1234567_3".fastq
cat "SRR13456785_1".fastq >> "GJK1234567_1".fastq
cat "SRR13456785_2".fastq >> "GJK1234567_2".fastq
cat "SRR13456785_3".fastq >> "GJK1234567_3".fastq
cat "SRR13456786_1".fastq >> "GJK1234567_1".fastq
cat "SRR13456786_2".fastq >> "GJK1234567_2".fastq
cat "SRR13456786_3".fastq >> "GJK1234567_3".fastq
cat "SRR19876543_1".fastq >> "GJK2444103_1".fastq
cat "SRR19876543_2".fastq >> "GJK2444103_2".fastq
cat "SRR19876543_3".fastq >> "GJK2444103_3".fastq
cat "SRR19876544_1".fastq >> "GJK2444103_1".fastq
cat "SRR19876544_2".fastq >> "GJK2444103_2".fastq
cat "SRR19876544_3".fastq >> "GJK2444103_3".fastq
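
If you expect to rerun this, or some of the input files might be missing, the same loop can be wrapped in a small function. This is only a sketch under some assumptions: the function name merge_fastqs is made up for illustration, details.txt has one header line and two whitespace-separated columns, every ID has files with the suffixes _1, _2 and _3, and any existing Sample_N.fastq file in the current directory is stale output that is safe to delete.

```bash
# merge_fastqs is a hypothetical helper name, not part of any tool.
# Usage: merge_fastqs details.txt   (run it in the directory holding the fastq files)
merge_fastqs() {
  local details=$1
  local id sample i src

  # Pass 1: remove output files left by an earlier run, so that the
  # appends in pass 2 start from empty files.
  # ASSUMPTION: existing <Sample>_<n>.fastq files are disposable.
  tail -n +2 "$details" | while read -r id sample || [ -n "$id" ]; do
    [ -n "$sample" ] || continue          # skip blank or one-column lines
    for i in 1 2 3; do
      rm -f -- "${sample}_${i}.fastq"
    done
  done

  # Pass 2: append each ID's files to the matching sample file,
  # in the order the IDs appear in the details file.
  tail -n +2 "$details" | while read -r id sample || [ -n "$id" ]; do
    [ -n "$sample" ] || continue
    for i in 1 2 3; do
      src="${id}_${i}.fastq"
      if [ -f "$src" ]; then
        cat -- "$src" >> "${sample}_${i}.fastq"
      else
        echo "warning: $src not found, skipping" >&2
      fi
    done
  done
}
```

The `|| [ -n "$id" ]` part makes the loop also process a last line that has no trailing newline. Files are concatenated in the order the IDs are listed in details.txt, so sort that file first if the order within a sample matters.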
  • Thank you. I see this error: tail: cannot open '+2' for reading: No such file or directory Commented Apr 27, 2023 at 12:43
  • Ah, sorry, I guess that's a GNU thing (please remember to always mention your operating system). Does tail -n +2 work? Alternatively, try awk 'NR>1' details.txt | while ... Commented Apr 27, 2023 at 12:45
  • @stack_learner what are you running? That doesn't make sense from the code I gave you: you would get that message if you quoted the entire command somehow, like: 'cat "SRR13456784_1".fastq >> "GJK1234567_1".fastq' Commented Apr 27, 2023 at 12:51
  • @stack_learner you are using the wrong command. That was just to show what would be executed. The code you should use is the first one. Commented Apr 27, 2023 at 12:57
  • OK, I had missed the -r. It works now. Thanks a lot. Commented Apr 27, 2023 at 12:57
