
I have an unknown number of input files that all match a glob pattern, let's say *.dat, and all have 2 columns of data and an equal number of rows. In bash I need to take the 2nd column from each file and append it as a new column in a single merged file.

E.g.:

>>cat File1.dat
1   A
2   B
3   C
>>cat File2.dat
4   D
5   E
6   F
>>cat combined.dat
A   D
B   E
C   F

Here is the code I have tried; the approach I have gone for is to loop and append:

for filename in $(ls *.dat); do paste combined.dat <(awk '{print $2}' $filename) >> combined.dat; done

The output format can be anything so long as it's tab delimited, and the key is that it must work on any number of input files (up to roughly 100), where the number isn't known in advance.

  • Related: Process Substitution For Each Array Entry Commented Jun 4, 2020 at 10:50
  • I fixed two bugs (which only occurred on some systems) in my answer. Hope that everything works now. Please let me know if one of the commands works for you. Commented Jun 5, 2020 at 18:25

2 Answers


Awk

Since you already use awk, you could do the whole work in awk:

rm -f combined.dat
awk 'FNR<NR{d="\t"} {a[FNR]=a[FNR] d $2} END{for(i=1;i<=FNR;i++) print a[i]}' *.dat > combined.dat
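The same program written out with comments, purely for readability (this is just a reformatted version of the one-liner above, not a different approach):

awk '
    FNR < NR { d = "\t" }        # true from the second file onward: put a tab before the value
    { a[FNR] = a[FNR] d $2 }     # append field 2 of the current line to output row FNR
    END {                        # after the last file, print the assembled rows
        for (i = 1; i <= FNR; i++)
            print a[i]
    }
' *.dat > combined.dat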

"Classic" solution by repeated paste

You can repeatedly paste combined.dat and the next found file. The only tricky part is getting the first paste right, when combined.dat does not exist or is empty. You could use an if, but that would be boring. Here we use a trick: paste acts like cat when used with only one argument. With an array we can conveniently specify the optional further argument. We also use sponge from moreutils to make sure that combined.dat is not mangled due to concurrent reads and writes – if you don't want to install sponge you have to use a temporary file or variables instead (a sketch of the temporary-file variant follows the loop below).

rm -f combined.dat
p=()
for f in *.dat; do
  awk '{print $2}' "$f" | paste "${p[@]}" - | sponge combined.dat
  p=(combined.dat)
done
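
If you don't want to install sponge, the temporary-file variant mentioned above could look roughly like this (the name combined.tmp is arbitrary):

rm -f combined.dat
p=()
for f in *.dat; do
  awk '{print $2}' "$f" | paste "${p[@]}" - > combined.tmp   # write to a temp file first ...
  mv combined.tmp combined.dat                               # ... then replace combined.dat
  p=(combined.dat)
done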

Hacky solution using a single paste

Alternatively, you could build a bash command and execute that. No worries, eval is safe here as printf %q ensures correct quoting.

rm -f combined.dat
eval "paste $(printf "<(awk '{printf \$2}' %q) " *.dat) > combined.dat"



A short draft; in particular, the way the newlines and tabs are inserted could be optimized:

#!/bin/bash
# All files have the same number of rows, so take the line count from the first one.
# (xargs trims the leading whitespace that some wc implementations print.)
nrLines=$(wc -l < "$(ls *dat | head -1)" | xargs)
i=1
while [ "$i" -le "$nrLines" ]; do
    for file in *dat; do
        # print column 2 of row i, followed by a tab
        awk -v line="$i" 'NR==line {printf "%s", $2}' "$file" >> consolidatedreport.txt
        printf '\t' >> consolidatedreport.txt
    done
    # finish the row with a newline
    echo "" >> consolidatedreport.txt
    i=$((i+1))
done

Be careful: depending on how you output data to your new file and how you iterate over your existing files, you might end up iterating over your newly created file. So be sure to either use a different extension than *dat if you iterate over all files with that extension (I used .txt in the example), or place the resulting file in a subfolder.
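
A minimal illustration of that pitfall, using the file names from the question (combined.dat stands in for an output file that would match the glob):

touch File1.dat File2.dat

# an output file ending in .dat gets matched by the glob the next time it is expanded
touch combined.dat
echo *dat                      # File1.dat File2.dat combined.dat

# an output with a different extension (or in a subfolder) stays out of the input set
rm combined.dat
touch consolidatedreport.txt
echo *dat                      # File1.dat File2.dat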

