I have file1:

col1=val1|col2=val2|col3=val3|col4=val4
col1=val1|col2=val2|col4=val4|col5=val5|col6=val6
col1=val1|col3=val3|col4=val4|col6=val6
col1=val1|col2=val2|col3=val3|col4=val4|col5=val5|col6=val6

And a list of unique column names in file2:

col1
col2
col3
col4
col5
col6

For each line of file1, I need to extract the values in the column order given by file2 and write them, pipe-delimited, to a separate file.

The output should look like:

val1|val2|val3|val4|||
val1|val2||val4|val5|val6
val1||val3|val4||val6
val1|val2|val3|val4|val5|val6

4 Answers

Any time you have input data with name=value pairs, the best approach is to first create a name->value array and then print that array's contents by its named indices. In this case the order of those names comes from a different file, so just read that first:

$ cat tst.awk
BEGIN { FS="[=|]"; OFS="|" }
NR==FNR { outFldNames[++numOutFlds]=$0; next }
{
    delete name2val
    for (inFldNr=1; inFldNr<NF; inFldNr+=2) {
        name2val[$inFldNr] = $(inFldNr+1)
    }

    for (outFldNr=1; outFldNr<=numOutFlds; outFldNr++) {
        printf "%s%s", name2val[outFldNames[outFldNr]], (outFldNr<numOutFlds ? OFS : ORS)
    }
}

$ awk -f tst.awk file2 file1
val1|val2|val3|val4||
val1|val2||val4|val5|val6
val1||val3|val4||val6
val1|val2|val3|val4|val5|val6
  • This looks like my solution with inverted loops... Commented Mar 18, 2017 at 14:13
  • It's a similar approach but you're looping through up-to-all input fields once for every output field while I'm doing it once each so your efficiency for each line is something like X*Y/2 while mine is X+Y (and you have extraneous backslashes and semi-colons lurking at the end of several lines). The core/important part of mine, though, is the creation of the name2val array first before trying to do anything with the data - that's always the best approach when you have name->value mappings in the input data as it makes it clear and trivial to do whatever you want afterwards. Commented Mar 18, 2017 at 14:26
  • OK, I need to study your solution more carefully. By the way, though it is not a requirement, will this solution work with a different file2 that has a mixed column order (i.e. as it appears in my answer for testing)? Commented Mar 18, 2017 at 14:54
  • Your solution is perfect. It works exactly as I had in mind at the very beginning, but I could not make it work. Just to play some code golf, here is an alternative snippet based on your technique. Same result, maybe a few characters fewer: awk -F"[=|]" 'NR==FNR{head[$1];next}{delete map;out="";for (i=1;i<=NF;i+=2) map[$i]=$(i+1)}{{for (k in head) printf("%s%s",map[k],"|")}print ""}' file2 file1 Commented Mar 19, 2017 at 2:20
  • Yes, you are right. The order is random (luckily the output printed correctly during my test), and you are also right that there is a spurious | at the end of each line :-) Commented Mar 20, 2017 at 10:30
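
Following up on the compact variant and the mixed-order question in the comments above: this is only a minimal sketch, not taken verbatim from any answer. It keeps file2's order by storing the names in a numerically indexed array (a `for (k in head)` loop visits keys in an unspecified order) and emits a separator only between fields, so no trailing `|` appears. The function name `project_columns` and the temporary file names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: project_columns NAMES_FILE DATA_FILE
# Prints the values from DATA_FILE in the column order listed in NAMES_FILE.
project_columns() {
    awk -F'[=|]' -v OFS='|' '
        NR == FNR { names[++n] = $1; next }   # first file: remember the column order
        {
            split("", map)                    # portable way to empty the array
            for (i = 1; i < NF; i += 2)       # fields alternate name, value
                map[$i] = $(i + 1)
            line = ""
            for (k = 1; k <= n; k++)          # separator only between fields
                line = line (k > 1 ? OFS : "") map[names[k]]
            print line
        }
    ' "$1" "$2"
}

# Demo with a reordered names list (illustrative data only).
tmp=$(mktemp -d)
printf '%s\n' col6 col4 col3 col5 col2 col1 > "$tmp/file2"
printf '%s\n' 'col1=val1|col2=val2|col3=val3|col4=val4' \
              'col1=val1|col3=val3|col4=val4|col6=val6' > "$tmp/file1"
project_columns "$tmp/file2" "$tmp/file1"
rm -rf "$tmp"
```

For the demo data this should print `|val4|val3||val2|val1` and `val6|val4|val3|||val1`: values follow the col6, col4, col3, col5, col2, col1 order, and columns missing from a line come out as empty fields.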
perl -wMstrict -Mvars='*A' -lne '
   if ( @ARGV ) { push @A, $_; }
   else {
      my %h = /([^|=]+)=([^|]+)/g;
      $,="|"; print map { $h{$_} // (($_ eq $A[-1]) ? q/|/ : q//) } @A;
   }
' file2 file1

Note the first line of the expected output in the question: it ends with three pipes. That is why the map logic special-cases the last column, emitting an extra | when that column is missing.

Output:

val1|val2|val3|val4|||
val1|val2||val4|val5|val6
val1||val3|val4||val6
val1|val2|val3|val4|val5|val6
  • Note the first line of output: There are 3 pipes here. - Is that part of the requirements or a bug? It doesn't seem to make any sense: with 3 pipes the first line would have 7 fields, while all other lines have 6. Commented Mar 17, 2017 at 12:44
  • @SatoK. Yes that's correct. The OP should put the requirements properly. Commented Mar 17, 2017 at 12:52
$ cat file1
col1=val1|col2=val2|col3=val3|col4=val4
col1=val1|col2=val2|col4=val4|col5=val5|col6=val6
col1=val1|col3=val3|col4=val4|col6=val6
col1=val1|col2=val2|col3=val3|col4=val4|col5=val5|col6=val6

I changed file2 to demonstrate that columns not listed in file2 are omitted:

$ cat file2
col1
col2
col4
col5
col6

The script:

#!/bin/bash
patterns="$(tr '\n' '|' < file2| sed 's/|$//')"

awk -F'|' -v pat="$patterns" '{
  o=0
  for (i=1; i<=6; i++) {
    f=i-o
    split($f,a,"=")
    if ( a[1] ~ i ) {
      if ( a[1] ~ pat ) {
        printf "%s", a[2]
      }
      if (i != 6) {printf "|"}
    } else {
      printf "|"
      o++
    }

  }
  printf "\n"
}' file1

The output without col3 values:

$ ./script
val1|val2||val4|||
val1|val2||val4|val5|val6
val1|||val4||val6
val1|val2||val4|val5|val6
  • Sadly, the output seems to differ from what the OP is asking for. Commented Mar 17, 2017 at 12:03
  • I added an explanation, I think it does what he wants. Commented Mar 17, 2017 at 12:43
  • Line 3 is missing val3... Commented Mar 17, 2017 at 12:45
  • There was another error, the script didn't print empty fields in the middle. Commented Mar 17, 2017 at 15:55

This is a classic programming approach with awk and manual mapping:

$ awk -F"[=|]" 'NR==FNR{header[++c]=$1;next}\
 {
  record="";
  for (h=1;h<=c;h++) 
    {
      found="*";
      for (field=1;field<=NF;field+=2) \
        {
          if ($field==header[h]) 
             {found=$(field+1);break}
        };
      record=record "|" found;
    }
  print record
 }' file2 file1

#Output:
|val1|val2|val3|val4|*|*
|val1|val2|*|val4|val5|val6
|val1|*|val3|val4|*|val6
|val1|val2|val3|val4|val5|val6

For a different file2 with a different column order, such as:

col6
col4
col3
col5
col2
col1

the output follows that order:

|*|val4|val3|*|val2|val1
|val6|val4|*|val5|val2|val1
|val6|val4|val3|*|*|val1
|val6|val4|val3|val5|val2|val1
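
If the leading `|` and the `*` placeholder are not wanted, the same nested-loop lookup can be adjusted slightly. This is a rough sketch only: the wrapper name `lookup_columns` is hypothetical, and its optional third argument (my own addition) is the placeholder printed for missing columns, defaulting to `*` as above.

```shell
#!/bin/sh
# Hypothetical helper: lookup_columns NAMES_FILE DATA_FILE [PLACEHOLDER]
lookup_columns() {
    awk -F'[=|]' -v missing="${3-*}" '
        NR == FNR { header[++c] = $1; next }   # first file: column names in order
        {
            record = ""
            for (h = 1; h <= c; h++) {
                found = missing                # used when the column is absent
                for (f = 1; f < NF; f += 2)    # scan the name fields of this line
                    if ($f == header[h]) { found = $(f + 1); break }
                record = record "|" found
            }
            print substr(record, 2)            # drop the leading "|"
        }
    ' "$1" "$2"
}

# Demo (illustrative data only): default placeholder, then an empty one.
tmp=$(mktemp -d)
printf '%s\n' col1 col2 col3 > "$tmp/file2"
printf '%s\n' 'col1=val1|col3=val3' > "$tmp/file1"
lookup_columns "$tmp/file2" "$tmp/file1"
lookup_columns "$tmp/file2" "$tmp/file1" ""
rm -rf "$tmp"
```

Passing an empty string as the third argument gives empty fields for missing columns, which is closer to the format asked for in the question.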
