Get values according to a column list in Unix

Question

I have file1:

col1=val1|col2=val2|col3=val3|col4=val4
col1=val1|col2=val2|col4=val4|col5=val5|col6=val6
col1=val1|col3=val3|col4=val4|col6=val6
col1=val1|col2=val2|col3=val3|col4=val4|col5=val5|col6=val6

And a list of unique column names in file2:

col1
col2
col3
col4
col5
col6

Following the column order in file2, I need to extract each column's value from file1 and write the values to a separate file, pipe-delimited.

The output should look like:

val1|val2|val3|val4|||
val1|val2||val4|val5|val6
val1||val3|val4||val6
val1|val2|val3|val4|val5|val6

Ed Morton · Accepted Answer · 2017-03-18 14:17:51Z

1

Any time you have input data with name=value pairs, the best approach is to first create a name->value array and then print that array's contents by its named indices. In this case the order of those names comes from a different file, so just read that first:

$ cat tst.awk
BEGIN { FS="[=|]"; OFS="|" }
NR==FNR { outFldNames[++numOutFlds]=$0; next }
{
    delete name2val
    for (inFldNr=1; inFldNr<NF; inFldNr+=2) {
        name2val[$inFldNr] = $(inFldNr+1)
    }

    for (outFldNr=1; outFldNr<=numOutFlds; outFldNr++) {
        printf "%s%s", name2val[outFldNames[outFldNr]], (outFldNr<numOutFlds ? OFS : ORS)
    }
}

$ awk -f tst.awk file2 file1
val1|val2|val3|val4||
val1|val2||val4|val5|val6
val1||val3|val4||val6
val1|val2|val3|val4|val5|val6
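
A comment on this answer asks whether the approach still works when file2 lists the columns in a different order. A minimal sketch to check that, assuming a scratch directory from mktemp, two sample lines taken from file1, and the reordered column list shown in another answer. The awk program is the same name->value technique, inlined with shorter (illustrative) variable names:

```shell
#!/bin/sh
# Scratch directory so the sample files do not clobber real ones (assumption).
tmp=$(mktemp -d)

cat > "$tmp/file1" <<'EOF'
col1=val1|col2=val2|col3=val3|col4=val4
col1=val1|col3=val3|col4=val4|col6=val6
EOF

# Column list in a mixed order.
printf '%s\n' col6 col4 col3 col5 col2 col1 > "$tmp/file2"

result=$(awk '
BEGIN { FS = "[=|]"; OFS = "|" }
NR == FNR { names[++n] = $0; next }        # first file: remember output order
{
    delete map                             # reset the per-line name->value map
    for (i = 1; i < NF; i += 2) map[$i] = $(i + 1)
    for (j = 1; j <= n; j++)
        printf "%s%s", map[names[j]], (j < n ? OFS : ORS)
}' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$result"
rm -rf "$tmp"
```

With these two sample lines it should print |val4|val3||val2|val1 and val6|val4|val3|||val1, since the output order is taken only from the order in which file2 was read.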

edited Mar 18, 2017 at 14:17

answered Mar 18, 2017 at 14:10

This looks like my solution with inverted loops...

– George Vasiliou · 2017-03-18 14:13:46Z

It's a similar approach but you're looping through up-to-all input fields once for every output field while I'm doing it once each so your efficiency for each line is something like X*Y/2 while mine is X+Y (and you have extraneous backslashes and semi-colons lurking at the end of several lines). The core/important part of mine, though, is the creation of the name2val array first before trying to do anything with the data - that's always the best approach when you have name->value mappings in the input data as it makes it clear and trivial to do whatever you want afterwards.

– Ed Morton · 2017-03-18 14:26:57Z

OK. I need to study your solution more carefully. By the way, though it is not a requirement, will this solution work with a different file2 that has a mixed column order (i.e., as it appears in my answer for testing)?

– George Vasiliou · 2017-03-18 14:54:38Z

Your solution is perfect. It works exactly as I had it in mind at the very beginning, but I could not make it work. Just for some code golf, here is an alternative snippet based on your technique. Same result, maybe a few characters fewer: awk -F"[=|]" 'NR==FNR{head[$1];next}{delete map;out="";for (i=1;i<=NF;i+=2) map[$i]=$(i+1)}{{for (k in head) printf("%s%s",map[k],"|")}print ""}' file2 file1

– George Vasiliou · 2017-03-19 02:20:48Z

Yes, you are right. The order is random (luckily enough, the output printed correctly during my test), and you are also right that there is a spurious | at the end of each line :-)

– George Vasiliou · 2017-03-20 10:30:53Z

Community · Accepted Answer · 2020-06-11 14:16:50Z

0

perl -wMstrict -Mvars='*A' -lne '
   if ( @ARGV ) { push @A, $_; }
   else {
      my %h = /([^|=]+)=([^|]+)/g;
      $,="|"; print map { $h{$_} // (($_ eq $A[-1]) ? q/|/ : q//) } @A;
   }
' file2 file1

Note the first line of the expected output: it ends with 3 pipes. That is why the map logic emits an extra pipe when the last column is missing.

output

val1|val2|val3|val4|||
val1|val2||val4|val5|val6
val1||val3|val4||val6
val1|val2|val3|val4|val5|val6
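
To make the effect of that trailing pipe concrete, here is a small sketch that counts the pipe-separated fields on each line of the output above. The variable names are illustrative, and the sample text simply repeats that output:

```shell
#!/bin/sh
# The four output lines shown above, repeated verbatim.
output='val1|val2|val3|val4|||
val1|val2||val4|val5|val6
val1||val3|val4||val6
val1|val2|val3|val4|val5|val6'

# With a single-character separator, awk counts empty trailing fields too.
counts=$(printf '%s\n' "$output" | awk -F'|' '{ print NF }')
printf '%s\n' "$counts"
```

This should report 7 fields for the first line and 6 for each of the others.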

edited Jun 11, 2020 at 14:16

CommunityBot

answered Mar 17, 2017 at 12:13

user218374

"Note the first line of output: There are 3 pipes here." Is that part of the requirements or a bug? It doesn't seem to make any sense: with 3 pipes the first line would have 7 fields, while all other lines have 6.

– Satō Katsura · 2017-03-17 12:44:14Z

@SatoK. Yes, that's correct. The OP should state the requirements properly.

– user218374 · 2017-03-17 12:52:27Z

ctx · Accepted Answer · 2017-03-17 15:52:27Z

0

$ cat file1
col1=val1|col2=val2|col3=val3|col4=val4
col1=val1|col2=val2|col4=val4|col5=val5|col6=val6
col1=val1|col3=val3|col4=val4|col6=val6
col1=val1|col2=val2|col3=val3|col4=val4|col5=val5|col6=val6

I changed file2 to demonstrate that columns not listed in file2 are omitted:

$ cat file2
col1
col2
col4
col5
col6

The script:

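#!/bin/bash
# Join the column names from file2 into a regex alternation: col1|col2|col4|...
patterns="$(tr '\n' '|' < file2 | sed 's/|$//')"

awk -F'|' -v pat="$patterns" '{
  o=0                          # number of columns missing from this line so far
  for (i=1; i<=6; i++) {       # assumes exactly six columns, col1..col6
    f=i-o                      # input field expected to hold column i
    split($f,a,"=")            # a[1]=name, a[2]=value
    if ( a[1] ~ i ) {          # the field really is column i
      if ( a[1] ~ pat ) {      # print the value only if file2 lists the column
        printf "%s", a[2]
      }
      if (i != 6) {printf "|"}
    } else {                   # column i is absent: emit an empty field
      printf "|"
      o++
    }

  }
  printf "\n"
}' file1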

The output without col3 values:

$ ./script
val1|val2||val4|||
val1|val2||val4|val5|val6
val1|||val4||val6
val1|val2||val4|val5|val6
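
One caveat with testing column names via ~ is that it is an unanchored regular-expression match, so a listed name also matches inside a longer name. A minimal sketch of the difference, using a hypothetical column col10 that does not appear in the question's data, with an exact array lookup as one possible alternative (the names wanted and names are illustrative):

```shell
#!/bin/sh
# pat mimics the alternation built from file2; col10 is a made-up name.
result=$(awk -v pat='col1|col2' 'BEGIN {
    n = split(pat, names, "|")            # turn the list into a lookup set
    for (i = 1; i <= n; i++) wanted[names[i]]
    name = "col10"
    print "regex match:", (name ~ pat     ? "yes" : "no")
    print "exact match:", (name in wanted ? "yes" : "no")
}')
printf '%s\n' "$result"
```

The regex test should answer yes for col10 because col1 matches its first four characters, while the array lookup should answer no. With only col1 to col6 in play, as in the question, the two tests agree.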

edited Mar 17, 2017 at 15:52

answered Mar 17, 2017 at 11:46

Sadly, the output seems to differ from what the OP is asking for.

– Satō Katsura · 2017-03-17 12:03:16Z

I added an explanation; I think it does what the OP wants.

– ctx · 2017-03-17 12:43:17Z

Line 3 is missing val3...

– Satō Katsura · 2017-03-17 12:45:27Z

There was another error, the script didn't print empty fields in the middle.

– ctx · 2017-03-17 15:55:41Z

George Vasiliou · Accepted Answer · 2017-03-17 20:50:19Z

This is a classic programming approach with awk and manual mapping:

$ awk -F"[=|]" 'NR==FNR{header[++c]=$1;next}\
 {
  record="";
  for (h=1;h<=c;h++) 
    {
      found="*";
      for (field=1;field<=NF;field+=2) \
        {
          if ($field==header[h]) 
             {found=$(field+1);break}
        };
      record=record "|" found;
    }
  print record
 }' file2 file1

#Output:
|val1|val2|val3|val4|*|*
|val1|val2|*|val4|val5|val6
|val1|*|val3|val4|*|val6
|val1|val2|val3|val4|val5|val6

For a different file2 with a different column order, such as

col6
col4
col3
col5
col2
col1

Output will follow accordingly:

|*|val4|val3|*|val2|val1
|val6|val4|*|val5|val2|val1
|val6|val4|val3|*|*|val1
|val6|val4|val3|val5|val2|val1
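
If the leading pipe and the * placeholder are not wanted, the same nested-loop lookup can build the record with a separator that starts out empty. A minimal sketch under that assumption, run against two sample lines in a scratch directory (the mktemp directory and the variable sep are illustrative additions, not part of the answer above):

```shell
#!/bin/sh
tmp=$(mktemp -d)
cat > "$tmp/file1" <<'EOF'
col1=val1|col2=val2|col3=val3|col4=val4
col1=val1|col3=val3|col4=val4|col6=val6
EOF
printf '%s\n' col1 col2 col3 col4 col5 col6 > "$tmp/file2"

result=$(awk -F'[=|]' '
NR == FNR { header[++c] = $1; next }
{
    record = ""; sep = ""
    for (h = 1; h <= c; h++) {
        found = ""                          # empty instead of "*"
        for (f = 1; f < NF; f += 2)         # names sit in odd-numbered fields
            if ($f == header[h]) { found = $(f + 1); break }
        record = record sep found
        sep = "|"                           # separator only between fields
    }
    print record
}' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$result"
rm -rf "$tmp"
```

For the two sample lines this should print val1|val2|val3|val4|| and val1||val3|val4||val6.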
