Aligning columns using an awk script file

Question

I'm trying to figure out how to write a .awk script that takes a .csv file as input and outputs it without commas and with the columns aligned. So far I've tried this:

{ printf "%-10s %s\n", $1, $2, $3 ,$4 }

But this only outputs the first two fields, aligned. It removes the comma delimiters, but the fourth column contains commas inside double quotes, and I wonder whether those will cause an issue. Any guidance is much appreciated; I'm very new to awk.

Sample input is like:

Name,Last Name,Gender,Pet
Kit,Rattenberie,Male,"Crake, african black"
Cliff,Lakes,Male,"Red phalarope"
Tirrell,Stables,Male,"Rhea, greater"
Cherry,William,Female,"Crow, house"

Desired output will be something like:

Name    Last Name    Gender   Pet
Kit     Rattenberie  Male    "Crake, african black"
Cliff   Lakes        Male    "Red phalarope"
Tirrell Stables      Male    "Rhea, greater"
Cherry  William      Female  "Crow, house"

This is for a .csv file of 10 rows. Thanks in advance.

Sure thing, just added. I'm not sure of the max width of each column, but it's not more than 25 characters per cell. The output delimiter would just be whatever number of spaces aligns the data.
– AstralV, Commented Oct 11, 2022 at 21:01
Yeah, I guess my example wasn't ideal. Since the words are of varying lengths, the number of spaces would have to vary as well, I suppose.
– AstralV, Commented Oct 11, 2022 at 21:10
Parsing a general-purpose CSV with awk isn't trivial, even with GNU awk. What are the border cases? Are there fields with a newline or a double quote as part of the data? Also, don't you need to double-quote "Last Name" in the output?
– Fravadona, Commented Oct 11, 2022 at 23:31

glenn jackman · Accepted Answer · 2022-10-12 03:53:49Z

Using Miller, we can transform the input data from CSV to "pretty print" format with a command-line option:

mlr --c2p cat ./input

Name    Last Name   Gender Pet
Kit     Rattenberie Male   Crake, african black
Cliff   Lakes       Male   Red phalarope
Tirrell Stables     Male   Rhea, greater
Cherry  William     Female Crow, house

It drops the quotes though. The --barred option is nice too:

mlr --c2p --barred cat ./input

+---------+-------------+--------+----------------------+
| Name    | Last Name   | Gender | Pet                  |
+---------+-------------+--------+----------------------+
| Kit     | Rattenberie | Male   | Crake, african black |
| Cliff   | Lakes       | Male   | Red phalarope        |
| Tirrell | Stables     | Male   | Rhea, greater        |
| Cherry  | William     | Female | Crow, house          |
+---------+-------------+--------+----------------------+

An awk technique that takes more programming (GNU awk is needed for FPAT and for arrays of arrays): keep track of the max width of each column while reading the input file, then use those widths to print the data at the end. This is essentially re-implementing column -t.

awk -v FPAT='"[^"]*"|[^,]+' '
    {
        for (i=1; i<=NF; i++) {
            data[NR][i] = $i
            if (length($i) > maxw[i]) maxw[i] = length($i)
        }
    }
    END {
        for (i=1; i<=NR; i++) {
            for (j=1; j<=length(data[i]); j++) {
                printf "%-*s  ", maxw[j], data[i][j]
            }
            printf "\n"
        }
    }
' ./input

Name     Last Name    Gender  Pet
Kit      Rattenberie  Male    "Crake, african black"
Cliff    Lakes        Male    "Red phalarope"
Tirrell  Stables      Male    "Rhea, greater"
Cherry   William      Female  "Crow, house"

Aah this looks good but I'm supposed to be using a .awk script

anubhava · Accepted Answer · 2022-10-11 21:37:16Z

Using GNU awk, you can use this:

awk -v FPAT='"[^"]*"|[^,]+' '{
   for (i=1; i<=NF; ++i) $i = sprintf("%-12s", $i)} 1' file

Name         Last Name    Gender       Pet
Kit          Rattenberie  Male         "Crake, african black"
Cliff        Lakes        Male         "Red phalarope"
Tirrell      Stables      Male         "Rhea, greater"
Cherry       William      Female       "Crow, house"

Or if width is totally unpredictable then use this awk + column solution:

awk -v FPAT='"[^"]*"|[^,]+' -v OFS=';' '{$1=$1} 1' file |
column -s';' -t

Name     Last Name    Gender  Pet
Kit      Rattenberie  Male    "Crake, african black"
Cliff    Lakes        Male    "Red phalarope"
Tirrell  Stables      Male    "Rhea, greater"
Cherry   William      Female  "Crow, house"

If you want to create an awk script then use:

cat col.awk

BEGIN {
   FPAT="\"[^\"]*\"|[^,]+"
   OFS=";"
}
{$1 = $1}
1

Use it as:

awk -f col.awk file.csv | column -s';' -t

edited Oct 11, 2022 at 21:37

answered Oct 11, 2022 at 21:16

6 Comments

AstralV Over a year ago

Hmm I'm running this from a .awk file so I'm getting some syntax errors, example run is like awk -F"," -f script.awk file.csv

anubhava Over a year ago

What's output of awk -v FPAT='"[^"]*"|[^,]+' -v OFS=';' '{$1=$1} 1' file.csv | column -s';' -t ?

AstralV Over a year ago

So I suppose I only put the FPAT='"[^"]*"|[^,]+' -v OFS=';' '{$1=$1} 1' in the BEGIN{} block in my script file and then apply column -t when invoking it? This brings errors though

anubhava Over a year ago

check my updated answer to create a col.awk script and use it

glenn jackman Over a year ago

If you want to declare FPAT in the BEGIN block, you can't use single quotes and have to escape doubles: BEGIN {FPAT = "\"[^\"]*\"|[^,]+"}
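
A sketch of an alternative that avoids the escaping, assuming GNU awk: leave FPAT out of the script file and set it with -v next to -f, where the shell's single quotes still work. The script body and the column widths below are made up for illustration. Note the one conversion per field in the format string, since printf ignores arguments that have no matching conversion.

```shell
# script.awk (illustrative) has no BEGIN block; FPAT comes from the command line.
cat > script.awk <<'EOF'
{ printf "%-12s %-12s %-8s %s\n", $1, $2, $3, $4 }   # four fields, four conversions
EOF

# Two rows of the sample input from the question.
cat > file.csv <<'EOF'
Name,Last Name,Gender,Pet
Kit,Rattenberie,Male,"Crake, african black"
EOF

# -v assignments are applied before the program starts,
# so FPAT is already in effect when the first record is split.
awk -v FPAT='"[^"]*"|[^,]+' -f script.awk file.csv
```

This takes the place of the -F"," in the example run above; -F"," splits on every comma, including the ones inside the quotes.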

markp-fuso · Accepted Answer · 2022-10-11 22:14:05Z

One awk idea using a *.awk script (per OP's comment), and having awk determine the max width of each column:

$ cat script.awk
BEGIN { FPAT="\"[^\"]*\"|[^,]+" }                            # instead of parsing on field delimiter (via FS) ... parse on field format (via FPAT)
      { for (i=1;i<=NF;i++)
            w[i]= length($i) > w[i] ? length($i) : w[i]      # keep track of max width of each column
        lines[FNR]=$0                                        # save entire line
      }
END   { for (i=1;i<=FNR;i++) {                               # loop through each saved line
            n=patsplit(lines[i],a)                           # reparse based on FPAT, storing fields in array a[]
            for (j=1;j<n;j++)                                # loop through array entries ...
                printf "%-*s%s", w[j], a[j], OFS             # printing to stdout
            print a[n]                                       # print last field plus "\n"
        }
      }

Or using a multi-dimensional array to store the input thus eliminating the 2nd parsing (via patsplit()) of the input data:

$ cat script.awk
BEGIN { FPAT="\"[^\"]*\"|[^,]+" }
      { for (i=1;i<=NF;i++) {
            w[i]= length($i) > w[i] ? length($i) : w[i]
            fields[FNR][i]=$i
        }
      }
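END   { for (i=1;i<=FNR;i++) {                               # loop through each saved row
            for (j=1;j<NF;j++)                               # NF still holds the field count of the last record read; assumes every row has that many fields
                printf "%-*s%s", w[j], fields[i][j], OFS
            print fields[i][NF]                              # last field is printed unpadded, followed by "\n"
        }
      }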

NOTES:

assumes entire file can fit into memory (via the awk/lines[] or awk/fields[][] array)
requires GNU awk for FPAT and multi-dimensional array support

Both of these generate:

$ awk -f script.awk file
Name    Last Name   Gender Pet
Kit     Rattenberie Male   "Crake, african black"
Cliff   Lakes       Male   "Red phalarope"
Tirrell Stables     Male   "Rhea, greater"
Cherry  William     Female "Crow, house"
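
If the memory and GNU awk caveats in the NOTES matter, here is a minimal sketch of a workaround in plain POSIX awk. It reads the file twice instead of storing it, and replaces FPAT with a small hand-written splitter. The names align.awk and split_csv() are made up for this example, and the splitter is an assumption-laden stand-in: it keeps the quotes like the FPAT pattern does, but it does not handle escaped quotes ("") or newlines inside a field.

```shell
cat > align.awk <<'EOF'
# split_csv(line, out): hypothetical helper, not a built-in.
# Splits line on commas that are outside double quotes; returns the field count.
function split_csv(line, out,    n, i, c, inq, cur) {
    n = 0; inq = 0; cur = ""
    for (i = 1; i <= length(line); i++) {
        c = substr(line, i, 1)
        if (c == "\"") inq = !inq                            # toggle "inside quotes"
        if (c == "," && !inq) { out[++n] = cur; cur = "" }   # field boundary
        else cur = cur c                                     # keep the character, quotes included
    }
    out[++n] = cur
    return n
}

# Pass 1 (first mention of the file): only record the max width of each column.
NR == FNR {
    n = split_csv($0, f)
    for (i = 1; i <= n; i++)
        if (length(f[i]) > w[i]) w[i] = length(f[i])
    next
}

# Pass 2 (second mention): pad every field but the last to its column width.
{
    n = split_csv($0, f)
    line = ""
    for (i = 1; i < n; i++)
        line = line sprintf("%-" w[i] "s ", f[i])
    print line f[n]
}
EOF

# Sample input from the question.
cat > file.csv <<'EOF'
Name,Last Name,Gender,Pet
Kit,Rattenberie,Male,"Crake, african black"
Cliff,Lakes,Male,"Red phalarope"
Tirrell,Stables,Male,"Rhea, greater"
Cherry,William,Female,"Crow, house"
EOF

# The file is named twice: once to measure, once to print.
awk -f align.awk file.csv file.csv
```

On the sample input this should print the same table as shown above. The trade-off is that the input has to be a regular file that can be read twice, so it will not work on a pipe.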

Nice and tidy. The final printf could just be print a[n] -- we don't really care how long it is
