Convert a bash array into an awk array

Question

I have an array in bash and want to use this array in an awk script. How can I pass the array from bash to awk?

The keys of the awk array should be the indices of the bash array. For simplicity, we can assume that the bash array is dense, that is, the array is not sparse like a=([3]=x [5]=y).

The elements inside the array can have any value. Besides strange unicode symbols and ascii control characters they may contain spaces or even newlines. Also, there might be empty ("") entries which should be retained. As an example consider the following array:

a=(AB " C  D " $'E\nF\tG' "¼ẞ🍕" "")

I'd argue this is the point where you switch to a more expressive language.
– chepner, Commented Dec 2, 2019 at 13:27

dash-o · Accepted Answer · 2019-12-02 14:08:56Z

Extending approach #1 provided by Socowi, it is possible to address the shortcoming he identified by using the awk split function. Note that this solution does not consume stdin; the array is passed through a command-line option, which leaves awk free to process stdin, files, etc.

The solution converts the bash array 'a' into the awk array 'a', using an intermediate file AVF (provided through process substitution). This works around the bash limitation that prevents NUL from being stored in a string.

a=(AB " C  D " $'E\nF\tG' "¼ẞ🍕" "")

awk -v AVF=<(printf '%s\0' "${a[@]}") '
BEGIN {
   # Temporary RS so that the whole array file is read as a single record
   # (RS="^$" is a GNU awk idiom; RS="" would stop at a blank line inside an element).
   saveRS=RS
   RS="^$"
   getline AV < AVF
   RS=saveRS
   na=split(AV, a, "\\0")
   # Remove trailing empty element (printf adds a trailing separator).
   delete a[na]
   na--
   for (i=1 ; i<=na ; i++ ) print "AV#", i, "=" a[i]
}{
   # Use a[x]
}
'

Output:

AV# 1 =AB
AV# 2 = C  D 
AV# 3 =E
F   G
AV# 4 =¼ẞ🍕
AV# 5 =

Previous solution: for practical reasons, this version uses the '\001' character as the separator, which makes the script much simpler (any other character sequence that is known not to appear in the array can be used instead). NUL cannot be used here because bash command substitution does not preserve NUL characters. Hopefully this is not a major issue, as the '\001' control character does not normally appear in ordinary files. I believe it is possible to solve this, but I'm not sure how.

The solution converts the bash array 'a' into the awk array 'a', using the intermediate awk variable 'AV'.

a=(AB " C  D " $'E\nF\tG' "¼ẞ🍕" "")

awk -v AV="$(printf '%s\1' "${a[@]}")" '
BEGIN {
   # "\001" is the same byte that printf emits for \1.
   na=split(AV, a, "\001")
   # Remove trailing empty element (printf adds a trailing separator).
   delete a[na]
   na--
   for (i=1 ; i<=na ; i++ ) print "AV#", i, "=" a[i]
}
{
   # Use a[x]
}
'
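
The separator variant only works if the separator byte never occurs in the data. Below is a minimal sketch of a guard for that assumption. The names sep, elem and out are made up for this illustration and are not part of the answer above. Note also that awk interprets backslash escapes in -v values, so elements containing backslashes would need extra care.

```shell
#!/usr/bin/env bash
a=(AB " C  D " $'E\nF\tG' "")
sep=$'\001'   # assumed separator; any byte that cannot occur in the data would do

# Refuse to continue if any element already contains the separator.
for elem in "${a[@]}"; do
    if [[ $elem == *"$sep"* ]]; then
        echo "array element contains the separator byte" >&2
        exit 1
    fi
done

out=$(awk -v AV="$(printf '%s\001' "${a[@]}")" '
BEGIN {
    # split() counts one extra empty field after the final separator.
    na = split(AV, a, "\001") - 1
    for (i = 1; i <= na; i++) printf "%d:<%s>\n", i, a[i]
}')
printf '%s\n' "$out"
```

The angle brackets in the output only make leading and trailing whitespace and the empty last element visible.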

Socowi · Accepted Answer · 2019-12-02 13:01:37Z

Approach 1: Reading in `awk`

Since the array elements can contain any character but the null byte (\0) we have to delimit them by \0. This is done with printf. For simplicity we assume that the array has at least one entry.

Due to the \0 we can no longer pass the string to awk as an argument but have to use (or emulate) a file instead. We then read that file in awk using \0 as the record separator RS (may require GNU awk).

awk 'BEGIN {RS="\0"} {a[n++]=$0; next}' <(printf %s\\0 "${a[@]}")

This reliably constructs the awk array a from the bash array a. The length of a is stored in n.

This approach is ugly when you actually want to use it. There is no simple step-by-step instruction on how to incorporate this approach into your existing awk script. Normally, your awk script would read another file afterwards, therefore you have to change the record separator RS after the array file was read. This can be done with NR>FNR. However, if your awk script already reads multiple files and relies on something like NR==FNR things get complicated.
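
One possible way to combine the array file with a regular input file is sketched below. It swaps the NR>FNR test for command-line variable assignments, which awk applies only when it reaches them between file operands. It assumes GNU awk (for the NUL record separator), and the names in_array, datafile and out are hypothetical.

```shell
#!/usr/bin/env bash
a=(AB " C  D " $'E\nF\tG' "")

# Hypothetical data file: each line holds an index into the array.
datafile=$(mktemp)
printf '%s\n' 0 3 1 > "$datafile"

out=$(awk '
in_array { a[n++] = $0; next }         # records of the NUL-delimited array file
{ printf "%s -> <%s>\n", $1, a[$1] }   # ordinary line-based processing
' RS='\0' in_array=1 <(printf '%s\0' "${a[@]}") \
  RS='\n' in_array=0 "$datafile")
rm -f "$datafile"
printf '%s\n' "$out"
```

Because RS and in_array are reset just before the data file is opened, the main rules do not need to know how many files precede it.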

Approach 2: Generating `awk` Code with `bash`

Instead of parsing the array in awk we hard-code the array by generating awk code. This code will be injected at the beginning of an existing awk script and initialize the array. This approach also supports sparse arrays and associative arrays and should work with all awk versions, not only GNU.

For the code generation we have to correctly quote all strings. For example, the code generator echo "a[0]=\"${a[0]}\"" would fail if ${a[0]} was ", resulting in the code a[0]=""". POSIX awk supports octal escape sequences (\012), which can encode all bytes. We simply encode everything. That way we cannot forget any special symbols (even though the generated code is a bit inefficient).

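# Encode a string as a sequence of awk octal escapes (one \ooo per byte).
octString() {
    printf %s "$*" | od -bvAn | tr ' ' '\\' | tr -d '\n'
}
# Print a BEGIN block that rebuilds the bash array a as the awk array a.
arrayToAwk() {
    local key n=0
    printf 'BEGIN{'
    for key in "${!a[@]}"; do
        printf 'a["%s"]="%s";' "$(octString "$key")" "$(octString "${a[$key]}")"
        ((n++))
    done
    echo "n=$n}"   # n is the number of elements
}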

The function arrayToAwk converts the bash array a (which can be sparse or associative) into a BEGIN block. After inserting the generated code block at the beginning of your existing awk program you can use the awk array a anywhere inside awk without having to adapt anything (assuming that the variable names a and n were unused before). n is the size of the awk array a.

For awk commands of the form awk ... 'program' ... use

awk ... "$(arrayToAwk)"'program' ...

For big arrays this might result in the error Argument list too long. You can circumvent this problem using a program file:

awk ... -f <(arrayToAwk; echo 'program') ...

For awk commands of the form awk ... -f progfile ... use

awk ... -f <(arrayToAwk; cat progfile) ...
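
As a rough end-to-end illustration of the first form, here is a round trip with a sparse array whose values contain quotes and a newline. The two functions are adapted from the ones above so that the snippet runs on its own; the sample data and the variable out are made up. It assumes an od that separates bytes with single spaces, as GNU od does.

```shell
#!/usr/bin/env bash
octString() {
    printf %s "$*" | od -bvAn | tr ' ' '\\' | tr -d '\n'
}
arrayToAwk() {
    local key n=0
    printf 'BEGIN{'
    for key in "${!a[@]}"; do
        printf 'a["%s"]="%s";' "$(octString "$key")" "$(octString "${a[$key]}")"
        n=$((n + 1))
    done
    echo "n=$n}"
}

# Sparse sample array: only indices 3 and 5 are set.
a=([3]='say "hi"' [5]=$'two\nlines')

# The generated BEGIN block is prepended to the actual program text.
out=$(awk "$(arrayToAwk)"'
BEGIN { printf "%d|%s|%s\n", n, a[3], a[5] }')
printf '%s\n' "$out"
```

The double quotes and the newline survive because every byte reaches awk as an octal escape rather than as literal program text.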

kabanus · Accepted Answer · 2019-12-02 14:35:55Z

I'd like to point out that this can be extremely simple if you do not mind using ARGV and deleting all the non-file arguments. One way:

>cat awk_script.sh
#!/bin/awk -f

BEGIN{
    i=1
    while(ARGV[i] != "--" && i < ARGC) {
        print ARGV[i]
        delete ARGV[i]
        i++
    }
    if(i < ARGC)
        delete ARGV[i]
} {
    print "File 1 contains at 1",$1
}

Then run it with:

>./awk_script.sh "${a[@]}" -- file1
AB
 C  D
E
F       G
¼ẞ�

File 1 contains at 1 a

Obviously my terminal is missing some symbols.

Note that while I like this method, it assumes that -- is not in the array, as pointed out by oguz ismail. They suggest a great alternative: pass the length of the list as the first argument.
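
A minimal sketch of that length-prefixed variant follows. The layout (count first, then the elements, then any files) and the variable out are assumptions made for this illustration. The array deliberately contains -- to show that no marker is needed.

```shell
#!/usr/bin/env bash
a=(AB " C  D " $'E\nF\tG' "--" "")

out=$(awk '
BEGIN {
    n = ARGV[1] + 0          # first argument: number of array elements
    delete ARGV[1]
    for (i = 0; i < n; i++) {
        a[i] = ARGV[i + 2]   # keep the 0-based bash indices
        delete ARGV[i + 2]   # so awk does not treat the element as a file
    }
    for (i = 0; i < n; i++) printf "a[%d]=<%s>\n", i, a[i]
}
' "${#a[@]}" "${a[@]}")
printf '%s\n' "$out"
```

Any file names would simply follow the array elements on the command line and stay in ARGV for normal processing.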

This can be a one-liner of the form

awk 'BEGIN{... get and delete first arguments ...}{process files}END{if wanted}' "${a[@]}" file1 file2...

but will become unreadable very quickly.

edited Dec 2, 2019 at 14:35

answered Dec 2, 2019 at 13:16

kabanus

3 Comments

oguz ismail Over a year ago

Instead of using -- as a marker, why don't you pass the length of bash array as first argument?

kabanus Over a year ago

@oguzismail That's completely valid, and I suppose a matter of preference - this is just how I do it. You can go ahead and edit that in as another option, or I will do so later. In any case thanks.

oguz ismail Over a year ago

Yeah, my only concern with -- is that an array containing -- would cause a false-positive and break the program. Anyways this is a good answer, ++
