How can I output null-terminated strings in Awk?

Question

I'm working on a shell script that will be used by others, and may ingest suspect strings. It's based around awk, so as a basic resiliency measure, I want to have awk output null-terminated strings - the commands that will receive data from awk can thus avoid a certain amount of breakage from strings that contain spaces or not-often-found-in-English characters.

Unfortunately, from the basic awk documentation, I'm not getting how to tell awk to print a string terminated by an ASCII null instead of by a newline. How can I tell awk that I want null-terminated strings?

Versions of awk that might be used:

[user@server1]$ awk --version
awk version 20070501

[user@server2]$ awk -W version
mawk 1.3.3 Nov 1996, Copyright (C) Michael D. Brennan

[user@server3]$ awk -W version
GNU Awk 3.1.7

So pretty much the whole family of awk versions. If we have to consolidate on a version, it'll probably be GNU Awk, but answers for all versions are welcome since I might have to make it work across all of these awks. Oh, legacy scripts.

Best guide I've found so far: sandrotosi.blogspot.com/2011/09/… - but that's not quite a full answer, and also a random blogspot blog has less SEO juice than SO, so a good SO answer will be useful to more people.
– Brighid McDonnell, Commented Feb 3, 2012 at 18:06
Sorry, that uses \0 as the input separator. I'm having trouble getting awk to use it as the output separator.
– Kevin, Commented Feb 3, 2012 at 18:37

score 28 · Accepted Answer · 2015-08-31 01:23:23Z

28

There are three alternatives:

1. Setting ORS to ASCII zero. Other answers use awk -vORS=$'\0', but $'\0' is a construct specific to some shells (bash, zsh), so that command will not work in most older shells. The same thing can be written inside awk as awk 'BEGIN { ORS = "\0" } ; { print $0 }', but that does not work in most awk versions.

2. Printing with printf and the \0 escape: awk '{printf( "%s\0", $0)}'

3. Printing ASCII 0 directly with %c: awk '{ printf( "%s%c", $0, 0 )}'

Testing all alternatives with this code:

#!/bin/bash

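# Run the awk program given as $1 through each awk implementation
# and dump the bytes it writes with od.
test1(){
    a='awk,mawk,original-awk,busybox awk'
    IFS=',' read -ra line <<<"$a"
    for i in "${line[@]}"; do
        printf "%14.12s %40s" "$i" "$1"
        echo -ne "a\nb\nc\n" |
        $i "$1" |    # $i is left unquoted so "busybox awk" splits into two words
        od -cAn
    done
}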

#test1 '{print}'
test1 'BEGIN { ORS = "\0" } ; { print $0 }'
test1 '{ printf "%s\0", $0}'
test1 '{ printf( "%s%c", $0, 0 )}'

We get these results:

            awk      BEGIN { ORS = "\0" } ; { print $0 }   a  \0   b  \0   c  \0
           mawk      BEGIN { ORS = "\0" } ; { print $0 }   a   b   c
   original-awk      BEGIN { ORS = "\0" } ; { print $0 }   a   b   c
    busybox awk      BEGIN { ORS = "\0" } ; { print $0 }   a   b   c
            awk                     { printf "%s\0", $0}   a  \0   b  \0   c  \0
           mawk                     { printf "%s\0", $0}   a   b   c
   original-awk                     { printf "%s\0", $0}   a   b   c
    busybox awk                     { printf "%s\0", $0}   a   b   c
            awk               { printf( "%s%c", $0, 0 )}   a  \0   b  \0   c  \0
           mawk               { printf( "%s%c", $0, 0 )}   a  \0   b  \0   c  \0
   original-awk               { printf( "%s%c", $0, 0 )}   a  \0   b  \0   c  \0
    busybox awk               { printf( "%s%c", $0, 0 )}   a   b   c

As can be seen above, the first two solutions work only in GNU Awk.

The most portable is the third solution: '{ printf( "%s%c", $0, 0 )}'.

None of the solutions works correctly in "busybox awk".

The versions used for these tests were:

          awk> GNU Awk 4.0.1
         mawk> mawk 1.3.3 Nov 1996, Copyright (C) Michael D. Brennan
 original-awk> awk version 20110810
      busybox> BusyBox v1.20.2 (Debian 1:1.20.0-7) multi-call binary.
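
As a minimal sketch of how the third form might be used in a pipeline: this assumes an awk that actually emits the NUL (gawk, mawk or original-awk in the table above, not busybox awk) and an xargs that supports -0 (GNU and BSD xargs do). The function name nul_terminate is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of the answer above): write each input
# line followed by a NUL byte instead of a newline, using the %c form.
nul_terminate() {
    awk '{ printf("%s%c", $0, 0) }'
}

# Three records, one with a space and one with a single quote.
# xargs -0 splits only on NUL, so each record reaches printf intact.
printf '%s\n' 'plain' 'two words' "it's" |
    nul_terminate |
    xargs -0 -n 1 printf '[%s]\n'
# With gawk or mawk this should print:
# [plain]
# [two words]
# [it's]
```

Without -0, xargs would split 'two words' on the space and complain about the unmatched quote in "it's", which is the kind of breakage the question is trying to avoid.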

edited Aug 31, 2015 at 1:23

answered Aug 31, 2015 at 1:02

user2350426

2 Comments

Brighid McDonnell Over a year ago

Many blessings on you for specifying the versions that you used! The problem that inspired this question has long since become Not Mine, but it does my heart good to see people leaving helpful, diligent answers. Well done.

Javier C Over a year ago

Thank you, the %c option was just what I was looking for. It's perfect that it doesn't depend on the current shell's escaping magic.

Kevin · Accepted Answer · 2012-02-03 18:43:58Z

22

Alright, I've got it.

awk '{printf "%s\0", $0}'

Or, using ORS,

awk -vORS=$'\0' //
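
These forms do not behave the same in every awk, so one way to check a particular installation is to count the NUL bytes it actually emits. A rough sketch: nul_count is a hypothetical helper name, and it assumes a tr that accepts the \000 octal escape.

```shell
#!/bin/sh
# Hypothetical helper: feed three records to the awk found on PATH,
# run the program given as $1, and count the NUL bytes in its output.
nul_count() {
    printf 'a\nb\nc\n' | awk "$1" | tr -cd '\000' | wc -c
}

# Count for the "\0" escape form, then for the %c form.
nul_count '{ printf "%s\0", $0 }'
nul_count '{ printf("%s%c", $0, 0) }'
```

A count of 3 means the awk wrote one NUL per record; 0 means the NULs were dropped. Some wc implementations pad the number with leading spaces.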

answered Feb 3, 2012 at 18:43

Kevin

8 Comments

Brighid McDonnell Over a year ago

When I pipe the results of those incantations into xargs -0, it doesn't split on the \0 that awk is inserting (tested by splitting on something else). :(

Kevin Over a year ago

@SeanM The first seems not to work, but the second is working for me, are you quite sure the problem is in awk? (try saving the output from just that to a file)

dubiousjim Over a year ago

You can check awk's actual output by piping to od -cAn. I found that gawk would output the NUL bytes, but BusyBox awk and nawk on FreeBSD wouldn't. The sandrotosi.blogspot.com technique of printf "%c","" didn't work on those implementations either.

Christian Long Over a year ago

I had to use double-quotes for the -vORS argument awk -vORS=$"\0". This was with gawk 4.0.1.

ivan_pozdeev Over a year ago

-v isn't supported by BSD awk, e.g. the one in OSX. Inserting \0 into a string doesn't work in it either; it's treated as the end of the string instead.

Macaronio · Accepted Answer · 2015-11-18 11:48:17Z

8

You can also pipe your awk's output through tr:

awk '{...code...}' infile | tr '\n' '\0' > outfile

Just tested, it works at least on Linux and FreeBSD.

If you cannot use newlines as separators (for example, if output records can contain newlines inside), just use some other character that's guaranteed not to appear inside a record, e.g. the one with code 1:

awk 'BEGIN { ORS="\001" } {...code...}' | tr '\001' '\0'
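
A small sketch of the newline variant wrapped as a function, in case that is handier in a script. The name awk_nul is invented here, and it assumes no record contains an embedded newline, as discussed above.

```shell
#!/bin/sh
# Hypothetical wrapper: run awk with the given arguments, then let tr
# turn every newline terminator into a NUL. awk itself never has to
# produce a NUL byte, so this does not depend on the awk implementation.
awk_nul() {
    awk "$@" | tr '\n' '\0'
}

# Upper-case two records and show the resulting bytes.
printf '%s\n' 'one' 'two words' |
    awk_nul '{ print toupper($0) }' |
    od -An -c
```

The od output should show a \0 after each record and no \n.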

edited Nov 18, 2015 at 11:48

answered Nov 18, 2015 at 11:42

Macaronio

3 Comments

Adam Katz Over a year ago

From what I've seen, this is the most portable and reliable answer. tr '\n' '\0' even works in busybox (unlike any use of null characters in busybox's awk). Rather than using \001 (Start of Heading), I recommend \036 (U+001e, Information Separator Two, a.k.a. Record Separator, RS) since the information separators are made for this purpose. (#2/RS maps to lines (awk's default ORS) while #1, Unit Separator, would be akin to awk's FS.) More at en.wikipedia.org/wiki/Delimiter#ASCII_delimited_text

ivan_pozdeev Over a year ago

Since UNIX paths can contain any bytes except \0, you are not doing it right if you use anything else, even if you replace it with \0 afterwards: any inline bytes with the same code would be replaced, too.

tink Over a year ago

What Ivan said: \0 will also allow you to post-process your lines w/ e.g. xargs, which will fail if there's an embedded single quote in the line and it's not null-terminated. Adam's suggestion is a poor one.

potame · Accepted Answer · 2015-06-29 08:44:24Z

-1

I solved printing ASCII 0 from awk by having awk call the UNIX printf command, printf "\000", through system():

echo | awk -v s='printf "\000"' '{system(s);}'

edited Jun 29, 2015 at 8:44

potame

answered Jun 6, 2013 at 8:20

suzhor
