3

I'm using a bytea type in PostgreSQL, which, to my understanding, contains just a series of bytes. However, I can't get it to play well with nulls. For example:

=# select length(E'aa\x00aa'::bytea);
 length
--------
      2
(1 row)

I was expecting 5. Also:

=# select md5(E'aa\x00aa'::bytea);
               md5
----------------------------------
 4124bc0a9335c27f086f24ba207a4912
(1 row)

That's the MD5 of "aa", not "aa\x00aa". Clearly, I'm Doing It Wrong, but I don't know what I'm doing wrong. I'm also on an older version of Postgres (8.1.11) for reasons outside of my control. (I'll see if this behaves the same on the latest Postgres as soon as I get home...)

1
  • 1
    8.4 complains about invalid encoding when trying to build the string (before converting it to bytea) for me. Commented Jun 7, 2010 at 18:30

2 Answers 2

9

Try this:

# select length(E'aa\\000aa'::bytea);
 length
--------
      5

Updated: Why the original didn't work? First, understand the difference between one slash and two:

pg=# select E'aa\055aa', length(E'aa\055aa') ;
 ?column? | length
----------+--------
 aa-aa    |      5
(1 row)

pg=# select E'aa\\055aa', length(E'aa\\055aa') ;
 ?column? | length
----------+--------
 aa\055aa |      8

In the first case, I'm writing a literal string, 4 characters unescaped('a') and one escaped. The slash is consumed by the parser in a first pass, which converts the full \055 to a single char ('-' in this case).

In the second case, the first slash just escapes the second, the pair \\ is translated by the parser to a single \ and the 055 is seen as three characters.

Now, when converting a text to a bytea, escape characters (in a already parsed or produced text) are parsed/interpreted again! (Yes, this is confusing).

So, when I write

 select E'aa\000aa'::bytea;

in the first parsing, the literal E'aa\000aa' is converted to an internal text with a null character in the third position (and depending on your postgresql version, the null character is interpreted as an EOS, and the text is assumed to be of length two - or in other versions an illegal string error is thrown).

Instead, when I write

 select E'aa\\000aa'::bytea;

in the first parsing, the literal string "aa\000aa" (eight characters) is seen, and is asigned to a text; then in the casting to bytea, it is parsed again, and the sequence of characters '\000' is interpreted as a null byte.

IMO postgresql kind of sucks here.

Sign up to request clarification or add additional context in comments.

3 Comments

Ah - I think I see: Postgres strings can't represent binary data, so they use an escaped form, where a null is \000, which we must type as '\\000'. I see now the PQescapeBytea and PQunescapeBytea - I'm assuming that if I select a bytea column, I'll get this intermediate escaped form back, which I must then pass through PQunescapeBytea. This all makes more sense now.
Yes, thanks for your update. That confirms what I was thinking, and where you marked "(Yes, this is confusing.)" I wholeheartedly agree!
Sometimes it's simpler to specify base64 input for a binary literal, e.g. decode('AAAA','base64') instead of E'\\000\\000\\000'::bytea. 9.0 supports a simpler hex encoding for bytea values which is rather clearer (E'\\x000000' and no slavish ::bytea suffix needed)
0

You can use regular strings or dollar-quoted strings instead of escaped strings:

# select length('aa\000aa'::bytea);
 length 
════════
      5
(1 row)

# select length($$aa\000aa$$::bytea);
length 
════════
      5
(1 row)

I think that dollar-quoted strings are a better option because, if the configuration parameter standard_conforming_strings is off, then PostgreSQL recognizes backslash escapes in both regular and escape string constants. However, as of PostgreSQL 9.1, the default is on, meaning that backslash escapes are recognized only in escape string constants.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.