3

I have a Java application that generates ids for primary keys by encoding cryptographically random bytes to base32 (RFC4648).

How do I do the same with PostgreSQL from an SQL script?

The gen_random_bytes seems to do the job for generating the random bytes but it seems there isn't anything available to encode them to base32.

2 Answers 2

3

I searched this exact questions a few years back and finally built a pure plpgsql schema with functions to handle this and figured I'd leave it here if anyone comes across it.

credit to this video and RFC https://www.youtube.com/watch?v=Va8FLD-iuTg https://www.rfc-editor.org/rfc/rfc4648

here is the extension https://github.com/pyramation/base32/blob/master/packages/base32/sql/launchql-base32--0.0.1.sql

once you install it, you can do

select base32.encode('foo');
-- MZXW6===


select base32.decode('MZXW6===');
-- foo
Sign up to request clarification or add additional context in comments.

2 Comments

It works nice with text as input. I would like instead something that works with bytea. similar to encode(gen_random_bytes(30), 'base64') but using base32. With the functions in your extension I have to convert the random bytes to text and then encode and it works just fine. Is there a way to encode the bytes directly?
Yes, there is a way! I tried working on an implementation, but got little bit lost in the bits, I will probably eventually get around to it and post the solution here. Essentially instead of using strings as I do to break things into bytes, bits, I could have used native bytea and set_bit and get_bit functions in pure plpgsql
3

Was curious if it was possible to convert bytea of any length to base32. Here's sql code with a couple of helper functions to ease the understanding. It appears all used built-in functions are marked immutable so the helpers should inline easily.

Tested on Postgres 14:

create or replace function public.bytea_to_varbit(p_bytes bytea)
returns varbit as $$
-- https://stackoverflow.com/a/56008861/1671505
select right(p_bytes::text, -1)::varbit;
$$ language sql immutable;

create or replace function public.varbit40_to_base32(p_num varbit(40))
returns text as $$
                    -- https://www.rfc-editor.org/rfc/rfc4648#section-6
    select string_agg(('{"A","B","C","D","E","F","G","H"
                        ,"I","J","K","L","M","N","O","P"
                        ,"Q","R","S","T","U","V","W","X"
                        ,"Y","Z","2","3","4","5","6","7"}'::text[])
                     -- shift right to 5 lsb and mask off the rest
                        [(p_num >> s & 31::bit(40))::bigint::int + 1]
                    -- concatenate resulting characters in order from left to right
                        , '' order by o)
        -- generate bit-shifts to move all five-bit positions to the right
      from generate_series(35, 0, -5) with ordinality m(s, o)
$$ language sql immutable;

create or replace function public.bytea_to_base32(p_val bytea, p_omit_padding bool = true)
returns text as $$
with v(val, padding, trm) as (
    select string_agg(varbit40_to_base32(bytea_to_varbit(substring (padded from b for 5))), '' order by o)
         -- https://www.rfc-editor.org/rfc/rfc4648#section-6 (1)-(5):
         , ('{"","======","====","===","="}'::text[])[rem + 1]
         , ('{0,6,4,3,1}'::int[])[rem + 1]
      from (values(length(p_val), length(p_val) % 5 
        -- pad length to be divisible by 5
         , p_val || substring(bytea '\x0000000000' from 1 for (5 - length(p_val) % 5) % 5)) 
           ) v(len, rem, padded)
        -- rfc 4648 6: breakpoints for breaking the byte string into 40-bit groups
         , generate_series(1, len + (5 - rem) % 5, 5) with ordinality c(b,o)
     group by p_val, rem
)                
-- rfc4648 3.2
select case when p_omit_padding then
            case when trm > 0 then left(val, -trm) else val end
       else
            case when trm > 0 then overlay(val placing padding from length(val)-trm+1) else val end
       end
       from v

       union all
       -- https://www.rfc-editor.org/rfc/rfc4648#section-10: empty input returns empty string
       select ''
 limit 1;
$$ language sql immutable strict;

/*
https://www.rfc-editor.org/rfc/rfc4648#section-10
select inp, expected, bytea_to_base32(inp::bytea, omit_padding)
  from (values
         ('', '', true)
       , ('f', 'MY', true)
       , ('fo', 'MZXQ', true)
       , ('foo', 'MZXW6', true)
       , ('foob', 'MZXW6YQ', true)
       , ('fooba', 'MZXW6YTB', true)
       , ('foobar', 'MZXW6YTBOI', true)
       , ('', '', false)
       , ('f', 'MY======', false)
       , ('fo', 'MZXQ====', false)
       , ('foo', 'MZXW6===', false)
       , ('foob', 'MZXW6YQ=', false)
       , ('fooba', 'MZXW6YTB', false)
       , ('foobar', 'MZXW6YTBOI======', false)
       ) t(inp, expected, omit_padding)
  where bytea_to_base32(inp::bytea, omit_padding) <> expected
;
*/

Essentially, you need to break down your input into 5-byte groups that in turn get broken into eight 5-bit strings from left to right.

The easiest way is to generate breakpoint positions with generate_series and then chunk the byte string with substring(x from b for desired_len).

The weird double remainder (5-len()%5)%5) is to clamp the length of padding correctly (otherwise for reminder 0 you get an extra 5-byte padding, since 5-0 = 5, but you really want 0-length padding for that case).

I find that array literal lookups (the ({'A', ... }::text[])[position_calculation] part) make the code a bit more obvious, if somewhat more noisy.

To generate base_32_hex per RFC4648 section 7 you only need to replace the alphabet array in varbit_to_base32. Or you could have the alphabet as input to varbit40_to_base32 and pass the right alphabet from the caller.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.