Javascript encodeURI like function in postgresql?

Question

Is there any function/stored procedure in PostgreSQL/plpgsql which is same as the javascripts encodeURI?

What does it mean? Javascript have a handy built in function to encode any kind of url:

encodeURI(url) -> returns the encoded url

For example: encodeURI('http://hu.wikipedia.org/wiki/São_Paulo') -> returns a String which is "http://hu.wikipedia.org/wiki/S%C3%A3o_Paulo"

I looking for exactly the same.

I don't want to encode each parameters separately. I don't want a function like javascript encodeURIComponent which is not the same. The example above results a different output with

encodeURIComponent('http://hu.wikipedia.org/wiki/São_Paulo')

-> "http%3A%2F%2Fhu.wikipedia.org%2Fwiki%2FS%C3%A3o_Paulo"

It's encode the whole string not just the path part. So this is not what I'm looking for. I need a plpgsql function which results equivalent output to javascript function encodeURI.

Thanks!

vyegorov · Accepted Answer · 2012-04-25 18:58:43Z

16

Slow and inefficient, consider doing C version of this function:

CREATE OR REPLACE FUNCTION urlencode(in_str text, OUT _result text)
    STRICT IMMUTABLE AS $urlencode$
DECLARE
    _i      int4;
    _temp   varchar;
    _ascii  int4;
BEGIN
    _result = '';
    FOR _i IN 1 .. length(in_str) LOOP
        _temp := substr(in_str, _i, 1);
        IF _temp ~ '[0-9a-zA-Z:/@._?#-]+' THEN
            _result := _result || _temp;
        ELSE
            _ascii := ascii(_temp);
            IF _ascii > x'07ff'::int4 THEN
                RAISE EXCEPTION 'Won''t deal with 3 (or more) byte sequences.';
            END IF;
            IF _ascii <= x'07f'::int4 THEN
                _temp := '%'||to_hex(_ascii);
            ELSE
                _temp := '%'||to_hex((_ascii & x'03f'::int4)+x'80'::int4);
                _ascii := _ascii >> 6;
                _temp := '%'||to_hex((_ascii & x'01f'::int4)+x'c0'::int4)
                            ||_temp;
            END IF;
            _result := _result || upper(_temp);
        END IF;
    END LOOP;
    RETURN ;
END;
$urlencode$ LANGUAGE plpgsql;

Results:

# select urlencode('http://hu.wikipedia.org/wiki/São_Paulo');
-[ RECORD 1 ]------------------------------------------
urlencode | http://hu.wikipedia.org/wiki/S%C3%A3o_Paulo

answered Apr 25, 2012 at 18:58

vyegorov

23k7 gold badges61 silver badges76 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

Roki Over a year ago

Oh, thank you! I'm searching for a built in function or a postgres library to install which contains this kind of function. I'm optimistic, an I think somebody did it before us, becaue it is a simple/everyday problem... I did also nearly the same stored procedure, but as you wrote too, it's inefficent and slow. And yes as you mentioned native C version is the only way to make it scream... So, my question is about what to install, what to use, which is the best library for this, not write a code type... :-)

Taras Melon Over a year ago

@vyegorov What does it mean when I reach exception 'Won''t deal with 3 (or more) byte sequences.'?

vyegorov Over a year ago

@TarasMelon, function is limited, you are free to improve it. Or use extension from the accepted answer.

Community · Accepted Answer · 2021-10-07 06:15:07Z

This is a very old thread, but surprisingly none of the previously posted solutions seem to comply with the relevant RFC 3986. So here are two PostgreSQL functions encode_uri (if you want to encode a complete URI) and encode_uri_component (if you want to encode just a component of the URI, e.g. the key or value of a query parameter) that combines Nick's buggy solution, ElDiabolo's note, and Kev's partial solution from a related thread into a single working SQL-only solution.

How it works: first, explode the string into single characters, then hex-encode each multibyte character and and each disallowed character.

Encode URI Component:

create or replace function encode_uri_component(text) returns text as $$
    select string_agg(
        case
            when bytes > 1 or c !~ '[0-9a-zA-Z_.!~*''()-]+' then 
                regexp_replace(encode(convert_to(c, 'utf-8')::bytea, 'hex'), '(..)', E'%\\1', 'g')
            else 
                c
        end,
        ''
    )
    from (
        select c, octet_length(c) bytes
        from regexp_split_to_table($1, '') c
    ) q;
$$ language sql immutable strict;

Encode Complete URI:

create or replace function encode_uri(text) returns text as $$
    select string_agg(
        case
            when bytes > 1 or c !~ '[0-9a-zA-Z_.!~*''();,/?:@&=+$#-]+' then 
                regexp_replace(encode(convert_to(c, 'utf-8')::bytea, 'hex'), '(..)', E'%\\1', 'g')
            else 
                c
        end,
        ''
    )
    from (
        select c, octet_length(c) bytes
        from regexp_split_to_table($1, '') c
    ) q;
$$ language sql immutable strict;

karel · Accepted Answer · 2021-10-20 09:00:35Z

7

I wrote PostgreSQL extensions url_encode that solves this issue.

postgres=# select url_encode('http://hu.wikipedia.org/wiki/São_Paulo');
                      url_encode                       
───────────────────────────────────────────────────────
http%3A%2F%2Fhu.wikipedia.org%2Fwiki%2FS%C3%A3o_Paulo

or

postgres=# select uri_encode('http://hu.wikipedia.org/wiki/São_Paulo');
               uri_encode                  
---------------------------------------------
http://hu.wikipedia.org/wiki/S%C3%A3o_Paulo

edited Oct 20, 2021 at 9:00

karel

5,96562 gold badges58 silver badges60 bronze badges

answered Apr 29, 2012 at 14:21

Pavel Stehule

46.6k6 gold badges103 silver badges102 bronze badges

5 Comments

Roki Over a year ago

Maybe I modify it a little bit... It has a bug, after protocol (http)... If you add c == ':' on the 95th line, it will work perfectly... (and line 154 also... to be consistent)

Pavel Stehule Over a year ago

This is only initial version - I'll add a functions uri_encode and uri_decode, cannot to modify like you propose, because : is reserved chars.

Pavel Stehule Over a year ago

I uploaded new version that support same semantic like javascript functions.

Enwired Over a year ago

There is a more recent postgres extension for adding the URI datatype to postgres. It might be a good choice: github.com/petere/pguri

Antony Gibbs Over a year ago

@Enwired strangely enough, the pguri package is more recent and has more functions but fails to encode uri with accents. See github.com/petere/pguri/issues/8

Neil McGuigan · Accepted Answer · 2014-09-30 18:21:13Z

7

With PL/V8...cheating?

create function encode_uri(text) returns text language plv8 strict immutable as $$
  return encodeURI($1);
$$;

edited Sep 30, 2014 at 18:21

answered Sep 24, 2014 at 22:35

Neil McGuigan

48.5k12 gold badges130 silver badges156 bronze badges

1 Comment

Antony Gibbs Over a year ago

And here the expression "Using a Bazooka to kill a Fly" takes all it's sense. Consider 3~5Gb of disk space and plenty, plenty cpu to compile this plv8...

tsohr · Accepted Answer · 2016-11-23 11:10:09Z

Today I encountered "Won't deal with 3 (or more) byte sequences." for Korean characters while I've been using @vyegorov 's answer quite a long time, more than a year, need to change it just dumps the bytea hex strings with "%" prefixed.

CREATE OR REPLACE FUNCTION urlencode(in_str text, OUT _result text)
    STRICT IMMUTABLE AS $urlencode$
DECLARE
    _i      int4;
    _temp   varchar;
    _hex    varchar;
    _ascii  int4;
BEGIN
    _result = '';
    FOR _i IN 1 .. length(in_str) LOOP
        _temp := substr(in_str, _i, 1);
        IF _temp ~ '[0-9a-zA-Z:/@._?#-]+' THEN
            _result := _result || _temp;
        ELSE
            _hex := encode(_temp::bytea, 'hex');
            _temp := '';
            WHILE LENGTH(_hex) > 0 LOOP
                _temp := _temp || '%' || SUBSTRING(_hex, 1, 2);
                _hex := SUBSTRING(_hex, 3, 999);
            END LOOP;
            _result := _result || upper(_temp);
        END IF;
    END LOOP;
    RETURN ;
END;
$urlencode$ LANGUAGE plpgsql;

example,

SELECT urlencode('a') UNION ALL  --> "a"
SELECT urlencode('À') UNION ALL  --> "%C3%80"
SELECT urlencode('Ā') UNION ALL  --> "%C4%80"
SELECT urlencode('ə') UNION ALL  --> "%C9%99"
SELECT urlencode('α') UNION ALL  --> "%CE%B1"
SELECT urlencode('가') UNION ALL --> "%EA%B0%80"
SELECT urlencode('上') UNION ALL --> "%E4%B8%8A"
SELECT urlencode('い')           --> "%E3%81%84"

ElDiabolo · Accepted Answer · 2019-07-01 19:28:14Z

2

The usage of ::bytea in the answers of tsohr and Nick is wrong. Do a SELECT '\'::bytea to see why.

In both cases convert_to(x, 'utf-8') will give the desired result.

answered Jul 1, 2019 at 19:28

ElDiabolo

212 bronze badges

Comments

dawgman · Accepted Answer · 2020-09-24 19:55:50Z

2

PLPython3 offers a straightforward approach.

Install plpython3 if you don't have it already. Example package install for the command line:

apt-get install postgresql-plpython3-9.5

Create the language (if you haven't done it already for other functions):

CREATE LANGUAGE plpython3u;

Simple function after that:

CREATE OR REPLACE FUNCTION encode_uri(input_string text)
  RETURNS text
AS $$
  import urllib.parse
  return urllib.parse.quote_plus(input_string, safe='~@#$&()*!+=:;,.?/\\''')
$$ LANGUAGE plpython3u immutable strict;

answered Sep 24, 2020 at 19:55

dawgman

211 bronze badge

Comments

Nick · Accepted Answer · 2017-03-30 21:33:08Z

1

Here is "pure SQL" (no plv8, plpython or even plpgsql is needed) implementation supporting multibyte chars (including 3- and 4-byte emoji):

create or replace function urlencode(text) returns text as $$
  select
    string_agg(
      case
        when ol>1 or ch !~ '[0-9a-zA-Z:/@._?#-]+' 
          then regexp_replace(upper(substring(ch::bytea::text, 3)), '(..)', E'%\\1', 'g')
        else ch
      end,
      ''
    )
  from (
    select ch, octet_length(ch) as ol
    from regexp_split_to_table($1, '') as ch
  ) as s;
$$ language sql immutable strict;

(source: https://github.com/NikolayS/postgrest-google-translate/pull/8)

edited Mar 30, 2017 at 21:33

answered Oct 19, 2016 at 2:36

Nick

2,54315 silver badges21 bronze badges

2 Comments

Nick Over a year ago

@Dmitry any details? Error message? What Postgres version do you use?

Dmitry Over a year ago

select urlencode('12 3') returns '123' but expected '12%203'

Toluwalope Olanrewaju · Accepted Answer · 2022-05-31 13:31:21Z

This is an update of @tsohr's answer. I have removed the unused variables and updated the FOR loop to be a little bit less verbose by using FOREACH.

By the way, regex_split_to_array was added in PostgreSQL 8.3 (See here). If you're using a version before 8.3, @tsohr's answer will do just fine for you.

CREATE OR REPLACE FUNCTION urlencode(in_str VARCHAR, OUT _result VARCHAR)
    STRICT IMMUTABLE AS $urlencode$
DECLARE
    _eachChar   varchar;
    _hex        varchar;
BEGIN
    _result = '';
    FOREACH _eachChar IN ARRAY regexp_split_to_array(in_str, '') LOOP
        IF _eachChar ~ '[0-9a-zA-Z:/@._?#-]+' THEN
            _result := _result || _eachChar;
        ELSE
            _hex := encode(_eachChar::bytea, 'hex');
            _eachChar := '';
            WHILE LENGTH(_hex) > 0 LOOP
                _eachChar := _eachChar || '%' || SUBSTRING(_hex, 1, 2);
                _hex := SUBSTRING(_hex, 3, 999);
            END LOOP;
            _result := _result || upper(_eachChar);
        END IF;
    END LOOP;
    RETURN ;
END;
$urlencode$ LANGUAGE plpgsql;

Collectives™ on Stack Overflow

Javascript encodeURI like function in postgresql?

9 Answers 9

3 Comments

Comments

5 Comments

1 Comment

Comments

Comments

Comments

2 Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

9 Answers 9

3 Comments

Comments

5 Comments

1 Comment

Comments

Comments

Comments

2 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related