9

Is there any function/stored procedure in PostgreSQL/plpgsql which is same as the javascripts encodeURI?

What does it mean? Javascript have a handy built in function to encode any kind of url:

encodeURI(url) -> returns the encoded url

For example: encodeURI('http://hu.wikipedia.org/wiki/São_Paulo') -> returns a String which is "http://hu.wikipedia.org/wiki/S%C3%A3o_Paulo"

I looking for exactly the same.

I don't want to encode each parameters separately. I don't want a function like javascript encodeURIComponent which is not the same. The example above results a different output with

encodeURIComponent('http://hu.wikipedia.org/wiki/São_Paulo')

-> "http%3A%2F%2Fhu.wikipedia.org%2Fwiki%2FS%C3%A3o_Paulo"

It's encode the whole string not just the path part. So this is not what I'm looking for. I need a plpgsql function which results equivalent output to javascript function encodeURI.

Thanks!

9 Answers 9

16

Slow and inefficient, consider doing C version of this function:

CREATE OR REPLACE FUNCTION urlencode(in_str text, OUT _result text)
    STRICT IMMUTABLE AS $urlencode$
DECLARE
    _i      int4;
    _temp   varchar;
    _ascii  int4;
BEGIN
    _result = '';
    FOR _i IN 1 .. length(in_str) LOOP
        _temp := substr(in_str, _i, 1);
        IF _temp ~ '[0-9a-zA-Z:/@._?#-]+' THEN
            _result := _result || _temp;
        ELSE
            _ascii := ascii(_temp);
            IF _ascii > x'07ff'::int4 THEN
                RAISE EXCEPTION 'Won''t deal with 3 (or more) byte sequences.';
            END IF;
            IF _ascii <= x'07f'::int4 THEN
                _temp := '%'||to_hex(_ascii);
            ELSE
                _temp := '%'||to_hex((_ascii & x'03f'::int4)+x'80'::int4);
                _ascii := _ascii >> 6;
                _temp := '%'||to_hex((_ascii & x'01f'::int4)+x'c0'::int4)
                            ||_temp;
            END IF;
            _result := _result || upper(_temp);
        END IF;
    END LOOP;
    RETURN ;
END;
$urlencode$ LANGUAGE plpgsql;

Results:

# select urlencode('http://hu.wikipedia.org/wiki/São_Paulo');
-[ RECORD 1 ]------------------------------------------
urlencode | http://hu.wikipedia.org/wiki/S%C3%A3o_Paulo
Sign up to request clarification or add additional context in comments.

3 Comments

Oh, thank you! I'm searching for a built in function or a postgres library to install which contains this kind of function. I'm optimistic, an I think somebody did it before us, becaue it is a simple/everyday problem... I did also nearly the same stored procedure, but as you wrote too, it's inefficent and slow. And yes as you mentioned native C version is the only way to make it scream... So, my question is about what to install, what to use, which is the best library for this, not write a code type... :-)
@vyegorov What does it mean when I reach exception 'Won''t deal with 3 (or more) byte sequences.'?
@TarasMelon, function is limited, you are free to improve it. Or use extension from the accepted answer.
8

This is a very old thread, but surprisingly none of the previously posted solutions seem to comply with the relevant RFC 3986. So here are two PostgreSQL functions encode_uri (if you want to encode a complete URI) and encode_uri_component (if you want to encode just a component of the URI, e.g. the key or value of a query parameter) that combines Nick's buggy solution, ElDiabolo's note, and Kev's partial solution from a related thread into a single working SQL-only solution.

How it works: first, explode the string into single characters, then hex-encode each multibyte character and and each disallowed character.

Encode URI Component:

create or replace function encode_uri_component(text) returns text as $$
    select string_agg(
        case
            when bytes > 1 or c !~ '[0-9a-zA-Z_.!~*''()-]+' then 
                regexp_replace(encode(convert_to(c, 'utf-8')::bytea, 'hex'), '(..)', E'%\\1', 'g')
            else 
                c
        end,
        ''
    )
    from (
        select c, octet_length(c) bytes
        from regexp_split_to_table($1, '') c
    ) q;
$$ language sql immutable strict;

Encode Complete URI:

create or replace function encode_uri(text) returns text as $$
    select string_agg(
        case
            when bytes > 1 or c !~ '[0-9a-zA-Z_.!~*''();,/?:@&=+$#-]+' then 
                regexp_replace(encode(convert_to(c, 'utf-8')::bytea, 'hex'), '(..)', E'%\\1', 'g')
            else 
                c
        end,
        ''
    )
    from (
        select c, octet_length(c) bytes
        from regexp_split_to_table($1, '') c
    ) q;
$$ language sql immutable strict;

Comments

7

I wrote PostgreSQL extensions url_encode that solves this issue.

postgres=# select url_encode('http://hu.wikipedia.org/wiki/São_Paulo');
                      url_encode                       
───────────────────────────────────────────────────────
http%3A%2F%2Fhu.wikipedia.org%2Fwiki%2FS%C3%A3o_Paulo

or

postgres=# select uri_encode('http://hu.wikipedia.org/wiki/São_Paulo');
               uri_encode                  
---------------------------------------------
http://hu.wikipedia.org/wiki/S%C3%A3o_Paulo

5 Comments

Maybe I modify it a little bit... It has a bug, after protocol (http)... If you add c == ':' on the 95th line, it will work perfectly... (and line 154 also... to be consistent)
This is only initial version - I'll add a functions uri_encode and uri_decode, cannot to modify like you propose, because : is reserved chars.
I uploaded new version that support same semantic like javascript functions.
There is a more recent postgres extension for adding the URI datatype to postgres. It might be a good choice: github.com/petere/pguri
@Enwired strangely enough, the pguri package is more recent and has more functions but fails to encode uri with accents. See github.com/petere/pguri/issues/8
7

With PL/V8...cheating?

create function encode_uri(text) returns text language plv8 strict immutable as $$
  return encodeURI($1);
$$;

1 Comment

And here the expression "Using a Bazooka to kill a Fly" takes all it's sense. Consider 3~5Gb of disk space and plenty, plenty cpu to compile this plv8...
6

Today I encountered "Won't deal with 3 (or more) byte sequences." for Korean characters while I've been using @vyegorov 's answer quite a long time, more than a year, need to change it just dumps the bytea hex strings with "%" prefixed.

CREATE OR REPLACE FUNCTION urlencode(in_str text, OUT _result text)
    STRICT IMMUTABLE AS $urlencode$
DECLARE
    _i      int4;
    _temp   varchar;
    _hex    varchar;
    _ascii  int4;
BEGIN
    _result = '';
    FOR _i IN 1 .. length(in_str) LOOP
        _temp := substr(in_str, _i, 1);
        IF _temp ~ '[0-9a-zA-Z:/@._?#-]+' THEN
            _result := _result || _temp;
        ELSE
            _hex := encode(_temp::bytea, 'hex');
            _temp := '';
            WHILE LENGTH(_hex) > 0 LOOP
                _temp := _temp || '%' || SUBSTRING(_hex, 1, 2);
                _hex := SUBSTRING(_hex, 3, 999);
            END LOOP;
            _result := _result || upper(_temp);
        END IF;
    END LOOP;
    RETURN ;
END;
$urlencode$ LANGUAGE plpgsql;

example,

SELECT urlencode('a') UNION ALL  --> "a"
SELECT urlencode('À') UNION ALL  --> "%C3%80"
SELECT urlencode('Ā') UNION ALL  --> "%C4%80"
SELECT urlencode('ə') UNION ALL  --> "%C9%99"
SELECT urlencode('α') UNION ALL  --> "%CE%B1"
SELECT urlencode('가') UNION ALL --> "%EA%B0%80"
SELECT urlencode('上') UNION ALL --> "%E4%B8%8A"
SELECT urlencode('い')           --> "%E3%81%84"

Comments

2

The usage of ::bytea in the answers of tsohr and Nick is wrong. Do a SELECT '\'::bytea to see why.

In both cases convert_to(x, 'utf-8') will give the desired result.

Comments

2

PLPython3 offers a straightforward approach.

Install plpython3 if you don't have it already. Example package install for the command line:

apt-get install postgresql-plpython3-9.5

Create the language (if you haven't done it already for other functions):

CREATE LANGUAGE plpython3u;

Simple function after that:

CREATE OR REPLACE FUNCTION encode_uri(input_string text)
  RETURNS text
AS $$
  import urllib.parse
  return urllib.parse.quote_plus(input_string, safe='~@#$&()*!+=:;,.?/\\''')
$$ LANGUAGE plpython3u immutable strict;

Comments

1

Here is "pure SQL" (no plv8, plpython or even plpgsql is needed) implementation supporting multibyte chars (including 3- and 4-byte emoji):

create or replace function urlencode(text) returns text as $$
  select
    string_agg(
      case
        when ol>1 or ch !~ '[0-9a-zA-Z:/@._?#-]+' 
          then regexp_replace(upper(substring(ch::bytea::text, 3)), '(..)', E'%\\1', 'g')
        else ch
      end,
      ''
    )
  from (
    select ch, octet_length(ch) as ol
    from regexp_split_to_table($1, '') as ch
  ) as s;
$$ language sql immutable strict;

(source: https://github.com/NikolayS/postgrest-google-translate/pull/8)

2 Comments

@Dmitry any details? Error message? What Postgres version do you use?
select urlencode('12 3') returns '123' but expected '12%203'
0

This is an update of @tsohr's answer. I have removed the unused variables and updated the FOR loop to be a little bit less verbose by using FOREACH.

By the way, regex_split_to_array was added in PostgreSQL 8.3 (See here). If you're using a version before 8.3, @tsohr's answer will do just fine for you.

CREATE OR REPLACE FUNCTION urlencode(in_str VARCHAR, OUT _result VARCHAR)
    STRICT IMMUTABLE AS $urlencode$
DECLARE
    _eachChar   varchar;
    _hex        varchar;
BEGIN
    _result = '';
    FOREACH _eachChar IN ARRAY regexp_split_to_array(in_str, '') LOOP
        IF _eachChar ~ '[0-9a-zA-Z:/@._?#-]+' THEN
            _result := _result || _eachChar;
        ELSE
            _hex := encode(_eachChar::bytea, 'hex');
            _eachChar := '';
            WHILE LENGTH(_hex) > 0 LOOP
                _eachChar := _eachChar || '%' || SUBSTRING(_hex, 1, 2);
                _hex := SUBSTRING(_hex, 3, 999);
            END LOOP;
            _result := _result || upper(_eachChar);
        END IF;
    END LOOP;
    RETURN ;
END;
$urlencode$ LANGUAGE plpgsql;

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.