4

Without using plpgsql, I'm trying to urlencode a given text within a pgsql SELECT statement.

The problem with this approach:

select regexp_replace('héllo there','([^A-Za-z0-9])','%' || encode(E'\\1','hex'),'g')

...is that the encode function is not passed the regexp parameter, unless there's another way to call functions from within the replacement expression that actually works. So I'm wondering if there's a replacement expression that, by itself, can encode matches into hex values.

There may be other combinations of functions. I thought there would be a clever regex (and that may still be the answer) out there, but I'm having trouble finding it.

0

4 Answers 4

7
select regexp_replace(encode('héllo there','hex'),'(..)',E'%\\1','g');

This doesn't leave the alphanumeric characters human-readable, though.

Sign up to request clarification or add additional context in comments.

Comments

2

Here is pretty short version, and it's even "pure SQL" function, not plpgsql. Multibyte chars (including 3- and 4-bytes emoji) are supported.

create or replace function urlencode(in_str text, OUT _result text) returns text as $$
  select
    string_agg(
      case
        when ol>1 or ch !~ '[0-9a-za-z:/@._?#-]+' 
          then regexp_replace(upper(substring(ch::bytea::text, 3)), '(..)', E'%\\1', 'g')
        else ch
      end,
      ''
    )
  from (
    select ch, octet_length(ch) as ol
    from regexp_split_to_table($1, '') as ch
  ) as s;
$$ language sql immutable strict;

1 Comment

Note that the ch::bytea::text depends on ch being parseable as a BYTEA literal. Which seems to work for everything but ``.
1

Here's a function I wrote that handles encoding using built in functions while preserving the readability of the URL.

Regex matches to capture pairs of (optional) safe characters and (at most one) non-safe character. Nested selects allow those pairs to be encoded and re-combined returning a fully encoded string.

I've run through a test suite with all sorts of permutations (leading/trailing/only/repeated encoded characters and thus far it seems to encode correctly.

The safe special characters are _ ~ . - and /. My inclusion of "/" on that list is probably non-standard, but fits the use case I have where the input text may be a path and I want that to remain.

CREATE OR REPLACE FUNCTION oseberg.encode_uri(input text)
  RETURNS text
  LANGUAGE plpgsql
  IMMUTABLE STRICT
AS $function$
DECLARE
  parsed text;
  safePattern text;
BEGIN
  safePattern = 'a-zA-Z0-9_~/\-\.';
  IF input ~ ('[^' || safePattern || ']') THEN
    SELECT STRING_AGG(fragment, '')
    INTO parsed
    FROM (
      SELECT prefix || encoded AS fragment
      FROM (
        SELECT COALESCE(match[1], '') AS prefix,
               COALESCE('%' || encode(match[2]::bytea, 'hex'), '') AS encoded
        FROM (
          SELECT regexp_matches(
            input,
            '([' || safePattern || ']*)([^' || safePattern || '])?',
            'g') AS match
        ) matches
      ) parsed
    ) fragments;
    RETURN parsed;
  ELSE
    RETURN input;
  END IF;
END;
$function$

2 Comments

Thanks for all the time you put into this, but as stated in the OP, I was looking for pure built-in functions, without pl/pgsql.
There are some problems with multibyte character such as Chinese. Beijing (北京市) returns '%e58c97%e4baac%e5b882', it should be '%E5%8C%97%E4%BA%AC%E5%B8%82'
-3

You can use CLR and import the namespace or use the function shown in this link , this creates a T-SQL function that does the encoding.

http://www.sqljunkies.com/WebLog/peter_debetta/archive/2007/03/09/28987.aspx

1 Comment

Also, I'm trying to use built-in only, i.e., I don't want to create a function or stored procedure.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.