I'm trying to get a regex to capture the base URL from a URL string. This

^(.+?[^\/:])(?=[?\/]|$)

works on regex101.

But when I try to use it within PostgreSQL

regexp_replace(content_url,'^(.+?[^\\/:])(?=[?\\/]|$)', '\1')

it does not.

2 Answers

1

RegexBuddy gives this warning about the first '?':

PostgreSQL is inconsistent in the way it handles lazy quantifiers in regular expressions with alternation because it attempts to match the longest alternative, instead of being eager and accepting the first alternative that matches.

If you remove it, it seems to work, i.e. ^(.+[^\/:])(?=[?\/]|$)

However, if you're trying to parse the base URL, that regex won't work. Use this instead:

select regexp_replace('....', '^(.*:)//([a-z\-.]+)(:[0-9]+)?(.*)$', '\2')
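
A variant of the same idea, as a sketch rather than a tested drop-in. Note that regexp_replace only substitutes the part of the string that matched, so a pattern that stops at the host leaves the rest of the URL in place; substring(... from ...) returns the first parenthesized group directly instead. Putting the hyphen last in the bracket expression avoids having to escape it. The sample URL is the one from the comments below, and the 0-9 in the host class is my own addition:

```sql
-- The pattern contains no backslashes, so the result does not depend on
-- how the server treats backslashes in string literals.
SELECT substring(
    'http://stackoverflow.com/questions/1991608/find-base-name-in-url-in-javascript'
    from '^[^:]+://([a-z0-9.-]+)'   -- group 1 is the host; '-' is last, so it is literal
) AS host;
-- host: stackoverflow.com
```

Like the '\2' replacement above, this returns the host without the scheme or port.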

5 Comments

SELECT regexp_replace('http://stackoverflow.com/questions/1991608/find-base-name-in-url-in-javascript','^(.+[^\/:])(?=[?\/]|$)', '\1') AS content_url; just gives me a box. Like a little "unknown character" box.
@thomas maybe it's the escaping, works here sqlfiddle.com/#!15/cfab1/4/0
Can you provide another solution that will work with Postgres?
[Error Code: 0, SQL State: 2201B] ERROR: invalid regular expression: invalid character range :(
What version of Postgres are you using?
0

PostgreSQL has an interesting regular expression engine. It took me a while to figure out what needs to be escaped and what needs to be double-escaped. The solution that worked for me is:

(regexp_matches(content_url,'(https?:\/\/\\w+(?:\\.\\w+)+)'))[1] AS content_url

Hope this can help someone.
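
On the escaping question: how many backslashes are needed depends on whether the server processes backslashes in ordinary string literals, which is controlled by standard_conforming_strings (on by default since PostgreSQL 9.1). With it off, '\1' and '\-' are rewritten before the regex engine ever sees them, which may explain the box character and the "invalid character range" error in the comments above, though I can't confirm that from here. A minimal sketch, assuming a literal sample URL in place of the content_url column:

```sql
-- Check how plain '...' literals are handled on this server.
SHOW standard_conforming_strings;

-- With the setting on, a plain literal passes \w and \. to the regex engine as written.
SELECT (regexp_matches('http://stackoverflow.com/questions/1991608',
                       '(https?://\w+(?:\.\w+)+)'))[1] AS content_url;

-- An E'' literal processes backslash escapes regardless of the setting,
-- so every backslash meant for the regex engine is doubled.
SELECT (regexp_matches('http://stackoverflow.com/questions/1991608',
                       E'(https?://\\w+(?:\\.\\w+)+)'))[1] AS content_url;
```

Both SELECTs should return http://stackoverflow.com. The E'' form is the one to prefer if the query has to behave the same on servers with either setting.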
