I've tested this regex to extract URLs from a text string:

(?:(?:https?|ftp|file):\/\/|www\.|ftp\.)(?:\([-A-Z0-9+&@#\/%=~_|$?!:,.]*\)|[-A-Z0-9+&@#\/%=~_|$?!:,.])*(?:\([-A-Z0-9+&@#\/%=~_|$?!:,.]*\)|[A-Z0-9+&@#\/%=~_|$])

... and it works just as I'd like: it matches all the URLs I throw at it.

However, when I use REGEXEXTRACT in Google Sheets like this:

=iferror(Regexextract(A1,"(?:(?:https?|ftp|file):\/\/|www\.|ftp\.)(?:\([-A-Z0-9+&@#\/%=~_|$?!:,.]*\)|[-A-Z0-9+&@#\/%=~_|$?!:,.])*(?:\([-A-Z0-9+&@#\/%=~_|$?!:,.]*\)|[A-Z0-9+&@#\/%=~_|$])"),"")

... nothing extracts. The regex is identical.

What am I doing wrong?

NB. The Regex was tested here: http://www.regextester.com/53716

  • Google Sheets documentation states: Google products use RE2 for regular expressions. Learn how to use RE2 expressions. Commented Apr 24, 2017 at 14:37
  • I would at least replace all those [- with [\-. Since - has special meaning in a character class, it's better to escape it, unless it's at the end of the character class (in most regex engines). Commented Apr 24, 2017 at 14:42
  • @LukStorms: - has no special meaning if placed at the beginning or at the end of a character set. [-abc] or [abc-] are totally valid. Commented Apr 24, 2017 at 14:51
  • @LukStorms, good practice is knowing the rules and using them as necessary, no more, no less. Commented Apr 24, 2017 at 15:07
  • Isn't it enough to find the http etc. "prefix" and then match any non-whitespace char? =REGEXEXTRACT(B6, "(?:(?:https?|ftps?|file)://|www\.|ftp\.)\S+")? Commented Apr 24, 2017 at 15:28
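
One possible cause, which the thread itself does not confirm: every character class in the original pattern lists only uppercase A-Z, so the pattern depends on a case-insensitive flag, and RE2 matches case-sensitively unless the pattern starts with (?i). The sketch below illustrates the effect with Python's re module, which is a different engine from RE2 and is used here only as a stand-in. The sample text and the helper name find_url are hypothetical.

```python
import re

# The pattern from the question, unchanged.
PATTERN = (
    r"(?:(?:https?|ftp|file):\/\/|www\.|ftp\.)"
    r"(?:\([-A-Z0-9+&@#\/%=~_|$?!:,.]*\)|[-A-Z0-9+&@#\/%=~_|$?!:,.])*"
    r"(?:\([-A-Z0-9+&@#\/%=~_|$?!:,.]*\)|[A-Z0-9+&@#\/%=~_|$])"
)


def find_url(text, ignore_case):
    """Return the first match of PATTERN in text, or None (hypothetical helper)."""
    # (?i) is the inline case-insensitive flag; it has to sit at the very start.
    pattern = "(?i)" + PATTERN if ignore_case else PATTERN
    match = re.search(pattern, text)
    return match.group(0) if match else None


sample = "see http://www.example.com/page for details"  # made-up sample text
print(find_url(sample, ignore_case=False))
print(find_url(sample, ignore_case=True))
```

If this is indeed the cause, prefixing the pattern in the Sheets formula with (?i) would be the equivalent change. That is an untested assumption here.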

1 Answer

I suggest you use a simpler regex like

=REGEXEXTRACT(B6, "(?:(?:https?|ftps?|file)://|www\.|ftp\.)\S+")

Details:

  • (?:(?:https?|ftps?|file)://|www\.|ftp\.) - any of:
    • (?:https?|ftps?|file):// - http/https, ftp/ftps or file, followed by ://
    • | - or
    • www\. - www.
    • | - or
    • ftp\. - ftp.
  • \S+ - 1 or more non-whitespace characters
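
As a rough illustration of what the simpler pattern captures, here is a sketch using Python's re module. This is an assumption for demonstration only: re is not RE2, although this pattern uses only syntax that both engines share. The sample text and the function name extract_urls are made up for the example.

```python
import re

# Same pattern as in the REGEXEXTRACT formula above.
SIMPLE_URL = re.compile(r"(?:(?:https?|ftps?|file)://|www\.|ftp\.)\S+")


def extract_urls(text):
    """Return every substring matched by SIMPLE_URL (hypothetical helper)."""
    # findall returns whole matches because the pattern has no capturing groups.
    return SIMPLE_URL.findall(text)


sample = "Docs at https://example.com/a?b=1 and www.test.org."  # made-up text
print(extract_urls(sample))
```

Because \S+ runs to the next whitespace, punctuation directly after a URL (the final period in the sample) ends up inside the match.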

1 Comment

Excellent! There are a few exceptions your regex doesn't catch, but it still just saved me hours of work. Thank you very much :)
