REGEXP_REPLACE for spark.sql()

Question

I need to write a REGEXP_REPLACE query for a spark.sql() job. If the value matches the pattern below, only the text before the first hyphen should be extracted and assigned to the target column 'name'. If the pattern doesn't match, the entire 'name' should be returned.

Pattern:

Values should be hyphen-delimited. Any characters can appear before the first hyphen (digits, letters, special characters, or even spaces).
The first hyphen should be followed by exactly 2 words, separated by a hyphen. The words can contain only digits and letters (note: special characters and blanks are not allowed).
The two words should be followed by one or more digits, followed by a hyphen.
The last portion should consist only of one or more digits.

For Example:

if name = abc45-dsg5-gfdvh6-9890-7685, output of REGEXP_REPLACE = abc45

if name = abc, output of REGEXP_REPLACE = abc

if name = abc-gf5-dfg5-asd5-98-00, output of REGEXP_REPLACE = abc-gf5-dfg5-asd5-98-00

I have

spark.sql("SELECT REGEXP_REPLACE(name , '-[^-]+-\\w{2}-\\d+-\\d+$','',1,1,'i')  AS name").show();

But it does not work.

Try it with regexp_extract and the pattern ^(?:[^-\s]+(?=-\w+-\w+\d+-\d+-\d+$)|\S+) to match the desired strings. See regex101.com/r/BeCTRF/1
– The fourth bird, Commented Mar 10, 2021 at 20:25
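
To see how the commenter's pattern behaves on the three sample values, here is a minimal sketch using Python's `re` module rather than Spark itself. The function name `extract_prefix` is hypothetical, and returning an empty string when nothing matches is an assumption meant to mirror what regexp_extract does on a non-match.

```python
import re

# Pattern copied from the comment above. The lookahead checks the rest of the
# value without consuming it, so only the leading part is returned.
COMMENT_PATTERN = re.compile(r"^(?:[^-\s]+(?=-\w+-\w+\d+-\d+-\d+$)|\S+)")


def extract_prefix(name: str) -> str:
    """Return the whole match (group 0), or an empty string if nothing matches."""
    match = COMMENT_PATTERN.match(name)
    return match.group(0) if match else ""


if __name__ == "__main__":
    samples = [
        "abc45-dsg5-gfdvh6-9890-7685",
        "abc",
        "abc-gf5-dfg5-asd5-98-00",
    ]
    for value in samples:
        print(value, "->", extract_prefix(value))
```

One thing to keep in mind: both alternatives exclude whitespace, so a value such as `a b-x1-y2-3-4` would be cut at the first space, even though the question allows spaces before the first hyphen.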

Ryszard Czech · Accepted Answer · 2021-03-10 22:37:53Z

Use

^([^-]*)(-[a-zA-Z0-9]+){2}-[0-9]+-[0-9]+$

See proof. Replace with $1. If $1 does not work, use \1. If \1 does not work, use \\1.

EXPLANATION

--------------------------------------------------------------------------------
  ^                        the beginning of the string
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [^-]*                    any character except: '-' (0 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  (                        group and capture to \2 (2 times):
--------------------------------------------------------------------------------
    -                        '-'
--------------------------------------------------------------------------------
    [a-zA-Z0-9]+             any character of: 'a' to 'z', 'A' to
                             'Z', '0' to '9' (1 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
  ){2}                     end of \2 (NOTE: because you are using a
                           quantifier on this capture, only the LAST
                           repetition of the captured pattern will be
                           stored in \2)
--------------------------------------------------------------------------------
  -                        '-'
--------------------------------------------------------------------------------
  [0-9]+                   any character of: '0' to '9' (1 or more
                           times (matching the most amount possible))
--------------------------------------------------------------------------------
  -                        '-'
--------------------------------------------------------------------------------
  [0-9]+                   any character of: '0' to '9' (1 or more
                           times (matching the most amount possible))
--------------------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
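
The pattern can be checked against the three sample values from the question without a Spark cluster. The following is a minimal sketch using Python's `re` module as a stand-in for Spark's Java-based regex engine. The function name `extract_name` is hypothetical, and Python writes the group reference as `\1` where Java-style replacement strings use `$1`.

```python
import re

# Pattern from the answer above. It contains no backslashes, so it needs no
# extra escaping when embedded in a quoted string.
PATTERN = re.compile(r"^([^-]*)(-[a-zA-Z0-9]+){2}-[0-9]+-[0-9]+$")


def extract_name(name: str) -> str:
    """Keep only the text before the first hyphen when the whole value matches.

    When the pattern does not match, nothing is replaced and the value is
    returned unchanged.
    """
    # r"\1" is Python's reference to the first capture group.
    return PATTERN.sub(r"\1", name)


if __name__ == "__main__":
    samples = [
        "abc45-dsg5-gfdvh6-9890-7685",
        "abc",
        "abc-gf5-dfg5-asd5-98-00",
    ]
    for value in samples:
        print(value, "->", extract_name(value))
```

Because the pattern is anchored at both ends, a non-matching value is left untouched, which is what gives the "return the entire name" fallback. In Spark SQL the equivalent call would presumably look like `regexp_replace(name, '^([^-]*)(-[a-zA-Z0-9]+){2}-[0-9]+-[0-9]+$', '$1')`, selected from whatever table holds the `name` column.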
