
I have a dataset from which I need to extract the alias from values in a domain-style format, domain\alias, where the alias is the part after the backslash. The backslash seems to be treated as an escape character despite several attempts to have it treated as a literal character. I first tested my regex pattern with a known non-escape character, the forward slash, and it worked. I then tried the same pattern with the backslash, followed by several permutations using the escaping methods familiar to me, without success. How do you get this regex pattern to work with a backslash in Spark?

Regex Pattern Verification

select regexp_extract('domain/alias', '/(.*)') as test --Results: alias Works with forward slash.

Permutations and Results

select regexp_extract('domain\alias', '\(.*)') as test --Results: domainalias Removes the backslash for some reason

select regexp_extract('domain\alias', '"""\"""(.*)') as test --Results: empty string

select regexp_extract('domain\alias', '"""\\"""(.*)') as test --Results: empty string

select regexp_extract('domain\alias', '\\(.*)') as test --Results: Error in SQL statement: NullPointerException: 

select regexp_extract('domain\alias', '\\\(.*)') as test --Results: Error in SQL statement: NullPointerException: 

1 Answer


I was able to solve this by adjusting a Spark setting. On the Databricks cluster I was using, I ran this:

set spark.sql.parser.escapedStringLiterals=true;

With that setting enabled, my regex pattern worked as intended:

select regexp_extract('domain\alias', '\\(.*)') as test --Results: alias
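
For comparison, here is a minimal sketch of an alternative that leaves the setting at its default value (false). It assumes the default parser behavior, where the SQL parser unescapes string literals before the regex engine sees them. Under that assumption a literal backslash has to be written as \\ in the data literal and as \\\\ in the pattern: the parser reduces \\\\ to \\, which the regex engine then reads as one literal backslash. The same unescaping would explain the first result in the question, since 'domain\alias' becomes domainalias before any matching happens.

```sql
-- Assumption: default parser behavior (the SQL parser unescapes string literals).
set spark.sql.parser.escapedStringLiterals=false;

-- 'domain\\alias' reaches the function as domain\alias.
-- '\\\\(.*)' reaches the regex engine as \\(.*): a literal backslash, then capture group 1.
select regexp_extract('domain\\alias', '\\\\(.*)', 1) as test;
```

When the value comes from a table column rather than a string literal, the data is not parsed as a literal, so only the pattern should need the doubled escaping.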