-3

^((?!ca-ct.mydomain)(?!ca.mydomain)(?!cats.mydomain).)*mydomain.com$

I got the above expression from a web.config file. It's supposed to filter out anything that contains ca-ct.mydomain, ca.mydomain, or cats.mydomain.

I just cannot understand what the .)* piece means. The closing parenthesis between the dot and the asterisk seems to break the otherwise logical "any number of characters after passing the 3 negative lookaheads" reading.


2 Answers

0

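The negative look-ahead assertions are checked at successive positions. The parenthesized group contains the three assertions followed by a single ., so each repetition checks the assertions at the current position and then consumes one character. The * repeats the whole group, so the assertions are applied again at the next position, and so on.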
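To make the position-by-position checking concrete, here is a minimal Python sketch. The helper name first_blocked_position and the sample host names are hypothetical, chosen only for illustration; the pattern itself is copied from the question with its dots left unescaped.

```python
import re

# The three look-aheads from the question, dots left unescaped as in the original.
FORBIDDEN = [re.compile(p) for p in ("ca-ct.mydomain", "ca.mydomain", "cats.mydomain")]

# The full expression from the question.
FULL = re.compile(r"^((?!ca-ct.mydomain)(?!ca.mydomain)(?!cats.mydomain).)*mydomain.com$")


def first_blocked_position(text):
    """Return the first index at which one of the look-aheads would reject, or None."""
    for pos in range(len(text)):
        # Pattern.match(text, pos) anchors the attempt at pos, much like a look-ahead.
        if any(p.match(text, pos) for p in FORBIDDEN):
            return pos
    return None


for host in ("www.mydomain.com", "shop.ca.mydomain.com"):
    print(host, first_blocked_position(host), bool(FULL.match(host)))
```

The repeated group cannot advance past a blocked position, so the trailing mydomain.com can never be reached and the overall match fails.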

It is just one way to do it. Another approach is to have the three negative look-ahead assertions execute only once, only at the beginning of the input, and look further in the input (by incorporating .*). Then when these assertions succeed, the input can be consumed with yet another .*:

^(?!.*ca-ct.mydomain)(?!.*ca.mydomain)(?!.*cats.mydomain).*mydomain.com$

The work involved for the regex engine is similar though.
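As a quick sanity check, the two forms can be compared on a few inputs. This is only a sketch in Python: the sample host names are made up, and agreement on a handful of strings is not a proof of equivalence or a performance measurement.

```python
import re

# Original form: look-aheads re-checked before every consumed character.
TEMPERED = re.compile(r"^((?!ca-ct.mydomain)(?!ca.mydomain)(?!cats.mydomain).)*mydomain.com$")

# Alternative form: each look-ahead runs once at the start and scans ahead with .*
ANCHORED = re.compile(r"^(?!.*ca-ct.mydomain)(?!.*ca.mydomain)(?!.*cats.mydomain).*mydomain.com$")

# Hypothetical inputs, for illustration only.
SAMPLES = [
    "www.mydomain.com",
    "ca.mydomain.com",
    "shop.ca-ct.mydomain.com",
    "cats.mydomain.com",
    "www.example.com",
]


def verdicts(pattern):
    """Return one True/False per sample: does the pattern accept it?"""
    return [bool(pattern.match(s)) for s in SAMPLES]


print(verdicts(TEMPERED))
print(verdicts(ANCHORED))
```

Both patterns accept only the first sample and reject the rest.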


3 Comments

Your explanation is very helpful to understand what the original expression does. Now I attempted this other one since I needed to mix the original one with another filter, and it didn't work. Do you see what's wrong with it? ^(?!(.*anotherItemToFilter|ca-ct\.mydomain|ca\.mydomain|cats\.mydomain)).*\.mydomain\.com$ By wrong I mean, this previous expression is not filtering out "ca-ct.mydomain.com" for instance.
For one, the | inside the negative look-ahead makes that first .* part of the first alternative only. It doesn't apply to the second alternative, or to the third or fourth. Look again at the regex I put in my answer: every look-ahead starts with .*.
I accepted your answer as correct. Thanks for the breakdown of this regex, I think the original way they wrote it was tough to decode.
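The scoping issue raised in the comments can be reproduced with a short Python sketch. BROKEN is the pattern from the comment; FIXED is one possible repair (an assumption, not something posted in the thread) that moves .* in front of a non-capturing group so it applies to every alternative. The www. prefix on the test host is hypothetical.

```python
import re

# Pattern from the comment: the leading .* belongs to the first alternative only.
BROKEN = re.compile(
    r"^(?!(.*anotherItemToFilter|ca-ct\.mydomain|ca\.mydomain|cats\.mydomain)).*\.mydomain\.com$"
)

# Possible repair (assumption): hoist .* out so it precedes every alternative.
FIXED = re.compile(
    r"^(?!.*(?:anotherItemToFilter|ca-ct\.mydomain|ca\.mydomain|cats\.mydomain)).*\.mydomain\.com$"
)

for host in ("www.mydomain.com", "www.ca-ct.mydomain.com", "ca-ct.mydomain.com"):
    print(host, bool(BROKEN.match(host)), bool(FIXED.match(host)))
```

With BROKEN, the last three alternatives are only tested at the very start of the input, so www.ca-ct.mydomain.com slips through. The bare host ca-ct.mydomain.com is still rejected, because there the alternative happens to match at position 0.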
0

The pattern "^(?!.*ca-ct.mydomain)(?!.*ca.mydomain)(?!.*cats.mydomain).*mydomain.com$" is used to match strings that end with mydomain.com but exclude specific subdomains. Let's break it down step by step:

^: Asserts the start of the string.
(?!.*ca-ct.mydomain): A negative lookahead assertion that ensures ca-ct.mydomain does not appear anywhere in the string. The .* allows any characters to appear before ca-ct.mydomain.
(?!.*ca.mydomain): A negative lookahead that ensures ca.mydomain does not appear anywhere in the string.
(?!.*cats.mydomain): A negative lookahead that ensures cats.mydomain does not appear anywhere in the string.
.*mydomain.com: Matches any character (except for newline) zero or more times, followed by mydomain.com.
$: Asserts the end of the string.

In summary, this regex pattern will match any string that ends with mydomain.com but does not contain the subdomains ca-ct.mydomain, ca.mydomain, or cats.mydomain.
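The same breakdown can be written directly into the pattern. The sketch below uses Python's re.VERBOSE flag, which ignores whitespace in the pattern and allows # comments; the name ANNOTATED and the sample host names are hypothetical.

```python
import re

ANNOTATED = re.compile(
    r"""
    ^                        # start of the string
    (?!.*ca-ct.mydomain)     # reject if ca-ct.mydomain occurs anywhere ahead
    (?!.*ca.mydomain)        # reject if ca.mydomain occurs anywhere ahead
    (?!.*cats.mydomain)      # reject if cats.mydomain occurs anywhere ahead
    .*mydomain.com           # any characters, then mydomain.com
    $                        # end of the string (or just before a trailing newline)
    """,
    re.VERBOSE,
)

for host in ("www.mydomain.com", "www.cats.mydomain.com", "www.mydomainXcom"):
    print(host, bool(ANNOTATED.match(host)))
```

Because the dots are unescaped, each one matches any character, so an input such as www.mydomainXcom is accepted as well. Escaping them as \. restricts the match to literal dots.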
