17

I am trying to match a string which does not contain a particular substring.

My string always starts with "http://www.domain.com/".

The substring I want to exclude from matches is ".a/", which comes directly after that prefix (it is a folder name on the domain).

There will be more characters in the string after the substring I want to exclude.

For example:

"http://www.domain.com/.a/test.jpg" should not be matched.

But "http://www.domain.com/test.jpg" should be matched.

4 Answers

29

Use a negative lookahead assertion:

^http://www\.domain\.com/(?!\.a/).*$


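The part (?!\.a/) fails the match if "http://www.domain.com/" is immediately followed by the string .a/. A lookahead consumes no characters, so the trailing .* still matches the rest of the URL.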
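As a minimal sketch of how this pattern behaves, here it is run through Python's `re` module. The choice of Python and the helper name `is_allowed` are assumptions for illustration; the question does not name a language.

```python
import re

# Pattern from the answer above. The negative lookahead (?!\.a/) rejects
# URLs whose path begins with the ".a/" folder, without consuming input.
PATTERN = re.compile(r"^http://www\.domain\.com/(?!\.a/).*$")


def is_allowed(url: str) -> bool:
    """Hypothetical helper: True if the URL is on the domain and not under /.a/."""
    return PATTERN.match(url) is not None


# The two URLs from the question.
print(is_allowed("http://www.domain.com/.a/test.jpg"))
print(is_allowed("http://www.domain.com/test.jpg"))
```

Because the lookahead is anchored right after the domain, a `.a/` folder deeper in the path (for example `/x/.a/test.jpg`) is not rejected by this pattern.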


1 Comment

What if I want to finish the match with a quote mark (")? I am searching through HTML.

9

My advice in such cases is not to construct overly complicated regexes with negative lookahead assertions and the like.
Keep it simple and stupid!
Do two matches: one for the positives, then sort out the negatives afterwards (or the other way around). Most of the time the regexes become easier, if not trivial, and your program gets clearer.
For example, to extract all lines with foo, but not foobar, I use:

grep foo | grep -v foobar
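The same two-pass idea can be sketched in Python for the URLs in the question. The pattern names `WANTED` and `UNWANTED` and the helper `filter_urls` are hypothetical, chosen only for this illustration.

```python
import re

# Assumed patterns for this sketch: one describes what we want to keep,
# the other describes what we want to throw away afterwards.
WANTED = re.compile(r"^http://www\.domain\.com/")
UNWANTED = re.compile(r"^http://www\.domain\.com/\.a/")


def filter_urls(urls):
    """Hypothetical helper: keep URLs on the domain, then drop those under /.a/."""
    # First pass: keep the positives (the role of `grep foo`).
    positives = [u for u in urls if WANTED.search(u)]
    # Second pass: drop the negatives (the role of `grep -v foobar`).
    return [u for u in positives if not UNWANTED.search(u)]


urls = [
    "http://www.domain.com/.a/test.jpg",
    "http://www.domain.com/test.jpg",
    "http://www.otherdomain.com/test.jpg",
]
print(filter_urls(urls))
```

Each pattern stays trivial to read, and excluding another folder later only means adding another negative pattern to the second pass.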


0

I would try with

^http:\/\/www\.domain\.com\/([^.]|\.[^a]).*$

You want to match your domain, followed by either anything that does not start with a ., or anything that starts with a . that is not followed by an a. (You can add your / after the a if needed.)
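To see where this character-class version differs from an exact ".a/" exclusion, here is a small Python sketch that runs the pattern as written above. The helper name `matches` and the extra test URLs are assumptions added for illustration.

```python
import re

# The pattern from this answer, unchanged. After the domain it requires
# either a non-dot character, or a dot followed by a character other than "a".
PATTERN = re.compile(r"^http:\/\/www\.domain\.com\/([^.]|\.[^a]).*$")


def matches(url: str) -> bool:
    """Hypothetical helper: True if the whole URL matches the pattern."""
    return PATTERN.match(url) is not None


for url in (
    "http://www.domain.com/test.jpg",       # path starts with a non-dot
    "http://www.domain.com/.a/test.jpg",    # the folder to exclude
    "http://www.domain.com/.abc/test.jpg",  # starts with ".a" but is not ".a/"
    "http://www.domain.com/",               # nothing after the domain
):
    print(url, matches(url))
```

As written, the pattern also rejects any path that merely begins with `.a` (such as `.abc/`), and it rejects the bare domain URL because at least one character is required after the final slash.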

2 Comments

This is fine - until another programmer is asked to extend it to also exclude .b, .c and .whatElsethemanagementdoesnotwant
Yep... I get that, @Ingo. BTW I forgot the \ before /

0

If you don't want to use a lookahead, you can use two simple regexes instead: accept the URL if it matches your domain but does not match your domain followed by .a/

<?php

function foo($s) {

    // Dots are escaped so they match a literal "." rather than any character.
    $regexDomain = '{^http://www\.domain\.com/}';
    $regexDomainBadPath = '{^http://www\.domain\.com/\.a/}';

    // Accept only URLs on the domain that are not under the /.a/ folder.
    return preg_match($regexDomain, $s) && !preg_match($regexDomainBadPath, $s);
}

var_dump(foo('http://www.domain.com/'));
var_dump(foo('http://www.otherdomain.com/'));

var_dump(foo('http://www.domain.com/hello'));
var_dump(foo('http://www.domain.com/hello.html'));
var_dump(foo('http://www.domain.com/.a'));
var_dump(foo('http://www.domain.com/.a/hello'));
var_dump(foo('http://www.domain.com/.b/hello'));
var_dump(foo('http://www.domain.com/da/hello'));

?>

Note that http://www.domain.com/.a will pass the test, because the .a is not followed by a /.
