
I wrote the following regex:

(https?:\/\/)?([da-z\.-]+)\.([a-z]{2,6})(\/(\w|-)*)*\/?

Its behaviour can be seen here: http://gskinner.com/RegExr/?34b8m

I wrote the following JavaScript code:

var urlexp = new RegExp(
    '^(https?:\/\/)?([da-z\.-]+)\.([a-z]{2,6})(\/(\w|-)*)*\/?$', 'gi'
);
document.write(urlexp.test("blaaa"))

And it returns true, even though the regex is supposed to reject single words like this one.

What am I doing wrong?

  • This is why I hate using the new RegExp constructor for regular expression initialization in JS: every backslash has to be doubled. Try the exact same code but with var urlexp = /^(https?:\/\/)?([da-z\.-]+)\.([a-z]{2,6})(\/(\w|-)*)*\/?$/gi Commented Mar 30, 2013 at 8:53
  • Also, when using new RegExp, you don't have to escape your forward slashes; that's only needed when you're using /regex/mod notation (just as you don't have to escape single quotes in a double-quoted string and vice versa), so var urlexp = new RegExp('^(https?://)?([da-z.-]+)\\.([a-z]{2,6})(/(\\w|-)*)*/?$', 'gi'); will work as well. Commented Mar 30, 2013 at 8:56
  • possible duplicate of Why this javascript regex doesn't work? Commented Mar 31, 2014 at 7:16


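Your problem is that JavaScript processes all your escape sequences as string escapes first. Since \/, \. and \w mean nothing special in a string literal, each one simply becomes the bare character before the RegExp constructor ever sees it. So your regex goes to memory looking like this: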

^(https?://)?([da-z.-]+).([a-z]{2,6})(/(w|-)*)*/?$

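Notice the problem in the middle: what you thought was a literal period has turned into the regular expression wildcard, which matches any character. That is why "blaaa" passes: ([da-z.-]+) takes "bl", the wildcard takes "a", and ([a-z]{2,6}) takes "aa". You can solve this in a couple of ways. One is to use the forward-slash regular expression syntax JavaScript provides: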

var urlexp = /^(https?:\/\/)?([da-z\.-]+)\.([a-z]{2,6})(\/(\w|-)*)*\/?$/gi

The other is to double your backslashes and leave your forward slashes unescaped. Escaping forward slashes, as you had been doing, is only needed with /regex/mod notation, just as you don't have to escape single quotes in a double-quoted string and vice versa:

var urlexp = new RegExp('^(https?://)?([da-z.-]+)\\.([a-z]{2,6})(/(\\w|-)*)*/?$', 'gi')

Note the double backslash before the w as well: without it, \w collapses to a plain w, and the group matches only the letter w instead of any word character.

A couple notes on your regular expression itself:

[da-z.-]

d is already contained in the a-z range. Did you mean \d? In that case, the backslash is important.
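A quick check of that point, assuming digits were the intent. The two anchored classes below are cut out of the full URL pattern purely for illustration, and the host names are made-up samples.

```javascript
// "d" adds nothing here: it is already covered by a-z.
const withoutDigits = /^[da-z.-]+$/i;
// \d adds 0-9 to the class.
const withDigits = /^[\da-z.-]+$/i;

// Columns: host, without \d, with \d.
for (const host of ['example', 'web2', '127.0.0.1']) {
  console.log(host, withoutDigits.test(host), withDigits.test(host));
}
```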

(/(\w|-)*)*/?

My own misgivings about the nested Kleene stars aside, you can whittle that alternation down to a character class and drop the terminating /? entirely, since a trailing slash will already be matched by the group as you've written it. I'd rewrite it as:

(/[\w-]*)*

Though maybe you'd rather accept any non-whitespace characters between the slashes?

(/[^/\s]*)*
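To see how the three path tails compare, here is a sketch that anchors each one on its own and runs it against some sample paths (the paths are my own, chosen for illustration). The original tail and the character-class rewrite should agree everywhere, while the non-whitespace variant additionally accepts characters such as the dot.

```javascript
// The question's tail, anchored on its own.
const original = /^(\/(\w|-)*)*\/?$/;
// Alternation folded into a character class, trailing /? dropped.
const simplified = /^(\/[\w-]*)*$/;
// Anything except slashes and whitespace between the slashes.
const nonSpace = /^(\/[^\/\s]*)*$/;

const paths = ['', '/', '/docs/', '/my-page/v2', '/a.b', '/has space'];
// Columns: path, original, simplified, nonSpace.
for (const p of paths) {
  console.log(JSON.stringify(p), original.test(p), simplified.test(p), nonSpace.test(p));
}
```

Because every repetition of the outer group has to consume a slash, the nested stars here cannot loop on empty matches.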

Anyway, modified this way your regular expression winds up looking more like:

^(https?://)?([\da-z.-]+)\.([a-z]{2,6})(/[\w-]*)*$
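Written out in both JavaScript notations, the rewritten pattern might look like the sketch below, following the rules above: single backslashes with escaped forward slashes in the literal, doubled backslashes with bare forward slashes in the string. The sample inputs are assumptions of mine, and the g flag is again left off so repeated test() calls do not depend on lastIndex.

```javascript
// Literal form: forward slashes escaped, backslashes single.
const fromLiteral = /^(https?:\/\/)?([\da-z.-]+)\.([a-z]{2,6})(\/[\w-]*)*$/i;

// String form: forward slashes unescaped, every backslash doubled.
const fromString = new RegExp(
  '^(https?://)?([\\da-z.-]+)\\.([a-z]{2,6})(/[\\w-]*)*$', 'i'
);

const samples = [
  'blaaa',                          // no literal dot, so no match
  'example.com',
  'web2.example.net',               // digits now allowed via \d
  'https://sub.example.org/a/b-c/',
  'ftp://example.com',              // only http and https are allowed
];

// Columns: sample, literal form, string form.
for (const s of samples) {
  console.log(s, fromLiteral.test(s), fromString.test(s));
}
```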

Remember, if you're going to use string notation: Double EVERY backslash. If you're going to use native /regex/mod notation (which I highly recommend), escape your forward slashes.
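One alternative the answer does not mention, offered here as my own suggestion: since ES2015, String.raw returns a template literal's text with its backslashes left exactly as typed, so the string notation no longer needs any doubling. A minimal sketch, reusing the rewritten pattern:

```javascript
// String.raw keeps every backslash as typed, so \d, \. and \w survive.
const pattern = String.raw`^(https?://)?([\da-z.-]+)\.([a-z]{2,6})(/[\w-]*)*$`;
const viaRaw = new RegExp(pattern, 'i');

// Same text as the doubled-backslash string literal.
console.log(pattern === '^(https?://)?([\\da-z.-]+)\\.([a-z]{2,6})(/[\\w-]*)*$'); // true
console.log(viaRaw.test('example.com'), viaRaw.test('blaaa')); // true false
```

The one thing to watch inside a template literal is the sequence "${", which starts an interpolation.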


Comments

Very detailed, thanks for both the clarification and additional suggestions!
@Marin You could also use this Regex escape method here to escape all the backslashes in a string: stackoverflow.com/a/2593661/1726343
@asad - that will turn any string into a regular expression that matches the literal string passed in - not exactly called for in this situation.
@FrankieTheKneeMan Ah, I see what you mean. I misunderstood what the OP required. I guess that could still work if the regex in the function was reduced to /([\\])/
@asad Not really. The problem was that backslashes weren't surviving into the in memory string representation, and as such the meaning was being changed. That's the whole problem with backslashes and string representations. So new RegExp(RegExp.quote('[a\-z]')) would (with your new regex) generate the regular expression /[a-z]/, because the string that the function saw would look like [a-z], not containing a backslash at all. new RegExp(RegExp.quote('[a\\-z]')) would send a string that looked like [a\-z], but generate the regular expression /[a\\-z]/, which is dangerously wrong.
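The point in that last comment can be checked directly. The helper below is hypothetical: it is not RegExp.quote from the linked answer, only a stand-in built on the reduced /([\\])/ pattern from the comments, doubling whatever backslashes reach it. The probe characters are my own choice.

```javascript
// Hypothetical helper: doubles every backslash present in the string it receives.
function quoteBackslashes(str) {
  return str.replace(/([\\])/g, '\\$1');
}

// '[a\-z]' in source code is already '[a-z]' in memory: no backslash survives.
console.log('[a\-z]' === '[a-z]'); // true
const lost = new RegExp(quoteBackslashes('[a\-z]'));         // becomes /[a-z]/

// '[a\\-z]' does carry one backslash, which the helper then doubles.
const doubledAgain = new RegExp(quoteBackslashes('[a\\-z]')); // becomes /[a\\-z]/

// The class that was actually meant: "a", "-" or "z".
const intended = /[a\-z]/;

// Columns: character, intended, lost, doubledAgain.
for (const ch of ['-', 'm', '_']) {
  console.log(ch, intended.test(ch), lost.test(ch), doubledAgain.test(ch));
}
```

In /[a\\-z]/ the escaped backslash becomes the start of a range running up to z, which is why it accepts "_" yet rejects the hyphen it was meant to match.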
