100

I have a URL, and I'm trying to match it to a regular expression to pull out some groups. The problem I'm having is that the URL can either end or continue with a "/" and more URL text. I'd like to match URLs like this:

But not match something like this:

So, I thought my best bet was something like this:

/(.+)/(\d{4}-\d{2}-\d{2})-(\d+)[/$]

where the character class at the end contained either the "/" or the end-of-line. The character class doesn't seem to be happy with the "$" in there though. How can I best discriminate between these URLs while still pulling back the correct groups?

4 Answers

140

To match either / or end of content, use (/|\z)

This only applies if you are not using multi-line matching (i.e. you're matching a single URL, not a newline-delimited list of URLs).


To put that with an updated version of what you had:

/(\S+?)/(\d{4}-\d{2}-\d{2})-(\d+)(/|\z)

Note that I've changed the start to be a non-greedy match for non-whitespace ( \S+? ) rather than matching anything and everything ( .+ )
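For illustration, here is a minimal sketch of that pattern in Python. Two assumptions: Python's re module spells the end-of-string anchor \Z rather than \z, and the sample paths are hypothetical, since the question's example URLs are not shown here. The `extract` helper is also just a name made up for this sketch.

```python
import re

# Python's spelling of "end of string" is \Z (the \z above is the Perl/Ruby/PCRE form).
PATTERN = re.compile(r"/(\S+?)/(\d{4}-\d{2}-\d{2})-(\d+)(/|\Z)")


def extract(url):
    """Return the four captured groups, or None if the URL does not match."""
    m = PATTERN.search(url)
    return m.groups() if m else None


# Hypothetical sample paths, not the ones from the question.
print(extract("/blog/2009-01-02-123"))           # ends right after the id
print(extract("/blog/2009-01-02-123/comments"))  # continues with "/" and more text
print(extract("/blog/2009-01-02-123abc"))        # id followed by other text: None
```

The fourth group is either "/" or the empty string, so it can be ignored when only the first three groups are needed.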


2 Comments

How do I give you more points? ;) Thanks for this. Just to document: (/|\A) would match a forward slash or the beginning of the string.
Note: JavaScript doesn't support \Z and \z
70

You've got a couple regexes now which will do what you want, so that's adequately covered.

What hasn't been mentioned is why your attempt won't work: Inside a character class, $ (as well as ^, ., and /) has no special meaning, so [/$] matches either a literal / or a literal $ rather than terminating the regex (/) or matching end-of-line ($).
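A quick way to see this is to compare the character class with an alternation. The sketch below uses Python's re module and shortened, hypothetical test strings; `matches` is a helper invented for the example.

```python
import re

char_class = re.compile(r"-(\d+)[/$]")      # "/" or a literal "$" character
alternation = re.compile(r"-(\d+)(?:/|$)")  # "/" or the end of the input


def matches(pattern, text):
    """True if the pattern is found anywhere in text."""
    return pattern.search(text) is not None


print(matches(char_class, "-123/"))   # True: "/" is in the class
print(matches(char_class, "-123$"))   # True: "$" is matched literally
print(matches(char_class, "-123"))    # False: the class needs one more character
print(matches(alternation, "-123"))   # True: "$" outside a class is an anchor
```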

2 Comments

This is something frequently forgotten and not mentioned enough in the regex docs.
Note that ^ can have special meaning in a character class. If it is the first character in the class, it makes it a negative class that will match anything except the other characters. e.g. to match anything except a or b, you could use [^ab]. To include a literal ^, just make sure it isn't first, so to match either a, b or ^ you would use [ab^].
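A small sketch of the two placements of ^ described in the comment above, using Python's re module and a made-up input string:

```python
import re

not_a_or_b = re.compile(r"[^ab]")    # leading ^ negates the class
a_b_or_caret = re.compile(r"[ab^]")  # ^ anywhere else is a literal caret

sample = "abc^"  # hypothetical input

print(not_a_or_b.findall(sample))    # ['c', '^']
print(a_b_or_caret.findall(sample))  # ['a', 'b', '^']
```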
51
/(.+)/(\d{4}-\d{2}-\d{2})-(\d+)(/.*)?$

1st Capturing Group (.+)

.+ matches any character (except for line terminators)

  • + Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)

2nd Capturing Group (\d{4}-\d{2}-\d{2})

\d{4} matches a digit (equal to [0-9])

  • {4} Quantifier — Matches exactly 4 times

- matches the character - literally (case sensitive)

\d{2} matches a digit (equal to [0-9])

  • {2} Quantifier — Matches exactly 2 times

- matches the character - literally (case sensitive)

\d{2} matches a digit (equal to [0-9])

  • {2} Quantifier — Matches exactly 2 times

- matches the character - literally (case sensitive)

3rd Capturing Group (\d+)

\d+ matches a digit (equal to [0-9])

  • + Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)

4th Capturing Group (.*)?

? Quantifier — Matches between zero and one times, as many times as possible, giving back as needed (greedy)

.* matches any character (except for line terminators)

  • * Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)

$ asserts position at the end of the string
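To see how the groups in this breakdown behave in practice, here is a sketch that runs the pattern in Python against a few hypothetical paths (the question's own URLs are not shown here). The `groups_for` helper is a name made up for the example. The last case shows the effect of the greedy first group when a date-like segment appears twice.

```python
import re

PATTERN = re.compile(r"/(.+)/(\d{4}-\d{2}-\d{2})-(\d+)(/.*)?$")


def groups_for(url):
    """Return the captured groups, or None if the URL does not match."""
    m = PATTERN.search(url)
    return m.groups() if m else None


# The optional 4th group is None when nothing follows the id.
print(groups_for("/blog/2009-01-02-123"))
# When the URL continues, the 4th group holds the slash and the remainder.
print(groups_for("/blog/2009-01-02-123/comments/5"))
# An id followed by non-slash text does not match.
print(groups_for("/blog/2009-01-02-123abc"))
# Greedy (.+) backtracks from the right, so the last date-like segment wins.
print(groups_for("/a/2009-01-02-1/b/2010-03-04-2"))
```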

2 Comments

Why doesn't the last / need to be escaped?
The 4th capturing group is (/.*)?, not (.*)?. The whole point of the question revolved around the possible trailing slash too. If that's a typo, it's OK.
23

In Ruby and Bash, you can use $ inside parentheses.

/(\S+?)/(\d{4}-\d{2}-\d{2})-(\d+)(/|$)

(This solution is similar to Pete Boughton's, but preserves the usage of $, which means end of line, rather than using \z, which means end of string.)
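The difference between the two anchors shows up when the input has a trailing newline or holds several lines. The sketch below uses Python's re module as an assumption, where the end-of-string anchor is spelled \Z and $ only matches at every line end when re.MULTILINE is set. Ruby and other engines differ in the details, and the sample paths are hypothetical.

```python
import re

BODY = r"/(\S+?)/(\d{4}-\d{2}-\d{2})-(\d+)"

with_dollar = re.compile(BODY + r"(/|$)")
with_end_of_string = re.compile(BODY + r"(/|\Z)")
per_line = re.compile(BODY + r"(/|$)", re.MULTILINE)

line = "/blog/2009-01-02-123\n"  # e.g. a line read from a file, newline included

print(with_dollar.search(line) is not None)         # True: $ also matches before a final newline
print(with_end_of_string.search(line) is not None)  # False: \Z only matches at the very end

text = "/blog/2009-01-02-123\n/news/2010-03-04-5"   # newline-delimited list
print([m.group(3) for m in per_line.finditer(text)])  # ['123', '5']
```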

3 Comments

PHP too, from what I can tell. I see no reason why $ can't be used in parentheses () in any implementation, actually. It's the brackets [] that make it literal.
$ works this way in javascript, whereas \z doesn't (Chrome 48, Firefox 43, IE9).
This is the most straight-forward option. Match slash or end-of-line. It even matches the title of this question!
