
I've tried to match the URL below for a couple of hours and can't seem to figure it out, and I'm quite sure it's not that difficult:

The URL can be this:

/course/lesson-one/

or it can also be:

/course/lesson-one/chapter-one/

What I have is the following, which matches the second URL but not the first:

/course/([a-zA-Z]+[-a-zA-Z]*)/([a-zA-Z]+[-a-zA-Z]*)/

What I want is for the second part to be optional, but I can't figure it out. The closest I got was the following:

/course/([a-zA-Z]+[-a-zA-Z]*)/*([a-zA-Z]+[-a-zA-Z]*)/

But for some reason the above leaves out the last letter of the word. For example, if the URL is

/course/computers/

I end up with the string 'computer'.

3 Answers


You use ? if you need optional parts.

/course/([a-zA-Z][-a-zA-Z]*)/([a-zA-Z][-a-zA-Z]*/)?
#                                                 ^

(Note that [a-zA-Z]+[-a-zA-Z]* is equivalent to [a-zA-Z][-a-zA-Z]*.)
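One way to convince yourself of that equivalence is to run both patterns through Python's `re.fullmatch` on a handful of strings. This is only a spot check, not a proof, and the sample strings are arbitrary choices for illustration:

```python
import re

# Both patterns require one leading letter, then any mix of letters and hyphens.
# The extra `+` adds nothing: `[-a-zA-Z]*` can absorb any letters `+` would take.
with_plus = re.compile(r"[a-zA-Z]+[-a-zA-Z]*")
without_plus = re.compile(r"[a-zA-Z][-a-zA-Z]*")

# Hypothetical sample inputs, chosen to cover valid and invalid slugs.
samples = ["lesson-one", "computers", "a", "-bad", "x-", "chapter-one", "", "9lives"]

results = {
    text: (
        with_plus.fullmatch(text) is not None,
        without_plus.fullmatch(text) is not None,
    )
    for text in samples
}

for text, (a, b) in results.items():
    print(repr(text), a, b)
```

Every sample gives the same True/False pair for both patterns.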

Use an additional non-capturing group (?:…) to keep the / out of the captured text while still making the name and its trailing / optional together:

/course/([a-zA-Z][-a-zA-Z]*)/(?:([a-zA-Z][-a-zA-Z]*)/)?
#                            ~~~                     ~^
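Assuming Python's `re` module (as the other answers use), a quick side-by-side of the two patterns above shows the difference: the first captures `chapter-one/` with its slash, the second captures just `chapter-one`, and both return `None` for the missing part:

```python
import re

# Optional group that includes the trailing slash in the capture.
with_slash = re.compile(r"/course/([a-zA-Z][-a-zA-Z]*)/([a-zA-Z][-a-zA-Z]*/)?")
# Non-capturing wrapper: the slash is still matched, but not captured.
without_slash = re.compile(r"/course/([a-zA-Z][-a-zA-Z]*)/(?:([a-zA-Z][-a-zA-Z]*)/)?")

urls = ["/course/computers/", "/course/lesson-one/", "/course/lesson-one/chapter-one/"]

results = {
    url: (with_slash.match(url).groups(), without_slash.match(url).groups())
    for url in urls
}

for url, (a, b) in results.items():
    print(url, a, b)
```

Note that `re.match` only anchors at the start of the string, so trailing text after the pattern is silently ignored; `re.fullmatch` is an option if the whole path must match.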

Your 2nd regex swallows the last character, because:

  /course/([a-zA-Z]+[-a-zA-Z]*)/*([a-zA-Z]+[-a-zA-Z]*)/
          ^^^^^^^^^^^^^^^^^^^^^  ~~~~~~~~~~~~~~~~~~~~~
        this matches 'computer'  and this matches the 's'.

The second group in this regex must match at least one letter because of the +, and /* is allowed to match zero slashes, so the engine backtracks until the 's' is handed to the second group and the trailing / can match.
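Reproducing this in Python's `re` (assumed here, since the question doesn't name a language) shows both behaviours: the question's first pattern fails outright on a one-part URL, while the /* version "succeeds" by splitting the word:

```python
import re

# The question's first attempt: exactly one slash between two required groups.
strict = re.compile(r"/course/([a-zA-Z]+[-a-zA-Z]*)/([a-zA-Z]+[-a-zA-Z]*)/")
# The question's second attempt: `/*` permits zero slashes between the groups.
loose = re.compile(r"/course/([a-zA-Z]+[-a-zA-Z]*)/*([a-zA-Z]+[-a-zA-Z]*)/")

url = "/course/computers/"

strict_match = strict.match(url)  # no way to satisfy two required groups
loose_match = loose.match(url)    # backtracks: 'computer' + 's' + '/'

print(strict_match)
print(loose_match.groups())
```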


2 Comments

OK, thank you, it's the question mark that I was missing. I just glanced at the docs and it's a one-liner, which explains why I overlooked it!
The second regex you included above is exactly what I needed. Also, thank you for explaining it so well, +100. Thanks to everyone who contributed below.

Use a "?" after something to make it optional.

>>> import re
>>> r = r"/course/([a-zA-Z]+[-a-zA-Z]*)(/[a-zA-Z]+[-a-zA-Z]*)?"
>>> s = "/course/lesson-one/chapter-one/"
>>> re.match(r, s).groups()
('lesson-one', '/chapter-one')
>>> s = "/course/computers/"
>>> re.match(r, s).groups()
('computers', None)


You can use the following regex:

'/course/([a-zA-Z]+[-a-zA-Z]*)(/([a-zA-Z]+[-a-zA-Z]*)/)?'

This makes the second part optional while still capturing each part of the URL.

Note that the optional part of the pattern contains two groups: one that captures /chapter-one/ and one that captures chapter-one.

>>> import re
>>> re.match('/course/([a-zA-Z]+[-a-zA-Z]*)(/([a-zA-Z]+[-a-zA-Z]*)/)?', '/course/lesson-one/chapter-one/').groups()
('lesson-one', '/chapter-one/', 'chapter-one')

Similarly:

>>> re.match('/course/([a-zA-Z]+[-a-zA-Z]*)(/([a-zA-Z]+[-a-zA-Z]*)/)?', '/course/lesson-one/').groups()
('lesson-one', None, None)
