
I want to parse out part of a URL using a regex. This might be an old question, but I am new to regex and have searched a lot for my requirement without finding an answer. I know ParseURL can be used here, but my URLs are not properly structured for that. Suppose my URL is as follows:

url = https://www.sitename.com/&q=To+Be+Parsed+out&oq=Dont+Need+to+be+parsed

Here I want to find where &q= occurs and extract everything up to the next &. I also want to replace the + signs (or any other special characters) in the middle with spaces. The output should be:

To Be Parsed out

Also, if there is no match, the original URL should be returned.

I have tried the following:

re.search('q=?([^&]+)&',url).group(0)

This returns:

&q=To+Be+Parsed+out&oq=Dont+Need+to+be+parsed

Can anybody help me parse this out? Thanks.

1 Answer


You can use re.search() to capture the desired substring and then replace every + with a space using str.replace():

re.search(r'/&q=([^&]*)', url).group(1).replace('+', ' ')
  • re.search(r'/&q=([^&]*)', url).group(1) gets the desired portion, and replace('+', ' ') does the replacements

Example:

In [56]: url
Out[56]: 'https://www.sitename.com/&q=To+Be+Parsed+out&oq=Dont+Need+to+be+parsed'

In [57]: re.search(r'/&q=([^&]*)', url).group(1).replace('+', ' ')
Out[57]: 'To Be Parsed out'
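The attempt in the question calls .group(0), which returns the entire matched text; the answer calls .group(1), which returns only what the parenthesised group captured. A quick check on the same URL with the answer's pattern (the variable names are just for illustration):

```python
import re

url = 'https://www.sitename.com/&q=To+Be+Parsed+out&oq=Dont+Need+to+be+parsed'
m = re.search(r'/&q=([^&]*)', url)

# group(0) is the whole match, including the literal '/&q=' prefix
whole = m.group(0)
# group(1) is only the text captured by ([^&]*)
captured = m.group(1)

print(whole)     # /&q=To+Be+Parsed+out
print(captured)  # To+Be+Parsed+out
```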

If there is no match, re.search() returns None, so calling .group() on the result raises AttributeError. You can catch that exception, e.g.:

import re

try:
    out = re.search(r'/&q=([^&]*)', url).group(1).replace('+', ' ')
except AttributeError:
    # No match: fall back to the original URL
    out = url
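The question mentions the standard-library URL parser. Even without a '?' in the URL, urllib.parse.parse_qs can still pull the field out, because it only splits the string on '&', skips fields that contain no '=' (such as the leading 'https://www.sitename.com/' part), and decodes '+' and %XX escapes. This is a minimal sketch, assuming the wanted field is always written as &q=... as in the question; q_from_query is a hypothetical helper name:

```python
from urllib.parse import parse_qs

def q_from_query(url):
    """Hypothetical helper: decoded value of the first 'q' field, else url.

    Assumes 'q=' directly follows an '&', as in the question's URLs.
    """
    # parse_qs splits on '&', drops fields without '=' and decodes
    # '+' to space and %XX escapes in each value.
    values = parse_qs(url).get('q')
    return values[0] if values else url

url = 'https://www.sitename.com/&q=To+Be+Parsed+out&oq=Dont+Need+to+be+parsed'
print(q_from_query(url))                         # To Be Parsed out
print(q_from_query('https://www.sitename.com/'))  # https://www.sitename.com/
```

Because keys are matched exactly, oq is never confused with q. Note that if q were the first field after a '?' (e.g. ...?q=x), parse_qs would see the key as '...?q', so this only fits the &q= layout shown.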

1 Comment

This works fine. In some scenarios, when the pattern is not found, it throws an error: AttributeError: 'NoneType' object has no attribute 'group'. Can we write a condition to return the URL when the pattern is not found?
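One way to write that condition is to keep the match object and test it for None before calling .group(). The sketch below does that and, as a swapped-in variation on the answer's replace('+', ' '), uses urllib.parse.unquote_plus so that %XX escapes are decoded as well; extract_q is a hypothetical helper name:

```python
import re
from urllib.parse import unquote_plus

def extract_q(url):
    # re.search returns None when the pattern is absent, so check it
    # instead of letting .group() raise AttributeError.
    m = re.search(r'/&q=([^&]*)', url)
    if m is None:
        return url  # no match: return the original URL unchanged
    # unquote_plus turns '+' into spaces and also decodes %XX escapes
    return unquote_plus(m.group(1))

url = 'https://www.sitename.com/&q=To+Be+Parsed+out&oq=Dont+Need+to+be+parsed'
print(extract_q(url))                         # To Be Parsed out
print(extract_q('https://www.sitename.com/'))  # https://www.sitename.com/
```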
