1

What I am trying to do: Parse a query for a leading or trailing ? which will result in a search on the rest of the string.

"foobar?" or "?foobar" results in a search. "foobar" results in some other behavior.

This code works as expected in the interpreter:

 >>> import re
 >>> print re.match(".+\?\s*$","foobar?")
 <_sre.SRE_Match object at 0xb77c4d40>
 >>> print re.match(".+\?\s*$","foobar")
 None

This code from a Django app does not:

doSearch = { "text":"Search for: ", "url":"http://www.google.com/#&q=QUERY", "words":["^\?\s*",".+\?\s*$"] }
...
subQ = myCore.lookForPrefix(someQuery, doSearch["words"])
...
def lookForPrefix(query,listOfPrefixes):
    for l in listOfPrefixes:
        if re.match(l, query):
            return re.sub(l,'', query)
    return False

The Django code never matches the trailing "?"; all the other regexes work fine.

Any ideas about why not?

  • You should be careful about escaping your backslashes correctly (or using raw strings - either r".+\?\s*$" or ".+\\?\\s*$"), but that's just a side-note. Commented Feb 5, 2010 at 9:17
  • What is an example of query for a failing match? Try printing repr for it -- perhaps it has a trailing \n or something. Commented Feb 5, 2010 at 9:19

2 Answers

3

The problem is in your second regex. It matches the whole query, so re.sub() replaces all of it with an empty string. I.e. lookForPrefix('foobar?', listOfPrefixes) returns ''. You are likely checking the return value in an if, where the empty string evaluates as false, just like the False returned when nothing matches.
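A minimal sketch of that failure, reusing the question's lookForPrefix with the patterns written as raw strings (the names `words` and `result` are only for illustration):

```python
import re

def lookForPrefix(query, listOfPrefixes):
    # Same logic as in the question: strip the first pattern that matches.
    for l in listOfPrefixes:
        if re.match(l, query):
            return re.sub(l, '', query)
    return False

words = [r"^\?\s*", r".+\?\s*$"]

result = lookForPrefix("foobar?", words)
print(repr(result))   # '' : the second pattern consumed the whole string
print(bool(result))   # False : so a plain `if result:` takes the no-match branch
```

So the match itself succeeds; the caller just cannot tell '' apart from False with a truthiness test.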

To solve this, you just need to change the second regex to \?\s*$ and use re.search() instead of re.match(), as the latter requires that your regex matches from the beginning of the string.

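import re

# Raw strings keep the backslashes intact for the regex engine.
doSearch = { "text":"Search for: ", "url":"http://www.google.com/#&q=QUERY", "words":[r"^\?\s*", r"\?\s*$"] }

def lookForPrefix(query, listOfPrefixes):
    for l in listOfPrefixes:
        # re.search() scans the whole string; the ^ and $ anchors pin the position.
        if re.search(l, query):
            return re.sub(l, '', query)
    return False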

The result:

>>> lookForPrefix('?foobar', doSearch["words"])
'foobar'
>>> lookForPrefix('foobar?', doSearch["words"])
'foobar'
>>> lookForPrefix('foobar', doSearch["words"])
False

EDIT: In fact, you might as well combine the two regexes into one: ^\?\s*|\?\s*$. That will work equally well.
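As a rough sketch of the combined version: the helper name `strip_question_mark` and the choice to return None when nothing matched are my assumptions, not part of the original code. Returning None keeps "no question mark" distinguishable from a query that becomes empty after stripping.

```python
import re

# The pattern is the one from the EDIT above; the names are hypothetical.
QUESTION_MARK = re.compile(r"^\?\s*|\?\s*$")

def strip_question_mark(query):
    # subn() returns the new string plus the number of substitutions made.
    stripped, count = QUESTION_MARK.subn("", query)
    # None means "no leading or trailing ?"; '' is still a valid result.
    return stripped if count else None

print(strip_question_mark("?foobar"))   # foobar
print(strip_question_mark("foobar?"))   # foobar
print(strip_question_mark("foobar"))    # None
```

A caller would then test `is not None` instead of relying on truthiness.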


2 Comments

That works. I still don't understand why. Even if I use "^.+\?\s*$" and re.match(), it does not work. Shouldn't that expression match any string with one or more characters followed by a ? and any number of spaces, then the end of the string? Thanks!
The problem is not in the match, as it does match, but in the replacement, since in that case it replaces the whole string instead of just the trailing question mark, returning an empty string as a result.
0

You probably want to use raw strings for regexes, such as: r'^\s\?'. Raw strings prevent escape sequences from being turned into other characters (r'\0' is the same as '\\0', but different from '\0', which is a single null character).
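A quick way to check this in the interpreter. The \b case is my own illustration, not something from the question; it is one of the escapes where a regular string silently changes the pattern:

```python
import re

# A raw string keeps the backslash, so it equals the doubly escaped form.
print(r"\0" == "\\0")   # True
print(len(r"\0"))       # 2 : a backslash and the character "0"
print(len("\0"))        # 1 : a single null character

# In a regular string "\b" is a backspace character, not a word boundary.
plain = re.search("\bfoo\b", "a foo b")
raw = re.search(r"\bfoo\b", "a foo b")
print(plain)            # None
print(raw.group())      # foo
```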

Also r'^\?\s*|\?\s*$' will NOT work as intended by Max S. because the | is alternating between \s* and \?. The regex proposed in the EDIT interprets to: question mark at the beginning of the line followed by any number of spaces OR a question mark, followed by any number of spaces and the end of the line.

I believe Max S. intended: r'(^\?\s*)|(\?\s*$)', which interprets to: a question mark followed by any number of spaces at the beginning or end of the line.

1 Comment

I'm afraid you are wrong about the pipe. A pipe outside of brackets will separate the whole regex, meaning "try everything to the left of the pipe, and if that fails, try everything to the right". For example, re.findall('^a.|d.$', 'abacde') returns ['ab', 'de'].
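This is easy to confirm. The sketch below compares the ungrouped pattern with an explicitly grouped one on a few sample queries (the sample list and variable names are mine):

```python
import re

ungrouped = r"^\?\s*|\?\s*$"
grouped = r"(?:^\?\s*)|(?:\?\s*$)"

samples = ["?foobar", "foobar?", "? foobar ?", "foobar"]
for query in samples:
    # The top-level | already splits the whole pattern, so both forms agree.
    assert re.sub(ungrouped, "", query) == re.sub(grouped, "", query)

found = re.findall("^a.|d.$", "abacde")
print(found)   # ['ab', 'de']
```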
