
How should I declare a regular expression?

mergedData = re.sub(r'\$(.*?)\$', readFile, allData)

I'm kind of wondering why this worked. I thought that I needed to use the r'' to pass a regular expression.

mergedData = re.sub("\$(.*?)\$", readFile, allData)

What does "\$" result in in this case? Why? I would have thought "$".

  • Why did you revert the edit? r'' and "\$" are code. Commented Feb 27, 2013 at 21:34

3 Answers


I thought that I needed to use the r'' to pass a regular expression.

An r before a string literal makes it a raw string: the usual escape sequences such as \n or \r are no longer turned into a newline or a carriage return, but stay as a \ followed by n or r. To get a single \, a raw string literal needs just \, while a normal string literal needs it doubled as \\. This is why raw strings are the usual choice for regular expressions [1]: they make the code easier to read. With a normal string literal you have to escape twice, once for the string literal itself and once more for the regex. For example, a regex that matches one literal backslash is \\, which is r'\\' as a raw string but '\\\\' as a normal string.
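A minimal sketch of the double-escaping point, assuming Python 3 (the sample path is made up for illustration): both patterns below hand the regex engine the same two characters, \\, but the normal literal has to double every backslash one more time.

```python
import re

# Sample text containing two backslashes (made up for illustration).
text = r"C:\temp\new"

# Raw string: Python leaves \\ alone, so the regex engine receives \\ (one escaped backslash).
raw_hits = re.findall(r"\\", text)

# Normal string: each backslash is doubled once for Python and once more for the regex.
plain_hits = re.findall("\\\\", text)

print(raw_hits == plain_hits)  # True
print(len(raw_hits))           # 2

# In a raw string \n stays two characters; in a normal string it is one newline.
print(len(r"\n"), len("\n"))   # 2 1
```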

What does "\$" result in in this case? Why? I would have thought "$".

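In a normal Python string literal, if \ is not followed by a recognized escape character, the \ is kept as-is. Therefore "\$" results in two characters: \ followed by $. The regex engine then receives \$, which it reads as an escaped (literal) dollar sign, so the pattern works exactly like the raw-string version. Note that newer Python versions warn about such unrecognized escapes (a DeprecationWarning since Python 3.6, a SyntaxWarning since 3.12), so the raw string is the safer spelling.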
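The two patterns from the question can be compared directly. This sketch (Python 3) writes the non-raw version with an explicit double backslash, "\\$(.*?)\\$", which builds the same string as "\$(.*?)\$" without relying on the unrecognized-escape fallback. `replace_placeholder` and its dictionary are hypothetical stand-ins for the question's `readFile`.

```python
import re

# Explicit double backslash in a normal literal vs. a raw literal:
# both produce \$(.*?)\$, so the regex sees an escaped, literal dollar sign.
plain_pattern = "\\$(.*?)\\$"
raw_pattern = r"\$(.*?)\$"
print(plain_pattern == raw_pattern)  # True

# Hypothetical stand-in for the question's readFile callback: re.sub calls it
# with each match object and substitutes whatever string it returns.
values = {"name": "Alice", "city": "Paris"}

def replace_placeholder(match):
    return values[match.group(1)]

all_data = "Hi $name$, welcome to $city$."
print(re.sub(raw_pattern, replace_placeholder, all_data))    # Hi Alice, welcome to Paris.
print(re.sub(plain_pattern, replace_placeholder, all_data))  # Hi Alice, welcome to Paris.
```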

This behavior differs from the way JavaScript (and, typically, C/C++ compilers) handle a similar situation: there the \ is treated as an escape for the next character, and only that character remains. So "\$" in those languages is interpreted as $. (In C and C++ an unknown escape such as \$ is not standard, and compilers usually warn about it.)
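To see why keeping the backslash matters here, this sketch (Python 3, made-up sample text) compares the pattern Python actually builds with the one you would get if the backslash were dropped, as JavaScript does with "\$". Without the backslash, $ is the end-of-string anchor rather than a literal dollar sign.

```python
import re

text = "Hello $name$"

# Backslash kept (Python's behavior): \$ matches a literal dollar sign.
print(re.findall(r"\$(.*?)\$", text))  # ['name']

# Backslash dropped: the pattern becomes $(.*?)$, where both $ are
# end-of-string anchors, so the only match is an empty one at the very end.
print(re.findall("$(.*?)$", text))     # ['']
```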

Footnote

1: There is a small defect in the design of Python's raw strings, though; see "Why can't Python's raw string literals end with a single backslash?"
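A small illustration of that defect, assuming Python 3 (the path is made up). The offending line is compiled from a string so the SyntaxError can be caught instead of stopping the whole script.

```python
# Source text for:  path = r"C:\dir\"
# Inside a raw string, \" still keeps the quote from closing the literal,
# so this line is an unterminated string.
source = 'path = r"C:\\dir\\"'

try:
    compile(source, "<example>", "exec")
    raised = False
except SyntaxError:
    raised = True
print(raised)  # True

# Common workarounds: append a normal "\\" or skip the raw prefix entirely.
path1 = r"C:\dir" + "\\"
path2 = "C:\\dir\\"
print(path1 == path2)  # True
```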


1 Comment

Thanks. I guess I was confused by C. (I'm always confused by C.)

The r'...' prefix keeps sequences like '\1' intact. In a regular expression, \1 is a reference to the first group, but in a normal string literal '\1' is an octal escape, the same as '\x01'.

Generally speaking, in r'...' the backslash does not act as an escape character.

Try

 re.split('(.).\1', '1x2x3')  # ['1x2x3']

vs.

 re.split(r'(.).\1', '1x2x3') # ['1', 'x', '3']

Since \$ is not an escape sequence in Python, '\$' is literally the same string as '\\$'.
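The '\1' example above has a close cousin that bites more often: \b. A sketch, assuming Python 3 and made-up text. Unlike \$, \b is a valid Python escape (backspace), so in a normal literal it never reaches the regex engine as a word boundary.

```python
import re

text = "the cat sat on the cattle"

# Normal literal: "\b" is a backspace character (\x08), which never occurs in text.
print(re.findall("\bcat\b", text))   # []

# Raw literal: the regex engine receives \b, the word-boundary assertion,
# so "cat" matches on its own but not inside "cattle".
print(re.findall(r"\bcat\b", text))  # ['cat']
```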


Just ask the snake:

>>> r'\$(.*?)\$'=='\$(.*?)\$'
True
>>> r'\vert'=='\vert'
False
>>> r'\123'=='\123'
False
>>> r'\#23'=='\#23'
True

Basically, if \ plus the next character would form an escape sequence (as in C), then using the r prefix is the same as doubling the backslash:

>>> r'\123'=='\\123'
True
>>> r'\tab'=='\\tab'
True
