I have to take a string from the user and format it so that a certain command-line tool will accept it. Specifically, every run of backslashes that comes immediately before a double quote (") must be doubled in length. I can find the pattern using this regex:

import re

pattern = '\\\\+"'
string = "\\\\\\\" asdf \\\" \\ \\ \\\\\""

print(string, "\n")
matches = re.findall(pattern, string)

But now that I have those matches, how do I replace each one with a doubled copy of itself? The 3 backslashes in front of a quote have to become 6, the 1 backslash becomes 2, and the 2 become 4. The backslashes that are not in front of a quote stay the same.

Any advice on this would be greatly appreciated.

Thanks.

  • Can you be more explicit about what input and output you want? What's a verbatim example of input and output; don't worry about escaping anything, just show us exactly the input and output you want. I just want to make sure you're understanding how backslashes work before I post my answer. :) Commented Sep 1, 2015 at 0:43
  • The string variable is the string in which I am trying to replace the slashes. After Python consumes the escape characters, that string is: \\\" asdf \" \ \ \\" Commented Sep 1, 2015 at 0:54
  • And thus the output would be: \\\\\\" asdf \\" \ \ \\\\" Commented Sep 1, 2015 at 1:00

1 Answer

You should use single-quotes, raw strings, and re.sub:

import re

string = r'\\\" asdf \" \ \ \\"'
new_string = re.sub(r'(\\+)"', r'\1\1"', string)
print(new_string)

Output:

\\\\\\" asdf \\" \ \ \\\\"
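
As a quick sanity check (a sketch, not part of the original answer), the raw-string literal above should be the same string as the escaped literal in the question, and the substitution should give the output requested in the comments. The names `escaped`, `raw`, `expected`, and `result` are only illustrative.

```python
import re

# The question's literal, written with ordinary (non-raw) escaping.
escaped = "\\\\\\\" asdf \\\" \\ \\ \\\\\""

# The same text written as a raw string, as in the answer.
raw = r'\\\" asdf \" \ \ \\"'

# Both literals describe exactly the same characters.
assert escaped == raw

# The output requested in the question's comments, as a raw string.
expected = r'\\\\\\" asdf \\" \ \ \\\\"'

# Double every run of backslashes that is followed by a double quote.
result = re.sub(r'(\\+)"', r'\1\1"', raw)
assert result == expected
print(result)
```

Writing the test data as raw strings keeps the backslash counts readable: what you see in the literal is what the regex engine sees.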

The Pattern

To explain the pattern, first let's remove the parentheses; they don't affect what's matched, and we'll put them back later. The pattern r'\\+"' means "one or more backslashes followed by a double-quote". Even though it's a raw string, we still have to escape the backslash because backslashes have special meaning in regular expressions; that's why it's r'\\+"' instead of r'\+"', which would match a literal plus sign followed by a double-quote.
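
To make the difference between the two spellings concrete, here is a small sketch. The sample `text` and the variable names are made up for illustration and are not from the question.

```python
import re

# Hypothetical sample: two backslashes + quote, a plus sign + quote,
# and one backslash + quote.
text = r'a\\" b+" c\"'

# r'\\+"' : one or more literal backslashes, then a double quote.
backslash_runs = re.findall(r'\\+"', text)

# r'\+"' : in a regex, \+ is an escaped plus sign, so this matches
# a literal '+' followed by a double quote, and no backslashes at all.
plus_matches = re.findall(r'\+"', text)

assert backslash_runs == [r'\\"', r'\"']
assert plus_matches == ['+"']
```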

The Parentheses

The parentheses around the \\+ in the actual pattern just mean "capture the part of the match inside these parentheses". This will put the substring of all backslashes in this match into a capture group. We're going to use this capture group in the replacement string.
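
A minimal sketch of what the capture group holds for each match, using re.finditer on the string from the answer. The variable `lengths` is just for illustration.

```python
import re

string = r'\\\" asdf \" \ \ \\"'

lengths = []
for match in re.finditer(r'(\\+)"', string):
    whole = match.group(0)    # the entire match: backslashes plus the quote
    slashes = match.group(1)  # capture group 1: only the backslashes
    lengths.append((len(whole), len(slashes)))

# Three matches: runs of 3, 1, and 2 backslashes, each followed by a quote.
assert lengths == [(4, 3), (2, 1), (3, 2)]
```

The backslashes that are followed by a space never appear here, because the pattern requires a double-quote right after the run.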

The Replacement String

The replacement string, r'\1\1"', just means "two copies of the first capture group followed by a double-quote" (in this case there's only one capture group, but there can be more). The replacement string has a double-quote because the match had one: the entire match is replaced by the replacement string, so if the replacement string didn't have a double-quote, the double-quotes would be removed.
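
The sketch below shows what happens when the double-quote is left out of the replacement. It also shows one possible alternative that is not part of the original answer: a lookahead, (?="), which checks for the quote without making it part of the match, so the replacement doesn't have to put it back. The variable names are illustrative.

```python
import re

string = r'\\\" asdf \" \ \ \\"'

# The answer's version: the quote is matched, so it must be restored.
answer = re.sub(r'(\\+)"', r'\1\1"', string)

# Without the quote in the replacement, every matched quote is lost.
without_quote = re.sub(r'(\\+)"', r'\1\1', string)
assert '"' not in without_quote

# Alternative (an assumption, not the answer's method): the lookahead
# asserts that a quote follows but does not consume it.
with_lookahead = re.sub(r'(\\+)(?=")', r'\1\1', string)
assert with_lookahead == answer
```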


2 Comments

Fantastic solution. I was stuck on the idea of using loops and I knew there had to be a better way to do it. Can you explain how this works a little? I thought that this would only match and replace the slash immediately before the quote.
@Crbreingan, I added an explanation. I might have overdone it a bit :).
