I am trying to extract the values of colorName from the following strings, which are located in a <script> element of an HTML page.
\\"colorName\\":\\"GLOSS REDSKY SHDWSIL WHT IMPASTO\\"
\\"colorName\\":\\"GLOSS PREMIUM FJORD METALLIC / WHITE METALLIC SILVER\\"
I am using Python Scrapy, and the HTML is returned in response.text. I want to extract GLOSS REDSKY SHDWSIL WHT IMPASTO and GLOSS PREMIUM FJORD METALLIC / WHITE METALLIC SILVER from the strings above using a regex.
re.findall('\\\\"colorName\\\\":\\\\"(.*?)\\\\"', response.text)
This line of code works fine, but when I tried to put the regex in a JSON string like this:
{
"selector": "\\\\"colorName\\\\":\\\\"(.*?)\\\\""
}
I got the following error:
Error: Parse error on line 4:
... "selector": "\\\\"colorName\\\\":\\\\"
-----------------------^
Expecting 'EOF', '}', ':', ',', ']', got 'undefined'
PyCharm suggested the following edit to the JSON string, which didn't throw any error:
{
"selector": "\\\\\\\\\"colorName\\\\\\\\\":\\\\\\\\\"(.*?)\\\\\\\\\""
}
I cannot figure out why I need to add so many extra backslashes into the JSON string to make it right.
{"selector": "\\\\\\\\\"colorName\\\\\\\\\":\\\\\\\\\"(.*?)\\\\\\\\\""} is a valid JSON object containing a single key/value pair, where the string value is "\\\\\\\\\"colorName\\\\\\\\\":\\\\\\\\\"(.*?)\\\\\\\\\"". Unescaping that double-quoted string gives \\\\"colorName\\\\":\\\\"(.*?)\\\\" which, as a regex, matches the target string: regex101.com/r/8r9nwN/1. So what exactly is the question?
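The layering described in the comment above can be checked with a short Python sketch. It assumes, as the comment does, that the page text contains two literal backslashes before each quote. The names sample, pattern, config_text, and broken are made up for this illustration and are not part of the original code. Instead of counting backslashes by hand, the sketch lets json.dumps add the JSON escaping and json.loads remove it again.

```python
import json
import re

# Hypothetical sample of the script text (an assumption about the page
# source): each quote is preceded by two literal backslashes.
sample = r'\\"colorName\\":\\"GLOSS REDSKY SHDWSIL WHT IMPASTO\\"'

# The regex, written as a raw string so Python adds no escaping of its own.
# In regex syntax two backslashes match one literal backslash, so four
# backslashes plus a quote match two backslashes plus a quote.
pattern = r'\\\\"colorName\\\\":\\\\"(.*?)\\\\"'

# json.dumps adds the JSON layer: every backslash is doubled and every
# double quote is written as backslash-quote.
config_text = json.dumps({"selector": pattern})
print(config_text)

# json.loads removes that layer and returns the original regex.
selector = json.loads(config_text)["selector"]
print(selector == pattern)
print(re.findall(selector, sample))

# The hand-written JSON from the question: the four backslashes decode to
# two, and the bare quote that follows ends the JSON string early.
broken = r'{"selector": "\\\\"colorName\\\\":\\\\"(.*?)\\\\""}'
try:
    json.loads(broken)
    broken_parses = True
except json.JSONDecodeError as exc:
    broken_parses = False
    print("parse error:", exc)
```

Each layer adds its own escaping. The regex needs two backslashes to match one literal backslash, and JSON then doubles every backslash and writes each double quote as backslash-quote, so four backslashes plus a quote become nine backslashes plus a quote. The hand-written version fails because, once the four backslashes are read as two escaped backslashes, the next quote is unescaped and closes the string.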