Dynamically double-quote "keys" in text to form a valid JSON string in Python

Question

I'm working with text contained in JS variables on a webpage. I extract the strings using regex, then turn them into JSON objects in Python using json.loads().

The issue I'm having is the unquoted "keys". Right now, I'm doing a series of replacements (code below) to double-quote each key in each string, but what I want is to dynamically identify any unquoted keys before passing the string into json.loads().

Example 1 with no space after : character

json_data1 = '[{storeName:"testName",address:"12345 Road",address2:"Suite 500",city:"testCity",storeImage:"http://www.testLink.com",state:"testState",phone:"999-999-9999",lat:99.9999,lng:-99.9999}]'

Example 2 with space after : character

json_data2 = '[{storeName: "testName",address: "12345 Road",address2: "Suite 500",city: "testCity",storeImage: "http://www.testLink.com",state: "testState",phone: "999-999-9999",lat: 99.9999,lng: -99.9999}]'

Example 3 with space after ,: characters

json_data3 = '[{storeName: "testName", address: "12345 Road", address2: "Suite 500", city: "testCity", storeImage: "http://www.testLink.com", state: "testState", phone: "999-999-9999", lat: 99.9999, lng: -99.9999}]'

Example 4 with space after : character and newlines

json_data4 = '''[
{
    storeName: "testName", 
    address: "12345 Road", 
    address2: "Suite 500", 
    city: "testCity", 
    storeImage: "http://www.testLink.com", 
    state: "testState", 
    phone: "999-999-9999", 
    lat: 99.9999, lng: -99.9999
}]'''

I need to create a pattern that matches only the keys, and not string values that happen to contain the same characters, such as the link in storeImage. In other words, I want to dynamically find the keys and double-quote them so that json.loads() accepts the string as valid JSON.

I'm currently replacing each key in the text this way:

content = re.sub('storeName:', '"storeName":', content)
content = re.sub('address:', '"address":', content)
content = re.sub('address2:', '"address2":', content)
content = re.sub('city:', '"city":', content)
content = re.sub('storeImage:', '"storeImage":', content)
content = re.sub('state:', '"state":', content)
content = re.sub('phone:', '"phone":', content)
content = re.sub('lat:', '"lat":', content)
content = re.sub('lng:', '"lng":', content)

The result is a string representing valid JSON:

json_data = [{"storeName": "testName", "address": "12345 Road", "address2": "Suite 500", "city": "testCity", "storeImage": "http://www.testLink.com", "state": "testState", "phone": "999-999-9999", "lat": 99.9999, "lng": -99.9999}]

I'm sure there is a better way of doing this but I haven't been able to find or come up with a regex pattern to handle these. Any help is greatly appreciated!

json.loads does not create JSON objects. It takes a valid JSON value (which may or may not include JSON objects), and returns a Python value. Further, why does your data contain such broken pseudo-JSON in the first place?
– chepner, Commented Jan 30, 2018 at 15:38

Tim Pietzcker · Accepted Answer · 2018-01-30 15:40:57Z

3

That repetition is of course unnecessary. You could put everything into a single regex:

content = re.sub(r"\b(storeName|address2?|city|storeImage|state|phone|lat|lng):", r'"\1":', content)

\1 contains the match within the first (in this case, only) set of parentheses, so "\1": surrounds it with quotes and adds back the colon.

Note the use of a word boundary anchor to make sure we match only those exact words.
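
As a minimal sketch of how that single substitution fits together with json.loads(), here it is applied to Example 3 from the question. The names KNOWN_KEYS, quoted and stores are only illustrative, and the approach still assumes the full list of keys is known in advance.

```python
import json
import re

# Example 3 from the question: unquoted keys, a space after each colon and comma.
json_data3 = '[{storeName: "testName", address: "12345 Road", address2: "Suite 500", city: "testCity", storeImage: "http://www.testLink.com", state: "testState", phone: "999-999-9999", lat: 99.9999, lng: -99.9999}]'

# One alternation covering every known key; \b stops matches in the middle of a longer word.
KNOWN_KEYS = r"\b(storeName|address2?|city|storeImage|state|phone|lat|lng):"

# \1 is the captured key name, re-emitted with double quotes around it.
quoted = re.sub(KNOWN_KEYS, r'"\1":', json_data3)

stores = json.loads(quoted)
print(stores[0]["storeName"])  # testName
print(stores[0]["lat"])        # 99.9999
```

Because the alternation lists the keys explicitly, a new key in the source data would be left unquoted and json.loads() would raise an error.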

AdrianEddy · Accepted Answer · 2018-01-30 15:40:12Z

2

Something like this should do the job: ([{,]\s*)([^"':]+)(\s*:)

Replace with: \1"\2"\3

Example: https://regex101.com/r/oV0udR/1
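
For reference, a rough Python translation of that pattern and replacement, run against Examples 1 and 4 from the question. The helper name quote_keys and the constant KEY_PATTERN are hypothetical, chosen just for this sketch.

```python
import json
import re

# A key is whatever sits between an opening "{" or "," and the next ":",
# as long as it contains no quote characters and no colon.
KEY_PATTERN = re.compile(r'''([{,]\s*)([^"':]+)(\s*:)''')

def quote_keys(text):
    """Wrap each unquoted key in double quotes (hypothetical helper)."""
    return KEY_PATTERN.sub(r'\1"\2"\3', text)

# Example 1 from the question: no whitespace at all.
json_data1 = '[{storeName:"testName",address:"12345 Road",address2:"Suite 500",city:"testCity",storeImage:"http://www.testLink.com",state:"testState",phone:"999-999-9999",lat:99.9999,lng:-99.9999}]'

# Example 4 from the question: spaces and newlines.
json_data4 = '''[
{
    storeName: "testName",
    address: "12345 Road",
    address2: "Suite 500",
    city: "testCity",
    storeImage: "http://www.testLink.com",
    state: "testState",
    phone: "999-999-9999",
    lat: 99.9999, lng: -99.9999
}]'''

stores1 = json.loads(quote_keys(json_data1))
stores4 = json.loads(quote_keys(json_data4))
print(stores1 == stores4)  # True
```

This version needs no list of key names. It does rely on every key directly following a "{" or ",", so a string value that contains both a comma and a later colon could still be misread as a key.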

1 Comment

Derrick Brewer Over a year ago

Thank you for the link example. This is by far the most helpful resource I've come across with respect to Regex.

Srdjan M. · Accepted Answer · 2018-01-30 15:53:42Z

0

Regex: (\w+)\s?:\s?("?[^",]+"?,?)

import re

text = 'storeName: "testName", '
text = re.sub(r'(\w+)\s?:\s?("?[^",]+"?,?)', r'"\g<1>":\g<2>', text)
print(text)

Output: "storeName":"testName",

2 Comments

Derrick Brewer Over a year ago

Re-writing this, is it the same to use r'"\1":\2' versus '\"\g<1>\":\g<2>' for the replacement?

sgmbd Over a year ago

It fails when the data contains a date with a : in it, e.g. timestamp_time":\"2020-06-08 22:40:00.000000 UTC. This regex converts it to timestamp_time":\"2020-06-08 "22":40:00.000000 UTC
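
One possible way around the colon-inside-a-value problem, which none of the answers above propose, is to let the pattern consume whole double-quoted strings first and quote only bare identifiers found outside them. The sketch below assumes that values are double-quoted and that keys look like JS identifiers. The names TOKEN and quote_bare_keys are hypothetical.

```python
import json
import re

# Alternative 1: a complete double-quoted string literal, backslash escapes included.
# Alternative 2: a bare identifier that is followed by optional whitespace and a colon.
TOKEN = re.compile(
    r'"(?:[^"\\]|\\.)*"'
    r'|(?<![\w$])([A-Za-z_$][\w$]*)(?=\s*:)'
)

def quote_bare_keys(text):
    """Quote bare keys, leaving string literals untouched (hypothetical helper)."""
    def fix(match):
        key = match.group(1)
        # group(1) is None when the match was a string literal: return it unchanged.
        return f'"{key}"' if key is not None else match.group(0)
    return TOKEN.sub(fix, text)

# Mixed input: one key already quoted, two bare keys, and colons inside string values.
sample = '{"timestamp_time": "2020-06-08 22:40:00.000000 UTC", count: 3, url: "http://www.testLink.com"}'

record = json.loads(quote_bare_keys(sample))
print(record["timestamp_time"])  # 2020-06-08 22:40:00.000000 UTC
```

Since string literals are matched as a unit, colons and commas inside values are never inspected. Single-quoted strings, comments and trailing commas are not handled here, so this is not a full parser for JS object literals.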
