I'm working with text contained in JS variables on a webpage. I extract the strings using regex, then turn them into JSON objects in Python using json.loads().
The issue I'm having is the unquoted keys. Right now, I'm doing a series of replacements (code below) to double-quote each key in each string, but what I want is to dynamically identify any unquoted keys before passing the string into json.loads().
Example 1 with no space after : character
json_data1 = '[{storeName:"testName",address:"12345 Road",address2:"Suite 500",city:"testCity",storeImage:"http://www.testLink.com",state:"testState",phone:"999-999-9999",lat:99.9999,lng:-99.9999}]'
Example 2 with space after : character
json_data2 = '[{storeName: "testName",address: "12345 Road",address2: "Suite 500",city: "testCity",storeImage: "http://www.testLink.com",state: "testState",phone: "999-999-9999",lat: 99.9999,lng: -99.9999}]'
Example 3 with a space after both the , and : characters
json_data3 = '[{storeName: "testName", address: "12345 Road", address2: "Suite 500", city: "testCity", storeImage: "http://www.testLink.com", state: "testState", phone: "999-999-9999", lat: 99.9999, lng: -99.9999}]'
Example 4 with space after : character and newlines
json_data4 = '''[
{
storeName: "testName",
address: "12345 Road",
address2: "Suite 500",
city: "testCity",
storeImage: "http://www.testLink.com",
state: "testState",
phone: "999-999-9999",
lat: 99.9999, lng: -99.9999
}]'''
I need to create a pattern that identifies which tokens are keys, as opposed to string values that happen to contain the same characters, such as the colon in the storeImage link. In other words, I want to dynamically find keys and double-quote them so I can use json.loads() and return a valid JSON object.
I'm currently replacing each key in the text this way:
content = re.sub('storeName:', '"storeName":', content)
content = re.sub('address:', '"address":', content)
content = re.sub('address2:', '"address2":', content)
content = re.sub('city:', '"city":', content)
content = re.sub('storeImage:', '"storeImage":', content)
content = re.sub('state:', '"state":', content)
content = re.sub('phone:', '"phone":', content)
content = re.sub('lat:', '"lat":', content)
content = re.sub('lng:', '"lng":', content)
The result is a string representing valid JSON:
json_data = [{"storeName": "testName", "address": "12345 Road", "address2": "Suite 500", "city": "testCity", "storeImage": "http://www.testLink.com", "state": "testState", "phone": "999-999-9999", "lat": 99.9999, "lng": -99.9999}]
I'm sure there is a better way of doing this but I haven't been able to find or come up with a regex pattern to handle these. Any help is greatly appreciated!
json.loads does not create JSON objects. It takes a valid JSON value (which may or may not include JSON objects), and returns a Python value. Further, why does your data contain such broken pseudo-JSON in the first place?
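One possible approach, as a rough sketch rather than a complete JS-literal parser: use a single pattern that matches either a whole double-quoted string or a bare identifier followed by a colon, and only rewrite the second case. Because quoted strings are consumed as a unit, the "http:" inside the storeImage value is never mistaken for a key. The names TOKEN and quote_keys are made up for this illustration, and the sketch assumes values only use double quotes and that the text contains no JS comments, single-quoted strings, or trailing commas.

```python
import json
import re

# Match either a complete double-quoted string (kept as-is) or a bare
# identifier that is followed by optional whitespace and a colon (a key).
# Assumption: string values only use double quotes, as in the examples.
TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|(?<![\w$])([A-Za-z_$][\w$]*)(?=\s*:)')


def quote_keys(text):
    """Wrap unquoted object keys in double quotes (hypothetical helper)."""
    def repl(match):
        key = match.group(1)
        # group(1) is None when the match was a quoted string: leave it alone.
        return match.group(0) if key is None else f'"{key}"'
    return TOKEN.sub(repl, text)


sample = '[{storeName: "testName", storeImage:"http://www.testLink.com", lat: 99.9999, lng:-99.9999}]'
data = json.loads(quote_keys(sample))
print(data[0]["storeImage"])  # http://www.testLink.com
print(type(data).__name__)    # list
```

Keys that are already quoted are matched as ordinary strings and left untouched, so running quote_keys on valid JSON should be a no-op. Note that json.loads returns a Python list of dicts here, not a "JSON object".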