56

I have the following JSON string coming from an external input source:

{value: "82363549923gnyh49c9djl239pjm01223", id: 17893}

This is an incorrectly formatted JSON string (the keys "id" and "value" must be in quotes), but I need to parse it anyway. I have tried simplejson and json-py, and it seems neither can be configured to parse such strings.

I am running Python 2.5 on Google App Engine, so C-based solutions like python-cjson are not applicable.

The input format could be changed to XML or YAML instead of the JSON shown above, but I am using JSON throughout the project, and changing the format in one specific place would not be ideal.

For now I've switched to XML and am parsing the data successfully, but I would welcome any solution that lets me switch back to JSON.

3
  • I'm a little confused about how you can switch to XML, yet not be in control of the JSON data. It sounds like you have some external source of data, in either XML or JSON formats, but its JSON output is permanently broken as shown and you can't do anything about it so your only option is to select the XML version instead? Or am I missing something? Commented Dec 19, 2009 at 0:35
  • you can parse it as YAML without a change, because it is YAML too Commented Dec 19, 2009 at 0:43
  • Peter, you're right - I have an external source of data which I can control in only one way: by saying whether I want the input in JSON, XML or YAML. Nadia, thanks - that was my mistake (I was not very familiar with Stack Overflow's interface at the time). Commented Dec 19, 2009 at 9:20

5 Answers

64

Since YAML (>= 1.2) is a superset of JSON, you can do:

>>> import yaml
>>> s = '{value: "82363549923gnyh49c9djl239pjm01223", id: 17893}'
>>> yaml.safe_load(s)
{'id': 17893, 'value': '82363549923gnyh49c9djl239pjm01223'}

7 Comments

Well, python-yaml (PyYAML) is not yet fully 1.2 compliant, but it will handle most cases. To be prepared for problem cases, see en.wikipedia.org/wiki/YAML#cite_ref-6
mykhal, have you run it on Google App Engine? It seems PyYAML uses C modules and thus cannot be used on GAE.
PyYAML is much faster when it uses libyaml, but it is also written in pure Python, and you can choose between CLoader and Loader (pure Python). But don't worry, YAML support is already included in App Engine; you can try this in the interactive shell at shell.appspot.com
YAML is not a strict superset of JSON as YAML requires the mapping keys to be unique while JSON only suggests to use unique keys (MUST vs. SHOULD).
One more problem: YAML apparently requires a space after the colon. For the most part, however, this works like a charm.
26

You can use demjson.

>>> import demjson
>>> demjson.decode('{foo:3}')
{u'foo': 3}

6 Comments

That helped me parse JSON without quotes and with formatting that differs from YAML.
Very helpful package for parsing broken JSON, thanks.
It handled nested objects as well, which I found to be an issue with YAML. On Windows: py -m pip install demjson, then: import demjson; s = """get or define the multiline string inline"""; j = demjson.decode(s); jsonString = demjson.encode(j)
Probably the best python lib for parsing json without quotes, many thanks.
The original link is dead: deron.meranda.us/python/demjson/. I edited in the package's page on PyPI instead.
3

The dirtyjson library can handle some almost-correct JSON:

>>> import dirtyjson
>>> 
>>> s = '{value: "82363549923gnyh49c9djl239pjm01223", id: 17893}'
>>> d = dirtyjson.loads(s)
>>> d
AttributedDict([('value', '82363549923gnyh49c9djl239pjm01223'), ('id', 17893)])
>>>
>>> d = dict(d)
>>> d
{'value': '82363549923gnyh49c9djl239pjm01223', 'id': 17893}
>>> d["value"]
'82363549923gnyh49c9djl239pjm01223'
>>> d["id"]
17893

Comments

2

You could preprocess the string to fix it first; a regex could do it, provided that this is as complicated as the JSON will get.
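A minimal sketch of that regex approach, assuming the only defect is unquoted identifier-style keys and that no string value contains text resembling `, name:` (which this pattern would wrongly quote). The helper name `quote_bare_keys` is made up for illustration, and on Python 2.5 `simplejson.loads` would stand in for the standard `json` module, which only arrived in 2.6:

```python
import json
import re

# Matches a bare identifier key that directly follows '{' or ','.
# Assumption: string values never contain something like ', name:'.
_BARE_KEY = re.compile(r'([{,]\s*)([A-Za-z_][A-Za-z0-9_]*)(\s*:)')


def quote_bare_keys(text):
    """Wrap unquoted object keys in double quotes (hypothetical helper)."""
    return _BARE_KEY.sub(r'\1"\2"\3', text)


s = '{value: "82363549923gnyh49c9djl239pjm01223", id: 17893}'
fixed = quote_bare_keys(s)   # keys are now quoted, values untouched
data = json.loads(fixed)     # parse with a strict JSON parser
print(data["id"])
```

This only patches the string so a strict parser accepts it; anything more irregular (single-quoted strings, trailing commas, colons inside values) is probably better handled by one of the tolerant parsers from the other answers.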

1 Comment

This is possible, but I consider that kind of solution weird, so for now I am just looking for a JSON parsing library that can process this broken JSON.
0

Pyparsing includes a JSON parser example; here is the online source. You could modify the definition of memberDef to allow a non-quoted string for the member name, and then use it to parse your not-quite-JSON source text.

The August 2008 issue of Python Magazine has a lot more detailed info about this parser. It shows some sample JSON, and code that accesses the parsed results as if they were a deserialized object.

2 Comments

Links are dead.
Thanks - I fixed the link to the parser to now point to that file in the GitHub repo. I had to drop the Python Magazine link, since there is no longer a public archive of the issues of this magazine.
