1

I convert a string to a json-object using the json-library:

a = '{"index":1}'
import json
json.loads(a)
{'index': 1}

However, if I instead change the string a to contain a leading 0, then it breaks down:

a = '{"index":01}'
import json
json.loads(a)
>>> JSONDecodeError: Expecting ',' delimiter

I believe this is due to the fact that it is invalid JSON if an integer begins with a leading zero as described in this thread.

Is there a way to remedy this? If not, then I guess the best way is to remove any leading zeroes by a regex from the string first, then convert to json?

6
  • is only the index key affected by this? Commented Feb 14, 2019 at 11:37
  • @aws_apprentice No, the above is just an example. My JSON-string is very long, so ideally a function should be written to put quotes around any number in the string (I guess?) Commented Feb 14, 2019 at 11:38
  • Can you share a pastebin link with your json string? Commented Feb 14, 2019 at 11:41
  • 1
    you have to use your own decoder See the similar post here. stackoverflow.com/questions/45068797/… Commented Feb 14, 2019 at 11:49
  • @AbinayaDevarajan That a number cannot start with 0 unless it is only 0 or starts with 0. is quite quite heavily baked into the json module. The technique described in the question above only works if json has recognised that the number is a valid JSON number literal. It's very hard to get around this limitation, you have to forego using a C implementation of the internal json.scanner module, and modify a global regex in the Python implementation of the scanner. Which is a pretty ugly hack. Commented Feb 14, 2019 at 11:59

3 Answers 3

1

A leading 0 in a number literal in JSON is invalid unless the number literal is only the character 0 or starts with 0.. The Python json module is quite strict in that it will not accept such number literals. In part because a leading 0 is sometimes used to denote octal notation rather than decimal notation. Deserialising such numbers could lead to unintended programming errors. That is, should 010 be parsed as the number 8 (in octal notation) or as 10 (in decimal notation).

You can create a decoder that will do what you want, but you will need to heavily hack the json module or rewrite much of its internals. Either way, you will see a performance slow down as you will no longer be using the C implementation of the module.

Below is an implementation that can decode JSON which contains numbers with any number of leading zeros.

import json
import re
import threading

# a more lenient number regex (modified from json.scanner.NUMBER_RE)
NUMBER_RE = re.compile(
    r'(-?(?:\d*))(\.\d+)?([eE][-+]?\d+)?',
    (re.VERBOSE | re.MULTILINE | re.DOTALL))


# we are going to be messing with the internals of `json.scanner`. As such we
# want to return it to its initial state when we're done with it, but we need to
# do so in a thread safe way.
_LOCK = threading.Lock()
def thread_safe_py_make_scanner(context, *, number_re=json.scanner.NUMBER_RE):
    with _LOCK:
        original_number_re = json.scanner.NUMBER_RE
        try:
            json.scanner.NUMBER_RE = number_re
            return json.scanner._original_py_make_scanner(context)
        finally:
            json.scanner.NUMBER_RE = original_number_re

json.scanner._original_py_make_scanner = json.scanner.py_make_scanner
json.scanner.py_make_scanner = thread_safe_py_make_scanner


class MyJsonDecoder(json.JSONDecoder):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # overwrite the stricter scan_once implementation
        self.scan_once = json.scanner.py_make_scanner(self, number_re=NUMBER_RE)


d = MyJsonDecoder()
n = d.decode('010')
assert n == 10

json.loads('010') # check the normal route still raise an error

I would stress that you shouldn't rely on this as a proper solution. Rather, it's a quick hack to help you decode malformed JSON that is nearly, but not quite valid. It's useful if recreating the JSON in a valid form is not possible for some reason.

Sign up to request clarification or add additional context in comments.

Comments

1

First, using regex on JSON is evil, almost as bad as killing a kitten.

If you want to represent 01 as a valid JSON value, then consider using this structure:

a = '{"index" : "01"}'
import json
json.loads(a)

If you need the string literal 01 to behave like a number, then consider just casting it to an integer in your Python script.

5 Comments

My JSON-string is very long, and it isn't really feasible to manually do this. Is there a way to automatize this?
I think the problem may lie with the source of your JSON string, rather than your Python code. Any chance you can change the export so that the "numbers" are treated as strings?
Why would using regex be evil? If the string doesn't have any other numbers then I think regex would be the easiest approach to this
@ninesalt The JSON content might be nested. Can't give an edge case where regex would fail off the top of my head, but I would rather avoid that chance.
I get the JSON-string from a library, so I can't control it unfortunately. However, it isn't nested, so maybe it is best to write a regex to take care of the integers?
1

How to convert string int JSON into real int with json.loads Please see the post above You need to use your own version of Decoder.

More information can be found here , in the github https://github.com/simplejson/simplejson/blob/master/index.rst

c = '{"value": 02}'
value= json.loads(json.dumps(c))
print(value)

This seems to work .. It is strange

> >>> c = '{"value": 02}'
> >>> import json
> >>> value= json.loads(json.dumps(c))
> >>> print(value) {"value": 02}
> >>> c = '{"value": 0002}'
> >>> value= json.loads(json.dumps(c))
> >>> print(value) {"value": 0002}

As @Dunes, pointed out the loads produces string as an outcome which is not a valid solution. However,

DEMJSON seems to decode it properly. https://pypi.org/project/demjson/ -- alternative way

>>> c = '{"value": 02}'
>>> import demjson
>>> demjson.decode(c)
{'value': 2}

1 Comment

Because c is a string and not dictionary containing an int. Strings don't have the same restrictions that numbers do. Even if you dumped a dictionary it still doesn't help as the json module will emit valid JSON which 02 is not.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.