A leading 0 in a number literal in JSON is invalid unless the number literal is only the character 0 or starts with 0.. The Python json module is quite strict in that it will not accept such number literals. In part because a leading 0 is sometimes used to denote octal notation rather than decimal notation. Deserialising such numbers could lead to unintended programming errors. That is, should 010 be parsed as the number 8 (in octal notation) or as 10 (in decimal notation).
You can create a decoder that will do what you want, but you will need to heavily hack the json module or rewrite much of its internals. Either way, you will see a performance slow down as you will no longer be using the C implementation of the module.
Below is an implementation that can decode JSON which contains numbers with any number of leading zeros.
import json
import re
import threading
# a more lenient number regex (modified from json.scanner.NUMBER_RE)
NUMBER_RE = re.compile(
r'(-?(?:\d*))(\.\d+)?([eE][-+]?\d+)?',
(re.VERBOSE | re.MULTILINE | re.DOTALL))
# we are going to be messing with the internals of `json.scanner`. As such we
# want to return it to its initial state when we're done with it, but we need to
# do so in a thread safe way.
_LOCK = threading.Lock()
def thread_safe_py_make_scanner(context, *, number_re=json.scanner.NUMBER_RE):
with _LOCK:
original_number_re = json.scanner.NUMBER_RE
try:
json.scanner.NUMBER_RE = number_re
return json.scanner._original_py_make_scanner(context)
finally:
json.scanner.NUMBER_RE = original_number_re
json.scanner._original_py_make_scanner = json.scanner.py_make_scanner
json.scanner.py_make_scanner = thread_safe_py_make_scanner
class MyJsonDecoder(json.JSONDecoder):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
# overwrite the stricter scan_once implementation
self.scan_once = json.scanner.py_make_scanner(self, number_re=NUMBER_RE)
d = MyJsonDecoder()
n = d.decode('010')
assert n == 10
json.loads('010') # check the normal route still raise an error
I would stress that you shouldn't rely on this as a proper solution. Rather, it's a quick hack to help you decode malformed JSON that is nearly, but not quite valid. It's useful if recreating the JSON in a valid form is not possible for some reason.
indexkey affected by this?0unless it is only0or starts with0.is quite quite heavily baked into thejsonmodule. The technique described in the question above only works ifjsonhas recognised that the number is a valid JSON number literal. It's very hard to get around this limitation, you have to forego using a C implementation of the internaljson.scannermodule, and modify a global regex in the Python implementation of the scanner. Which is a pretty ugly hack.