4

I was looking for an efficient way to serialise a Python list in reverse order.

I tried json.dumps(reversed(mylist)), but apparently json.dumps does not accept iterators.

I could also do json.dumps(list(reversed(mylist))), but that is terribly inefficient for very large lists. I don't need the temporary list; I was hoping to serialise the list on the fly instead.

I think I could use json.JSONEncoder for this, but I don't really understand what I should return from the default method.

I also have to stick with the standard library, because I am not free to install other packages.

So far I have tried the two proposed solutions; here are the timings:

>>> timeit.timeit('li.reverse(); json.dumps(li)', number=1, globals=globals())
2.5034537549945526
>>> timeit.timeit('"[{}]".format(",".join(map(json.dumps,reversed(li))))', number=1, globals=globals())
41.076039729989134

I still think implementing my own JSONEncoder would be more efficient, but I don't know exactly how to do it.

  • How about reversing the list in place first with mylist.reverse() (which avoids the copy), doing your serialization, and then reversing it back if need be? Commented Aug 2, 2017 at 10:19
  • It is better than creating a new list, but it still creates an intermediate step that is not needed. Thanks for the hint, though. :) Commented Aug 2, 2017 at 10:31
  • Having looked through the json library, it's not quite as simple as it seems. The JSONEncoder.default docstring says "For example, to support arbitrary iterators, you could...", but the example then returns a list built from the iterable. That makes sense for sub-iterables (e.g. if you had {test: range(10)} to expand), but not for reversing your entire data. It's further complicated by the fact that some levels are handled by the C implementation and others by _functions with nested _functions... For sheer simplicity I'm sticking with list.reverse :) Commented Aug 2, 2017 at 11:58
  • json.dumps(mylist[::-1]) is another way of doing this, but duplicates the list. Commented Aug 9, 2017 at 6:22

2 Answers

6

One way to avoid a copy is to reverse the list in place, e.g.:

mylist.reverse()
json_string = json.dumps(mylist)

Then call mylist.reverse() again to restore the original order, if need be.
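Wrapping the two reverse() calls in try/finally makes sure the list is restored even if serialisation fails part-way. This is a minimal sketch; dumps_reversed_inplace is a hypothetical helper name, and it is not thread-safe, since any other code that reads the list while json.dumps runs sees it reversed:

```python
import json

def dumps_reversed_inplace(lst, **kwargs):
    """Hypothetical helper: serialise lst in reverse order without copying it.

    The list is reversed in place, serialised, and then put back, even if
    json.dumps raises (for example on a value it cannot serialise).
    """
    lst.reverse()
    try:
        return json.dumps(lst, **kwargs)
    finally:
        lst.reverse()  # restore the caller's original order

mylist = [1, 2, 3]
print(dumps_reversed_inplace(mylist))  # [3, 2, 1]
print(mylist)                          # [1, 2, 3]
```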

0

Before we go crazy, see if any of the following meet your performance requirements:

mylist.reverse(); json.dumps(mylist); mylist.reverse()
json.dumps(mylist[::-1])
json.dumps(tuple(reversed(mylist)))
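To check those three options against your own data, a quick timeit comparison in the same style as the question's measurements could look like this. The list size and repeat count are arbitrary assumptions, and the numbers you get will depend on your machine and data:

```python
import json
import timeit

# Hypothetical benchmark data; substitute a list shaped like your real one.
mylist = list(range(200_000))

# The three candidates above, as timeit statements.
candidates = {
    "reverse in place": "mylist.reverse(); json.dumps(mylist); mylist.reverse()",
    "slice copy": "json.dumps(mylist[::-1])",
    "tuple copy": "json.dumps(tuple(reversed(mylist)))",
}

timings = {}
for name, stmt in candidates.items():
    timings[name] = timeit.timeit(
        stmt, number=5, globals={"json": json, "mylist": mylist}
    )
    print(f"{name:>16}: {timings[name]:.3f} s")
```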

You mentioned defining your own JSONEncoder default method, which is fairly simple to do (example at the very bottom*), but I don't think it helps here, because json.JSONEncoder requires default to convert the object into something it already knows how to serialize, i.e. one of the following:

None, True, False, str, int, float, list, tuple, dict

Converting an iterator to a list or tuple would create a large object, which is what we're trying to avoid.
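If some copying is acceptable as long as it stays bounded, a stdlib-only middle ground (not part of this answer) is to serialise the list in reversed chunks: each json.dumps call still uses the fast encoder, but only chunk_size items are copied at a time. iter_json_reversed and chunk_size are hypothetical names, and the sketch assumes the default separators, so that the pieces join into the same text json.dumps would produce:

```python
import json

def iter_json_reversed(lst, chunk_size=10_000):
    """Hypothetical helper: yield JSON text for lst in reverse order,
    copying at most chunk_size items at a time."""
    yield "["
    first = True
    # Walk backwards in chunks: [end - chunk_size, end), then the previous chunk, ...
    for end in range(len(lst), 0, -chunk_size):
        start = max(0, end - chunk_size)
        chunk = lst[start:end]          # bounded temporary copy
        chunk.reverse()
        body = json.dumps(chunk)[1:-1]  # drop the chunk's own brackets
        if not first:
            yield ", "                  # default item separator
        first = False
        yield body
    yield "]"
```

Because it is a generator, the output can be written straight to a file, e.g. `f.writelines(iter_json_reversed(mylist))`, or joined with `"".join(...)` when a string is needed.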

You'll either need to modify your json library or monkey-patch it.

Here's the CPython source code of json.encoder. PyPy, Jython, and other Python implementations probably use the same code for the json module.

https://github.com/python/cpython/blob/master/Lib/json/encoder.py#L204

def _make_iterencode(markers, _default, _encoder, _indent, _floatstr,
    _key_separator, _item_separator, _sort_keys, _skipkeys, _one_shot,
    ## HACK: hand-optimized bytecode; turn globals into locals
    ValueError=ValueError,
    dict=dict,
    float=float,
    id=id,
    int=int,
    isinstance=isinstance,
    list=list,
    str=str,
    tuple=tuple,
    _intstr=int.__str__,
    ...
    def _iterencode(o, _current_indent_level):
        if isinstance(o, str):
            yield _encoder(o)
        ...
        elif isinstance(o, (list, tuple)):
            yield from _iterencode_list(o, _current_indent_level)

        # Add support for processing iterators
        elif isinstance(o, iterator_types):
            # Side-effect: this will consume the iterator.
            # This is probably why it's not included in the official json module
            # We could use itertools.tee to be able to iterate over
            # the original iterator while still having an unconsumed iterator
            # but this would require updating all references to the original
            # iterator with the new unconsumed iterator.
            # The side effect may be unavoidable.
            yield from _iterencode_list(o, _current_indent_level)
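The side effect described in those comments is easy to reproduce without patching anything. In this sketch default=list stands in for the proposed iterator branch (unlike the patch, it builds a temporary list), but either way the iterator is consumed by the first encode:

```python
import json

it = reversed([1, 2, 3])

# default is called because a list_reverseiterator is not JSON serializable.
first = json.dumps(it, default=list)
# Encoding the same iterator again: it is already exhausted.
second = json.dumps(it, default=list)

print(first)   # [3, 2, 1]
print(second)  # []
```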

For performance reasons, you'll want to define the iterator types outside the function and bring them in as locals, the same way the function's default arguments already do for list, tuple, and the rest.

str_iterator   = type(iter( str()    ))
list_iterator  = type(iter( list()   ))
tuple_iterator = type(iter( tuple()  ))
range_iterator = type(iter( range(0) ))
list_reverseiterator = type(reversed( list()  ))
reverseiterator      = type(reversed( tuple() ))  # same as <class 'reversed'>

# Add any other iterator classes that you need here, plus any container data types that json doesn't support (sets, frozensets, bytes, bytearray, array.array, numpy.array)
iterator_types = (str_iterator, list_iterator, tuple_iterator, range_iterator,
                  list_reverseiterator, reverseiterator)
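Only the classes listed in iterator_types are matched, so other iterators, such as generators or map objects, would still fall through to default. A quick coverage check, also comparing against collections.abc.Iterator, which matches any iterator but goes through the ABC isinstance machinery instead of a plain tuple of classes (whether that trade-off is worth it for your data is an assumption to test):

```python
from collections.abc import Iterator

# Same concrete classes as above, rebuilt here so the snippet runs on its own.
iterator_types = (
    type(iter("")), type(iter([])), type(iter(())), type(iter(range(0))),
    type(reversed([])), type(reversed(())),
)

samples = {
    "reversed(list)": reversed([1, 2]),
    "reversed(range)": reversed(range(3)),  # a range_iterator, so it is covered
    "generator": (x for x in [1, 2]),
    "map": map(str, [1, 2]),
}

for name, obj in samples.items():
    print(f"{name:>15}: listed={isinstance(obj, iterator_types)}, "
          f"abc={isinstance(obj, Iterator)}")
```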

If you want to go the monkey-patching route, you'll need to redefine the json.encoder._make_iterencode function, replacing every occurrence of isinstance(X, (list, tuple)) with isinstance(X, (list, tuple) + iterator_types):

import json
def _make_iterencode(markers, _default, _encoder, _indent, _floatstr,
        _key_separator, _item_separator, _sort_keys, _skipkeys, _one_shot,
         iterable_types=_get_iterable_types(),
         ...
    ):
    ...

json.encoder._make_iterencode = _make_iterencode

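These changes look something like this: https://github.com/python/cpython/pull/3034/files

Keep in mind that json.dumps with default arguments hands the work to the C accelerator (json.encoder.c_make_encoder) rather than _make_iterencode, so a patched _make_iterencode only takes effect on the pure-Python path, e.g. when indent is set or when writing with json.dump.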
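A lighter-weight variant of the same idea, not part of this answer and relying on CPython implementation details rather than documented API, avoids patching json.encoder at all. The pure-Python _make_iterencode treats any list subclass as a list but walks it with a plain for loop, so a list subclass whose __iter__ returns reversed(source) is streamed in reverse without copying. This only holds on the pure-Python path, which JSONEncoder.iterencode() takes when called without its private _one_shot argument; json.dumps with default arguments goes through the C accelerator, which may build a temporary list instead. ReversedListView and dump_reversed are hypothetical names:

```python
import io
import json

class ReversedListView(list):
    """Hypothetical helper: looks like a list to json's pure-Python encoder,
    but iterates the wrapped list in reverse without copying it.
    Relies on json.encoder implementation details, not documented API."""

    def __init__(self, source):
        super().__init__()        # the real list storage stays empty
        self._source = source

    def __iter__(self):
        return reversed(self._source)

    def __len__(self):
        return len(self._source)  # so the encoder's emptiness check sees the real size


def dump_reversed(lst, fp, **kwargs):
    """Stream lst to the text file fp as a JSON array in reverse order."""
    encoder = json.JSONEncoder(**kwargs)
    # iterencode() without _one_shot uses the pure-Python _make_iterencode.
    for chunk in encoder.iterencode(ReversedListView(lst)):
        fp.write(chunk)


buf = io.StringIO()
dump_reversed([1, 2, 3], buf)
print(buf.getvalue())  # [3, 2, 1]
```

For a real file, `with open("out.json", "w") as f: dump_reversed(mylist, f)` writes the chunks as they are produced. Because this depends on how the pure-Python encoder happens to be written, it is worth re-testing after Python upgrades.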

*As promised, here is how to define your own default method, though it isn't useful for dumping iterators without first copying them into a list or tuple:

import json

class JSONEncoderThatSupportsIterators(json.JSONEncoder):
    def default(self, o):
        try:
            iterable = iter(o)
        except TypeError:
            pass
        else:
            # Materialise the iterable; this is the copy we wanted to avoid.
            return list(iterable)
        # Let the base class default method raise the TypeError
        return json.JSONEncoder.default(self, o)

li = range(10000000)  # or xrange if Python 2
dumped = JSONEncoderThatSupportsIterators().encode(reversed(li))
assert dumped.startswith('[9999999, 9999998, 9999997, ')
assert dumped.endswith('6, 5, 4, 3, 2, 1, 0]')

Alternatively, rather than subclassing json.JSONEncoder, you can define a plain default(o) function (no self) and pass it to json.dumps(obj, default=default).
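The function form of the same fallback might look like this; iterable_default is a hypothetical name, and like the subclass above it still copies the iterable into a temporary list. For objects it cannot handle it raises TypeError, which is what json.dumps expects from a default function:

```python
import json

def iterable_default(o):
    """Hypothetical fallback for json.dumps: materialise any iterable as a list."""
    try:
        iterable = iter(o)
    except TypeError:
        raise TypeError(
            f"Object of type {type(o).__name__} is not JSON serializable"
        ) from None
    return list(iterable)  # still a temporary copy

print(json.dumps(reversed([1, 2, 3]), default=iterable_default))  # [3, 2, 1]
```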
