Override a dict with numpy support

Question

Using the base idea from How to "perfectly" override a dict?, I coded a class based on dictionaries that should support assigning dot-delimited keys, i.e. Extendeddict({'level1.level2': 'value'}) == {'level1': {'level2': 'value'}}

The code is

import collections
import numpy

class Extendeddict(collections.MutableMapping):
    """Dictionary overload class that adds functions to support chained keys, e.g. A.B.C          
    :rtype : Extendeddict
    """
    # noinspection PyMissingConstructor
    def __init__(self, *args, **kwargs):
        self._store = dict()
        self.update(dict(*args, **kwargs))

    def __getitem__(self, key):
        keys = self._keytransform(key)
        print 'Original key: {0}\nTransformed keys: {1}'.format(key, keys)
        if len(keys) == 1:
            return self._store[key]
        else:
            key1 = '.'.join(keys[1:])
            if keys[0] in self._store:
                subdict = Extendeddict(self[keys[0]] or {})
                try:
                    return subdict[key1]
                except:
                    raise KeyError(key)
            else:
                raise KeyError(key)

    def __setitem__(self, key, value):
        keys = self._keytransform(key)
        if len(keys) == 1:
            self._store[key] = value
        else:
            key1 = '.'.join(keys[1:])
            subdict = Extendeddict(self.get(keys[0]) or {})
            subdict.update({key1: value})
            self._store[keys[0]] = subdict._store

    def __delitem__(self, key):
        keys = self._keytransform(key)
        if len(keys) == 1:
            del self._store[key]
        else:
            key1 = '.'.join(keys[1:])
            del self._store[keys[0]][key1]
            if not self._store[keys[0]]:
                del self._store[keys[0]]

    def __iter__(self):
        return iter(self._store)

    def __len__(self):
        return len(self._store)

    def __repr__(self):
        return self._store.__repr__()

    # noinspection PyMethodMayBeStatic
    def _keytransform(self, key):
        try:
            return key.split('.')
        except:
            return [key]

But with Python 2.7.10 and numpy 1.11.0, running

basic = {'Test.field': 'test'}
print 'Normal dictionary: {0}'.format(basic)
print 'Normal dictionary in a list: {0}'.format([basic])
print 'Normal dictionary in numpy array: {0}'.format(numpy.array([basic], dtype=object))
print 'Normal dictionary in numpy array.tolist(): {0}'.format(numpy.array([basic], dtype=object).tolist())

extended_dict = Extendeddict(basic)
print 'Extended dictionary: {0}'.format(extended_dict)
print 'Extended dictionary in a list: {0}'.format([extended_dict])
print 'Extended dictionary in numpy array: {0}'.format(numpy.array([extended_dict], dtype=object))
print 'Extended dictionary in numpy array.tolist(): {0}'.format(numpy.array([extended_dict], dtype=object).tolist())

I get:

Normal dictionary: {'Test.field': 'test'}
Normal dictionary in a list: [{'Test.field': 'test'}]
Normal dictionary in numpy array: [{'Test.field': 'test'}]
Normal dictionary in numpy array.tolist(): [{'Test.field': 'test'}]
Original key: Test
Transformed keys: ['Test']
Extended dictionary: {'Test': {'field': 'test'}}
Extended dictionary in a list: [{'Test': {'field': 'test'}}]
Original key: 0
Transformed keys: [0]
Traceback (most recent call last):
  File "/tmp/scratch_2.py", line 77, in <module>
    print 'Extended dictionary in numpy array: {0}'.format(numpy.array([extended_dict], dtype=object))
  File "/tmp/scratch_2.py", line 20, in __getitem__
    return self._store[key]
KeyError: 0

Whereas I would expect print 'Extended dictionary in numpy array: {0}'.format(numpy.array([extended_dict], dtype=object)) to result in Extended dictionary in numpy array: [{'Test': {'field': 'test'}}]

Any suggestions on what might be wrong with this? Is this even the right way to do it?

it seems to me that you're trying to reinvent the pandas library ;)
– MaxU - stand with Ukraine, Commented Apr 16, 2016 at 13:40
@MaxU Pandas does something quite different from what I would need for this, and I do use it for many other things. What I want is a "simple" dictionary-like class that supports dot-delimited fields.
– Nicolau Gonçalves, Commented Apr 16, 2016 at 14:43
Add some debugging prints, for example key and keys near the error.
– hpaulj, Commented Apr 16, 2016 at 14:46
What happens with the object in a list? Or the array.tolist()? If I ran your code I'd be trying all sorts of prints and actions, trying to find a pattern.
– hpaulj, Commented Apr 16, 2016 at 15:26

hpaulj · Accepted Answer · 2016-04-16 17:21:12Z

3

The problem is in the np.array constructor step. It digs into its inputs trying to create a higher dimensional array.

In [99]: basic={'test.field':'test'}

In [100]: eb=Extendeddict(basic)

In [104]: eba=np.array([eb],object)
<keys: 0,[0]>
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-104-5591a58c168a> in <module>()
----> 1 eba=np.array([eb],object)

<ipython-input-88-a7d937b1c8fd> in __getitem__(self, key)
     11         keys = self._keytransform(key);print key;print keys
     12         if len(keys) == 1:
---> 13             return self._store[key]
     14         else:
     15             key1 = '.'.join(keys[1:])

KeyError: 0

But if I make an array first and then assign the object, it works fine:

In [105]: eba=np.zeros((1,),object)

In [106]: eba[0]=eb

In [107]: eba
Out[107]: array([{'test': {'field': 'test'}}], dtype=object)

np.array is a tricky function to use with dtype=object. Compare np.array([[1,2],[2,3]],dtype=object) and np.array([[1,2],[2]],dtype=object). One is (2,2) the other (2,). It tries to make a 2d array, and resorts to 1d with list elements only if that fails. Something along that line is happening here.
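The shape comparison described above can be checked directly. This is a minimal sketch, assuming Python 3 and a NumPy version that accepts ragged input when dtype=object is given explicitly; the variable names are only illustrative.

```python
import numpy as np

# Equal-length inner lists: NumPy treats them as a second dimension.
regular = np.array([[1, 2], [2, 3]], dtype=object)

# Unequal-length inner lists: NumPy falls back to a 1-D array whose
# elements are the list objects themselves.
ragged = np.array([[1, 2], [2]], dtype=object)

print(regular.shape)  # (2, 2)
print(ragged.shape)   # (2,)
```

The point is that the resulting shape depends on the contents of the input, not only on the dtype that was requested.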

I see 2 solutions. One is this roundabout way of constructing the array, which I've used on other occasions. The other is to figure out why np.array doesn't dig into a dict but does dig into yours. np.array is compiled, so that may require reading tough code on GitHub.
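The first option (preallocate, then assign) could be wrapped in a small helper so callers do not have to remember the trick. The sketch below assumes Python 3; object_array and TinyMapping are hypothetical names introduced here for illustration, with TinyMapping standing in for Extendeddict.

```python
from collections.abc import Mapping

import numpy as np


def object_array(items):
    """Build a 1-D object array holding each item as-is (hypothetical helper)."""
    items = list(items)
    # Preallocate so np.array never inspects the elements for extra dimensions.
    out = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        out[i] = item  # integer-index assignment stores the object itself
    return out


class TinyMapping(Mapping):
    """Stand-in for Extendeddict: a mapping that is not a dict subclass."""

    def __init__(self, data):
        self._store = dict(data)

    def __getitem__(self, key):
        return self._store[key]

    def __iter__(self):
        return iter(self._store)

    def __len__(self):
        return len(self._store)


eb = TinyMapping({'test': {'field': 'test'}})
arr = object_array([eb])
print(arr.shape)     # (1,)
print(arr[0] is eb)  # True
```

Because the elements are assigned one at a time, the helper never asks NumPy to guess the dimensionality of the input.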

I tried a solution with f=np.frompyfunc(lambda x:x,1,1), but that doesn't work (see my edit history for details). But I found that mixing an Extendeddict with a dict does work:

In [139]: np.array([eb,basic])
Out[139]: array([{'test': {'field': 'test'}}, {'test.field': 'test'}], dtype=object)

So does mixing it with something else like None or an empty list

In [140]: np.array([eb,[]])
Out[140]: array([{'test': {'field': 'test'}}, []], dtype=object)

In [142]: np.array([eb,None])[:-1]
Out[142]: array([{'test': {'field': 'test'}}], dtype=object)

This is another common trick for constructing an object array of lists.

It also works if you give it two or more Extendeddict objects with different lengths, for example np.array([eb, Extendeddict({})]). In other words, it works if the len(...) values differ (just as with mixed lists).

edited Apr 16, 2016 at 17:21

answered Apr 16, 2016 at 16:09

3 Comments

Nicolau Gonçalves Over a year ago

Unfortunately, the same happens if I remove the dtype argument. :(

hpaulj Over a year ago

The issue isn't the dtype=object. I think it analyses the input before even looking at the dtype. From its behavior, I think it only looks at the dtype near the end, when actually constructing the result.

Nicolau Gonçalves Over a year ago

I did try the same things as you did, adding a different length object, which works as you describe. But this also means that everyone using this library would need to be aware of this issue, which seems counter-productive to me. I'll keep it as is for now, but I'll upvote your answer in case anyone else runs into the same issue.

MSeifert · Accepted Answer · 2016-04-17 06:30:38Z

2

Numpy tries to do what it's supposed to do:

Numpy checks for each element if it's iterable (by using len and iter) because what you pass in may be interpreted as a multidimensional array.

There is a catch here: dict-like classes (meaning isinstance(element, dict) == True) will not be interpreted as another dimension (that is why passing in [{}] works). Probably they should check if it's a collections.Mapping instead of a dict. Maybe you can file a bug on their issue tracker.
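To make the distinction concrete, here is a minimal sketch of what a duck-typed check sees. It assumes Python 3, where the base class lives in collections.abc rather than collections, and PlainMapping is a hypothetical stand-in for Extendeddict. It does not call numpy at all; it only shows the properties mentioned above.

```python
from collections.abc import MutableMapping


class PlainMapping(MutableMapping):
    """Hypothetical mapping that wraps a dict without subclassing dict."""

    def __init__(self, data=()):
        self._store = dict(data)

    def __getitem__(self, key):
        return self._store[key]

    def __setitem__(self, key, value):
        self._store[key] = value

    def __delitem__(self, key):
        del self._store[key]

    def __iter__(self):
        return iter(self._store)

    def __len__(self):
        return len(self._store)


m = PlainMapping({'Test': {'field': 'test'}})

print(isinstance(m, dict))  # False: it is not recognised as a dict
print(len(m))               # 1: it has a length, like a sequence
print(list(iter(m)))        # ['Test']: it is iterable

# Treating it as a sequence and asking for element 0 performs a key lookup,
# which fails in the same way as the traceback in the question.
try:
    m[0]
except KeyError as exc:
    print('KeyError:', exc)  # KeyError: 0
```

An object like this passes the len and iter checks but fails the isinstance(element, dict) check, which is consistent with the KeyError: 0 reported in the question.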

If you change your class definition to:

class Extendeddict(collections.MutableMapping, dict):
     ...

or change your __len__-method:

    def __len__(self):
        raise NotImplementedError

it works. Neither of these might be something that you want to do, but numpy just uses duck typing to create the array. Without subclassing directly from dict or making len inaccessible, numpy sees your class as something that ought to be another dimension. This is rather clever and convenient in case you want to pass in customized sequences (subclasses of collections.Sequence) but inconvenient for collections.Mapping or collections.MutableMapping. I think this is a bug.

edited Apr 17, 2016 at 6:30

answered Apr 16, 2016 at 16:00

3 Comments

Nicolau Gonçalves Over a year ago

I did try to inherit from dict, but that causes a bunch of other issues that I couldn't figure out how to solve properly. But yeah, I also think it might be a bug in numpy itself.

MSeifert Over a year ago

@NicolauGonçalves I didn't want to recommend inheriting from dict. It was just to illustrate why I came to the conclusion.

Nicolau Gonçalves Over a year ago

As I mentioned in a comment to the other answer, not defining length would be rather counter-productive if anyone would use this class. But I will create an issue in numpy and see what the developers think.
