Convert dictionary with integer keys to numpy array

Question

I have a dictionary defined as follows:

>>> mydict = {0:obj0,5:obj1,4:obj3,7:obj4}

The dictionary has integers as keys.

I am trying to convert this dict to a numpy array.

so that:

>>> nparray[[4,0]] = [obj3,obj0]
>>> nparray[[7,4]] = [obj4,obj3]

I am aware of numpy structured arrays, but unfortunately it seems that integer indexes must correspond to the position in the array rather than to the key. Is there a way to change this behavior?

I was considering a way to "trick" the numpy array. For example instead of reading [4,0] it reads the rows corresponding to those keys.

My end goal is to have some sort of custom class that inherits from np.ndarray, if there isn't another alternative.

UPDATE

A bit more background, I originally solved this problem by using the class below, which stores the objects:

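class MyArray(dict):
    def __init__(self, *args):
        dict.__init__(self, *args)

    def __getitem__(self, key):
        # A single (non-iterable) key behaves like a normal dict lookup.
        if not hasattr(key, '__iter__'):
            return dict.__getitem__(self, key)
        # An iterable of keys returns the matching values as a plain list.
        return [dict.__getitem__(self, k) for k in key]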

This allows multi-key indexing. The key array can be very large (1,000,000+ entries), so the for k in key loop can take a long time and/or be expensive. I wanted to use numpy arrays to take advantage of their speed and lower memory use, and to avoid that for loop. Is it still warranted?

Is there a good reason to do so? Numpy is designed for numerical computing. There is usually no point in filling a numpy array with strings or general objects.
– Julien, Commented Dec 16, 2015 at 0:07
You're thinking of your data in a fundamentally key-value-oriented way, while NumPy arrays are big multidimensional grids. It doesn't sound like you want a big multidimensional grid, in which case NumPy isn't going to solve your problems. Alternatively, if you do want a big multidimensional grid, what should go in all those cells the dict doesn't specify values for?
– user2357112, Commented Dec 16, 2015 at 0:07
Good point, Julien. I really like how np arrays can take a list as input to select items, and also that they are fast and don't take much memory compared to a list. The objects contain attributes, and I want to be able to do nparray[[4,0]].someattr and get a list of attributes. The values would be numbers, and I would also like them to be numpy arrays.
– snowleopard, Commented Dec 16, 2015 at 0:16
If you haven't already, take a look at pandas (pandas.pydata.org/pandas-docs/stable). In particular, see the pandas DataFrame: pandas.pydata.org/pandas-docs/stable/dsintro.html#dataframe
– Warren Weckesser, Commented Dec 16, 2015 at 0:16
You can make an array with references to your objects, but array[[0, 4]] is going to return an array, so when you do array[[0, 4]].someattr you're going to get an attribute error. You'll end up doing something like [i.someattr for i in array[[0, 4]]] ...
– Bi Rico, Commented Dec 16, 2015 at 0:32

hpaulj · Accepted Answer · 2015-12-16 06:21:59Z

2

Let's make the dictionary; here my obj are tuples (just for convenience):

In [563]: mydict={0:(0,),5:(1,),4:(3,),7:(4,)}
In [564]: mydict
Out[564]: {0: (0,), 4: (3,), 5: (1,), 7: (4,)}

Initialize an array that's big enough, with dtype=object:

In [565]: A=np.empty((8,),dtype=object)    
In [566]: A
Out[566]: array([None, None, None, None, None, None, None, None], dtype=object)

Copy values from mydict to A, using the key as the array index:

In [567]: for k in mydict:
   .....:     A[k]=mydict[k]
   .....:     

In [568]: A
Out[568]: array([(0,), None, None, None, (3,), (1,), None, (4,)], dtype=object)

In [574]: A[[4,0]]
Out[574]: array([(3,), (0,)], dtype=object)
In [575]: A[[7,4]]
Out[575]: array([(4,), (3,)], dtype=object)

Items defined in the dictionary now appear in the corresponding slots in the array. I won't make any claims about this being useful.
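The steps above can be wrapped in a small helper. This is only a sketch: the name dict_to_object_array is made up for illustration, and it assumes every key is a non-negative integer small enough that an array of length max(key) + 1 is affordable.

```python
import numpy as np

def dict_to_object_array(d):
    """Place each value of d at the index given by its key.

    Hypothetical helper; assumes all keys are non-negative integers.
    """
    if not d:
        return np.empty((0,), dtype=object)
    size = max(d) + 1                       # array must reach the largest key
    arr = np.empty((size,), dtype=object)   # unfilled slots hold None
    for k, v in d.items():
        arr[k] = v                          # store the object in slot k
    return arr

mydict = {0: (0,), 5: (1,), 4: (3,), 7: (4,)}
A = dict_to_object_array(mydict)
picked = A[[4, 0]]                          # select by key with a list index
```

Note that memory grows with the largest key rather than with the number of items, so six-digit keys cost an array of up to a million slots even if only a few are filled.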

I could mask the None entries:

In [581]: Am=np.ma.masked_array(A)
In [582]: Am.mask=[False,True,True,True,False,False,True,False]

In [583]: Am
Out[583]: 
masked_array(data = [(0,) -- -- -- (3,) (1,) -- (4,)],
             mask = [False  True  True  True False False  True False],
       fill_value = ?)

The None entries are still there, just 'hidden'. I don't know if masking does anything useful with object types.
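Rather than typing the mask by hand, it could be derived from the dictionary keys. A minimal sketch, assuming the same mydict and array size as above:

```python
import numpy as np

mydict = {0: (0,), 5: (1,), 4: (3,), 7: (4,)}
A = np.empty((8,), dtype=object)
for k, v in mydict.items():
    A[k] = v

# Start with every slot masked, then unmask the positions that are keys.
mask = np.ones(A.shape, dtype=bool)
mask[list(mydict)] = False

Am = np.ma.masked_array(A, mask=mask)
```

Building the mask from the keys, instead of testing each element for None, keeps a slot unmasked even when the stored value itself happens to be None.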

Subclass dict

From comments it sounds like the main thing you want is the ability to select multiple items from a dictionary, something akin to the array A[[0,3,5]] indexing.

It might be easier to subclass dict than to expand or subclass np.ndarray.

scipy.sparse has a sparse matrix format which is a subclass of dict. Its __getitem__ may give ideas on how to extend your own dict. I'll try to come up with a simpler version.

In the meantime, one way to fetch a group of keys is with an expression like:

In [646]: {k:mydict[k] for k in mydict if k in {0,4}}
Out[646]: {0: (0,), 4: (3,)}

or, more simply:

In [647]: {k:mydict[k] for k in [0,4]}
Out[647]: {0: (0,), 4: (3,)}

or, more robustly, using get so that missing keys return None instead of raising KeyError:

In [649]: {k:mydict.get(k,None) for k in [0,4,5,10]}
Out[649]: {0: (0,), 4: (3,), 5: (1,), 10: None}
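As a rough illustration of the dict-subclass idea, the expressions above could be folded into __getitem__. This is a sketch, not the scipy.sparse implementation; the class name MultiKeyDict and the get_many method are invented for this example.

```python
class MultiKeyDict(dict):
    """dict that also accepts a list of keys (hypothetical example class)."""

    def __getitem__(self, key):
        # Lists are unhashable, so they can never be real dict keys;
        # treat a list as a request for several items at once.
        if isinstance(key, list):
            return [dict.__getitem__(self, k) for k in key]
        return dict.__getitem__(self, key)

    def get_many(self, keys, default=None):
        # Like dict.get, but for several keys; missing keys yield default.
        return [self.get(k, default) for k in keys]

d = MultiKeyDict({0: (0,), 5: (1,), 4: (3,), 7: (4,)})
several = d[[4, 0]]
with_missing = d.get_many([0, 10])
```

Only lists trigger the multi-key path here, because a tuple is itself a valid dictionary key. The lookup still loops over the keys in Python, so this changes the interface, not the cost of a million-key selection.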

edited Dec 16, 2015 at 6:21

answered Dec 16, 2015 at 0:27


5 Comments

snowleopard Over a year ago

If I really want to implement things the way I want, this seems like the only feasible option. Are there any issues that can be caused by those empty elements? The ids are usually 6 digits and have no particular order, so that might be an issue. I guess I should look at other ways to implement this that don't involve numpy.

hpaulj Over a year ago

One way or another you have to decide what to do about the 'empty' elements. Do they represent 0, not valid entries (masked), or what? There is a scipy.sparse module that represents arrays with lots of zeros, but it can't handle 'objects', just real numbers. And one of its formats is a dictionary like yours.

hpaulj Over a year ago

Another option is to subclass dict so that it accepts a list or tuple of keys and returns a portion of itself. The scipy.sparse.dok_matrix class has such a __getitem__ extension.

snowleopard Over a year ago

This solution is very similar to what I did; see the update above.

john k Over a year ago

boxes1[b]=boxes[b] IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices
