
I have a dictionary defined as follows:

>>> mydict = {0:obj0,5:obj1,4:obj3,7:obj4}

The dictionary has integers as keys.

I am trying to convert this dict to a NumPy array, so that:

>>> nparray[[4,0]]
[obj3, obj0]
>>> nparray[[7,4]]
[obj4, obj3]

I am aware of NumPy structured arrays, but unfortunately it seems that integer indices must correspond to positions in the array rather than to keys. Is there a way to change this behavior?

I was considering a way to "trick" the NumPy array: for example, instead of reading positions [4,0], it would read the rows corresponding to those keys.

My end goal is some sort of custom class that inherits from np.ndarray, if there isn't a better alternative.

UPDATE

A bit more background: I originally solved this problem with the class below, which stores the objects:

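class MyArray(dict):
    """dict that also accepts an iterable of keys and returns a list of values."""
    def __init__(self, *args, **kwargs):
        dict.__init__(self, *args, **kwargs)
    def __getitem__(self, key):
        # Single key: ordinary dict lookup.
        # (In Python 3, str also has __iter__, so string keys would take the other branch.)
        if not hasattr(key, '__iter__'):
            return dict.__getitem__(self, key)
        # Iterable of keys: one lookup per key, i.e. a Python-level loop.
        return [dict.__getitem__(self, k) for k in key]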

This allows multi-key indexing. The key array can be very large (1,000,000+ keys), so "for k in key" can take a long time and/or be expensive. I wanted to use NumPy arrays to take advantage of their speed, lower memory use, etc., and avoid that for loop. Is that still warranted?

  • Is there a good reason to do so? NumPy is designed for numerical computing; there is usually no point in filling a NumPy array with strings or general objects. Commented Dec 16, 2015 at 0:07
  • You're thinking of your data in a fundamentally key-value-oriented way, while NumPy arrays are big multidimensional grids. It doesn't sound like you want a big multidimensional grid, in which case NumPy isn't going to solve your problems. Alternatively, if you do want a big multidimensional grid, what should go in all the cells the dict doesn't specify values for? Commented Dec 16, 2015 at 0:07
  • Good point, julien. I really like how NumPy arrays can take a list of indices to select items, and also that they are fast and don't take as much memory as a list. The objects have attributes, and I want to be able to do nparray[[4,0]].someattr and get a list of those attributes. The values would be numbers, and I would also like them to be NumPy arrays. Commented Dec 16, 2015 at 0:16
  • If you haven't already, take a look at pandas (pandas.pydata.org/pandas-docs/stable). In particular, see the pandas DataFrame: pandas.pydata.org/pandas-docs/stable/dsintro.html#dataframe Commented Dec 16, 2015 at 0:16
  • You can make an array with references to your objects, but array[[0, 4]] is going to return an array, so when you do array[[0, 4]].someattr you're going to get an AttributeError. You'll end up doing something like [i.someattr for i in array[[0, 4]]] ... Commented Dec 16, 2015 at 0:32
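To make that last comment concrete, here is a small sketch in which Obj and its someattr attribute are hypothetical stand-ins for the asker's objects. Indexing an object array with a list returns another object array of references, so the attribute still has to be pulled out element by element; np.frompyfunc only wraps that per-element call as a ufunc, it does not remove the Python-level loop.

```python
import numpy as np


class Obj:
    """Hypothetical stand-in for the asker's objects, with one numeric attribute."""

    def __init__(self, someattr):
        self.someattr = someattr


mydict = {0: Obj(10.0), 5: Obj(11.0), 4: Obj(13.0), 7: Obj(14.0)}

# Object array big enough for the largest key; unused slots stay None.
A = np.empty(max(mydict) + 1, dtype=object)
for k, v in mydict.items():
    A[k] = v

selected = A[[4, 0]]  # an object ndarray of references, not an Obj

# selected.someattr would raise AttributeError, so collect the attribute per element.
attrs = np.array([o.someattr for o in selected])

# Same per-element call wrapped with np.frompyfunc; the result has dtype=object,
# so convert it to float explicitly.
get_attr = np.frompyfunc(lambda o: o.someattr, 1, 1)
attrs2 = get_attr(selected).astype(float)
```

Either way the result is a plain numeric array of the attribute values, which is what the asker wants to work with afterwards.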

1 Answer


Let's make the dictionary; here my objects are tuples (just for convenience):

In [563]: mydict={0:(0,),5:(1,),4:(3,),7:(4,)}
In [564]: mydict
Out[564]: {0: (0,), 4: (3,), 5: (1,), 7: (4,)}

Initialize an array that's big enough (the largest key is 7, so 8 slots), with dtype=object:

In [565]: A=np.empty((8,),dtype=object)    
In [566]: A
Out[566]: array([None, None, None, None, None, None, None, None], dtype=object)

Copy values from mydict into A, using each key as the array index:

In [567]: for k in mydict:
   .....:     A[k]=mydict[k]
   .....:     

In [568]: A
Out[568]: array([(0,), None, None, None, (3,), (1,), None, (4,)], dtype=object)

In [574]: A[[4,0]]
Out[574]: array([(3,), (0,)], dtype=object)
In [575]: A[[7,4]]
Out[575]: array([(4,), (3,)], dtype=object)

Items defined in the dictionary now appear in the corresponding slots in the array. I won't make any claims about this being useful.


I could mask the Nones:

In [581]: Am=np.ma.masked_array(A)
In [582]: Am.mask=[False,True,True,True,False,False,True,False]

In [583]: Am
Out[583]: 
masked_array(data = [(0,) -- -- -- (3,) (1,) -- (4,)],
             mask = [False  True  True  True False False  True False],
       fill_value = ?)

The Nones are still there, just 'hidden'. I don't know whether masking does anything useful with object dtypes.
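Rather than typing the mask out by hand, it can be derived from the dictionary keys: start with every slot masked and unmask the slots that hold an entry. A minimal sketch, rebuilding the same tuple-valued mydict and object array as above:

```python
import numpy as np

mydict = {0: (0,), 5: (1,), 4: (3,), 7: (4,)}

n = max(mydict) + 1
A = np.empty(n, dtype=object)
for k, v in mydict.items():
    A[k] = v  # element-wise assignment stores each tuple as a single object

# Mask everything, then unmask the positions given by the dict keys.
mask = np.ones(n, dtype=bool)
mask[list(mydict)] = False

Am = np.ma.masked_array(A, mask=mask)
```

This produces the same mask as the hand-written one and stays correct if the dictionary changes.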


Subclass dict

From the comments it sounds like the main thing you want is the ability to select multiple items from a dictionary, something akin to array indexing like A[[0,3,5]].

It might be easier to subclass dict than to expand or subclass np.ndarray.

scipy.sparse has a sparse matrix format, dok_matrix, which is a subclass of dict. Its __getitem__ may give ideas on how to extend your own dict. I'll try to come up with a simpler version.
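One possible shape for such a simpler version is sketched below. It assumes that a list or tuple argument means "several keys" and returns the values as a list in the requested order; MultiKeyDict is a hypothetical name, not scipy's code.

```python
class MultiKeyDict(dict):
    """dict whose __getitem__ also accepts a list or tuple of keys (hypothetical)."""

    def __getitem__(self, key):
        # A list or tuple of keys returns a list of values, in the same order.
        if isinstance(key, (list, tuple)):
            return [dict.__getitem__(self, k) for k in key]
        # Anything else is an ordinary single-key lookup.
        return dict.__getitem__(self, key)


d = MultiKeyDict({0: (0,), 5: (1,), 4: (3,), 7: (4,)})
```

With this, d[[4, 0]] gives [(3,), (0,)] and d[7] gives (4,). There are two caveats with this design. Tuples can no longer be used as ordinary single keys, and the multi-key path is still a Python-level loop with one lookup per key.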

Without any subclassing, one way to fetch a group of keys is with an expression like:

In [646]: {k:mydict[k] for k in mydict if k in {0,4}}
Out[646]: {0: (0,), 4: (3,)}

or, more simply:

In [647]: {k:mydict[k] for k in [0,4]}
Out[647]: {0: (0,), 4: (3,)}

or, more robustly:

In [649]: {k:mydict.get(k,None) for k in [0,4,5,10]}
Out[649]: {0: (0,), 4: (3,), 5: (1,), 10: None}
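The standard library's operator.itemgetter performs the same multi-key fetch without writing the comprehension yourself. It returns a tuple of values and, like plain indexing, raises KeyError for a missing key. A small sketch with the same mydict:

```python
from operator import itemgetter

mydict = {0: (0,), 5: (1,), 4: (3,), 7: (4,)}

# Several keys -> tuple of the corresponding values, in the order given.
vals = itemgetter(4, 0)(mydict)    # ((3,), (0,))

# A key list built at runtime can be unpacked into itemgetter.
keys = [7, 4]
vals2 = itemgetter(*keys)(mydict)  # ((4,), (3,))
```

Note that with exactly one key, itemgetter returns the bare value rather than a 1-tuple. Code that builds the key list at runtime may need to special-case that.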

5 Comments

If I really want to implement things the way I want, this seems like the only feasible option. Are there any issues that can be caused by those empty elements? The ids are usually 6 digits and have no order, so that might be an issue. I guess I should look at other ways to implement this that don't involve NumPy.
One way or another you have to decide what to do about the 'empty' elements. Do they represent 0, invalid entries (masked), or something else? There is a scipy.sparse module that represents arrays with lots of zeros, but it can't handle 'objects', just real numbers. And one of its formats is a dictionary like yours.
Another option is to subclass dictionary, so it accepts a list or tuple of keys, returning a portion of itself. scipy.sparse.dok_matrix class has such a __getitem__ extension.
This solution is very similar to what I did; see the update above.
boxes1[b]=boxes[b] IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices
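For the case raised in the comments (six-digit, unordered ids), an array sized to the largest id would be mostly None. One alternative, which is not part of the answer above, is to store only the existing keys in sorted order and translate ids to positions with np.searchsorted. That way each multi-key query is vectorized. This is a sketch under those assumptions; the data and the names sorted_keys, values and lookup are illustrative only.

```python
import numpy as np

# Hypothetical data: sparse, unordered six-digit ids mapped to arbitrary objects.
mydict = {503217: "a", 100042: "b", 884511: "c", 250000: "d"}

# One-time setup: a sorted key array and an object array aligned with it.
sorted_keys = np.array(sorted(mydict), dtype=np.int64)
values = np.empty(len(sorted_keys), dtype=object)
for i, k in enumerate(sorted_keys.tolist()):
    # Element-wise assignment stores each object as-is (tuples are not unpacked).
    values[i] = mydict[k]


def lookup(query):
    """Fetch the values for many ids at once; raise KeyError for unknown ids."""
    query = np.asarray(query, dtype=np.int64)
    # searchsorted returns insertion points into the sorted key array.
    pos = np.searchsorted(sorted_keys, query)
    # Clip so ids larger than every key still index a valid slot, then verify matches.
    pos = np.minimum(pos, len(sorted_keys) - 1)
    found = sorted_keys[pos] == query
    if not found.all():
        raise KeyError(query[~found].tolist())
    return values[pos]
```

The Python loop runs only once, when the lookup table is built. Each later multi-key fetch is a handful of NumPy operations, and memory is proportional to the number of keys rather than to the largest id.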
