I am trying to create a numpy array of subclassed numpy arrays. Unfortunately, when I create my new array of subclasses, numpy automatically upcasts the elements of my array to numpy.ndarray.

The code below shows what I am trying to do. dummy_class inherits from numpy.ndarray and adds some extra functionality (which is not important for the problem at hand). I create two arrays using the dummy_class constructor and want to put each of these subclassed arrays into a new numpy.ndarray. When that array is initialized, the type of the subclassed arrays is automatically upcast from dummy_class to numpy.ndarray. Code to reproduce the problem:

import numpy

class dummy_class(numpy.ndarray):
    def __new__(cls, data, some_attribute):
        obj = numpy.asarray(data).view(cls)
        obj.attribute = some_attribute
        return obj

array_1 = dummy_class([1,2,3,4], "first dummy")
print type(array_1)
# <class '__main__.dummy_class'>

array_2 = dummy_class([1,2,3,4], "second dummy")
print type(array_2)
# <class '__main__.dummy_class'>

the_problem = numpy.array([array_1, array_2])
print type(the_problem)
# <type 'numpy.ndarray'>
print type(the_problem[0])
# <type 'numpy.ndarray'>
print type(the_problem[1])
# <type 'numpy.ndarray'>
  • Why do you want to do that? If you want an array of arrays, numpy is not your tool. The idea of numpy is to have memory-efficient contiguous data in an N-dimensional array. If you can't put your data in 2D or 3D, you should go for standard lists of numpy arrays. Anyway, try passing dtype=yourtype when creating the ndarray the_problem. Commented Jan 9, 2015 at 12:21
  • @iluengo, I disagree. If the arrays all have the same shape and you want to add some features/methods, this is actually fine. It will be fast and convenient. See the example in my answer. Commented Jan 9, 2015 at 15:15
  • I'm not trying to say that it can't be done. It is just that numpy is not built for such purposes. Subclassing numpy is OK, but making numpy arrays of numpy arrays is the opposite of what numpy tries to achieve. If you want such things, you can use pandas, which is built on top of numpy (assuming, as you say, that all the arrays have the same shape). Commented Jan 9, 2015 at 16:47
  • Numpy works perfectly well with subclassed arrays, as shown below, as long as they have the same shape. You get memory-efficient contiguous data, consistent with the idea of numpy that you mention in your first comment. The top array the_problem has shape (2,2,3) and is a perfectly valid, efficient numpy array by itself. The rest of the discussion, about what numpy tries to achieve or what numpy is built for, is a matter of opinion. Commented Jan 9, 2015 at 18:38

2 Answers

This is how you can fill a NumPy array with arbitrary Python objects:

the_problem = np.empty(2, dtype='O')
the_problem[:] = [array_1, array_2]

I agree with iluengo that making a NumPy array of arrays does not take advantage of NumPy's strengths, because doing so requires the outer NumPy array to have dtype object. Object arrays require about the same amount of memory as a regular Python list, take more time to build than an equivalent Python list, and are no faster at computation than an equivalent Python list. Perhaps their only advantage is that they offer NumPy array indexing syntax.
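As a rough illustration of the object-array approach, here is a minimal Python 3 sketch. DummyArray is a hypothetical stand-in for the question's dummy_class, and the elements are assigned one at a time, which is just a conservative variant of the slice assignment above. It only shows that an object array stores references to the original instances, so their subclass type and attribute survive.

```python
import numpy as np


class DummyArray(np.ndarray):
    """Hypothetical stand-in for the question's dummy_class."""

    def __new__(cls, data, some_attribute):
        obj = np.asarray(data).view(cls)
        obj.attribute = some_attribute
        return obj


array_1 = DummyArray([1, 2, 3, 4], "first dummy")
array_2 = DummyArray([1, 2, 3, 4], "second dummy")

# Pre-allocate a 1-D object array, then store each instance by index.
# The container holds references to the original Python objects.
container = np.empty(2, dtype=object)
container[0] = array_1
container[1] = array_2

element_types = [type(item).__name__ for item in container]
attributes = [item.attribute for item in container]
print(container.dtype, container.shape, element_types, attributes)
```

The container itself is a plain ndarray of dtype object, so vectorized numeric operations on it fall back to per-element Python calls.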


1 Comment

This is true only as long as the_problem ends up with dtype=object. It does not hold if you want a large array of arrays and want to subclass the smaller arrays to provide additional functionality. See my answer for an example of what I mean.

Please refer to the official example of the numpy documentation, here.

I think the main ingredient missing above is an implementation of __array_finalize__().
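For reference, a minimal Python 3 sketch of what adding __array_finalize__() to the question's class might look like. DummyArray is a hypothetical rename of dummy_class, and the default of None for the attribute is an assumption. This is not the documentation's InfoArray, only an illustration of the hook.

```python
import numpy as np


class DummyArray(np.ndarray):
    """Hypothetical variant of the question's dummy_class."""

    def __new__(cls, data, some_attribute=None):
        obj = np.asarray(data).view(cls)
        obj.attribute = some_attribute
        return obj

    def __array_finalize__(self, obj):
        # obj is None when the array comes from an explicit constructor call;
        # otherwise it is the array this one was viewed or sliced from.
        if obj is None:
            return
        # Copy the attribute from the source, defaulting to None (assumption).
        self.attribute = getattr(obj, "attribute", None)


array_1 = DummyArray([1, 2, 3, 4], "first dummy")
sliced = array_1[1:3]  # new-from-template: __array_finalize__ runs here

print(type(sliced).__name__, sliced.attribute)
```

Without the hook, slices and views of a DummyArray would still be DummyArray instances but would lack the attribute.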

The example InfoArray() provided in the link correctly works as expected, without the hack of having to specify the dtype of the newly created array as argument:

shape1 = (2,3)
array_1 = InfoArray(shape1)
print type(array_1)
#<class '__main__.InfoArray'>

shape2 = (1,2)
# use InfoArray here too; dummy_class needs a second argument
array_2 = InfoArray(shape2)
the_problem = numpy.array([array_1, array_2])
print type(the_problem)
#<type 'numpy.ndarray'>

print type(the_problem[0])
#<class '__main__.InfoArray'>

Moreover, it is useful to subclass a numpy array and to aggregate many such arrays into a larger array like the_problem above, provided the resulting aggregate is a numpy array that is not of dtype object.

As an example, say that array_1 and array_2 have the same shape:

shape = (2,3)
array_1 = InfoArray(shape)
array_2 = InfoArray(shape)
the_problem = numpy.array([array_1, array_2])

Now the dtype of the_problem is not object, and you can efficiently calculate, for example, the minimum with the_problem.min(). You can't do this with a list of your subclassed numpy arrays.
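The same-shape case can be sketched in Python 3 as follows. InfoArraySketch is a hypothetical, simplified stand-in for the documentation's InfoArray, and the fill values are made up for illustration. Note that numpy.array() with its default subok=False returns a base ndarray, so in this sketch the rows of the stacked array are plain ndarrays.

```python
import numpy as np


class InfoArraySketch(np.ndarray):
    """Hypothetical, simplified stand-in for the documentation's InfoArray."""

    def __new__(cls, shape, info=None):
        obj = super().__new__(cls, shape, dtype=float)
        obj.fill(0.0)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self.info = getattr(obj, "info", None)


shape = (2, 3)
array_1 = InfoArraySketch(shape, info="first")
array_2 = InfoArraySketch(shape, info="second")
array_1.fill(1.0)   # illustrative values only
array_2.fill(-2.0)

# Equal shapes let NumPy merge the inputs into one contiguous numeric array.
stacked = np.array([array_1, array_2])
print(stacked.dtype, stacked.shape)  # float64 (2, 2, 3)
print(stacked.min())                 # -2.0
print(type(stacked[0]).__name__)     # ndarray
```

The reduction runs over one contiguous float buffer, which is where the speed comes from. The per-array info values are not carried into the stacked array.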
