I would like to handle class attributes without going through a Python for loop. To handle large arrays, numpy is the best/fastest option, but is it possible to access class attributes within a numpy array? Consider the following simplistic code:

import numpy as np

class MyClass():
    def __init__(self):
        self.myvar1 = 10
        self.myvar2 = 20

myarray1 = np.arange(0, 1000, 1)
myarray2 = np.array([MyClass() for i in range(1000)])

All the values of myarray1 would be easily modifiable through one line:

myarray1 += 5

But how can I access myvar1 of all of the MyClass instances in myarray2 and modify it in one go? (is it even possible?) I know that the following does not work but it gives the idea of what I want to achieve:

myarray2.myvar1 += 5
myarray2[myarray2.myvar1] += 5

I have been looking around a lot to find a solution and the closest thing I could find is numpy's recarray that can kind of mimic Python classes, but it does not seem to be a solution for me as the class I am using is a subclass (a pyglet Sprite to be exact) so I do need to use a Python class.

Edit

Following up on hpaulj's comment, I am trying to use a vectorized method of the class to update its attribute. Is this an efficient way of updating all the instances of the class?

class MyClass():
    def __init__(self):
        self.myvar1 = 10
        self.myvar2 = 20
    def modifyvar(self):
        self.myvar1 += 5
        return self

vecfunc = np.vectorize(MyClass.modifyvar)
myarray2 = np.array([MyClass() for i in range(1000)])
myarray2 = vecfunc(myarray2)

However, another problem arises: when I use this code, myarray2[0].myvar1 returns 20 instead of 15! myarray2[1].myvar1 does return 15, and the same goes for the rest of the array. Why is myarray2[0] different here?


Solution

Vectorizing a function of the class allows handling the attribute of several of its instances without a for loop. The code of the solution:

class MyClass():
    def __init__(self):
        self.myvar1 = 10
        self.myvar2 = 20
    def modifyvar(self):
        self.myvar1 += 5
        return self

vecfunc = np.vectorize(MyClass.modifyvar, otypes=[object])
myarray2 = np.array([MyClass() for i in range(1000)])
vecfunc(myarray2)

Note: add otypes=[object] when using vectorize and dealing with objects.

  • Is the only reason that you want to avoid a for loop because you hope/expect it to be faster without one? Commented Feb 27, 2015 at 21:23
  • It is one of the main reasons but, even if it does not improve performance, I would still like to know if it is possible to access attributes of an object within a numpy array. Commented Feb 27, 2015 at 21:29
  • myarray2 has dtype=object. So each entry in its data buffer is a pointer to an instance elsewhere in memory. The vectorized (using compiled code) operations for that type of array are quite rudimentary. Commented Feb 27, 2015 at 22:39
  • recarray is an overlay on structured arrays. The data is stored as constant size 'tuples' (as opposed to numbers or pointers). With a structured array you could use myarray3['myvar1'] +=1 Commented Feb 27, 2015 at 22:51
  • Thank you for the help. I don't see how a recarray could update all of the python class instances at once, could you explain a bit? For vectorized operations, please see my edit and let me know if I am going in the right direction. Commented Feb 27, 2015 at 23:47
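
The structured-array suggestion from the comments above can be sketched roughly as follows. The dtype layout (two 64-bit integer fields named after the class attributes) is an assumption made for illustration. Note that this keeps the data in the array itself rather than in Python instances, so on its own it does not update existing objects such as pyglet Sprites.

```python
import numpy as np

# Assumed layout: one integer field per attribute of MyClass.
dt = np.dtype([("myvar1", np.int64), ("myvar2", np.int64)])

# 1000 records, initialised to the same values MyClass.__init__ uses.
myarray3 = np.zeros(1000, dtype=dt)
myarray3["myvar1"] = 10
myarray3["myvar2"] = 20

# A single compiled operation updates the 'myvar1' field of every record.
myarray3["myvar1"] += 5
```

Selecting a field by name returns a view on the underlying buffer, which is why the in-place addition changes the stored records.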

1 Answer

The extra application of modifyvar to the 1st element results from vectorize trying to determine the type of array to return. Specifying the otypes gets around that problem:

vecfunc = np.vectorize(MyClass.modifyvar,otypes=[object])

With this 'inplace' modifier, you don't need to pay attention to what is returned:

vecfunc(myarray2)

is sufficient.

From the vectorize documentation:

The data type of the output of vectorized is determined by calling the function with the first element of the input. This can be avoided by specifying the otypes argument.
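
A small sketch of the behaviour described in the quoted documentation, counting how often the method is invoked with and without otypes. The Counter class and the run helper are hypothetical names introduced only for this illustration; the exact number of extra calls without otypes may depend on the NumPy version.

```python
import numpy as np

class Counter:
    """Hypothetical stand-in for MyClass that also counts method calls."""
    calls = 0

    def __init__(self):
        self.myvar1 = 10

    def modifyvar(self):
        Counter.calls += 1
        self.myvar1 += 5
        return self

def run(n, **kwargs):
    """Apply a vectorized modifyvar to n fresh instances; return (calls, values)."""
    Counter.calls = 0
    arr = np.array([Counter() for _ in range(n)], dtype=object)
    np.vectorize(Counter.modifyvar, **kwargs)(arr)
    return Counter.calls, [x.myvar1 for x in arr]

# Without otypes, vectorize probes the first element to infer the output type.
calls_default, values_default = run(5)

# With otypes given, no probing call is needed.
calls_otypes, values_otypes = run(5, otypes=[object])

print(calls_default, values_default)
print(calls_otypes, values_otypes)
```

With otypes supplied, each instance is modified exactly once; without it, the first instance may be modified an extra time, which matches the symptom in the question.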

If you defined an add5 method like:

    def add5(self):
        self.myvar1 += 5
        return self.myvar1

then

vecfunc = np.vectorize(MyClass.add5,otypes=[int])
vecfunc(myarray2)

would return a numeric array and modify the instances in myarray2 at the same time (output shown for a 10-element myarray2):

array([15, 15, 15, 15, 15, 15, 15, 15, 15, 15])

To display the values I use:

[x.myvar1 for x in myarray2]

I really should define a vectorized 'print'.
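
One possible take on such a vectorized accessor is sketched below. The helper name get_myvar1 is hypothetical, and the array is kept at 10 elements to match the output shown above.

```python
import numpy as np

class MyClass:
    def __init__(self):
        self.myvar1 = 10
        self.myvar2 = 20

def get_myvar1(obj):
    """Read-only accessor; does not modify the instance."""
    return obj.myvar1

# otypes is given explicitly so vectorize does not probe the first element.
vec_get_myvar1 = np.vectorize(get_myvar1, otypes=[int])

myarray2 = np.array([MyClass() for _ in range(10)])
values = vec_get_myvar1(myarray2)
print(values)
```

Like the other vectorize calls here, this still visits the instances one by one in Python; it only replaces the list comprehension with array notation.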

This looks like one of the better applications of vectorize. It doesn't give you any compiled speed, but it does let you use the array notation and broadcasting while operating on your instances one by one. For example, vecfunc(myarray2.reshape(2,5)) returns a (2,5) array of values.
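
The reshape example can be tried with a self-contained sketch like the one below, which assumes a 10-element array and the add5 method defined earlier in this answer.

```python
import numpy as np

class MyClass:
    def __init__(self):
        self.myvar1 = 10
        self.myvar2 = 20

    def add5(self):
        self.myvar1 += 5
        return self.myvar1

vecfunc = np.vectorize(MyClass.add5, otypes=[int])
myarray2 = np.array([MyClass() for _ in range(10)])

# The reshaped array refers to the same instances, so they are modified too.
result = vecfunc(myarray2.reshape(2, 5))
print(result.shape)
```

The returned array follows the shape of the input, while the original 1-D myarray2 still holds the same, now updated, instances.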


3 Comments

Very nice! It works and the performance is greatly improved when compared to using a for loop. Thank you :)
Might have overstated it for the performance, needs further testing. But it is still a nice way to handle classes
wiki.scipy.org/Cookbook/Obarray - suggestion to store instance attributes in a record array, and create instances on the fly to access methods.
