I am working with large arrays that represent a grid; each element is a Cell object with x and y attributes.

I am not sure what the most efficient way to initialize the arrays is. My basic implementation is:

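import numpy

# X,Y dimensions of grid:
Gx = 3000
Gy = 4000

# Array to create (Cell is my own class, definition not shown here)
A = numpy.ndarray(shape=(Gx, Gy), dtype=object)

# Fill every position with a new Cell, one element at a time
for y in range(Gy):
    for x in range(Gx):
        c = Cell(1, x, y, 1)
        A[x, y] = c  # plain indexing; ndarray.itemset was removed in NumPy 2.0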

Clearly, this is not efficient for large arrays. I know how to create a large array of objects and use vectorize to access them all at once. What I can't figure out is how to apply an array of indices (via A.indices) in a single function call that doesn't require iterating over the entire array.

Each Cell object has setX and setY methods. Can I pass those functions the array of indices to set each cell's y value in a single line?

  • Please give us a minimal working example. We don't know what Gy and Gx are, or why you always create the list R without using it. Commented Oct 28, 2018 at 17:39
  • 'Efficient' in numpy means doing stuff in compiled numpy code, which is built around numeric dtypes. Your array of objects is object dtype. numpy iterates over those objects much like Python does with a list of the same, but numpy's iteration is slower. We might be able to suggest improvements to a working list-based example, but can't promise numpy-like efficiency. Commented Oct 28, 2018 at 17:46
  • Related post: stackoverflow.com/questions/32831839/…; stackoverflow.com/questions/42067429/… Commented Oct 28, 2018 at 18:06
  • Reviewing my earlier answers, it's apparent that np.frompyfunc is the fastest tool for iterating over an array of objects. It can be used to create arrays of objects, and to access their attributes and methods. Speed is comparable to a well-written list comprehension over the same number of objects. Commented Oct 28, 2018 at 18:26
  • Updated the code to a minimal working example. Can you give an example of using np.frompyfunc? Commented Oct 28, 2018 at 18:36

1 Answer

Define a simple class:

class Cell():
    def __init__(self,x,y):
        self.x=x
        self.y=y
    def setX(self,x):
        self.x=x
    def __repr__(self):
        return f'Cell({self.x},{self.y})'

A way of creating an array of these objects:

In [653]: f = np.frompyfunc(Cell, 2, 1)
In [654]: arr = f(np.arange(3)[:,None], np.arange(4))
In [655]: arr
Out[655]: 
array([[Cell(0,0), Cell(0,1), Cell(0,2), Cell(0,3)],
       [Cell(1,0), Cell(1,1), Cell(1,2), Cell(1,3)],
       [Cell(2,0), Cell(2,1), Cell(2,2), Cell(2,3)]], dtype=object)
In [656]: arr.shape
Out[656]: (3, 4)

A list-based way of creating the same objects (note that this nesting order gives 4 rows of 3, the transpose of the array above):

In [658]: [[Cell(i,j) for i in range(3)] for j in range(4)]
Out[658]: 
[[Cell(0,0), Cell(1,0), Cell(2,0)],
 [Cell(0,1), Cell(1,1), Cell(2,1)],
 [Cell(0,2), Cell(1,2), Cell(2,2)],
 [Cell(0,3), Cell(1,3), Cell(2,3)]]

Some comparative timings:

In [659]: timeit arr = f(np.arange(3)[:,None], np.arange(4))
13.5 µs ± 73.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [660]: timeit [[Cell(i,j) for i in range(3)] for j in range(4)]
8.3 µs ± 115 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [661]: timeit arr = f(np.arange(300)[:,None], np.arange(400))
64.9 ms ± 293 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [662]: timeit [[Cell(i,j) for i in range(300)] for j in range(400)]
78 ms ± 2.51 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

For the larger 300×400 grid, the frompyfunc approach has a modest speed advantage (about 65 ms versus 78 ms); for the tiny 3×4 grid the list comprehension is faster.

Fetching the x value from all cells:

In [664]: np.frompyfunc(lambda c: c.x, 1, 1)(arr)
Out[664]: 
array([[0, 0, 0, 0],
       [1, 1, 1, 1],
       [2, 2, 2, 2]], dtype=object)
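
The fetched values come back as an object dtype array. A minimal sketch, assuming the Cell class from this answer (redefined here so the snippet runs on its own), of converting that result to a numeric array with astype. Using operator.attrgetter instead of the lambda is only a stylistic choice on my part:

```python
import operator

import numpy as np


class Cell:
    def __init__(self, x, y):
        self.x = x
        self.y = y


# Build a (3, 4) object array of Cell instances, as in the answer
make_cells = np.frompyfunc(Cell, 2, 1)
arr = make_cells(np.arange(3)[:, None], np.arange(4))

# frompyfunc always returns object dtype ...
get_x = np.frompyfunc(operator.attrgetter("x"), 1, 1)
xs_obj = get_x(arr)

# ... so cast explicitly when numeric operations are needed afterwards
xs = xs_obj.astype(int)
print(xs_obj.dtype, xs.dtype)
```

Once the values are in a numeric array, ordinary vectorized numpy operations apply to them.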

Using the setX method:

In [665]: np.frompyfunc(Cell.setX, 2, 1)(arr, np.arange(12).reshape(3,4))
Out[665]: 
array([[None, None, None, None],
       [None, None, None, None],
       [None, None, None, None]], dtype=object)
In [666]: arr
Out[666]: 
array([[Cell(0,0), Cell(1,1), Cell(2,2), Cell(3,3)],
       [Cell(4,0), Cell(5,1), Cell(6,2), Cell(7,3)],
       [Cell(8,0), Cell(9,1), Cell(10,2), Cell(11,3)]], dtype=object)

setX doesn't return anything, so the array produced by the function call is all None. But it has modified every element of arr. As with list comprehensions, we don't normally use frompyfunc calls for side effects, but it is possible.
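
The same pattern can take its values from an index grid, which is what the question asks about. A rough sketch, assuming a setY method that mirrors setX (the Cell class in this answer only defines setX, so setY is my addition) and using np.indices to build the per-cell column index:

```python
import numpy as np


class Cell:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def setX(self, x):
        self.x = x

    def setY(self, y):  # assumed method, not part of the answer's class
        self.y = y


arr = np.frompyfunc(Cell, 2, 1)(np.arange(3)[:, None], np.arange(4))

# np.indices returns one index array per axis, each with arr's shape
rows, cols = np.indices(arr.shape)

# Call setY on every cell, passing the matching entry of cols * 10;
# the returned array of None is discarded
set_y = np.frompyfunc(Cell.setY, 2, 1)
_ = set_y(arr, cols * 10)

# Read the values back to confirm the cells were modified in place
ys = np.frompyfunc(lambda c: c.y, 1, 1)(arr).astype(int)
print(ys)
```

This is still a Python-level loop over every cell; it only hides the iteration inside one call.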

np.vectorize, in its default (and original) form, just uses frompyfunc and adjusts the dtype of the return value. frompyfunc always returns object dtype. Newer versions of vectorize have a signature parameter, which lets us pass arrays (as opposed to scalars) to the function and get arrays back. But this processing is even slower.
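
A short sketch of both forms of np.vectorize, assuming the same Cell class (redefined so the snippet is self-contained). The otypes argument fixes the return dtype; the signature example uses np.sum purely as an illustration:

```python
import numpy as np


class Cell:
    def __init__(self, x, y):
        self.x = x
        self.y = y


arr = np.frompyfunc(Cell, 2, 1)(np.arange(3)[:, None], np.arange(4))

# Default form: scalar in, scalar out; otypes sets the result dtype
get_x = np.vectorize(lambda c: c.x, otypes=[int])
xs = get_x(arr)

# signature form: the function receives whole 1-D rows instead of scalars
row_sum = np.vectorize(np.sum, signature="(n)->()")
totals = row_sum(np.arange(6).reshape(2, 3))

print(xs.dtype, totals)
```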

Defining arrays of objects like this may make your code look cleaner and better organized, but they can never match numeric numpy arrays in terms of speed.
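
For comparison, one possible numeric layout is sketched below. It is not something the question asked for: it assumes the per-cell data can be reduced to plain integer fields named x and y, which may not hold for a richer Cell class. A structured array keeps both fields in a single array:

```python
import numpy as np

Gx, Gy = 300, 400

# One record per grid position, with integer fields x and y
grid = np.zeros((Gx, Gy), dtype=[("x", np.int64), ("y", np.int64)])

# Fill both fields from the index grid without a Python-level loop
grid["x"], grid["y"] = np.indices((Gx, Gy))

# Updates run in compiled code over the whole field at once
grid["y"] *= 2

print(grid[5, 7])
```

Here the initialization and the update are each a single array operation rather than one Python call per cell.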


Given the definition of Cell I can set the attributes to arrays, e.g.

Cell(np.arange(3), np.zeros((3,4)))

But to give each Cell in an array its own array value, I have to construct an object array first:

In [676]: X = np.zeros(3, object)
In [677]: for i,row in enumerate(np.arange(6).reshape(3,2)): X[i]=row
In [678]: X
Out[678]: array([array([0, 1]), array([2, 3]), array([4, 5])], dtype=object)
In [679]: np.frompyfunc(Cell.setX, 2, 1)(arr, X[:,None])
Out[679]: 
array([[None, None, None, None],
       [None, None, None, None],
       [None, None, None, None]], dtype=object)
In [680]: arr
Out[680]: 
array([[Cell([0 1],0), Cell([0 1],1), Cell([0 1],2), Cell([0 1],3)],
       [Cell([2 3],0), Cell([2 3],1), Cell([2 3],2), Cell([2 3],3)],
       [Cell([4 5],0), Cell([4 5],1), Cell([4 5],2), Cell([4 5],3)]],
      dtype=object)

I could not pass a (3,2) array:

In [681]: np.frompyfunc(Cell.setX, 2, 1)(arr, np.arange(6).reshape(3,2))
ValueError: operands could not be broadcast together with shapes (3,4) (3,2) 

numpy preferentially works with multidimensional (numeric) arrays. Creating and using object dtype arrays requires some special tricks.
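
A small sketch of why the object array works where the (3,2) array does not: what matters is the outer shape that takes part in broadcasting. The names here are illustrative only, and np.broadcast is used just to inspect the resulting shape:

```python
import numpy as np

rows = np.arange(6).reshape(3, 2)

# Preallocate a 1-D object array and fill it row by row, so numpy
# does not collapse the rows back into a (3, 2) numeric array
X = np.empty(len(rows), dtype=object)
for i, row in enumerate(rows):
    X[i] = row

# Stand-in for the (3, 4) array of cells
target = np.empty((3, 4), dtype=object)

# (3, 4) with (3, 1) broadcasts; each row array is one opaque element
ok_shape = np.broadcast(target, X[:, None]).shape

# (3, 4) with (3, 2) does not broadcast
try:
    np.broadcast(target, rows)
    mismatch = False
except ValueError:
    mismatch = True

print(ok_shape, mismatch)
```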
