
In the snapshot below, I compare the speed of

  • modifying an existing array via slice assignment
  • just returning a new, modified array

It seems that the latter is faster. Why should this be the case?


EDIT: Updated with suggestions, and a version that uses numpy's vectorized add(), which is now the fastest.

[Screenshot: benchmark code and timing results]
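The screenshot itself is not reproduced here. The sketch below is only a guess at what the benchmark looked like, reusing the finline and freturn names from the answer below; the fadd wrapper, the value of N, and the timing harness are my own assumptions.

import numpy as np
import timeit

N = 1_000_000
x = np.random.rand(N)
y = np.zeros(N)

def finline(x, y):
    # modify an existing array in place via slice assignment
    y[:] = x + 1.0

def freturn(x):
    # just return a new, modified array
    return x + 1.0

def fadd(x, y):
    # numpy's vectorized add(), writing the result directly into y
    np.add(x, 1.0, out=y)

for name, call in [("finline", lambda: finline(x, y)),
                   ("freturn", lambda: freturn(x)),
                   ("np.add", lambda: fadd(x, y))]:
    ms = timeit.timeit(call, number=100) / 100 * 1e3
    print(f"{name}: {ms:.2f} ms per call")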

2 Answers


I don't know much about Python/NumPy internals, but here's what I assume is happening. Just looking at the code, I get the impression that finline does more work than freturn, since finline performs everything freturn does (the x + 1.0) and more.

Maybe this explains what's going on:

>>> x = np.random.rand(N)
>>> y = np.zeros(N)
>>> super(np.ndarray, y).__repr__()
'<numpy.ndarray object at 0x24c9c80>'
>>> finline(x, y)
>>> y     # see that y was modified
array([ 1.92772158,  1.47729293,  1.96549695, ...,  1.37821499,
        1.8672971 ,  1.17013856])
>>> super(np.ndarray, y).__repr__()
'<numpy.ndarray object at 0x24c9c80>'  # address of y did not change
>>> y = freturn(x)
>>> super(np.ndarray, y).__repr__()
'<numpy.ndarray object at 0x24c9fc0>'  # address of y changed

So essentially, I think finline does more work because x + 1.0 first produces a new temporary array, and the slice assignment then has to copy every element of that temporary into y's existing buffer. On the other hand, y = freturn(x) probably just rebinds the name y to the array produced by the x + 1.0 operation, with no extra copy.
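A minimal check of the same point, using id() instead of the default __repr__ (a small array here purely for readability; the exact addresses are not the point):

import numpy as np

x = np.random.rand(5)
y = np.zeros(5)

before = id(y)
y[:] = x + 1.0           # x + 1.0 builds a temporary array, then its data is copied into y
assert id(y) == before   # y is still the same object; only its buffer contents changed

tmp = x + 1.0
y = tmp                  # no copy: the name y is simply rebound to the new array
assert y is tmp and id(y) != before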


7 Comments

Ok, I think you're saying that the inline version is doing an extra copy, i.e. the RHS of the calculation produces an intermediate that gets assigned to the memory of inline-y as a separate step? Perhaps. I am looking at the output of dis.dis() now...
@cjrh, yes, there will be an intermediate in this case. To avoid the intermediate you can do this: y[:] = x; y += 1.0. Is it faster?
@cjrh: That's right. I assume that this is sort of like the difference between copy and move constructors in C++.
@RomanL: I added a version for your suggestion. It appears to be slower than the others. I also added a np.add() version, which is now the fastest.
@cjrh: I don't see a reason why it would be slower, and it is faster on my machine. Of course, np.add is the way to go.
  • x + 1: creates a new array.
  • y[:] = x + 1: creates a new temporary array and copies all of its data into y.
  • y = x + 1: creates a new array and binds the name y to this new array.
  • np.add(x, 1, out=y): does not create a new array; it writes directly into y and is the fastest.

Here is the code:

import numpy as np

x = np.zeros(1000000)
y = np.zeros_like(x)
%timeit x + 1
%timeit y[:] = x + 1
%timeit np.add(x, 1, out=y)

The output:

100 loops, best of 3: 4.2 ms per loop
100 loops, best of 3: 6.83 ms per loop
100 loops, best of 3: 2.5 ms per loop
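For completeness, here is a sketch of the two-step in-place variant suggested in the comments on the other answer (y[:] = x followed by y += 1.0). It avoids the temporary array but still makes two passes over the data, which is consistent with np.add(..., out=y) being the fastest; the timing harness below is my own, so run it yourself to compare on your machine.

import numpy as np
import timeit

x = np.zeros(1000000)
y = np.zeros_like(x)

def two_step(x, y):
    y[:] = x             # copy x into y's existing buffer
    y += 1.0             # add in place; no temporary array

def out_add(x, y):
    np.add(x, 1, out=y)  # single pass, writes directly into y

for name, f in [("y[:] = x; y += 1.0", two_step), ("np.add(x, 1, out=y)", out_add)]:
    ms = timeit.timeit(lambda: f(x, y), number=100) / 100 * 1e3
    print(f"{name}: {ms:.2f} ms per loop")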

1 Comment

Thanks! I got there myself eventually, but have some points anyway :)
