
I was under the impression that numpy would be faster for list operations, but the following example seems to indicate otherwise:

import numpy as np
import time

def ver1():
    a = [i for i in range(40)]
    b = [0 for i in range(40)]
    for i in range(1000000):
        for j in range(40):
            b[j]=a[j]

def ver2():
    a = np.array([i for i in range(40)])
    b = np.array([0 for i in range(40)])
    for i in range(1000000):
        for j in range(40):
            b[j]=a[j]

t0 = time.time()
ver1()
t1 = time.time()
ver2()
t2 = time.time()

print(t1-t0)
print(t2-t1)

Output is:

4.872278928756714
9.120521068572998

(I'm running 64-bit Python 3.4.3 on Windows 7, on an i7 920.)

I do understand that this isn't the fastest way to copy a list, but I'm trying to find out if I'm using numpy incorrectly. Or is it the case that numpy is slower for this kind of operation and is only more efficient in more complex operations?

EDIT:

I also tried the following, which just does a direct copy via b[:] = a, and numpy is still twice as slow:

import numpy as np
import time

def ver6():
    a = [0] * 40
    b = [0] * 40
    for i in range(1000000):
        b[:] = a

def ver7():
    a = np.array([0] * 40)
    b = np.array([0] * 40)
    for i in range(1000000):
        b[:] = a

t0 = time.time()
ver6()
t1 = time.time()
ver7()
t2 = time.time()

print(t1-t0)
print(t2-t1)

Output is:

0.36202096939086914
0.6750380992889404
Comments

  • NumPy newbie rule of thumb: if your code has the word for in it, you're not getting the benefits of NumPy there.
  • Constructing the numpy arrays takes some time. After they're constructed, however, further operations are much quicker than using a vanilla Python list. Since you are constructing two new numpy arrays in every loop iteration, it only makes sense that it would be much slower than using Python lists.
  • @pzp No, the numpy arrays are only created once.
  • @pzp Suspecting exactly that, I changed the code to construct the arrays outside of the functions, reduced it to a single function, and timed it without that factor. Still the same.
  • @roganjosh Are you sure? I just timed it myself without including the array construction and with user2357112's proper use of numpy, and numpy killed vanilla; it was not even close. Also, this is a terribly constructed test: the results are being cached.

2 Answers


You're using NumPy wrong. NumPy's efficiency relies on doing as much work as possible in C-level loops instead of interpreted code. When you do

for j in range(40):
    b[j]=a[j]

That's an interpreted loop, with all the usual interpreter overhead and then some: NumPy's indexing logic is far more complex than a list's, and NumPy has to create a new element wrapper object on every single element retrieval. You're not getting any of the benefits of NumPy when you write code like this.
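You can observe that wrapper-object creation directly (a quick sketch of mine, not part of the original answer): every scalar read from a NumPy array produces a brand-new object, while a list simply hands back the pointer it already stores.

import numpy as np

a = np.arange(40)
lst = list(range(40))

print(type(a[0]))        # <class 'numpy.int64'> (int32 on some platforms): a boxed C value
print(a[0] is a[0])      # False: each retrieval creates a fresh wrapper object
print(lst[0] is lst[0])  # True: the list returns the same stored PyObject both times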

You need to write the code in such a way that the work happens in C:

b[:] = a

This would also improve the efficiency of the list operation, but it's much more important for NumPy.
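To see where NumPy does win, compare an elementwise operation instead of a bare copy. A rough sketch of mine (absolute numbers will vary by machine):

import numpy as np
import timeit

a_list = list(range(1000))
b_list = list(range(1000))
a_arr = np.arange(1000)
b_arr = np.arange(1000)

# Elementwise addition: an interpreted pass for the lists,
# a single C-level loop for the arrays.
print(timeit.timeit(lambda: [x + y for x, y in zip(a_list, b_list)], number=10000))
print(timeit.timeit(lambda: a_arr + b_arr, number=10000))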


Comments

  • @L3viathan: That wouldn't help; in fact, it'd be outright wrong. Really, the arrays should be np.arange(40) and numpy.zeros([40]).
  • Hi, I tried it with b[:] = a; vanilla Python is still more than twice as fast as numpy.
  • @CaptainCodeman: That's from a combination of three factors: the inputs are fairly small, very little allocation is involved, and the Python list gets to push the work into C too. If you try it with larger arrays, or if you try a mathematical operation (say, elementwise addition), the NumPy array will be way faster.
  • @CaptainCodeman: Depends on how much math you're doing, and how well you take advantage of NumPy's features when you're doing that math. Even for arrays of this size, NumPy is way faster than Python built-in data types for math.
  • @roganjosh: It's worth noting that for actual math, NumPy starts winning at a much lower array length. For example, a+b with arrays beats [x+y for x, y in zip(a, b)] for lists at a length of about 10, and numpy.log(a) beats [math.log(x) for x in a] at a length of about 7.
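Those crossover lengths are easy to check yourself. Here is a rough sketch (mine, not the commenter's benchmark; the exact thresholds will vary by machine):

import math
import timeit
import numpy as np

for n in (5, 7, 10, 100):
    lst = list(range(1, n + 1))
    arr = np.arange(1, n + 1)
    # Compare the list comprehension against the vectorized ufunc at each length.
    t_list = timeit.timeit(lambda: [math.log(x) for x in lst], number=100000)
    t_arr = timeit.timeit(lambda: np.log(arr), number=100000)
    print(n, t_list, t_arr)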

Most of what you are seeing is the cost of creating Python objects from native C types.

A Python list is, at its heart, an array of PyObject pointers. When a and b are both Python lists, doing b[i] = a[i] means:

  • decreasing the reference count of the object pointed by b[i],
  • increasing the reference count of the object pointed by a[i], and
  • copying the address stored in a[i] into b[i].

But if a and b are NumPy arrays, things are a little more elaborate, and the same b[i] = a[i] then requires:

  • creating a Python integer object from the native C integer stored at a[i],
  • converting that Python integer object back into a native C integer and storing its value in b[i], and
  • decreasing the reference count of the temporary Python integer object.

So the difference is mostly the cost of creating and disposing of that intermediate Python object, a cost that lists never have to pay.
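The list half of that story is easy to observe with sys.getrefcount (a small sketch of mine, not from the answer): assigning into a list copies a pointer and bumps a reference count, with no new object created.

import sys

a = [object(), object()]
b = [None, None]

before = sys.getrefcount(a[0])  # getrefcount itself adds one temporary reference
b[0] = a[0]                     # copies the pointer: no new object is made
after = sys.getrefcount(a[0])

print(before, after)  # after == before + 1: b now also references the same object
print(b[0] is a[0])   # True: both lists point at the identical PyObject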

