7

I have seen similar questions, but none that addresses this directly. I timed the following two ways of populating an array: about half the time, np.zeros() followed by a fill is faster, and half the time creating the array directly is faster. Is one of them preferable? I am quite new to NumPy arrays and started using them to speed up my code, without giving much thought to readability.

import numpy as np
import time

lis = range(100000)

timer = time.time()
list1 = np.array(lis)
print 'normal array creation', time.time() - timer, 'seconds'

timer = time.time()
list2 = np.zeros(len(lis))
list2.fill(lis)
print 'zero, fill - array creation', time.time() - timer, 'seconds'

Thank you

2
  • The pythonic way to benchmark execution speed is using the timeit module. Commented Dec 2, 2011 at 11:18
  • @mac OK, I will use that from now on. This is pretty much the first time I have needed to time or profile (cProfile) my functions. Commented Dec 2, 2011 at 11:23
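
As a rough sketch of the timeit suggestion above (written for Python 3; the helper names build_with_array and build_with_zeros are made up for illustration, and the slice assignment is an assumed stand-in for the intended "fill" step):

```python
import timeit

import numpy as np

lis = list(range(100000))


def build_with_array():
    # Direct conversion of the list.
    return np.array(lis)


def build_with_zeros():
    # Preallocate, then copy the values in (assumed intent of the question).
    out = np.zeros(len(lis))
    out[:] = lis
    return out


# repeat() runs each callable `number` times per trial; taking the minimum
# over the trials reduces noise from other processes on the machine.
number = 5
timings = {}
for func in (build_with_array, build_with_zeros):
    best = min(timeit.repeat(func, number=number, repeat=3))
    timings[func.__name__] = best / number
    print(func.__name__, timings[func.__name__], 'seconds per call')
```

Taking the best of several repeats is usually more stable than a single time.time() difference, which may explain the "half the time" results in the question.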

4 Answers

6

If you have a list of floats a=[x/10. for x in range(100000)], then you can create an array with:

np.array(a) # 9.92ms
np.fromiter(a, dtype=float) # 5.19ms (np.float was an alias for float; it was removed in NumPy 1.24)

Your approach

list2 = np.zeros(len(lis))
list2.fill(lis)

won't work as expected: .fill() sets every element of the array to a single scalar value, so it cannot be used to copy a sequence into the array.
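
A minimal sketch of the difference, assuming the goal was to copy the values of lis into a preallocated array. The slice assignment is one possible way to do that, not something taken from the question:

```python
import numpy as np

lis = list(range(5))

# fill() writes one scalar into every element.
filled = np.zeros(len(lis))
filled.fill(7)

# Slice assignment copies the sequence element by element
# into the already allocated array.
copied = np.zeros(len(lis))
copied[:] = lis

print(filled)
print(copied)
```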

2 Comments

sorry should have been more explicit, this code was just for testing the speed, it will be filled with real data when I am going to use it. (the data points will be floats)
For np.array vs. np.fromiter, I'm surprised the second is faster. If it's given an iterator, numpy won't know how much memory to allocate up front. (It must be checking whether it can get a length.) The better performance is because you're telling numpy the explicit type to use. If you passed the dtype to np.array, that would be faster still.
2

The first list can be created faster with NumPy's arange function:

list3 = np.arange(100000)

You may also find the linspace function useful.
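
A short sketch of both functions; the variable names from_list and points and the linspace arguments are only for illustration:

```python
import numpy as np

# arange builds the integer range directly, with no intermediate list.
list3 = np.arange(100000)

# The same values obtained by converting a Python range, for comparison.
from_list = np.array(range(100000))

# linspace returns a fixed number of evenly spaced points and
# includes both endpoints by default.
points = np.linspace(0.0, 1.0, 5)

print(np.array_equal(list3, from_list))
print(points)
```

arange is the natural choice when you know the step, linspace when you know how many points you want.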

2

np.fromiter will pre-allocate the output array if given the number of elements:

a = [x/10. for x in range(100000)] # 10.3ms
np.fromiter(a, dtype=float) # 3.33ms (np.float was an alias for float; it was removed in NumPy 1.24)
np.fromiter(a, dtype=float, count=100000) # 3.03ms
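
As a sketch of where count matters most: a generator has no length, so count is the only way to tell fromiter the size in advance. The names grown, prealloc and from_gen are just for this example:

```python
import numpy as np

n = 100000
a = [x / 10. for x in range(n)]

# Without count, fromiter resizes the output as it consumes the iterable.
grown = np.fromiter(a, dtype=float)

# With count, the output array is allocated once at its final size.
prealloc = np.fromiter(a, dtype=float, count=n)

# A generator avoids building the intermediate list at all;
# count supplies the length it cannot report itself.
from_gen = np.fromiter((x / 10. for x in range(n)), dtype=float, count=n)

print(np.array_equal(grown, prealloc), np.array_equal(prealloc, from_gen))
```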

1

Your list2 example simply doesn't work—if you inspect list2, you'll find that it still contains all zeroes. I find that pursuing readability is not just a good aim in and of itself. It also results in an increased likelihood of correct code.

1 Comment

Should have been a comment... :)
