Fastest way to populate a 1D NumPy array

Question

I have seen questions similar to this, but none that directly address this issue. I timed the following two ways of populating an array: about half the time np.zeros() is faster, and the other half creating the array directly is faster. Is one way preferable? I am quite new to NumPy arrays and started using them to speed up my code, without giving much thought to readability.

import numpy as np
import time

lis = range(100000)

timer = time.time()
list1 = np.array(lis)
print('normal array creation', time.time() - timer, 'seconds')

timer = time.time()
list2 = np.zeros(len(lis))
list2.fill(lis)  # note: fill() expects a single scalar; it does not copy lis element-wise
print('zero, fill - array creation', time.time() - timer, 'seconds')

Thank you

The pythonic way to benchmark execution speed is to use the timeit module. – mac, Dec 2, 2011 at 11:18
@mac OK, I will use that from now on. This is pretty much the first time I have needed to time/profile (cProfile) my functions. – Anake, Dec 2, 2011 at 11:23
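Following mac's suggestion, here is a minimal sketch of the question's benchmark rewritten with the standard-library timeit module, assuming Python 3 and a current NumPy. Because ndarray.fill only writes one scalar, the "preallocate" variant here copies the list into the zeros array with slice assignment (out[:] = lis) instead. Both variants produce float64 so the comparison is like-for-like. The function names direct and preallocate_then_copy are illustrative only.

```python
import timeit

import numpy as np

N = 100000
lis = list(range(N))  # a plain Python list (in Python 2, range() returned one)


def direct():
    # Let NumPy allocate and convert in one step (float64 to match the other variant).
    return np.array(lis, dtype=float)


def preallocate_then_copy():
    # Allocate a float64 array first, then copy element-wise via slice assignment.
    # ndarray.fill would instead write a single scalar into every element.
    out = np.zeros(len(lis))
    out[:] = lis
    return out


if __name__ == "__main__":
    for func in (direct, preallocate_then_copy):
        # Best of 5 repeats of 20 calls each, to reduce timer noise.
        best = min(timeit.repeat(func, number=20, repeat=5))
        print(f"{func.__name__}: {best / 20 * 1e3:.3f} ms per call")
```

Taking the minimum of several repeats, rather than a single time.time() difference, is what makes the "half the time faster" flip-flopping in the question less likely.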

eumiro · Accepted Answer · 2011-12-02 11:25:56Z

6

If you have a list of floats a=[x/10. for x in range(100000)], then you can create an array with:

np.array(a) # 9.92ms
np.fromiter(a, dtype=float) # 5.19ms

Your approach

list2 = np.zeros(len(lis))
list2.fill(lis)

won't work as expected: .fill sets every element of the array to one scalar value.
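To make the difference concrete, here is a small sketch contrasting fill with slice assignment, which is one common way (not the one this answer recommends) to copy a sequence into an already-allocated array. The values list is a made-up example.

```python
import numpy as np

values = [0.5, 1.5, 2.5, 3.5]  # illustrative data

# fill() broadcasts ONE scalar to every element.
a = np.zeros(len(values))
a.fill(7.0)
print(a)  # [7. 7. 7. 7.]

# To copy a sequence into an existing array, assign to a slice instead.
b = np.zeros(len(values))
b[:] = values
print(b)  # [0.5 1.5 2.5 3.5]
```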

edited Dec 2, 2011 at 11:25

answered Dec 2, 2011 at 11:15

eumiro


2 Comments

Anake Over a year ago

Sorry, I should have been more explicit: this code was just for testing the speed. When I actually use it, it will be filled with real data (the data points will be floats).

AFoglia Over a year ago

For np.array vs. np.fromiter, I'm surprised the second is faster. If it's an iterator, NumPy won't know how much memory to allocate up front. (It must be checking whether it can get a length.) The better performance comes from telling NumPy explicitly which dtype to use. If you passed the dtype to np.array, that would be faster still.
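AFoglia's suggestion can be checked with a sketch like the one below, assuming Python 3 and a recent NumPy. It uses the builtin float as the dtype, since np.float was only an alias for it and has been removed from recent NumPy releases. No timings are claimed here: which variant wins depends on the machine and NumPy version, so the script only measures.

```python
import timeit

import numpy as np

a = [x / 10. for x in range(100000)]  # the list of floats from the answer above

# Each candidate builds the same float64 array from the list `a`.
candidates = {
    "np.array(a)": lambda: np.array(a),
    "np.array(a, dtype=float)": lambda: np.array(a, dtype=float),
    "np.fromiter(a, dtype=float)": lambda: np.fromiter(a, dtype=float),
}

if __name__ == "__main__":
    for label, make in candidates.items():
        # Best of 5 repeats of 20 calls each.
        best = min(timeit.repeat(make, number=20, repeat=5))
        print(f"{label}: {best / 20 * 1e3:.3f} ms per call")
```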

vonPetrushev · 2011-12-02 11:27:49Z

2

The first array can be created faster with NumPy's arange function:

list3 = np.arange(100000)

You may also find the linspace function useful.
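As a sketch of how arange and linspace cover the cases in this thread, assuming the target is the same float values the other answers build with [x/10. for x in range(100000)]: dividing an integer arange by 10 reproduces that list without a Python-level loop. The variable names are illustrative.

```python
import numpy as np

n = 100000

# Integers 0..n-1, like np.array(range(n)) but generated directly by NumPy.
ints = np.arange(n)

# The floats [x/10. for x in range(n)], computed as one vectorized division.
floats = np.arange(n) / 10.0

# linspace takes a point COUNT instead of a step, and includes the endpoint.
# Its values match `floats` only up to floating-point rounding.
lin = np.linspace(0.0, (n - 1) / 10.0, n)
```

The NumPy documentation recommends linspace over arange with a non-integer step, because rounding can make the length of a float-stepped arange inconsistent; dividing an integer arange, as above, sidesteps that issue.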

answered Dec 2, 2011 at 11:27

vonPetrushev


kwgoodman · 2012-05-05 16:30:38Z

2

np.fromiter will pre-allocate the output array if given the number of elements via its count argument:

a = [x/10. for x in range(100000)] # 10.3ms
np.fromiter(a, dtype=float) # 3.33ms
np.fromiter(a, dtype=float, count=100000) # 3.03ms
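Building on this answer, fromiter also accepts any iterable, so if the data can be produced lazily (an assumption about the use case), a generator expression avoids building the intermediate Python list at all. This sketch combines that with count so the output is allocated once.

```python
import numpy as np

n = 100000

# Generator: values are produced one at a time, with no intermediate list.
gen = (x / 10. for x in range(n))

# count=n lets fromiter allocate the full output once up front
# instead of growing its buffer while consuming the iterator.
arr = np.fromiter(gen, dtype=float, count=n)
```

If the iterator yields fewer than count items, fromiter raises a ValueError, so count should only be passed when the length is known exactly.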

answered May 5, 2012 at 16:30

kwgoodman


Michael Hoffman · 2011-12-02 11:16:36Z

1

Your list2 example simply doesn't work: if you inspect list2, you'll find that it still contains all zeros. I find that pursuing readability is not just a good aim in and of itself; it also makes correct code more likely.

answered Dec 2, 2011 at 11:16

Michael Hoffman


1 Comment

mac Over a year ago

Should have been a comment... :)
