First make sure you understand what the variable assignment does:
my_array = numpy.empty(10000, numpy.float64)
my_array = numpy.fromiter(...)
The second assignment replaces the first: the object that my_array originally referenced is freed and garbage collected. That's just basic Python variable handling. To hold on to the original array (a mutable object), you have to change its values in place,
my_array[:] = <new values>
But the process that generates <new values> will, more than likely, create a temporary buffer (or two, or three). Those values are then copied into the target. Even x += 1 does a buffered calculation under the hood. There are few truly in-place numpy operations.
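The distinction between rebinding a name and mutating the array in place can be seen by checking object identity. A minimal sketch (the variable names here are just for illustration):

```python
import numpy as np

a = np.zeros(5)
orig_a = id(a)

# Rebinding: the name now points at a brand-new array;
# the original object is left for the garbage collector.
a = np.arange(5.0)
assert id(a) != orig_a

# Slice assignment mutates the existing buffer in place:
# the values are copied into b's own memory.
b = np.zeros(5)
orig_id = id(b)
b[:] = np.arange(5.0)
assert id(b) == orig_id

# Augmented assignment also writes back into the same array,
# even though the computation may still use temporary buffers.
b += 1
assert id(b) == orig_id
```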
Generally, trying to second-guess numpy's memory allocation doesn't pay off. Efficiency can only be measured with time tests, not by guessing what is happening under the covers.
Don't bother with 'pre-allocation' unless you need to fill the array iteratively:
In [284]: my_array = np.empty(10, int)
In [285]: for i in range(my_array.shape[0]):
...: my_array[i] = 2*i+3
In [286]: my_array
Out[286]: array([ 3, 5, 7, 9, 11, 13, 15, 17, 19, 21])
Which is a horrible way of creating an array compared to:
In [288]: np.arange(10)*2+3
Out[288]: array([ 3, 5, 7, 9, 11, 13, 15, 17, 19, 21])
The fromiter approach is prettier, but not faster:
In [290]: np.fromiter((i*2+3 for i in range(10)),int)
Out[290]: array([ 3, 5, 7, 9, 11, 13, 15, 17, 19, 21])
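As a side note, fromiter accepts a count argument. Giving it the length up front lets numpy allocate the result in one go instead of growing the array as the generator is consumed:

```python
import numpy as np

n = 10
# count=n tells fromiter exactly how much space to allocate.
arr = np.fromiter((i * 2 + 3 for i in range(n)), dtype=int, count=n)
```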
Some timings:
In [292]: timeit np.fromiter((i*2+3 for i in range(10000)),int)
100 loops, best of 3: 4.76 ms per loop
# giving a count drops the time to 4.28 ms
In [293]: timeit np.arange(10000)*2+3
The slowest run took 8.73 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 47.4 µs per loop
In [294]: %%timeit
...: my_array=np.empty(10000,int)
...: for i in range(my_array.shape[0]):
...: my_array[i] = 2*i+3
...:
100 loops, best of 3: 4.72 ms per loop
In [303]: timeit np.array([i*2+3 for i in range(10000)],int)
100 loops, best of 3: 4.48 ms per loop
fromiter takes just as long as an explicit loop, while the pure numpy solution is orders of magnitude faster. Timewise there is little difference between np.array with a list comprehension and fromiter with a generator.
Creating the array from a pre-existing list takes about 1/3 the time.
In [311]: %%timeit alist=[i*2+3 for i in range(10000)]
...: x=np.array(alist, int)
...:
1000 loops, best of 3: 1.63 ms per loop
Assigning a list to an existing empty array isn't any faster:
In [315]: %%timeit alist=[i*2+3 for i in range(10000)]
...: arr = np.empty(10000,int)
...: arr[:] = alist
1000 loops, best of 3: 1.65 ms per loop
In [316]: %%timeit alist=[i*2+3 for i in range(10000)]; arr=np.empty(10000,int)
...: arr[:] = alist
1000 loops, best of 3: 1.63 ms per loop
There are some numpy functions that take an out parameter; you may save some time by reusing an array that way. np.cross is one function that takes advantage of this internally (its code is Python and readable).
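A minimal sketch of reusing a pre-allocated array via the ufunc out parameter (the array names here are just for illustration):

```python
import numpy as np

a = np.arange(10, dtype=float)
out = np.empty_like(a)

# Write each result directly into the pre-allocated array,
# instead of letting numpy allocate a fresh one per operation.
np.multiply(a, 2.0, out=out)
np.add(out, 3.0, out=out)   # out now holds a*2 + 3
```

Whether this actually saves time still has to be confirmed with timeit; for small arrays the allocation cost is usually negligible.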
Another 'vectorized' way of creating values from a scalar function:
In [310]: %%timeit f=np.frompyfunc(lambda i: i*2+3,1,1)
...: f(range(10000))
...:
100 loops, best of 3: 8.31 ms per loop
np.fromiter doesn't do any further allocation. That's the whole essence of that function. Also, you don't need to use np.empty if you want to change all of the items at once. fromiter says that it creates an array; I assumed that it creates a numpy array and then that array gets moved by the = operator to my_array. But if you know for a fact that no new allocation is done, I will believe you.

for ind, elem in enumerate(iterable): my_array[ind] = elem

int objects.