I have recently been trying to improve the performance (and I mean processing time here) of a piece of code I wrote in Python 3.5 (on Ubuntu 16.04). My code performs a cosine Fourier transform, and I ultimately have to run it a huge number of times, so the whole thing takes many, many hours...
My laptop is a bit old, so I am not convinced multi-threading will help. Anyway, I am more interested in speeding up the calculation itself. Here is the code with my attempt at improving things.
import numpy as np
import time
import math
#== Define my two large numpy arrays ==#
a = np.arange( 200000 )
b = np.arange( 200000 )
#===============#
#== First way ==#
#===============#
t1 = time.time()
#== Loop that performs 1D array calculation 50 times sequentially ==#
for i in range(0, 50):
    a * np.cos( 2 * math.pi * i * b )
t2 = time.time()
print( '\nLoop computation with 1D arrays: ', (t2-t1)*1000, ' ms' )
#================#
#== Second way ==#
#================#
t1 = time.time()
#== One liner to use 1D and 2D arrays at once ==#
a * np.cos( 2 * math.pi * ( np.arange( 50 ) )[:, None] * b )
t2 = time.time()
print( '\nOne liner using both 1D and 2D arrays at once: ', (t2-t1)*1000, ' ms\n' )
In this case I need to perform the calculation 50 times, on large NumPy arrays. I used to do the 1D array calculation in a loop, repeating it sequentially as many times as needed.
More recently I tried to use the power of NumPy vectorization and do the whole calculation in one line with a 2D array. It turns out the 2D array calculation takes more time, as the output shows:
Loop computation with 1D arrays: 354.66670989990234 ms
One liner using both 1D and 2D arrays at once: 414.03937339782715 ms
I did not expect that. Maybe, given the large arrays, the memory overhead slows down the calculation (the 2D version builds 50 x 200000 float64 intermediates, roughly 80 MB each)? Or is my laptop's CPU simply being overwhelmed?
So my question is: what is the most performant/fastest way to proceed for this specific case?
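For what it is worth, one thing I have been meaning to try is cutting down on the temporary arrays the loop allocates on every iteration. Here is a minimal sketch (the names phase and out are just mine, and I have not benchmarked it carefully) that hoists 2*pi*b out of the loop and reuses preallocated buffers via the out= argument of the NumPy ufuncs:
#== Sketch: same loop, but with a hoisted constant and reused buffers ==#
t1 = time.time()
two_pi_b = 2 * math.pi * b        # computed once instead of 50 times
phase = np.empty( b.shape )       # scratch buffer for the phase argument
out = np.empty( b.shape )         # scratch buffer for the final product
for i in range(0, 50):
    np.multiply( two_pi_b, i, out=phase )   # phase = 2*pi*i*b, no new allocation
    np.cos( phase, out=phase )              # cosine computed in place
    np.multiply( a, phase, out=out )        # a * cos(...), written into the reused buffer
t2 = time.time()
print( '\nLoop with hoisted constant and preallocated buffers: ', (t2-t1)*1000, ' ms' )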
UPDATE: I tried Subhaneil Lahiri's Numba suggestion, adding the following lines of code and calling the function twice (still not storing any results):
#===============#
#== Third way ==#
#===============#
import numba as nb
t1 = time.time()
@nb.jit(cache=True)
def cos_matrix(a, b, niter):
    # Same calculation as the plain loop, result still discarded
    for i in range(niter):
        a * np.cos(2 * math.pi * i * b)
cos_matrix( a, b, 50 )
t2 = time.time()
print( '\nLoop computation using Numba and 1D arrays: ', (t2-t1)*1000, ' ms' )
t1 = time.time()
cos_matrix( a, b , 50 )
t2 = time.time()
print( '\nSecond call to loop computation using Numba and 1D arrays: ', (t2-t1)*1000, ' ms\n' )
Unfortunately, it does not improve the result, as you can see:
Loop computation with 1D arrays: 366.67585372924805 ms
One liner using both 1D and 2D arrays at once: 417.5834655761719 ms
Loop computation using Numba and 1D arrays: 590.1947021484375 ms
Second call to loop computation using Numba and 1D arrays: 458.58097076416016 ms
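I suspect the slower first call is mostly JIT compilation overhead, since the second call drops back close to the plain-loop timing. For completeness, here is a minimal sketch (my own guess, not benchmarked) of the same function in nopython mode with an explicit inner loop and a preallocated output array; cos_matrix_nopython is just a name I made up:
#== Sketch: nopython mode with an explicit inner loop ==#
import numba as nb
import numpy as np
import math

@nb.jit(nopython=True, cache=True)
def cos_matrix_nopython(a, b, niter):
    # Explicit element-wise loop, which is the kind of code Numba typically
    # compiles well; the result is stored instead of being thrown away.
    out = np.empty((niter, b.size))
    for i in range(niter):
        for j in range(b.size):
            out[i, j] = a[j] * math.cos(2.0 * math.pi * i * b[j])
    return out

# a and b are the arrays defined at the top of the script, cast to float
res = cos_matrix_nopython( a.astype(np.float64), b.astype(np.float64), 50 )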
Thanks a lot in advance, Antoine.