I am not sure why Python is slower than Matlab here, but the more relevant point is that the FFT, being a linear transform, has properties that render most (if not all) of your FFT operations unnecessary (see the sketch after the two functions below):
import numpy as np

def func1(U, V, dt, denom, A) :
    # Original version: round-trips through the frequency domain.
    UnVn2 = np.fft.fft(U * V**3)
    U_ = np.fft.ifft((np.fft.fft(U) / dt - UnVn2 + A) / denom)
    V_ = np.fft.ifft((np.fft.fft(V) / dt + UnVn2) / denom)
    return np.vstack((U_, V_))
def func2(U, V, dt, denom, A) :
    # Equivalent version: the scalar divisions commute with the (linear)
    # transforms, and the constant A only affects the DC term, i.e. index 0.
    UnVn2 = U * V**3
    U_ = (U / dt - UnVn2) / denom
    U_[0] += A / denom
    V_ = (V / dt + UnVn2) / denom
    return np.vstack((U_, V_))
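Why this works: the DFT is linear, so dividing by the scalars dt and denom before or after the transform gives the same result, and adding the scalar A to every frequency bin corresponds to adding an impulse of height A at index 0 in the time domain. A minimal sketch of that last identity (N, A and x are just illustrative values):

N = 8
A = 2.5
x = np.random.rand(N)

# Adding the scalar A to every frequency bin...
lhs = np.fft.ifft(np.fft.fft(x) + A)

# ...is the same as adding A to element 0 only in the time domain.
rhs = x.astype(complex)
rhs[0] += A

print(np.allclose(lhs, rhs))  # True

Testing both functions on some random data: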
U = np.random.rand(700)
V = np.random.rand(700)
dt, denom, A = tuple(np.random.rand(3))
>>> func1(U, V, dt, denom, A)
array([[ 2.35201751 -1.11022302e-16j, 0.81099082 -2.45463372e-16j,
0.48451858 +2.15658782e-18j, ..., 2.23237712 -5.24753851e-16j,
1.15264205 -2.31140087e-16j, 1.06670009 +1.28369537e-16j],
[ 2.89314136 +8.67361738e-17j, 3.65612404 -7.80625564e-17j,
3.31383830 +8.96916836e-17j, ..., 0.90415910 +6.27969898e-16j,
3.03505664 +4.72358723e-16j, 0.64669863 +4.99600361e-16j]])
>>> func2(U, V, dt, denom, A)
array([[ 2.35201751, 0.81099082, 0.48451858, ..., 2.23237712,
1.15264205, 1.06670009],
[ 2.89314136, 3.65612404, 3.3138383 , ..., 0.9041591 ,
3.03505664, 0.64669863]])
>>> np.max(np.abs(func1(U, V, dt, denom, A) - func2(U, V, dt, denom, A)))
1.5151595604785605e-15
And of course:
>>> import timeit
>>> timeit.timeit('func1(U, V, dt, denom, A)', 'from __main__ import func1, U, V, dt, denom, A', number=400)
0.14169366197616284
>>> timeit.timeit('func2(U, V, dt, denom, A)', 'from __main__ import func2, U, V, dt, denom, A', number=400)
0.06098524703428154
This is less of a difference than I was expecting, I have to admit, but func2 is still more than 2x faster.
EDIT
The speed-up from skipping the FFTs seemed too small, so I modified func1 and func2 to return the tuple (U_, V_) instead of a vstacked array, and ran the benchmark below.
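For reference, the only change is the return statement, shown here for func2 (func1 is modified identically):

def func2(U, V, dt, denom, A) :
    UnVn2 = U * V**3
    U_ = (U / dt - UnVn2) / denom
    U_[0] += A / denom
    V_ = (V / dt + UnVn2) / denom
    return U_, V_  # tuple instead of np.vstack((U_, V_))

With that change in place, I ran: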
from time import clock

U = np.zeros((700, 400), dtype=np.float)
V = np.zeros((700, 400), dtype=np.float)
U[:, 0] = np.random.rand(700)
V[:, 0] = np.random.rand(700)
dt, denom, A = tuple(np.random.rand(3))

t = clock()
for j in xrange(399) :
    U[:, j+1], V[:, j+1] = func1(U[:, j], V[:, j], dt, denom, A)
print clock() - t

t = clock()
for j in xrange(399) :
    U[:, j+1], V[:, j+1] = func2(U[:, j], V[:, j], dt, denom, A)
print clock() - t
The printed output was 11.5148652438 and 0.321673111194, so in the actual problem setting the speed-up is more like 30x. That makes sense: each call to func1 performs five length-700 transforms (three ffts and two iffts), and that cost dominates the plain arithmetic in func2.
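If you want to confirm where the time goes, a quick micro-benchmark of a single transform against a comparable amount of plain arithmetic makes it visible (the constants here are just illustrative stand-ins for dt and denom; absolute numbers will vary by machine):

import timeit

setup = 'import numpy as np; x = np.random.rand(700)'
# One FFT of the problem size...
print(timeit.timeit('np.fft.fft(x)', setup, number=400))
# ...versus roughly the arithmetic func2 does per call.
print(timeit.timeit('(x / 0.5 - x * x**3) / 0.25', setup, number=400))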
I also timed pwuertz's proposal of storing the time steps in rows instead of columns, with no significant improvement: 11.1805414552 and 0.297830755317 for the following code:
U = np.zeros((400, 700), dtype=np.float)
V = np.zeros((400, 700), dtype=np.float)
U[0] = np.random.rand(700)
V[0] = np.random.rand(700)
dt, denom, A = tuple(np.random.rand(3))

t = clock()
for j in xrange(399) :
    U[j+1], V[j+1] = func1(U[j], V[j], dt, denom, A)
print clock() - t

t = clock()
for j in xrange(399) :
    U[j+1], V[j+1] = func2(U[j], V[j], dt, denom, A)
print clock() - t
It does look much, much neater, though.
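The layout does matter in principle: in the transposed arrangement each U[j] is a contiguous view of the underlying buffer, while a column slice U[:, j] is strided. A quick way to check this on the two layouts above:

# Column slices of a C-ordered array are strided; row slices are contiguous.
U_cols = np.zeros((700, 400))   # time steps in columns (original layout)
U_rows = np.zeros((400, 700))   # time steps in rows (pwuertz's layout)

print(U_cols[:, 0].flags['C_CONTIGUOUS'])  # False
print(U_rows[0].flags['C_CONTIGUOUS'])     # True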
numpy is also capable of using vendor-optimized routines such as Intel MKL; see software.intel.com/en-us/articles/numpyscipy-with-intel-mkl. But use the optimized library most appropriate for your processor.
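To see which accelerated libraries your own numpy build is linked against, np.show_config() reports the build configuration:

import numpy as np

# Prints the BLAS/LAPACK libraries (e.g. MKL, OpenBLAS) this build uses.
np.show_config()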