Comparing Numpy and Matlab array summation speed

Question

I recently converted a MATLAB script to Python with Numpy, and found that it ran significantly slower. I expected similar performance, so I'm wondering if I'm doing something wrong.

As a stripped-down example, I manually sum a geometric series:

MATLAB version:

function s = array_sum(a, array_size, iterations)
    s = zeros(array_size);
    for m = 1:iterations
        s = a + 0.5*s;
    end
end

% benchmark code
array_size = 500;
iterations = 500;
a = randn(array_size);   % semicolons suppress echoing the 500x500 matrix
f = @() array_sum(a, array_size, iterations);
fprintf('run time: %.2f ms\n', timeit(f)*1e3);

Python/Numpy version:

import numpy as np
import timeit

def array_sum(a, array_size, iterations):
    s = np.zeros((array_size, array_size))
    for m in range(iterations):
        s = a + 0.5*s
    return s

array_size = 500
iterations = 500
a = np.random.randn(array_size, array_size)
timeit_iterations = 10
t1 = timeit.timeit(lambda: array_sum(a, array_size, iterations),
                   number=timeit_iterations)
print("run time: {:.2f} ms".format(1e3*t1/timeit_iterations))
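Because the update s = a + 0.5*s, starting from zeros, is a geometric series, after n iterations s_n = a(1 + 0.5 + ... + 0.5^(n-1)) = 2a(1 - 0.5^n). A minimal sketch that checks the loop against this closed form, which is handy for confirming that a faster rewrite still computes the same thing. The helper `closed_form` is a hypothetical name, not part of the original code, and small array sizes are used only to keep the check quick:

```python
import numpy as np

def array_sum(a, array_size, iterations):
    # Same loop as in the question: s_{m+1} = a + 0.5 * s_m, with s_0 = 0
    s = np.zeros((array_size, array_size))
    for m in range(iterations):
        s = a + 0.5 * s
    return s

def closed_form(a, iterations):
    # Hypothetical helper: sum_{k=0}^{n-1} 0.5**k = 2 * (1 - 0.5**n)
    return 2.0 * (1.0 - 0.5 ** iterations) * a

rng = np.random.default_rng(0)
a = rng.standard_normal((50, 50))
print(np.allclose(array_sum(a, 50, 30), closed_form(a, 30)))  # True
```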

On my machine, MATLAB completes in 58 ms. The Python version runs in 292 ms, or 5X slower.

I also tried speeding up the Python code by adding the Numba JIT decorator @jit('f8[:,:](i8, i8)', nopython=True), but the time only dropped to 236 ms (4X slower).

This is slower than I expected. Am I using timeit improperly? Is there something wrong with my Python code?

EDIT: edited so that the random matrix is created outside of benchmarked function.

EDIT 2: I ran the benchmark using Torch instead of Numpy (calculating the sum as s = torch.add(s, 0.5, a)) and it runs in just 52 ms on my computer!

You have nopython=True, but aren't you using NumPy funcs there?
– Divakar, Commented Sep 6, 2017 at 6:00
@Divakar I think recent Numba versions support some of the array allocation functions. @LorenzForvang your test only performs element-wise operations, which are not implemented in BLAS as far as I know (that is not to say they should be slower in NumPy).
– MB-F, Commented Sep 6, 2017 at 6:01
@Divakar yes, I'm using NumPy functions there. This page (numba.pydata.org/numba-doc/dev/reference/numpysupported.html) lists many NumPy functions that are supported by Numba in nopython mode. But the run time is the same whether I set nopython to True or False.
– Lorenz Forvang, Commented Sep 6, 2017 at 6:06
Btw, fun fact: I changed the iteration to perform a matrix product, s = r * s in Matlab and s = r @ s in Python. Matlab was still faster, but only by a factor of 1.5.
– MB-F, Commented Sep 6, 2017 at 6:20

seekiu · Accepted Answer · 2017-09-13 14:14:35Z

In my experience, when using Numba's jit, it's usually faster to expand array operations into explicit loops. So I tried rewriting your Python function as:

import numpy as np
from numba import jit

@jit(nopython=True, cache=True)
def array_sum_numba(a, array_size, iterations):
    s = np.zeros((array_size, array_size))
    for m in range(iterations):
        # Explicit element-wise loop; Numba compiles it to machine code
        for i in range(array_size):
            for j in range(array_size):
                s[i,j] = a[i,j] + 0.5 * s[i,j]
    return s

Out of curiosity, I also tested @percusse's version, with a small modification to the parameters:

def array_sum2(r, array_size, iterations):
    s = np.zeros((array_size, array_size))
    for m in range(iterations):
        s /= 2
        s += r
    return s

The test results on my machine are:

original version run time: 143.83 ms
numba jitted loop version run time: 26.99 ms
@percusse's version run time: 61.38 ms

This result is in line with my expectations. It's worth mentioning that I increased the timeit iterations to 50, which significantly reduced the measured time for the Numba version, since the one-time compilation on the first call is spread over more runs.

In summary: the Python code can still be significantly accelerated if you use Numba's jit and write the function with explicit loops. I don't have Matlab on my machine to test, but my guess is that the Numba version would be faster than Matlab.

percusse · Accepted Answer · 2017-09-06 17:05:30Z


Since you repeatedly update the same variable, this is a good fit for in-place operations, so you can rewrite your function as:

def array_sum2(array_size, iterations):
    s = np.zeros((array_size, array_size))
    r = np.random.randn(array_size, array_size)
    for m in range(iterations):
        s /= 2
        s += r
    return s

This has given the following speed benefit on my machine compared to array_sum

run time: 157.32 ms
run time2: 672.43 ms
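The same in-place idea can also be written with the `out=` argument of NumPy ufuncs, so each iteration writes straight into `s` instead of allocating the temporaries that `a + 0.5*s` creates. A minimal sketch, with `a` passed in as in the question's edited version; the name `array_sum_inplace` is hypothetical, and whether it beats `s /= 2; s += r` depends on the machine and should be measured:

```python
import numpy as np

def array_sum_inplace(a, array_size, iterations):
    """Hypothetical variant: same recurrence s = a + 0.5*s, updated in place."""
    s = np.zeros((array_size, array_size))
    for m in range(iterations):
        np.multiply(s, 0.5, out=s)  # s <- 0.5 * s, no new array allocated
        np.add(s, a, out=s)         # s <- s + a, written back into s
    return s

a = np.random.randn(500, 500)
s = array_sum_inplace(a, 500, 500)
```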


3 Comments

Lorenz Forvang

Thank you, this sped up my code, though not quite as dramatically. Run time on my machine dropped from 292 ms to 259 ms with Numpy (4X MATLAB), and from 236 ms to 185 ms with Numpy+Numba (3X MATLAB).

percusse

@LorenzForvang That does sound strange. Are you using NumPy+MKL? I usually go for Cython, actually.

Lorenz Forvang

Yes, numpy.show_config() shows libraries = ['mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'iomp5', 'pthread'] for blas_mkl_info and lapack_mkl_info.

hpaulj · Accepted Answer · 2017-09-06 06:44:21Z


These times use the earlier version of array_sum, so they include the randn call as well as the summation:

In [68]: timeit array_sum(array_size, 0)
16.6 ms ± 436 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [69]: timeit array_sum(array_size, 1)
18.9 ms ± 293 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [70]: timeit array_sum(array_size, 20)
55.5 ms ± 131 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [71]: (55-16)/20
Out[71]: 1.95

So it's about 16 ms for the setup and 2 ms per iteration. The same pattern holds with 500 iterations.
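The estimate above, (55 - 16)/20 ≈ 1.95 ms, models the run time as a fixed setup cost plus a constant cost per iteration. A minimal sketch of the same two-point estimate done programmatically. The helper `per_iteration_ms` is hypothetical, and it times the question's edited `array_sum` (which takes `a` as an argument), so here "setup" covers only the `np.zeros` allocation rather than `randn`:

```python
import timeit
import numpy as np

def array_sum(a, array_size, iterations):
    s = np.zeros((array_size, array_size))
    for m in range(iterations):
        s = a + 0.5 * s
    return s

def per_iteration_ms(a, array_size, n_iter, number=10):
    """Hypothetical helper: fit t(n) = setup + n * per_iter from two timings."""
    t0 = timeit.timeit(lambda: array_sum(a, array_size, 0), number=number) / number
    tn = timeit.timeit(lambda: array_sum(a, array_size, n_iter), number=number) / number
    setup_ms = 1e3 * t0
    per_iter_ms = 1e3 * (tn - t0) / n_iter
    return setup_ms, per_iter_ms

a = np.random.randn(500, 500)
setup_ms, per_iter_ms = per_iteration_ms(a, 500, 20)
print("setup: {:.2f} ms, per iteration: {:.3f} ms".format(setup_ms, per_iter_ms))
```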

MATLAB does some JIT compilation, but I don't know whether that applies here, and I don't have MATLAB to test. In Octave (which has no timeit):

>> t = time(); array_sum(500,0); (time()-t)*1000
ans =  13.704
>> t = time(); array_sum(500,1); (time()-t)*1000
ans =  16.219
>> t = time(); array_sum(500,20); (time()-t)*1000
ans =  82.346
>> t = time(); array_sum(500,500); (time()-t)*1000
ans =  1610.6

Octave's random is faster, but the per-iteration sum is slower.

edited Sep 6, 2017 at 6:44

answered Sep 6, 2017 at 6:34


4 Comments

MB-F

This means 1000 ms for 500 iterations and a more or less negligible 16 ms for the setup (in Matlab the picture is similar: 5 ms setup time). So most of the difference comes from the iterations, which is what the question is about, isn't it?

hpaulj

@kazemakase, is the MATLAB per-iteration time consistent regardless of whether you run 1, 20 or 500 iterations in a call?

Lorenz Forvang

@hpaulj with 500 iterations, the 16 ms setup becomes negligible. In any case, I made the random matrix a function parameter so that it doesn't affect the test. I got 58 ms in MATLAB, 292 ms in Python, and 229 ms in Python + Numba. I don't have Octave set up, but I'll trust that it's slower than MATLAB and Python.

MB-F

@hpaulj I get the following timings in Matlab: 0, 1, 20, 500 iterations -> 4.77, 5.15, 10.13, 133.6 ms. (I guess we see JIT effects for larger numbers of iterations.)
