Here I have two functions:
from functools import reduce
from operator import mul

def fn1(arr):
    product = 1
    for number in arr:
        product *= number
    return [product // x for x in arr]

def fn2(arr):
    product = reduce(mul, arr)
    return [product // x for x in arr]
Benchmark:
In [2]: arr = list(range(1,11))
In [3]: %timeit fn1(arr)
1.62 µs ± 23.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [4]: %timeit fn2(arr)
1.88 µs ± 28.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [5]: arr = list(range(1,101))
In [6]: %timeit fn1(arr)
38.5 µs ± 190 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [7]: %timeit fn2(arr)
41 µs ± 463 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [8]: arr = list(range(1,1001))
In [9]: %timeit fn1(arr)
4.23 ms ± 25.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [10]: %timeit fn2(arr)
4.24 ms ± 36.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [11]: arr = list(range(1,10001))
In [12]: %timeit fn1(arr)
605 ms ± 4.97 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [13]: %timeit fn2(arr)
594 ms ± 4.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Here fn2() is marginally slower on the small lists. My understanding was that reduce() and mul() are both built-in functions, so they run at C speed and should be faster than the for loop. Is it that the many extra function calls inside fn2() (each of which takes some time) hurt the overall performance? But then the trend shows that fn2() outperforms fn1() on the larger lists. Why?
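One way to test the function-call hypothesis is to time the product step on its own, without the division comprehension. Here is a minimal sketch using timeit (the helper names are mine, not from the original post):

from functools import reduce
from operator import mul
from timeit import timeit

def product_loop(arr):
    # the accumulation loop from fn1, without the final comprehension
    product = 1
    for number in arr:
        product *= number
    return product

def product_reduce(arr):
    # the reduce(mul, ...) step from fn2, without the final comprehension
    return reduce(mul, arr)

arr = list(range(1, 10001))
print(timeit(lambda: product_loop(arr), number=10))
print(timeit(lambda: product_reduce(arr), number=10))

If the two product steps differ by far less than the gap you care about, the remaining time is being spent in the list comprehension, not in the calls.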
reduce is there just in case you prefer programming in a more functional style; it's not supposed to be an optimization. Also, you have a list comprehension in the functions that is probably taking most of the time anyway, especially considering you are dividing numbers with literally thousands of digits (10,000! takes over 100,000 bits to store).

Each step of reduce() still involves a Python-level call to mul. (It seems that the standard CPython interpreter doesn't do inlining: stackoverflow.com/questions/6442050/…, note how PyPy gets a special mention for its inlining support.) You shouldn't worry too much about this; if performance on that scale matters, Python is probably the wrong tool to begin with. (Though you could try PyPy.)

If the for loop plus arithmetic ops (fn1) take less time to compute than reduce() + mul() (fn2), then that should remain true regardless of the size of the list, right? Furthermore, with more items in the list the gap between fn1 and fn2 should widen, since more function calls would require even more time. But I'm observing the opposite: with more items in the list, fn2 magically starts performing better than fn1.
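The claim about the size of the numbers is easy to verify. A small check (a sketch, not from the thread), using only the standard library:

import math
from timeit import timeit

# The product of range(1, 10001) is 10000!, far larger than a machine word,
# so every // in the comprehension is an arbitrary-precision division.
product = math.factorial(10000)
print(product.bit_length())   # roughly 118,000 bits, i.e. over 100,000

# A single big-integer division already dwarfs the cost of a function call:
print(timeit(lambda: product // 5000, number=1000))

At that size the arbitrary-precision arithmetic dominates the runtime, so the per-element call overhead that separates fn1 and fn2 becomes noise, and small differences in either direction are within measurement variance.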