3

I have to work with the learning history of a Keras model. This is a basic task, but I've measured the performance of the Python built-in min() function, the numpy.min() function, and the numpy ndarray.min() function for list and ndarray. The performance of the built-in Python min() function is nothing compared to that of Numpy for ndarray - numpy is 10 times faster (for list numpy is almost 6 times slower, but this is not the case of this question). However, the ndarray.min() method is almost twice as fast as numpy.min(). The ndarray.min() documentation refers to the numpy.amin() documentation, which according to the numpy.amin docs, is an alias for numpy.min(). Therefore, I assumed that numpy.min() and ndarray.min() would have the same performance. However, why is the performance of these functions not equal?

from timeit import default_timer
import random
a = random.sample(range(1,1000000), 10000)
b = np.array(random.sample(range(1,1000000), 10000))

def time_mgr(func):
    tms = []
    for i in range(3, 6):
        tm = default_timer()
        for j in range(10**i):
            func()
        tm = (default_timer()-tm) / 10**i * 10e6
        tms.append(tm)
    print(func.__name__, tms)

@time_mgr
def min_list():
    min(a)

@time_mgr
def np_min_list():
    np.min(a)

@time_mgr
def min_nd():
    min(b)

@time_mgr
def np_min_nd():
    np.min(b)

@time_mgr
def np_nd_min():
    b.min()

output, time in mks:

min_list [520.7690014503896, 515.3326001018286, 516.221239999868]
np_min_list [2977.614998817444, 3009.602500125766, 3014.1312699997798]
min_nd [2270.1649996452034, 2195.6873999442905, 2155.1631700014696]
np_min_nd [22.295000962913033, 21.675399970263243, 22.30485000181943]
np_nd_min [14.261999167501926, 12.929399963468313, 12.935079983435571]
0

2 Answers 2

2

Basically your observations are correct.

Here's my timings and notes

Create 2 arrays, one much larger, and a list:

In [254]: a = np.random.randint(0,1000,1000); b = a.tolist()
In [255]: aa = np.random.randint(0,1000,100000)

The method is faster, by about 7µs in both cases - that's basially the overhead of the function delegating the job to the method:

In [256]: timeit a.min()
7.15 µs ± 16 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [257]: timeit np.min(a)
14.7 µs ± 204 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [258]: timeit aa.min()
49.4 µs ± 174 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [259]: timeit np.min(aa)
57.4 µs ± 141 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

The extra time for calling np.min on the list is the time required to convert the list to an array:

In [260]: timeit np.min(b)
142 µs ± 446 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [261]: timeit np.array(b)
120 µs ± 161 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

The native Python function does reasonably well with a list:

In [262]: timeit min(b)
40.7 µs ± 92 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

It is slower when applied to the array. The extra time is basically the time it takes to iterate through array, treating it as a list:

In [263]: timeit min(a)
127 µs ± 675 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [264]: timeit min(list(a))
146 µs ± 1.43 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

tolist is a faster way to create a list from an array:

In [265]: timeit min(a.tolist())
77.1 µs ± 82 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In general, when there's a numpy function with the same name as a method, it is doing two things:

  • converting the argument(s) to array, if necessary
  • delegating the actual calculation to the method.

Converting a list to an array takes time. Whether that extra time is worth it depends on the following task.

Conversely, treating an array as a list, is usually slower.

In [270]: timeit [i for i in b]
50 µs ± 203 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [271]: timeit [i for i in a]
126 µs ± 278 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

The actual item created in an iteration is different:

In [275]: type(b[0]), type(a[0])
Out[275]: (int, numpy.int32)

b[0] is the same object that is in b. That is, b contains references to int objects. a[0] is a new np.int32 object each time it's called, with a new id. That 'unboxing' takes time.

In sum, if you already have an array, it's fastest to use the method. But if for clarity or generality, using the function instead is no big deal, especially if the array is large. Treating an array as a list is usually slower. If you are starting with list, using the native python function is usually the best - if available.

Sign up to request clarification or add additional context in comments.

Comments

2

Lists are horribly inefficient as well as objects : they store FP number as objects and lists contains pointers to these objects. It prevent all optimizations and cause a very bad memory access pattern. Not to mention objects are reference counted and protected by the Global Interpreter Lock (GIL). Any code involving CPython objects in a hot loop is doomed to be very slow, especially when this loop is interpreted. Such a thing happens for min_list and min_nd.

Also Numpy cannot operate on lists directly so it converts them to an array. This conversion is generally much slower than any basic Numpy array computation (which is done again and again in np.min(a), hence the bad performance).

As for np.min vs b.min, they are both equivalent, though the first can be a bit slower on small array because of the function fetch overhead of CPython and a different path taken in Numpy. Numpy is optimized for large array. On small arrays you will see many overheads which are hard to understand without delving into the code of the Numpy implementation (since such overheads are clearly implementation dependent).

On my machine (i5-9600KF CPU with CPython 3.8.1 and Numpy 1.24.4), the former takes 4.5 µs and the later 2.5 µs so a very small time. This is typically the overhead of Numpy (i.e 1-10 µs per call). With a 100x times bigger array, I get 80 µs versus 83 µs (+/- 2 µs) so the two are statistically not different. It also shows most of the time spent in your small array benchmark is just pure overhead (~60% to ~80%).

If you want to reduce these overheads, then you should use tools like Cython/Numba capable of compiling (specialized) Python codes, or just not use Python in the first place but a native language like C/C++/Rust.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.