Python comparing array to zero faster than np.any(array)

Question

I want to test whether all elements of an array are zero. According to the StackOverflow posts Test if numpy array contains only zeros and https://stackoverflow.com/a/72976775/5269892, not array.any() should be both the most memory-efficient and the fastest method, compared to (array == 0).all().

I tested the performance with an array of random floating-point numbers, see below. Somehow though, at least for the given array size, not array.any() and even casting the array to boolean type appear to be slower than (array == 0).all(). How come?

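import numpy as np

np.random.seed(100)
a = np.random.rand(10418*144)

# %timeit is an IPython/Jupyter magic; results are listed below in the same order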

%timeit (a == 0)
%timeit (a == 0).all()
%timeit a.astype(bool)
%timeit a.any()
%timeit not a.any()

# 711 µs ± 192 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# 740 µs ± 1.38 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# 1.69 ms ± 587 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# 1.71 ms ± 1.31 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# 1.71 ms ± 2.05 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

I get different results from what you got (Python 3.9.13, NumPy 1.23.0):
617 µs ± 270 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
624 µs ± 1.16 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
254 µs ± 702 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
262 µs ± 655 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
262 µs ± 714 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
– user10289025, Commented Jul 15, 2022 at 8:54
Note that in version 1.23, we improved basic reduction functions including np.all and np.any (see github.com/numpy/numpy/pull/21001), though the effect should be small for np.any and np.all. Updating NumPy might help a bit.
– Jérôme Richard, Commented Jul 15, 2022 at 10:12
@Murali This is surprising. Are you running on Windows? AFAIK the Windows builds often behave differently (and surprisingly). What is your processor architecture?
– Jérôme Richard, Commented Jul 15, 2022 at 10:13
@JérômeRichard I am using macOS (v12.4) on the ARM architecture (Apple M1 silicon).
– user10289025, Commented Jul 15, 2022 at 11:23
An off-topic hint: if you know that your array is non-negative (probably not your case), a.sum() == 0 is faster.
– Salvatore Daniele Bianco, Commented Jul 15, 2022 at 14:35

Jérôme Richard · Accepted Answer · 2022-07-15 21:05:37Z

The problem is that the first two operations are vectorized using SIMD instructions, while the last three are not. More specifically, the last three calls perform an implicit conversion to bool (_aligned_contig_cast_double_to_bool), which is not yet vectorized. This is a known issue, and I have already proposed a pull request for it (which revealed some unexpected issues caused by undefined behavior, now fixed). If everything goes well, it should be available in the next major release of NumPy.
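To check how your own NumPy build behaves, the measurement from the question can be repeated outside IPython with the standard timeit module. This is only a minimal sketch: the helper name time_zero_checks and its repeat/number defaults are made up for illustration, and the absolute numbers will depend on the NumPy version, the platform and the CPU.

```python
import timeit

import numpy as np


def time_zero_checks(n=10418 * 144, repeat=3, number=20):
    """Return the best per-call time in seconds for each expression.

    Hypothetical helper for illustration; not part of NumPy.
    """
    np.random.seed(100)
    a = np.random.rand(n)

    # The five expressions timed in the question, in the same order.
    candidates = {
        "a == 0": lambda: a == 0,
        "(a == 0).all()": lambda: (a == 0).all(),
        "a.astype(bool)": lambda: a.astype(bool),
        "a.any()": lambda: a.any(),
        "not a.any()": lambda: not a.any(),
    }

    results = {}
    for label, fn in candidates.items():
        # timeit.repeat returns total seconds per run of `number` calls;
        # take the fastest run and divide to get a per-call figure.
        results[label] = min(timeit.repeat(fn, repeat=repeat, number=number)) / number
    return results


if __name__ == "__main__":
    print("NumPy", np.__version__)
    for label, seconds in time_zero_checks().items():
        print(f"{label:16s} {seconds * 1e6:10.1f} µs per call")
```

If a.astype(bool) takes about as long as a.any() on your machine, that is consistent with the cast dominating the cost of the reduction.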

Note that a.any() and not a.any() implicitly cast the input to an array of booleans so that the any operation itself can then run faster. This is not very efficient, but it is done this way to reduce the number of generated function variants: NumPy is written in C, so a different implementation has to be generated for each type, and optimizing many variants is hard, so we prefer to perform implicit casts here. This also reduces the size of the generated binaries. If this is not enough, you can use Cython to generate faster, specialized code.
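The cast only affects speed, not the answer. As a small sanity check (a sketch, with helper names invented for this example), the snippet below confirms on a few edge cases that a.any() matches an explicit a.astype(bool).any(), and that both ways of testing for "all zeros" agree:

```python
import numpy as np


def all_zero_via_compare(a):
    # Compare every element to zero, then reduce the boolean result.
    return bool((a == 0).all())


def all_zero_via_any(a):
    # Reduce directly; a nonzero element (including NaN) counts as True.
    return not a.any()


# Illustrative inputs chosen for this example only.
cases = {
    "all zeros": np.zeros(5),
    "one tiny nonzero": np.array([0.0, 0.0, 1e-300]),
    "negative zero": np.array([-0.0, 0.0]),
    "contains NaN": np.array([0.0, np.nan]),
    "empty": np.array([]),
}

for label, arr in cases.items():
    # The explicit cast gives the same reduction result as a.any().
    assert bool(arr.any()) == bool(arr.astype(bool).any())
    # Both formulations of the all-zero test agree.
    assert all_zero_via_compare(arr) == all_zero_via_any(arr)
    print(f"{label:18s} all zero: {all_zero_via_any(arr)}")
```

Since the two tests agree, the choice between them can be made on measured speed for the NumPy build in use.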
