
I want to test whether all elements of an array are zero. According to the Stack Overflow posts "Test if numpy array contains only zeros" and https://stackoverflow.com/a/72976775/5269892, not array.any() should be both more memory-efficient and faster than (array == 0).all().

I tested the performance on an array of random floating-point numbers, see below. However, at least for the given array size, not array.any() and even just casting the array to boolean type appear to be slower than (array == 0).all(). Why is that?

import numpy as np

np.random.seed(100)
a = np.random.rand(10418*144)

%timeit (a == 0)
%timeit (a == 0).all()
%timeit a.astype(bool)
%timeit a.any()
%timeit not a.any()

# 711 µs ± 192 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# 740 µs ± 1.38 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# 1.69 ms ± 587 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# 1.71 ms ± 1.31 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# 1.71 ms ± 2.05 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
  • I get different results from yours (Python 3.9.13, NumPy 1.23.0):
      617 µs ± 270 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
      624 µs ± 1.16 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
      254 µs ± 702 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
      262 µs ± 655 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
      262 µs ± 714 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
    Commented Jul 15, 2022 at 8:54
  • Note that in version 1.23 we improved the basic reduction functions, including np.all and np.any (see github.com/numpy/numpy/pull/21001), though the effect should be small for np.any and np.all. Updating NumPy might help a bit. Commented Jul 15, 2022 at 10:12
  • @Murali This is surprising. Are you running on Windows? AFAIK the Windows builds often behave differently (and surprisingly). What is your processor architecture? Commented Jul 15, 2022 at 10:13
  • @JérômeRichard I am using macOS (v12.4) on the ARM architecture (M1 silicon). Commented Jul 15, 2022 at 11:23
  • An off-topic hint: if you know that your array contains no negative values (probably not your case), a.sum() == 0 is faster. Commented Jul 15, 2022 at 14:35

1 Answer

The problem is that the first two operations are vectorized with SIMD instructions while the last three are not. More specifically, the last three calls perform an implicit conversion to bool (_aligned_contig_cast_double_to_bool), which is not yet vectorized. This is a known issue, and I have already proposed a pull request for it (which revealed some unexpected issues caused by undefined behavior, now fixed). If everything goes well, it should be available in the next major release of NumPy.
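Since the result depends on the NumPy version and the CPU, it is worth re-measuring on your own machine. Here is a minimal sketch that repeats the comparison outside IPython with the standard timeit module. The helper best_time_us and the repeat counts are my own choices for illustration, and the absolute numbers will differ from the ones quoted in the question and the comments.

```python
import timeit

import numpy as np

# Same array as in the question: ~1.5 million float64 values in [0, 1).
np.random.seed(100)
a = np.random.rand(10418 * 144)

# Expressions to compare; the labels are only used for printing.
candidates = {
    "(a == 0).all()": lambda: (a == 0).all(),
    "a.astype(bool)": lambda: a.astype(bool),
    "not a.any()": lambda: not a.any(),
}


def best_time_us(func, repeat=5, number=20):
    """Return the best per-call time in microseconds (hypothetical helper)."""
    times = timeit.repeat(func, repeat=repeat, number=number)
    return min(times) / number * 1e6


results = {label: best_time_us(func) for label, func in candidates.items()}
for label, t in results.items():
    print(f"{label:>16}: {t:8.1f} µs")
```

If the cast is the bottleneck on your setup, a.astype(bool) and not a.any() should take roughly the same time, as in the question.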

Note that a.any() and not a.any() implicitly cast the input to a boolean array so that the any operation itself can then run faster. This is not very efficient, but it is done this way to reduce the number of generated function variants: NumPy is written in C, so a separate implementation has to be generated for each type, and optimizing many variants is hard, so we prefer to perform implicit casts here (this also reduces the size of the generated binaries). If this is not fast enough, you can use Cython to generate faster, specialized code.
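Both tests give the same answer, so you can pick whichever is faster on your NumPy version. Here is a small sketch that checks the agreement on a few edge cases. The function names all_zero_cmp and all_zero_any are made up for this example and are not part of NumPy.

```python
import numpy as np


def all_zero_cmp(arr):
    """All-zero test via an elementwise comparison (allocates a temporary bool array)."""
    return bool((arr == 0).all())


def all_zero_any(arr):
    """All-zero test via any(), which relies on the truthiness of each element."""
    return not arr.any()


# Edge cases: plain zeros, negative zero, a single non-zero value, NaN, empty array.
cases = [
    np.zeros(4),
    np.array([0.0, -0.0]),
    np.array([0.0, 1e-300]),
    np.array([0.0, np.nan]),
    np.array([], dtype=float),
]

for arr in cases:
    # Both formulations should agree on every case.
    assert all_zero_cmp(arr) == all_zero_any(arr)
    print(arr, all_zero_cmp(arr))
```

NaN counts as non-zero in both variants, and an empty array counts as all-zero in both.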
