Fast way to check if a numpy array is binary (contains only 0 and 1)

Question

Given a numpy array, how can I quickly check whether it contains only 0 and 1? Is there a built-in method for this?

Divakar · Accepted Answer · 2016-11-14 19:18:09Z

15

A few approaches:

((a==0) | (a==1)).all()
~((a!=0) & (a!=1)).any()
np.count_nonzero((a!=0) & (a!=1))==0
a.size == np.count_nonzero((a==0) | (a==1))

Runtime test:

In [313]: a = np.random.randint(0,2,(3000,3000)) # Only 0s and 1s

In [314]: %timeit ((a==0) | (a==1)).all()
     ...: %timeit ~((a!=0) & (a!=1)).any()
     ...: %timeit np.count_nonzero((a!=0) & (a!=1))==0
     ...: %timeit a.size == np.count_nonzero((a==0) | (a==1))
     ...: 
10 loops, best of 3: 28.8 ms per loop
10 loops, best of 3: 29.3 ms per loop
10 loops, best of 3: 28.9 ms per loop
10 loops, best of 3: 28.8 ms per loop

In [315]: a = np.random.randint(0,3,(3000,3000)) # Contains 2 as well

In [316]: %timeit ((a==0) | (a==1)).all()
     ...: %timeit ~((a!=0) & (a!=1)).any()
     ...: %timeit np.count_nonzero((a!=0) & (a!=1))==0
     ...: %timeit a.size == np.count_nonzero((a==0) | (a==1))
     ...: 
10 loops, best of 3: 28 ms per loop
10 loops, best of 3: 27.5 ms per loop
10 loops, best of 3: 29.1 ms per loop
10 loops, best of 3: 28.9 ms per loop

Their runtimes seem to be comparable.
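As a minimal sketch, the first expression can be wrapped in a small helper. The name `is_binary` is hypothetical and not part of NumPy; the body is just the first approach from the list above.

```python
import numpy as np

def is_binary(a):
    """Return True if every element of `a` equals 0 or 1 (hypothetical helper)."""
    a = np.asarray(a)
    # Two element-wise comparisons, combined with a logical OR, then reduced.
    return bool(((a == 0) | (a == 1)).all())

print(is_binary(np.array([[0, 1], [1, 0]])))   # True
print(is_binary(np.array([0, 1, 2])))          # False
print(is_binary(np.array([0.0, 1.0, 0.5])))    # False
```

Because the test is a plain equality comparison, float arrays holding exactly 0.0 and 1.0 also count as binary here.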

edited Nov 14, 2016 at 19:18

answered Nov 14, 2016 at 19:05

Divakar


1 Comment

Octavius Over a year ago

This also works with scipy sparse matrices, just use a.data in place of a

Thomas Baruchel · Accepted Answer · 2016-11-14 20:32:32Z

13

It looks like you can achieve it with something like:

np.array_equal(a, a.astype(bool))

If your array is large, this should avoid creating as many temporary arrays as some of the other answers do. It should therefore be slightly faster than them (not tested, however).

answered Nov 14, 2016 at 20:32

Thomas Baruchel


3 Comments

Tommaso Di Noto Over a year ago

Watch out: an array with all zeros also returns True in the above condition

Mehmet Hakan Kurtoğlu Over a year ago

If I'm not mistaken, astype(bool) would still return True, when array has positive values greater than 1. So, this wouldn't give the correct answer.

Thomas Baruchel Over a year ago

@MehmetHakanKurtoğlu This is exactly the idea behind the expression: 2 will be replaced with True but the comparison will fail after that!
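The two points raised in these comments can be checked directly. This is a small illustration with made-up arrays, not a benchmark:

```python
import numpy as np

a = np.array([0, 1, 2])
as_bool = a.astype(bool)            # array([False,  True,  True])

# array_equal compares element-wise: True == 1 holds, but True == 2 does not.
print(np.array_equal(a, as_bool))   # False

zeros = np.zeros(4, dtype=int)
# An all-zero array contains only 0s, so it passes the check.
print(np.array_equal(zeros, zeros.astype(bool)))   # True
```

Whether an all-zero (or all-one) array should count as binary depends on the use case; under the definition in the question, "contains only 0 and 1", it does.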

user7138814 · Accepted Answer · 2016-11-15 16:09:55Z

4

With only a single loop over the data:

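0 <= np.bitwise_or.reduce(ar, axis=None) <= 1

Note that this doesn't work for floating-point dtypes, since bitwise OR is only defined for integer and boolean arrays. axis=None makes the reduction cover every axis, so the result is a single scalar even for multi-dimensional input.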
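A minimal sketch of why the OR reduction works: any value of 2 or more has a bit set above the lowest one, and any negative value has the sign bit set, so either kind pushes the combined result outside the range 0 to 1. The helper name `is_binary_bitwise` and the explicit dtype check are assumptions added for illustration.

```python
import numpy as np

def is_binary_bitwise(ar):
    """Hypothetical helper: OR all elements together and inspect the result."""
    ar = np.asarray(ar)
    if not np.issubdtype(ar.dtype, np.integer) and ar.dtype != np.bool_:
        raise TypeError("bitwise OR needs an integer or boolean dtype")
    # axis=None reduces over every axis, so the result is a single scalar.
    combined = np.bitwise_or.reduce(ar, axis=None)
    return bool(0 <= combined <= 1)

print(is_binary_bitwise(np.array([[0, 1], [1, 1]])))   # True  (OR is 1)
print(is_binary_bitwise(np.array([0, 1, 2])))          # False (OR is 3)
print(is_binary_bitwise(np.array([0, -1, 1])))         # False (OR is -1)
```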

If the values are guaranteed non-negative you can get short-circuiting behavior:

try:
    np.empty((2,), bool)[ar]
    is_binary = True
except IndexError:
    is_binary = False

This method (always) allocates a temp array of the same shape as the argument and seems to loop over the data slower than the first method.
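The indexing trick can be wrapped as below to show where the non-negativity requirement comes from. The helper name is hypothetical, and the sketch assumes an integer dtype.

```python
import numpy as np

def is_binary_by_indexing(ar):
    """Hypothetical helper; assumes `ar` holds non-negative integers."""
    lookup = np.empty((2,), bool)   # valid non-negative indices are 0 and 1
    try:
        lookup[ar]                  # fancy indexing validates every index
        return True
    except IndexError:
        return False

print(is_binary_by_indexing(np.array([0, 1, 1])))   # True
print(is_binary_by_indexing(np.array([0, 1, 2])))   # False
# -1 and -2 are valid (wrap-around) indices into a length-2 array:
print(is_binary_by_indexing(np.array([0, -1])))     # True, although -1 is not binary
```

The last call is the reason the answer restricts this method to values that are guaranteed non-negative.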

edited Nov 15, 2016 at 16:09

answered Nov 15, 2016 at 13:11

user7138814


JoshAdel · Accepted Answer · 2016-11-14 19:36:01Z

3

If you have access to Numba (or alternatively Cython), you can write something like the following. It is significantly faster at catching non-binary arrays because it short-circuits: it stops at the first invalid element instead of scanning all of the elements:

import numpy as np
import numba as nb

@nb.njit
def check_binary(x):
    is_binary = True
    for v in np.nditer(x):
        if v.item() != 0 and v.item() != 1:
            is_binary = False
            break

    return is_binary

Running this in pure Python, without the aid of an accelerator like Numba or Cython, makes this approach prohibitively slow.

Timings:

a = np.random.randint(0,2,(3000,3000)) # Only 0s and 1s

%timeit ((a==0) | (a==1)).all()
# 100 loops, best of 3: 15.1 ms per loop

%timeit check_binary(a)
# 100 loops, best of 3: 11.6 ms per loop

a = np.random.randint(0,3,(3000,3000)) # Contains 2 as well

%timeit ((a==0) | (a==1)).all()
# 100 loops, best of 3: 14.9 ms per loop

%timeit check_binary(a)
# 1000000 loops, best of 3: 543 ns per loop
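If neither Numba nor Cython is available, part of the short-circuiting benefit can probably be recovered in plain NumPy by testing the array in blocks. This is a different technique from the one in this answer, sketched under assumptions: the helper name and the default `chunk_size` are made up, and no timings were taken for it.

```python
import numpy as np

def is_binary_chunked(a, chunk_size=1_000_000):
    """Hypothetical helper: test `a` block by block, stop at the first bad block."""
    flat = np.asarray(a).reshape(-1)   # a view when possible, otherwise a copy
    for start in range(0, flat.size, chunk_size):
        block = flat[start:start + chunk_size]
        if not ((block == 0) | (block == 1)).all():
            return False               # later blocks are never examined
    return True

a = np.random.randint(0, 2, (300, 300))   # only 0s and 1s
print(is_binary_chunked(a, chunk_size=1000))   # True
a[0, 5] = 2
print(is_binary_chunked(a, chunk_size=1000))   # False, found in the first block
```

Each block is still processed by vectorized NumPy code, so the Python-level loop only runs once per block rather than once per element.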

edited Nov 14, 2016 at 19:36

answered Nov 14, 2016 at 19:31

JoshAdel


1 Comment

Divakar Over a year ago

Ah lovely thought to check for first occurrence of an invalid one!

Mai Hai · Accepted Answer · 2021-06-09 08:01:21Z

3

We could use np.isin().

is_binary = np.isin(input_array, [0, 1]).all()

np.isin() tests every element of input_array for membership in [0, 1] and returns a boolean array of the same shape, e.g. [True, False, True, True, ...].
all() then checks that this array contains only True.

No reshaping (such as squeeze) is needed beforehand: np.isin() works element-wise on arrays of any shape.
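A short illustration on a two-dimensional array (the values are made up) shows that np.isin() returns a mask with the same shape as its input, so it can be applied to multi-dimensional arrays directly:

```python
import numpy as np

a = np.array([[0, 1, 1],
              [1, 0, 2]])

mask = np.isin(a, [0, 1])   # boolean array with the same shape as `a`
print(mask.shape)           # (2, 3)
print(mask.all())           # False, because of the 2
```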

answered Jun 9, 2021 at 8:01

Mai Hai


ahmedhosny · Accepted Answer · 2020-07-03 00:52:10Z

2

How about numpy unique?

np.unique(arr)

This returns [0, 1] for a binary array that contains both values, and just [0] or [1] if only one of them is present.
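One way to turn the np.unique() output into a yes/no answer is sketched below. The helper name is hypothetical, and combining np.unique() with np.isin() is an assumption about how the result might be tested, not something this answer specifies.

```python
import numpy as np

def is_binary_unique(arr):
    """Hypothetical helper: every distinct value must be 0 or 1."""
    distinct = np.unique(arr)          # sorted array of the distinct values
    return bool(np.isin(distinct, [0, 1]).all())

print(is_binary_unique(np.array([[0, 1], [1, 1]])))   # True
print(is_binary_unique(np.zeros(3, dtype=int)))       # True (only 0 present)
print(is_binary_unique(np.array([0, 2])))             # False
```

Note that np.unique() sorts the data, so this is likely to be slower on large arrays than the comparison-based approaches.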

answered Jul 3, 2020 at 0:52

ahmedhosny


Bram Schijvenaars · Accepted Answer · 2021-07-25 21:46:21Z

2

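The following should work (ravel() flattens the array first, since iterating over a multi-dimensional array yields sub-arrays, which are not hashable):

ans = set(arr.ravel()).issubset([0, 1])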

edited Jul 25, 2021 at 21:46

answered Sep 10, 2020 at 8:42

Bram Schijvenaars
