15

Given a numpy array, how can I figure it out if it contains only 0 and 1 quickly? Is there any implemented method?

7 Answers 7

15

Few approaches -

((a==0) | (a==1)).all()
~((a!=0) & (a!=1)).any()
np.count_nonzero((a!=0) & (a!=1))==0
a.size == np.count_nonzero((a==0) | (a==1))

Runtime test -

In [313]: a = np.random.randint(0,2,(3000,3000)) # Only 0s and 1s

In [314]: %timeit ((a==0) | (a==1)).all()
     ...: %timeit ~((a!=0) & (a!=1)).any()
     ...: %timeit np.count_nonzero((a!=0) & (a!=1))==0
     ...: %timeit a.size == np.count_nonzero((a==0) | (a==1))
     ...: 
10 loops, best of 3: 28.8 ms per loop
10 loops, best of 3: 29.3 ms per loop
10 loops, best of 3: 28.9 ms per loop
10 loops, best of 3: 28.8 ms per loop

In [315]: a = np.random.randint(0,3,(3000,3000)) # Contains 2 as well

In [316]: %timeit ((a==0) | (a==1)).all()
     ...: %timeit ~((a!=0) & (a!=1)).any()
     ...: %timeit np.count_nonzero((a!=0) & (a!=1))==0
     ...: %timeit a.size == np.count_nonzero((a==0) | (a==1))
     ...: 
10 loops, best of 3: 28 ms per loop
10 loops, best of 3: 27.5 ms per loop
10 loops, best of 3: 29.1 ms per loop
10 loops, best of 3: 28.9 ms per loop

Their runtimes seem to be comparable.

Sign up to request clarification or add additional context in comments.

1 Comment

This also works with scipy sparse matrices, just use a.data in place of a
13

It looks you can achieve it with something like:

np.array_equal(a, a.astype(bool))

If your array is large, it should avoid copying too many arrays (as in some other answers). Thus, it should probably be slightly faster than other answers (not tested however).

3 Comments

Watch out: an array with all zeros also returns True in the above condition
If I'm not mistaken, astype(bool) would still return True, when array has positive values greater than 1. So, this wouldn't give the correct answer.
@MehmetHakanKurtoğlu This is exactly the idea behind the expression: 2 will be replaced with True but the comparison will fail after that!
4

With only a single loop over the data:

0 <= np.bitwise_or.reduce(ar) <= 1

Note that this doesn't work for floating point dtype.

If the values are guaranteed non-negative you can get short-circuiting behavior:

try:
    np.empty((2,), bool)[ar]
    is_binary = True
except IndexError:
    is_binary = False

This method (always) allocates a temp array of the same shape as the argument and seems to loop over the data slower than the first method.

Comments

3

If you have access to Numba (or alternatively cython), you can write something like the following, which will be significantly faster for catching non-binary arrays since it will short circuit the calculation/stop immediately instead of continuing with all of the elements:

import numpy as np
import numba as nb

@nb.njit
def check_binary(x):
    is_binary = True
    for v in np.nditer(x):
        if v.item() != 0 and v.item() != 1:
            is_binary = False
            break

    return is_binary

Running this in pure python without the aid of an accelerator like Numba or Cython makes this approach prohibitively slow.

Timings:

a = np.random.randint(0,2,(3000,3000)) # Only 0s and 1s

%timeit ((a==0) | (a==1)).all()
# 100 loops, best of 3: 15.1 ms per loop

%timeit check_binary(a)
# 100 loops, best of 3: 11.6 ms per loop

a = np.random.randint(0,3,(3000,3000)) # Contains 2 as well

%timeit ((a==0) | (a==1)).all()
# 100 loops, best of 3: 14.9 ms per loop

%timeit check_binary(a)
# 1000000 loops, best of 3: 543 ns per loop

1 Comment

Ah lovely thought to check for first occurrence of an invalid one!
3

We could use np.isin().

input_array = input_array.squeeze(-1)
is_binary   = np.isin(input_array, [0,1]).all()

1st line:
squeeze to unroll the input array, as we don't want to deal with the complication of np.isin() with a multi-dimension array.

2nd line:
np.isin() checks whether all elements of input belong to 0 or 1.
np.isin() returns a list of [True, False, True, True..].
Then all() to ensure that list contain all True.

Comments

2

How about numpy unique?

np.unique(arr)

Should return [0,1] if binary.

Comments

2

The following should work:

ans = set(arr).issubset([0,1])

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.