Using numpy.binary_repr on array of numbers or alternatives - Python

Question

Using the following code, I am trying to convert a list of numbers into binary strings, but I am getting an error:

import numpy as np

lis=np.array([1,2,3,4,5,6,7,8,9])
a=np.binary_repr(lis,width=32)

The error after running the program is:

Traceback (most recent call last):

File "", line 4, in a=np.binary_repr(lis,width=32)

File "C:\Users.......", in binary_repr if num == 0:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Is there any way to fix this?

np.binary_repr works on a single value, not on an array-like object.
– willeM_ Van Onsem, Commented Sep 7, 2019 at 19:45
I know that, but is there any way to make it work on an array?
– Zewo, Commented Sep 7, 2019 at 19:46
Or is there any code in Python with which I can convert my whole array or list into 32-bit binary?
– Zewo, Commented Sep 7, 2019 at 19:48
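The traceback points at the line `if num == 0:` inside binary_repr. A minimal sketch of why that comparison fails for an array is given below; it only reproduces the comparison named in the traceback, and the exact exception that np.binary_repr itself raises for array input may differ between NumPy versions.

```python
import numpy as np

lis = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

# Comparing an array with a scalar is elementwise, so the result is an array.
is_zero = lis == 0

try:
    # An `if` needs a single True/False, which a 9-element array cannot supply.
    if is_zero:
        pass
except ValueError as exc:
    message = str(exc)
    print(message)

# With a single integer, `num == 0` is one boolean and the call succeeds.
single = np.binary_repr(int(lis[2]), width=32)
print(single)
```

Passing one element at a time is what the answers below automate in different ways.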

taras · Accepted Answer · 2019-09-07 19:51:15Z

4

You can use np.vectorize to overcome this issue.

>>> lis=np.array([1,2,3,4,5,6,7,8,9])
>>> binary_repr_vec = np.vectorize(np.binary_repr)
>>> binary_repr_vec(lis, width=32)
array(['00000000000000000000000000000001',
       '00000000000000000000000000000010',
       '00000000000000000000000000000011',
       '00000000000000000000000000000100',
       '00000000000000000000000000000101',
       '00000000000000000000000000000110',
       '00000000000000000000000000000111',
       '00000000000000000000000000001000',
       '00000000000000000000000000001001'], dtype='<U32')

answered Sep 7, 2019 at 19:51

taras


willeM_ Van Onsem · Accepted Answer · 2019-09-07 19:51:38Z

As the documentation on binary_repr says:

num : int

Only an integer decimal number can be used.

You can however vectorize this operation, like:

np.vectorize(np.binary_repr)(lis, 32)

this then gives us:

>>> np.vectorize(np.binary_repr)(lis, 32)
array(['00000000000000000000000000000001',
       '00000000000000000000000000000010',
       '00000000000000000000000000000011',
       '00000000000000000000000000000100',
       '00000000000000000000000000000101',
       '00000000000000000000000000000110',
       '00000000000000000000000000000111',
       '00000000000000000000000000001000',
       '00000000000000000000000000001001'], dtype='<U32')

or if you need this often, you can store the vectorized variant in a variable:

binary_repr_vector = np.vectorize(np.binary_repr)
binary_repr_vector(lis, 32)

Which of course gives the same result:

>>> binary_repr_vector = np.vectorize(np.binary_repr)
>>> binary_repr_vector(lis, 32)
array(['00000000000000000000000000000001',
       '00000000000000000000000000000010',
       '00000000000000000000000000000011',
       '00000000000000000000000000000100',
       '00000000000000000000000000000101',
       '00000000000000000000000000000110',
       '00000000000000000000000000000111',
       '00000000000000000000000000001000',
       '00000000000000000000000000001001'], dtype='<U32')

Divakar · Accepted Answer · 2019-09-08 07:36:10Z

Approach #1

Here's a vectorized approach for an array of numbers that leverages broadcasting -

def binary_repr_ar(A, W):
    p = (((A[:,None] & (1 << np.arange(W-1,-1,-1)))!=0)).view('u1')
    return p.astype('S1').view('S'+str(W)).ravel()

Sample run -

In [67]: A
Out[67]: array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [68]: binary_repr_ar(A,32)
Out[68]: 
array(['00000000000000000000000000000001',
       '00000000000000000000000000000010',
       '00000000000000000000000000000011',
       '00000000000000000000000000000100',
       '00000000000000000000000000000101',
       '00000000000000000000000000000110',
       '00000000000000000000000000000111',
       '00000000000000000000000000001000',
       '00000000000000000000000000001001'], dtype='|S32')
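The broadcasting step in Approach #1 can be seen in isolation with a small sketch. The names `bit_masks` and `is_set` are illustrative and do not appear in the answer; a width of 4 is used only to keep the intermediate arrays readable.

```python
import numpy as np

A = np.array([1, 2, 5])
W = 4

# One single-bit mask per output position, most significant bit first: [8 4 2 1].
bit_masks = 1 << np.arange(W - 1, -1, -1)

# A[:, None] has shape (3, 1) and bit_masks has shape (4,), so the AND
# broadcasts to shape (3, 4): row i holds the bits of A[i].
is_set = (A[:, None] & bit_masks) != 0

print(is_set.astype(int))
```

Each row of `is_set` is then turned into characters; the approaches differ only in how that boolean matrix is converted to a string array.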

Approach #2

Another vectorized one with array-assignment -

def binary_repr_ar_v2(A, W):
    mask = (((A[:,None] & (1 << np.arange(W-1,-1,-1)))!=0))
    out = np.full((len(A),W),48, dtype=np.uint8)
    out[mask] = 49
    return out.view('S'+str(W)).ravel()

Alternatively, use the mask directly to get the string array -

def binary_repr_ar_v3(A, W):
    mask = (((A[:,None] & (1 << np.arange(W-1,-1,-1)))!=0))
    return (mask+np.array([48],dtype=np.uint8)).view('S'+str(W)).ravel()

Note that the final output is a view into one of the intermediate arrays. So, if you need it to have its own memory space, simply append .copy().
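A short sketch of the view-versus-copy point, using Approach #2 as written above. The variable names `res`, `own` and `as_text` are illustrative, and the final conversion to a str array is an optional extra that is not part of the answer.

```python
import numpy as np

def binary_repr_ar_v2(A, W):
    # Same logic as Approach #2: start from ASCII '0' (48), write '1' (49) where a bit is set.
    mask = (A[:, None] & (1 << np.arange(W - 1, -1, -1))) != 0
    out = np.full((len(A), W), 48, dtype=np.uint8)
    out[mask] = 49
    return out.view('S' + str(W)).ravel()

A = np.array([1, 2, 3])
res = binary_repr_ar_v2(A, 8)

shares_buffer = res.base is not None   # the result borrows the memory of `out`
own = res.copy()                       # independent array that owns its data
as_text = own.astype('U8')             # optional: bytes strings -> str strings
```

On Python 3 the elements of the 'S' array are bytes, so the `astype('U8')` step may be useful when str values are expected.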

Timings on a large sized input array -

In [49]: np.random.seed(0)
    ...: A = np.random.randint(1,1000,(100000))
    ...: W = 32

In [50]: %timeit binary_repr_ar(A, W)
    ...: %timeit binary_repr_ar_v2(A, W)
    ...: %timeit binary_repr_ar_v3(A, W)
1 loop, best of 3: 854 ms per loop
100 loops, best of 3: 14.5 ms per loop
100 loops, best of 3: 7.33 ms per loop

From other posted solutions -

In [22]: %timeit [np.binary_repr(i, width=32) for i in A]
10 loops, best of 3: 97.2 ms per loop

In [23]: %timeit np.frompyfunc(np.binary_repr,2,1)(A,32).astype('U32')
10 loops, best of 3: 80 ms per loop

In [24]: %timeit np.vectorize(np.binary_repr)(A, 32)
10 loops, best of 3: 69.8 ms per loop

On @Paul Panzer's solutions -

In [5]: %timeit bin_rep(A,32)
548 µs ± 1.02 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [6]: %timeit bin_rep(A,31)
2.2 ms ± 5.55 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Paul Panzer · Accepted Answer · 2019-09-07 22:10:53Z

2

Here is a fast method using np.unpackbits

(np.unpackbits(lis.astype('>u4').view(np.uint8))+ord('0')).view('S32')
# array([b'00000000000000000000000000000001',
#        b'00000000000000000000000000000010',
#        b'00000000000000000000000000000011',
#        b'00000000000000000000000000000100',
#        b'00000000000000000000000000000101',
#        b'00000000000000000000000000000110',
#        b'00000000000000000000000000000111',
#        b'00000000000000000000000000001000',
#        b'00000000000000000000000000001001'], dtype='|S32')

More general:

def bin_rep(A,n):
    if n in (8,16,32,64):
        return (np.unpackbits(A.astype(f'>u{n>>3}').view(np.uint8))+ord('0')).view(f'S{n}')
    nb = max((n-1).bit_length()-3,0)
    return (np.unpackbits(A.astype(f'>u{1<<nb}')[...,None].view(np.uint8),axis=1)[...,-n:]+ord('0')).ravel().view(f'S{n}')

Note: special casing n = 8,16,32,64 is absolutely worth it since it gives a severalfold speedup for these numbers.

Also note that this method maxes out at 2^64; larger ints require a different approach.
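To make the one-liner easier to follow, here is a step-by-step sketch of the same chain for a single number. The intermediate names are illustrative only.

```python
import numpy as np

lis = np.array([5])

# Big-endian 32-bit storage puts the most significant byte first.
as_bytes = lis.astype('>u4').view(np.uint8)   # 4 bytes per number
bits = np.unpackbits(as_bytes)                # 8 bits per byte, high bit first
ascii_codes = bits + ord('0')                 # 0/1 -> 48/49, the codes of '0' and '1'
result = ascii_codes.view('S32')              # every 32 bytes become one string

print(as_bytes)
print(result)
```

The big-endian cast matters here: with a little-endian layout the bytes would come out in reverse order and the resulting string would not read as the usual binary representation.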

edited Sep 7, 2019 at 22:10

answered Sep 7, 2019 at 21:32

Paul Panzer


4 Comments

Divakar Over a year ago

Is it generalizable to generic window lengths? Are there restrictions to the extents of the number values?

Paul Panzer Over a year ago

@Divakar Well, it is with some work ;-) And obviously beyond uint64 it is getting hairy.

Divakar Over a year ago

Looks promising and obviously a smart one keeping those aside.

Paul Panzer Over a year ago

@Divakar I added a general version. Didn't try to handle ints >= 2^64, though. It is quite a bit slower for non-aligned sizes but still pretty fast.

hpaulj · Accepted Answer · 2019-09-07 19:58:10Z

1

In [193]: alist = [1,2,3,4,5,6,7,8,9]

np.vectorize is convenient, but not fast:

In [194]: np.vectorize(np.binary_repr)(alist, 32)                                                            
Out[194]: 
array(['00000000000000000000000000000001',
       '00000000000000000000000000000010',
       '00000000000000000000000000000011',
        ....
       '00000000000000000000000000001001'], dtype='<U32')
In [195]: timeit np.vectorize(np.binary_repr)(alist, 32)                                                     
71.8 µs ± 1.88 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

A plain old list comprehension is better:

In [196]: [np.binary_repr(i, width=32) for i in alist]                                                       
Out[196]: 
['00000000000000000000000000000001',
 '00000000000000000000000000000010',
 '00000000000000000000000000000011',
...
 '00000000000000000000000000001001']
In [197]: timeit [np.binary_repr(i, width=32) for i in alist]                                                
11.5 µs ± 181 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Another iterator:

In [200]: timeit np.frompyfunc(np.binary_repr,2,1)(alist,32).astype('U32')                                   
30.1 µs ± 1.79 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
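For small lists, a NumPy-free variant of the list comprehension is also possible with Python's built-in format. This is a sketch that is not taken from the answer, and it assumes non-negative Python ints; for negative numbers np.binary_repr with a width returns a two's complement string, which format does not.

```python
import numpy as np

alist = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# '032b' means: binary digits, zero-padded to a total width of 32.
bits = [format(i, '032b') for i in alist]

# For non-negative inputs this matches np.binary_repr with width=32.
reference = [np.binary_repr(i, width=32) for i in alist]
print(bits == reference)
```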

answered Sep 7, 2019 at 19:58

hpaulj


1 Comment

willeM_ Van Onsem Over a year ago

For small arrays that is indeed the case. For larger arrays, like lis = np.arange(1000), the list comprehension usually takes twice as much time, and frompyfunc takes approximately 10% more time. This is not that strange: NumPy is not a good idea for small amounts of data; it only pays off if you process data "in bulk".
