3

I have created a small function that takes as input an integer, length, and returns a numpy array of the binary representation of all 2**length integer numbers in the range [0:2**length-1].

import numpy as np

def get_bitstrings(length):
  # We need to binary-fy 2^length numbers.
  iterations = 2**length
  # Pre-allocate memory.
  the_array = np.zeros((iterations, length))
  # Go through all decimals in the range [0:iterations-1]
  for num in range(iterations):
    # Get binary representation in string format with 'length' zeroes padded
    t_string = '{f_num:0{f_width}b}'.format(f_num=num, f_width=length)
    # Convert to a Python list
    t_list   = list(t_string)
    # Convert to Numpy array and store.
    the_array[num,:] = np.array(t_list)

  return the_array

if __name__ == '__main__':
  var1 = get_bitstrings(2)
  var2 = get_bitstrings(3)
  print('var1:\n{}\n'.format(var1))
  print('var2:\n{}\n'.format(var2))

which yields:

var1:
[[ 0.  0.]
 [ 0.  1.]
 [ 1.  0.]
 [ 1.  1.]]

var2:
[[ 0.  0.  0.]
 [ 0.  0.  1.]
 [ 0.  1.  0.]
 [ 0.  1.  1.]
 [ 1.  0.  0.]
 [ 1.  0.  1.]
 [ 1.  1.  0.]
 [ 1.  1.  1.]]

The process includes getting a binary representation of each integer number as a string (with 0s padded before it such that the length is constant at length), converting the string into a Python list, and then converting the list into a numpy array.

I found that to be the only way to satisfy the requirement that each bit is an entry in the array – i.e., bitstring 1010 is a 1x4 numpy array and not simply an integer in a 1x1 array. But I am sure there are better alternatives, hence the question.

The problem, as you can imagine, is that this is inefficient. I was wondering whether I can improve this by using Python/Numpy trickery.

Edit: I used to do this in MATLAB with this snippet:

t_length = 5; dc = [0:2^t_length-1]'; bc = rem(floor(dc*pow2(-(t_length-1):0)),2);

But I am a complete noob when it comes to Python/Numpy! Maybe it'll inspire someone. :-)

5
  • Possibly you could work something out with strides but I think even bools are represented as 8 bits in numpy... Commented Feb 20, 2014 at 20:13
  • 2
    np.arange(2 ** k) will give an actual "bitstring" (i.e. np.int) with the required binary value. What do you need this for? Commented Feb 20, 2014 at 20:41
  • @deinonychusaur I need the elements to be integers or floatz, not booleans. Commented Feb 20, 2014 at 20:49
  • @Iarsmans I am not sure what you mean. I need the binary representation of decimal numbers in this specific format because I'm doing linear algebra with bit-strings -- it has to do with evolutionary computation. Commented Feb 20, 2014 at 20:52
  • Have a look at numpy.binary_repr. E.g. [np.binary_repr(item, width=5) for item in range(2**5)]. This will give you strings, but you can convert to an array of ints quite easily. Commented Feb 20, 2014 at 20:55

2 Answers 2

4

You can use NumPy's broadcasting and vectorized operations to do this fairly efficiently:

>>> from numpy import arange, newaxis
>>> powers_of_two = 2**arange(4)[::-1]
>>> (arange(2**4)[:, newaxis] & powers_of_two) / powers_of_two
array([[0, 0, 0, 0],
       [0, 0, 0, 1],
       [0, 0, 1, 0],
       [0, 0, 1, 1],
       [0, 1, 0, 0],
       [0, 1, 0, 1],
       [0, 1, 1, 0],
       [0, 1, 1, 1],
       [1, 0, 0, 0],
       [1, 0, 0, 1],
       [1, 0, 1, 0],
       [1, 0, 1, 1],
       [1, 1, 0, 0],
       [1, 1, 0, 1],
       [1, 1, 1, 0],
       [1, 1, 1, 1]])

Brief explanation: we're taking all the integers from 0 to 15 (arange(2**4)), then reshaping that to given an array of shape (16, 1) (that's the [:, newaxis] slice part). Then we take the bitwise-and with powers of two, from highest to lowest (2**arange(4)[::-1]). The reshaping ensures that the bitwise and operation is performed as a sort of 'outer' operation: we take the bitwise and of every element of the original arange with every element of the powers_of_two array. This is NumPy's broadcasting and slicing at work. The absence of an explicit Python-level for loop should make this significantly faster than a solution based on for loops or list comprehensions.

Here's a somewhat sleeker, and as it turns out, faster, alternative along the same lines:

>>> from numpy import arange, newaxis
>>> arange(2**4)[:,newaxis] >> arange(4)[::-1] & 1
array([[0, 0, 0, 0],
       [0, 0, 0, 1],
       [0, 0, 1, 0],
       [0, 0, 1, 1],
       [0, 1, 0, 0],
       [0, 1, 0, 1],
       [0, 1, 1, 0],
       [0, 1, 1, 1],
       [1, 0, 0, 0],
       [1, 0, 0, 1],
       [1, 0, 1, 0],
       [1, 0, 1, 1],
       [1, 1, 0, 0],
       [1, 1, 0, 1],
       [1, 1, 1, 0],
       [1, 1, 1, 1]])

As always, if efficiency is a concern then you should make good use of the tools that Python provides in the form of the timeit and profile modules. Timings on my machine with length=16 seem to indicate that the second variant is significantly faster than the first:

taniyama:~ mdickinson$ python -m timeit -s "from numpy import arange, newaxis" "arange(1<<16)[:, newaxis] >> arange(16)[::-1] & 1"
100 loops, best of 3: 4.08 msec per loop
taniyama:~ mdickinson$ python -m timeit -s "from numpy import arange, newaxis" "(arange(1<<16)[:, newaxis] & 2**arange(16)[::-1]) / 2**arange(16)[::-1]"
10 loops, best of 3: 21.6 msec per loop
Sign up to request clarification or add additional context in comments.

1 Comment

Wow! This is incredibly fast and elegant. Thanks a lot, Mark!
1

One way is to use numpy.binary_repr. It will result in a string, but you can easily convert that to an array of ints or floats (just change the dtype argument). For example:

import numpy as np

k = 4
print np.array([list(np.binary_repr(x, k)) for x in range(2**k)], dtype=int)

This yields:

[[0 0 0 0]
 [0 0 0 1]
 [0 0 1 0]
 [0 0 1 1]
 [0 1 0 0]
 [0 1 0 1]
 [0 1 1 0]
 [0 1 1 1]
 [1 0 0 0]
 [1 0 0 1]
 [1 0 1 0]
 [1 0 1 1]
 [1 1 0 0]
 [1 1 0 1]
 [1 1 1 0]
 [1 1 1 1]]

Or, if you wanted a more readable version:

def bitstrings(k):
    binary = [np.binary_repr(item, width=k) for item in range(2**k)]
    return np.array([list(item) for item in binary], dtype=int)

2 Comments

Thanks for the help, Joe, but I would rather avoid a purely iterated solution because that is what I was doing in the first place. Mark's answer comes closer to what I was asking.
Yeah, Mark's answer is quite slick!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.