I have a binary string s such as 001010. I want to convert it to a NumPy array a where a[i] is np.array([[1], [0]]) if s[i] == '0' and np.array([[0], [1]]) otherwise.

So I wrote this code:

import numpy as np

s = '001010'

a = np.empty([len(s), 2, 1])
for i, char in enumerate(s):
    if char == '0':
        a[i] = np.array([[1], [0]])
    elif char == '1':
        a[i] = np.array([[0], [1]])

Can it be rewritten in a vectorized form, without the for-loop, in a more idiomatic NumPy way?

My expected output looks like:

array([[[1.],
        [0.]],

       [[1.],
        [0.]],

       [[0.],
        [1.]],

       [[1.],
        [0.]],

       [[0.],
        [1.]],

       [[1.],
        [0.]]])

2 Answers

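Approach #1: Here's one using a NumPy array of single-byte strings:

# On Python 3, np.frombuffer needs a bytes-like object, so encode the str first
sa = np.frombuffer(s.encode(),dtype='S1')
out = np.where(sa[:,None,None]==b'0',[[1],[0]],[[0],[1]])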
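
The broadcasting in Approach #1 is easier to follow with the intermediate shapes written out. The following is a minimal sketch, assuming Python 3 and an ASCII-only string (hence the `encode('ascii')` call); the name `mask` is introduced here for illustration only.

```python
import numpy as np

s = '001010'

# Python 3: np.frombuffer needs a bytes-like object, so encode first.
sa = np.frombuffer(s.encode('ascii'), dtype='S1')  # shape (6,), one byte per character
mask = sa[:, None, None] == b'0'                   # shape (6, 1, 1), boolean

# The (6, 1, 1) mask broadcast against the (2, 1) choices gives shape (6, 2, 1).
out = np.where(mask, [[1], [0]], [[0], [1]])

print(mask.shape, out.shape)
print(out[:, :, 0].tolist())
```

The result has an integer dtype; appending `.astype(float)` should give the float array shown in the question.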

Approach #2: One more as a one-liner, comparing against the ASCII codes of '0' and '1' (48 and 49):

((np.frombuffer(s.encode(),dtype=np.uint8)[:,None]==[48,49])[...,None]).astype(float)
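
Unpacked into steps, the one-liner might read as follows. This is a sketch assuming Python 3 and ASCII input; the names `codes` and `onehot` are hypothetical and used only to label the intermediate results.

```python
import numpy as np

s = '001010'

# Each character becomes its ASCII code: '0' -> 48, '1' -> 49.
codes = np.frombuffer(s.encode('ascii'), dtype=np.uint8)  # shape (6,)

# Compare every code against both candidates at once.
onehot = codes[:, None] == [48, 49]                       # shape (6, 2), boolean

# Add the trailing axis and convert True/False to 1.0/0.0.
out = onehot[..., None].astype(float)                     # shape (6, 2, 1)

print(codes.tolist())
print(out.shape, out.dtype)
```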

Approach #3: A final one focused entirely on performance:

a = np.zeros([len(s), 2, 1])
idx = np.frombuffer(s.encode(),dtype=np.uint8)-48
a[np.arange(len(idx)),idx] = 1
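
Approach #3 relies on integer array indexing: row number i is paired with digit idx[i], and only that entry is set. The sketch below, assuming Python 3 and ASCII input, shows that step with comments. It also includes an identity-matrix lookup as an alternative; that variant is not from the answer and is added here only for comparison.

```python
import numpy as np

s = '001010'

# Subtracting the ASCII code of '0' turns the bytes into the digits 0 and 1.
idx = np.frombuffer(s.encode('ascii'), dtype=np.uint8) - 48

# Approach #3: start from zeros and write a[i, idx[i], :] = 1 for every i.
a = np.zeros([len(s), 2, 1])
a[np.arange(len(idx)), idx] = 1

# Alternative (not from the answer): pick rows of a 2x2 identity matrix.
b = np.eye(2)[idx][..., None]

print(np.array_equal(a, b))
```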

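Timings on a string of 100000 characters (the session passes s straight to np.frombuffer, which works only where s is a byte string, as on Python 2):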

In [2]: np.random.seed(0)

In [3]: s = ''.join(map(str,np.random.randint(0,2,(100000)).tolist()))

# @yatu's soln
In [4]: %%timeit
     ...: a = np.array(list(s), dtype=int)
     ...: np.where(a==0, np.array([[1], [0]]), np.array([[0], [1]])).T[:,:,None]
10 loops, best of 3: 36.3 ms per loop

# App#1 from this post    
In [5]: %%timeit
     ...: sa = np.frombuffer(s,dtype='S1')
     ...: out = np.where(sa[:,None,None]=='0',[[1],[0]],[[0],[1]])
100 loops, best of 3: 3.56 ms per loop

# App#2 from this post    
In [6]: %timeit ((np.frombuffer(s,dtype=np.uint8)[:,None]==[48,49])[...,None]).astype(float)
1000 loops, best of 3: 1.81 ms per loop

# App#3 from this post    
In [7]: %%timeit
     ...: a = np.zeros([len(s), 2, 1])
     ...: idx = np.frombuffer(s,dtype=np.uint8)-48
     ...: a[np.arange(len(idx)),idx] = 1
1000 loops, best of 3: 1.81 ms per loop
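
To repeat the comparison outside IPython, something like the following sketch could be used. It assumes Python 3 (so each function encodes the string before calling np.frombuffer); the function names are hypothetical labels for the four snippets, and no particular timings are implied.

```python
import timeit
import numpy as np

np.random.seed(0)
s = ''.join(map(str, np.random.randint(0, 2, 100000).tolist()))

def list_where(s):
    # @yatu's solution
    a = np.array(list(s), dtype=int)
    return np.where(a == 0, np.array([[1], [0]]), np.array([[0], [1]])).T[:, :, None]

def bytes_where(s):
    # Approach #1
    sa = np.frombuffer(s.encode('ascii'), dtype='S1')
    return np.where(sa[:, None, None] == b'0', [[1], [0]], [[0], [1]])

def compare_codes(s):
    # Approach #2
    codes = np.frombuffer(s.encode('ascii'), dtype=np.uint8)
    return (codes[:, None] == [48, 49])[..., None].astype(float)

def index_assign(s):
    # Approach #3
    a = np.zeros([len(s), 2, 1])
    idx = np.frombuffer(s.encode('ascii'), dtype=np.uint8) - 48
    a[np.arange(len(idx)), idx] = 1
    return a

for func in (list_where, bytes_where, compare_codes, index_assign):
    # Best of 3 repeats, 10 calls each, reported per call.
    best = min(timeit.repeat(lambda: func(s), number=10, repeat=3)) / 10
    print(f"{func.__name__}: {best * 1e3:.2f} ms per call")
```

Absolute numbers will depend on the machine and on the Python and NumPy versions, so only the relative ordering is worth comparing with the session above.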

A simple way to do this is to create a list from the string and then turn that list into an np.array of integers by specifying dtype=int:

s = '001010'

a = np.array(list(s), dtype=int)
# array([0, 0, 1, 0, 1, 0])

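Then use np.where to select between np.array([[1], [0]]) and np.array([[0], [1]]) according to the values in a. The result broadcasts to shape (2, len(s)), so transpose it and add a trailing axis to get shape (len(s), 2, 1):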

np.where(a==0, np.array([[1], [0]]), np.array([[0], [1]])).T[:,:,None]
array([[[1],
        [0]],

       [[1],
        [0]],

       [[0],
        [1]],

       [[1],
        [0]],

       [[0],
        [1]],

       [[1],
        [0]]])
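
The shapes involved in this answer can be checked step by step. The following is a small sketch; the names `picked`, `out` and `out_float` are introduced here for illustration. It also shows that the result is an integer array, so a final `astype(float)` is one way to match the float output in the question.

```python
import numpy as np

s = '001010'
a = np.array(list(s), dtype=int)   # array([0, 0, 1, 0, 1, 0])

# The (6,) condition broadcast against the (2, 1) choices gives shape (2, 6).
picked = np.where(a == 0, np.array([[1], [0]]), np.array([[0], [1]]))

# Transpose to (6, 2), then add a trailing axis to get (6, 2, 1).
out = picked.T[:, :, None]

# Integer result; convert if floats are wanted as in the question.
out_float = out.astype(float)

print(picked.shape, out.shape, out.dtype, out_float.dtype)
```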

5 Comments

Actually OP wanted an array of arrays, but still +1 because I think this is what he actually meant
Hmm, not anymore after seeing OP's update. Will have to change
Sorry, could you look at the updated version of the question?
Updated the answer @RomaKarageorgievich. Let me know if this is what you want. Otherwise please share expected output
@RomaKarageorgievich updated to match expected output
