In the code below, I can easily reduce the array ['a','b','a','c','b','b','c','a'] to a binary array [0 1 0 1 1 1 1 0] so that 'a' -> 0 and 'b','c' -> 1. How do I transform it to a ternary array so that 'a' -> 0, 'b' -> 1, 'c' -> 2, without using for and if-else? Thanks.

import numpy as np
x = np.array(['a', 'b', 'a', 'c', 'b', 'b', 'c', 'a'])
y = np.where(x=='a', 0, 1)
print(y)

2 Answers

By doing:

np.where(x == 'a', 0, (np.where(x == 'b', 1, 2)))

Note that this maps every character that is neither 'a' nor 'b' to 2. I've assumed that your array contains only 'a', 'b' and 'c'.
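To make the catch-all behaviour concrete, here is a minimal sketch. The extra label 'd' and the `unexpected` mask are illustrative assumptions and are not part of the question's data.

```python
import numpy as np

# 'd' is a hypothetical label that the question's array does not contain
x = np.array(['a', 'b', 'd', 'c'])

# Nested np.where: anything that is not 'a' or 'b' lands in the final branch
y = np.where(x == 'a', 0, np.where(x == 'b', 1, 2))
print(y.tolist())  # [0, 1, 2, 2]

# Optional sanity check: flag entries outside the expected alphabet
unexpected = ~np.isin(x, ['a', 'b', 'c'])
print(x[unexpected].tolist())  # ['d']
```

If the mask selects anything, the mapped array contains a 2 that does not actually stand for 'c'.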

1 Comment

You should be using np.select for more than a binary outcome.

A more scalable approach is to use a conversion dictionary:

my_dict = {'a':0, 'b':1, 'c':2}
x = np.vectorize(my_dict.get)(x)

output:

[0 1 0 2 1 1 2 0]
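One caveat, sketched under assumptions: `dict.get` returns `None` for a label that is missing from the dictionary, and `None` cannot be stored in an integer result. The variant below supplies a fallback and fixes the output dtype through the `otypes` argument of `np.vectorize`. The label 'd', the sentinel -1 and the names `to_code` and `codes` are illustrative choices, not part of the answer above.

```python
import numpy as np

my_dict = {'a': 0, 'b': 1, 'c': 2}

# 'd' is a hypothetical label with no entry in my_dict
x = np.array(['a', 'b', 'd', 'c'])

# -1 is an arbitrary sentinel for unknown labels; otypes pins the output dtype
to_code = np.vectorize(lambda k: my_dict.get(k, -1), otypes=[int])
codes = to_code(x)
print(codes.tolist())  # [0, 1, -1, 2]
```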

Another approach is:

np.select([x==i for i in ['a','b','c']], np.arange(3))
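A point worth checking with `np.select` is its `default` argument, which is 0 when omitted, so a label matching none of the conditions would silently get the same code as 'a'. The sketch below assumes a hypothetical extra label 'd' and uses -1 as an arbitrary sentinel.

```python
import numpy as np

labels = ['a', 'b', 'c']

# 'd' is a hypothetical label that matches none of the conditions
x = np.array(['a', 'b', 'd', 'c'])

conditions = [x == label for label in labels]
choices = list(range(len(labels)))

# Without default, unmatched entries get 0 (same code as 'a')
implicit = np.select(conditions, choices)
print(implicit.tolist())  # [0, 1, 0, 2]

# With an explicit sentinel, unmatched entries are easy to spot
explicit = np.select(conditions, choices, default=-1)
print(explicit.tolist())  # [0, 1, -1, 2]
```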

For a small dictionary, @ypno's answer is going to be faster. For a larger dictionary, use this answer.


Time Comparison:

Ternary alphabet:

lst = ['a','b','c']
my_dict = {k: v for v, k in enumerate(lst)}

#@Ehsan's solution1
def m1(x):
  return np.vectorize(my_dict.get)(x)

#@ypno's solution
def m2(x):
  return np.where(x == 'a', 0, (np.where(x == 'b', 1, 2)))

#@SteBog's solution
def m3(x):
  y = np.where(x=='a', 0, x)
  y = np.where(x=='b', 1, y)
  y = np.where(x=='c', 2, y)
  # y holds strings such as '0'; cast to a concrete integer dtype
  return y.astype(int)

#@Ehsan's solution 2 (also suggested by user3483203 in comments)
def m4(x):
  return np.select([x==i for i in lst], np.arange(len(lst)))

#@juanpa.arrivillaga's solution suggested in comments
def m5(x):
  return np.array([my_dict[i] for i in x.tolist()])

in_ = [np.random.choice(lst, size = n) for n in [10,100,1000,10000,100000]]
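The benchmarking harness itself is not shown in this answer. As a rough stand-in, the sketch below times a few of the candidates with the standard-library `timeit` module. The choice of `timeit`, the reduced set of input sizes, the repeat count and the `candidates` dictionary are all assumptions, and the measured times will differ by machine and NumPy version.

```python
import timeit
import numpy as np

lst = ['a', 'b', 'c']
my_dict = {k: v for v, k in enumerate(lst)}

def m1(x):
    # dictionary lookup wrapped in np.vectorize
    return np.vectorize(my_dict.get)(x)

def m2(x):
    # nested np.where
    return np.where(x == 'a', 0, np.where(x == 'b', 1, 2))

def m4(x):
    # np.select with one condition per label
    return np.select([x == i for i in lst], np.arange(len(lst)))

def m5(x):
    # plain list comprehension over a Python list
    return np.array([my_dict[i] for i in x.tolist()])

candidates = {'m1': m1, 'm2': m2, 'm4': m4, 'm5': m5}

# Seeded generator so the inputs are reproducible; sizes are illustrative
rng = np.random.default_rng(0)
in_ = [rng.choice(lst, size=n) for n in [10, 1000, 100000]]

repeats = 5
for x in in_:
    for name, f in candidates.items():
        seconds = timeit.timeit(lambda: f(x), number=repeats) / repeats
        print(f"n={x.size:>6}  {name}: {seconds * 1e6:10.1f} us per call")
```

Checking that every candidate returns the same array before timing it is a cheap way to make sure the comparison is fair.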

[Plot: timing comparison of the methods above for the ternary alphabet; image not available]

Same analysis for 8 letter alphabet:

lst = ['a','b','c','d','e','f','g','h']

[Plot: timing comparison of the methods above for the 8-letter alphabet; image not available]

6 Comments

Why use np.vectorize? It really should be avoided. Almost certainly, np.array([my_dict[x] for x in array.tolist()]) will be faster.
@juanpa.arrivillaga Thank you for your suggestion. Could you please explain why it "really should be avoided" and why the list comprehension would "almost certainly" be faster? I am not aware of the implementation details, but if you think it is a bad implementation in NumPy, maybe you can raise it with the NumPy developers. One personal reason for using it is readability, which is subjective. Another is that np.vectorize seems to be faster than a list comprehension for larger arrays. I will add a time comparison for better judgement. Thank you.
Straight from the documentation: "The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop."
@juanpa.arrivillaga Interesting. Thank you for the response. Please check out the time analysis I added. It seems that vectorize is slightly faster for larger arrays. Unless I have missed something, there seems to be some optimization difference between vectorize and a simple loop.
Likely, it's the overhead of converting to a list. If I were actually doing this, I wouldn't be using a numpy.ndarray to begin with. Try it without .tolist(), and then try it with a regular list, and I bet you will see it is faster with the list comprehension.