Reassigning numpy.array()

Question

In the code below, I can easily reduce the array ['a','b','a','c','b','b','c','a'] to a binary array [0 1 0 1 1 1 1 0] so that 'a' -> 0 and 'b','c' -> 1. How do I transform it to a ternary array so that 'a' -> 0, 'b' -> 1, 'c' -> 2, without using for and if-else? Thanks.

import numpy as np
x = np.array(['a', 'b', 'a', 'c', 'b', 'b', 'c', 'a'])
y = np.where(x=='a', 0, 1)
print(y)

Untrue · Accepted Answer · 2020-08-07 22:59:24Z

1

By doing:

np.where(x == 'a', 0, (np.where(x == 'b', 1, 2)))

Note that this maps every character that is neither 'a' nor 'b' to 2. I've assumed that your array contains only 'a', 'b' and 'c'.
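To make that caveat concrete, here is a minimal sketch. The extra 'z' element and the -1 sentinel are assumptions added for illustration only. It shows the nested np.where silently sending an unexpected label to 2, and np.select with an explicit default flagging it instead.

```python
import numpy as np

# 'z' is an unexpected label, appended here only to illustrate the caveat.
x = np.array(['a', 'b', 'a', 'c', 'b', 'b', 'c', 'a', 'z'])

# Nested np.where: anything that is not 'a' or 'b' falls through to 2, including 'z'.
nested = np.where(x == 'a', 0, np.where(x == 'b', 1, 2))
print(nested.tolist())  # [0, 1, 0, 2, 1, 1, 2, 0, 2]

# np.select with an explicit default marks unexpected labels instead.
# -1 is an arbitrary sentinel chosen for this sketch.
labels = ['a', 'b', 'c']
conditions = [x == label for label in labels]
choices = list(range(len(labels)))
strict = np.select(conditions, choices, default=-1)
print(strict.tolist())  # [0, 1, 0, 2, 1, 1, 2, 0, -1]
```

If the input is guaranteed to contain only 'a', 'b' and 'c', both versions give the same result.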


1 Comment

user3483203 Over a year ago

You should be using np.select for more than a binary outcome.

Ehsan · Accepted Answer · 2020-08-10 22:37:08Z

1

A more scalable version uses a conversion dictionary:

my_dict = {'a':0, 'b':1, 'c':2}
x = np.vectorize(my_dict.get)(x)

output:

[0 1 0 2 1 1 2 0]

Another approach is:

np.select([x==i for i in ['a','b','c']], np.arange(3))

For a small dictionary, @ypno's nested np.where answer is going to be faster. For a larger dictionary, use this answer.
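A further option, not one of the answers in this thread: if the codes you want simply follow the sorted order of the labels (as 'a' -> 0, 'b' -> 1, 'c' -> 2 does here), np.unique with return_inverse=True gives the mapping without a dictionary. This is a sketch under that assumption. It does not let you choose an arbitrary label-to-code mapping.

```python
import numpy as np

x = np.array(['a', 'b', 'a', 'c', 'b', 'b', 'c', 'a'])

# labels holds the sorted unique values; codes holds, for each element of x,
# the index of its value within labels.
labels, codes = np.unique(x, return_inverse=True)

print(labels.tolist())  # ['a', 'b', 'c']
print(codes.tolist())   # [0, 1, 0, 2, 1, 1, 2, 0]
```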

Time Comparison:

Ternary alphabet:

lst = ['a','b','c']
my_dict = {k: v for v, k in enumerate(lst)}

#@Ehsan's solution1
def m1(x):
  return np.vectorize(my_dict.get)(x)

#@ypno's solution
def m2(x):
  return np.where(x == 'a', 0, (np.where(x == 'b', 1, 2)))

#@SteBog's solution
def m3(x):
  y = np.where(x=='a', 0, x)
  y = np.where(x=='b', 1, y)
  y = np.where(x=='c', 2, y)
  # y still holds strings ('0', '1', '2'); np.integer is abstract, so cast to a concrete int
  return y.astype(int)

#@Ehsan's solution 2 (also suggested by user3483203 in comments)
def m4(x):
  return np.select([x==i for i in lst], np.arange(len(lst)))

#@juanpa.arrivillaga's solution suggested in comments
def m5(x):
  return np.array([my_dict[i] for i in x.tolist()])

in_ = [np.random.choice(lst, size = n) for n in [10,100,1000,10000,100000]]
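The snippet above only builds the inputs. One way to actually run the comparison is sketched below with the standard-library timeit module. The harness itself (the methods dict, the array sizes, the repeat counts and the fixed seed) is an assumption for illustration, not the setup used for the original measurements, and the numbers it prints will depend on your machine and NumPy version. m3 is left out for brevity.

```python
import timeit
import numpy as np

lst = ['a', 'b', 'c']
my_dict = {k: v for v, k in enumerate(lst)}

def m1(x):  # np.vectorize over a dict lookup
    return np.vectorize(my_dict.get)(x)

def m2(x):  # nested np.where (hard-coded for the ternary alphabet)
    return np.where(x == 'a', 0, np.where(x == 'b', 1, 2))

def m4(x):  # np.select, one condition per label
    return np.select([x == i for i in lst], np.arange(len(lst)))

def m5(x):  # plain list comprehension over x.tolist()
    return np.array([my_dict[i] for i in x.tolist()])

methods = {'m1': m1, 'm2': m2, 'm4': m4, 'm5': m5}
rng = np.random.default_rng(0)  # fixed seed so runs are repeatable
results = {}

for n in [10, 1000, 100000]:
    x = rng.choice(lst, size=n)
    expected = m2(x)
    for name, fn in methods.items():
        # Sanity check: every method must produce the same codes.
        assert np.array_equal(fn(x), expected)
        # Best of 3 repeats, 3 calls each, reported as seconds per call.
        seconds = min(timeit.repeat(lambda: fn(x), number=3, repeat=3)) / 3
        results[(name, n)] = seconds
        print(f"{name} n={n:>6}: {seconds * 1e6:10.1f} us per call")
```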

Same analysis for an 8-letter alphabet:

lst = ['a','b','c','d','e','f','g','h']

edited Aug 10, 2020 at 22:37

answered Aug 7, 2020 at 23:05


6 Comments

juanpa.arrivillaga Over a year ago

Why use np.vectorize? It really should be avoided. Almost certainly, np.array([my_dict[x] for x in array.tolist()]) will be faster.

Ehsan Over a year ago

@juanpa.arrivillaga Thank you for your suggestion. Could you please explain why it "really should be avoided" and why the list comprehension is "almost certainly" faster? I am not aware of the implementation details, but if you think it is a bad implementation in NumPy, maybe you can raise it with the NumPy developers. One personal reason for using it is readability, which is subjective. Another is that np.vectorize seems to be faster than a list comprehension for larger arrays. I will add a time comparison for better judgement. Thank you.

juanpa.arrivillaga Over a year ago

Straight from the documentation: "The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop."

Ehsan Over a year ago

@juanpa.arrivillaga Interesting. Thank you for the response. Please check out the time analysis I added. It seems that vectorize is slightly faster for larger arrays. Unless I have missed something, there seems to be some optimization difference between vectorize and a simple loop.

juanpa.arrivillaga Over a year ago

Likely, it's the overhead of converting to a list. If I were actually doing this, I wouldn't be using a numpy.ndarray to begin with. Try it without .tolist(), and then try it with a regular list, and I bet you will see that it is faster with the list comprehension.
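For anyone who wants to try the experiment suggested in the last comment, here is a minimal sketch. The function names, array size and repeat counts are assumptions made for illustration. It times the same list comprehension over the ndarray directly, over x.tolist(), and over a list built beforehand. No timings are quoted here because they vary by machine and NumPy version.

```python
import timeit
import numpy as np

my_dict = {'a': 0, 'b': 1, 'c': 2}
rng = np.random.default_rng(0)
arr = rng.choice(list(my_dict), size=100_000)
as_list = arr.tolist()  # conversion done once, outside the timed code

def over_array(a):
    # Iterates the ndarray directly, yielding NumPy string scalars.
    return [my_dict[i] for i in a]

def via_tolist(a):
    # Pays the .tolist() conversion cost on every call.
    return [my_dict[i] for i in a.tolist()]

def over_list(values):
    # Input is already a plain Python list of str.
    return [my_dict[i] for i in values]

cases = {
    'ndarray, no tolist': lambda: over_array(arr),
    'ndarray + tolist': lambda: via_tolist(arr),
    'plain list': lambda: over_list(as_list),
}

timings = {}
for label, call in cases.items():
    # Best of 3 repeats, 3 calls each, reported as seconds per call.
    timings[label] = min(timeit.repeat(call, number=3, repeat=3)) / 3
    print(f"{label:>20}: {timings[label] * 1e3:8.2f} ms per call")
```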
