I have a DataFrame that resembles:
x y z
--------------
0 A 10
0 D 13
1 X 20
...
and I have two sorted arrays for every possible value for x and y:
x_values = [0, 1, ...]
y_values = ['a', ..., 'A', ..., 'D', ..., 'X', ...]
so I wrote a function:
def lookup(record, lookup_list, lookup_attr):
return np.searchsorted(lookup_list, getattr(record, lookup_attr))
and then call:
df_x_indicies = df.apply(lambda r: lookup(r, x_values, 'x')
df_y_indicies = df.apply(lambda r: lookup(r, y_values, 'y')
# df_x_indicies: [0, 0, 1, ...]
# df_y_indicies: [26, ...]
but is there are more performant way to do this? and possibly multiple columns at once to get a returned DataFrame rather than a series?
I tried:
np.where(np.in1d(x_values, df.x))[0]
but this removes duplicate values and that is not desired.