Iterating through numpy array for use in dictionary

Question

I have a project where I'm trying to update a dataframe to reflect a new set of changes being rolled out. The dataframe currently has 15,000 rows, so runtime can quickly become an issue. I know that vectorizing dataframe operations with numpy is a good way to cut runtime, but I'm running into an issue combining a numpy array with a dictionary.

The goal is to use the value in col3 as a key into df_dict, multiply col2 by the corresponding dictionary value, and assign the result to col1.

I've been able to do this with for loops, but the runtime is a serious problem, especially because there are more steps involved than just the one I'm asking about.

import pandas as pd

d = {"col1": [1, 2, 3, 4], "col2": [1, 2, 3, 4], "col3": ["a","b","c","d"]}
df = pd.DataFrame(data=d)
df_dict = {"a":1.2,"b":1.5,"c":0.95,"d":1.25}

df["col1"]=df["col2"].values*df_dict[df["col3"].values]

I expect col1 to be updated to [1.2, 3, 2.85, 5], but instead I get the error: TypeError: unhashable type: 'numpy.ndarray'

I get why the error occurs; I just want to find the best alternative.
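For readers who want the cause spelled out, here is a minimal sketch of why the lookup fails. It assumes only the sample df_dict above; the names keys, error_message and factors are illustrative and not from the original post. A dict lookup hashes its key, and a numpy array is not hashable, so passing the whole array as one key raises the TypeError, while looking up one element at a time works.

```python
import numpy as np

df_dict = {"a": 1.2, "b": 1.5, "c": 0.95, "d": 1.25}
keys = np.array(["a", "b", "c", "d"])  # stands in for df["col3"].values

# A dict lookup hashes its key. The whole ndarray is passed as ONE key here,
# and ndarrays are unhashable, so this raises TypeError.
try:
    df_dict[keys]
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Each individual element is a string, which is hashable, so per-element
# lookups succeed (at the cost of a Python-level loop).
factors = [df_dict[k] for k in keys]
```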

df["col1"]=df["col2"]* [df_dict.get(i, 1) for i in df["col3"].values] ?
– Rakesh, Commented Jul 2, 2019 at 13:10

Rakesh · Accepted Answer · 2019-07-02 13:11:35Z

1

It looks like you need this:

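import pandas as pd

d = {"col1": [1, 2, 3, 4], "col2": [1, 2, 3, 4], "col3": ["a","b","c","d"]}
df = pd.DataFrame(data=d)
df_dict = {"a":1.2,"b":1.5,"c":0.95,"d":1.25}

# Look up each col3 value in df_dict (defaulting to 1 for missing keys),
# then multiply element-wise with col2.
df["col1"]=df["col2"]* [df_dict.get(i, 1) for i in df["col3"]]
print(df)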

Output:

   col1  col2 col3
0  1.20     1    a
1  3.00     2    b
2  2.85     3    c
3  5.00     4    d


1 Comment

Paulfryy Over a year ago

Thanks for the answer! Is this the most runtime-optimized way, considering it still uses base Python loops?

U13-Forward · Accepted Answer · 2019-07-02 13:23:25Z

0

A somewhat better solution is to use .map.

So replace:

df["col1"]=df["col2"].values*df_dict[df["col3"].values]

With:

df["col1"]=df["col2"] * df['col3'].map(df_dict)
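One difference from the list-comprehension version is worth noting: .map yields NaN for any col3 value that is not a key in df_dict, whereas df_dict.get(i, 1) falls back to 1. The sketch below is one possible way to reproduce that fallback with .fillna(1). The "z" row is a made-up value added only to show the missing-key case, and factors is an illustrative name.

```python
import pandas as pd

# Same sample data as the question, except the last col3 value ("z") is a
# hypothetical key that is NOT present in df_dict.
df = pd.DataFrame({"col1": [1, 2, 3, 4],
                   "col2": [1, 2, 3, 4],
                   "col3": ["a", "b", "c", "z"]})
df_dict = {"a": 1.2, "b": 1.5, "c": 0.95, "d": 1.25}

# map() returns NaN where the key is missing; fillna(1) mirrors the
# default used by df_dict.get(i, 1).
factors = df["col3"].map(df_dict).fillna(1)
df["col1"] = df["col2"] * factors
```

With this data the last row keeps its col2 value unchanged, because the unknown key is treated as a factor of 1.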


1 Comment

hpaulj Over a year ago

Looks like this map works by first converting the dictionary into a Series, pd.Series(df_dict).
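The sketch below does not inspect pandas internals, so it should not be read as confirming how .map is implemented. It only checks, on the sample data, that passing the dict and passing pd.Series(df_dict) give the same result. The names via_dict and via_series are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"col2": [1, 2, 3, 4], "col3": ["a", "b", "c", "d"]})
df_dict = {"a": 1.2, "b": 1.5, "c": 0.95, "d": 1.25}

# map() accepts either a dict or a Series as the lookup table.
via_dict = df["col3"].map(df_dict)
via_series = df["col3"].map(pd.Series(df_dict))

# On this data both lookups produce identical values.
same_result = via_dict.equals(via_series)
```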
