
I have a project where I need to apply a new set of changes to a dataframe. It currently holds 15,000 rows, so runtime can become an issue quickly. I know that vectorizing the operations with numpy is a good way to cut runtime, but I'm running into a problem combining a numpy array with a dictionary.

The goal is to take the value in col3, use it as the key into df_dict, multiply col2 by the dictionary value for that key, and assign the result to col1.

I've been able to do this with for loops, but the runtime is a serious problem, especially because there are more steps involved than just the one I'm asking about.

import pandas as pd

d = {"col1": [1, 2, 3, 4], "col2": [1, 2, 3, 4], "col3": ["a","b","c","d"]}
df = pd.DataFrame(data=d)
df_dict = {"a":1.2,"b":1.5,"c":0.95,"d":1.25}

df["col1"]=df["col2"].values*df_dict[df["col3"].values]

I expect col1 to be updated to [1.2, 3, 2.85, 5], but instead I get the error: TypeError: unhashable type: 'numpy.ndarray'

I understand why the error occurs; I just want to find the best alternative.

  • df["col1"]=df["col2"]* [df_dict.get(i, 1) for i in df["col3"].values] ? Commented Jul 2, 2019 at 13:10
  • The dictionary lookup has to be done one by one. Commented Jul 2, 2019 at 15:07

2 Answers


It looks like you need a list comprehension with dict.get:

import pandas as pd

d = {"col1": [1, 2, 3, 4], "col2": [1, 2, 3, 4], "col3": ["a","b","c","d"]}
df = pd.DataFrame(data=d)
df_dict = {"a":1.2,"b":1.5,"c":0.95,"d":1.25}

# Look up each col3 value in df_dict; keys that are missing fall back to a factor of 1
df["col1"]=df["col2"]* [df_dict.get(i, 1) for i in df["col3"]]
print(df)

Output:

   col1  col2 col3
0  1.20     1    a
1  3.00     2    b
2  2.85     3    c
3  5.00     4    d

1 Comment

Thanks for the answer! Is this the most runtime-optimized approach, considering it still uses a plain Python loop?
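One way to answer the runtime question is to measure it. The sketch below times the list comprehension against pandas' Series.map on a made-up 15,000-row frame; the frame `big`, the random data, and the two helper function names are assumptions for illustration, and the numbers will depend on the machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

df_dict = {"a": 1.2, "b": 1.5, "c": 0.95, "d": 1.25}

# Hypothetical 15,000-row frame, matching the size mentioned in the question.
rng = np.random.default_rng(0)
n = 15_000
big = pd.DataFrame({
    "col2": rng.integers(1, 5, size=n),
    "col3": rng.choice(list(df_dict), size=n),
})

def with_list_comprehension(frame):
    # One Python-level dict lookup per row.
    return frame["col2"] * [df_dict.get(k, 1) for k in frame["col3"]]

def with_map(frame):
    # Lookup delegated to pandas; unmatched keys would become NaN.
    return frame["col2"] * frame["col3"].map(df_dict)

# Both approaches should agree when every key is present in df_dict.
assert np.allclose(with_list_comprehension(big), with_map(big))

for fn in (with_list_comprehension, with_map):
    seconds = timeit.timeit(lambda: fn(big), number=20)
    print(f"{fn.__name__}: {seconds / 20 * 1e3:.2f} ms per call")
```

No timings are quoted here on purpose; run it on your own data before deciding which version to keep.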

A slightly better solution is to use .map.

So replace:

df["col1"]=df["col2"].values*df_dict[df["col3"].values]

With:

df["col1"] = df["col2"] * df["col3"].map(df_dict)
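One difference from the dict.get(i, 1) version is worth noting: Series.map returns NaN for any col3 value that is not a key in the dictionary, so the product becomes NaN as well. A minimal sketch of one way to keep the "factor of 1" fallback follows; the "z" row and the `factors` variable are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": [1, 2, 3, 4],
    "col2": [1, 2, 3, 4],
    "col3": ["a", "b", "c", "z"],  # "z" is deliberately absent from df_dict
})
df_dict = {"a": 1.2, "b": 1.5, "c": 0.95, "d": 1.25}

factors = df["col3"].map(df_dict)            # NaN where no key matches
df["col1"] = df["col2"] * factors.fillna(1)  # fall back to a factor of 1
print(df)
```

If every value in col3 is guaranteed to be a key, the fillna call is unnecessary.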

1 Comment

Looks like this map works by first converting the dictionary into a Series, pd.Series(df_dict).
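The sketch below does not inspect pandas internals; it only checks that mapping with the dictionary and mapping with pd.Series(df_dict) give the same result, which is consistent with the comment above. The `keys`, `lookup`, `via_dict`, and `via_series` names are made up for illustration.

```python
import pandas as pd

df_dict = {"a": 1.2, "b": 1.5, "c": 0.95, "d": 1.25}
keys = pd.Series(["a", "b", "c", "d", "b"])

lookup = pd.Series(df_dict)      # index holds the keys, values hold the factors
via_dict = keys.map(df_dict)     # map with the plain dictionary
via_series = keys.map(lookup)    # map with the equivalent Series

print(via_dict.equals(via_series))
```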
