I have a project where I'm updating a dataframe to reflect a new set of changes being rolled out. The dataframe currently has 15,000 rows, so runtime can become an issue quickly. I know that vectorizing the operations with numpy is a good way to cut runtime, but I'm running into an issue when I use a numpy array to index a dictionary.
The goal is to take the value in col3, use it as the key into df_dict, multiply col2 by the corresponding dictionary value, and assign the result to col1.
I've been able to do this with for loops, but the runtime becomes a serious problem, especially because there are more steps involved than just the one I'm asking about.
import pandas as pd
d = {"col1": [1, 2, 3, 4], "col2": [1, 2, 3, 4], "col3": ["a", "b", "c", "d"]}
df = pd.DataFrame(data=d)
df_dict = {"a": 1.2, "b": 1.5, "c": 0.95, "d": 1.25}
df["col1"]=df["col2"].values*df_dict[df["col3"].values]
I expect col1 to be updated to [1.2, 3, 2.85, 5], but instead I get this error:
TypeError: unhashable type: 'numpy.ndarray'
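For reference, the failure can be reproduced without pandas. This is a minimal sketch, assuming the same df_dict as above; the names keys, message, and single are hypothetical and only used for illustration. A dict hashes whatever it is given as a key, and a whole numpy array is not hashable, while a single element of it is.

```python
import numpy as np

df_dict = {"a": 1.2, "b": 1.5, "c": 0.95, "d": 1.25}
keys = np.array(["a", "b", "c", "d"])  # stands in for df["col3"].values

try:
    # The dict tries to hash the entire array as one key, which fails
    df_dict[keys]
    message = None
except TypeError as exc:
    message = str(exc)

# Looking up one element at a time works, because each element is a hashable string
single = df_dict[keys[0]]
```

So the lookup has to happen per element in some form; the question is only how to do that without an explicit Python loop.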
I understand why the error occurs; I just want to find the best alternative.
How about df["col1"] = df["col2"] * [df_dict.get(i, 1) for i in df["col3"].values]?
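Another option worth trying is to let pandas do the lookup with Series.map, which accepts a dict and maps each value in col3 to its dictionary entry. The sketch below uses the sample data from the question; the name factors is hypothetical, and the fillna(1) step is an assumption that mirrors the .get(i, 1) default, so keys missing from df_dict leave col2 unchanged.

```python
import pandas as pd

d = {"col1": [1, 2, 3, 4], "col2": [1, 2, 3, 4], "col3": ["a", "b", "c", "d"]}
df = pd.DataFrame(data=d)
df_dict = {"a": 1.2, "b": 1.5, "c": 0.95, "d": 1.25}

# Map each col3 value to its multiplier; keys missing from df_dict become NaN
factors = df["col3"].map(df_dict)

# Treat missing keys as a multiplier of 1 (assumption, mirrors .get(i, 1)),
# then multiply column-wise instead of row by row
df["col1"] = df["col2"] * factors.fillna(1)
```

Whether this is faster than the list comprehension on 15,000 rows is something to measure on the real data rather than assume, but it keeps the whole step as column operations.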