I have several pandas DataFrames stored in a dictionary:

import pandas as pd

df1=pd.DataFrame({'product':['ajoijoft','bbhjbh','cser','sesrd','yfgjke','tfyfyf','drdrtjg'],'price':[1,2,3,4,5,6,7],'label':['h','i','j','k','L','n','m']})
df2=pd.DataFrame({'product':['ajyughjoijoft','bdrddbhjbh','rdtrdcser','sdtrdthddesrd','yawafgjke','tesrgsfyfyf','sresedrdrtjg'],'price':[1,2,3,4,5,6,7],'label':['h','i','j','k','L','n','m']})
df3=pd.DataFrame({'product':['joijoft','bdbhjbh','rdcser','sdhddesrd','wajke','yf','sresedrdrtjg'],'price':[1,2,3,4,5,6,7],'label':['h','i','j','k','L','n','m']})

df_dict = {"A":df1,'B':df2, "C":df3}

I want to know the length of each string in product, so I wrote the following.

for i, ii in df_dict.items():
    ii['Productsize'] = ii['product'].str.len()

This worked, and I could get the length of every "product".

Next, I want to remove rows whose product string is short, that is, rows where Productsize <= 6.

I tried to use this code:

for i, ii in df_dict.items():
    ii=ii[~(ii['Productsize'] <= 6)]

However, this did not work. If I write it for a single dataframe (i.e., not in a loop) as below, it does work.

df1=df1[~(df1['Productsize'] <= 6)]

Does anyone know what the problem might be?

I tried what you suggested. Unfortunately, it does not work. Do you know why? Here is the code.

df1=pd.DataFrame({'product':['ajoijoft','bbhjbh','cser','sesrd','yfgjke','tfyfyf','drdrtjg'],'price':[1,2,3,4,5,6,7],'label':['h','i','j','k','L','n','m']})
df2=pd.DataFrame({'product':['ajyughjoijoft','bdrddbhjbh','rdtrdcser','sdtrdthddesrd','yawafgjke','tesrgsfyfyf','sresedrdrtjg'],'price':[1,2,3,4,5,6,7],'label':['h','i','j','k','L','n','m']})
df3=pd.DataFrame({'product':['joijoft','bdbhjbh','rdcser','sdhddesrd','wajke','yf','sresedrdrtjg'],'price':[1,2,3,4,5,6,7],'label':['h','i','j','k','L','n','m']})

df_dict = {"A":df1,'B':df2, "C":df3}

for i, ii in df_dict.items():
    ii['Productsize'] = ii['product'].str.len()    

for i, ii in df_dict.items():
    df_dict[i] = ii[~(ii['Productsize'] <= 6)]

3 Answers

1

First, you should be using a dictionary or list to hold many similarly structured dataframes rather than flooding your global environment with separate dataframes. Always use a container to stay organized and to set up bulk operations such as pd.concat to build a master set. But be sure to assign the dataframes to the dictionary directly and not create separate objects.
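As a minimal sketch of the bulk operation mentioned above, assuming small stand-in frames rather than the real data: passing the dictionary to pd.concat turns its keys into an outer index level, so the combined frame can be filtered once and split back per key afterwards. The level names "person" and "row" are made up for this illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the per-person frames in the question.
df_dict = {
    "A": pd.DataFrame({"product": ["ajoijoft", "cser"], "price": [1, 3]}),
    "B": pd.DataFrame({"product": ["bdrddbhjbh", "yf"], "price": [2, 6]}),
}

# Passing a dict to pd.concat uses its keys as the outer index level.
master = pd.concat(df_dict, names=["person", "row"])

# Filter the combined frame once instead of once per dataframe.
master = master[master["product"].str.len() > 6]

# Split the result back into one frame per key when needed (e.g. for plotting).
per_person = {
    key: group.droplevel("person")
    for key, group in master.groupby(level="person")
}
```

Because the dictionary key stays in the index, rows from different frames are never mixed up, even though they live in one dataframe.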

The reason your dictionary dataframes do not update is that you are not assigning the result back to the dictionary. Every instance of df needs to be replaced with df_dict[key]. So,

df[~(df['Productsize'] <= 6)]

Would be replaced as

df_dict[key][~(df_dict[key]['Productsize'] <= 6)]

You lose no functionality of the dataframe when it is stored in a container; only the way you reference it changes. Therefore, adjust accordingly:

for k, v in df_dict.items():
    df_dict[k]['Productsize'] = df_dict[k]['product'].str.len()  
    df_dict[k] = df_dict[k][~(df_dict[k]['Productsize'] <= 6)]

Alternatively, use the value item of the dictionary loop, but reassign the temporary changes to the current key, as @phi explains.

for k, v in df_dict.items():
    v['Productsize'] = v['product'].str.len()  
    v = v[~(v['Productsize'] <= 6)]

    df_dict[k] = v
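
The same reassignment can probably also be written as a dictionary comprehension, which rebuilds df_dict in one expression. This is only a sketch on made-up two-frame data, not part of the original answer; it uses DataFrame.assign and a callable passed to .loc.

```python
import pandas as pd

# Hypothetical, shortened versions of the question's frames.
df_dict = {
    "A": pd.DataFrame({"product": ["ajoijoft", "bbhjbh", "cser"]}),
    "C": pd.DataFrame({"product": ["joijoft", "yf"]}),
}

# Build a new dictionary whose values are the filtered frames.
df_dict = {
    key: frame.assign(Productsize=frame["product"].str.len())
              .loc[lambda d: d["Productsize"] > 6]
    for key, frame in df_dict.items()
}
```

Note that this replaces the dictionary's values with new dataframes; any separate variables that pointed at the original frames are not affected.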

2 Comments

Thank you very much. I tried the first one but got the following error: "TypeError: 'DataFrame' objects are mutable, thus they cannot be hashed". I also tried the last one, but the dataframe did not change. For example, df1 should not contain "cser", but it still does. If you can successfully change the dataframes, my environment might be different from yours. I am using Python 3 in IPython. Does this make a difference?
See the edit fixing the first option's key issue. As for the second option, I hope you are not confusing df1 (a separate variable) with the first dataframe element of df_dict (a container of many items). The latter should update.
1

You probably should not be building a dictionary of frames. But if you did, you should use the following code to modify the dictionary:

for i, ii in df_dict.items():
    df_dict[i] = ii[~(ii['Productsize'] <= 6)]
    #df_dict[i] = ii[(ii['Productsize'] > 6)] 

The statement ii = ii[~(ii['Productsize'] <= 6)] assigns the modified dataframe to the variable ii, but the variable is overwritten at the next loop iteration.
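
To make the difference concrete, here is a small sketch on a made-up one-entry dictionary (the names key and frame are illustrative). The first loop only rebinds the loop variable; the second assigns through the key.

```python
import pandas as pd

df_dict = {"A": pd.DataFrame({"product": ["ajoijoft", "cser"]})}

# Rebinding the loop variable: the dictionary entry is left untouched.
for key, frame in df_dict.items():
    frame = frame[frame["product"].str.len() > 6]
rows_after_rebinding = len(df_dict["A"])   # still 2 rows

# Assigning through the key: the dictionary entry is replaced.
for key, frame in df_dict.items():
    df_dict[key] = frame[frame["product"].str.len() > 6]
rows_after_assignment = len(df_dict["A"])  # now 1 row
```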

6 Comments

Thanks DyZ. However, this does not work either... For example, if I show df1, there are no changes...
Do you want to change df1 or df_dict? They are two different objects. That's why I suggested not using a dictionary.
I would like to change all the dataframes: df1, df2, df3. Because I have more than 100 dataframes, I think I have to use a "for" loop.
df_dict['A'] has a modified copy of your df1. But you are probably doing something wrong in the first place. Why do you need the dictionary? Why not concatenate all the dataframes into one and change the combined dataframe at once?
Each dataframe represents data from one person. After removing the rows with "Productsize" of 6 or less, I want to make a plot for each dataframe. If I combine all the dataframes, data from different people will be mixed together and I cannot make the plots. That being said, I think there would be some way to extract the individual data after combining all the dataframes.
1

why my code does not work

When you call

for i, ii in df_dict.items():

Python creates two variables, i and ii, and assigns them to the key and the dataframe.
In the meantime, your df1, df2, df3 and df_dict do not change (during the first iteration, ii and df1 refer to the same dataframe object, but they are still two different variables).

The next expression then creates another dataframe object and assigns ii to the newly created one. Your df1, df2, df3 and df_dict still do not change.

ii = ii[~(ii['Productsize'] <= 6)]

To change df1, you have to do it explicitly:

df1 = ii

And to change df_dict:

df_dict[i] = ii
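
If the goal is for df1 itself to change without a separate assignment per variable, one option that is not part of this answer is to mutate the dataframe in place, for example with DataFrame.drop(..., inplace=True). A minimal sketch on made-up data, assuming df1 and df_dict["A"] still refer to the same object:

```python
import pandas as pd

df1 = pd.DataFrame({"product": ["ajoijoft", "bbhjbh", "cser"]})
df_dict = {"A": df1}  # the dictionary value and df1 are the same object

for key, frame in df_dict.items():
    # Index labels of the rows whose product string is too short.
    too_short = frame[frame["product"].str.len() <= 6].index
    # Mutate the existing object instead of creating a new dataframe.
    frame.drop(index=too_short, inplace=True)

remaining = df1["product"].tolist()
```

Since no new object is created, df1 and df_dict["A"] both show the filtered rows.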

You may want to think about your variables as tags:

df1 = pd.DataFrame(...)  # Create a dataframe and give it the tag df1
ii = df1  # Give the same dataframe the tag ii
ii = ii[ii.Productsize > 6]  # Move the tag ii to the new filtered dataframe. df1 still sticks to the first dataframe
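
The tag picture can be checked with Python's is operator, which tests whether two names refer to the same object. A small sketch with made-up data:

```python
import pandas as pd

df1 = pd.DataFrame({"product": ["ajoijoft", "cser"]})
ii = df1

# Both tags are attached to the same dataframe object.
same_before = ii is df1

# Boolean indexing builds a new dataframe; the tag ii moves to it.
ii = ii[ii["product"].str.len() > 6]

same_after = ii is df1   # the tags now point at different objects
df1_rows = len(df1)      # df1 still has all of its original rows
```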
