How to remove specific values sequentially in pandas dataframes?

Question

I have several pandas DataFrames stored in a dictionary:

df1=pd.DataFrame({'product':['ajoijoft','bbhjbh','cser','sesrd','yfgjke','tfyfyf','drdrtjg'],'price':[1,2,3,4,5,6,7],'label':['h','i','j','k','L','n','m']})
df2=pd.DataFrame({'product':['ajyughjoijoft','bdrddbhjbh','rdtrdcser','sdtrdthddesrd','yawafgjke','tesrgsfyfyf','sresedrdrtjg'],'price':[1,2,3,4,5,6,7],'label':['h','i','j','k','L','n','m']})
df3=pd.DataFrame({'product':['joijoft','bdbhjbh','rdcser','sdhddesrd','wajke','yf','sresedrdrtjg'],'price':[1,2,3,4,5,6,7],'label':['h','i','j','k','L','n','m']})

df_dict = {"A":df1,'B':df2, "C":df3}

I want to know the length of each string in product, so I wrote the following.

for i, ii in df_dict.items():
    ii['Productsize'] = ii['product'].str.len()

This worked and I could get the length for all "product".

Next, I want to remove rows that have a short product string, that is, rows where Productsize <= 6.

I tried to use this code:

for i, ii in df_dict.items():
    ii=ii[~(ii['Productsize'] <= 6)]

However, this did not work. If I write it for a single dataframe (i.e. not in a loop) as below, it does work.

df1=df1[~(df1['Productsize'] <= 6)]

Does anyone know what the problem might be?

I tried what you suggested. Unfortunately, it does not work. Do you know why? Here is the code.

df1=pd.DataFrame({'product':['ajoijoft','bbhjbh','cser','sesrd','yfgjke','tfyfyf','drdrtjg'],'price':[1,2,3,4,5,6,7],'label':['h','i','j','k','L','n','m']})
df2=pd.DataFrame({'product':['ajyughjoijoft','bdrddbhjbh','rdtrdcser','sdtrdthddesrd','yawafgjke','tesrgsfyfyf','sresedrdrtjg'],'price':[1,2,3,4,5,6,7],'label':['h','i','j','k','L','n','m']})
df3=pd.DataFrame({'product':['joijoft','bdbhjbh','rdcser','sdhddesrd','wajke','yf','sresedrdrtjg'],'price':[1,2,3,4,5,6,7],'label':['h','i','j','k','L','n','m']})

df_dict = {"A":df1,'B':df2, "C":df3}

for i, ii in df_dict.items():
    ii['Productsize'] = ii['product'].str.len()    

for i, ii in df_dict.items():
    df_dict[i] = ii[~(ii['Productsize'] <= 6)]

Parfait · Accepted Answer · 2018-06-01 10:57:07Z

1

First, you should use a dictionary or list to hold many similarly structured dataframes rather than flood your global environment with separate dataframes. A container keeps things organized and sets you up for bulk operations such as pd.concat to build a master set. But be sure to assign dataframes to the dictionary directly and not create separate objects.

The reason your dictionary dataframes do not update is that you are not assigning the result back to the dictionary. Every instance of df needs to be replaced with df_dict[key]. So,

df[~(df['Productsize'] <= 6)]

Would be replaced as

df_dict[key][~(df_dict[key]['Productsize'] <= 6)]

You lose no functionality of the dataframe when it is stored in a container; only the way you reference it changes. Therefore adjust accordingly:

for k, v in df_dict.items():
    df_dict[k]['Productsize'] = df_dict[k]['product'].str.len()  
    df_dict[k] = df_dict[k][~(df_dict[k]['Productsize'] <= 6)]

Alternatively, work with the value variable of the dictionary loop, but assign the result back to the current key, as @phi explains.

for k, v in df_dict.items():
    v['Productsize'] = v['product'].str.len()  
    v = v[~(v['Productsize'] <= 6)]

    df_dict[k] = v
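The same idea can also be written without a loop body, as a dictionary comprehension that builds a new dictionary instead of updating the old one. This is only a sketch: the two small frames and the name filtered are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical stand-in frames, much smaller than the ones in the question
df_dict = {
    "A": pd.DataFrame({"product": ["ajoijoft", "cser", "drdrtjg"], "price": [1, 2, 3]}),
    "B": pd.DataFrame({"product": ["yf", "sresedrdrtjg"], "price": [1, 2]}),
}

# For every key: add the length column on a copy, then keep rows longer than 6
filtered = {
    key: df.assign(Productsize=df["product"].str.len()).loc[lambda d: d["Productsize"] > 6]
    for key, df in df_dict.items()
}

print(filtered["A"]["product"].tolist())
print(filtered["B"]["product"].tolist())
```

Because assign returns a new dataframe, the frames in df_dict stay untouched; use filtered (or rebind df_dict to it) from then on.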

edited Jun 1, 2018 at 10:57

answered Jun 1, 2018 at 2:31

2 Comments

Tom_Hanks Over a year ago

Thank you very much. I tried the first one but got the following error: "TypeError: 'DataFrame' objects are mutable, thus they cannot be hashed". I also tried the last one, but the dataframe did not change. For example, df1 should not have "cser" but it still does. If you can successfully change the dataframes, my environment might be different from yours. I am using Python 3 in IPython. Does that make a difference?

Parfait Over a year ago

See edit fixing first option's key issue. As for second option, I hope you are not confusing df1 (separate variable) with first dataframe element of df_dict (container of many items). The latter should update.

DYZ · Accepted Answer · 2018-05-31 23:14:46Z

1

You probably should not be building a dictionary of frames. But if you did, you should use the following code to modify the dictionary:

for i, ii in df_dict.items():
    df_dict[i] = ii[~(ii['Productsize'] <= 6)]
    #df_dict[i] = ii[(ii['Productsize'] > 6)]

The statement ii = ii[~(ii['Productsize'] <= 6)] assigns the modified dataframe to the variable ii, but the variable is overwritten at the next loop iteration.
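If the goal is for the original objects (such as df1) to change as well, one option is to mutate each dataframe in place instead of rebinding a name. The sketch below uses DataFrame.drop with inplace=True; this is a different technique from the reassignment shown above, and the small frame and the names frame and short_rows are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for one of the question's frames
df1 = pd.DataFrame({"product": ["ajoijoft", "cser", "drdrtjg"], "price": [1, 2, 3]})
df_dict = {"A": df1}

for key, frame in df_dict.items():
    # Column assignment modifies the existing object
    frame["Productsize"] = frame["product"].str.len()
    # Index labels of the rows to remove
    short_rows = frame.index[frame["Productsize"] <= 6]
    # inplace=True mutates the object instead of returning a new dataframe
    frame.drop(index=short_rows, inplace=True)

print(df1["product"].tolist())   # ['ajoijoft', 'drdrtjg']
print(df_dict["A"] is df1)       # True
```

Since no name is rebound here, df1 and df_dict["A"] keep pointing at the same, now filtered, dataframe.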

answered May 31, 2018 at 23:14

6 Comments

Tom_Hanks Over a year ago

Thanks DyZ. However, this does not work either. For example, if I show df1, there are no changes.

DYZ Over a year ago

Do you want to change df1 or df_dict? It's two different objects. That's why I suggested not to use a dictionary.

Tom_Hanks Over a year ago

I would like to change all of the dataframes df1, df2, df3. Because I have more than 100 dataframes, I think I have to use "for".

DYZ Over a year ago

df_dict['A'] has a modified copy of your df1. But you are probably doing something wrong in the first place. Why do you need the dictionary? Why not concatenate all dataframes into one and change the combined dataframe at once?

Tom_Hanks Over a year ago

Each dataframe represents data from one person. After removing rows with a small "Productsize", I want to make a plot for each dataframe. If I combine all the dataframes, data from different people will be mixed together and I cannot make the plot. That said, I think there would be some way to extract the individual data after combining all the dataframes.
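One possible way to combine the frames and still recover each person's data afterwards is to let pd.concat use the dictionary keys as an outer index level, filter once, and then split again with groupby. This is a sketch under assumptions: the frames are small stand-ins, and the names combined and per_person are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in frames, one per person
df_dict = {
    "A": pd.DataFrame({"product": ["ajoijoft", "cser", "drdrtjg"], "price": [1, 2, 3]}),
    "B": pd.DataFrame({"product": ["yf", "sresedrdrtjg"], "price": [1, 2]}),
}

# Passing a dict to pd.concat uses its keys as the outer level of the index
combined = pd.concat(df_dict)
combined["Productsize"] = combined["product"].str.len()
combined = combined[combined["Productsize"] > 6]

# Split the filtered result back into one dataframe per person
per_person = {person: group for person, group in combined.groupby(level=0)}

for person, group in per_person.items():
    # Each group holds only that person's rows and could be plotted separately
    print(person, group["product"].tolist())
```

The filter runs once on the combined dataframe, while the outer index level keeps track of which rows belong to which person.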

phi · Accepted Answer · 2018-05-31 23:41:46Z

why my code does not work

When you call

for i, ii in df_dict.items()

Python creates two variables, i and ii, bound to the key and the dataframe.
Meanwhile, your df1, df2, df3 and df_dict do not change (during the first iteration, ii and df1 refer to the same dataframe object, but they are still two different variables).

Then the next expression creates a new dataframe object and rebinds ii to it. Your df1, df2, df3 and df_dict still do not change.

ii = ii[~(ii['Productsize'] <= 6)]

To change df1, you have to reassign it explicitly

df1 = ii

And to change df_dict

df_dict[i] = ii

You may want to think about your variables like tags

df1 = pd.DataFrame(...)  # Create a dataframe and give it a tag df1
ii = df1  # Give the same dataframe a tag ii
ii = ii[ii['Productsize'] > 6]  # Move the tag ii to the new filtered dataframe. df1 still sticks with the first dataframe
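The tag picture can be checked directly with Python's is operator, which tests whether two names refer to the same object. The sketch below uses a made-up two-row frame; same_before and same_after are illustrative names.

```python
import pandas as pd

# Hypothetical two-row frame with the length column already present
df1 = pd.DataFrame({"product": ["ajoijoft", "cser"], "Productsize": [8, 4]})

ii = df1                         # two names, one object
same_before = ii is df1          # True

ii = ii[ii["Productsize"] > 6]   # boolean indexing builds a new dataframe
same_after = ii is df1           # False: ii moved, df1 did not

print(same_before, same_after, len(df1), len(ii))  # True False 2 1
```

Only an explicit assignment such as df1 = ii or df_dict[i] = ii makes another name follow the filtered result.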
