Suppose I have the following data frame:

import pandas as pd
df = pd.DataFrame(['a', 'b', 'c', 'd', 'a', 'c', 'f', 'a'])
print(df)

I can replace any occurrence of 'a' with 'AAA' as follows:

df.columns = ['Letters']   
for i, x in enumerate(df['Letters']):
    if x == 'a':
        df['Letters'][i] = "AAA"
print(df)

But if I extract the unique rows and try to do the same thing, it does not work.

df = pd.DataFrame(['a', 'b', 'c', 'd', 'a', 'c', 'f', 'a'])
df.columns = ['Letters']
grouped = df.groupby('Letters')
index = [gp_keys[0] for gp_keys in grouped.groups.values()]
unique_df = df.reindex(index)
print(unique_df) 

for i, x in enumerate(unique_df):
    if x == 'a':
        unique_df.loc[i] = "AAA"
print(unique_df)

I am curious why doing unique_df[i] = "AAA" no longer modifies the data frame values. Even doing unique_df.loc[i] = "AAA" as suggested in the view versus copy post here seems to make no difference. It seems there is something about the groupby function that makes later modification on the data frame elusive. Any thoughts?

  • You are using two different things in the two cases: df['Letters'] in the first and unique_df in the second, both in the iteration and in the assignment. Iterating over a DataFrame yields its column labels, so in the second case x is 'Letters' (never 'a') and the assignment targets the ith column rather than a row. If you replace unique_df with unique_df['Letters'], it works. In any case, it is better to do df.loc[df['Letters']=='a', 'Letters'] = "AAA" instead of the for loop. Commented Dec 24, 2014 at 22:34
  • unique_df.loc[i] = "AAA" works fine Commented Dec 24, 2014 at 22:54
  • @AerofoilKite Are you sure? I am running the following and it is not modifying the value: for i, x in enumerate(unique_df): if x == 'a': unique_df.loc[i] = "AAA" print(unique_df) Commented Dec 24, 2014 at 22:55
  • See my answer, your problem is here: enumerate(unique_df): Commented Dec 24, 2014 at 22:56
  • It should be enumerate(unique_df.values): Commented Dec 24, 2014 at 22:56
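
To make the point in these comments concrete, here is a minimal sketch of what each loop actually iterates over. It builds unique_df with drop_duplicates rather than the groupby/reindex approach from the question (a swapped-in shortcut that should give the same rows for this data); the names column_iteration and value_iteration are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Letters": ["a", "b", "c", "d", "a", "c", "f", "a"]})

# Assumption: drop_duplicates stands in for the groupby/reindex step
# in the question; it keeps the first occurrence of each letter.
unique_df = df.drop_duplicates("Letters").copy()

# Iterating over a DataFrame yields its column labels, not its values.
column_iteration = list(enumerate(unique_df))

# Iterating over a single column (a Series) yields the values.
value_iteration = list(enumerate(unique_df["Letters"]))

print(column_iteration)
print(value_iteration)
```

So in the question's second loop, x is only ever the string 'Letters', and the if branch never runs.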

2 Answers

This may not fully answer the question, since the example you provided can be simplified, but you really should not use enumerate in a case like this.
If you want to modify certain values based on a condition, you can use boolean indexing:

df.loc[df['Letters']=='a', 'Letters'] = "AAA"

instead of doing a for loop.
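
As a self-contained sketch of that one-liner (the variable name mask is just for illustration), the comparison produces a boolean Series and .loc uses it to select the rows to overwrite:

```python
import pandas as pd

df = pd.DataFrame({"Letters": ["a", "b", "c", "d", "a", "c", "f", "a"]})

# Boolean Series: True wherever the column equals 'a'.
mask = df["Letters"] == "a"

# A single .loc call selects rows by mask and the column by label,
# so the assignment writes into df itself.
df.loc[mask, "Letters"] = "AAA"

print(df["Letters"].tolist())
```

Because the row selection and the column selection happen in one indexing call, this avoids the chained-assignment pattern df['Letters'][i] = ... entirely.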


To answer the original question: you need to use unique_df['Letters'] instead of unique_df in your second example (as you did in the first example).

2 Comments

Great. Assuming I want to replace more than one value, is there a way of doing it in one swoop instead of stepwise as below: unique_df.loc[unique_df['Letters']=='a', 'Letters'] = "AAA" unique_df.loc[unique_df['Letters']=='c', 'Letters'] = "CCC"
In such a case, you could also do something like unique_df['Letters'].replace(['a', 'c'], ['AAA', 'CCC'])
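
A small sketch of that suggestion, assuming unique_df is built with drop_duplicates here for brevity instead of the groupby/reindex step from the question. Note that replace returns a new Series, so the result has to be assigned back for unique_df to change:

```python
import pandas as pd

df = pd.DataFrame({"Letters": ["a", "b", "c", "d", "a", "c", "f", "a"]})

# Assumption: drop_duplicates stands in for the question's groupby step.
unique_df = df.drop_duplicates("Letters").copy()

# replace() maps 'a' -> 'AAA' and 'c' -> 'CCC' in one call; it does not
# modify the column in place, so assign the result back.
unique_df["Letters"] = unique_df["Letters"].replace(["a", "c"], ["AAA", "CCC"])

print(unique_df["Letters"].tolist())
```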

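You can try this:

S = unique_df['Letters']

for i, x in enumerate(S):
    if x == 'a':
        # i is a position, so map it to its index label before assigning;
        # the chained form unique_df['Letters'][i] = ... may not write through
        unique_df.loc[unique_df.index[i], 'Letters'] = "AAA"
        # unique_df.loc[i] = "AAA" also works here, since 'a' sits at index label 0

print(unique_df)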

Or you can use unique_df.values:

for i, x in enumerate(unique_df.values):
    if x[0] == 'a':  # each x is a one-element row array
        unique_df.loc[unique_df.index[i], 'Letters'] = "AAA"
        # unique_df.loc[i] = "AAA" also works here, since 'a' sits at index label 0
print(unique_df)

1 Comment

There is no need to convert it to a Series; unique_df['Letters'] is already a Series. There is also no need to use values: just enumerate unique_df['Letters'], or better, don't enumerate at all.
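
One more reason to avoid enumerate here, sketched under the assumption that unique_df keeps the original row labels (as it does after reindex in the question, or after drop_duplicates as used below): the counter from enumerate is a position, while .loc expects an index label, and the two diverge once rows have been dropped.

```python
import pandas as pd

df = pd.DataFrame({"Letters": ["a", "b", "c", "d", "a", "c", "f", "a"]})

# Assumption: drop_duplicates stands in for the question's groupby step.
unique_df = df.drop_duplicates("Letters").copy()

labels = unique_df.index.tolist()        # original row labels that survived
positions = list(range(len(unique_df)))  # what enumerate would count

# If a loop is really wanted, iterate over (label, value) pairs so that
# .loc receives a real index label. A list snapshot is taken first so the
# frame is not modified while it is being iterated.
for label, value in list(unique_df["Letters"].items()):
    if value == "f":
        unique_df.loc[label, "Letters"] = "FFF"

print(labels, positions)
print(unique_df)
```

Here 'f' sits at position 4 but carries label 6, so an assignment through .loc with the enumerate counter would not hit that row. It only happens to work for 'a' because its position and label are both 0.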
