modifying values in pandas dataframe

Question

Assume I have the following data frame:

import pandas as pd
df = pd.DataFrame(['a', 'b', 'c', 'd', 'a', 'c', 'f', 'a'])
print(df)

I can replace any occurrence of 'a' with 'AAA' as follows:

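df.columns = ['Letters']
for i, x in enumerate(df['Letters']):
    if x == 'a':
        # .loc[row, column] sets the cell on df itself; the chained form
        # df['Letters'][i] = ... is not guaranteed to modify df
        df.loc[i, 'Letters'] = "AAA"
print(df)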

But if I extract the unique rows and try to do the same thing, it does not work.

df = pd.DataFrame(['a', 'b', 'c', 'd', 'a', 'c', 'f', 'a'])
df.columns = ['Letters']
grouped = df.groupby('Letters')
index = [gp_keys[0] for gp_keys in grouped.groups.values()]
unique_df = df.reindex(index)
print(unique_df) 

for i, x in enumerate(unique_df):
    if x == 'a':
        unique_df.loc[i] = "AAA"
print(unique_df)

I am curious why unique_df[i] = "AAA" no longer modifies the data frame values. Even unique_df.loc[i] = "AAA", as suggested in the view-versus-copy post, seems to make no difference. It seems there is something about the groupby function that makes later modification of the data frame elusive. Any thoughts?
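As an aside, the groupby/reindex step above keeps the first row for each distinct letter. A minimal sketch, assuming a reasonably recent pandas, of getting the same rows with drop_duplicates (not used in the original post) and of the index labels the reindex leaves behind:

```python
import pandas as pd

df = pd.DataFrame(['a', 'b', 'c', 'd', 'a', 'c', 'f', 'a'], columns=['Letters'])

# groupby approach from the question: first row label of each group
grouped = df.groupby('Letters')
index = [gp_keys[0] for gp_keys in grouped.groups.values()]
unique_df = df.reindex(index)

# drop_duplicates keeps the first occurrence of each value by default
alt = df.drop_duplicates(subset='Letters')

print(unique_df.index.tolist())  # [0, 1, 2, 3, 6]
print(alt.equals(unique_df))     # True
```

Either way, the rows keep their original labels (0, 1, 2, 3, 6), which no longer match their positions.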

You are using two different things in the two cases: df['Letters'] vs. unique_df in the iteration/assignment. Iterating over a DataFrame goes over its columns, so in the second case x is the column label 'Letters' (never 'a'), and unique_df[i] = "AAA" would set a column, not a value. If you replace unique_df with unique_df['Letters'], it works. But anyway, you would be better off doing df.loc[df['Letters']=='a', 'Letters'] = "AAA" instead of the for loop.
– joris, Commented Dec 24, 2014 at 22:34
@AerofoilKite Are you sure? I am running the following and it's not modifying the value:

    for i, x in enumerate(unique_df):
        if x == 'a':
            unique_df.loc[i] = "AAA"
    print(unique_df)

– sedeh, Commented Dec 24, 2014 at 22:55
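A minimal sketch of the point made in the first comment: iterating over a DataFrame yields its column labels, while iterating over one of its columns (a Series) yields the values. To keep it self-contained, it assumes unique_df is built directly with the labels the question's reindex produces:

```python
import pandas as pd

unique_df = pd.DataFrame({'Letters': ['a', 'b', 'c', 'd', 'f']}, index=[0, 1, 2, 3, 6])

# Iterating over the DataFrame yields column labels, so x is 'Letters'
# and the comparison x == 'a' is never true.
labels = [x for x in unique_df]
print(labels)  # ['Letters']

# Iterating over the column (a Series) yields the cell values instead.
values = [x for x in unique_df['Letters']]
print(values)  # ['a', 'b', 'c', 'd', 'f']
```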

joris · Accepted Answer · 2014-12-24 23:09:07Z


This may not fully answer the question, since the example you provided can be simplified, but you really should not enumerate in a case like this.
If you want to modify certain values based on a condition, you can use boolean indexing like:

df.loc[df['Letters']=='a', 'Letters'] = "AAA"

instead of doing a for loop.
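A minimal sketch of the boolean-indexing approach on the question's original frame: the comparison builds a True/False mask, and a single .loc assignment updates every matching row without a loop.

```python
import pandas as pd

df = pd.DataFrame(['a', 'b', 'c', 'd', 'a', 'c', 'f', 'a'], columns=['Letters'])

# Boolean mask: True for the rows whose value is 'a'
mask = df['Letters'] == 'a'

# One vectorized assignment through .loc[row mask, column label]
df.loc[mask, 'Letters'] = "AAA"
print(df['Letters'].tolist())
# ['AAA', 'b', 'c', 'd', 'AAA', 'c', 'f', 'AAA']
```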

To answer the original question: you need to use unique_df['Letters'] instead of unique_df in your second example (as you also did in the first example).
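If you do want to keep a loop, one sketch of the corrected version iterates over the column. It swaps enumerate for Series.items(), which yields (index label, value) pairs, so the label can be passed straight to .loc even though the reindexed labels (assumed here to be 0, 1, 2, 3, 6, as in the question) no longer match positions:

```python
import pandas as pd

unique_df = pd.DataFrame({'Letters': ['a', 'b', 'c', 'd', 'f']}, index=[0, 1, 2, 3, 6])

# Iterate over the column, not the frame; items() gives (label, value) pairs
for label, x in unique_df['Letters'].items():
    if x == 'a':
        # label is an index label, which is what .loc expects
        unique_df.loc[label, 'Letters'] = "AAA"

print(unique_df)
```

The boolean-indexing form is still the simpler choice; the loop is shown only to pin down what went wrong.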

answered Dec 24, 2014 at 23:09

joris


sedeh Over a year ago

Great. Assuming I want to replace more than one value, is there a way of doing it in one go instead of step by step, as below?

    unique_df.loc[unique_df['Letters']=='a', 'Letters'] = "AAA"
    unique_df.loc[unique_df['Letters']=='c', 'Letters'] = "CCC"

joris Over a year ago

In such a case, you could also do something like unique_df['Letters'].replace(['a', 'c'], ['AAA', 'CCC']). It returns a new Series, so assign the result back to unique_df['Letters'].
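A minimal sketch of replacing several values at once with Series.replace, assuming the same five-row unique_df as in the question. It shows the two-list form from the comment and the equivalent dict form, and assigns the result back, since replace does not modify the frame by itself:

```python
import pandas as pd

unique_df = pd.DataFrame({'Letters': ['a', 'b', 'c', 'd', 'f']}, index=[0, 1, 2, 3, 6])

# List form: each value in the first list is replaced by the value
# at the same position in the second list.
by_lists = unique_df['Letters'].replace(['a', 'c'], ['AAA', 'CCC'])

# Dict form: old -> new pairs kept together; same result.
by_dict = unique_df['Letters'].replace({'a': 'AAA', 'c': 'CCC'})
print(by_dict.equals(by_lists))  # True

# replace returned a new Series; unique_df changes only after assigning back.
unique_df['Letters'] = by_lists
print(unique_df['Letters'].tolist())  # ['AAA', 'b', 'CCC', 'd', 'f']
```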

Shahriar · Answer · 2014-12-24 23:08:40Z


You can try this:

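S = unique_df['Letters']

for i, x in enumerate(S):
    if x == 'a':
        # enumerate gives positions, but unique_df's labels are 0, 1, 2, 3, 6
        # after the reindex, so set by position with .iloc (column 0 is 'Letters')
        unique_df.iloc[i, 0] = "AAA"
        # unique_df.iloc[i] = "AAA"       -- this will work too (sets the whole row)

print(unique_df)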

Or you can use unique_df.values:

for i, x in enumerate(unique_df.values):
    # each x is a row array (one entry per column), so compare its first cell
    if x[0] == 'a':
        unique_df.iloc[i, 0] = "AAA"
        # unique_df.iloc[i] = "AAA"      -- this will work too (sets the whole row)
print(unique_df)
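The difference between a position (what enumerate yields) and an index label matters here, because after the reindex the two no longer coincide. A minimal sketch, assuming the reindexed labels 0, 1, 2, 3, 6 from the question, of how .iloc and .loc treat the same integer:

```python
import pandas as pd

unique_df = pd.DataFrame({'Letters': ['a', 'b', 'c', 'd', 'f']}, index=[0, 1, 2, 3, 6])

# Positions run 0..4; the labels kept from the reindex are 0, 1, 2, 3, 6.
print(list(range(len(unique_df))))  # [0, 1, 2, 3, 4]
print(unique_df.index.tolist())     # [0, 1, 2, 3, 6]

# .iloc takes positions, .loc takes labels: both reach the last row here.
print(unique_df.iloc[4, 0])         # f
print(unique_df.loc[6, 'Letters'])  # f

# Reading with .loc at a position that is not a label raises KeyError...
missing = False
try:
    unique_df.loc[4, 'Letters']
except KeyError:
    missing = True
print(missing)  # True

# ...and setting with .loc at a missing label appends a new row instead.
tmp = unique_df.copy()
tmp.loc[4] = "AAA"
print(tmp.index.tolist())  # [0, 1, 2, 3, 6, 4]
```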

edited Dec 24, 2014 at 23:08

answered Dec 24, 2014 at 22:49

Shahriar


joris Over a year ago

There is no need to convert it to a Series; unique_df['Letters'] is already a Series. There is also no need to use values: just enumerate unique_df['Letters'], or better, don't enumerate at all.
