This is quite similar to this question, but with one key difference: I'm selecting the data I want to change by some criteria rather than by its index.

If the criteria I apply return a single row, I'd expect to be able to set the value of a certain column in that row in an easy way, but my first attempt doesn't work:

>>> d = pd.DataFrame({'year':[2008,2008,2008,2008,2009,2009,2009,2009], 
...                   'flavour':['strawberry','strawberry','banana','banana',
...                   'strawberry','strawberry','banana','banana'],
...                   'day':['sat','sun','sat','sun','sat','sun','sat','sun'],
...                   'sales':[10,12,22,23,11,13,23,24]})

>>> d
   day     flavour  sales  year
0  sat  strawberry     10  2008
1  sun  strawberry     12  2008
2  sat      banana     22  2008
3  sun      banana     23  2008
4  sat  strawberry     11  2009
5  sun  strawberry     13  2009
6  sat      banana     23  2009
7  sun      banana     24  2009

>>> d[d.sales==24]
   day flavour  sales  year
7  sun  banana     24  2009

>>> d[d.sales==24].sales = 100
>>> d
   day     flavour  sales  year
0  sat  strawberry     10  2008
1  sun  strawberry     12  2008
2  sat      banana     22  2008
3  sun      banana     23  2008
4  sat  strawberry     11  2009
5  sun  strawberry     13  2009
6  sat      banana     23  2009
7  sun      banana     24  2009

So rather than setting 2009 Sunday's banana sales to 100, nothing happens! What's the nicest way to do this? Ideally the solution should not use the row number, as you normally don't know that in advance!

3 Answers

94

There are several ways to do this:

1

In [7]: d.sales[d.sales==24] = 100

In [8]: d
Out[8]: 
   day     flavour  sales  year
0  sat  strawberry     10  2008
1  sun  strawberry     12  2008
2  sat      banana     22  2008
3  sun      banana     23  2008
4  sat  strawberry     11  2009
5  sun  strawberry     13  2009
6  sat      banana     23  2009
7  sun      banana    100  2009

2

In [26]: d.loc[d.sales == 12, 'sales'] = 99

In [27]: d
Out[27]: 
   day     flavour  sales  year
0  sat  strawberry     10  2008
1  sun  strawberry     99  2008
2  sat      banana     22  2008
3  sun      banana     23  2008
4  sat  strawberry     11  2009
5  sun  strawberry     13  2009
6  sat      banana     23  2009
7  sun      banana    100  2009
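
To make the difference between the question's attempt and the .loc form concrete, here is a minimal self-contained sketch. It rebuilds the frame from the question; the names mask, subset, untouched and updated are mine, purely for illustration. The boolean selection d[mask] hands back a new object, so writing to it leaves d alone, whereas d.loc[mask, 'sales'] selects rows and column in one indexing step and writes to d itself. Chained forms are also the ones the comments below say newer pandas versions warn about.

```python
import pandas as pd

d = pd.DataFrame({
    'year': [2008, 2008, 2008, 2008, 2009, 2009, 2009, 2009],
    'flavour': ['strawberry', 'strawberry', 'banana', 'banana',
                'strawberry', 'strawberry', 'banana', 'banana'],
    'day': ['sat', 'sun', 'sat', 'sun', 'sat', 'sun', 'sat', 'sun'],
    'sales': [10, 12, 22, 23, 11, 13, 23, 24],
})

mask = d['sales'] == 24

# Boolean selection produces a separate DataFrame. Taking an explicit
# copy makes that obvious: edits to `subset` never reach `d`.
subset = d[mask].copy()
subset['sales'] = 100
untouched = d.loc[7, 'sales']   # value in the original frame

# One indexing step on the original frame: rows via the mask,
# column via its label. This assignment modifies `d` in place.
d.loc[mask, 'sales'] = 100
updated = d.loc[7, 'sales']
```

If the mask matches several rows, the same statement updates all of them, so it may be worth checking mask.sum() first when exactly one match is expected.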

3

In [28]: d.sales = d.sales.replace(23, 24)

In [29]: d
Out[29]: 
   day     flavour  sales  year
0  sat  strawberry     10  2008
1  sun  strawberry     99  2008
2  sat      banana     22  2008
3  sun      banana     24  2008
4  sat  strawberry     11  2009
5  sun  strawberry     13  2009
6  sat      banana     24  2009
7  sun      banana    100  2009
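
Note that replace swaps every occurrence of the value, which is why both rows holding 23 changed above. As a small sketch (on a standalone Series built from the question's sales column, with the variable names being my own), several values can be mapped at once by passing a dict, and the result is a new Series unless it is assigned back:

```python
import pandas as pd

sales = pd.Series([10, 12, 22, 23, 11, 13, 23, 24], name='sales')

# Map old values to new ones; every matching element is replaced.
# replace() returns a new Series and leaves `sales` as it was.
replaced = sales.replace({23: 24, 12: 99})
```

This suits value-for-value substitution; when the condition involves other columns, a boolean mask with .loc is probably the better fit.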

8 Comments

Yes! Solution 1. worked. Sort of counter-intuitive that this works: d.sales[d.sales==24] = 100 but this doesn't: d[d.sales==24].sales=100. They look (functionally) to be the same to me. Ah well. Thanks @waitingkuo.
d[d.sales==24] generates a new object.
re. @waitingkuo's comment: is that expected behaviour, pandas guys? Certainly not intuitive that d[d.sales==24] should generate a copy of the original DataFrame. In fact, I'd say that every object should be a reference to the original (including selecting a single row which, correctly, 'collapses' to a pandas Series) unless explicitly requested by the user (via some kind of copy=True). Thoughts?
FYI: these will now raise/warn in 0.13; see here: pandas.pydata.org/pandas-docs/dev/…
@Jeff How should these warnings be handled? Or is there another correct way to do this that doesn't trigger a warning?
16

Not sure about older versions of pandas, but in 0.16 the value of a particular cell can be set based on the values of multiple columns.

Extending the answer provided by @waitingkuo, the same operation can also be done based on values of multiple columns.

d.loc[(d.day== 'sun') & (d.flavour== 'banana') & (d.year== 2009),'sales'] = 100
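
A slightly more readable variant, sketched here on the question's data, is to build the combined mask first and then hand it to .loc. The names is_target and n_matches are mine. Each comparison needs its own parentheses because & binds more tightly than ==.

```python
import pandas as pd

d = pd.DataFrame({
    'year': [2008, 2008, 2008, 2008, 2009, 2009, 2009, 2009],
    'flavour': ['strawberry', 'strawberry', 'banana', 'banana',
                'strawberry', 'strawberry', 'banana', 'banana'],
    'day': ['sat', 'sun', 'sat', 'sun', 'sat', 'sun', 'sat', 'sun'],
    'sales': [10, 12, 22, 23, 11, 13, 23, 24],
})

# Combine the per-column conditions element-wise with &.
is_target = (
    (d['day'] == 'sun')
    & (d['flavour'] == 'banana')
    & (d['year'] == 2009)
)

# Count the matching rows before writing, in case the criteria
# select more (or fewer) rows than intended.
n_matches = int(is_target.sum())

d.loc[is_target, 'sales'] = 100
```

Keeping the mask in a variable also makes it easy to reuse the same selection for other columns.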

8

Old question, but I'm surprised nobody mentioned numpy's .where() function. (Older pandas versions exposed it as pd.np.where, but the pd.np alias has since been deprecated and removed, so import numpy directly.)

In this case the code would be:

import numpy as np
d.sales = np.where(d.sales == 24, 100, d.sales)

To my knowledge, this is one of the fastest ways to conditionally change data across a series.
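
For comparison, here is a small sketch of the same conditional update done two ways on a standalone Series built from the question's sales values: with NumPy imported directly, and with the pandas method Series.mask, which replaces values where the condition is True. The names via_numpy and via_mask are mine, and I haven't benchmarked either form here.

```python
import numpy as np
import pandas as pd

sales = pd.Series([10, 12, 22, 23, 11, 13, 23, 24], name='sales')

# np.where(condition, value_if_true, value_if_false) returns a plain
# NumPy array, so wrap it back into a Series to keep index and name.
via_numpy = pd.Series(
    np.where(sales == 24, 100, sales),
    index=sales.index,
    name=sales.name,
)

# Series.mask replaces entries where the condition holds and keeps
# the rest; it returns a new Series with the original index.
via_mask = sales.mask(sales == 24, 100)
```

Both leave the original Series unchanged, so assign the result back to the column to update the frame.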
