
I have this dataframe:

    Region      2021    2022    2023
0   Europe      0.00    0.00    0.00
1   N.Amerca    0.50    0.50    0.50
2   N.Amerca    4.40    4.40    4.40
3   N.Amerca    0.00    8.00    8.00
4   Asia        0.00    0.00    1.75
5   Asia        0.00    0.00    0.00
6   Asia        0.00    0.00    2.00
7   N.Amerca    0.00    0.00    0.50
8   Eurpoe      6.00    6.00    6.00
9   Asia        7.50    7.50    7.50
10  Asia        3.75    3.75    3.75
11  Asia        3.50    3.50    3.50
12  Asia        3.80    3.80    3.80
13  Asia        0.00    0.00    0.00
14  Europe      6.52    6.52    6.52
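(For reproducibility, the frame can be rebuilt from the table above; the year column labels are assumed to be strings, matching r['2021'] in the attempt further down.)

import pandas as pd

df = pd.DataFrame({
    'Region': ['Europe', 'N.Amerca', 'N.Amerca', 'N.Amerca', 'Asia', 'Asia', 'Asia',
               'N.Amerca', 'Eurpoe', 'Asia', 'Asia', 'Asia', 'Asia', 'Asia', 'Europe'],
    '2021': [0.00, 0.50, 4.40, 0.00, 0.00, 0.00, 0.00, 0.00, 6.00, 7.50, 3.75, 3.50, 3.80, 0.00, 6.52],
    '2022': [0.00, 0.50, 4.40, 8.00, 0.00, 0.00, 0.00, 0.00, 6.00, 7.50, 3.75, 3.50, 3.80, 0.00, 6.52],
    '2023': [0.00, 0.50, 4.40, 8.00, 1.75, 0.00, 2.00, 0.50, 6.00, 7.50, 3.75, 3.50, 3.80, 0.00, 6.52],
})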

Once a value is found in 2021, it should carry a 0 to the rest of the row (2022 and 2023); and if a value is first found in 2022, it should carry a 0 to the remaining columns. In other words, once a value is found in any column from 2021 onward, all the columns to its right should be zeroed.

The expected result would be:

    Region      2021    2022    2023
0   Europe      0.00    0.00    0.00
1   N.Amerca    0.50    0.00    0.00
2   N.Amerca    4.40    0.00    0.00
3   N.Amerca    0.00    8.00    0.00
4   Asia        0.00    0.00    1.75
5   Asia        0.00    0.00    0.00
6   Asia        0.00    0.00    2.00
7   N.Amerca    0.00    0.00    0.50
8   Eurpoe      6.00    0.00    0.00
9   Asia        7.50    0.00    0.00
10  Asia        3.75    0.00    0.00
11  Asia        3.50    0.00    0.00
12  Asia        3.80    0.00    0.00
13  Asia        0.00    0.00    0.00
14  Europe      6.52    0.00    0.00

I have tried to apply a lambda:

def foo(r):
    # if r['2021'] > 0, then 2022 onward should be zero;
    # elif r['2022'] > 0, then 2023 onward should be zero; and so on
    ...

df = df.apply(lambda x: foo(x), axis=1)

but the challenge is that the columns actually run from 2021 to 2030, and foo turns into a mess of nested conditions.

2 Answers


Let us try duplicated: because each value is carried forward unchanged across the year columns, everything to the right of a value's first appearance in a row is a duplicate, so we can mask those duplicates with 0.

df = df.mask(df.T.apply(pd.Series.duplicated).T,0)
Out[57]: 
      Region  2021  2022  2023
0     Europe  0.00   0.0  0.00
1   N.Amerca  0.50   0.0  0.00
2   N.Amerca  4.40   0.0  0.00
3   N.Amerca  0.00   8.0  0.00
4       Asia  0.00   0.0  1.75
5       Asia  0.00   0.0  0.00
6       Asia  0.00   0.0  2.00
7   N.Amerca  0.00   0.0  0.50
8     Eurpoe  6.00   0.0  0.00
9       Asia  7.50   0.0  0.00
10      Asia  3.75   0.0  0.00
11      Asia  3.50   0.0  0.00
12      Asia  3.80   0.0  0.00
13      Asia  0.00   0.0  0.00
14    Europe  6.52   0.0  0.00
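For intuition, here is an illustrative peek (not part of the original answer) at the intermediate mask: within each row it flags every value that has already appeared somewhere to its left, and exactly those entries get replaced with 0.

mask = df.T.apply(pd.Series.duplicated).T
print(mask.head(4))
#    Region   2021   2022   2023
# 0   False  False   True   True
# 1   False  False   True   True
# 2   False  False   True   True
# 3   False  False  False   True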

3 Comments

@ProcolHarum I am not sure why ~ maybe I just do not like the apply ~
I know that apply is a memory consumer, but your answer is spot on.
I am not sure (haven't tried), but maybe you could also explore df.shift on axis=1 together with cumsum on axis=1 to build the condition :)
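Acting on that last suggestion, a minimal sketch of the shift part of the idea (my own illustration, not the answerer's code): since every value in the sample data is carried forward unchanged, a year can be zeroed whenever it simply equals the year immediately to its left.

years = df.columns.drop('Region')
num = df[years]
# zero any year that merely repeats the previous year's value, i.e. was carried forward
df[years] = num.mask(num.eq(num.shift(axis=1)), 0)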

This is another way: diff along the rows zeroes every column that repeats the value to its left, and the 2021 column (which becomes NaN after the diff) is restored from the original frame:

df2 = df.set_index('Region').diff(axis=1).reset_index()
df2['2021'] = df['2021']

Or, keeping only the first nonzero value in each row of the year columns:

df.iloc[:,1:].where(df.iloc[:,1:].ne(0).cumsum(axis=1).eq(1),0)

Or, using cummax to flag the column where each row first turns nonzero:

df2 = df.iloc[:,1:].ne(0).cummax(axis=1)
df.iloc[:,1:].where(df2 ^ df2.shift(axis=1, fill_value=False),0)

Or, ranking along each row (method='first' breaks ties left to right) and keeping only the top-ranked value:

df.iloc[:,1:].where(df.iloc[:,1:].rank(method = 'first',ascending=False,axis=1).eq(1),0)

Output (for the variations that work on df.iloc[:,1:], only the year columns are shown):

    2021  2022  2023
0   0.00   0.0  0.00
1   0.50   0.0  0.00
2   4.40   0.0  0.00
3   0.00   8.0  0.00
4   0.00   0.0  1.75
5   0.00   0.0  0.00
6   0.00   0.0  2.00
7   0.00   0.0  0.50
8   6.00   0.0  0.00
9   7.50   0.0  0.00
10  3.75   0.0  0.00
11  3.50   0.0  0.00
12  3.80   0.0  0.00
13  0.00   0.0  0.00
14  6.52   0.0  0.00
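Whichever variation you pick, the Region column can be attached back to the zeroed year values; a minimal sketch using the cumsum variation above:

import pandas as pd

res = df.iloc[:,1:].where(df.iloc[:,1:].ne(0).cumsum(axis=1).eq(1),0)
out = pd.concat([df[['Region']], res], axis=1)  # Region plus the zeroed year columns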

