
I have this dataframe:

    Region      2021    2022    2023
0   Europe      0.00    0.00    0.00
1   N.Amerca    0.50    0.50    0.50
2   N.Amerca    4.40    4.40    4.40
3   N.Amerca    0.00    8.00    8.00
4   Asia        0.00    0.00    1.75
5   Asia        0.00    0.00    0.00
6   Asia        0.00    0.00    2.00
7   N.Amerca    0.00    0.00    0.50
8   Eurpoe      6.00    6.00    6.00
9   Asia        7.50    7.50    7.50
10  Asia        3.75    3.75    3.75
11  Asia        3.50    3.50    3.50
12  Asia        3.80    3.80    3.80
13  Asia        0.00    0.00    0.00
14  Europe      6.52    6.52    6.52
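(For reproducibility, the frame can be rebuilt from the table above; the year column labels are assumed to be strings, matching r['2021'] in the attempt further down.)

import pandas as pd

df = pd.DataFrame({
    'Region': ['Europe', 'N.Amerca', 'N.Amerca', 'N.Amerca', 'Asia', 'Asia', 'Asia',
               'N.Amerca', 'Eurpoe', 'Asia', 'Asia', 'Asia', 'Asia', 'Asia', 'Europe'],
    '2021': [0.00, 0.50, 4.40, 0.00, 0.00, 0.00, 0.00, 0.00, 6.00, 7.50, 3.75, 3.50, 3.80, 0.00, 6.52],
    '2022': [0.00, 0.50, 4.40, 8.00, 0.00, 0.00, 0.00, 0.00, 6.00, 7.50, 3.75, 3.50, 3.80, 0.00, 6.52],
    '2023': [0.00, 0.50, 4.40, 8.00, 1.75, 0.00, 2.00, 0.50, 6.00, 7.50, 3.75, 3.50, 3.80, 0.00, 6.52],
})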

Once a value is found in 2021, it should carry a 0 to the rest of the row (2022 and 2023); and if a value is first found in 2022, it should carry a 0 to the remaining columns. In other words, once a value is found in any column from 2021 onward, all the columns to its right should be zeroed.

The expected result would be:

    Region      2021    2022    2023
0   Europe      0.00    0.00    0.00
1   N.Amerca    0.50    0.00    0.00
2   N.Amerca    4.40    0.00    0.00
3   N.Amerca    0.00    8.00    0.00
4   Asia        0.00    0.00    1.75
5   Asia        0.00    0.00    0.00
6   Asia        0.00    0.00    2.00
7   N.Amerca    0.00    0.00    0.50
8   Eurpoe      6.00    0.00    0.00
9   Asia        7.50    0.00    0.00
10  Asia        3.75    0.00    0.00
11  Asia        3.50    0.00    0.00
12  Asia        3.80    0.00    0.00
13  Asia        0.00    0.00    0.00
14  Europe      6.52    0.00    0.00

I have tried to apply a lambda:

def foo(r):
    # if r['2021'] > 0, then 2022 onward should be zero;
    # elif r['2022'] > 0, then 2023 onward should be zero; and so on
    ...

df = df.apply(lambda x: foo(x), axis=1)

but the challenge is that the columns actually run from 2021 to 2030, and foo turns into a mess of nested conditions.

2 Answers


Let us try duplicated: because each value is carried forward unchanged across the year columns, everything to the right of a value's first appearance in a row is a duplicate, so we can mask those duplicates with 0.

df = df.mask(df.T.apply(pd.Series.duplicated).T,0)
Out[57]: 
      Region  2021  2022  2023
0     Europe  0.00   0.0  0.00
1   N.Amerca  0.50   0.0  0.00
2   N.Amerca  4.40   0.0  0.00
3   N.Amerca  0.00   8.0  0.00
4       Asia  0.00   0.0  1.75
5       Asia  0.00   0.0  0.00
6       Asia  0.00   0.0  2.00
7   N.Amerca  0.00   0.0  0.50
8     Eurpoe  6.00   0.0  0.00
9       Asia  7.50   0.0  0.00
10      Asia  3.75   0.0  0.00
11      Asia  3.50   0.0  0.00
12      Asia  3.80   0.0  0.00
13      Asia  0.00   0.0  0.00
14    Europe  6.52   0.0  0.00
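For intuition, here is an illustrative peek (not part of the original answer) at the intermediate mask: within each row it flags every value that has already appeared somewhere to its left, and exactly those entries get replaced with 0.

mask = df.T.apply(pd.Series.duplicated).T
print(mask.head(4))
#    Region   2021   2022   2023
# 0   False  False   True   True
# 1   False  False   True   True
# 2   False  False   True   True
# 3   False  False  False   True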

3 Comments

@ProcolHarum I am not sure why ~ maybe I just do not like the apply ~
I know that apply is a memory consumer, but your answer is spot on.
I am not sure (haven't tried), but maybe you could also explore df.shift on axis=1 together with cumsum on axis=1 to build the condition :)
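Acting on that last suggestion, a minimal sketch of the shift part of the idea (my own illustration, not the answerer's code): since every value in the sample data is carried forward unchanged, a year can be zeroed whenever it simply equals the year immediately to its left.

years = df.columns.drop('Region')
num = df[years]
# zero any year that merely repeats the previous year's value, i.e. was carried forward
df[years] = num.mask(num.eq(num.shift(axis=1)), 0)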

This is another way: diff along the rows zeroes every column that repeats the value to its left, and the 2021 column (which becomes NaN after the diff) is restored from the original frame:

df2 = df.set_index('Region').diff(axis=1).reset_index()
df2['2021'] = df['2021']

Or, keeping only the first nonzero value in each row of the year columns:

df.iloc[:,1:].where(df.iloc[:,1:].ne(0).cumsum(axis=1).eq(1),0)

Or, using cummax to flag the column where each row first turns nonzero:

df2 = df.iloc[:,1:].ne(0).cummax(axis=1)
df.iloc[:,1:].where(df2 ^ df2.shift(axis=1, fill_value=False),0)

Or, ranking along each row (method='first' breaks ties left to right) and keeping only the top-ranked value:

df.iloc[:,1:].where(df.iloc[:,1:].rank(method = 'first',ascending=False,axis=1).eq(1),0)

Output (for the variations that work on df.iloc[:,1:], only the year columns are shown):

    2021  2022  2023
0   0.00   0.0  0.00
1   0.50   0.0  0.00
2   4.40   0.0  0.00
3   0.00   8.0  0.00
4   0.00   0.0  1.75
5   0.00   0.0  0.00
6   0.00   0.0  2.00
7   0.00   0.0  0.50
8   6.00   0.0  0.00
9   7.50   0.0  0.00
10  3.75   0.0  0.00
11  3.50   0.0  0.00
12  3.80   0.0  0.00
13  0.00   0.0  0.00
14  6.52   0.0  0.00
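Whichever variation you pick, the Region column can be attached back to the zeroed year values; a minimal sketch using the cumsum variation above:

import pandas as pd

res = df.iloc[:,1:].where(df.iloc[:,1:].ne(0).cumsum(axis=1).eq(1),0)
out = pd.concat([df[['Region']], res], axis=1)  # Region plus the zeroed year columns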

