Sequentially update DataFrame value based on previous row's value

Question

I'm not sure whether there is a more elegant way to do this. I need to determine each row's "position" value from its "factor" value and the previous row's "position" value.

I tried looping through the DataFrame with if/else statements to update the values, but the code is clunky and the values don't get updated.

Please help. Many thanks!

                       factor  position
time                                   
2022-05-13 06:00:00  0.489471         0
2022-05-13 07:00:00  0.711030         0
2022-05-13 08:00:00  0.566865         0
2022-05-13 09:00:00  0.489471         0
2022-05-13 10:00:00  0.288419         0

import pandas as pd

df = pd.DataFrame({'time': ['2022-05-13 06:00:00', '2022-05-13 07:00:00', '2022-05-13 08:00:00','2022-05-13 09:00:00', '2022-05-13 10:00:00'],
                   'factor': [0.489471, 0.711030, 0.566865, 0.489471, 0.288419],
                   'position': [0, 0, 0, 0, 0]})
df['time'] = pd.to_datetime(df['time'])
df.set_index('time', inplace=True)

threshold_2 = 0.7
threshold_1 = 0.35

for i in range(0, len(df)):
    # no position
    if i == 0 or df.iloc[i-1, :]['position'] == 0:
        if df.iloc[i, :]['factor'] > threshold_2:
            df.iloc[i, :]['position'] = 1
        else:
            df.iloc[i, :]['position'] = 0

    #has position
    elif df.iloc[i-1, :]['position'] != 0:
        if df.iloc[i, :]['factor'] > threshold_1:
            df.iloc[i, :]['position'] = 1
        else:
            df.iloc[i, :]['position'] = 0

Can you explain the logic and provide the expected output?

– mozway, Commented Nov 5, 2022 at 7:10

mozway · Accepted Answer · 2022-11-05 07:16:51Z

If I understand correctly, you can use this vectorized alternative:

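threshold_2 = 0.7
threshold_1 = 0.35

# rows where a position is opened (factor above the upper threshold)
m1 = df['factor'].gt(threshold_2)

# each opening starts a new group; group 0 is everything before the first one
group = m1.cumsum()

# within each group, stay True only until factor first drops to threshold_1 or below
m2 = df.loc[group>0, 'factor'].gt(threshold_1).groupby(group).cummin()

# in position if just opened, or still held since the last opening
df['position'] = (m1|df.index.isin(m2[m2].index)).astype(int)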

output:

                       factor  position
time                                   
2022-05-13 06:00:00  0.489471         0
2022-05-13 07:00:00  0.711030         1
2022-05-13 08:00:00  0.566865         1
2022-05-13 09:00:00  0.489471         1
2022-05-13 10:00:00  0.288419         0
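
To make the masks easier to follow, here is a small sketch that rebuilds the question's sample data and prints the intermediate values side by side. The `steps` frame is only a hypothetical helper for inspection and is not part of the answer above.

```python
import pandas as pd

# Sample data from the question, indexed by time
df = pd.DataFrame(
    {"factor": [0.489471, 0.711030, 0.566865, 0.489471, 0.288419]},
    index=pd.to_datetime([
        "2022-05-13 06:00:00", "2022-05-13 07:00:00", "2022-05-13 08:00:00",
        "2022-05-13 09:00:00", "2022-05-13 10:00:00",
    ]),
)
threshold_2 = 0.7
threshold_1 = 0.35

m1 = df["factor"].gt(threshold_2)   # True where a position is opened
group = m1.cumsum()                 # 0 before the first opening, then 1, 2, ...
m2 = df.loc[group > 0, "factor"].gt(threshold_1).groupby(group).cummin()

# Hypothetical inspection table; m2 has no entry for rows in group 0
steps = pd.DataFrame({"factor": df["factor"], "m1": m1, "group": group, "m2": m2})
print(steps)
```

For this sample, `group` is 0 for the 06:00 row and 1 afterwards, and `m2` is True from 07:00 to 09:00 and False at 10:00. The approach assumes `threshold_2 >= threshold_1` and a unique index, since the last step matches rows by index label.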


4 Comments

Rabinzel Over a year ago

A little off-topic: would you mind having a quick look at my answer here (the updated part), please? The task is: "group by CUI, find the row where one column equals a given value, and pick the value of another column. Set the whole group to that value." Would you do it the same way I did there?

mozway Over a year ago

@Rabinzel I would rather mask the non-MSH values and bfill

Rabinzel Over a year ago

OK, thanks! But that only works in that case because the MSH values are both the last of their group, right?

mozway Over a year ago

Yes, otherwise masking and groupby.transform('first'), or your approach

Rabinzel · Accepted Answer · 2022-11-05 07:15:50Z

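When you use chained indexing, the order and type of the indexing operation partially determine whether the result is a slice into the original object, or a copy of the slice. In your loop, df.iloc[i, :]['position'] = 1 assigns to a temporary row object rather than to df itself, which is why the values never change.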

Read more about it here
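
As a minimal illustration of the difference (a sketch with a made-up two-row frame, not the question's data), compare a chained assignment with a single `.loc` call. Depending on the pandas version, the chained line may also emit a warning about chained assignment.

```python
import pandas as pd

df = pd.DataFrame({"factor": [0.5, 0.8], "position": [0, 0]})

# Chained indexing: df.iloc[0, :] builds a new row Series (the columns have
# mixed dtypes), so the assignment only changes that temporary object.
df.iloc[0, :]["position"] = 1
after_chained = df["position"].tolist()
print(after_chained)  # [0, 0], df is unchanged

# A single indexing call on the DataFrame itself writes the value into df.
df.loc[df.index[0], "position"] = 1
after_loc = df["position"].tolist()
print(after_loc)  # [1, 0]
```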

I made a few small changes to your code so that it works. For example, you don't need the inner if/else; you can just write 1 if foo else 0.

import pandas as pd

df = pd.DataFrame({'time': ['2022-05-13 06:00:00', '2022-05-13 07:00:00', '2022-05-13 08:00:00','2022-05-13 09:00:00', '2022-05-13 10:00:00'],
                   'factor': [0.489471, 0.711030, 0.566865, 0.489471, 0.288419],
                   'position': [0, 0, 0, 0, 0]})
df['time'] = pd.to_datetime(df['time'])
df.set_index('time', inplace=True)
threshold_2 = 0.7
threshold_1 = 0.35

for i in range(0, len(df)):
    # no position
    if i == 0 or df.loc[df.index[i-1], 'position'] == 0:
        df.loc[df.index[i], 'position'] = 1 if df.loc[df.index[i], 'factor'] > threshold_2 else 0
    #has position
    elif df.loc[df.index[i-1], 'position'] != 0:
        df.loc[df.index[i], 'position'] = 1 if df.loc[df.index[i], 'factor'] > threshold_1 else 0

print(df)

                       factor  position
time                                   
2022-05-13 06:00:00  0.489471         0
2022-05-13 07:00:00  0.711030         1
2022-05-13 08:00:00  0.566865         1
2022-05-13 09:00:00  0.489471         1
2022-05-13 10:00:00  0.288419         0
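
If the frame is large, the per-cell `df.loc` lookups in the loop can become slow. One possible variation, sketched below, keeps the previous position in a plain variable and iterates over a NumPy array instead. The function name `hysteresis_position` and its parameters `enter` and `stay` are hypothetical names chosen for this example.

```python
import numpy as np
import pandas as pd

def hysteresis_position(factor, enter, stay):
    """Return 0/1 positions: open when factor > enter, hold while factor > stay."""
    values = np.asarray(factor, dtype=float)
    out = np.zeros(len(values), dtype=int)
    held = False  # position of the previous row; starts with no position
    for i, value in enumerate(values):
        # the threshold to beat depends on whether a position is already held
        held = value > (stay if held else enter)
        out[i] = int(held)
    return out

df = pd.DataFrame(
    {"factor": [0.489471, 0.711030, 0.566865, 0.489471, 0.288419]},
    index=pd.to_datetime([
        "2022-05-13 06:00:00", "2022-05-13 07:00:00", "2022-05-13 08:00:00",
        "2022-05-13 09:00:00", "2022-05-13 10:00:00",
    ]),
)
df["position"] = hysteresis_position(df["factor"], enter=0.7, stay=0.35)
print(df["position"].tolist())  # [0, 1, 1, 1, 0]
```

This is still a Python-level loop, so it is a sketch of a simpler state machine rather than a vectorized solution.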
