Calculate diff to previous matching row in a dataframe

Question

I have a series of timestamps (plus other data) that come from two separate streams of data ticking at different rates. An example is below. (NB: the frequency of the real data has some jitter, so it's not a simple fixed stride like the one shown here.)

src,idx,ts
B,1,20
A,1,100
A,2,200
A,3,300
B,2,320
A,4,400
A,5,500
A,6,600
B,3,620

For each A tick, I need to calculate the offset from the preceding B tick, so the result would become:

src,idx,ts
A,1,80
A,2,180
A,3,280
A,4,80
A,5,180
A,6,280

How can I do this in pandas without iteration?

I thought of some sort of rolling window with a dynamic, criteria-based window, or some hybrid of merge_asof and groupby, but I can't think of a way to do it.

mcsoini · Accepted Answer · 2025-07-05 15:23:47Z

5

You could group the rows so that a new group starts at each B row, then subtract each group's first ts (the B row's) from every ts in that group. Filtering out the B rows afterwards reproduces your desired final df:

import pandas as pd

df = pd.DataFrame(
    {"src": ["B", "A", "A", "A", "B", "A", "A", "A", "B"], 
     "idx": [1, 1, 2, 3, 2, 4, 5, 6, 3], 
     "ts": [20, 100, 200, 300, 320, 400, 500, 600, 620]}
)

df["ts"] -= df.groupby(df.src.eq("B").cumsum())["ts"].transform("first")

df.query("src != 'B'")

More detail:

df.src.eq("B").cumsum() gives a Series that increases by one each time a "B" is encountered. This is the key we need to split the DataFrame into sections between successive "B"s. Each group runs from one B (inclusive) to the following B (exclusive); within it, we subtract the ts value of the leading B row from all ts values, so the offset resets to zero at each B.
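
To make the grouping step concrete, here is a minimal sketch that prints the intermediate values for the same sample df. The names group_id, first_ts, steps and offset are only illustrative and do not appear in the code above.

```python
import pandas as pd

df = pd.DataFrame(
    {"src": ["B", "A", "A", "A", "B", "A", "A", "A", "B"],
     "idx": [1, 1, 2, 3, 2, 4, 5, 6, 3],
     "ts": [20, 100, 200, 300, 320, 400, 500, 600, 620]}
)

# Running count of B rows: each row gets the number of Bs seen so far,
# so every B row starts a new group.
group_id = df["src"].eq("B").cumsum()

# First ts within each group, broadcast back to every row of that group.
first_ts = df.groupby(group_id)["ts"].transform("first")

# Side-by-side view of the intermediate columns (illustrative names).
steps = df.assign(group_id=group_id, first_ts=first_ts, offset=df["ts"] - first_ts)
print(steps)
```

One assumption worth checking on real data: this relies on the frame starting with a B row. Any A rows before the first B would fall into group 0, where "first" picks an A timestamp as the reference instead of a B timestamp.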

edited Jul 5 at 15:23

answered Jul 5 at 14:59

mcsoini


2 Comments

Reinderien Jul 5 at 15:14

Apply should be avoided here.

mcsoini Jul 5 at 15:23

Good point. Edited.

Reinderien · Accepted Answer · 2025-07-05 16:02:36Z

Here is another implementation. I have not benchmarked it. It relies on a forward-fill.

import pandas as pd

df = pd.DataFrame({
    'src': ['B', 'A', 'A', 'A', 'B', 'A', 'A', 'A', 'B'],
    'idx': [1, 1, 2, 3, 2, 4, 5, 6, 3],
    'ts': [20, 100, 200, 300, 320, 400, 500, 600, 620],
})

bts = df.loc[df['src'] == 'B', 'ts'].reindex(df.index, method='ffill')
df['delta'] = df['ts'] - bts
print(df)

  src  idx   ts  delta
0   B    1   20      0
1   A    1  100     80
2   A    2  200    180
3   A    3  300    280
4   B    2  320      0
5   A    4  400     80
6   A    5  500    180
7   A    6  600    280
8   B    3  620      0

If you really only want the A rows, then:

import pandas as pd

df = pd.DataFrame({
    'src': ['B', 'A', 'A', 'A', 'B', 'A', 'A', 'A', 'B'],
    'idx': [1, 1, 2, 3, 2, 4, 5, 6, 3],
    'ts': [20, 100, 200, 300, 320, 400, 500, 600, 620],
})

is_a = df['src'] == 'A'
bts = df.loc[~is_a, 'ts'].reindex(df.index, method='ffill')
df['delta'] = df['ts'] - bts
print(df.loc[is_a, ['idx', 'delta']])

   idx  delta
1    1     80
2    2    180
3    3    280
5    4     80
6    5    180
7    6    280
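
For comparison, the merge_asof idea mentioned in the question can also be made to work. The following is only a sketch on the same sample data, not benchmarked either; the names a, b, b_ts and merged are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'src': ['B', 'A', 'A', 'A', 'B', 'A', 'A', 'A', 'B'],
    'idx': [1, 1, 2, 3, 2, 4, 5, 6, 3],
    'ts': [20, 100, 200, 300, 320, 400, 500, 600, 620],
})

# merge_asof needs both sides sorted by their key columns.
a = df[df['src'] == 'A'].sort_values('ts')
b = (
    df.loc[df['src'] == 'B', ['ts']]
    .rename(columns={'ts': 'b_ts'})
    .sort_values('b_ts')
)

# direction='backward' picks, for each A row, the last B row with b_ts <= ts.
merged = pd.merge_asof(a, b, left_on='ts', right_on='b_ts', direction='backward')
merged['delta'] = merged['ts'] - merged['b_ts']
print(merged[['idx', 'delta']])
```

This variant matches on the timestamp values rather than on row position, so it would also apply if the A and B streams were held in two separate frames. A rows with no earlier B tick get NaN in b_ts and delta.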

PaulS · Accepted Answer · 2025-07-07 20:07:50Z

2

Another possible solution:

m = df['src'].eq('B')
df.assign(ts = df['ts'].sub(df['ts'].where(m).ffill()))[~m]

It first creates a Boolean mask m to identify rows where src is B. Then, using Series.where, it keeps timestamps only where src is "B" and replaces other entries with NaN; next, Series.ffill forward-fills these timestamps so that each A row gets the timestamp of the preceding B. Finally, the code subtracts this forward-filled B timestamp from each original timestamp via Series.sub and returns only the rows where src is not B.

Output:

  src  idx     ts
1   A    1   80.0
2   A    2  180.0
3   A    3  280.0
5   A    4   80.0
6   A    5  180.0
7   A    6  280.0
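
The ts column comes out as float because Series.where introduces NaN on the A rows. As a rough sketch, the same chain can be split into named steps and cast back to integers at the end; the names b_only, b_filled, delta and out are illustrative, and the cast assumes every A row has a preceding B.

```python
import pandas as pd

df = pd.DataFrame({
    'src': ['B', 'A', 'A', 'A', 'B', 'A', 'A', 'A', 'B'],
    'idx': [1, 1, 2, 3, 2, 4, 5, 6, 3],
    'ts': [20, 100, 200, 300, 320, 400, 500, 600, 620],
})

m = df['src'].eq('B')

b_only = df['ts'].where(m)      # B timestamps, NaN on A rows (dtype becomes float)
b_filled = b_only.ffill()       # carry each B timestamp forward onto the A rows
delta = df['ts'].sub(b_filled)  # offset from the preceding B

# No NaN remains on the A rows of this sample, so the integer cast is safe here.
out = df.assign(ts=delta)[~m].astype({'ts': 'int64'})
print(out)
```

If the real data can start with A rows before any B, those rows keep NaN after the forward-fill and would need to be dropped or filled before casting.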

edited Jul 7 at 20:07

answered Jul 6 at 17:09

PaulS
