Pandas find sequence or pattern in column

Question

Here's some example data for the problem I'm working on:

index     Quarter    Sales_Growth
0          2001q1    0
1          2002q2    0
2          2002q3    1
3          2002q4    0
4          2003q1    0
5          2004q2    0
6          2004q3    1
7          2004q4    1

The Sales_Growth column tells me if there was indeed sales growth in the quarter or not. 0 = no growth, 1 = growth.

First, I'm trying to return the first Quarter when there were two consecutive quarters of no sales growth.

With the data above this answer would be 2001q1.

Then, I want to return the 2nd quarter of consecutive sales growth that occurs AFTER the initial two quarters of no growth.

The answer to this question would be 2004q4.

I've searched, but I can't get the closest answer I found to work: https://stackoverflow.com/a/26539166/3225420

I am a Pandas beginner.

John Zwinck · Accepted Answer · 2017-03-02 12:32:53Z

9

You're doing subsequence matching. This is a bit strange, but bear with me:

growth = df.Sales_Growth.astype(str).str.cat()

That gives you:

'00100011'

Then:

growth.index('0011')

Gives you 4 (obviously you'd add a constant 3 to get the index of the last row matched by the pattern).

I feel this approach starts off a bit ugly, but the end result is really usable--you can search for any fixed pattern with no additional coding.
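For reference, here is a minimal end-to-end sketch of the string approach applied to both parts of the question. It assumes the sample data from the question, and that every value in Sales_Growth is a single digit so that a string offset equals a row position. The variable names (q1_pos, q2_start, and so on) are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Quarter": ["2001q1", "2002q2", "2002q3", "2002q4",
                "2003q1", "2004q2", "2004q3", "2004q4"],
    "Sales_Growth": [0, 0, 1, 0, 0, 0, 1, 1],
})

# One character per row, so a string offset is also a row position.
growth = df["Sales_Growth"].astype(str).str.cat()

# Q1: first row of the first "00" run.
q1_pos = growth.find("00")

# Q2: first "0011" starting at or after the Q1 match;
# adding 3 moves from the first matched row to the last one.
q2_start = growth.find("0011", q1_pos)
q2_pos = q2_start + 3

first_no_growth = df["Quarter"].iloc[q1_pos]
second_growth = df["Quarter"].iloc[q2_pos]
print(first_no_growth, second_growth)  # 2001q1 2004q4
```

Note that str.find returns -1 when the pattern is absent (str.index raises ValueError instead), so real data would need a check before the positions are passed to iloc.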


3 Comments

Python_Learner Over a year ago

Picked this answer for two main reasons: First, it worked and secondly I understand it. Other answers may have worked just as well but were beyond my understanding. The fault is not on the people who provided them that I don't understand, it's on me. I need to learn more. Simply sharing my reasoning.

John Zwinck Over a year ago

@SDS: Cheers. I also found this way of dealing with it much easier to think about. Plus it's less typing. :)

Shan Dou Over a year ago

Just...superb! :)

languitar · Answer · 2017-03-02 12:18:03Z

9

For Q1:

temp = df.Sales_Growth + df.Sales_Growth.shift(-1)
df[temp == 0].head(1)

For Q2:

df[(df.Sales_Growth == 1) & (df.Sales_Growth.shift(1) == 1) & (df.Sales_Growth.shift(2) == 0) & (df.Sales_Growth.shift(3) == 0)].head(1)
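The chain of shifted comparisons above can be generated for any fixed pattern. The following is a rough sketch, assuming the sample data from the question; match_end_mask is a hypothetical helper name, not a pandas function. It flags the last row of each match, so for Q1 the position is moved back by one row to get the first quarter of the run.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Quarter": ["2001q1", "2002q2", "2002q3", "2002q4",
                "2003q1", "2004q2", "2004q3", "2004q4"],
    "Sales_Growth": [0, 0, 1, 0, 0, 0, 1, 1],
})


def match_end_mask(series, pattern):
    """Boolean mask that is True on the last row of each occurrence of pattern."""
    n = len(pattern)
    mask = pd.Series(True, index=series.index)
    for offset, value in enumerate(pattern):
        # pattern[offset] must sit (n - 1 - offset) rows before the flagged row.
        # Rows shifted in from outside the data are NaN and compare as False.
        mask &= series.shift(n - 1 - offset) == value
    return mask


# Q2: last row of the first 0, 0, 1, 1 match.
q2_rows = df[match_end_mask(df["Sales_Growth"], [0, 0, 1, 1])]
q2_quarter = q2_rows["Quarter"].iloc[0]

# Q1: the mask marks the second row of each 0, 0 pair, so step back one row.
q1_end = match_end_mask(df["Sales_Growth"], [0, 0]).to_numpy().nonzero()[0][0]
q1_quarter = df["Quarter"].iloc[q1_end - 1]

print(q1_quarter, q2_quarter)  # 2001q1 2004q4
```

The loop runs once per pattern element rather than once per row, so each step is still a vectorized column operation.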


1 Comment

Seth Kingsley Over a year ago

This should be the top answer. @john-zwink's answer above doesn't generalize to multiple columns, requires you to serialize/delimit the data manually, and would generate the entire string in memory before searching. Shifting is the correct solution!

Bill G · Answer · 2017-03-02 20:54:25Z

3

Building on the earlier answers. Q1:

temp = df.Sales_Growth.rolling(window=2, min_periods=2).apply(
    lambda x, pattern: float((x == pattern).all()),  # 1.0 only if the whole window matches
    raw=True, kwargs={'pattern': [0, 0]})
# Rolling windows are right-aligned, so each flagged row is the last row of a match.
print(df[temp == 1].head())

In the rolling(...).apply(...) call, window and min_periods must match the length of the pattern list passed to the function through kwargs.

Q2: Same approach, different pattern:

temp = df.Sales_Growth.rolling(window=4, min_periods=4).apply(
    lambda x, pattern: float((x == pattern).all()),  # 1.0 only if the whole window matches
    raw=True, kwargs={'pattern': [0, 0, 1, 1]})
print(df[temp == 1].head())


1 Comment

John Zwinck Over a year ago

You're effectively making a Python loop here, which is slow. Usually in Pandas we avoid that.
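One way to keep the windowed comparison without calling a Python function per window is to compare all windows at once in NumPy. This is a sketch of a different technique from the answers above, not something taken from them. It assumes NumPy 1.20 or later (for sliding_window_view), a non-empty pattern, and the sample data from the question; find_pattern_starts is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Quarter": ["2001q1", "2002q2", "2002q3", "2002q4",
                "2003q1", "2004q2", "2004q3", "2004q4"],
    "Sales_Growth": [0, 0, 1, 0, 0, 0, 1, 1],
})


def find_pattern_starts(values, pattern):
    """Return the positions where pattern begins inside values."""
    values = np.asarray(values)
    pattern = np.asarray(pattern)
    if len(pattern) > len(values):
        # sliding_window_view rejects windows longer than the input.
        return np.array([], dtype=int)
    # Shape (len(values) - len(pattern) + 1, len(pattern)): one row per window.
    windows = np.lib.stride_tricks.sliding_window_view(values, len(pattern))
    return np.flatnonzero((windows == pattern).all(axis=1))


starts_00 = find_pattern_starts(df["Sales_Growth"], [0, 0])
starts_0011 = find_pattern_starts(df["Sales_Growth"], [0, 0, 1, 1])

q1_quarter = df["Quarter"].iloc[starts_00[0]]        # first row of the first 0, 0 run
q2_quarter = df["Quarter"].iloc[starts_0011[0] + 3]  # last row of the first 0, 0, 1, 1 match
print(q1_quarter, q2_quarter)  # 2001q1 2004q4
```

Unlike the rolling version, the positions returned here are the start of each match, which is why Q2 adds 3 to reach the last matched row.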
