8

Here's some example data for the problem I'm working on:

index     Quarter    Sales_Growth
0          2001q1    0
1          2002q2    0
2          2002q3    1
3          2002q4    0
4          2003q1    0
5          2004q2    0
6          2004q3    1
7          2004q4    1

The Sales_Growth column tells me if there was indeed sales growth in the quarter or not. 0 = no growth, 1 = growth.

First, I'm trying to return the first Quarter when there were two consecutive quarters of no sales growth.

With the data above this answer would be 2001q1.

Then, I want to return the 2nd quarter of consecutive sales growth that occurs AFTER the initial two quarters of no growth.

The answer to this question would be 2004q4.

I've searched but the closest answer I can find I can't get to work: https://stackoverflow.com/a/26539166/3225420

I am a Pandas beginner.

3 Answers 3

9

You're doing subsequence matching. This is a bit strange, but bear with me:

growth = df.Sales_Growth.astype(str).str.cat()

That gives you:

'00100011'

Then:

growth.index('0011')

Gives you 4 (obviously you'd add a constant 3 to get the index of the last row matched by the pattern).

I feel this approach starts off a bit ugly, but the end result is really usable--you can search for any fixed pattern with no additional coding.

Sign up to request clarification or add additional context in comments.

3 Comments

Picked this answer for two main reasons: First, it worked and secondly I understand it. Other answers may have worked just as well but were beyond my understanding. The fault is not on the people who provided them that I don't understand, it's on me. I need to learn more. Simply sharing my reasoning.
@SDS: Cheers. I also found this way of dealing with it much easier to think about. Plus it's less typing. :)
Just...superb! :)
9

For Q1:

temp = df.Sales_Growth + df.Sales_Growth.shift(-1)
df[temp == 0].head(1)

For Q2:

df[(df.Sales_Growth == 1) & (df.Sales_Growth.shift(1) == 1) & (df.Sales_Growth.shift(2) == 0) & (df.Sales_Growth.shift(3) == 0)].head(1)

1 Comment

This should be the top answer. @john-zwink's answer above doesn't generalize to multiple columns, requires you to serialize/delimit the data manually, and would generate the entire string in memory before searching. Shifting in the correct solution!
3

Building on the earlier answers. Q1:

temp = df.Sales_Growth.rolling_apply(window=2, min_periods=2, \
    kwargs={pattern: [0,0]}, func=lambda x, pattern: x == pattern)
print(df[temp==1].head())

In the rolling_apply call, window and min_periods must match the length of the pattern list being passed to the rolling_apply function.

Q2: Same approach, different pattern:

temp = df.Sales_Growth.rolling_apply(window=4, min_periods=4, \
    kwargs={pattern: [0,0,1,1]}, func=lambda x, pattern: x == pattern)
print(df[temp==1].head())

1 Comment

You're effectively making a Python loop here, which is slow. Usually in Pandas we avoid that.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.