python Iterative loop through columns of dataframe

Question

Working on a problem, I have the following dataframe in python

    week    hour    week_hr     store_code  baskets
0   201616  106     201616106   505         0
1   201616  107     201616107   505         0
2   201616  108     201616108   505         0
3   201616  109     201616109   505         18
4   201616  110     201616110   505         0
5   201616  106     201616108   910         0
6   201616  107     201616106   910         0
7   201616  108     201616107   910         2
8   201616  109     201616108   910         3
9   201616  110     201616109   910         10

Here the "hour" variable is a concatenation of the weekday and the shop hour. For example, if the weekday is Monday (1) and the shop hour is 6 am, then hour = 106. Similarly, week_hr is a concatenation of week and hour. I want to get the rows where I see a trend of no baskets, i.e. 0 baskets for 3 consecutive hours. In the above case I will only get the first 3 rows, i.e. for store 505 there is a continuous run of 0 baskets from 106 to 108. But I do not want the rows (4, 5, 6): even though there are 0 baskets for 3 rows in a row, the hours are actually NOT continuous (110 -> 106 -> 107). For the hours to be continuous they should lie in the range 106 - 110. Essentially, I want all stores and their respective rows wherever a store has 0 baskets for 3 continuous hours on any given day. Dummy output:

    week    hour    week_hr     store_code  baskets
0   201616  106     201616106   505         0
1   201616  107     201616107   505         0
2   201616  108     201616108   505         0

Can I do this in Python using pandas and loops? The dataset requires sorting by store and hour. I am completely new to Python.

Additional info: can I loop in the following way? Loop through each hour, and for each hour check the next 3 hours; if the baskets in those hours are 0, then return those rows.
– Mukul, Commented Jul 22, 2016 at 18:29
I would suggest first getting a list of all elements that have baskets = 0 (create a dictionary with store_code as keys, and the list of hours where baskets = 0 as values). Then, find any three or more elements in a row in the list corresponding to each store_code, and finally use the list of 'good hours & store_code' to get the full data from the db. I'll try to do a more detailed answer.
– HolyDanna, Commented Jul 22, 2016 at 18:30
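The dictionary idea from the comment above could be sketched roughly as follows. This is only an illustration in plain Python: the tuple layout and the helper name `runs_of_consecutive` are assumptions made for the example, and it treats two hours as continuous when their week_hr values differ by exactly 1.

```python
from collections import defaultdict

# Rows from the question as (week_hr, store_code, baskets).
rows = [
    (201616106, 505, 0), (201616107, 505, 0), (201616108, 505, 0),
    (201616109, 505, 18), (201616110, 505, 0),
    (201616108, 910, 0), (201616106, 910, 0), (201616107, 910, 2),
    (201616108, 910, 3), (201616109, 910, 10),
]

# Step 1: map each store_code to its sorted week_hr values with 0 baskets.
zero_hours = defaultdict(list)
for week_hr, store_code, baskets in sorted(rows, key=lambda r: (r[1], r[0])):
    if baskets == 0:
        zero_hours[store_code].append(week_hr)


def runs_of_consecutive(values, min_len=3):
    """Return runs of consecutive integers (hypothetical helper).

    `values` must be sorted; only runs with at least `min_len` items are kept.
    """
    runs, current = [], []
    for v in values:
        if current and v == current[-1] + 1:
            current.append(v)
        else:
            if len(current) >= min_len:
                runs.append(current)
            current = [v]
    if len(current) >= min_len:
        runs.append(current)
    return runs


# Step 2: keep only the stores that have at least one qualifying run.
good = {}
for store, hours in zero_hours.items():
    runs = runs_of_consecutive(hours)
    if runs:
        good[store] = runs

print(good)
```

The resulting `good` dictionary holds the store_code and week_hr values that could then be used to pull the full rows back out of the original data.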

caiohamamura · Accepted Answer · 2016-10-29 13:27:58Z

1

Do the following:

1. Sort by store_code, week_hr.
2. Filter by baskets == 0.
3. Store the difference df['week_hr'][1:].values - df['week_hr'][:-1].values so you know whether consecutive rows are continuous.
4. Now you can assign group numbers to the continuous runs and filter as you want.

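import numpy as np
import pandas as pd

# 1: sort so that consecutive hours of the same store are adjacent
t1 = df.sort_values(['store_code', 'week_hr'])

# 2: keep only the zero-basket rows (copy so a column can be added safely)
t2 = t1[t1['baskets'] == 0].copy()

# 3: True where a row's week_hr is exactly 1 more than the previous row's
continuous = t2['week_hr'][1:].values - t2['week_hr'][:-1].values == 1
# every break in continuity starts a new group number
groups = np.cumsum(np.hstack([False, ~continuous]))
t2['groups'] = groups

# 4: count rows per (store_code, groups) and keep runs of 3 or more
t3 = t2.groupby(['store_code', 'groups'], as_index=False)['week_hr'].count()
t4 = t3[t3.week_hr > 2]
print(pd.merge(t2, t4[['store_code', 'groups']]))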

There's no need for looping!
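For anyone who wants to try this without an existing df, here is a self-contained sketch that rebuilds the question's dataframe and applies the same sort, filter and group idea. It is a variant, not the exact code above: it computes the gap per store with groupby().diff(), and the names `zero_basket_runs` and `run_id` are made up for this example.

```python
import pandas as pd

# The dataframe from the question, typed in by hand.
df = pd.DataFrame({
    "week": [201616] * 10,
    "hour": [106, 107, 108, 109, 110] * 2,
    "week_hr": [201616106, 201616107, 201616108, 201616109, 201616110,
                201616108, 201616106, 201616107, 201616108, 201616109],
    "store_code": [505] * 5 + [910] * 5,
    "baskets": [0, 0, 0, 18, 0, 0, 0, 2, 3, 10],
})


def zero_basket_runs(frame, min_len=3):
    """Return zero-basket rows that belong to a run of >= min_len hours."""
    zeros = (frame[frame["baskets"] == 0]
             .sort_values(["store_code", "week_hr"])
             .copy())
    # Gap to the previous zero-basket row of the same store
    # (NaN for the first row of each store).
    gap = zeros.groupby("store_code")["week_hr"].diff()
    # A new run starts wherever the gap is not exactly 1.
    zeros["run_id"] = (gap != 1).cumsum()
    # Size of the run each row belongs to.
    run_size = zeros.groupby("run_id")["week_hr"].transform("size")
    return zeros[run_size >= min_len].drop(columns="run_id")


result = zero_basket_runs(df)
print(result)
```

Because the first zero-basket row of every store always starts a new run, run numbers never span two stores, so grouping by `run_id` alone is enough in this variant.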

edited Oct 29, 2016 at 13:27

answered Jul 22, 2016 at 20:14

caiohamamura


4 Comments

Mukul Over a year ago

Thanks a lot. Will the above code look for a pattern of 3 or more consecutive occurrences of 0? Can I create an index for each weekday and hour? For example, for week 201616 and Monday hour 106 and so on, I can create a variable that holds an index, so the sequence becomes an actual running number: say for 201616106 the value is 1, then for 201616107 the value is 2, and so on. Or maybe I don't even need to do it? Can you please explain in detail? Still learning Python. Newbie here.

caiohamamura Over a year ago

The code I provided will work for that dataframe you specified and will detect 3 or more consecutive 0's for the same store_code, week_hr. The relevant code is t4 = t3[t3.week_hr > 2] where week_hr is the count of consecutive occurrences (so it is filtering more than 2 consecutive occurrences). There's no need to create index, the week_hr will work just fine with groupby clause. Have you run it? Did it work?

Mukul Over a year ago

This has worked great, I am just unable to understand the part after #3. What does hstack do? Can you please explain? Thanks a lot for your help.

caiohamamura Over a year ago

When you subtract next - current, the resulting array is one element shorter than the original. Ex: 4,5,6,7 will lead to True,True,True, so I have to hstack to add a first element False in order to add a column with the same length as the dataframe.
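The hstack step described above can be traced on a small array. This is a minimal sketch with made-up values (they are illustrative, not taken from the question's data):

```python
import numpy as np

week_hr = np.array([4, 5, 6, 7, 10, 11])

# next - current: one element shorter than week_hr.
continuous = week_hr[1:] - week_hr[:-1] == 1
print(continuous)        # [ True  True  True False  True]

# Prepend False so the array lines up with week_hr again, and flip the
# flags so that True marks "this element starts a new group".
starts_new_group = np.hstack([False, ~continuous])
print(starts_new_group)  # [False False False False  True False]

# The running total of those flags gives each element its group number.
groups = np.cumsum(starts_new_group)
print(groups)            # [0 0 0 0 1 1]
```

Here 4, 5, 6, 7 end up in group 0 and 10, 11 in group 1, which is what the later groupby counts.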

Cuong Tran · Accepted Answer · 2016-07-22 19:23:30Z

0

You can solve it in these steps:

Sort by store_code, week_hr
Filter by baskets == 0
Group by store_code
Find continuous runs

Code:

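t1 = df.sort_values(['store_code', 'week_hr'])

t2 = t1[t1['baskets'] == 0]

# one list of zero-basket week_hr values per store
grouped = t2.groupby('store_code')['week_hr'].apply(lambda x: x.tolist())

# Series.iteritems() was removed in pandas 2.0; items() is the equivalent
for store_code, week_hrs in grouped.items():
    print(store_code, week_hrs)
    # do something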
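One possible way to fill in the "# do something" step is sketched below. It is not part of the original answer: the helper name `consecutive_runs` is hypothetical, and it relies on the common itertools trick that sorted, distinct consecutive integers all share the same value-minus-position difference.

```python
from itertools import groupby


def consecutive_runs(sorted_values, min_len=3):
    """Split sorted, distinct integers into runs of consecutive values.

    Only runs with at least `min_len` items are returned.
    """
    runs = []
    # Within a consecutive run, value - position stays constant.
    for _, grp in groupby(enumerate(sorted_values), key=lambda p: p[1] - p[0]):
        run = [value for _, value in grp]
        if len(run) >= min_len:
            runs.append(run)
    return runs


# Zero-basket week_hr values of store 505 from the question.
week_hrs = [201616106, 201616107, 201616108, 201616110]
print(consecutive_runs(week_hrs))
```

Inside the loop this could be called as consecutive_runs(week_hrs), keeping the store_code whenever the returned list is not empty.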

edited Jul 22, 2016 at 19:23

answered Jul 22, 2016 at 19:14

Cuong Tran


1 Comment

Mukul Over a year ago

Thanks a lot, Cuong. Will the above code look for a pattern of 3 or more consecutive occurrences of 0? What does apply do? What is tolist()? What is iteritems? Can you please explain in detail? Still learning Python. Newbie here.
