2

I have a following dataframe where I have one another list of index position based on some condition so just want to create the new dataframe based on the index position and check some condition on that.

df = pd.DataFrame()
df['index'] = [ 0, 28, 35, 49, 85, 105, 208, 386, 419, 512, 816, 888, 914, 989]
df['diff_in_min'] = [ 5, 35, 42, 46, 345, 85, 96, 107, 119, 325, 8, 56, 55, 216]
df['val_1'] = [5, 25, 2, 4, 2, 5, 69, 6, 8, 7, 55, 85, 8, 67]
df['val_2'] = [8, 89, 8, 5, 7, 57, 8, 57, 4, 8, 74, 65, 55, 74]
re_ind = list(np.where(df['diff_in_min'] >= 300))
re_ind = [np.array([85, 512], dtype='int64')]

Just I want to create another dataframe based on re_ind position, ex:

first_df = df[0:85] 
another_df = [85:512] 
last_df =  [512:] 

and each dataframe I want to check one condition

count = 0
temp_df = df[:re_ind[0]]
if temp_df['diff_in_min'].sum() > 500:
    count += 1
temp_df = df[re_ind[0]:re_ind[1]]
if temp_df['diff_in_min'].sum() > 500:
    count += 1
if temp_df = df[re_ind[1]:]
if temp_df['diff_in_min'].sum() > 500:
    count += 1

How can I do that using for loop with creating new data frame using existing dataframe?

2 Answers 2

1

From sample data for groups created by df['diff_in_min'] >= 300) add cumulative sum, then aggregate sum, compare for another condition and count Trues by sum:

s = (df['diff_in_min'] >= 300).cumsum()

out = (df['diff_in_min'].groupby(s).sum() > 500).sum()
print (out)
2
Sign up to request clarification or add additional context in comments.

Comments

0

jezrael's answer is much better and more succinct. However, in keeping with your style of programming, here is another way you could tackle it:

import pandas as pd

df = pd.DataFrame()
df['index'] = [ 0, 28, 35, 49, 85, 105, 208, 386, 419, 512, 816, 888, 914, 989]
df['diff_in_min'] = [ 5, 35, 42, 46, 345, 85, 96, 107, 119, 325, 8, 56, 55, 216]
df['val_1'] = [5, 25, 2, 4, 2, 5, 69, 6, 8, 7, 55, 85, 8, 67]
df['val_2'] = [8, 89, 8, 5, 7, 57, 8, 57, 4, 8, 74, 65, 55, 74]

df_list = []
df_list.append(df[df['index']<85])
df_list.append(df[(df['index']>=85) & (df['index'] <512)])
df_list.append(df[df['index']>=512])

count = 0
for temp_df in df_list:
    if temp_df['diff_in_min'].sum() > 500:
        count += 1

print(f"Count = {count}")

OUTPUT:

Count = 2

Which is exactly what jezrael got, and why my vote goes to them.

2 Comments

you created df_list manually if it this list contains more than 50 int then will not useful your code.
given an example with more than 50, then I probably would have tackled it differently. However, I was trying to keep inline with your programming style - not mine :)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.