pandas: create another data frame using existing data frame from the given index position

Question

I have the following dataframe, plus a list of index positions derived from a condition. I want to create new dataframes by splitting at those index positions and then check a condition on each one.

import numpy as np
import pandas as pd

df = pd.DataFrame()
df['index'] = [0, 28, 35, 49, 85, 105, 208, 386, 419, 512, 816, 888, 914, 989]
df['diff_in_min'] = [5, 35, 42, 46, 345, 85, 96, 107, 119, 325, 8, 56, 55, 216]
df['val_1'] = [5, 25, 2, 4, 2, 5, 69, 6, 8, 7, 55, 85, 8, 67]
df['val_2'] = [8, 89, 8, 5, 7, 57, 8, 57, 4, 8, 74, 65, 55, 74]
re_ind = list(np.where(df['diff_in_min'] >= 300))
# The matching rows have 'index' values 85 and 512:
re_ind = [np.array([85, 512], dtype='int64')]

I just want to create separate dataframes split at the re_ind positions, for example:

first_df = df[0:85]
another_df = df[85:512]
last_df = df[512:]

and for each dataframe I want to check one condition:

count = 0
first, second = re_ind[0]  # 85 and 512
temp_df = df[:first]
if temp_df['diff_in_min'].sum() > 500:
    count += 1
temp_df = df[first:second]
if temp_df['diff_in_min'].sum() > 500:
    count += 1
temp_df = df[second:]
if temp_df['diff_in_min'].sum() > 500:
    count += 1

How can I do this with a for loop, creating each new dataframe from the existing one?

jezrael · Accepted Answer · 2022-10-31 08:31:58Z

1

Using the sample data, create group labels by taking the cumulative sum of the mask df['diff_in_min'] >= 300, then sum diff_in_min per group, compare each group total against the second condition, and count the True values with sum:

s = (df['diff_in_min'] >= 300).cumsum()

out = (df['diff_in_min'].groupby(s).sum() > 500).sum()
print(out)
2
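
To make the grouping step more concrete, here is a minimal sketch that prints the per-group totals and also collects each piece as its own DataFrame in a for loop, which is what the question asked for. The names labels, group_sums and pieces are illustrative and not part of the original answer, and only the two columns needed for the check are recreated.

```python
import pandas as pd

df = pd.DataFrame({
    'index': [0, 28, 35, 49, 85, 105, 208, 386, 419, 512, 816, 888, 914, 989],
    'diff_in_min': [5, 35, 42, 46, 345, 85, 96, 107, 119, 325, 8, 56, 55, 216],
})

# Each row with diff_in_min >= 300 starts a new group, so the running
# total of the boolean mask serves as a group label (0, 1, 2, ...).
labels = (df['diff_in_min'] >= 300).cumsum()

# Total of diff_in_min within each group.
group_sums = df['diff_in_min'].groupby(labels).sum()
print(group_sums.tolist())  # [128, 752, 660]

# Iterating over the groupby yields every piece as a separate DataFrame.
count = 0
pieces = []
for label, part in df.groupby(labels):
    pieces.append(part)
    if part['diff_in_min'].sum() > 500:
        count += 1

print(count)  # 2
```

Only the second and third groups exceed 500, which matches the result of 2 above.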

ScottC · Accepted Answer · 2022-10-31 08:46:32Z

0

jezrael's answer is much better and more succinct. However, in keeping with your style of programming, here is another way you could tackle it:

import pandas as pd

df = pd.DataFrame()
df['index'] = [ 0, 28, 35, 49, 85, 105, 208, 386, 419, 512, 816, 888, 914, 989]
df['diff_in_min'] = [ 5, 35, 42, 46, 345, 85, 96, 107, 119, 325, 8, 56, 55, 216]
df['val_1'] = [5, 25, 2, 4, 2, 5, 69, 6, 8, 7, 55, 85, 8, 67]
df['val_2'] = [8, 89, 8, 5, 7, 57, 8, 57, 4, 8, 74, 65, 55, 74]

df_list = []
df_list.append(df[df['index']<85])
df_list.append(df[(df['index']>=85) & (df['index'] <512)])
df_list.append(df[df['index']>=512])

count = 0
for temp_df in df_list:
    if temp_df['diff_in_min'].sum() > 500:
        count += 1

print(f"Count = {count}")

OUTPUT:

Count = 2

Which is exactly what jezrael got, and why my vote goes to them.

2 Comments

Sushil Kokil Over a year ago

You created df_list manually. If the list of split points contains more than 50 integers, your code will not be practical.

ScottC Over a year ago

Given an example with more than 50, I probably would have tackled it differently. However, I was trying to keep in line with your programming style - not mine :)
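
One possible way to address the concern raised in these comments is to build df_list in a loop from the split positions instead of writing each slice by hand. The following is only a sketch under that assumption. The names starts and bounds are hypothetical, and the split points are row positions (as returned by NumPy), not values of the 'index' column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'index': [0, 28, 35, 49, 85, 105, 208, 386, 419, 512, 816, 888, 914, 989],
    'diff_in_min': [5, 35, 42, 46, 345, 85, 96, 107, 119, 325, 8, 56, 55, 216],
})

# Row positions where a new piece starts; for this sample these are 4 and 9.
starts = np.flatnonzero((df['diff_in_min'] >= 300).to_numpy())

# Bracket the split points with the first position and one past the last.
bounds = [0, *starts, len(df)]

# Slice consecutive position ranges; this works for any number of split points.
df_list = [df.iloc[lo:hi] for lo, hi in zip(bounds[:-1], bounds[1:])]

count = 0
for temp_df in df_list:
    if temp_df['diff_in_min'].sum() > 500:
        count += 1

print(f"Count = {count}")  # Count = 2
```

If the very first row already meets the condition, the first piece is empty and its sum is 0, so it does not affect the count.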
