I need to split a dataframe into 3 separate dataframes based on a header row that recurs in the dataframe.

My dataframe looks like:

        0         1             2     ....   14
0   Alert     Type      Response           Cost
1     w1        x1            y1            z1
2     w2        x2            y2            z3
.      .         .             .             .
.      .         .             .             .
144 Alert     Type      Response           Cost
145   a1        b1            c1             d1
146   a2        b2            c2             d2

I was trying to get the index numbers containing the word "Alert" with loc to slice the dataframe into the sub dataframes.

indexes = df.index[df.loc[df[0] == "Alert"]].tolist()

But this returns:

IndexError: arrays used as indices must be of integer (or boolean) type

Any hint on that error, or is there another way I'm not seeing (e.g. something like group by)?

Thanks for your help.

2 Answers

np.split

import numpy as np

dfs = np.split(df, np.flatnonzero(df[0] == 'Alert')[1:])

Explanation

  • Find where df[0] is equal to 'Alert'

    np.flatnonzero(df[0] == 'Alert')
    
  • Skip the first position (0), because splitting there would produce an empty first element

    np.flatnonzero(df[0] == 'Alert')[1:]
    
  • Use np.split to get the list

    np.split(df, np.flatnonzero(df[0] == 'Alert')[1:])
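
The three steps above can be sketched end to end on a small stand-in frame. The six-row df below is a hypothetical miniature of the data in the question, and the names starts and cuts are illustrative only. This variant splits the row positions rather than the frame itself, on the assumption that recent pandas versions may warn when a DataFrame is passed straight to np.split.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the frame in the question: the header row repeats.
df = pd.DataFrame([
    ["Alert", "Type", "Response", "Cost"],
    ["w1", "x1", "y1", "z1"],
    ["w2", "x2", "y2", "z3"],
    ["Alert", "Type", "Response", "Cost"],
    ["a1", "b1", "c1", "d1"],
    ["a2", "b2", "c2", "d2"],
])

# Step 1: positions of the rows whose first column is "Alert".
starts = np.flatnonzero((df[0] == "Alert").to_numpy())

# Step 2: drop the first position so no empty leading chunk is produced.
cuts = starts[1:]

# Step 3: split the row positions (not the frame) and slice with iloc.
dfs = [df.iloc[chunk] for chunk in np.split(np.arange(len(df)), cuts)]

print(*dfs, sep="\n\n")
```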
    

Show results

print(*dfs, sep='\n\n')

      0     1         2     14
0  Alert  Type  Response  Cost
1     w1    x1        y1    z1
2     w2    x2        y2    z3

        0     1         2     14
144  Alert  Type  Response  Cost
145     a1    b1        c1    d1
146     a2    b2        c2    d2
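
The question also asks whether something like group by could work. One possible sketch, which is not part of the answer above: a running count of the "Alert" rows labels every row with its block number, and groupby then yields one frame per block. The sample df and the name block_id are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical miniature of the frame in the question.
df = pd.DataFrame([
    ["Alert", "Type", "Response", "Cost"],
    ["w1", "x1", "y1", "z1"],
    ["w2", "x2", "y2", "z3"],
    ["Alert", "Type", "Response", "Cost"],
    ["a1", "b1", "c1", "d1"],
    ["a2", "b2", "c2", "d2"],
])

# Every "Alert" row bumps the counter, so each row carries its block number.
block_id = (df[0] == "Alert").cumsum()

# One sub-frame per block; the group key itself is not needed here.
dfs = [block for _, block in df.groupby(block_id)]
```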

@piRSquared's answer works great, so let me just explain your error.

This is how you can get the indexes where the first element is Alert:

indexes = list(df.loc[df[0] == "Alert"].index)

Your error arises because df.loc[df[0] == "Alert"] returns a DataFrame containing the matching rows, not a boolean mask, and df.index can only be indexed with integer or boolean arrays.
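
For reference, a minimal sketch of how the lookup from the question could be made to work: the index accepts a boolean mask, so the comparison can be passed to df.index without going through loc. The small df is a hypothetical stand-in for the real data.

```python
import pandas as pd

# Hypothetical miniature of the frame in the question.
df = pd.DataFrame([
    ["Alert", "Type", "Response", "Cost"],
    ["w1", "x1", "y1", "z1"],
    ["w2", "x2", "y2", "z3"],
    ["Alert", "Type", "Response", "Cost"],
    ["a1", "b1", "c1", "d1"],
    ["a2", "b2", "c2", "d2"],
])

# The comparison alone is a boolean Series with one entry per row.
mask = df[0] == "Alert"

# A boolean mask is a valid indexer for the index; a DataFrame is not.
indexes = df.index[mask].tolist()
print(indexes)
```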

Then you can split your dataframe using a list comprehension like this:

listdf = [df.iloc[i:j] for i, j in zip(indexes, indexes[1:] + [len(df)])]
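
Put together on a small stand-in frame, the approach might look like the sketch below. The sample df and the helper promote_header are assumptions for illustration; promoting the repeated header row to column names is an optional extra step that is not part of the answer. Note that iloc takes positions, so reusing index labels as slice bounds relies on the default RangeIndex.

```python
import pandas as pd

# Hypothetical miniature of the frame in the question.
df = pd.DataFrame([
    ["Alert", "Type", "Response", "Cost"],
    ["w1", "x1", "y1", "z1"],
    ["w2", "x2", "y2", "z3"],
    ["Alert", "Type", "Response", "Cost"],
    ["a1", "b1", "c1", "d1"],
    ["a2", "b2", "c2", "d2"],
])

# Start of each block; with the default RangeIndex, labels equal positions.
indexes = df.index[df[0] == "Alert"].tolist()

# Each block ends where the next one starts; the last ends at len(df).
ends = indexes[1:] + [len(df)]
listdf = [df.iloc[i:j] for i, j in zip(indexes, ends)]


def promote_header(part):
    """Optional: use the block's first row as column names (illustrative helper)."""
    body = part.iloc[1:].reset_index(drop=True)
    body.columns = part.iloc[0].tolist()
    return body


tables = [promote_header(part) for part in listdf]
```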

1 Comment

Sometimes we forget to do this. Very useful.
