How to split dataframe into multiple dataframes based on header rows

Question

I need to split a dataframe into 3 unique dataframes based on a header row recurring in the dataframe.

My dataframe looks like:

        0         1             2     ....   14
0   Alert     Type      Response           Cost
1     w1        x1            y1            z1
2     w2        x2            y2            z3
.      .         .             .             .
.      .         .             .             .
144 Alert     Type      Response           Cost
145   a1        b1            c1             d1
146   a2        b2            c2             d2

I was trying to get the index numbers containing the word "Alert" with loc to slice the dataframe into the sub dataframes.

indexes = df.index[df.loc[df[0] == "Alert"]].tolist()

But this returns:

IndexError: arrays used as indices must be of integer (or boolean) type

Any hint on that error, or is there another way I'm not seeing (e.g. something like group by)?

Thanks for your help.

piRSquared · Accepted Answer · 2019-06-06 16:09:38Z

3

`np.split`

dfs = np.split(df, np.flatnonzero(df[0] == 'Alert')[1:])

Explanation

Find where df[0] is equal to 'Alert'
```
np.flatnonzero(df[0] == 'Alert')
```
Skip the first position (row 0), because splitting there would produce an empty first element in the list
```
np.flatnonzero(df[0] == 'Alert')[1:]
```

Use np.split to get the list

np.split(df, np.flatnonzero(df[0] == 'Alert')[1:])

Show the results

print(*dfs, sep='\n\n')

      0     1         2     14
0  Alert  Type  Response  Cost
1     w1    x1        y1    z1
2     w2    x2        y2    z3

        0     1         2     14
144  Alert  Type  Response  Cost
145     a1    b1        c1    d1
146     a2    b2        c2    d2
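For anyone who wants to run this end to end, here is a minimal sketch on a made-up six-row frame (the data and the names `cuts` and `pos` are just for illustration). It applies `np.split` to the row positions and then selects rows with `iloc`, rather than passing the DataFrame itself to NumPy, since depending on your pandas/NumPy versions the direct call may emit a deprecation warning.

```python
import numpy as np
import pandas as pd

# Toy frame (made-up data) with the header row repeated at positions 0 and 3.
df = pd.DataFrame([
    ["Alert", "Type", "Response", "Cost"],
    ["w1", "x1", "y1", "z1"],
    ["w2", "x2", "y2", "z3"],
    ["Alert", "Type", "Response", "Cost"],
    ["a1", "b1", "c1", "d1"],
    ["a2", "b2", "c2", "d2"],
])

# Positions of the header rows; drop the first so no empty chunk is produced.
cuts = np.flatnonzero((df[0] == "Alert").to_numpy())[1:]

# Split the row positions at those cut points, then pick each block by position.
dfs = [df.iloc[pos] for pos in np.split(np.arange(len(df)), cuts)]

print(*dfs, sep="\n\n")
```

Each element of `dfs` still starts with its own "Alert" header row and keeps the original row labels.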

edited Jun 6, 2019 at 16:09

answered Jun 6, 2019 at 16:07


Valentino · Accepted Answer · 2019-06-06 16:26:42Z

2

@piRSquared's answer works great, so let me just explain your error.

This is how you can get the indexes where the first element is Alert:

indexes = list(df.loc[df[0] == "Alert"].index)

Your error arises because df.loc[df[0] == "Alert"] returns a DataFrame, and df.index (a pandas.RangeIndex here) can only be indexed with integer or boolean arrays, not with a DataFrame.

Then you can split your dataframe using a list comprehension like this:

listdf = [df.iloc[i:j] for i, j in zip(indexes, indexes[1:] + [len(df)])]
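Putting the two steps together, a minimal runnable sketch on a made-up six-row frame could look like this. It also shows a variant that stays close to the original attempt: indexing `df.index` with the boolean mask itself instead of with the result of `df.loc[...]`. The toy data and the name `mask` are assumptions for illustration only.

```python
import pandas as pd

# Toy frame (made-up data) with the header row repeated at positions 0 and 3.
df = pd.DataFrame([
    ["Alert", "Type", "Response", "Cost"],
    ["w1", "x1", "y1", "z1"],
    ["w2", "x2", "y2", "z3"],
    ["Alert", "Type", "Response", "Cost"],
    ["a1", "b1", "c1", "d1"],
    ["a2", "b2", "c2", "d2"],
])

# A boolean mask is a valid indexer for df.index; a DataFrame is not.
mask = df[0] == "Alert"
indexes = df.index[mask].tolist()

# Pair each header position with the next one (or with the end of the frame).
# iloc is positional, so this assumes the default 0..n-1 index.
listdf = [df.iloc[i:j] for i, j in zip(indexes, indexes[1:] + [len(df)])]

print(indexes)
print(*listdf, sep="\n\n")
```

If the frame does not have the default integer index, the labels in `indexes` would no longer match row positions, and the positions would have to be computed separately.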

edited Jun 6, 2019 at 16:26

answered Jun 6, 2019 at 16:25


1 Comment

piRSquared Over a year ago

Sometimes we forget to do this. Very useful.
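On the group-by idea raised in the question: neither answer above uses it, but a running count of the header rows can serve as a group key. The sketch below uses made-up data, and the names `block_id` and `tables` are hypothetical. The last step, which promotes each block's first row to column names, is an optional extra that was not part of the original question.

```python
import pandas as pd

# Toy frame (made-up data) with the header row repeated at positions 0 and 3.
df = pd.DataFrame([
    ["Alert", "Type", "Response", "Cost"],
    ["w1", "x1", "y1", "z1"],
    ["w2", "x2", "y2", "z3"],
    ["Alert", "Type", "Response", "Cost"],
    ["a1", "b1", "c1", "d1"],
    ["a2", "b2", "c2", "d2"],
])

# Every "Alert" row bumps the running count, so each block gets its own id (1, 2, ...).
block_id = (df[0] == "Alert").cumsum()

# Iterating over a groupby yields (key, sub-frame) pairs in key order.
dfs = [block for _, block in df.groupby(block_id)]

# Optional: use each block's first row as column names and drop it from the data.
tables = [b.iloc[1:].set_axis(b.iloc[0].tolist(), axis=1) for b in dfs]

print(*tables, sep="\n\n")
```

Unlike the positional slicing approaches, this does not depend on the frame having a default integer index.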
