Split multiple sheets excel file by one column in Python

Question

Consider an example Excel file, students_data.xlsx, which has 3 sheets: students_name, students_score and students_age.

students_name:

   id class  name
0   1     a  jack
1   2     a  lucy
2   3     b   joe
3   4     b  even
4   5     b    ho

students_score:

   id class  score
0   1     a     66
1   2     a     77
2   3     b     87
3   4     b     60
4   5     b     90

students_age:

   id class  age
0   1     a   15
1   2     a   14
2   3     b   13
3   4     b   12
4   5     b   14

I have split each sheet by class using:

import pandas as pd
df = pd.read_excel("students_data.xlsx", sheet_name="students_name")
for i, g in df.groupby("class"):
    g.to_excel("students_name/{}.xlsx".format(i), index=False, index_label=False)
df = pd.read_excel("students_data.xlsx", sheet_name="students_score")
for i, g in df.groupby("class"):
    g.to_excel("students_score/{}.xlsx".format(i), index=False, index_label=False)
df = pd.read_excel("students_data.xlsx", sheet_name="students_age")
for i, g in df.groupby("class"):
    g.to_excel("students_age/{}.xlsx".format(i), index=False, index_label=False)

But I want to split by class while keeping the same schema in each Excel file. For example, a.xlsx should have the same 3 sheets as the original file, but contain only the rows where class equals a.

The final a.xlsx will have the following sheets:

students_name:

   id class  name
0   1     a  jack
1   2     a  lucy

students_score:

   id class  score
0   1     a     66
1   2     a     77

students_age:

   id class  age
0   1     a   15
1   2     a   14

b.xlsx will look like a.xlsx, but contain only the rows where class equals b.

How can I split and save excel files correctly? Thank you.

jezrael · Accepted Answer · 2019-09-30 09:24:02Z


First, create a dictionary of all DataFrames, keyed by sheet name, by passing the sheet_name=None parameter.

dfs = pd.read_excel('students_data.xlsx', sheet_name=None)
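To make the shape of that dictionary concrete without needing the file, here is a hand-built stand-in, assuming the sheets contain exactly the sample data from the question. The keys are sheet names and the values are DataFrames. The helper names sheet_names and name_columns are only for illustration.

```python
import pandas as pd

# Hand-built stand-in for pd.read_excel('students_data.xlsx', sheet_name=None):
# a dict whose keys are sheet names and whose values are DataFrames.
dfs = {
    "students_name": pd.DataFrame(
        {"id": [1, 2, 3, 4, 5], "class": list("aabbb"),
         "name": ["jack", "lucy", "joe", "even", "ho"]}
    ),
    "students_score": pd.DataFrame(
        {"id": [1, 2, 3, 4, 5], "class": list("aabbb"),
         "score": [66, 77, 87, 60, 90]}
    ),
    "students_age": pd.DataFrame(
        {"id": [1, 2, 3, 4, 5], "class": list("aabbb"),
         "age": [15, 14, 13, 12, 14]}
    ),
}

# Each sheet is reachable by its name; every sheet shares the 'class' column.
sheet_names = list(dfs)
name_columns = list(dfs["students_name"].columns)
```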

Then get all possible class values by extracting the class column from each sheet, flattening, and converting to a set.

c = {y for v in dfs.values() for y in v['class']}
print(c)
{'a', 'b'}
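If some rows have an empty class cell, the set above would also pick up the missing value. A possible variant, sketched on a small made-up stand-in for dfs (not the real file), stacks the class columns with pd.concat, drops missing values, and sorts the labels so the output order is stable:

```python
import pandas as pd

# Small made-up stand-in for the dict returned by read_excel(..., sheet_name=None).
dfs = {
    "students_name": pd.DataFrame({"id": [1, 2, 3], "class": ["a", "a", "b"]}),
    "students_score": pd.DataFrame({"id": [1, 2, 3], "class": ["a", None, "b"]}),
}

# Stack the 'class' column of every sheet into one Series.
all_classes = pd.concat([v["class"] for v in dfs.values()], ignore_index=True)

# Drop missing values, then keep the distinct labels in sorted order.
classes = sorted(all_classes.dropna().unique())
```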

Finally, loop over each value in the set, create a new file, and write the filtered rows of every sheet under its original sheet name:

for i in c:
    # one workbook per class, e.g. a.xlsx and b.xlsx
    with pd.ExcelWriter("{}.xlsx".format(i)) as writer:
        for k, v in dfs.items():
            # keep only this class's rows, written under the original sheet name
            v[v['class'] == i].to_excel(writer, index=False, sheet_name=k)
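The same steps can be wrapped in two small functions, which keeps the filtering testable apart from the file writing. This is only a sketch: split_by_column and write_workbooks are hypothetical helper names, the demo data is copied from the question, and writing .xlsx files assumes an Excel engine such as openpyxl is installed.

```python
import os

import pandas as pd


def split_by_column(dfs, column="class"):
    """Return {value: {sheet_name: filtered DataFrame}} for each distinct value."""
    values = {x for df in dfs.values() for x in df[column].dropna()}
    return {
        value: {sheet: df[df[column] == value] for sheet, df in dfs.items()}
        for value in values
    }


def write_workbooks(parts, out_dir="."):
    """Write one workbook per value, e.g. out_dir/a.xlsx, keeping sheet names."""
    os.makedirs(out_dir, exist_ok=True)
    for value, sheets in parts.items():
        path = os.path.join(out_dir, "{}.xlsx".format(value))
        with pd.ExcelWriter(path) as writer:
            for sheet, df in sheets.items():
                df.to_excel(writer, sheet_name=sheet, index=False)


# In-memory demo on two of the question's sheets (no file is read or written).
demo = {
    "students_name": pd.DataFrame(
        {"id": [1, 2, 3, 4, 5], "class": list("aabbb"),
         "name": ["jack", "lucy", "joe", "even", "ho"]}
    ),
    "students_score": pd.DataFrame(
        {"id": [1, 2, 3, 4, 5], "class": list("aabbb"),
         "score": [66, 77, 87, 60, 90]}
    ),
}
parts = split_by_column(demo)
```

With the real file, calling write_workbooks(split_by_column(pd.read_excel('students_data.xlsx', sheet_name=None))) should produce a.xlsx and b.xlsx in the current directory.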
