
Take as an example an Excel file, students_data.xlsx, which has 3 sheets: students_name, students_score and students_age.

students_name:

   id class  name
0   1     a  jack
1   2     a  lucy
2   3     b   joe
3   4     b  even
4   5     b    ho

students_score:

   id class  score
0   1     a     66
1   2     a     77
2   3     b     87
3   4     b     60
4   5     b     90

students_age:

   id class  age
0   1     a   15
1   2     a   14
2   3     b   13
3   4     b   12
4   5     b   14
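To reproduce the example without the original workbook, here is a minimal sketch that rebuilds the three sheets above in memory. The name sheets is hypothetical; the dict is shaped like the one pd.read_excel(..., sheet_name=None) returns (sheet name mapped to DataFrame). The commented-out part writes it to students_data.xlsx and assumes an Excel engine such as openpyxl is installed.

```python
import pandas as pd

# Hypothetical in-memory copy of the three sheets in students_data.xlsx,
# keyed by sheet name like the dict returned by pd.read_excel(..., sheet_name=None)
ids = [1, 2, 3, 4, 5]
classes = ["a", "a", "b", "b", "b"]
sheets = {
    "students_name": pd.DataFrame({"id": ids, "class": classes,
                                   "name": ["jack", "lucy", "joe", "even", "ho"]}),
    "students_score": pd.DataFrame({"id": ids, "class": classes,
                                    "score": [66, 77, 87, 60, 90]}),
    "students_age": pd.DataFrame({"id": ids, "class": classes,
                                  "age": [15, 14, 13, 12, 14]}),
}

# To create the actual workbook (assumes an Excel engine such as openpyxl):
# with pd.ExcelWriter("students_data.xlsx") as writer:
#     for name, frame in sheets.items():
#         frame.to_excel(writer, sheet_name=name, index=False)
```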

I have split each sheet into separate files by class using:

import os
import pandas as pd

# One folder per sheet, one file per class inside it (e.g. students_name/a.xlsx)
for sheet in ["students_name", "students_score", "students_age"]:
    df = pd.read_excel("students_data.xlsx", sheet_name=sheet)
    os.makedirs(sheet, exist_ok=True)  # to_excel does not create missing folders
    for i, g in df.groupby("class"):
        g.to_excel("{}/{}.xlsx".format(sheet, i), index=False)

However, I want to split by class while keeping the same schema in every Excel file. For example, a.xlsx should contain the same 3 sheets as the original file, but only the rows where class equals a.

The final a.xlsx should have the following sheets:

students_name:

   id class  name
0   1     a  jack
1   2     a  lucy

students_score:

   id class  score
0   1     a     66
1   2     a     77

students_age:

   id class  age
0   1     a   15
1   2     a   14

b.xlsx will look like a.xlsx, but contain only the rows where class equals b.

How can I split and save the Excel files this way? Thank you.

1 Answer


First, read all sheets at once into a dictionary of DataFrames, keyed by sheet name, by passing the sheet_name=None parameter:

dfs = pd.read_excel('students_data.xlsx', sheet_name=None)

Then get all possible classes by extracting the values of the class column from every DataFrame, flattening them and converting them to a set:

c = {y for v in dfs.values() for y in v['class']}
print(c)
{'a', 'b'}
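If you prefer to stay within pandas, a roughly equivalent sketch concatenates the class columns and calls Series.unique(). Like the set comprehension, it assumes every sheet has a class column. The small dfs dict here is a made-up stand-in for the result of pd.read_excel('students_data.xlsx', sheet_name=None).

```python
import pandas as pd

# Made-up stand-in for dfs = pd.read_excel('students_data.xlsx', sheet_name=None)
dfs = {
    "students_name": pd.DataFrame({"id": [1, 2, 3], "class": ["a", "a", "b"]}),
    "students_age": pd.DataFrame({"id": [4, 5], "class": ["b", "b"]}),
}

# Stack every 'class' column into one Series and keep the distinct values,
# in order of first appearance
c = pd.concat([v["class"] for v in dfs.values()]).unique()
print(sorted(c))  # ['a', 'b']
```

Unlike a set, unique() keeps the order in which classes first appear, so the output files are created in a predictable order.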

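Finally, loop over each value in the set: create a new file for that class, filter every DataFrame to the rows of that class, and write each filtered DataFrame to a sheet with its original sheet name: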

for i in c:
    # One workbook per class, e.g. a.xlsx and b.xlsx
    with pd.ExcelWriter("{}.xlsx".format(i)) as writer:
        for k, v in dfs.items():
            # Keep only the rows of class i and reuse the original sheet name k
            v[v['class'] == i].to_excel(writer, index=False, sheet_name=k)
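To check the filtering before writing any files, here is a sketch that builds the same per-class structure in memory. split_by_class is a hypothetical helper name, and the dfs dict is made-up sample data standing in for the workbook.

```python
import pandas as pd


def split_by_class(dfs, column="class"):
    """Hypothetical helper: return {class_value: {sheet_name: filtered_df}}.

    Uses the same boolean-mask filtering as the loop above, but keeps the
    results in memory instead of writing one workbook per class.
    """
    classes = {y for v in dfs.values() for y in v[column]}
    return {
        i: {k: v[v[column] == i] for k, v in dfs.items()}
        for i in classes
    }


# Made-up stand-in for pd.read_excel('students_data.xlsx', sheet_name=None)
dfs = {
    "students_name": pd.DataFrame({"id": [1, 2, 3], "class": ["a", "a", "b"],
                                   "name": ["jack", "lucy", "joe"]}),
    "students_score": pd.DataFrame({"id": [1, 2, 3], "class": ["a", "a", "b"],
                                    "score": [66, 77, 87]}),
}

parts = split_by_class(dfs)
print(parts["a"]["students_name"])
```

Each inner dict can then be written with pd.ExcelWriter exactly as in the loop above. If a class has no rows in some sheet, the filtered DataFrame is empty but keeps the original columns, so that workbook should still get the sheet with just its header row.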