0

I have a code to transofrm an excel file with data whioch has two headers into a dataframe. Somehow my code is not doing that what i expect:

import pandas as pd

df = pd.read_excel('excel_file.xlsx', sheet_name='Class', header=None, names= header_names, skiprows=2)

somehow I receive a weird output:

NaN NaN NaT        NaN      NaN NaT     
                  3  NaN NaT   
2015-06-30        4  NaN

etc.

I dont know what I am doing wrong.

My excel file is structured as the following but has even more columns.

A                   B                   C   
math                biology             physics 
                            
01.06.2015  3       01.06.2015  3       01.06.2015  3
30.06.2016  4       30.06.2016  4       30.06.2016  4
15.06.2017  1       15.06.2017  3       15.06.2017  3
20.06.2018  3       20.06.2018  2       20.06.2018  5
15.06.2019  2       15.06.2019  3       15.06.2019  2

The expected dataframe output should be like this:

date        grade   course  group
01.06.2015  3       math    A
30.06.2016  4       math    A
15.06.2017  1       math    A
20.06.2018  3       math    A
15.06.2019  2       math    A
01.06.2015  3       biology B
30.06.2016  4       biology B
15.06.2017  3       biology B
20.06.2018  2       biology B
15.06.2019  3       biology B
01.06.2015  3       physics C
30.06.2016  4       physics C
15.06.2017  3       physics C
20.06.2018  5       physics C
15.06.2019  2       physics C

1 Answer 1

1

just by reading this data you won't get the output you showed. Also, you skip 2 rows which have information about the course and the group. What I would do is:

import pandas as pd
df = pd.read_excel('stack.xlsx',header=None, skiprows=3)

df = pd.DataFrame(df)

0   1   2   3   4   5
0   01.06.2015  3   01.06.2015  3   01.06.2015  3

Now, changing names of columns

df.columns=['math A','grade A', 'Bio B','grade B','physic C', 'grade C']


math A  grade A Bio B   grade B physic C    grade C
0   01.06.2015  3   01.06.2015  3   01.06.2015  3

After that, you can iterate through the lines and create a new data frame with the output you want :)

3 loops like this and you will get what you want, just remember that you cannot initialize with i = 0, but i you will start the next loop, is the i you finished the prev loop

df_wanted = pd.DataFrame(columns=['date','grade','course','group'])
i = 0
for index,row in df[['math A','grade A']].iterrows():
    df_wanted.loc[i,'date'] = row[0]
    df_wanted.loc[i,'grade'] = row[1]
    df_wanted.loc[i,'course'] = 'math'
    df_wanted.loc[i,'group'] = 'A'
    i+=1

date    grade   course  group
0   01.06.2015  3   math    A
Sign up to request clarification or add additional context in comments.

2 Comments

Wow so cool! How do I iterate the next step? This is the one which is also not clear for me. Maybe you can help here too and extend the example?
cool it works also if i have n columns i have to make n loops. sad that there is not another way

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.