0

this is my first public question here because I am absolutely lost in this case...

I have an excel file that looks like this:

excel file image

I applied the following commands to transform the excel file into a dataframe:

df = pd.ExcelFile(r'file_path/excel_file.xlsx')
df = df.parse(sheet_name, header = [0, 1, 2])

Now the problem is that the empty header spaces (in the 'Name', 'ID' and 'Test mode' columns) are being filled with 'Unnamed: 0_level_1' and 'Unnamed: 1_level_1' in the first row and 'Unnamed: 0_level_2' and 'Unnamed: 1_level_2' and 'Unnamed: 2_level_2' in the second row.

How can I merge these rows together for every header column so that it just says 'Name', 'ID' and 'Test mode' under the condition that I don't have the opportunity to edit the excel file? So basically I want to merge all header rows where this happens.

this is how I want the dataframe header to look like

Can anyone help me?

Thanks for the help!

5
  • Does this answer your question? Pandas: parse merged header columns from Excel Commented Nov 9, 2021 at 16:40
  • You whant to merge these cells in the read DataFrame? Commented Nov 9, 2021 at 17:39
  • Not entirely... I have a few cells in the second and third header row that are empty and with the fill method df.iloc[0:2] = df.iloc[0:2].fillna(method='ffill', axis=1) the program fills all these empty spaces with the previous entry, which I don't want. I tried filling the empty spaces with '', inplace = True but that makes the values in the subcolumns inaccessible... Commented Nov 10, 2021 at 11:38
  • @Glauco if that gives me the solution I want why not! My main problem is just that I want to access the values from specific columns by using the 'right' column names/paths so e.g. ID = df['ID'] or Test_mode = df[('Preconditions', 'Test mode')] or Current = df[('Preconditions', 'Electronic Load', 'Current')] @not_speshal this indexing works when I use the 'ffill' method but like I said it gives me unwanted header descriptions... Commented Nov 10, 2021 at 11:44
  • Ok now is clear, thank's for explanation, the resulting DataFrame has a Multilevel Column Index pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html. You can remove the index during the parse phase and add right name or starting from your df you could flatten the column index. maybe to_flat_index() useful Commented Nov 10, 2021 at 13:09

0

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.