Python: merging dataframe header rows for every header column that includes 'Unnamed'

Ask Question

Asked 4 years ago

Modified 4 years ago

Viewed 226 times

this is my first public question here because I am absolutely lost in this case...

I have an excel file that looks like this:

excel file image

I applied the following commands to transform the excel file into a dataframe:

df = pd.ExcelFile(r'file_path/excel_file.xlsx')
df = df.parse(sheet_name, header = [0, 1, 2])

Now the problem is that the empty header spaces (in the 'Name', 'ID' and 'Test mode' columns) are being filled with 'Unnamed: 0_level_1' and 'Unnamed: 1_level_1' in the first row and 'Unnamed: 0_level_2' and 'Unnamed: 1_level_2' and 'Unnamed: 2_level_2' in the second row.

How can I merge these rows together for every header column so that it just says 'Name', 'ID' and 'Test mode' under the condition that I don't have the opportunity to edit the excel file? So basically I want to merge all header rows where this happens.

this is how I want the dataframe header to look like

Can anyone help me?

Thanks for the help!

edited Nov 9, 2021 at 16:37

mozway

267k13 gold badges56 silver badges106 bronze badges

asked Nov 9, 2021 at 16:35

hackerboi1337

Does this answer your question? Pandas: parse merged header columns from Excel

not_speshal
– not_speshal

2021-11-09 16:40:40 +00:00
Commented Nov 9, 2021 at 16:40
You whant to merge these cells in the read DataFrame?

Glauco
– Glauco

2021-11-09 17:39:11 +00:00
Commented Nov 9, 2021 at 17:39
Not entirely... I have a few cells in the second and third header row that are empty and with the fill method df.iloc[0:2] = df.iloc[0:2].fillna(method='ffill', axis=1) the program fills all these empty spaces with the previous entry, which I don't want. I tried filling the empty spaces with '', inplace = True but that makes the values in the subcolumns inaccessible...

hackerboi1337
– hackerboi1337

2021-11-10 11:38:30 +00:00
Commented Nov 10, 2021 at 11:38
@Glauco if that gives me the solution I want why not! My main problem is just that I want to access the values from specific columns by using the 'right' column names/paths so e.g. ID = df['ID'] or Test_mode = df[('Preconditions', 'Test mode')] or Current = df[('Preconditions', 'Electronic Load', 'Current')] @not_speshal this indexing works when I use the 'ffill' method but like I said it gives me unwanted header descriptions...

hackerboi1337
– hackerboi1337

2021-11-10 11:44:26 +00:00
Commented Nov 10, 2021 at 11:44
Ok now is clear, thank's for explanation, the resulting DataFrame has a Multilevel Column Index pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html. You can remove the index during the parse phase and add right name or starting from your df you could flatten the column index. maybe to_flat_index() useful

Glauco
– Glauco

2021-11-10 13:09:08 +00:00
Commented Nov 10, 2021 at 13:09

Add a comment |

0 Your Answer

Sign up or log in

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.

Collectives™ on Stack Overflow

Python: merging dataframe header rows for every header column that includes 'Unnamed'

0

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

0

Know someone who can answer? Share a link to this question via email, Twitter, or Facebook.

Your Answer

Sign up or log in

Post as a guest

Linked