Mark Empty values in Pandas DataFrame Multi-Row Header

Question

I have a CSV file called mrh.csv whose first two rows form the header:

Name,Height,Age
"",Metres,""
A,-1,25
B,95,-1

I am using the following code to read it into a DataFrame:

import pandas as pd
pd.read_csv('mrh.csv', header=[0,1], na_values=[-1,''])

This results in a DataFrame with the following contents:

    Name                Height  Age
    Unnamed: 0_level_1  Metres  Unnamed: 2_level_1

0   A                   NaN     25.0
1   B                   95.0    NaN

Using the na_values parameter of read_csv, I can mark the -1 entries in the data as missing values. The missing values in the second header row, however, are not treated that way: when written as "" they are displayed as Unnamed: x_level_y, and when written as -1 (I tried that too) they are displayed as -1.
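To see exactly which labels pandas generates, one can print the column tuples. The sketch below inlines the contents of mrh.csv with io.StringIO (an assumption made only so the example runs without the file) and shows that na_values affects the data cells, while the empty header cells receive generated Unnamed: <column>_level_<level> names.

```python
from io import StringIO

import pandas as pd

# Same content as mrh.csv from the question, inlined for a self-contained example.
TXT = '''Name,Height,Age
"",Metres,""
A,-1,25
B,95,-1'''

df = pd.read_csv(StringIO(TXT), header=[0, 1], na_values=[-1, ''])

# Each column label is a (level 0, level 1) tuple; empty header cells get
# generated "Unnamed: <column>_level_<level>" names instead of NaN.
print(df.columns.tolist())

# na_values was applied to the data rows: each -1 entry became NaN.
print(df.isna().sum().tolist())
```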

Is there a way to hide these missing header values, i.e. to remove the Unnamed: x_level_y labels or replace them with a meaningful value?

Desired output 1:

    Name  Height  Age
          Metres    

0   A     NaN     25.0
1   B     95.0    NaN

Desired output 2:

    Name  Height  Age
    -     Metres  - 

0   A     NaN     25.0
1   B     95.0    NaN

What do you mean by a meaningful value? Can you show the output you want to get? – Bharath M Shetty, commented Jan 2, 2018 at 11:25

jezrael · Accepted Answer · 2018-01-02 13:47:24Z

1

You can create a new MultiIndex and assign it to the columns:

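import pandas as pd

df = pd.read_csv('mrh.csv', header=[0,1], na_values=[-1,''])

a = df.columns.get_level_values(level=0)
# regex=True is needed on pandas >= 2.0, where str.replace matches literally by default
b = df.columns.get_level_values(level=1).str.replace('Un.*', '', regex=True)
df.columns = [a, b]
print(df)

Output:

  Name Height   Age
       Metres      
0    A    NaN  25.0
1    B   95.0   NaN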

Or, starting again from the freshly loaded df, replace them with -:

a = df.columns.get_level_values(level=0)
b = df.columns.get_level_values(level=1).str.replace('Un.*', '-', regex=True)
df.columns = [a, b]
print(df)

Output:

  Name Height   Age
     - Metres     -
0    A    NaN  25.0
1    B   95.0   NaN
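A minimal sketch of the same idea, assuming the CSV content from the question is inlined via io.StringIO. blank_unnamed is a hypothetical helper name; it rebuilds the columns with pd.MultiIndex.from_arrays and uses an anchored pattern so that only the generated Unnamed: <column>_level_<level> labels are replaced (the shorter 'Un.*' would also blank out a real header such as "Units").

```python
import re
from io import StringIO

import pandas as pd

# Contents of mrh.csv from the question, inlined so the sketch runs on its own.
TXT = '''Name,Height,Age
"",Metres,""
A,-1,25
B,95,-1'''

# Matches only the labels pandas generates for empty header cells,
# e.g. "Unnamed: 0_level_1"; a real header such as "Units" is left alone.
UNNAMED = re.compile(r'^Unnamed: \d+_level_\d+$')


def blank_unnamed(columns: pd.MultiIndex, fill: str = '') -> pd.MultiIndex:
    """Hypothetical helper: replace every generated Unnamed label with `fill`."""
    arrays = [
        [fill if UNNAMED.match(str(label)) else label
         for label in columns.get_level_values(i)]
        for i in range(columns.nlevels)
    ]
    return pd.MultiIndex.from_arrays(arrays, names=columns.names)


raw = pd.read_csv(StringIO(TXT), header=[0, 1], na_values=[-1, ''])

blank = raw.copy()
blank.columns = blank_unnamed(raw.columns)             # desired output 1
dashed = raw.copy()
dashed.columns = blank_unnamed(raw.columns, fill='-')  # desired output 2

print(blank)
print(dashed)
```

Because the fill value is a parameter, the same function produces both desired outputs from the question, and the original raw frame is left untouched.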

answered Jan 2, 2018 at 13:47

3 Comments

Bharath M Shetty Over a year ago

This is almost the same as mine.

jezrael Over a year ago

Hmm, are you angry? I don't think so, but I promise you: if you want, you can add this solution to your answer and I will remove mine.

Bharath M Shetty Over a year ago

It's OK, let it stay. Mine still points to a bug that needs to be fixed.

Bharath M Shetty · Accepted Answer · 2018-01-02 14:28:40Z

1

I don't think it's possible with read_csv alone; you can modify the column index after loading instead:

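from io import StringIO

import pandas as pd

txt = '''Name,Height,Age
"",Metres,""
A,-1,25
B,95,-1'''

df = pd.read_csv(StringIO(txt),header=[0,1],na_values=['-1',''])

# The assignment is done twice on purpose (see the note below the output).
# Note: recent pandas versions raise ValueError here, because the new level-1 values ('', 'Metres', '') are not unique.
df.columns = df.columns.set_levels(df.columns.get_level_values(level=1).str.replace('Un.*', '', regex=True), level=1)
df.columns = df.columns.set_levels(df.columns.get_level_values(level=1).str.replace('Un.*', '', regex=True), level=1)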

Output:

   Name Height   Age
        Metres      
0    A    NaN  25.0
1    B   95.0   NaN

For why df.columns has to be assigned twice, you can check here. It's still a mystery.

Edit: set_levels is still buggy; you can use this instead:

# also subject to the level-uniqueness check in recent pandas versions
df.columns = df.columns.set_levels(df.columns.levels[1].str.replace('Un.*', '', regex=True), level=1)
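Another option that avoids set_levels altogether is DataFrame.rename with a function and level=1. rename rebuilds the MultiIndex from the mapped labels, so mapping two labels to the same '' should not run into the uniqueness check that set_levels performs in recent pandas versions. A sketch, again assuming the CSV is inlined through io.StringIO:

```python
from io import StringIO

import pandas as pd

# Contents of mrh.csv from the question, inlined so the sketch runs on its own.
TXT = '''Name,Height,Age
"",Metres,""
A,-1,25
B,95,-1'''

df = pd.read_csv(StringIO(TXT), header=[0, 1], na_values=['-1', ''])

# With level=1 the function is applied only to the second header row; rename
# then builds a new MultiIndex, so several labels may map to the same ''.
df = df.rename(
    columns=lambda label: '' if str(label).startswith('Unnamed:') else label,
    level=1,
)
print(df)
```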

edited Jan 2, 2018 at 14:28

answered Jan 2, 2018 at 12:35

7 Comments

jezrael Over a year ago

It looks like a bug; the last line should be df.columns = df.columns.set_levels(df.columns.get_level_values(level=1),level=1)

Bharath M Shetty Over a year ago

@jezrael You can check the link; I posted a question about it. Let me wait until the bug is fixed. I'm waiting for an answer to my question.

jezrael Over a year ago

I really like the answer, but I have no idea how ;)

jezrael Over a year ago

But I think if your solution is buggy, it's better not to use it ;)

Bharath M Shetty Over a year ago

@jezrael How about we fix it? It's still a good function; it just needs the bug fixed.

Abhay Sharma · Accepted Answer · 2018-01-02 12:03:02Z

0

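import pandas as pd

# Rewrite mrh.csv so the empty cells of the second header row contain "-"
# (note: this overwrites the original file)
pd.read_csv("mrh.csv").fillna("-").to_csv("mrh.csv", index=False)
df1 = pd.read_csv("mrh.csv", header=[0,1], na_values=[-1,''])
print(df1)

Output:

  Name Height   Age
     - Metres     -
0    A    NaN  25.0
1    B   95.0   NaN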
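Since this approach overwrites mrh.csv, here is a sketch of the same fillna("-") round trip done through an in-memory io.StringIO buffer instead, so the original file is left unchanged. read_with_dash_header is a hypothetical helper name; it accepts anything pd.read_csv accepts, such as the path 'mrh.csv'.

```python
from io import StringIO

import pandas as pd

# Contents of mrh.csv from the question, inlined so the sketch runs on its own.
TXT = '''Name,Height,Age
"",Metres,""
A,-1,25
B,95,-1'''


def read_with_dash_header(source) -> pd.DataFrame:
    """Hypothetical helper: the fillna('-') round trip, kept in memory."""
    buf = StringIO()
    # First pass: the second header row is read as data, so its empty cells are NaN.
    pd.read_csv(source).fillna('-').to_csv(buf, index=False)
    buf.seek(0)  # rewind so the second read starts at the top of the buffer
    # Second pass: read the patched text with the two-row header.
    return pd.read_csv(buf, header=[0, 1], na_values=[-1, ''])


df1 = read_with_dash_header(StringIO(TXT))  # with a real file: read_with_dash_header('mrh.csv')
print(df1)
```

As in the original, fillna("-") also fills genuinely empty data cells, which would then come back as "-" rather than NaN.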

answered Jan 2, 2018 at 12:03

2 Comments

Krzysztof Słowiński Over a year ago

If possible I would like to avoid modifying the original file.

Rahul Gupta Over a year ago

While this code snippet may be the solution, including an explanation really helps to improve the quality of your post. Remember that you are answering the question for readers in the future, and those people might not know the reasons for your code suggestion.
