pandas read dataframe multi-header values

Question

I have this dataframe with multiple headers

name,   00590BL,    01090BL,    01100MS,    02200MS
lat,    613297, 626278, 626323, 616720
long,   5185127,    5188418,    5188431,    5181393
elv,    1833,   1915,   1915,   1499
1956-01-01, 1,  2,  2,  -2
1956-01-02, 2,  3,  3,  -1
1956-01-03, 3,  4,  4,  0
1956-01-04, 4,  5,  5,  1
1956-01-05, 5,  6,  6,  2

I read this as

dfr      =  pd.read_csv(f_name,
                            skiprows     = 0,
                            header       = [0,1,2,3], 
                            index_col    = 0,
                            parse_dates  = True
                            )

I would like to extract the value related the rows named 'lat' and 'long'. A easy way, could be to read the dataframe in two step. In other words, the idea could be have two dataframes. I do not like this because it is not very elegant and it not seems to take advantage of pandas potentiality. I believe that I could use some feature related to multi-index.

what do you think?

dfr.columns.get_level_values('lat') and then the same for long? — ouroboros1
– ouroboros1, Commented Jan 21, 2023 at 23:52
Don't forget to use skipinitialspace=True to get a clean MultiIndex whithout any leading whitespaces. And don't forget to convert lat and long (and elv) to numeric. — Corralien
– Corralien, Commented Jan 22, 2023 at 0:53

Corralien · Accepted Answer · 2023-01-22 01:04:16Z

You can use get_level_values:

dfr = pd.read_csv(f_name, skiprows=0, header=[0, 1, 2, 3], index_col=0, 
                  parse_dates=[0], skipinitialspace=True)

lat = df.columns.get_level_values('lat').astype(int)
long = df.columns.get_level_values('long').astype(int)
elv = df.columns.get_level_values('elv').astype(int)

Output:

>>> lat.to_list()
[613297, 626278, 626323, 616720]

>>> long.to_list()
[5185127, 5188418, 5188431, 5181393]

>>> elv.to_list()
[1833, 1915, 1915, 1499]

If you only need the first row of column header, use droplevel

df = dfr.droplevel(['lat', 'long', 'elv'], axis=1).rename_axis(columns=None))
print(df)

# Output
            00590BL  01090BL  01100MS  02200MS
1956-01-01        1        2        2       -2
1956-01-02        2        3        3       -1
1956-01-03        3        4        4        0
1956-01-04        4        5        5        1
1956-01-05        5        6        6        2

Carlos · Accepted Answer · 2023-01-21 23:45:10Z

0

One way to do this is to use the .loc method to select the rows by their label. For example, you could use the following code to extract the 'lat' values:

lat_values = dfr.loc['lat']

And similarly, you could use the following code to extract the 'long' values:

long_values = dfr.loc['long']

Alternatively, you can use the .xs method to extract the values of the desired level.

lat_values = dfr.xs('lat', level=1, axis=0) long_values = dfr.xs('long', level=1, axis=0)

Both these approach will extract the values for 'lat' and 'long' rows from the dataframe and will allow you to access it as one dataframe with one index.

answered Jan 21, 2023 at 23:45

Carlos

313 bronze badges

2 Comments

diedro Over a year ago

Maybe I am wrong but It seems not working. This is the error "*** KeyError: 'lat'".

Carlos Over a year ago

I apologize for the confusion. If you are getting a "KeyError: 'lat'" it means that the row with the label 'lat' does not exist in the DataFrame. The .loc method is used to select rows by label, so if the label is not present in the DataFrame, an error will be raised. It's possible that the rows you want to extract are not labeled 'lat' and 'long' but rather they are in the subcolumn level. In that case, you can access them using the .loc method on the subcolumn level of the DataFrame.

Collectives™ on Stack Overflow

pandas read dataframe multi-header values

2 Answers 2

Comments

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related