1

I have this dataframe with multiple headers

name,   00590BL,    01090BL,    01100MS,    02200MS
lat,    613297, 626278, 626323, 616720
long,   5185127,    5188418,    5188431,    5181393
elv,    1833,   1915,   1915,   1499
1956-01-01, 1,  2,  2,  -2
1956-01-02, 2,  3,  3,  -1
1956-01-03, 3,  4,  4,  0
1956-01-04, 4,  5,  5,  1
1956-01-05, 5,  6,  6,  2

I read this as

dfr      =  pd.read_csv(f_name,
                            skiprows     = 0,
                            header       = [0,1,2,3], 
                            index_col    = 0,
                            parse_dates  = True
                            )

I would like to extract the value related the rows named 'lat' and 'long'. A easy way, could be to read the dataframe in two step. In other words, the idea could be have two dataframes. I do not like this because it is not very elegant and it not seems to take advantage of pandas potentiality. I believe that I could use some feature related to multi-index.

what do you think?

2
  • dfr.columns.get_level_values('lat') and then the same for long? Commented Jan 21, 2023 at 23:52
  • Don't forget to use skipinitialspace=True to get a clean MultiIndex whithout any leading whitespaces. And don't forget to convert lat and long (and elv) to numeric. Commented Jan 22, 2023 at 0:53

2 Answers 2

1

You can use get_level_values:

dfr = pd.read_csv(f_name, skiprows=0, header=[0, 1, 2, 3], index_col=0, 
                  parse_dates=[0], skipinitialspace=True)

lat = df.columns.get_level_values('lat').astype(int)
long = df.columns.get_level_values('long').astype(int)
elv = df.columns.get_level_values('elv').astype(int)

Output:

>>> lat.to_list()
[613297, 626278, 626323, 616720]

>>> long.to_list()
[5185127, 5188418, 5188431, 5181393]

>>> elv.to_list()
[1833, 1915, 1915, 1499]

If you only need the first row of column header, use droplevel

df = dfr.droplevel(['lat', 'long', 'elv'], axis=1).rename_axis(columns=None))
print(df)

# Output
            00590BL  01090BL  01100MS  02200MS
1956-01-01        1        2        2       -2
1956-01-02        2        3        3       -1
1956-01-03        3        4        4        0
1956-01-04        4        5        5        1
1956-01-05        5        6        6        2
Sign up to request clarification or add additional context in comments.

Comments

0

One way to do this is to use the .loc method to select the rows by their label. For example, you could use the following code to extract the 'lat' values:

lat_values = dfr.loc['lat']

And similarly, you could use the following code to extract the 'long' values:

long_values = dfr.loc['long']

Alternatively, you can use the .xs method to extract the values of the desired level.

lat_values = dfr.xs('lat', level=1, axis=0) long_values = dfr.xs('long', level=1, axis=0)

Both these approach will extract the values for 'lat' and 'long' rows from the dataframe and will allow you to access it as one dataframe with one index.

2 Comments

Maybe I am wrong but It seems not working. This is the error "*** KeyError: 'lat'".
I apologize for the confusion. If you are getting a "KeyError: 'lat'" it means that the row with the label 'lat' does not exist in the DataFrame. The .loc method is used to select rows by label, so if the label is not present in the DataFrame, an error will be raised. It's possible that the rows you want to extract are not labeled 'lat' and 'long' but rather they are in the subcolumn level. In that case, you can access them using the .loc method on the subcolumn level of the DataFrame.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.