Converting multi-index dataframe to Xarray dataset either loses annual sequence or gives an error

Question

Firstly, apologies: I am unable to reproduce this error with code, so I will try to describe it as best I can using screenshots of the data and errors.

I have a large dataframe indexed by 'Year' and 'Season', with columns for latitude, longitude, Rainfall and a few others. It looks like this:

This is organised to respect the annual sequence 'Winter', 'Spring', 'Summer', 'Autumn' (numbers 1 to 4 in the Season column), and I need to keep this sequence after converting to an xarray Dataset too. But if I try to convert it straight to a Dataset:

future = future.to_xarray()

I get the following error:
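A non-unique index is a common reason for this conversion to fail, and the next paragraph suggests that is what happened here. The minimal sketch below uses made-up values (and assumes a reasonably recent pandas and xarray) to show how to check for repeated (Year, Season) pairs before calling to_xarray:

```python
import pandas as pd

# Hypothetical rows: several grid cells share the same (Year, Season) pair
df = pd.DataFrame({
    "Year": [1970, 1970, 1970, 1970],
    "Season": [1, 1, 2, 2],
    "latitude": [51.12, 51.37, 51.12, 51.37],
    "longitude": [-10.88, -10.88, -10.88, -10.88],
    "tg": [4.2, 4.5, 8.9, 9.1],
}).set_index(["Year", "Season"])

# to_xarray needs every index combination to occur at most once
print(df.index.is_unique)                    # False
print(df[df.index.duplicated(keep=False)])   # all rows whose index value repeats

try:
    df.to_xarray()
    converted = True
except ValueError as err:                    # a non-unique MultiIndex is rejected
    converted = False
    print("to_xarray failed:", err)
```

If the duplicated rows differ only by location, that is a sign latitude and longitude have to become part of the index (or coordinates) rather than plain columns.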

So it is clear that I need to index by unique identifiers. I tried using just lat and lon, but this gives the same error (as there are duplicates). So I reset the index and then set a new index of lat, lon and time, like so:

future = future.reset_index()
future.head()

future.set_index(['latitude', 'longitude', 'time'], inplace=True)
future.head()

This then lets the conversion run without an error:

future = future.to_xarray()

The problem is that the result has now lost its annual sequencing. You can see from the Season variable in the dataset that it starts at '1', '1', '1' for the first 3 months of the year but then jumps to '3', '3', '3', meaning we go from winter to summer and skip spring.
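The jump in Season is consistent with how to_xarray treats a MultiIndex: each index level becomes its own dimension with sorted coordinate values, so the rows are laid out along a chronological time axis rather than in their original order. A small sketch with made-up values illustrates this:

```python
import pandas as pd

# Hypothetical rows in a deliberate, non-chronological order
df = pd.DataFrame({
    "latitude": [51.12, 51.12, 51.12],
    "longitude": [-10.88, -10.88, -10.88],
    "time": pd.to_datetime(["2000-06-30", "1955-03-31", "1976-09-30"]),
    "Season": [3, 2, 4],
})

# Same approach as in the question: lat/lon/time become a MultiIndex
ds = df.set_index(["latitude", "longitude", "time"]).to_xarray()

# Each level is now a dimension with a sorted coordinate, so the time axis
# runs 1955, 1976, 2000 and Season is reordered to follow it
print(ds["time"].values)
print(ds["Season"].values.ravel())   # [2 4 3], not the original [3 2 4]
```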

This only happens after re-indexing the dataframe, but I can't convert it to a Dataset without re-indexing, and I can't seem to re-index without disrupting the annual sequence. Is there some way to fix this?

I hope this is clear and the error is illustrated well enough for someone to be able to help!

EDIT: I think the issue is that when it indexes by date, it automatically orders the dates chronologically (e.g. 1952 follows 1951), but I don't want this. I want it to maintain the sequence of the initial dataframe, which is organised seasonally: it could have a spring from 1955 followed by a summer from 2000 followed by an autumn from 1976. I need to retain this sequence.
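One way to keep the original row order, offered as an alternative sketch rather than the method of the answer below, is to give the Dataset a single positional dimension (called sample here, a hypothetical name) and attach Year, Season, latitude and longitude as non-index coordinates. A plain, single-level index is not expanded into a grid by to_xarray, so the rows stay in their existing order. The values are made up:

```python
import pandas as pd

# Hypothetical data, indexed by (Year, Season) as in the question,
# already in the required non-chronological seasonal sequence
df = pd.DataFrame({
    "Year": [1955, 2000, 1976],
    "Season": [2, 3, 4],
    "latitude": [51.12, 51.12, 51.12],
    "longitude": [-10.88, -10.88, -10.88],
    "tg": [5.1, 15.3, 10.2],
}).set_index(["Year", "Season"])

# reset_index moves Year/Season back to columns and adds a fresh 0..n-1
# index that simply counts the rows in their current order
flat = df.reset_index().rename_axis("sample")

# A single (non-Multi) index becomes one dimension; no sorting or gridding
ds = flat.to_xarray()

# Attach the time/location columns to every sample as coordinates
ds = ds.set_coords(["Year", "Season", "latitude", "longitude"])
print(ds)
```

The trade-off is that tg is then 1-D along sample instead of gridded over (latitude, longitude), so spatial selection has to go through the coordinates, for example with ds.where(ds.latitude == 51.12, drop=True).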

EDIT 2:

This is what the dataset looks like when I set 'Year' as the index, or just keep the generic index. However, I need the tg variable to have lat/lon associated with it, so the dataset looks like this:

<xarray.Dataset>
Dimensions:    (Year: 190080)
Coordinates:
  * Year       (Year) int64 1970 1970 1970 1970 1970 1970 1970 1970 1970 ...
Data variables:
    Season     (Year) object '1' '1' '2' '2' '2' '3' '3' '3' '4' '4' '4' '1' ...
    latitude   (Year) float64 51.12 51.12 51.12 51.12 51.12 51.12 51.12 ...
    longitude  (Year) float64 -10.88 -10.88 -10.88 -10.88 -10.88 -10.88 ...
    seasdif    (Year) float32 -0.79192877 -0.79192877 -0.55932236 ...
    tg         (Year, latitude, longitude) float32 nan nan nan nan nan nan nan nan nan nan nan ...
    time       (Year) datetime64[ns] 1970-01-31 1970-02-28 1970-03-31 ...

I am not familiar with to_xarray, but doesn't it sort your data according to your index, since it uses it as coordinates? If so, that could explain why the first printed data are not sorted the way you expect.
– ysearka, Commented Aug 20, 2018 at 12:03
Yes, that is correct. But I can't index the data by just latitude and longitude or I hit an error (same as above), so I am not sure how to overcome this.
– Pad, Commented Aug 20, 2018 at 13:47
Is there a problem with using both temporal and geographical information as coordinates? future.set_index(['Year', 'Season', 'latitude', 'longitude', 'time'])
– ysearka, Commented Aug 20, 2018 at 14:21
What I tried was adding a generic index and converting it directly to xarray. It converted without an error for my minimal example, but I am unable to drop the generic index column from the xarray Dataset. This did not change my order.
– Interested_Programmer, Commented Aug 24, 2018 at 14:02

Interested_Programmer · Accepted Answer · 2018-08-28 14:34:22Z


Tell me if this works for you. I have added an extra index column and used it to sort at the end.

import pandas as pd
import xarray as xr
import numpy as np

df = pd.DataFrame({'Year': [1951, 1951, 1951, 1951],
                   'Season': [1, 1, 1, 3],
                   'lat': [51, 51, 51, 51],
                   'long': [10.8, 10.8, 10.6, 10.6],
                   'time': ['1950-12-31', '1951-01-31', '1951-02-28', '1950-12-31']})

I copied the default index into a separate column, 'Order', and then included it in set_index. This is because I could only sort by an index or a 1-D column, and we had three coordinates.

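# Turn the default 0..n-1 row index into a column named 'Order'
df.reset_index(level=0, inplace=True)
df = df.rename(columns={'index': 'Order'})
df['time'] = pd.to_datetime(df['time'])
# Including 'Order' makes every index entry unique, so to_xarray accepts it
df.set_index(['lat', 'long', 'time', 'Order'], inplace=True)
df.head()
df = df.to_xarray()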

This should preserve the order and keep lat, lon and time associated with tg (I don't have tg in my df, though).

# sortby returns a new Dataset rather than sorting in place, so assign the result
df2 = df.sortby('Order')

You could also drop the 'Order' coordinate, though I am not sure whether it will alter your order (it does not alter mine):

df2 = df2.drop_vars('Order')  # on xarray versions before 0.14, use df2.drop('Order')
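One caveat, and a plausible explanation for the memory errors reported in the comments: with a MultiIndex, to_xarray allocates a dense array over the Cartesian product of the unique values of every level, and adding 'Order' as a level multiplies that size by the number of rows. The helper below (dense_size is a hypothetical name, not part of pandas or xarray) estimates the number of cells per variable before converting:

```python
import numpy as np
import pandas as pd


def dense_size(df: pd.DataFrame) -> int:
    """Cells per variable that to_xarray would allocate for df's index (hypothetical helper)."""
    idx = df.index
    if isinstance(idx, pd.MultiIndex):
        # Ignore level values that no longer appear in any row
        idx = idx.remove_unused_levels()
        # One dimension per level, so the grid is the product of level sizes
        return int(np.prod([len(level) for level in idx.levels]))
    # A single-level index maps to one dimension of the same length
    return len(idx)


# The four-row frame from the answer, indexed the same way
toy = pd.DataFrame({'Year': [1951, 1951, 1951, 1951],
                    'Season': [1, 1, 1, 3],
                    'lat': [51, 51, 51, 51],
                    'long': [10.8, 10.8, 10.6, 10.6],
                    'time': ['1950-12-31', '1951-01-31', '1951-02-28', '1950-12-31']})
toy['time'] = pd.to_datetime(toy['time'])
toy = (toy.reset_index()
          .rename(columns={'index': 'Order'})
          .set_index(['lat', 'long', 'time', 'Order']))

print(len(toy), dense_size(toy))   # 4 24
```

For the real data, a result far larger than the row count suggests the grid is mostly empty; xarray.Dataset.from_dataframe also accepts sparse=True (it requires the sparse package), which may be worth trying in that case.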

edited Aug 28, 2018 at 14:34

answered Aug 28, 2018 at 14:10

Interested_Programmer



2 Comments

Pad Over a year ago

Thank you! When I try and convert to xarray I'm hitting a memory error, strange as it worked very quickly before... I will try and make it quicker somehow...

Pad Over a year ago

Sorry, I am still hitting memory errors. I am trying to split it into smaller pieces!
