
First, apologies: I am unable to reproduce this error with a code example, so I will describe it as best I can using screenshots of the data and errors.

I've got a large dataframe indexed by 'Year' and 'Season', with values for latitude, longitude, Rainfall and some others, which looks like this: [screenshot]

This is organised to respect the annual sequence of 'Winter', 'Spring', 'Summer', 'Autumn' (numbers 1:4 in the Season column), and I need to keep this sequence after conversion to an xarray Dataset too. But if I try to convert straight to a Dataset:

future = future.to_xarray()

I get the following error: [screenshot]

So it is clear I need to re-index by unique identifiers. I tried using just lat and lon, but this gives the same error (as there are duplicates). Resetting the index and then re-indexing by lat, lon and time, like so:

future = future.reset_index()
future.head()

[screenshot]

future.set_index(['latitude', 'longitude', 'time'], inplace=True)
future.head()

[screenshot]

allows for the

future = future.to_xarray()

code to work:

[screenshot]

The problem is that this has now lost its annual sequencing. You can see from the Season variable in the dataset that it starts at '1', '1', '1' for the first 3 months of the year but then jumps to '3', '3', '3', meaning we're going from winter to summer and skipping spring.

This is only the case after re-indexing the dataframe, but I can't convert it to a Dataset without re-indexing, and I can't seem to re-index without disrupting the annual sequence. Is there some way to fix this?

I hope this is clear and the error is illustrated enough for someone to be able to help!

EDIT: I think the issue here is that when it indexes by date it automatically orders the dates chronologically (e.g. 1952 follows 1951 etc.). I don't want this. I want it to maintain the sequence in the initial dataframe, which is organised seasonally but could have a spring from 1955 followed by a summer from 2000 followed by an autumn from 1976. I need to retain this sequence.
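A minimal sketch of the behaviour described in this edit, using made-up values (the column names follow the question; the dates and Season codes are illustrative assumptions, not the real data). When a MultiIndex is converted, each level becomes a sorted coordinate, so the row order of the dataframe is not retained:

```python
import pandas as pd
import xarray as xr  # DataFrame.to_xarray needs xarray installed

# Rows are deliberately NOT in chronological order (illustrative values only).
df = pd.DataFrame({
    "latitude": [51.12, 51.12, 51.12],
    "longitude": [-10.88, -10.88, -10.88],
    "time": pd.to_datetime(["2000-07-31", "1955-04-30", "1976-10-31"]),
    "Season": [3, 2, 4],
})

# Same re-indexing as in the question, then conversion.
ds = df.set_index(["latitude", "longitude", "time"]).to_xarray()

# Flatten the (latitude, longitude, time) array to read Season along time.
seasons = ds["Season"].values.ravel().tolist()
time_is_sorted = ds.indexes["time"].is_monotonic_increasing
print(seasons, time_is_sorted)
```

The rows were entered as seasons 3, 2, 4, but they come back ordered by date (2, 4, 3), which matches the jump seen in the Season variable above.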

EDIT 2:

So the dataset looks like this when I set 'Year' as the index, or just keep the index generic: [screenshot]. But I need the tg variable to have lat/lon associated with it, so that the dataset looks like this:

<xarray.Dataset>
Dimensions:    (Year: 190080)
Coordinates:
  * Year       (Year) int64 1970 1970 1970 1970 1970 1970 1970 1970 1970 ...
Data variables:
    Season     (Year) object '1' '1' '2' '2' '2' '3' '3' '3' '4' '4' '4' '1' ...
    latitude   (Year) float64 51.12 51.12 51.12 51.12 51.12 51.12 51.12 ...
    longitude  (Year) float64 -10.88 -10.88 -10.88 -10.88 -10.88 -10.88 ...
    seasdif    (Year) float32 -0.79192877 -0.79192877 -0.55932236 ...
    tg         (Year, latitude, longitude) float32 nan nan nan nan nan nan nan nan nan nan nan ...
    time       (Year) datetime64[ns] 1970-01-31 1970-02-28 1970-03-31 ...
  • I am not familiar with to_xarray, doesn't it sort your data according to your index since it uses it as coordinates? If so, that could explain that the first printed data won't be sorted the way you expect them to be. Commented Aug 20, 2018 at 12:03
  • Yes, that is correct. But I can't index the data by just latitude and longitude or I hit an error (same as above), so I am not sure how to overcome this. Commented Aug 20, 2018 at 13:47
  • Is there a problem in using both temporal and geographical information as coordinates? future.set_index(['Year','season','latitude', 'longitude', 'time']) Commented Aug 20, 2018 at 14:21
  • What I tried was adding a generic index and converting it directly to xarray. It converted without an error for my minimal example. But I am unable to drop the generic index column from the xarray object. This did not change my order. Commented Aug 24, 2018 at 14:02
  • After creating the xarray Dataset, use df.set_coords(['Year','Season']) Commented Aug 24, 2018 at 18:10
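The idea in the last two comments (keep a generic index, convert, then call set_coords) can be sketched as follows. This is only an illustration on toy values, not the original data, and it assumes a single "index" dimension is acceptable, with latitude and longitude attached to tg as coordinates rather than as separate dimensions:

```python
import pandas as pd
import xarray as xr  # DataFrame.to_xarray needs xarray installed

# Toy rows in a deliberately non-chronological order (illustrative values only).
df = pd.DataFrame({
    "Year": [1955, 2000, 1976, 1951],
    "Season": [2, 3, 4, 1],
    "latitude": [51.12, 51.12, 51.12, 51.12],
    "longitude": [-10.88, -10.88, -10.88, -10.88],
    "tg": [0.5, 1.5, -0.3, 0.1],
})

# With the default RangeIndex there is one dimension named "index",
# and the rows stay in their original order.
ds = df.to_xarray()

# Promote the descriptive columns from data variables to coordinates.
ds = ds.set_coords(["Year", "Season", "latitude", "longitude"])

season_order = ds["Season"].values.tolist()
print(ds)
```

Because nothing is used as a sorted index level, the Season sequence comes out exactly as it went in, and tg carries latitude and longitude as coordinates along the "index" dimension.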

1 Answer

Tell me if this works for you. I have added an extra index column and used it to sort at the end.

import pandas as pd
import xarray as xr
import numpy as np

df = pd.DataFrame({'Year': [1951, 1951, 1951, 1951],
                   'Season': [1, 1, 1, 3],
                   'lat': [51, 51, 51, 51],
                   'long': [10.8, 10.8, 10.6, 10.6],
                   'time': ['1950-12-31', '1951-01-31', '1951-02-28', '1950-12-31']})

I turned the index into a separate column, 'Order', and then included it in set_index. This is because I could only sort by an index or a 1-D column, and we have three coordinates.

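# Turn the default integer index into a column and call it 'Order'
df.reset_index(level=0, inplace=True)
df = df.rename(columns={'index': 'Order'})
df['time'] = pd.to_datetime(df['time'])
# Index by the three coordinates plus 'Order' so that every row is unique
df.set_index(['lat', 'long', 'time','Order'], inplace=True)
df.head()
df = df.to_xarray()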

This should preserve the order and have lat, lon and time associated with tg (I don't have it in my df, though).

# sortby returns a new Dataset, so keep the result
df2 = df.sortby('Order')

You could also drop the 'Order' column, though I am not sure if it will alter your order. (It does not alter mine.)

df2.drop('Order')

df


2 Comments

Thank you! When I try to convert to xarray I'm hitting a memory error, which is strange as it worked very quickly before... I will try to make it quicker somehow...
Sorry, I am still hitting memory errors. I am trying to split it into smaller pieces!
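One possible reason for the memory error, offered as an assumption rather than a diagnosis: to_xarray on a MultiIndex builds a dense array over every combination of the index levels, and 'Order' is unique per row. A rough sketch that counts the resulting cells for a small dataframe shaped like the one in the answer (the 'Order' column is written out by hand here for brevity):

```python
import pandas as pd

# Same coordinate values as the answer's example, with 'Order' added explicitly.
df = pd.DataFrame({
    "lat": [51, 51, 51, 51],
    "long": [10.8, 10.8, 10.6, 10.6],
    "time": pd.to_datetime(["1950-12-31", "1951-01-31", "1951-02-28", "1950-12-31"]),
    "Order": [0, 1, 2, 3],
})

index_cols = ["lat", "long", "time", "Order"]

# A dense array has one cell per combination of unique level values.
dense_cells = 1
for col in index_cols:
    dense_cells *= df[col].nunique()

print(f"{len(df)} rows -> {dense_cells} cells per variable")
```

Here 4 rows already expand to 1 x 2 x 3 x 4 = 24 cells per variable. With the 190080 rows shown in the question, the 'Order' level alone would contribute a factor of 190080, multiplied by the number of unique latitudes, longitudes and times.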
