I want to create a new DataFrame covering a specified number of years, built from randomly chosen seasons of previous weather data.

Code to illustrate the problem:

import pandas as pd
import numpy as np

dates = pd.date_range('20070101', periods=3200)
df = pd.DataFrame(data=np.random.randint(0, 100, (3200, 1)), columns=['A'])
df['date'] = dates
df = df[['date', 'A']]

Define a function that maps each row's date to a season label ('1' = winter, '2' = spring, '3' = summer, '4' = autumn)

def get_season(row):
    month = row['date'].month
    if 3 <= month <= 5:
        return '2'  # spring
    elif 6 <= month <= 8:
        return '3'  # summer
    elif 9 <= month <= 11:
        return '4'  # autumn
    else:
        return '1'  # winter (December, January, February)

Apply the function

df['Season'] = df.apply(get_season, axis=1)
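
As an aside, the same labels can probably be computed without a row-wise apply. The following is a minimal sketch, assuming the same sample data as above; the arithmetic on the month number is my own shortcut, not part of the original code:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20070101', periods=3200)
df = pd.DataFrame({'date': dates, 'A': np.random.randint(0, 100, 3200)})

# month % 12 maps December to 0, so integer division by 3 gives
# 0 for Dec/Jan/Feb, 1 for Mar-May, 2 for Jun-Aug, 3 for Sep-Nov.
# Adding 1 and casting to str reproduces the '1'..'4' labels of get_season.
df['Season'] = (df['date'].dt.month % 12 // 3 + 1).astype(str)
```

This works on the whole column at once, which tends to be faster than calling a Python function per row.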

Create a 'Year' column for indexing

df['Year'] = df['date'].dt.year

Multi-index by Year and Season

df = df.set_index(['Year', 'Season'], inplace=False)

Create new dataframes based on season to select from

winters = df.query('Season == "1"')
springs = df.query('Season == "2"')
summers = df.query('Season == "3"')
autumns = df.query('Season == "4"')

I now want to create a new DataFrame which takes a random winter from the winters dataframe, followed by a random spring from springs, a random summer from summers and a random autumn from autumns, and repeats this for a specified number of years (e.g. 100), but I can't see how to do this.

EDIT:

Duplicate seasons are allowed (it should sample seasons randomly), and the first spring does not have to belong to the same year as the first winter; this doesn't matter.

EDIT 2: Solution using all seasonal dataframes:

outputyears = 100  # number of years to generate
years = df['date'].dt.year.unique()
dfs = []
for i in range(outputyears):
    # np.random.choice(years) draws a single year; each season is drawn independently
    dfs.append(winters.query("Year == %d" % np.random.choice(years)))
    dfs.append(springs.query("Year == %d" % np.random.choice(years)))
    dfs.append(summers.query("Year == %d" % np.random.choice(years)))
    dfs.append(autumns.query("Year == %d" % np.random.choice(years)))

rnd = pd.concat(dfs)
  • It's not clear: are duplicates allowed? Should the first spring belong to the same year as the first winter? Commented May 2, 2016 at 13:21
  • Apologies - duplicates are allowed (it should sample seasons randomly), and no, the first spring does not need to belong to the same year as the first winter; this doesn't matter. Commented May 2, 2016 at 13:36

1 Answer

It's most probably not the best way to do it, but you can do it this way:

years = df['date'].dt.year.unique()

dfs = []
for i in range(100):
    # np.random.choice(years) returns a single year, drawn afresh for each season
    dfs.append(df.query("Year == %d and Season == '1'" % np.random.choice(years)))
    dfs.append(df.query("Year == %d and Season == '2'" % np.random.choice(years)))
    dfs.append(df.query("Year == %d and Season == '3'" % np.random.choice(years)))
    dfs.append(df.query("Year == %d and Season == '4'" % np.random.choice(years)))

rnd = pd.concat(dfs)
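
If some years lack a season, the loop above may append empty frames for them. A possible variant, as a rough sketch only: split the frame once with groupby and draw each season only from the years in which it actually occurs. The function name sample_seasons and its seed parameter are hypothetical, and the setup lines rebuild the question's sample data (with a vectorized equivalent of get_season) so that the block runs on its own:

```python
import numpy as np
import pandas as pd

def sample_seasons(df, n_years, seed=None):
    """Concatenate n_years of randomly drawn seasons in the order 1, 2, 3, 4.

    Assumes df is indexed by a (Year, Season) MultiIndex, as in the question.
    """
    rng = np.random.default_rng(seed)
    seasons = ['1', '2', '3', '4']
    # Split once: only (year, season) combinations present in df become keys
    blocks = {key: grp for key, grp in df.groupby(level=['Year', 'Season'])}
    years_by_season = {s: [y for (y, t) in blocks if t == s] for s in seasons}
    pieces = []
    for _ in range(n_years):
        for s in seasons:
            candidates = years_by_season[s]
            # Draw with replacement; raises ValueError if a season never occurs
            year = candidates[rng.integers(len(candidates))]
            pieces.append(blocks[(year, s)])
    return pd.concat(pieces)

# Setup mirroring the question's sample data
dates = pd.date_range('20070101', periods=3200)
df = pd.DataFrame({'date': dates, 'A': np.random.randint(0, 100, 3200)})
df['Season'] = (df['date'].dt.month % 12 // 3 + 1).astype(str)  # same labels as get_season
df['Year'] = df['date'].dt.year
df = df.set_index(['Year', 'Season'])

rnd = sample_seasons(df, n_years=100, seed=0)
```

Passing a seed makes the draw repeatable. Note that, as in the question, season '1' of a given year groups January, February and December of that same calendar year.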
3 Comments

This does work (thanks!) for this simplified version of the problem - but it doesn't select seasons from the four separate seasonal dataframes, which is what I want to do...
Ah - I just need to change the df.query to winters.query etc. Thanks!
I've been trying to apply this method to a dataframe which is missing some seasons (which I need to do) and I'm hitting some errors (outlined here: stackoverflow.com/questions/37140439/…). You might be able to help if you have time!
