2

I have a dataframe and these are the first 5 index, there are several rows with different datapoint for a date and then it goes to the next day

DatetimeIndex(['2014-01-01', '2014-01-01', '2014-01-01', '2014-01-01',
               '2014-01-01'],
              dtype='datetime64[ns]', name='DayStartedOn', freq=None)

and this is the current column dtypes

country                  object
type                     object
name                     object
injection               float64
withdrawal              float64
cy_month              period[M]

I wish to add a column with calendar year month, and 2 columns with different fiscal years and months. better to separate year and month in different columns like: calendar year, calendar month, fiscal year, fiscal month. The objective is to keep these column values when I perform regroup or resample with other columns

I achieved above cy_month by

df['cy_month']=df.index.to_period('M')

even I don't feel very comfortable with this, as I want the period, not the monthend

I tried to add these 2 columns for calendar year:

pd.Period(df_storage_clean.index.year, freq='A-DEC')  

for another fiscal year:

pd.Period(df_storage_clean.index.year, freq='A-SEP') 

but had Traceback:

ValueError: Value must be Period, string, integer, or datetime

So I started to NOT using pandas by loop row by row and add to a list,

lst_period_cy=[]
for y in lst_cy:
    period_cy=pd.Period(y, freq='A-DEC')
    lst_period_cy.append(period_cy)

then convert the list to a Series or df and add it back to the df

but I suppose it's not efficient (150k rows data) so haven't continued

1 Answer 1

1

Just in case you haven't found a solution yet ...

You could do the following:

df.reset_index(drop=False, inplace=True)
df['cal_year_month'] = df.DayStartedOn.dt.month
df['cal_year'] = df.DayStartedOn.dt.year
df['fisc_year'] = df.DayStartedOn.apply(pd.Period, freq='A-SEP')
df.set_index('DayStartedOn', drop=True, inplace=True)

My assumption is that, as in your example, the index is named DayStartedOn. If that's not the case then the code has to be adjusted accordingly.

Sign up to request clarification or add additional context in comments.

1 Comment

Thanks a lot, df['fisc_year'] = df.DayStartedOn.apply(pd.Period, freq='A-SEP') is definitely an interesting way.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.