I have an array of time series that are generated based on a fund_id:

def get_adj_nav(self, fund_id):
    df_nav = read_frame(
        super(__class__, self).filter(fund__id=fund_id, nav__gt=0).exclude(fund__account_class=0).order_by(
            'valuation_period_end_date'), coerce_float=True,
        fieldnames=['income_payable', 'valuation_period_end_date', 'nav', 'outstanding_shares_par'],
        index_col='valuation_period_end_date')
    df_dvd, skip = self.get_dvd(fund_id=fund_id)
    df_nav_adj = calculate_adjusted_prices(
        df_nav.join(df_dvd).fillna(0).rename_axis({'payout_per_share': 'dividend'}, axis=1), column='nav')
    return df_nav_adj

def json_total_return_table(request, fund_account_id):
    ts_list = []
    for fund_id in Fund.objects.get_fund_series(fund_account_id=fund_account_id):
        if NAV.objects.filter(fund__id=fund_id, income_payable__lt=0).exists():
            ts = NAV.objects.get_adj_nav(fund_id)['adj_nav']
            ts.name = Fund.objects.get(id=fund_id).account_class_description
            ts_list.append(ts.copy())
            print(ts)
        df_adj_nav = pd.concat(ts_list, axis=1) # ====> Throws error
        cols_to_datetime(df_adj_nav, 'index')
        df_adj_nav = ffn.core.calc_stats(df_adj_nav.dropna()).to_csv(sep=',')

Here is an example of what one of the time series looks like:

valuation_period_end_date
2013-09-03    17.234000
2013-09-04    17.277000
2013-09-05    17.363000
2013-09-06    17.326900
2013-09-09    17.400800
2013-09-10    17.473000
2013-09-11    17.486800
2013-09-12    17.371600
....
Name: CLASS I, Length: 984, dtype: float64

Another timeseries:

valuation_period_end_date
2013-09-03    17.564700
2013-09-04    17.608500
2013-09-05    17.696100
2013-09-06    17.659300
2013-09-09    17.734700
2013-09-10    17.808300
2013-09-11    17.823100
2013-09-12    17.704900
....
Name: CLASS F, Length: 984, dtype: float64

The lengths of the time series differ, and I am wondering whether that is the reason for the error I am getting: cannot reindex from a duplicate axis. I am new to pandas, so any advice would be appreciated.

Thanks

EDIT: Also the indexes aren't supposed to be unique.

1 Answer

Perhaps something like this would work. I've added the fund_id to the dataframe and set its index to valuation_period_end_date and fund_id.

# Replaces only the fourth line above the one that raises the error.
ts = (
    NAV.objects.get_adj_nav(fund_id)['adj_nav']
    .to_frame()
    .assign(fund_id=fund_id)
    .reset_index()
    .set_index(['valuation_period_end_date', 'fund_id']))

Then concatenate along axis=0, group by the date and fund_id (assuming there is only one unique value per date and fund_id, you can take the first value), and unstack fund_id to pivot it into columns:

df_adj_nav = (
    pd.concat(ts_list, axis=0)
    .groupby(['valuation_period_end_date', 'fund_id'])
    .first()  # already a DataFrame here, so no to_frame() call is needed
    .unstack('fund_id'))
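
For anyone who wants to try the idea without the Django models, here is a minimal, self-contained sketch of the same steps. The dates, values and fund ids (80, 81) are made up for illustration, and the first series deliberately contains a repeated date to mimic a non-unique index. It groups with level= rather than by column names, which should be equivalent here since both keys are index levels.

```python
import pandas as pd

# Hypothetical toy data: two share classes; the first has a repeated date.
dates_a = pd.to_datetime(['2013-09-03', '2013-09-04', '2013-09-04'])
dates_b = pd.to_datetime(['2013-09-03', '2013-09-04', '2013-09-05'])
series_by_fund = {
    80: pd.Series([17.234, 17.277, 17.277], index=dates_a, name='adj_nav'),
    81: pd.Series([17.5647, 17.6085, 17.6961], index=dates_b, name='adj_nav'),
}

ts_list = []
for fund_id, ts in series_by_fund.items():
    ts.index.name = 'valuation_period_end_date'
    # Tag every row with its fund_id and make (date, fund_id) the index.
    ts_list.append(
        ts.to_frame()
        .assign(fund_id=fund_id)
        .reset_index()
        .set_index(['valuation_period_end_date', 'fund_id']))

df_adj_nav = (
    pd.concat(ts_list, axis=0)  # stack rows instead of aligning columns
    .groupby(level=['valuation_period_end_date', 'fund_id'])
    .first()                    # collapse duplicate (date, fund_id) rows
    .unstack('fund_id'))        # pivot fund_id into columns

print(df_adj_nav)
```

Dates that are missing for a fund show up as NaN after the unstack, so a dropna() or fillna() step may still be needed before computing statistics.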

8 Comments

I get an error: 'Series' object has no attribute 'assign'
See the edit above. Use to_frame() to convert the Series to a DataFrame.
This seems to have worked, but now I want to change the column names. Currently we have fund_id 80 81 82 83, but I want CLASS A CLASS C CLASS F SERIES 1. I tried something like ts.name = Fund.objects.get(id=fund_id).account_class_description, but it didn't work.
The reason I want to do this is that I want to convert the table to CSV format, and I get this error: TypeError: sequence item 1: expected str instance, tuple found. I assume this is because of the (fund_id 80 81 82 83).
You can change the name when you do the to_frame command. to_frame("new name") should work.
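
Another option, sketched here under assumptions, is to flatten the two-level columns left by the unstack and then rename them. The frame below is a hypothetical stand-in for the unstacked result, and class_names is a made-up mapping; in the question's code the descriptions would come from Fund.objects.get(id=fund_id).account_class_description. Tuple column labels are a plausible source of the "expected str instance, tuple found" error, so plain string labels may avoid it.

```python
import pandas as pd

# Hypothetical frame shaped like the answer's unstacked result: two-level columns.
idx = pd.Index(pd.to_datetime(['2013-09-03', '2013-09-04']),
               name='valuation_period_end_date')
cols = pd.MultiIndex.from_product([['adj_nav'], [80, 81]],
                                  names=[None, 'fund_id'])
df_adj_nav = pd.DataFrame([[17.234, 17.5647], [17.277, 17.6085]],
                          index=idx, columns=cols)

# Hypothetical fund_id -> class description mapping (an assumption).
class_names = {80: 'CLASS A', 81: 'CLASS C'}

# Selecting the top level drops it, leaving plain fund_id labels to rename.
flat = df_adj_nav['adj_nav'].rename(columns=class_names)
flat.columns.name = None

csv_text = flat.to_csv(sep=',')
print(csv_text)
```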