
I would like to calculate the daily sales from average sales using the following function:

def derive_daily_sales(avg_sales_series, period, first_day_sales):
    """
    Derive the daily sales from the previous_avg_sales start date to the current_avg_sales end date.
    For the detailed formula, please refer to README.md.

    @avg_sales_series: an array of average sales (e.g. 2020-08-04 to 2020-08-06)
    @period: the averaging period in days (e.g. 30 days, 90 days)
    @first_day_sales: the sales on the first day of previous_avg_sales
    """

    x_n1 = avg_sales_series[-1]*period - avg_sales_series[0]*period + first_day_sales

    return x_n1

avg_sales_series is supposed to be a pandas Series.

The dataframe looks like the following:

date, customer_id, avg_30_day_sales
12/08/2020, 1, 30
13/08/2020, 1, 40
14/08/2020, 1, 40
12/08/2020, 2, 20
13/08/2020, 2, 40
14/08/2020, 2, 30

I would like to first group by customer_id and sort by date, then take a rolling window of size 2 within each group and apply the custom function derive_daily_sales to each window, with period=30 and first_day_sales equal to the first avg_30_day_sales value.

I tried:

df_sales_grouped = df_sales.sort_values('date').groupby(['customer_id','date'])

df_daily_sales['daily_sales'] = df_sales_grouped['avg_30_day_sales'].rolling(2).apply(derive_daily_sales, axis=1, period=30, first_day_sales= df_sales['avg_30_day_sales'][0])

1 Answer


You should not group by the date since you want to roll over that column, so the grouping should be:

df_sales_grouped = df_sales.sort_values('date').groupby('customer_id')

Next, what you actually want to do is apply a rolling window on each group in the dataframe. So you need to use apply twice, once on the grouped dataframe and once on each rolling window. This can be done as follows:

rolling_arguments = {'period': 30, 'first_day_sales': df_sales['avg_30_day_sales'][0]}
df_sales['daily_sales'] = df_sales_grouped['avg_30_day_sales'].apply(
    lambda g: g.rolling(2).apply(derive_daily_sales, kwargs=rolling_arguments))

For the given input data, the result is:

      date  customer_id  avg_30_day_sales  daily_sales
12/08/2020            1                30          NaN
13/08/2020            1                40        330.0
14/08/2020            1                40         30.0
12/08/2020            2                20          NaN
13/08/2020            2                40        630.0
14/08/2020            2                30       -270.0
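
As a rough end-to-end sketch on the sample data, the following variant reproduces the table above. It is not the exact code from the answer and rests on a few assumptions: it uses transform instead of apply so that the result keeps the original row index regardless of pandas version, it passes raw=True so that each window reaches derive_daily_sales as a NumPy array (where [-1] indexes by position), and it assumes the dates are in day/month/year format.

```python
import pandas as pd


def derive_daily_sales(avg_sales_window, period, first_day_sales):
    # With raw=True the window is a NumPy array, so [0] and [-1] are positional.
    return avg_sales_window[-1] * period - avg_sales_window[0] * period + first_day_sales


df_sales = pd.DataFrame({
    "date": ["12/08/2020", "13/08/2020", "14/08/2020"] * 2,
    "customer_id": [1, 1, 1, 2, 2, 2],
    "avg_30_day_sales": [30, 40, 40, 20, 40, 30],
})

# Assumption: dates are day/month/year; parse them so sorting is chronological.
df_sales["date"] = pd.to_datetime(df_sales["date"], format="%d/%m/%Y")
df_sales = df_sales.sort_values(["customer_id", "date"])

# As in the answer, first_day_sales is taken from the first row of the whole frame.
rolling_arguments = {
    "period": 30,
    "first_day_sales": df_sales["avg_30_day_sales"].iloc[0],
}

# transform returns a Series aligned with df_sales, one rolling pass per customer.
df_sales["daily_sales"] = df_sales.groupby("customer_id")["avg_30_day_sales"].transform(
    lambda g: g.rolling(2).apply(derive_daily_sales, raw=True, kwargs=rolling_arguments)
)

print(df_sales)
```

For example, the second row of customer 1 evaluates to 40*30 - 30*30 + 30 = 330. Note that first_day_sales is the same value (30) for every customer here; if it should instead be each customer's own first value, it would have to be computed per group.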

3 Comments

But I got a KeyError: -1. Do you have any idea why?
I solved it by using .iloc in the function. The input is not an array but a pandas Series.
@JOHN: Correct, doing it this way gives a pandas Series as input to derive_daily_sales. If an array is needed, you could use .values on the input (or just handle it as a Series as you did, i.e., with .iloc).
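
To illustrate the point discussed in these comments, here is a minimal sketch of the .iloc approach. The sample window and its integer index are assumptions made for the example. On a Series with an integer index, [-1] is looked up as a label rather than a position, which is what raises KeyError: -1.

```python
import pandas as pd


def derive_daily_sales(avg_sales_series, period, first_day_sales):
    # .iloc selects by position, so it works whatever the index labels are.
    first_avg = avg_sales_series.iloc[0]
    last_avg = avg_sales_series.iloc[-1]
    return last_avg * period - first_avg * period + first_day_sales


# Hypothetical window, shaped like what rolling(2).apply passes by default.
window = pd.Series([30.0, 40.0], index=[0, 1])

try:
    window[-1]  # label lookup: there is no label -1 in the index
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

result = derive_daily_sales(window, period=30, first_day_sales=30)
print(label_lookup_failed, result)
```

The alternative mentioned above, converting the input with .values, gives a NumPy array on which [-1] is positional.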
