Sales data aggregation in R

Question

I have daily sales data for multiple products in three stores. It looks something like this:

item_id	store_id	category_id	dept_id	date	event_name	daiy_price
a	tx_1	food	1	2012/12/24		6
a	tx_1	food	1	2012/12/25	christmas	6
a	tx_1	food	1	2012/12/26		6
a	tx_1	food	1	2012/12/27		7
a	tx_1	food	1	2012/12/28		7
b	tx_1	food	1	2012/12/24		6
b	tx_1	food	1	2012/12/25	christmas	6
b	tx_1	food	1	2012/12/26		6
b	tx_1	food	1	2012/12/27		7
b	tx_1	food	1	2012/12/28		7

I want to round each week to the monday date using floor_date from lubridate package in R, then use group_by and summarise to compute weekly price average. However, i want each week to retain event_name if there is any during a particular week. Including event_name in group_by is not helpful since a day in a week may have an event but other days may not and they get grouped separately. Say each week starts on a monday and 2012/12/24 is a monday, how do i achieve this

item_id	store_id	category_id	dept_id	date	event_name	weekly_avg_price
a	tx_1	food	1	2012/12/24	christmas	6.4
b	tx_1	food	1	2012/12/24	christmas	6.4

Edward · Accepted Answer · 2024-12-18 07:24:30Z

You can achieve your desired output for this toy example by filling in the empty event names and then summarising the weekly prices by event_name and taking the first of the other columns.

library(tidyverse)
library(lubridate)

mutate(df, week_start=floor_date(date, unit = "week", week_start = 1)) |> 
  group_by(item_id, store_id, category_id, dept_id) |>
  fill(event_name, .direction="downup") |>
  ungroup() |>
  summarise(store_id=first(store_id),
            category_id=first(category_id),
            event_name=first(event_name), 
            dept_id=first(dept_id),
            weekly_avg_price=mean(daiy_price, na.rm=TRUE), .by=c(item_id, week_start))

Gives:

# A tibble: 2 × 7
  item_id week_start store_id category_id event_name dept_id weekly_avg_price
  <chr>   <date>     <chr>    <chr>       <chr>        <dbl>            <dbl>
1 a       2012-12-24 tx_1     food        christmas        1              6.4
2 b       2012-12-24 tx_1     food        christmas        1              6.4

Data:

df <- structure(list(item_id = c("a", "a", "a", "a", "a", "b", "b", 
"b", "b", "b"), store_id = c("tx_1", "tx_1", "tx_1", "tx_1", 
"tx_1", "tx_1", "tx_1", "tx_1", "tx_1", "tx_1"), category_id = c("food", 
"food", "food", "food", "food", "food", "food", "food", "food", 
"food"), dept_id = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1), date = structure(c(15698, 
15699, 15700, 15701, 15702, 15698, 15699, 15700, 15701, 15702
), class = "Date"), event_name = c(NA, "christmas", NA, NA, NA, 
NA, "christmas", NA, NA, NA), daiy_price = c(6, 6, 6, 7, 7, 6, 
6, 6, 7, 7)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", 
"data.frame"))

Collectives™ on Stack Overflow

Sales data aggregation in R

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related