1

I have daily sales data for multiple products in three stores. It looks something like this:

item_id store_id category_id dept_id date event_name daiy_price
a tx_1 food 1 2012/12/24 6
a tx_1 food 1 2012/12/25 christmas 6
a tx_1 food 1 2012/12/26 6
a tx_1 food 1 2012/12/27 7
a tx_1 food 1 2012/12/28 7
b tx_1 food 1 2012/12/24 6
b tx_1 food 1 2012/12/25 christmas 6
b tx_1 food 1 2012/12/26 6
b tx_1 food 1 2012/12/27 7
b tx_1 food 1 2012/12/28 7

I want to round each week to the monday date using floor_date from lubridate package in R, then use group_by and summarise to compute weekly price average. However, i want each week to retain event_name if there is any during a particular week. Including event_name in group_by is not helpful since a day in a week may have an event but other days may not and they get grouped separately. Say each week starts on a monday and 2012/12/24 is a monday, how do i achieve this

item_id store_id category_id dept_id date event_name weekly_avg_price
a tx_1 food 1 2012/12/24 christmas 6.4
b tx_1 food 1 2012/12/24 christmas 6.4

1 Answer 1

1

You can achieve your desired output for this toy example by filling in the empty event names and then summarising the weekly prices by event_name and taking the first of the other columns.

library(tidyverse)
library(lubridate)

mutate(df, week_start=floor_date(date, unit = "week", week_start = 1)) |> 
  group_by(item_id, store_id, category_id, dept_id) |>
  fill(event_name, .direction="downup") |>
  ungroup() |>
  summarise(store_id=first(store_id),
            category_id=first(category_id),
            event_name=first(event_name), 
            dept_id=first(dept_id),
            weekly_avg_price=mean(daiy_price, na.rm=TRUE), .by=c(item_id, week_start))

Gives:

# A tibble: 2 × 7
  item_id week_start store_id category_id event_name dept_id weekly_avg_price
  <chr>   <date>     <chr>    <chr>       <chr>        <dbl>            <dbl>
1 a       2012-12-24 tx_1     food        christmas        1              6.4
2 b       2012-12-24 tx_1     food        christmas        1              6.4

Data:

df <- structure(list(item_id = c("a", "a", "a", "a", "a", "b", "b", 
"b", "b", "b"), store_id = c("tx_1", "tx_1", "tx_1", "tx_1", 
"tx_1", "tx_1", "tx_1", "tx_1", "tx_1", "tx_1"), category_id = c("food", 
"food", "food", "food", "food", "food", "food", "food", "food", 
"food"), dept_id = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1), date = structure(c(15698, 
15699, 15700, 15701, 15702, 15698, 15699, 15700, 15701, 15702
), class = "Date"), event_name = c(NA, "christmas", NA, NA, NA, 
NA, "christmas", NA, NA, NA), daiy_price = c(6, 6, 6, 7, 7, 6, 
6, 6, 7, 7)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", 
"data.frame"))
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.