What I'm trying to do
I'm trying to write a function built on dplyr verbs that takes an Arrow dataset (opened with arrow::open_dataset()) as its first argument and the name of a column in that dataset as its second. Because I need to pass the column as a string (required by my actual use case, a Shiny app), I'm using the .data[[.metric]] syntax. Below is an image of the error I'm getting and code to reproduce it. Any help or insight is appreciated.
Image of error message
Code to reproduce error
# install.packages(c("dplyr", "ggplot2", "arrow"))
library(dplyr)
arrow::write_parquet(x = ggplot2::mpg, sink = "sample_data.parquet")
dat <- arrow::open_dataset("sample_data.parquet")
glimpse(dat)
get_metric <- function(.data, .metric) {
  .data %>%
    group_by(manufacturer, cyl) %>%
    summarize(
      new_col = sum(.data[[.metric]], na.rm = TRUE)
    ) %>%
    ungroup()
}
get_metric(dat, "cty") %>% collect()
Additional code that works but makes less use of arrow, so it is not ideal for speed
In this code I call collect() before the tidy eval step, so it is essentially regular dplyr code. It runs, but it is slower than code I had running successfully before I extracted the logic into this function.
get_metric2 <- function(.data, .metric) {
  .data %>%
    collect() %>%
    group_by(manufacturer, cyl) %>%
    summarize(
      new_col = sum(.data[[.metric]], na.rm = TRUE)
    ) %>%
    ungroup()
}
get_metric2(dat, "cty")
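For comparison, here is a minimal sketch of one variant that might keep the aggregation inside arrow: the column name is turned into a symbol with rlang::sym() and injected with !!, so the .data pronoun never appears in the summarize() call. The name get_metric_sym and the temporary file path are hypothetical and introduced only for illustration, and it is an assumption that the installed arrow version translates the injected symbol without error.

```r
library(dplyr)
library(rlang)

# Recreate the sample dataset in a temporary file (hypothetical path)
path <- tempfile(fileext = ".parquet")
arrow::write_parquet(x = ggplot2::mpg, sink = path)
dat <- arrow::open_dataset(path)

# Hypothetical variant of get_metric(): convert the string to a symbol
# and inject it with !!, instead of indexing the .data pronoun
get_metric_sym <- function(.data, .metric) {
  .data %>%
    group_by(manufacturer, cyl) %>%
    summarize(
      new_col = sum(!!sym(.metric), na.rm = TRUE)
    ) %>%
    ungroup()
}

# collect() is only called after the aggregation has been defined
result <- get_metric_sym(dat, "cty") %>% collect()
```

If this runs, the grouping and summing are defined on the Arrow dataset and only the aggregated rows are pulled into memory by collect().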

manufacturer, cyl in your function?