1

I'm trying to create a summary table that gives me the proportion of yes responses for 17 questions sorted by year. I just don't know how to apply the summarize operation to multiple columns easily without hard-coding it.

Unfortunately, I can't use the summarize_at or summarize_all functions because I'm working with a dataframe. I was thinking of writing a function, looping through the columns, and rbinding the summary columns together, but summarize is a bit weird with column names, which can't be of type character. What do you recommend?

Here's what I currently have:

s2 <- db %>%
  group_by(Year)%>%
  summarize(Q1=round(sum(Q1d, na.rm=TRUE)*100/length(which(!is.na(Q1d))),1),
            Q2=round(sum(Q2d, na.rm=TRUE)*100/length(which(!is.na(Q2d))),1),
            Q3=round(sum(Q3d, na.rm=TRUE)*100/length(which(!is.na(Q3d))),1),
            Q4=round(sum(Q4d, na.rm=TRUE)*100/length(which(!is.na(Q4d))),1),
            Q5=round(sum(Q5d, na.rm=TRUE)*100/length(which(!is.na(Q5d))),1),
            Q6=round(sum(Q6d, na.rm=TRUE)*100/length(which(!is.na(Q6d))),1),
            Q7=round(sum(Q7d, na.rm=TRUE)*100/length(which(!is.na(Q7d))),1),
            Q8=round(sum(Q8d, na.rm=TRUE)*100/length(which(!is.na(Q8d))),1),
            Q9=round(sum(Q9d, na.rm=TRUE)*100/length(which(!is.na(Q9d))),1),
            Q10=round(sum(Q10d, na.rm=TRUE)*100/length(which(!is.na(Q10d))),1),
            Q11=round(sum(Q11d, na.rm=TRUE)*100/length(which(!is.na(Q11d))),1),
            Q12=round(sum(Q12d, na.rm=TRUE)*100/length(which(!is.na(Q12d))),1),
            Q13=round(sum(Q13d, na.rm=TRUE)*100/length(which(!is.na(Q13d))),1),
            Q14=round(sum(Q14d, na.rm=TRUE)*100/length(which(!is.na(Q14d))),1),
            Q15=round(sum(Q15d, na.rm=TRUE)*100/length(which(!is.na(Q15d))),1),
            Q16=round(sum(Q16d, na.rm=TRUE)*100/length(which(!is.na(Q16d))),1),
            Q17=round(sum(Q17d, na.rm=TRUE)*100/length(which(!is.na(Q17d))),1),
            )

Note: Q1d, Q2d... are the names of the columns

1 Answer 1

0

We can use across in dplyr

library(dplyr)
library(stringr)
db %>%
    group_by(Year) %>%
    summarise(across(matches('^Q\\d+d$'), ~ 
              sum(., na.rm = TRUE) * 100 /sum(!is.na(.))), 
         .groups = 'drop') %>%
    rename_with(~ str_remove(., 'd$'), -Year)

or using collapse

library(collapse)
f1 <- function(x) sum(x, na.rm = TRUE) * 100/sum(!is.na(x))
collap(db, ~ Year, FUN = f1)
#   Year      Q1d Q2d
#1 2010 250.0000 350
#2 2015 293.3333 320

data

db <- data.frame(Year = c(2010, 2010, 2015, 2015, 2015, 2015),
   Q1d = c(2.5, NA, 3, 3.5, NA, 2.3), Q2d = c(NA, 3.5, NA, 2, 4.6, 3))
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.