I'm trying to summarize a data set with not only total counts per group, but also counts of subsets. So starting with something like this:

library(dplyr)

df <- data.frame(
  Group=c('A','A','B','B','B'),
  Size=c('Large','Large','Large','Small','Small')
)

df_summary <- df %>%
  group_by(Group) %>%
  summarize(group_n=n())

I can get a summary of the number of observations for each group:

> df_summary
# A tibble: 2 x 2
  Group group_n
  <chr>   <int>
1 A           2
2 B           3

Is there any way I can add some sort of subsetting condition to n() to get, say, a count of how many observations per group were Large in this example? In other words, ending up with something like:

  Group group_n Large_n
1     A       2       2
2     B       3       1

Thank you!

3 Answers

We could use count(): count(xyz) is roughly equivalent to group_by(xyz) %>% summarise(n = n())

library(dplyr)

df %>% 
  count(Group, Size)

  Group  Size n
1     A Large 2
2     B Large 1
3     B Small 2

OR

library(dplyr)
library(tidyr)

df %>% 
  count(Group, Size) %>% 
  pivot_wider(names_from = Size, values_from = n)

  Group Large Small
  <chr> <int> <int>
1 A         2    NA
2 B         1     2
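The NA for group A appears because A has no Small rows. A minimal sketch, assuming zeros are preferred over NA: pivot_wider() accepts a values_fill argument that supplies the value for missing combinations. The name `wide` is only illustrative.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  Group = c('A','A','B','B','B'),
  Size  = c('Large','Large','Large','Small','Small')
)

# Count each Group/Size pair, then spread Size into columns,
# filling combinations that never occur with 0 instead of NA
wide <- df %>%
  count(Group, Size) %>%
  pivot_wider(names_from = Size, values_from = n, values_fill = 0L)
```

With this, A's Small column should hold 0 rather than NA.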

2 Comments

These definitely do it for this - the only snag is in practice I'll want to also be summarizing numeric values at the same time. Worst case scenario I can do that separately and join the summaries.
This is no problem, because count(xyz) is roughly equivalent to group_by(xyz) %>% summarise(n = n()), so you can add your numeric summaries to the same summarise() call.
1

I approach this problem using an ifelse and a sum:

df_summary <- df %>%
  group_by(Group) %>%
  summarize(group_n=n(),
            Large_n = sum(ifelse(Size == "Large", 1, 0)))

The last line turns Size into a binary indicator taking the value 1 if Size == "Large" and 0 otherwise. Summing this indicator is equivalent to counting the number of rows with "Large".
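Since a logical vector sums as 0s and 1s, the ifelse() can probably be dropped. The sketch below also shows a numeric summary computed in the same summarize() call, which was the concern raised in the comments on the count() answer. The `Value` column and the `mean_value` name are hypothetical, added only for illustration.

```r
library(dplyr)

df <- data.frame(
  Group = c('A','A','B','B','B'),
  Size  = c('Large','Large','Large','Small','Small'),
  Value = c(1, 2, 3, 4, 5)  # hypothetical numeric column, not in the original data
)

df_summary <- df %>%
  group_by(Group) %>%
  summarize(
    group_n    = n(),                   # total rows per group
    Large_n    = sum(Size == "Large"),  # TRUE counts as 1, FALSE as 0
    mean_value = mean(Value)            # numeric summary in the same call
  )
```

The same pattern extends to other conditions, for example sum(Size == "Small"), without any reshaping or joins.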

1 Comment

Thanks for this one. I think this will deliver just what I need.
1
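df_summary <- df %>%
  group_by(Group) %>%
  mutate(group_n = n()) %>%
  ungroup() %>%
  group_by(Group, Size) %>%
  mutate(Large_n = n()) %>%
  ungroup() %>%
  # distinct() keeps the first row per Group, so Large_n is only the
  # "Large" count when each group's first row has Size == "Large"
  distinct(Group, .keep_all = TRUE)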

# A tibble: 2 x 4
  Group Size  group_n Large_n
  <chr> <chr>   <int>   <int>
1 A     Large       2       2
2 B     Large       3       1

