Creating counts of subset with dplyr

Question

I'm trying to summarize a data set with not only total counts per group, but also counts of subsets. So starting with something like this:

library(dplyr)

df <- data.frame(
  Group=c('A','A','B','B','B'),
  Size=c('Large','Large','Large','Small','Small')
)

df_summary <- df %>%
  group_by(Group) %>%
  summarize(group_n=n())

I can get a summary of the number of observations for each group:

> df_summary
# A tibble: 2 x 2
  Group group_n
  <chr>   <int>
1 A           2
2 B           3

Is there any way I can add some sort of subsetting information to n() to get, say, a count of how many observations per group were Large in this example? In other words, I'd like to end up with something like:

  Group group_n Large_n
1     A       2       2
2     B       3       1

Thank you!

TarJae · Accepted Answer · 2022-06-07 18:18:57Z

2

We could use count(). count(xyz) is roughly equivalent to group_by(xyz) %>% summarise(n = n()):

library(dplyr)

df %>% 
  count(Group, Size)

  Group  Size n
1     A Large 2
2     B Large 1
3     B Small 2
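To make the equivalence concrete, here is a minimal sketch that builds the same counts both ways on the data from the question. The names by_count and by_summarise are just illustrative, and .groups = "drop" is added so the second result comes back ungrouped.

```r
library(dplyr)

# Data from the question
df <- data.frame(
  Group = c('A', 'A', 'B', 'B', 'B'),
  Size  = c('Large', 'Large', 'Large', 'Small', 'Small')
)

# Shorthand: one row per Group/Size combination, with the count in column n
by_count <- df %>%
  count(Group, Size)

# Longhand: the same counts written out with group_by() and summarise()
by_summarise <- df %>%
  group_by(Group, Size) %>%
  summarise(n = n(), .groups = "drop")
```

The longhand form is the one to reach for when more than a count is needed, since further summary columns can sit next to n = n() in the same summarise() call.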

Or, to get one column per Size, reshape the counts with pivot_wider():

library(dplyr)
library(tidyr)

df %>% 
  count(Group, Size) %>% 
  pivot_wider(names_from = Size, values_from = n)

  Group Large Small
  <chr> <int> <int>
1 A         2    NA
2 B         1     2
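The NA for group A appears because A has no Small rows. As a possible follow-up (not part of the original answer), the sketch below fills those gaps with 0 using the values_fill argument of pivot_wider() and then adds the per-group total. The name df_wide is illustrative, and group_n = Large + Small assumes those are the only two sizes in the data.

```r
library(dplyr)
library(tidyr)

# Data from the question
df <- data.frame(
  Group = c('A', 'A', 'B', 'B', 'B'),
  Size  = c('Large', 'Large', 'Large', 'Small', 'Small')
)

df_wide <- df %>%
  count(Group, Size) %>%
  # values_fill = 0 replaces the NA for missing Group/Size combinations
  pivot_wider(names_from = Size, values_from = n, values_fill = 0) %>%
  # Assumes Large and Small are the only Size levels present
  mutate(group_n = Large + Small)
```

With more size levels, the hard-coded sum would need to be extended or replaced by a sum across all the count columns.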

edited Jun 7, 2022 at 18:18

answered Jun 7, 2022 at 18:10

TarJae

2 Comments

Michael Clauss Over a year ago

These definitely do it for this - the only snag is in practice I'll want to also be summarizing numeric values at the same time. Worst case scenario I can do that separately and join the summaries.

TarJae Over a year ago

This is no problem because count(xyz) is the same as group_by(xyz) %>% summarise(n = n()), so other summaries can be added in the same summarise() call.

Simon.S.A. · Accepted Answer · 2022-06-07 21:04:04Z

1

I approach this problem using an ifelse and a sum:

df_summary <- df %>%
  group_by(Group) %>%
  summarize(group_n=n(),
            Large_n = sum(ifelse(Size == "Large", 1, 0)))

The last line turns Size into a binary indicator taking the value 1 if Size == "Large" and 0 otherwise. Summing this indicator is equivalent to counting the number of rows with "Large".
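Since R treats TRUE as 1 and FALSE as 0 when summing, the ifelse() can probably be dropped and the comparison summed directly. The sketch below shows that variant and also fits a numeric summary into the same summarize() call. The Value column and the mean_value summary are hypothetical, added here only for illustration.

```r
library(dplyr)

# Data from the question, plus a hypothetical numeric column `Value`
# that exists only to illustrate a numeric summary.
df <- data.frame(
  Group = c('A', 'A', 'B', 'B', 'B'),
  Size  = c('Large', 'Large', 'Large', 'Small', 'Small'),
  Value = c(10, 20, 30, 40, 50)
)

df_summary <- df %>%
  group_by(Group) %>%
  summarize(
    group_n    = n(),                    # all rows in the group
    Large_n    = sum(Size == "Large"),   # TRUE counts as 1, FALSE as 0
    mean_value = mean(Value)             # numeric summary alongside the counts
  )
```

A group with no Large rows gets Large_n = 0 rather than being dropped, which is usually what you want in a summary table.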

answered Jun 7, 2022 at 21:04

Simon.S.A.

1 Comment

Michael Clauss Over a year ago

Thanks for this one. I think this will deliver just what I need.

Gustavo P · Accepted Answer · 2022-06-07 18:23:53Z

1

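 df_summary <- df %>%
    group_by(Group) %>%
    mutate(group_n = n()) %>%      # total rows per Group
    ungroup() %>%
    group_by(Group, Size) %>%
    mutate(Large_n = n()) %>%      # rows per Group/Size combination
    ungroup() %>%
    # Keeps the first row of each Group, so Large_n is the "Large" count
    # only when a "Large" row comes first in the group, as it does here.
    distinct(Group, .keep_all = TRUE)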

# A tibble: 2 x 4
  Group Size  group_n Large_n
  <chr> <chr>   <int>   <int>
1 A     Large       2       2
2 B     Large       3       1

edited Jun 7, 2022 at 18:23

answered Jun 7, 2022 at 18:22

Gustavo P
