Create combined bar plot of multiple variables using ggplot

Question

I am trying to create a frequency (in % terms) bar plot using the following data:

>fulldata
Type Category
Sal         0
Sal         0
Sal         1
Sal         0
Sal         1
Sal         1
Self        1
Self        0
Self        1
Self        0
Self        0

I want a bar plot (using ggplot) that shows the % of Sal and Self in fulldata and the % of Sal and Self where Category==1, side by side (with labels showing the % values). I tried creating a separate data frame by filtering Category==1 from fulldata, but the two sets of bars overlap each other. I tried the following:

> Category1 = fulldata[which(fulldata$Category==1),]

ggplot(fulldata, aes(x=Type,y = (..count..)/sum(..count..)))+
    geom_bar()+
    geom_label(stat = "count", aes(label=round(..count../sum(..count..),3)*100), 
               vjust=1.2,size=3, format_string='{:.1f}%')+
    scale_y_continuous(labels = scales::percent)+
    labs(x = "Type", y="Percentage")+
    geom_bar(data = Category1, position = "dodge", color = "red")

*Original data has around 80000 rows.

dc37 · Accepted Answer · 2020-04-24 16:48:28Z


One possible solution is to start by calculating all the proportions outside of ggplot2.

Here is a fake example:

df <- data.frame(Type = sample(c("Sal","Self"),100, replace = TRUE),
                 Category = sample(c(0,1),100, replace = TRUE))

We can calculate each proportion as follows to obtain the final data frame:

library(tidyr)
library(dplyr)

df %>% group_by(Category, Type) %>% count() %>% 
  pivot_wider(names_from = Category, values_from = n) %>%
  mutate(Total = `0`+ `1`) %>%
  pivot_longer(-Type, names_to = "Category", values_to = "n") %>%
  group_by(Category) %>%
  mutate(Percent = n / sum(n))

# A tibble: 6 x 4
# Groups:   Category [3]
  Type  Category     n Percent
  <fct> <chr>    <int>   <dbl>
1 Sal   0           27   0.458
2 Sal   1           22   0.537
3 Sal   Total       49   0.49 
4 Self  0           32   0.542
5 Self  1           19   0.463
6 Self  Total       51   0.51

Then, if you add ggplot2 to the end of the sequence, you can get the bar graph in one single pipe:

library(ggplot2)

df %>% group_by(Category, Type) %>% count() %>% 
  pivot_wider(names_from = Category, values_from = n) %>%
  mutate(Total = `0`+ `1`) %>%
  pivot_longer(-Type, names_to = "Category", values_to = "n") %>%
  group_by(Category) %>%
  mutate(Percent = n / sum(n)) %>%
  ggplot(aes(x = reorder(Category, desc(Category)), y = Percent, fill = Type))+
  geom_col()+
  geom_text(aes(label = scales::percent(Percent)), position = position_stack(0.5))+
  scale_y_continuous(labels = scales::percent)+
  labs(y = "Percentage", x = "Category")
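
The plot above stacks the two types within each category. If the bars should instead sit side by side, with all rows compared against the Category==1 rows as the question asks, a minimal sketch could look like the following. It rebuilds the question's 11 sample rows by hand, and the `Subset` column and its labels are names made up for this illustration, not part of the original data.

```r
library(dplyr)
library(ggplot2)

# The question's sample data, rebuilt by hand
fulldata <- data.frame(
  Type     = c(rep("Sal", 6), rep("Self", 5)),
  Category = c(0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0)
)

# Stack the full data and the Category == 1 rows, tagging each copy.
# "Subset" is a hypothetical column name introduced for this sketch.
combined <- bind_rows(
  mutate(fulldata, Subset = "All rows"),
  fulldata %>% filter(Category == 1) %>% mutate(Subset = "Category 1")
)

# Share of each Type within each subset
props <- combined %>%
  count(Subset, Type) %>%
  group_by(Subset) %>%
  mutate(Percent = n / sum(n)) %>%
  ungroup()

# Dodged bars with a percentage label above each bar
p <- ggplot(props, aes(x = Type, y = Percent, fill = Subset)) +
  geom_col(position = position_dodge(width = 0.9)) +
  geom_text(aes(label = scales::percent(Percent, accuracy = 0.1)),
            position = position_dodge(width = 0.9), vjust = -0.3) +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Type", y = "Percentage")
p
```

On the sample rows this gives roughly 54.5% Sal and 45.5% Self across all rows, and 60% Sal and 40% Self within Category 1. Using the same `position_dodge()` width for the bars and the labels keeps each label centred over its bar.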

Does it answer your question?

edited Apr 24, 2020 at 16:48

answered Apr 24, 2020 at 16:32


4 Comments

Looper Over a year ago

Is there any way by which we can do it directly using ggplot rather than creating a new frequency table as I have a lot of variables?

dc37 Over a year ago

It would be really hard to do that directly in ggplot2, because you need the counts within each category as well as the counts over all rows, which is quite tricky. I edited my answer to show how to do it in a single pipe sequence, without having to calculate each category separately. Let me know if that works.
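
For readers who would still like ggplot2 to compute the percentages itself, one possible sketch relies on the `prop` variable computed by `stat_count()`, which is a proportion within each group. It assumes ggplot2 3.3.0 or later (for `after_stat()`), reuses fake data like the answer's with an arbitrary seed, and still needs the data stacked once; the `Subset` column is a hypothetical label introduced here.

```r
library(dplyr)
library(ggplot2)  # after_stat() needs ggplot2 3.3.0 or later

set.seed(1)  # arbitrary seed so the fake data is reproducible
df <- data.frame(Type = sample(c("Sal", "Self"), 100, replace = TRUE),
                 Category = sample(c(0, 1), 100, replace = TRUE))

# Stack the full data on top of the Category == 1 rows; "Subset" is a
# hypothetical label column, not something from the original post.
combined <- bind_rows(
  mutate(df, Subset = "All rows"),
  df %>% filter(Category == 1) %>% mutate(Subset = "Category 1")
)

# group = Subset makes stat_count() compute prop within each subset,
# so no separate frequency table is built by hand
p <- ggplot(combined, aes(x = Type, fill = Subset, group = Subset)) +
  geom_bar(aes(y = after_stat(prop)),
           position = position_dodge(width = 0.9)) +
  geom_text(stat = "count",
            aes(y = after_stat(prop),
                label = scales::percent(after_stat(prop), accuracy = 0.1)),
            position = position_dodge(width = 0.9), vjust = -0.3) +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Type", y = "Percentage")
p
```

Because only the stacking step touches the data, the same plotting code could be reused for other variables by changing the `x` aesthetic.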

Looper Over a year ago

I am getting this error: ``` Error: This tidyselect interface doesn't support predicates yet.```

dc37 Over a year ago

I have never seen this error. What versions of tidyverse, R, and RStudio are you using? Did you try it on the example that I provided?
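
One way to collect the version information asked for here is sketched below. It assumes the listed packages are installed, and it is only a way to report versions, not a diagnosis of the error.

```r
# Print the R version and the versions of the packages used in the answer
cat(R.version.string, "\n")
pkgs <- c("dplyr", "tidyr", "tidyselect", "ggplot2")
versions <- sapply(pkgs, function(p) as.character(packageVersion(p)))
print(versions)
```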
