I am trying to create a frequency (in % terms) bar plot using the following data:

>fulldata
Type Category
Sal         0
Sal         0
Sal         1
Sal         0
Sal         1
Sal         1
Self        1
Self        0
Self        1
Self        0
Self        0

I am trying to create a bar plot (using ggplot) that shows, side by side, the % of Sal and Self in the full data and the % of Sal and Self among the rows with Category==1, with the % values as labels. I tried creating a separate data frame by filtering Category==1 from fulldata, but the two sets of bars overlap each other. I tried the following:

> Category1 = fulldata[which(fulldata$Category==1),]

ggplot(fulldata, aes(x=Type,y = (..count..)/sum(..count..)))+
    geom_bar()+
    geom_label(stat = "count", aes(label=round(..count../sum(..count..),3)*100), 
               vjust=1.2,size=3, format_string='{:.1f}%')+
    scale_y_continuous(labels = scales::percent)+
    labs(x = "Type", y="Percentage")+
    geom_bar(data = Category1, position = "dodge", color = "red")

*Original data has around 80000 rows.

1 Answer

One possible solution is to start by calculating all the proportions outside of ggplot2.

Here, a fake example:

df <- data.frame(Type = sample(c("Sal","Self"),100, replace = TRUE),
                 Category = sample(c(0,1),100, replace = TRUE))

We can calculate each proportion as follows to obtain the final data frame:

library(tidyr)
library(dplyr)

df %>% group_by(Category, Type) %>% count() %>% 
  pivot_wider(names_from = Category, values_from = n) %>%
  mutate(Total = `0`+ `1`) %>%
  pivot_longer(-Type, names_to = "Category", values_to = "n") %>%
  group_by(Category) %>%
  mutate(Percent = n / sum(n))

# A tibble: 6 x 4
# Groups:   Category [3]
  Type  Category     n Percent
  <fct> <chr>    <int>   <dbl>
1 Sal   0           27   0.458
2 Sal   1           22   0.537
3 Sal   Total       49   0.49 
4 Self  0           32   0.542
5 Self  1           19   0.463
6 Self  Total       51   0.51 
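
As a quick sanity check of the same idea, here is a minimal sketch on the 11 rows posted in the question. It is an alternative to the pivot steps above, not the same pipeline: it stacks the full data on top of the Category == 1 subset and counts within each. The column name `Subset` and its labels "All" and "Category 1" are assumptions made for illustration.

```r
library(dplyr)

# The 11 sample rows from the question
fulldata <- data.frame(
  Type     = c(rep("Sal", 6), rep("Self", 5)),
  Category = c(0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0)
)

# Stack all rows and the Category == 1 rows, tagging each copy with a label
# ("Subset" is a hypothetical column name, not part of the original data)
shares <- bind_rows(
  mutate(fulldata, Subset = "All"),
  fulldata %>% filter(Category == 1) %>% mutate(Subset = "Category 1")
) %>%
  count(Subset, Type) %>%          # rows per Subset/Type combination
  group_by(Subset) %>%
  mutate(Percent = n / sum(n)) %>% # share of each Type within its Subset
  ungroup()

print(shares)
```

With these rows, Sal and Self make up 6/11 and 5/11 of all rows, and 3/5 and 2/5 of the Category == 1 rows, which is easy to confirm by hand.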

Then, if you add ggplot2 to the sequence, you can get the bar graph in one single pipe:

library(ggplot2)

# Same proportion table as before, piped straight into ggplot
df %>% group_by(Category, Type) %>% count() %>% 
  pivot_wider(names_from = Category, values_from = n) %>%
  mutate(Total = `0`+ `1`) %>%
  pivot_longer(-Type, names_to = "Category", values_to = "n") %>%
  group_by(Category) %>%
  mutate(Percent = n / sum(n)) %>%
  # One stacked bar per Category (Total, 1, 0), labelled at the middle of each segment
  ggplot(aes(x = reorder(Category, desc(Category)), y = Percent, fill = Type))+
  geom_col()+
  geom_text(aes(label = scales::percent(Percent)), position = position_stack(0.5))+
  scale_y_continuous(labels = scales::percent)+
  labs(y = "Percentage", x = "Category")
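
The question asks for side-by-side bars rather than stacked ones, so here is a hedged variant of the same pipeline. It is a sketch only: the fixed seed, the `Group` column, and its labels are assumptions added for illustration. It keeps just the "Total" and "1" rows and dodges the bars by group.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

set.seed(42)  # assumption: fixed seed, only so the fake data are reproducible
df <- data.frame(Type = sample(c("Sal", "Self"), 100, replace = TRUE),
                 Category = sample(c(0, 1), 100, replace = TRUE))

# Proportion table restricted to the two groups the question compares
plot_data <- df %>%
  count(Category, Type) %>%
  pivot_wider(names_from = Category, values_from = n) %>%
  mutate(Total = `0` + `1`) %>%
  pivot_longer(-Type, names_to = "Category", values_to = "n") %>%
  filter(Category != "0") %>%                       # drop the Category == 0 rows
  group_by(Category) %>%
  mutate(Percent = n / sum(n)) %>%
  ungroup() %>%
  # "Group" is a hypothetical, human-readable label for the legend
  mutate(Group = ifelse(Category == "1", "Category == 1", "All rows"))

# Use the same dodge for bars and labels so they line up
dodge <- position_dodge(width = 0.9)
p <- ggplot(plot_data, aes(x = Type, y = Percent, fill = Group)) +
  geom_col(position = dodge) +
  geom_text(aes(label = scales::percent(Percent, accuracy = 0.1)),
            position = dodge, vjust = -0.3) +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Type", y = "Percentage")
print(p)
```

Sharing one `position_dodge()` object between `geom_col()` and `geom_text()` keeps each label centred over its own bar.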

Does it answer your question?

4 Comments

Is there any way by which we can do it directly using ggplot rather than creating a new frequency table as I have a lot of variables?
It will be really hard to do that in ggplot2 because you need both the count per category and the count over all rows, which is quite tricky. I edited my answer to show how to do it in a single pipe sequence, without having to calculate each category separately. Let me know if it is ok.
I am getting this error: ``` Error: This tidyselect interface doesn't support predicates yet.```
I have never seen this error. What versions of tidyverse, R, and RStudio are you using? Did you try the example that I provided?
