
I'm trying to display percentages as labels inside the bars of a stacked bar plot in ggplot2. I found another post from three years ago, but I can't reproduce its solution: How to draw stacked bars in ggplot2 that show percentages based on group?

The answer to that post is almost exactly what I'm trying to do.

Here is a simple example of my data:

library(ggplot2)
df = data.frame('sample' = c('cond1','cond1','cond1','cond2','cond2','cond2','cond3','cond3','cond3','cond4','cond4','cond4'),
                'class' = c('class1','class2','class3','class1','class2','class3','class1','class2','class3','class1','class2','class3'))
ggplot(data=df, aes(x=sample, fill=class)) + 
    coord_flip() +
    geom_bar(position=position_fill(reverse=TRUE), width=0.7)


I'd like every bar segment to show its percentage/fraction, so in this case they would all be 33%. Ideally the values would be calculated on the fly, but I can also supply the percentages manually if necessary. Can anybody help?

Side question: How can I reduce the space between the bars? I found many answers to that as well, but they suggest using the width parameter in position_fill(), which doesn't seem to exist anymore.

Thanks so much!
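On the side question about spacing: in current ggplot2 versions the bar width is an argument of the geom itself (the code above already passes width=0.7 to geom_bar()), expressed as a fraction of the distance between neighbouring categories, so a value closer to 1 leaves less space between the bars. A minimal sketch, assuming a recent ggplot2; width = 0.95 is an arbitrary illustrative value, and the data frame is rebuilt with rep() to give the same rows as above:

```r
library(ggplot2)

# Same rows as the question's example data, built with rep().
df <- data.frame(
  sample = rep(c("cond1", "cond2", "cond3", "cond4"), each = 3),
  class  = rep(c("class1", "class2", "class3"), times = 4)
)

# The width is set on geom_bar(), not on position_fill().
# width = 1 leaves no gap between neighbouring bars; smaller values
# leave proportionally more space.
p <- ggplot(df, aes(x = sample, fill = class)) +
  geom_bar(position = position_fill(reverse = TRUE), width = 0.95) +
  coord_flip()

# print(p) draws the chart
```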

EDIT:

So far, there are two examples that show exactly what I was asking for (big thanks for responding so quickly); however, they fail when I apply them to my real data. Here is the example data with one extra row added to show what happens:

df = data.frame('sample' = c('cond1','cond1','cond1','cond2','cond2','cond2','cond3','cond3','cond3','cond4','cond4','cond4','cond1'),
                'class' = c('class1','class2','class3','class1','class2','class3','class1','class2','class3','class1','class2','class3','class2'))

Essentially, I'd like to have only one label per class/condition combination.
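To make "one label per class/condition combination" concrete, a quick base-R check (not part of the original thread) of what those labels should be for the extended data. The data frame is rebuilt with rep(), which gives the same rows in the same order as the data.frame() call above; table() counts each combination once, and prop.table(margin = 1) divides every row by its total:

```r
# Extended example data: cond1 now has two class2 rows.
df <- data.frame(
  sample = c(rep(c("cond1", "cond2", "cond3", "cond4"), each = 3), "cond1"),
  class  = c(rep(c("class1", "class2", "class3"), times = 4), "class2")
)

# One cell per sample/class combination: counts first, then each row
# divided by its row total, so every sample sums to 1.
counts <- table(df$sample, df$class)
shares <- prop.table(counts, margin = 1)

# Percentages to use as labels: cond1 gives 25/50/25,
# every other sample gives 33/33/33.
round(100 * shares)
```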


3 Answers


I think what the OP wanted was labels on the actual segments of the bars. We can do this by using data.table to compute the count percentages and the formatted percentage labels, and then plotting with ggplot:

library(data.table)
library(ggplot2)
# one row per sample/class combination, then the share of each class within its sample
dt <- setDT(df)[,list(count = .N), by = .(sample,class)][,list(class = class, count = count,
                percent_fmt = paste0(formatC(count*100/sum(count), digits = 2), "%"),
                percent_num = count/sum(count)
                ), by = sample]

ggplot(data=dt, aes(x=sample, y= percent_num, fill=class)) +   
  geom_bar(position=position_fill(reverse=TRUE), stat = "identity", width=0.7) +
  # reverse = TRUE here as well, so the labels stack in the same order as the bar segments
  geom_text(aes(label = percent_fmt), position = position_stack(vjust = 0.5, reverse = TRUE)) + coord_flip()


Edit: Another solution, where you calculate the y-position of each label in the aggregation step, so we don't have to rely on position_stack(vjust = 0.5):

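# keyby (instead of by) sorts by sample and class, so within each sample the rows
# follow the order in which position_fill(reverse = TRUE) stacks the classes
dt <- setDT(df)[,list(count = .N), keyby = .(sample,class)][,list(class = class, count = count,
               percent_fmt = paste0(formatC(count*100/sum(count), digits = 2), "%"),
               percent_num = count/sum(count),
               cum_pct = cumsum(count/sum(count)),
               # midpoint of each segment: average of its right edge and the previous segment's right edge
               label_y = (cumsum(count/sum(count)) + cumsum(ifelse(is.na(shift(count/sum(count))),0,shift(count/sum(count))))) / 2
), by = sample]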

ggplot(data=dt, aes(x=sample, y= percent_num, fill=class)) +   
  geom_bar(position=position_fill(reverse=TRUE), stat = "identity", width=0.7) +
  geom_text(aes(label = percent_fmt, y = label_y)) + coord_flip()
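The label_y expression is easier to follow with numbers. A small worked example in base R for cond1 of the extended data (counts 1, 2, 1); c(0, head(pct, -1)) stands in for data.table's shift() with the leading NA replaced by 0, as the ifelse() in the answer does:

```r
# Counts for cond1 in the extended example data (class1, class2, class3).
count <- c(1, 2, 1)
pct <- count / sum(count)          # segment widths: 0.25 0.50 0.25

# Stand-in for shift(pct) with the leading NA replaced by 0:
# the previous segment's width, 0 for the first segment.
prev <- c(0, head(pct, -1))        # 0.00 0.25 0.50

# The answer's label_y: average of each segment's right edge, cumsum(pct),
# and its left edge, cumsum(prev), i.e. the segment midpoint.
label_y <- (cumsum(pct) + cumsum(prev)) / 2
label_y                            # 0.125 0.500 0.875
```

This is the same as cumsum(pct) - pct / 2. The midpoint only lands on the right segment if the rows within each sample are in the order in which position_fill(reverse = TRUE) stacks them, i.e. the order of the class factor levels.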

5 Comments

That's indeed what I was looking to do, thanks! There is something strange about it, though. Your example works perfectly, but when I apply it to my data (which is a bit more than in the example given), the output looks like this: link. I think it is labelling every instance of a class that goes into the bar, but then I don't understand why the bars are lost and why the axis is completely off.
Hmm, can you post more of your data?
I'll add it to the post.
@fakechek see my updates. In the second data.table chain I now use sum(count) instead of .N, and in the first step I collapse the data to one row per sample/class combination. I think this was the source of the issue.
Thank you so much, now it is perfect :)

Here is a solution where you first calculate the percentages using dplyr and then plot them:

UPDATED:

options(stringsAsFactors = FALSE)

df = data.frame(sample = c('cond1','cond1','cond1','cond2','cond2','cond2','cond3','cond3','cond3','cond4','cond4','cond4'), 
                class = c('class1','class2','class3','class1','class2','class3','class1','class2','class3','class1','class2','class3'))

library(dplyr) 
library(ggplot2)
library(scales)

df%>%
  # count how often each class occurs in each sample.
  count(sample, class)%>% 
  group_by(sample)%>%
  mutate(pct = n / sum(n))%>%
  ggplot(aes(x = sample, y = pct, fill = class)) + 
  coord_flip() +
  geom_col(width=0.7)+
  geom_text(aes(label = paste0(round(pct * 100), '%')),
            position = position_stack(vjust = 0.5))


4 Comments

Thanks so much; unfortunately I have the same problem as with the other solution. Briefly, when I add more elements (class/condition combinations) that get grouped according to the colour code, each of them gets its own label. Instead, I'm trying to add only one label per color :/
I'll update the answer to be a bit more robust to more class/condition combinations.
If you don't want to count the classes within each sample, but rather want one entry per unique combination of class and sample, you could add the line distinct(sample, class) at the start of the pipeline (above count(sample, class)). With this alternative calculation, the percentages within each sample will always be equal.
Fantastic, thanks a lot for the great help! This is beautiful!
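The distinct() variant suggested in the comments might look like this; a sketch assuming dplyr is installed, using the extended example data from the question's edit (rebuilt with rep()):

```r
library(dplyr)

# Extended example data: cond1 has two class2 rows.
df <- data.frame(
  sample = c(rep(c("cond1", "cond2", "cond3", "cond4"), each = 3), "cond1"),
  class  = c(rep(c("class1", "class2", "class3"), times = 4), "class2")
)

# distinct() keeps a single row per sample/class pair, so the duplicate
# cond1/class2 row no longer counts twice; every class present in a
# sample then gets an equal share of that sample.
shares <- df %>%
  distinct(sample, class) %>%
  count(sample, class) %>%
  group_by(sample) %>%
  mutate(pct = n / sum(n)) %>%
  ungroup()
```

The result can be passed to the same ggplot() call as in the answer.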

Use the scales package:

library(ggplot2)
library(scales)
ggplot(data=df, aes(x=sample, fill=class)) +
  coord_flip() +
  geom_bar(position=position_fill(reverse=TRUE), width=0.7) +
  scale_y_continuous(labels = percent_format())

1 Comment

Thanks for the advice; that was something I also wanted to address later on. But what I'm actually trying to do here is add a label to every bar segment, e.g. using geom_text() or geom_label(), showing the percentage of each class in that condition.
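For labels computed on the fly, a sketch that lets ggplot2 do the counting itself, assuming ggplot2 3.3.0 or later (for after_stat()), the scales package, and a single panel (no facets). geom_text(stat = "count") counts rows per sample/class just as geom_bar() does, so there is exactly one label per combination; ave() then totals the counts of each bar:

```r
library(ggplot2)
library(scales)

# Extended example data: cond1 has two class2 rows.
df <- data.frame(
  sample = c(rep(c("cond1", "cond2", "cond3", "cond4"), each = 3), "cond1"),
  class  = c(rep(c("class1", "class2", "class3"), times = 4), "class2")
)

p <- ggplot(df, aes(x = sample, fill = class)) +
  geom_bar(position = position_fill(reverse = TRUE), width = 0.7) +
  geom_text(
    # count / (total count of the bar at the same x position), as a percent
    aes(label = after_stat(percent(count / ave(count, x, FUN = sum), accuracy = 1))),
    stat = "count",
    # same reverse as the bars, vjust = 0.5 centres each label in its segment
    position = position_fill(vjust = 0.5, reverse = TRUE)
  ) +
  coord_flip()
```

Using the same position_fill(reverse = TRUE) for bars and text keeps each label in its own segment; with facets, x positions repeat across panels, so the ave() grouping would need to include the panel as well.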
