I have a dataset (1000 IDs, 9 classes) similar to this one:

ID     Class     Value
1      A         0.014
1      B         0.665
1      C         0.321
2      A         0.234
2      B         0.424
2      C         0.342
...    ...       ...

The Value column holds relative abundances, i.e. the values of all classes for one individual sum to 1.

I would like to create a ggplot geom_bar plot in R where the x axis is not ordered by IDs but by decreasing class abundance, similar to this one:

[image of the desired plot, not reproduced here]

In our example, let's say that Class B is the most abundant class across all individuals, followed by Class C and finally Class A. The first bar on the x axis would then be the individual with the highest Class B value, the second bar would be the individual with the second highest Class B value, etc.

This is what I tried:

ggplot(df, aes(x=ID, y=Value, fill=Class)) +
  geom_bar(stat="identity") +
  xlab("") +
  ylab("Relative Abundance\n")

Comments

  • You might find a hint here: stackoverflow.com/questions/25664007/… Commented Sep 25, 2018 at 8:57
  • Possible duplicate of Reorder bars in geom_bar ggplot2 Commented Sep 25, 2018 at 9:02
  • Thank you, I saw this post before but it takes into account only the values, and not the classes and I would like to manually sort the classes in this order: B > C > A. Commented Sep 25, 2018 at 9:04

1 Answer

You can do the reordering before passing the result to ggplot():

library(dplyr)
library(ggplot2)

# sum the abundance for each class, across all IDs, & sort the result
sort.class <- df %>% 
  count(Class, wt = Value) %>%
  arrange(desc(n)) %>%
  pull(Class)

# get ID order, sorted by each ID's abundance in the most abundant class
ID.order <- df %>%
  filter(Class == sort.class[1]) %>%
  arrange(desc(Value)) %>%
  pull(ID)

# factor ID / Class in the desired order
df %>%
  mutate(ID = factor(ID, levels = ID.order)) %>%
  mutate(Class = factor(Class, levels = rev(sort.class))) %>%
  ggplot(aes(x = ID, y = Value, fill = Class)) +
  geom_col(width = 1) # geom_col() is equivalent to geom_bar(stat = "identity")

[image of the resulting plot, not reproduced here]
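
If the class order should be fixed by hand, as requested in the comments (B > C > A), rather than derived from the class totals, the same approach works with a hard-coded vector in place of sort.class. The following is a minimal sketch under that assumption. The names manual.class and plot.df are made up for illustration, and the toy df just reuses the six rows shown in the question:

```r
library(dplyr)
library(ggplot2)

# Toy data: the six rows shown in the question
df <- data.frame(
  ID    = c(1, 1, 1, 2, 2, 2),
  Class = c("A", "B", "C", "A", "B", "C"),
  Value = c(0.014, 0.665, 0.321, 0.234, 0.424, 0.342)
)

# Assumed manual ordering; the first entry is the class that drives the ID order
manual.class <- c("B", "C", "A")

# IDs sorted by decreasing abundance in the first class of the manual ordering
ID.order <- df %>%
  filter(Class == manual.class[1]) %>%
  arrange(desc(Value)) %>%
  pull(ID)

# Factor ID / Class in the desired order (levels reversed as in the code above)
plot.df <- df %>%
  mutate(ID = factor(ID, levels = ID.order),
         Class = factor(Class, levels = rev(manual.class)))

p <- ggplot(plot.df, aes(x = ID, y = Value, fill = Class) ) +
  geom_col(width = 1)
p
```

Only the source of the class order changes here. The ID ordering and the plotting call are the same as in the code above.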

Sample data:

library(tidyr)

set.seed(1234)
df <- data.frame(
  ID = seq(1, 100),
  A = sample(seq(2, 3), 100, replace = TRUE),
  B = sample(seq(5, 9), 100, replace = TRUE),
  C = sample(seq(3, 7), 100, replace = TRUE),
  D = sample(seq(1, 2), 100, replace = TRUE)
) %>%
  gather(Class, Value, -ID) %>%
  group_by(ID) %>%
  mutate(Value = Value / sum(Value)) %>%
  ungroup() %>% 
  arrange(ID, Class)

> df
# A tibble: 400 x 3
      ID Class  Value
   <int> <chr>  <dbl>
 1     1 A     0.143 
 2     1 B     0.357 
 3     1 C     0.429 
 4     1 D     0.0714
 5     2 A     0.176 
 6     2 B     0.412 
 7     2 C     0.294 
 8     2 D     0.118 
 9     3 A     0.2   
10     3 B     0.4   
# ... with 390 more rows
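
One possible refinement, not part of the approach above: when several IDs have the same value in the most abundant class, their relative order is decided only by row order. A hedged sketch of breaking such ties with the next classes in sort.class is shown below. It uses a small made-up data frame (three IDs, with IDs 1 and 2 tied on class B) and base R's order() with several keys. The names wide and keys are illustrative:

```r
library(dplyr)
library(tidyr)

# Made-up toy data: IDs 1 and 2 tie on class B
df <- data.frame(
  ID    = rep(1:3, each = 3),
  Class = rep(c("A", "B", "C"), times = 3),
  Value = c(0.2, 0.5, 0.3,
            0.1, 0.5, 0.4,
            0.3, 0.6, 0.1)
)

# Classes sorted by total abundance, as in the answer
sort.class <- df %>%
  count(Class, wt = Value) %>%
  arrange(desc(n)) %>%
  pull(Class)

# One row per ID, one column per class
wide <- df %>%
  pivot_wider(names_from = Class, values_from = Value)

# order() accepts several keys; later keys break ties in earlier ones.
# Negating the values gives decreasing order for each class.
keys <- lapply(sort.class, function(cl) -wide[[cl]])
ID.order <- wide$ID[do.call(order, keys)]
```

ID.order can then be passed to factor(ID, levels = ID.order) exactly as before. This assumes every ID has a row for every class; missing combinations would show up as NA after pivot_wider() and would need to be handled first.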