I have a ggplot2 plot such as the following:

mtcars %>%
  group_by(gear, carb) %>%
  summarise(
    avg = mean(mpg),
    n = n(),
    gear = gear,
    carb = carb
  ) %>%
  ggplot(aes(
    x = factor(gear),
    y = avg,
    color = carb,
    group = carb
  )) +
  geom_point(position = "dodge")

Which renders this:

[Image: point plot of average mpg by gear, colored by carb]

I now need to plot the distribution across the three categories on the x axis (gear), using the value of n as the number of observations. I want a smooth distribution curve of n across the gears, one for each carb color.

I attempt to use the following:

mtcars %>% 
  group_by(gear,carb) %>%
  summarise(
    avg = mean(mpg), 
    n = n(),
    gear=gear,
    carb=carb
    ) %>% 
  ggplot(aes(
    x=factor(gear),
    y=avg,
    color = carb,
    group=carb)) +
  geom_point(position = "dodge") +
  geom_density(aes(x=factor(gear),y=n,color=carb))

But I receive the error:

Problem while setting up geom.
ℹ Error occurred in the 2nd layer.
Caused by error in `compute_geom_1()`:
! `geom_density()` requires the following missing aesthetics: x
Backtrace:
  1. base (local) `<fn>`(x)
  2. ggplot2:::print.ggplot(x)
  4. ggplot2:::ggplot_build.ggplot(x)
  5. ggplot2:::by_layer(...)
 12. ggplot2 (local) f(l = layers[[i]], d = data[[i]])
 13. l$compute_geom_1(d)
 14. ggplot2 (local) compute_geom_1(..., self = self)

I have tried placing the variables both inside and outside aes(), but nothing has allowed me to generate this plot. How can I generate distribution plots and add them to the existing ggplot? I recognize I will have to rescale the value of n to fit the space below the points in the plot, but how can I set up the plot so that geom_density accepts n as the count variable?

1 Answer

The simple solution is to use stat = "identity" in the geom_density call:

library(tidyverse)

mtcars %>% 
  group_by(gear,carb) %>%
  summarise(
    avg = mean(mpg), 
    n = n()
  ) %>% 
  ggplot(aes(
    x=factor(gear),
    y=avg,
    color = carb,
    group=carb)) +
  geom_point(position = "dodge") +
  geom_density(aes(x=factor(gear),y=n,color=carb),
               stat = "identity")
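With stat = "identity" the layer draws the supplied y values as they are instead of estimating a density from x, so n has to exist as a column in the data. As a minimal sketch of the table that layer ends up drawing, the following uses dplyr::count() as a shortcut for group_by() plus summarise(n = n()); the name cell_counts is only illustrative:

```r
library(dplyr)

# One row per gear/carb combination that occurs in mtcars;
# n is the number of cars in that combination.
cell_counts <- mtcars |>
  count(gear, carb)

cell_counts
```

Combinations that never occur (for example gear 4 with carb 3) have no row at all, which is why the lines in the plot can have gaps.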

A slightly more complex solution (in case that's what you're looking for) is to create two graphs with two separate y axes and place them one on top of the other using the patchwork package. For example (with some simplified code):

library(patchwork)

g1 <- mtcars |> 
  ggplot(aes(gear, mpg, colour = factor(carb))) +
  stat_summary(geom = "point", fun = mean)

g2 <- mtcars |> 
  ggplot(aes(gear, colour = factor(carb))) +
  stat_count(geom = "density",
             aes(y = after_stat(count)),
             position = "identity") +
  theme(legend.position = "none")

g1 / g2  + plot_layout(heights = 2:1, guides = "collect")

Edit - using a smooth density

You can use geom_density on its own (still with y = after_stat(count) to get counts rather than densities). Because this is a kernel density estimate, the value at each gear is also influenced by the neighbouring gears, so the curve never quite passes through the exact integer counts:

library(tidyverse)
library(patchwork)

g1 <- mtcars |> 
  ggplot(aes(factor(gear), mpg, colour = factor(carb))) +
  stat_summary(geom = "point", fun = mean)

g2 <- mtcars |> 
  ggplot(aes(gear, colour = factor(carb))) +
  geom_density(aes(y = after_stat(count))) +
  theme(legend.position = "none")

g1 / g2  + plot_layout(heights = 2:1, guides = "collect")
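To see why the smoothed curve does not hit the exact counts, here is a rough base R sketch. It assumes the default Gaussian kernel and bandwidth of density(), and it pools all carb values together for simplicity (the plot above estimates one curve per carb group), so the numbers are illustrative only:

```r
# Kernel density estimate of gear, scaled to counts (density * n),
# compared with the exact tabulated counts at gear = 3, 4 and 5.
gears <- mtcars$gear
kde <- density(gears)  # default kernel and bandwidth

# Interpolate the scaled density curve at the three observed gear values.
kde_scaled <- approx(kde$x, kde$y * length(gears), xout = c(3, 4, 5))$y
exact <- as.vector(table(gears))

comparison <- data.frame(gear = c(3, 4, 5), exact = exact, kde_scaled = kde_scaled)
comparison
```

The exact column holds the integer counts, while kde_scaled holds the smoothed values, which differ because each observation's kernel spreads mass onto nearby x values.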

Or perhaps stat_smooth will work nicely here (with some tinkering to add the zero counts to the data):

library(tidyverse)

counts <- mtcars %>% 
  group_by(gear,carb) %>%
  summarise(
    avg = mean(mpg), 
    n = n()
  ) 
#> `summarise()` has grouped output by 'gear'. You can override using the
#> `.groups` argument.


counts |> 
  ungroup() |> 
  expand(gear, carb) |> 
  left_join(counts) |> 
  replace_na(list(n = 0)) |> 
  ggplot(aes(gear, colour = factor(carb))) +
  stat_smooth(aes(y = n), method = "loess", se = FALSE) 
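The zero-filling step can probably be written more compactly. As a sketch of an alternative (not the code used above), tidyr::complete() expands to every gear/carb combination and fills the missing counts in one call; counts_filled is a hypothetical name:

```r
library(dplyr)
library(tidyr)

# Count cars per gear/carb cell, then add the combinations that never
# occur in mtcars with n = 0 so every carb has a value at every gear.
counts_filled <- mtcars |>
  count(gear, carb) |>
  complete(gear, carb, fill = list(n = 0L))

counts_filled
```

The result could then be passed to the same ggplot() and stat_smooth() calls in place of the expand(), left_join() and replace_na() chain.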

3 Comments

Thank you, your solutions are great. One question: is there any way to smooth the curve so that it looks more like a distribution plot? I know the factor has few levels and low variation, but is it possible to smooth it like a common frequency distribution?
I have added another set of code as an experimental way of getting a smoothed curve, although the density smoothing gives some odd results on a binned discrete scale. Perhaps this is a better way of getting smoothed lines?
These are both great! Thanks for coming up with these. Truly clever approach!
