I want to add multiple vertical lines to my density plot that start at the x-axis and end at the curve, using ggplot2. I'm using the starwars dataset from dplyr and want to plot the height variable as a normal distribution. The dashed lines inside the curve should represent the standard deviations. So far I have this (just the plot without the lines):

sd.values = seq(66, 264, 34.77043)
zeros.vector = rep(0, 6)

ggplot(starwars, aes(x=height, y=dnorm(height, m=mean(height, na.rm=T), s=sd(height, na.rm=T)))) +
  geom_line() + labs(x='height', y='f(height)') +
  scale_x_continuous(breaks=sd.values,labels=sd.values)

density plot without lines

Now, I want to add the dashed lines using geom_segment:

ggplot(starwars, aes(x=height, y=dnorm(height, m=mean(height, na.rm=T), s=sd(height, na.rm=T))))+
  geom_line() + labs(x='height', y='f(height)') +
  scale_x_continuous(breaks=sd.values, labels=sd.values) +
  geom_segment((aes(x=sd.values, y=zeros.vector, xend=sd.values,
                    yend=dnorm(sd.values, m=mean(height, na.rm=T), s=sd(height, na.rm=T)))),
               linetype ='dashed')

But in the end, I only get the following error message:

Error: Aesthetics must be either length 1 or the same as the data (87): x, y, xend and yend

Any idea what I have to change in order to add the dashed lines?

2 Answers

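You need to give geom_segment its own data.frame (or tibble) through the data argument. That data can have a different number of rows than starwars. Also set inherit.aes = FALSE so the layer does not pick up the plot-level x and y mappings. For example: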

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(ggplot2)
sd.values = seq(66, 264, 34.77043)
# zeros.vector is no longer needed: the zeros live in the data frame passed to geom_segment

ggplot(starwars, aes(x=height, y=dnorm(height, mean=mean(height, na.rm=TRUE), sd=sd(height, na.rm=TRUE))))+
    geom_line() + labs(x='height', y='f(height)') +
    scale_x_continuous(breaks=sd.values, labels=sd.values) +
    # inherit.aes = FALSE keeps this layer from inheriting the plot-level x and y
    geom_segment(mapping = aes(x=SD, y=Zeros, xend=SD,
                      yend=dnorm(SD, mean=mean(starwars$height, na.rm=TRUE), sd=sd(starwars$height, na.rm=TRUE))),
                 linetype ='dashed', inherit.aes = FALSE, data=data.frame(SD=sd.values, Zeros=rep(0, 6)))
#> Warning: Removed 6 row(s) containing missing values (geom_path).

Created on 2020-12-27 by the reprex package (v0.3.0)


When you specify the data argument in ggplot(), that dataset becomes the default for every layer. Each aesthetic expression must then have length 1 or the same length as that dataset, unless you pass a different data argument to the geom. To avoid setting a default dataset, you can specify the data argument in the geoms instead.

library(tidyverse)

data(starwars)

sd.values <-  seq(66, 264, 34.77043)
mean_height <-  mean(starwars$height, na.rm = TRUE)
sd_height <-  sd(starwars$height, na.rm = TRUE)

ggplot() + 
  geom_line(data = starwars, 
            aes(x = height, y = dnorm(height, m = mean_height, sd = sd_height))) + 
  geom_segment(data = NULL, 
               aes(x = sd.values, xend = sd.values, 
                   y = 0, yend = dnorm(sd.values, m = mean_height, sd = sd_height)),
               linetype = 'dashed')

distribution graph

Note, though, that the following call fails with the same length error even though it specifies data = NULL, because ggplot2 replaces a NULL layer dataset with the plot's default, starwars.

ggplot(data = starwars, aes(x = height, y = dnorm(height, m = mean_height, sd = sd_height))) + 
  geom_line() + 
  geom_segment(data = NULL, 
               aes(x = sd.values, xend = sd.values, 
                   y = 0, yend = dnorm(sd.values, m = mean_height, sd = sd_height)))
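
To see where the failure comes from, here is a minimal sketch that compares the two lengths and catches the error. It assumes a reasonably recent ggplot2, where aesthetics are evaluated when the plot is built rather than when the layers are added; the names n_rows, n_segments and build_result are only for illustration.

```r
library(ggplot2)
library(dplyr)  # provides the starwars dataset

sd.values   <- seq(66, 264, 34.77043)
mean_height <- mean(starwars$height, na.rm = TRUE)
sd_height   <- sd(starwars$height, na.rm = TRUE)

# The default dataset and the segment positions have different lengths.
n_rows     <- nrow(starwars)
n_segments <- length(sd.values)

# Creating the plot object succeeds, because the aesthetics are not
# evaluated until the plot is built (or printed).
p <- ggplot(data = starwars,
            aes(x = height, y = dnorm(height, mean = mean_height, sd = sd_height))) +
  geom_line() +
  geom_segment(data = NULL,
               aes(x = sd.values, xend = sd.values,
                   y = 0, yend = dnorm(sd.values, mean = mean_height, sd = sd_height)))

# ggplot_build() forces the evaluation, so the length mismatch surfaces here.
build_result <- tryCatch(ggplot_build(p), error = function(e) e)
inherits(build_result, "error")
```

If build_result is an error object, conditionMessage(build_result) shows the message about aesthetic lengths.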

Alternatively, you can create a new dataset and specify that.

library(tidyverse)

data(starwars)

mean_height <-  mean(starwars$height, na.rm = TRUE)
sd_height <-  sd(starwars$height, na.rm = TRUE)

df <- data.frame(
  sd_values = seq(66, 264, 34.77043)
) %>% mutate(yend = dnorm(sd_values, mean_height, sd_height))


ggplot() + 
  geom_line(data = starwars, 
            aes(x = height, y = dnorm(height, m = mean_height, sd = sd_height))) + 
  geom_segment(data = df, 
               aes(x = sd_values, xend = sd_values, 
                   y = 0, yend = yend),
               linetype = 'dashed')
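
One thing to be aware of: seq(66, 264, 34.77043) starts at 66, so the lines are spaced one standard deviation apart but are not centred on the mean. If you want them at the mean plus or minus whole standard deviations, a sketch along these lines might work. The column k, the data frame sd_lines and the choice of three standard deviations either side are my assumptions, not something from the question.

```r
library(ggplot2)
library(dplyr)  # provides the starwars dataset and %>%

mean_height <- mean(starwars$height, na.rm = TRUE)
sd_height   <- sd(starwars$height, na.rm = TRUE)

# Assumption: one line at the mean and at 1, 2 and 3 standard deviations
# on either side. k and sd_lines are illustrative names.
sd_lines <- data.frame(k = -3:3) %>%
  mutate(x    = mean_height + k * sd_height,
         yend = dnorm(x, mean = mean_height, sd = sd_height))

p <- ggplot() +
  geom_line(data = starwars,
            aes(x = height, y = dnorm(height, mean = mean_height, sd = sd_height))) +
  geom_segment(data = sd_lines,
               aes(x = x, xend = x, y = 0, yend = yend),
               linetype = 'dashed') +
  scale_x_continuous(breaks = sd_lines$x, labels = round(sd_lines$x)) +
  labs(x = 'height', y = 'f(height)')
```

Printing p draws the plot. The segment layer only ever sees sd_lines, so its length does not have to match starwars.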
