The following code worked in ggplot2 before I updated to version 2.2.0. Now I get the error "Aesthetics must be either length 1 or the same as the data (30): x, y, xend, yend". The error is caused by the two geom_segment calls.

drug1 <- c(.7, -1.6, -.2, -1.2, -.1, 3.4, 3.7, .8, 0, 2)
drug2 <- c(1.9, .8, 1.1, .1, -.1, 4.4, 5.5, 1.6, 4.6, 3.4)
d <- data.frame(Drug=c(rep('Drug 1', 10), rep('Drug 2', 10),
                  rep('Difference', 10)),
                extra=c(drug1, drug2, drug2 - drug1))

ggplot(d, aes(x=Drug, y=extra)) + 
  geom_boxplot(col='lightyellow1', alpha=.3, width=.5) + 
  geom_dotplot(binaxis='y', stackdir='center', position='dodge') +
  stat_summary(fun.y=mean, geom="point", col='red', shape=18, size=5) +
  geom_segment(aes(x=rep('Drug 1', 30), xend=rep('Drug 2', 30), y=drug1, yend=drug2),
               col=gray(.8)) +
  geom_segment(aes(x='Drug 1', xend='Difference', y=drug1, yend=drug2 - drug1),
               col=gray(.8)) +
  xlab('') + ylab('Extra Hours of Sleep') + coord_flip()

Update: Improved code that works:

library(ggplot2)
drug1 <- c(.7, -1.6, -.2, -1.2, -.1, 3.4, 3.7, .8, 0, 2)
drug2 <- c(1.9, .8, 1.1, .1, -.1, 4.4, 5.5, 1.6, 4.6, 3.4)
d <- data.frame(Drug=c(rep('Drug 1', 10), rep('Drug 2', 10),
                  rep('Difference', 10)),
                extra=c(drug1, drug2, drug2 - drug1))
w <- data.frame(drug1, drug2, diff=drug2 - drug1)

ggplot(d, aes(x=Drug, y=extra)) +
  geom_boxplot(col='lightyellow1', alpha=.3, width=.5) + 
  geom_dotplot(binaxis='y', stackdir='center', position='dodge') +
  stat_summary(fun.y=mean, geom="point", col='red', shape=18, size=5) +
  geom_segment(data=w, aes(x='Drug 1', xend='Drug 2', y=drug1, yend=drug2),
               col=gray(.8)) +
  geom_segment(data=w, aes(x='Drug 1', xend='Difference', y=drug1, yend=drug2 - drug1),
               col=gray(.8)) +
  xlab('') + ylab('Extra Hours of Sleep') + coord_flip()
  • Your drug1 and drug2 are both length 10. Try y=rep(drug1, 3) and yend=rep(drug2, 3). (I also think it would be nicer to add these to a second data frame rather than leaving ggplot to look in the global env.) Commented Dec 10, 2016 at 18:46
  • Excellent. I'm improving the code as you suggest, in the original posting. Commented Dec 10, 2016 at 21:44
  • @FrankHarrell so I understand the context, are drug1 and drug2 paired values (e.g. associated with the same subject)? Commented Dec 10, 2016 at 21:55
  • Correct, as in a crossover study. Commented Dec 10, 2016 at 22:23
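
The length mismatch pointed out in the comments can be checked directly in base R. This is a minimal sketch that uses only the vectors and data frames from the question; per the error message, each vector passed to aes() must have length 1 or one entry per row of the layer's data.

```r
drug1 <- c(.7, -1.6, -.2, -1.2, -.1, 3.4, 3.7, .8, 0, 2)
drug2 <- c(1.9, .8, 1.1, .1, -.1, 4.4, 5.5, 1.6, 4.6, 3.4)
d <- data.frame(Drug=c(rep('Drug 1', 10), rep('Drug 2', 10),
                  rep('Difference', 10)),
                extra=c(drug1, drug2, drug2 - drug1))

nrow(d)                 # number of rows the geom_segment layers inherit from d
length(drug1)           # shorter than nrow(d) and not 1, hence the error
length(rep(drug1, 3))   # same length as d, as suggested in the first comment

# A separate data frame with one row per pair avoids the repetition altogether
w <- data.frame(drug1, drug2, diff=drug2 - drug1)
nrow(w)                 # one row per segment to draw
```

Passing data=w to each geom_segment call, as in the updated code, means the aesthetics are evaluated against ten rows rather than thirty.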

1 Answer

The updated version of the code produces a data frame d that looks like this:

drug1 <- c(.7, -1.6, -.2, -1.2, -.1, 3.4, 3.7, .8, 0, 2)
drug2 <- c(1.9, .8, 1.1, .1, -.1, 4.4, 5.5, 1.6, 4.6, 3.4)
d <- data.frame(Drug=c(rep('Drug 1', 10), rep('Drug 2', 10),
                  rep('Difference', 10)),
                extra=c(drug1, drug2, drug2 - drug1))

> d
         Drug extra
1      Drug 1   0.7
2      Drug 1  -1.6
3      Drug 1  -0.2
4      Drug 1  -1.2
5      Drug 1  -0.1
6      Drug 1   3.4
7      Drug 1   3.7
8      Drug 1   0.8
9      Drug 1   0.0
10     Drug 1   2.0
11     Drug 2   1.9
12     Drug 2   0.8
13     Drug 2   1.1
14     Drug 2   0.1
15     Drug 2  -0.1
16     Drug 2   4.4
17     Drug 2   5.5
18     Drug 2   1.6
19     Drug 2   4.6
20     Drug 2   3.4
21 Difference   1.2
22 Difference   2.4
23 Difference   1.3
24 Difference   1.3
25 Difference   0.0
26 Difference   1.0
27 Difference   1.8
28 Difference   0.8
29 Difference   4.6
30 Difference   1.4

This is a problematic way to create the data-frame for two reasons:

  1. The values of drug1 and drug2 exist both as vectors in the global environment and as copies within the data frame d (in the extra column). This creates the potential for confusion, masking, and other errors.

  2. The only way Difference is tied to the values that produced the difference is the row ordering. For instance, the values in row 1 and row 11 produced the difference in row 21. This can create problems if you do any later modification of the data set.
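
To make the second point concrete, here is a small sketch of how a later modification can break the positional link. The row removal is a hypothetical edit chosen for illustration, and d_sub, intact_ok, and shifted_ok are names introduced here, not part of the original code.

```r
drug1 <- c(.7, -1.6, -.2, -1.2, -.1, 3.4, 3.7, .8, 0, 2)
drug2 <- c(1.9, .8, 1.1, .1, -.1, 4.4, 5.5, 1.6, 4.6, 3.4)
d <- data.frame(Drug=c(rep('Drug 1', 10), rep('Drug 2', 10),
                  rep('Difference', 10)),
                extra=c(drug1, drug2, drug2 - drug1))

# In the untouched data, rows 1 and 11 reproduce the difference in row 21
intact_ok <- isTRUE(all.equal(d$extra[11] - d$extra[1], d$extra[21]))

# Hypothetical edit: drop the third Drug 1 observation
d_sub <- d[-3, ]

# The same positions now hold values from different subjects
shifted_ok <- isTRUE(all.equal(d_sub$extra[11] - d_sub$extra[1],
                               d_sub$extra[21]))

intact_ok    # the positional rule holds before the edit
shifted_ok   # and no longer holds afterwards
```

Nothing in d_sub signals that the rows have shifted, which is why an explicit linking variable is safer than relying on position.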

I would suggest creating the data-frame in a manner like this:

d2 <- data.frame(
  pair = 1:10,
  drug1 = c(.7, -1.6, -.2, -1.2, -.1, 3.4, 3.7, .8, 0, 2),
  drug2 = c(1.9, .8, 1.1, .1, -.1, 4.4, 5.5, 1.6, 4.6, 3.4)
) 

   pair drug1 drug2
1     1   0.7   1.9
2     2  -1.6   0.8
3     3  -0.2   1.1
4     4  -1.2   0.1
5     5  -0.1  -0.1
6     6   3.4   4.4
7     7   3.7   5.5
8     8   0.8   1.6
9     9   0.0   4.6
10   10   2.0   3.4

There is an explicit pair variable that links the values, and no extra copies of drug1 and drug2 exist outside of d2.
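
As a rough illustration of why this layout is more robust, the difference can be stored next to the values that produced it. The diff column and the reordering step are assumptions added here for the example; d2_sorted and still_linked are hypothetical names.

```r
d2 <- data.frame(
  pair = 1:10,
  drug1 = c(.7, -1.6, -.2, -1.2, -.1, 3.4, 3.7, .8, 0, 2),
  drug2 = c(1.9, .8, 1.1, .1, -.1, 4.4, 5.5, 1.6, 4.6, 3.4)
)

# Keep the difference in the same row as the measurements it came from
d2$diff <- d2$drug2 - d2$drug1

# Hypothetical later modification: reorder rows by the size of the difference
d2_sorted <- d2[order(d2$diff), ]

# Every row still carries its own pair id and both measurements
still_linked <- all(d2_sorted$diff == d2_sorted$drug2 - d2_sorted$drug1)
still_linked
```

Because each row is self-contained, sorting or subsetting d2 cannot separate a difference from its pair.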

You can then use tidyr to convert to tidy/long format (for nice use with ggplot and modeling packages):

tidyr::gather(d2, drug, value, drug1, drug2)

   pair  drug value
1     1 drug1   0.7
2     2 drug1  -1.6
3     3 drug1  -0.2
4     4 drug1  -1.2
5     5 drug1  -0.1
6     6 drug1   3.4
7     7 drug1   3.7
8     8 drug1   0.8
9     9 drug1   0.0
10   10 drug1   2.0
11    1 drug2   1.9
12    2 drug2   0.8
13    3 drug2   1.1
14    4 drug2   0.1
15    5 drug2  -0.1
16    6 drug2   4.4
17    7 drug2   5.5
18    8 drug2   1.6
19    9 drug2   4.6
20   10 drug2   3.4
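
For readers who would rather not load an extra package, the same long layout can be sketched with base R's reshape(). The column names drug and value are chosen to mirror the gather() call above, and long is a name introduced here for illustration.

```r
d2 <- data.frame(
  pair = 1:10,
  drug1 = c(.7, -1.6, -.2, -1.2, -.1, 3.4, 3.7, .8, 0, 2),
  drug2 = c(1.9, .8, 1.1, .1, -.1, 4.4, 5.5, 1.6, 4.6, 3.4)
)

# Stack drug1 and drug2 into one value column, labelled by a drug column
long <- reshape(d2, direction = "long",
                varying = c("drug1", "drug2"),  # wide columns to stack
                v.names = "value",              # name of the stacked column
                timevar = "drug",               # column recording the source
                times = c("drug1", "drug2"),    # labels written into drug
                idvar = "pair")                 # identifies each subject

# Replace the generated row names (such as "1.drug1") with plain numbering
rownames(long) <- NULL
long
```

The call is more verbose than gather(), but it needs nothing beyond base R and keeps the pair column that links the two measurements.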

3 Comments

I see what you are getting at but that is a very long way to do it in my humble opinion. A separate package should not be needed for this application. I prefer the original but you are right it would be better to not have variables hanging around in the global environment. The improved code I posted is not confused by these global variables though.
When I look at the updated section of your original post, the two issues still remain. Not having an explicit variable like pair could easily lead to problems.
We'll have to agree to disagree on that point. I don't need pair when I have direct access to the two paired measurements.
