3

I am working with a hierarchical model (lmerMod) that has an outcome and two categorical variables, cat1 and cat2. I want to make an interaction plot using ggplot2:: that visualizes the model coefficients and takes into account two specific complexities:

Pairwise comparisons: I need to compare the coefficients for each level of cat2 (Control, T1, T2, T3) against each other within each level of cat1 (1 and 2). Statistical significance: I want to highlight the statistical significance of these comparisons using asterisks or some similar notation. Examples are ggsignif or ggpval packages, or ggpubr::stat_compare_means() function.

Requirements:

  • The geometry should be pointrange.
  • cat1 should be on the x-axis.
  • cat2 should define the color aesthetic.
  • I want to use the lmerTest package for model fitting (to get correct p-values).

Challenges:

  • Using ggeffects::hypothesis_test() takes a very long time for large datasets (N=46,000, G1=4,000, G2=40).
  • I struggle with ggplot solutions for having brackets (significance stars can be places with geom_text() and ifelse() I guess)

Reproducible Example:

Here's a simple R example using a toy dataset:

# Load required packages
library(lmerTest)
library(ggplot2)

# Create toy data
set.seed(123)
N <- 1000  # Increased sample size
G1 <- 20   # Increased number of levels for G1
G2 <- 10   # Increased number of levels for G2

df <- data.frame(
  outcome = rnorm(N),
  cat1 = factor(rep(1:2, each = N / 2)),
  cat2 = factor(rep(c("Control", "T1", "T2", "T3"), each = N / 4)),
  G1 = factor(rep(1:G1, each = N / G1)),
  G2 = factor(rep(1:G2, each = N / G2))
) 

# Fit the hierarchical model
fit <- lmerTest::lmer(outcome ~ cat1 * cat2 + (1|G1) + (1|G2), data = df)
summary(fit)

# Attempt to make the plot
interactions::cat_plot(model = fit, pred = cat1, modx = cat2, 
                       interval.geom = "linerange", colors = "Set1")
# ... (I want significance stars and brackets connecting the coefficients: this is where I am stuck)

Categorical by categorical interaction without significant differences

1 Answer 1

2

First, note that there are two problems with your data generation, when fitting the model

fixed-effect model matrix is rank deficient so dropping 4 columns / coefficients

boundary (singular) fit: see help('isSingular')

The first issue arises because there is perfect multicollinearity in the predictors. This means one or more predictors can be perfectly predicted using other predictors in the model. When this happens, the matrix of predictors (often called the design matrix) cannot be inverted, and the model coefficients cannot be uniquely estimated.

The second is likely due to there being too little variation in one of the random effects. for the sake of this answer I will just omit the random effect for G2. So, hopefully, I think we can achieve what you want with the following code:

library(lmerTest)
library(ggplot2)
library(effects)

# Create toy data
set.seed(123)
N <- 1000
G1 <- 20
G2 <- 10

df <- data.frame(
  outcome = rnorm(N),
  cat1 = factor(rep(1:2, each = N / 2)),
  cat2 = factor(sample(c("Control", "T1", "T2", "T3"), N, replace = TRUE)), # Shuffle the cat2 levels randomly
  G1 = factor(rep(1:G1, each = N / G1)),
  G2 = factor(rep(1:G2, each = N / G2))
)

# Fit the hierarchical model with interaction
fit <- lmerTest::lmer(outcome ~ cat1 * cat2 + (1|G1) + (1|G2), data = df)

# Extract effects of interaction
eff <- effect("cat1:cat2", fit)

# Convert effect object to dataframe for plotting
eff_df <- as.data.frame(eff)

# Rename columns for ease of use
names(eff_df) <- c("cat1", "cat2", "fit", "lower", "upper")

# Visualization using ggplot2
ggplot(eff_df, aes(x = cat1, y = fit, color = cat2, group = cat2)) +
  geom_point(aes(shape = cat2), size = 3) +
  geom_line(aes(linetype = cat2)) +
  geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.2) +
  labs(y = "Predicted Outcome", title = "Interaction Effect of cat1 and cat2") +
  theme_minimal()

which generates the following plot:

enter image description here

Sign up to request clarification or add additional context in comments.

4 Comments

Thank you. The data generation was just meant to produce a reproducible example. Your visualization does show these changes, but not the significance. In the example above was trying to add brackets with asterisks to highlight whether, e.g., there is a statistically significance increase from Cat2=="Control" to Cat2=="T1" when cat1==1. An example is in the ggsignif intro:cran.r-project.org/web/packages/ggsignif/vignettes/intro.html
Try not to worry about statistical significance. p values are not very useful. Effect sizes are far more important imho. Suppose you have an effect of -5, with a p-value of 0.051, which would not be significant at the 5% level. Compare that with an effect size of -0.1 with a p-value of 0.0499 which would be significant. Let's say it's a study about an intervention to reduce BMI in people with BMI> 30. A reduction of -5 is incredibly meaningful, whereas -0.1 is completely irrelevant.
I work in scientific research and we care for both substantial and statistical significance. You should not exclude that large effects are due to chance only. This is pretty common in underpowered studies.
Indeed, which is a good reason for ensuring sufficient power with a sample size calc first. Anyway, I'll see if I can incorporate ggsignif into my answer a bit later.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.