Sorry to make this more complicated, but I think as it stands the diagram gives the wrong impression of how LDA works. I mean, it does give a rough visual intuition of two-group LDA, but it could really wrong-foot someone who was trying to get a mathematical or geometrical understanding of how it works.
The LD vector is not just the line connecting the raw means of each group (unless both groups have equal and spherical covariance, which in this case they don't). It is only equivalent to the line joining the group means after the original data has undergone a whitening transform (i.e. a linear transformation that makes the covariance ellipses circular).
The discriminant boundary is orthogonal to the LD vector, but it is not in general orthogonal to the line connecting the means of the two groups in raw data space.
For your diagram to work as an accurate representation of LDA in its current format, you would need to show the data after a whitening transform.
Let's set-up the same way as you did in your gist:
library(MASS)
library(tidyverse)
set.seed(1234)
n <- 100
mu1 <- c(3,4)
mu2 <- c(6,2)
Sigma <- matrix(c(2, 1, 1, 2), 2, 2)
X1 <- mvrnorm(n, mu1, Sigma) |>
as.data.frame() |>
setNames(nm = c("y1", "y2")) |>
cbind(group = "A")
X2 <- mvrnorm(n, mu2, Sigma) |>
as.data.frame() |>
setNames(nm = c("y1", "y2")) |>
cbind(group = "B")
X <- rbind(X1, X2)
Suppose now we make this easier to transform and visualise by doing away with the ggplot grid and annotations altogether and create our own background grid
grid <- data.frame(x1 = c(seq(-10, 10, 2.5), rep(-10, 9)),
y1 = c(rep(-10, 9), seq(-10, 10, 2.5)),
x2 = c(seq(-10, 10, 2.5), rep(10, 9)),
y2 = c(rep(10, 9), seq(-10, 10, 2.5)),
wt = rep(c(0.2, 0.1, 0.2, 0.1, 0.5, 0.1, 0.2, 0.1, 0.2), 2)
)
Then we can make a plot that shows our initial data:
ggplot(X, aes(y1, y2, color = group)) +
geom_segment(aes(x = x1, y = y1, xend = x2, yend = y2, linewidth = I(wt)),
colour = "gray50", data = grid) +
geom_point() +
stat_ellipse(geom = "polygon", aes(fill = group), level = 0.68, alpha = 0.2) +
theme_void(16) +
coord_equal(xlim = c(-2.5, 10), ylim = c(-2.5, 10))

If you want to demonstrate how an LDA works geometrically, the first thing that needs to happen is you center your data, so you could plot this
X_centered <- scale(as.matrix(X[1:2]), center = TRUE, scale = FALSE) |>
as.data.frame() |>
cbind(group = X$group)
ggplot(X_centered, aes(y1, y2, color = group)) +
geom_segment(aes(x = x1, y = y1, xend = x2, yend = y2, linewidth = I(wt)),
colour = "gray50", data = grid) +
geom_point() +
stat_ellipse(geom = "polygon", aes(fill = group), level = 0.68, alpha = 0.2) +
theme_void(16) +
coord_equal(xlim = c(-6.25, 6.25), ylim = c(-6.25, 6.25))

The next thing you need to do is whiten the data (so that the intragroup covariance is spherical). This is a linear transformation, and we can apply it to our grid as well so that we know what the transformation is actually doing, effectively squeezing the grid in the Northeast-Southwest direction.
e <- eigen((cov(X1[1:2]) + cov(X2[1:2])) / 2)
white_mat <- e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors)
X_white <- as.matrix(X_centered[1:2]) %*% white_mat
X_white <- as.data.frame(X_white) |>
cbind(group = X$group) |>
rename(y1 = V1, y2 = V2)
grid_white <- cbind(as.data.frame(as.matrix(grid[1:2]) %*% white_mat),
as.data.frame(as.matrix(grid[3:4]) %*% white_mat),
grid[5]) |> setNames(names(grid))
ggplot(X_white, aes(y1, y2, color = group)) +
geom_segment(aes(x = x1, y = y1, xend = x2, yend = y2, linewidth = I(wt)),
colour = "gray50", data = grid_white) +
geom_point() +
stat_ellipse(geom = "polygon", aes(fill = group), level = 0.68, alpha = 0.2) +
theme_void(16) +
coord_equal(xlim = c(-6.25, 6.25), ylim = c(-6.25, 6.25))

We can see that the covariance ellipses are now approximately circular. (The reason why they are not exactly circular is that though they were drawn from a bivariate normal distribution with the same covariance, neither sample actually has that exact covariance - this is why we had to take their mean covariance for our eigen() decomposition).
Now we can rotate this around the origin so that the group means and the origin all lie along a single horizontal line. We can do this by using the normalized vector between the two group means as a basis for a rotation matrix.
Z <- as.matrix(X_white[1:2])
v <- colMeans(Z[X_white$group == "B", ]) - colMeans(Z[X_white$group == "A", ])
v_len <- sqrt(sum(v^2))
u <- v / v_len
rotate_mat <- t(rbind(u, c(-u[2], u[1])))
X_rot <- (Z %*% rotate_mat) |>
as.data.frame() |>
setNames(c("y1", "y2"))
X_rot$group <- X_white$group
grid_rot <- cbind(as.data.frame(as.matrix(grid_white[1:2]) %*% rotate_mat),
as.data.frame(as.matrix(grid_white[3:4]) %*% rotate_mat),
grid[5]) |> setNames(names(grid))
ggplot(X_rot, aes(y1, y2, color = group)) +
geom_segment(aes(x = x1, y = y1, xend = x2, yend = y2, linewidth = I(wt)),
colour = "gray50", data = grid_rot) +
geom_point() +
stat_ellipse(geom = "polygon", aes(fill = group), level = 0.68, alpha = 0.2) +
theme_void(16) +
coord_equal(xlim = c(-6.25, 6.25), ylim = c(-6.25, 6.25))

Now we can draw a line between the group means and project down the centre to get our LD1 density image:
A <- colMeans(X_rot[X_rot$group == "A", 1:2])
B <- colMeans(X_rot[X_rot$group == "B", 1:2])
ggplot(X_rot, aes(y1, y2 + 5, color = group)) +
geom_segment(aes(x = x1, y = y1 + 5, xend = x2, yend = y2 + 5,
linewidth = I(wt)),
colour = "gray50", data = grid_rot) +
geom_point() +
stat_ellipse(geom = "polygon", aes(fill = group), level = 0.68, alpha = 0.2) +
annotate("point", x = c(A[1], B[1]), y = c(A[2], B[2]) + 5) +
annotate("segment", x = A[1], xend = B[1], y = A[2] + 5, yend = B[2] + 5) +
annotate("segment", x = 0, y = 5, xend = 0, yend = 0, linetype = 2) +
geom_density(aes(x = y1, y = after_stat(density) * 4, fill = group),
inherit.aes = FALSE, alpha = 0.5) +
annotate("text", x = c(seq(-5, 5, 2.5)), y = -0.5,
label = seq(-5, 5, 2.5)) +
annotate("segment", x = c(seq(-5, 5, 2.5)), y = -0.2, yend = 0) +
annotate("text", x = 0, y = -1, label = "LD1") +
theme_void(16) +
coord_equal(xlim = c(-5, 5), ylim = c(-1, 9))

To prove that this geometrical interpretation really shows the LDA, we can take the values from an "actual" LDA and compare them to the values we have plotted. We can see they are the same (within the limits of numerical precision expected from different calculation methods)
LDA_values <- predict(lda(group ~ y1 + y2, data = X))$x
our_values <- X_rot$y1
differences <- abs(LDA_values - our_values)
all(differences < 10^-15)
#> [1] TRUE
Now, you could reverse all the transformations and show the original data with the LDA below it, and it would remain accurate. But the discrimination line would no longer be at 90 degrees to the line joining your group means, the bumps on the LDA plot would no longer be orthogonal projections from your point clouds, the scale on the LDA plot would be different from the scale in data space, and it would no longer be parallel to the LDA vector, so the explanatory power of the diagram as a visual representation of LDA would be lost.
I know my sequence of plots here is a lot more complicated to code, but it has the virtue of being a genuine geometric interpretation of LDA that gives a clear intuition of the process.