
Is there a single test that is significant only when two coefficients in a linear model both differ from 0, and not significant when only one of them differs from 0? One can of course look at the model summary, but that does not give a single p-value for the null hypothesis that at most one of the coefficients differs from 0. I am aware of tests of the joint null that both coefficients are 0 (e.g., car::linearHypothesis()), but such a test can be significant when only one coefficient differs from 0, which is not what I'm interested in.

An example is below: I am looking for a method that is significant for y1 (where both coefficients differ from 0) but not for y2 (where only one does).

set.seed(1234)

dat = MASS::mvrnorm(n = 500,mu = rep(0,4),
                    Sigma = matrix(nrow = 4,byrow = TRUE,
                                    c(1,  .2,  .2,  .3,
                                     .2,   1,  .2,   0,
                                     .2,  .2,   1,   0,
                                     .3,   0,   0,   1) )) |> as.data.frame()

colnames(dat) = c("x1","x2","y1","y2")

summary(lm(y1 ~ x1 + x2,data = dat))

Call:
lm(formula = y1 ~ x1 + x2, data = dat)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.70450 -0.63760 -0.01951  0.68030  2.58319 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.04704    0.04171   1.128   0.2599    
x1           0.12099    0.04271   2.833   0.0048 ** 
x2           0.25624    0.04254   6.024 3.31e-09 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.9324 on 497 degrees of freedom
Multiple R-squared:  0.1052,    Adjusted R-squared:  0.1016 
F-statistic: 29.23 on 2 and 497 DF,  p-value: 9.978e-13


summary(lm(y2 ~ x1 + x2,data = dat))
Call:
lm(formula = y2 ~ x1 + x2, data = dat)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.4124 -0.5711 -0.0436  0.5844  3.2467 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.036480   0.041036  -0.889    0.374    
x1           0.242923   0.042027   5.780 1.32e-08 ***
x2           0.002013   0.041852   0.048    0.962    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.9174 on 497 degrees of freedom
Multiple R-squared:  0.06828,   Adjusted R-squared:  0.06454 
F-statistic: 18.21 on 2 and 497 DF,  p-value: 2.328e-08
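
For reference, the kind of joint test mentioned above (not what I'm after, since it also rejects when only one coefficient differs from 0) would look something like this:

# Joint F-test of H0: x1 = 0 and x2 = 0 (should reject for both y1 and y2 here)
car::linearHypothesis(lm(y1 ~ x1 + x2, data = dat), c("x1 = 0", "x2 = 0"))
car::linearHypothesis(lm(y2 ~ x1 + x2, data = dat), c("x1 = 0", "x2 = 0"))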
Comments:

  • I don't understand what information you think this would give you. Looking at the results, in the first model both variables are significant, in the second only one. If a test behaved exactly as you'd like, it would add zero information, because everything needed to see this is already there. (commented Aug 10 at 20:04)
  • I would like a single p-value for the test. This would allow, for example, for multiple-test correction for this type of hypothesis across multiple outcomes. (commented Aug 10 at 20:17)
  • I'm doubting myself, but I don't see why the maximum of the two p-values wouldn't work. Sure, it's not necessarily uniform under the null hypothesis, but that's normal because the null is composite (see stats.stackexchange.com/q/58929/341520 or statmodeling.stat.columbia.edu/2023/04/14/…). (commented Aug 10 at 21:50)
  • Investigate intersection-union and union-intersection tests. You're after a null that's a union of one-parameter nulls (you're in the null if any of them holds) and an alternative that's the intersection of their complements (you're only in the alternative if all of the one-parameter nulls are false). An intersection-union test rejects if all of the one-parameter tests reject; do each test at $\alpha$. The overall p-value is the lowest $\alpha$ at which you'd reject all the tests, which is indeed the larger of the component one-parameter p-values. NB: this can be conservative. ctd.. (commented Aug 10 at 23:00)
  • ctd... Casella and Berger has a section on both kinds of test. There's an introductory discussion of the intersection-union test here (uploaded by the author). (commented Aug 10 at 23:03)

1 Answer

Based on the excellent tips in the comments, I will try to answer my own question. The goal is a test that is significant only when two variables are both independently associated with the outcome, i.e., when each adds something unique beyond their shared variance. One way to do this is an intersection-union test (IUT): look at both coefficient p-values in the model, and take the larger of the two as the p-value of the test. I am also going to expand this to include the p-values when the variables are entered separately, because I want to rule out collider bias (i.e., each variable must be associated with the outcome regardless of whether the other is included). That gives 4 p-values, but no multiple-test correction is needed across the 4, because the least significant one is taken as the p-value of the test.
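
As a quick check on the data from the question (a sketch, reusing dat from above), taking the larger of the two coefficient p-values already behaves as desired: small for y1, large for y2.

# IUT p-value of a fitted model: the larger of the coefficient p-values
iut_p = function(fit) max(summary(fit)$coefficients[-1, 4])

iut_p(lm(y1 ~ x1 + x2, data = dat))  # ~0.005: both coefficients differ from 0
iut_p(lm(y2 ~ x1 + x2, data = dat))  # ~0.96:  only x1 differs from 0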

Here I'll simulate the distribution of p-values under various nulls, as well as when the alternative is true. We'll see that the null distribution is non-uniform but conservative (the p-values are stochastically larger than uniform). Because of this, BH-FDR correction can still be used (a small sketch is at the end of this answer), but approaches that assume a uniform null distribution (e.g., q-value FDR estimation) are inappropriate.

# First, a basic simulation: draw x1, x2, y1 from a multivariate normal with
# the given correlations, fit the joint and the two single-predictor models,
# and return the largest of the 4 p-values (the IUT p-value)
sim_IUT = function(size, x1_x2, x1_y, x2_y){

  dat = MASS::mvrnorm(n = size, mu = rep(0, 3),
                      Sigma = matrix(nrow = 3, byrow = TRUE,
                                     c(1,     x1_x2, x1_y,
                                       x1_x2, 1,     x2_y,
                                       x1_y,  x2_y,  1   ) )) |> as.data.frame()

  colnames(dat) = c("x1", "x2", "y1")

  m_all = lm(y1 ~ x1 + x2, data = dat)
  m_x1  = lm(y1 ~ x1     , data = dat)
  m_x2  = lm(y1 ~      x2, data = dat)

  # The 4 p-values: both coefficients from the joint model,
  # plus each coefficient from its single-predictor model
  ps = c(summary(m_all)$coefficients[-1, 4],
         summary(m_x1)$coefficients[-1, 4],
         summary(m_x2)$coefficients[-1, 4])

  return(max(ps))
}

Under a 'full' null, where neither variable is associated with the outcome, the p-value distribution is very conservative, with a false-positive rate of only 0.2% at the 0.05 level.

null_ps = sapply(1:10000,
                 function(X){sim_IUT(size = 500, x1_x2 = 0, x1_y = 0, x2_y = 0)})
sum(null_ps < 0.05)/10000
[1] 0.002
hist(null_ps)

[Figure: p-value distribution under the full null (conservative)]

The test also protects against false positives that could arise when only one variable is associated with the outcome, for example via collider bias. This null is still conservative, though with a somewhat unusual p-value distribution, and the false-positive rate is 2.5%.

null_ps2 = sapply(c(1:10000),function(X){sim_IUT(size = 500,x1_x2 = .3,x1_y = .2,x2_y = 0)})
sum(null_ps2 <0.05)/10000
[1] 0.025
hist(null_ps2)

[Figure: p-value distribution under the collider-bias null]

Even so, the p-value distribution looks as expected when the alternative is true.

true_ps = sapply(c(1:10000),function(X){sim_IUT(size = 500,x1_x2 = .3,x1_y = .1,x2_y = .15)})
sum(true_ps <0.05)/10000
[1] 0.1823
hist(true_ps)

[Figure: p-value distribution when the alternative is true]
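
To illustrate the point about FDR correction (a sketch, not part of the simulations above): because the p-values are valid, if conservative, under their nulls, BH correction can be applied across outcomes in the usual way, e.g. on a mix of null and non-null outcomes drawn from the simulations:

# Sketch: BH-FDR across 100 outcomes, half collider-bias nulls and half true
# effects, reusing the simulated p-values from above
mixed_ps = c(null_ps2[1:50], true_ps[1:50])
p_bh     = p.adjust(mixed_ps, method = "BH")
sum(p_bh[1:50]   < 0.05)  # discoveries among the nulls (expect ~0)
sum(p_bh[51:100] < 0.05)  # discoveries among the true effects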
