
I have a simple Python function:

from scipy.stats import ttest_1samp

def tTest( expectedMean, sampleSet, alpha=0.05 ):
    # T-value and P-value
    tv, pv = ttest_1samp(sampleSet, expectedMean)
    print(tv,pv)
    return pv >= alpha

if __name__ == '__main__':
    # Expected mean is 10
    print(tTest(10.0, [99, 99, 22, 77, 99, 55, 44, 33, 20, 9999, 99, 99, 99]))

My expectation is that t-test should fail for this sample, as it is nowhere near the expected population mean of 10. However, program produces result:

1.0790344826428238 0.3017839504736506
True

I.e. the p-value is ~30%, which is too high to reject the null hypothesis. I am not very knowledgeable about the maths behind the t-test, but I don't understand how this result can be correct. Does anyone have any ideas?

  • Did you intentionally include the value 9999 in your sample, or did you mean for that to be 99, 99? Commented Oct 3, 2018 at 14:48
  • I intentionally put in some outliers... Am I breaking something because my sample isn't really normally distributed? Commented Oct 3, 2018 at 14:52
  • As @MiguelSantos points out in the answer below, that large value results in your sample having a large variance, which gives a low t-statistic, and therefore a high p-value. Commented Oct 3, 2018 at 14:58
  • Your standard deviation for that sample is 2753.88 and the mean is 834.16 (both rounded); 10.0 is within one standard deviation of the mean, hence you cannot reject the null hypothesis. Extremely simplified version. Commented Oct 3, 2018 at 14:58
  • Interesting... I did a simple calculation on paper. Indeed, the t-value tv = (mean - expectedMean)/sqrt(var/n) is ~1.08, which is lower than the critical t-value even for an alpha of 10% on Student's t-distribution with 12 degrees of freedom. Commented Oct 3, 2018 at 15:04
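The hand calculation in the last comment can be reproduced with the standard library alone (a quick sketch; it rebuilds the same t-statistic that `ttest_1samp` reports):

```python
import math
import statistics

sample = [99, 99, 22, 77, 99, 55, 44, 33, 20, 9999, 99, 99, 99]
expected_mean = 10.0

n = len(sample)
mean = statistics.mean(sample)       # ~834.15
s = statistics.stdev(sample)         # sample std dev (ddof=1), ~2753.88
t = (mean - expected_mean) / (s / math.sqrt(n))

print(mean, s, t)                    # t matches scipy's 1.0790...
```

The 9999 outlier inflates `s` so much that even an 824-point gap between the sample mean and 10.0 is barely one standard error.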

1 Answer


I performed the test using R just to check if the results are the same and they are:

t.test(x=c(99, 99, 22, 77, 99, 55, 44, 33, 20, 9999, 99, 99, 99), alternative = "two.sided", 
mu = 10, paired = FALSE, var.equal = FALSE, conf.level = 0.95)

data:  c(99, 99, 22, 77, 99, 55, 44, 33, 20, 9999, 99, 99, 99)
t = 1.079, df = 12, p-value = 0.3018
alternative hypothesis: true mean is not equal to 10
95 percent confidence interval:
-829.9978 2498.3055
sample estimates:
mean of x 
 834.1538 

You can see that the p-value is ~0.3. This is a really interesting problem; hypothesis testing can be counterintuitive. First of all, the sample size matters a lot: with a big sample, let's say 5000 values, even minor deviations from the expected value you are testing will pull the p-value down, so you will reject the null hypothesis most of the time. Small samples do the opposite. And what is happening here is that you have a high variance in the data.

If you replace your data from [99, 99, 22, 77, 99, 55, 44, 33, 20, 9999, 99, 99, 99]

to [99, 99, 99, 99, 100, 99, 99, 99, 99, 100, 99, 100, 100],

which has a really small variance, your p-value will be a lot smaller, even though the mean of this sample is closer to 10.
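The variance effect described above is easy to demonstrate directly (a sketch using the question's own `ttest_1samp`; the two sample lists are the ones from this answer):

```python
from scipy.stats import ttest_1samp

noisy = [99, 99, 22, 77, 99, 55, 44, 33, 20, 9999, 99, 99, 99]
tight = [99, 99, 99, 99, 100, 99, 99, 99, 99, 100, 99, 100, 100]

# Same test (H0: population mean = 10) on both samples
t_noisy, p_noisy = ttest_1samp(noisy, 10.0)
t_tight, p_tight = ttest_1samp(tight, 10.0)

print(t_noisy, p_noisy)   # low t, p ~ 0.30 -> cannot reject H0
print(t_tight, p_tight)   # huge t, tiny p  -> reject H0
```

The tight sample's mean (~99) is closer to 10 than the noisy sample's mean (~834), yet its tiny variance makes the test reject decisively.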


2 Comments

So is the conclusion here that the sample size is simply too small to conduct a meaningful t-test?
The sample size is too small and the variance is too big; both of those push the p-value up. The higher the variance in your data, the larger the sample size you need to perform a good test.
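The sample-size half of that trade-off can also be checked numerically (a sketch; replicating the sample 10x is a hypothetical way to grow n while keeping the mean and variance essentially unchanged):

```python
from scipy.stats import ttest_1samp

sample = [99, 99, 22, 77, 99, 55, 44, 33, 20, 9999, 99, 99, 99]

# n = 13: high variance overwhelms the small sample
t_small, p_small = ttest_1samp(sample, 10.0)

# Same values replicated to n = 130: the standard error shrinks
# by ~sqrt(10), so the same variance no longer masks the effect
t_big, p_big = ttest_1samp(sample * 10, 10.0)

print(p_small, p_big)
```

With n = 13 the test cannot reject the null hypothesis (p ~ 0.30); with the same spread at n = 130 it rejects comfortably.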
