
I have a simple Python function:

from scipy.stats import ttest_1samp

def tTest( expectedMean, sampleSet, alpha=0.05 ):
    # T-value and P-value
    tv, pv = ttest_1samp(sampleSet, expectedMean)
    print(tv,pv)
    return pv >= alpha

if __name__ == '__main__':
    # Expected mean is 10
    print(tTest(10.0, [99, 99, 22, 77, 99, 55, 44, 33, 20, 9999, 99, 99, 99]))

My expectation is that t-test should fail for this sample, as it is nowhere near the expected population mean of 10. However, program produces result:

1.0790344826428238 0.3017839504736506
True

I.e. the p-value is ~30%, which is too high to reject the null hypothesis. I am not very knowledgeable about the maths behind the t-test, but I don't understand how this result can be correct. Does anyone have any ideas?

  • Did you intentionally include the value 9999 in your sample, or did you mean for that to be 99, 99? Commented Oct 3, 2018 at 14:48
  • I intentionally put in some outliers... Am I breaking something because my sample isn't really normally distributed? Commented Oct 3, 2018 at 14:52
  • As @MiguelSantos points out in the answer below, that large value results in your sample having a large variance, which gives a low t-statistic, and therefore a high p-value. Commented Oct 3, 2018 at 14:58
  • Your standard deviation for that sample is 2753.88 and the mean is 834.16 (both rounded); 10.0 is within one standard deviation of the mean, hence you cannot reject the null hypothesis. Extremely simplified version. Commented Oct 3, 2018 at 14:58
  • Interesting... I did a simple calculation on paper. Indeed, the t-value tv = (mean - expectedMean)/sqrt(var/n) is ~1.08, which is lower than the critical t-value even for an alpha of 10% on Student's t-distribution with 12 degrees of freedom. Commented Oct 3, 2018 at 15:04
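The hand calculation in the last comment can be reproduced with the standard library alone (a quick sketch; it rebuilds the same t-statistic that `ttest_1samp` reports):

```python
import math
import statistics

sample = [99, 99, 22, 77, 99, 55, 44, 33, 20, 9999, 99, 99, 99]
expected_mean = 10.0

n = len(sample)
mean = statistics.mean(sample)       # ~834.15
s = statistics.stdev(sample)         # sample std dev (ddof=1), ~2753.88
t = (mean - expected_mean) / (s / math.sqrt(n))

print(mean, s, t)                    # t matches scipy's 1.0790...
```

The 9999 outlier inflates `s` so much that even an 824-point gap between the sample mean and 10.0 is barely one standard error.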

1 Answer


I performed the test using R just to check if the results are the same and they are:

t.test(x=c(99, 99, 22, 77, 99, 55, 44, 33, 20, 9999, 99, 99, 99), alternative = "two.sided", 
mu = 10, paired = FALSE, var.equal = FALSE, conf.level = 0.95)

data:  c(99, 99, 22, 77, 99, 55, 44, 33, 20, 9999, 99, 99, 99)
t = 1.079, df = 12, p-value = 0.3018
alternative hypothesis: true mean is not equal to 10
95 percent confidence interval:
-829.9978 2498.3055
sample estimates:
mean of x 
 834.1538 

You can see that the p-value is ~0.3. This is a really interesting problem; hypothesis testing can be counterintuitive. First of all, the sample size matters a lot: with a big sample, let's say 5000 values, even minor deviations from the expected value you are testing will pull the p-value down, so you will reject the null hypothesis most of the time. Small samples do the opposite. And what is happening here is that you have a high variance in the data.

If you replace your data from [99, 99, 22, 77, 99, 55, 44, 33, 20, 9999, 99, 99, 99]

to [99, 99, 99, 99, 100, 99, 99, 99, 99, 100, 99, 100, 100],

which has a really small variance, your p-value will be a lot smaller, even though the mean of this sample is closer to 10.
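The variance effect described above is easy to demonstrate directly (a sketch using the question's own `ttest_1samp`; the two sample lists are the ones from this answer):

```python
from scipy.stats import ttest_1samp

noisy = [99, 99, 22, 77, 99, 55, 44, 33, 20, 9999, 99, 99, 99]
tight = [99, 99, 99, 99, 100, 99, 99, 99, 99, 100, 99, 100, 100]

# Same test (H0: population mean = 10) on both samples
t_noisy, p_noisy = ttest_1samp(noisy, 10.0)
t_tight, p_tight = ttest_1samp(tight, 10.0)

print(t_noisy, p_noisy)   # low t, p ~ 0.30 -> cannot reject H0
print(t_tight, p_tight)   # huge t, tiny p  -> reject H0
```

The tight sample's mean (~99) is closer to 10 than the noisy sample's mean (~834), yet its tiny variance makes the test reject decisively.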


2 Comments

So is the conclusion here that the sample size is simply too small to conduct a meaningful t-test?
The sample size is too small and the variance is too big; both of those push the p-value up. The higher the variance in your data, the larger the sample size you need to perform a good test.
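The sample-size half of that trade-off can also be checked numerically (a sketch; replicating the sample 10x is a hypothetical way to grow n while keeping the mean and variance essentially unchanged):

```python
from scipy.stats import ttest_1samp

sample = [99, 99, 22, 77, 99, 55, 44, 33, 20, 9999, 99, 99, 99]

# n = 13: high variance overwhelms the small sample
t_small, p_small = ttest_1samp(sample, 10.0)

# Same values replicated to n = 130: the standard error shrinks
# by ~sqrt(10), so the same variance no longer masks the effect
t_big, p_big = ttest_1samp(sample * 10, 10.0)

print(p_small, p_big)
```

With n = 13 the test cannot reject the null hypothesis (p ~ 0.30); with the same spread at n = 130 it rejects comfortably.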
