One-Sample & Two-Sample t-tests in Python

Question

I am unsure how to work through this problem; I am new to statistics and Python. A student is trying to decide between two processing units. He wants to use the processing unit for his research to run high-performance algorithms, so the only thing he cares about is speed. He picks a high-performance algorithm on a large data set and runs it on both processing units 10 times, timing each run in hours. The results are given in the lists TestSample1 and TestSample2 below.

from scipy import stats
import numpy as nupy

# Run times in hours: 10 runs of the same algorithm on each processing unit
TestSample1 = nupy.array([11,9,10,11,10,12,9,11,12,9])
TestSample2 = nupy.array([11,13,10,13,12,9,11,12,12,11])

Assumption: both samples above are random, independent, parametric, and normally distributed.

Hint: you can import the t-test functions from scipy.stats to perform the t-tests.

Question 1 - One-sample t-test: check whether the mean of TestSample1 is equal to zero.

Null hypothesis: the mean is equal to zero.
Alternative hypothesis: the mean is not equal to zero.

Question 2 - Given:
1. Null hypothesis: there is no significant difference between the datasets.
2. Alternative hypothesis: there is a significant difference.
Do two-sample testing and check whether to reject the null hypothesis or not.

Question 3 - Do two-sample testing and check whether there is a significant difference between the speeds of two samples: TestSample1 and TestSample3.

He is trying a third processing unit, whose run times are in TestSample3:

TestSample3 = nupy.array([9,10,9,11,10,13,12,9,12,12])

Assumption: both samples (TestSample1 and TestSample3) are random, independent, parametric, and normally distributed.

It is unclear what exactly your question is, but here are some remarks: in any case, a one-sample test is inappropriate for your scenario, because as I understand it your test runs are independent. Also, a null hypothesis of the mean time being zero is clearly nonsensical. It is confusing to mention "differences between datasets", because judging from your initial paragraph, you let all tests run on the same dataset. By the way, the standard convention is to import numpy as np. Maybe you could streamline your question a bit.
– Arne, Commented Mar 13, 2020 at 8:53
The first question is how to do a one-sample t-test on the given samples 1 and 2. The second is how to do a two-sample test on the test samples to check whether there is a significant difference in speeds...
– NottyHead, Commented Mar 14, 2020 at 3:55

Arne · Accepted Answer · 2020-03-14 15:21:10Z

Question 1

With SciPy, you would do it like this:

stats.ttest_1samp(TestSample1, popmean=0)

It is not a useful test to perform in this context though, because we already know that the null hypothesis must be false. Negative times are impossible, so the only way for the population mean of times to be zero would be if every time measured were always zero, which is clearly not the case.
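To make the point concrete, here is a small sketch that runs the same one-sample test and also computes the t-statistic by hand with the textbook formula t = (mean - popmean) / (s / sqrt(n)), where s is the sample standard deviation with an n - 1 denominator (ddof=1). The helper name one_sample_t is hypothetical, used only for illustration, and numpy is imported under the conventional name np.

```python
import numpy as np
from scipy import stats

# Run times in hours on the first processing unit (data from the question)
TestSample1 = np.array([11, 9, 10, 11, 10, 12, 9, 11, 12, 9])


def one_sample_t(sample, popmean):
    """Hypothetical helper: one-sample t-statistic computed by hand."""
    n = len(sample)
    s = sample.std(ddof=1)  # sample standard deviation (n - 1 denominator)
    return (sample.mean() - popmean) / (s / np.sqrt(n))


result = stats.ttest_1samp(TestSample1, popmean=0)
manual_t = one_sample_t(TestSample1, popmean=0)

print(result.statistic, manual_t)  # both roughly 28.0
print(result.pvalue)  # far below any usual significance level
```

The very large t-statistic only restates that run times of 9 to 12 hours are nowhere near zero, which is exactly why the test tells us nothing we did not already know.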

Question 2

Here's how to do a two-sample t-test for independent samples with SciPy:

stats.ttest_ind(TestSample1, TestSample2)

Output:

Ttest_indResult(statistic=-1.8325416653445783, pvalue=0.08346710398411555)

So the t-statistic is -1.8, but its deviation from zero is not significant at the conventional 0.05 level (p = 0.08). This result is inconclusive. Of course, it would be better to have more precise measurements, not rounded to whole hours.
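For reference, the statistic above can be reproduced with the pooled-variance (Student's) formula, which is what stats.ttest_ind uses by default (equal_var=True). The sketch below also turns the p-value into a reject / do-not-reject decision. The significance level alpha = 0.05 is an assumption, since the question does not specify one, and pooled_t is a hypothetical helper name.

```python
import numpy as np
from scipy import stats

# Run times in hours (data from the question)
TestSample1 = np.array([11, 9, 10, 11, 10, 12, 9, 11, 12, 9])
TestSample2 = np.array([11, 13, 10, 13, 12, 9, 11, 12, 12, 11])


def pooled_t(a, b):
    """Hypothetical helper: Student's two-sample t-statistic with pooled variance."""
    n1, n2 = len(a), len(b)
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
    se = np.sqrt(sp2 * (1 / n1 + 1 / n2))  # standard error of the mean difference
    return (a.mean() - b.mean()) / se


alpha = 0.05  # assumed significance level, not given in the question
result = stats.ttest_ind(TestSample1, TestSample2)
manual_t = pooled_t(TestSample1, TestSample2)

print(result.statistic, manual_t)  # both about -1.83
if result.pvalue < alpha:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")
```

Passing equal_var=False to stats.ttest_ind gives Welch's t-test instead, which drops the equal-variance assumption. With equal sample sizes, as here, the t-statistic is identical; only the degrees of freedom, and hence the p-value, differ.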

In any case, I would argue that given your stated setting, you do not really need this test either. It is highly unlikely that two different CPUs perform exactly the same, and you just want to decide which one to go with. Simply choosing the one with the lower average time, regardless of significance test results, is clearly the right decision here.
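That decision rule needs nothing more than the sample means. A minimal sketch using all three samples from the question; the dictionary keys are just labels chosen for illustration.

```python
import numpy as np

# Run times in hours per processing unit (data from the question)
samples = {
    "TestSample1": np.array([11, 9, 10, 11, 10, 12, 9, 11, 12, 9]),
    "TestSample2": np.array([11, 13, 10, 13, 12, 9, 11, 12, 12, 11]),
    "TestSample3": np.array([9, 10, 9, 11, 10, 13, 12, 9, 12, 12]),
}

# Average run time for each unit
means = {name: times.mean() for name, times in samples.items()}

# Lower average run time means a faster unit
fastest = min(means, key=means.get)

print(means)
print("Fastest on average:", fastest)
```

With these numbers the means are 10.4, 11.4 and 10.7 hours, so the first unit is the one to pick.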

Question 3

This is analogous to Question 2.
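Concretely, only the second sample changes. A sketch with the same stats.ttest_ind call (Student's test, equal variances assumed) and an assumed significance level of 0.05:

```python
import numpy as np
from scipy import stats

# Run times in hours (data from the question)
TestSample1 = np.array([11, 9, 10, 11, 10, 12, 9, 11, 12, 9])
TestSample3 = np.array([9, 10, 9, 11, 10, 13, 12, 9, 12, 12])

alpha = 0.05  # assumed significance level, not given in the question
result = stats.ttest_ind(TestSample1, TestSample3)

print(result.statistic, result.pvalue)
if result.pvalue < alpha:
    print("Reject the null hypothesis: the mean run times differ")
else:
    print("Fail to reject the null hypothesis")
```

Here the t-statistic is about -0.5 and the p-value is well above 0.05, so the gap between units 1 and 3 (10.4 vs 10.7 hours on average) is even less distinguishable from noise than in Question 2.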
