I have two groups of participants, a test group and a control group, both of whom have completed pre/post activity surveys. For the test group, I can match their pre and post surveys to get change at the individual level. However, for ethical reasons the control group were anonymised, so although I can see all pre/post scores, I cannot match them and therefore cannot see change at the individual level. I have all raw data. What is the best way to approach the test/control comparison for change? Thank you!
2 Answers
The question is: what do you want to compare?
Since averaging is linear, it is quite clear that the mean change (and hence its comparison between groups) is unaffected, whether you have individually paired pre- and post-treatment measurements or just two (identically sized but unpaired, potentially permuted) sets of scores, because:
$$ \overline{x - y} = \frac{1}{n} \sum_i (x_i - y_i) = \left(\frac{1}{n} \sum_i x_i\right) - \left(\frac{1}{n} \sum_i y_i\right) = \bar{x} - \bar{y}. $$
So if you define this difference as the effect size, then you are already good to go; you don't need the pairing.
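A quick numerical check of this identity (a sketch with made-up data; the array names `pre` and `post` are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up matched scores, purely for illustration
pre = rng.normal(3.0, 1.0, size=50)
post = pre + rng.normal(0.5, 0.8, size=50)

# Mean of the paired differences vs. difference of the group means:
# identical (up to floating-point error) no matter how the rows are paired
print(np.mean(post - pre))
print(np.mean(rng.permutation(post)) - np.mean(pre))
```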
If, however, you are planning on conducting an analysis that does require the pairing information, you are simply not going to be able to do that. For example, you won't be able to run a paired t-test, which is often one of the things people want to do by default in similar situations.
Whether or not that is a problem may be an entirely different question. However, there is one thing you can do: compare the results of a paired and an unpaired test on the subgroup that does have the pairing information. If the two do not differ much (at least lead to the same qualitative conclusion), then you likely wouldn't gain much by knowing the paired data for the other subgroup, either.
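A sketch of that comparison, reusing the made-up `pre`/`post` arrays from the check above:

```python
from scipy import stats

paired = stats.ttest_rel(post, pre)    # uses the pairing information
unpaired = stats.ttest_ind(post, pre)  # ignores it, as you must for the controls

print(f"paired:   t = {paired.statistic:.2f}, p = {paired.pvalue:.4f}")
print(f"unpaired: t = {unpaired.statistic:.2f}, p = {unpaired.pvalue:.4f}")
```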
You can even perform a numerical, randomized simulation whereby you intentionally permute the known pairs, so as to check whether the result of the actual paired test is consistent with a distribution of randomly-paired differences; this will show whether there is a meaningful dependence structure in the data that you would need to worry about.
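And a sketch of that permutation check, again with the same arrays:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
t_actual = stats.ttest_rel(post, pre).statistic

# Destroy the pairing many times and collect the resulting paired t statistics
t_null = np.array([stats.ttest_rel(rng.permutation(post), pre).statistic
                   for _ in range(2000)])

# If the true-pairing statistic sits in the bulk of this distribution, there is
# little within-subject dependence, and the lost pairing costs you little
print(np.mean(np.abs(t_null) >= np.abs(t_actual)))
```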
- +1 Look for permutation test and permutation test for paired data. – Ggjj11, Oct 8 at 16:58
- Thanks, I appreciate your help! – NorthernLight, Oct 10 at 11:03
- @NorthernLight You're welcome. You can consider accepting the answer that helped you. – arpad, Oct 10 at 13:11
If you do not have the information to establish pairing in the control group, then you do not have it, cannot "conjure it up", and therefore cannot use paired tests in your analysis.
Note that I do not quite understand why preserving anonymity meant that pairing data was not kept for the control group (yet anonymity apparently did not matter for the treatment group?). There are many ways anonymity could have been preserved for both groups while maintaining pairing (e.g. unique study IDs, assigned by the same people who, e.g., collected informed consent...). In any case...
Now, you say that you have "pre/post activity surveys". This suggests to me that you used Likert-type items? In that case, do not feel too bad, because computing paired differences on such Likert questionnaires is a highly debatable practice anyway. You can refer to this CV post; I will summarize its main points below.
If you indeed used Likert items, you cannot treat the data as interval-scale data (it is ordinal-scale data!). Performing any arithmetic on such ordinal data is not mathematically valid; it assumes that the distances between, say, Strongly agree and Agree, and between Agree and Neither agree nor disagree (5 to 4, and 4 to 3), are the same. They clearly are not, and may actually differ between subjects, so any arithmetic is mathematical nonsense. If you had used a VAS (Visual Analogue Scale), a case could have been made to treat it as interval scale, but that is not your situation, and would be a topic for another post. Likert is a purely ordinal scale (Strongly agree is "more" than Agree, and that is all we can say; how much "more", we do not know). It is very poor practice to assign numbers to Likert scale levels, since it induces researchers to treat them as interval/ratio scales, which they definitively are not; but that is a rant I will also leave for another post.
So what can you do with your unpaired, Likert-scale data? The best you can do is compare pre to post, for both Control and Treatment. Hopefully, you can show that the responses did not change much for Control, but changed significantly for Treatment.
How to compare? You have a few options (a short code sketch of all three follows the list).
- Mood's median test. This is a test of the medians; you can compare pre to post for both groups, or pre to pre and post to post, or even all four in a single test (Mood's test can compare several groups at once). Depending on your survey, you can do this question by question (but then you have to deal with multiple-comparison corrections, which could drastically lower your per-test $\alpha$), or you could aggregate all questions (you may need to re-score some questions so the scale runs in the same direction for all), or aggregate by sub-groups of questions on different topics.
- Brunner-Munzel test (BMt). It is similar to the Mann-Whitney U test (MWUt), but does not suffer from the Behrens-Fisher problem, hence is preferred. It can only compare two groups at a time. But... it is NOT a test of medians (as it is sadly too often described; it can only be considered one under stringent assumptions, which are unlikely to hold in your case). It is a test of stochastic superiority (not dominance!); Mann & Whitney titled their paper "On a test of whether one of two random variables is stochastically larger than the other". The alternative is that $P(\text{Treatment}>\text{Control})>P(\text{Control}>\text{Treatment})$; i.e. the post scores are greater for Treatment than for Control significantly more often than not. And that may be a good demonstration that the treatment had an effect.
- Kruskal-Wallis test (KWt). It is similar to BMt or MWUt (stochastic superiority), but for more than two groups. You can then follow up with Dunn's post-hoc test.
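For concreteness, here is a minimal sketch of all three with SciPy; the array names (`ctrl_pre`, `ctrl_post`, `treat_pre`, `treat_post`) and the dummy data are mine, purely so the sketch runs:

```python
import numpy as np
from scipy import stats

# Dummy 5-point ordinal scores, purely so the sketch runs;
# replace with your real (aggregated, same-direction) survey scores.
rng = np.random.default_rng(1)
ctrl_pre = rng.integers(1, 6, size=40)
ctrl_post = rng.integers(1, 6, size=40)
treat_pre = rng.integers(1, 6, size=40)
treat_post = rng.integers(2, 6, size=40)  # shifted up to mimic an effect

# Mood's median test: all four samples in one test
stat, p_mood, grand_median, table = stats.median_test(
    ctrl_pre, ctrl_post, treat_pre, treat_post)
print(f"Mood's median test: p = {p_mood:.4f}")

# Brunner-Munzel: two groups at a time, e.g. post scores, Treatment vs Control
bm = stats.brunnermunzel(treat_post, ctrl_post)
print(f"Brunner-Munzel: p = {bm.pvalue:.4f}")

# The quantity BMt is really about: estimated P(Treatment > Control),
# counting ties as one half
gt = np.mean(treat_post[:, None] > ctrl_post[None, :])
eq = np.mean(treat_post[:, None] == ctrl_post[None, :])
print(f"estimated P(Treatment > Control) = {gt + 0.5 * eq:.3f}")

# Kruskal-Wallis across all four samples; for pairwise follow-ups,
# Dunn's test is in the third-party scikit-posthocs package (posthoc_dunn)
kw = stats.kruskal(ctrl_pre, ctrl_post, treat_pre, treat_post)
print(f"Kruskal-Wallis: p = {kw.pvalue:.4f}")
```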
Which to use? I would probably use several BM tests. KWt could be OK, but it suffers from the Behrens-Fisher problem, and also from a lack of transitivity. You could try both, and if KWt gives results similar to the BM tests (as it should), then these two problems do not apply to your data, and you could report only the KWt results.
The above are general comments/hints. There could be more subtle details to handle, but that would depend on details not in the original post (purpose and nature of the treatment, exact type and number of questions in the survey, hypothesis being tested, subjects being tested, etc.)
- This is amazingly helpful, thank you so much! I will do lots of reading and thinking. – NorthernLight, Oct 10 at 11:02