
De Finetti's representation theorem is seen by some as motivation for the use of Bayesian and/or hierarchical modeling. In some settings it may be plausible to assume measurements are exchangeable, but in others this is not a straightforward assumption. How does one decide whether data are exchangeable, for instance with a test? A cursory search has not yielded much on this topic, but I'd appreciate pointers if such a literature exists.


3 Answers


The theorem in question tells us that exchangeability (of an infinite sequence) is equivalent to the observations being conditionally IID. Hence, in practice, data analysts consider the same things when deciding whether observations are exchangeable as when deciding whether they're (conditionally) independent. The basic approach is to treat as a covariate anything that might account for dependencies between observations, and hope that one has conditioned on enough things to make the observations sufficiently close to independence for one's purposes.
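To make this concrete, here is a minimal sketch of that workflow in R; the data and the covariate names (clinic, dose) are invented for illustration. The idea is to condition on the covariates thought to drive dependence and then inspect the residuals for any remaining structure.

# Hypothetical example: outcome y measured at several clinics, with a dose covariate.
# The clinic effect stands in for "anything that might account for dependence".
set.seed(1)
dat <- data.frame(
  clinic = factor(rep(1:5, each = 20)),
  dose   = runif(100)
)
dat$y <- rnorm(5)[dat$clinic] + 2 * dat$dose + rnorm(100)

# Condition on the suspected sources of dependence ...
fit <- lm(y ~ clinic + dose, data = dat)

# ... then check whether the residuals still show structure, e.g. across clinics
# or in the order in which the observations were collected.
boxplot(resid(fit) ~ dat$clinic, xlab = "clinic", ylab = "residual")
acf(resid(fit), main = "Residual autocorrelation in collection order")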

If this approach seems a little slapdash, keep in mind two things:

  1. As with most null hypotheses, it's virtually certain that the observations aren't actually independent. Hence, a hypothesis test would be of dubious value.

  2. Conditionally independent sampling, or something like it, is a basic philosophical requirement for scientific research, and, more generally, learning about the world. Without it, we wouldn't be able to observe anything more than once, and thus, we wouldn't be able to extrapolate beyond the literal facts we've already observed. Ultimately, conditionally independent sampling isn't something we demonstrate or observe but a basic epistemic commitment we have to make in order to reason meaningfully about the real world.

  • (+1) Especially: "Ultimately, conditionally independent sampling isn't something we demonstrate or observe but a basic epistemic commitment we have to make in order to reason meaningfully about the real world." Cannot be said often enough. Commented Aug 29, 2017 at 21:10
  • Although this is an insightful answer (+1), it can still be informative whether or not the data show clear evidence against independence, because the impact of a violation of independence on inference that assumes it depends on how strong the violation actually is (and what form it takes). True, the H0 of independence will not hold precisely, but the distinction between data that show no evidence against it and data that do is still a meaningful one. (The major implication of point 1 is that very large samples will reject the H0 even for minor violations that are in fact tolerable.) Commented Nov 14 at 10:22
  • I'd add that there is no guarantee that the distinction between "data show evidence against independence in a test" and "data don't show such evidence" coincides with the distinction between "the violation of independence is problematic" and "it is not". But this doesn't mean that the information in such a test is worthless. Commented Nov 14 at 10:25
  • We don't "have to make this epistemic commitment" all the time, as there are options for modelling dependence, and we may use data to decide whether to use them. (Of course, all of this applies equally to conditional independence, i.e., exchangeability.) Commented Nov 14 at 10:26

Exchangeability is generally tested by permutation tests (e.g., runs tests), which look at the number of "runs" in the sequence and compare it to its distribution under exchangeability. Remember that under the assumption of exchangeability, all $n!$ permutations of the $n$ observed values are equally probable, so we can use this fact to simulate the distribution of any "runs" statistic under that assumption. Runs tests define the "runs" statistic differently depending on whether you have discrete or continuous data. For some of these statistics, exact or approximate null distributions under exchangeability are well known, so you can test without simulation; for more complicated "runs" statistics you can proceed via simulation.

Simulation of a runs test with "runs up and down": One possible test that can be applied to any kind of data (though it is most sensible for data that are at least ordinal) is based on the "runs up and down" in an observed set of data values. For an observed sequence of values $x_1, \dotsc, x_n$ the number of "runs up and down" is defined as:

$$R(\boldsymbol{x}) = 1 + \sum_{i=3}^{n} \Big[ \mathbb{I}(x_{i} \geqslant x_{i-1}) \, \mathbb{I}(x_{i-1} < x_{i-2}) + \mathbb{I}(x_{i} < x_{i-1}) \, \mathbb{I}(x_{i-1} \geqslant x_{i-2}) \Big],$$

that is, one plus the number of times the direction of successive movement changes.

This statistic can be simulated under exchangeability by generating a large number of permutations $\boldsymbol{x}^{(1)}, \dotsc, \boldsymbol{x}^{(k)}$ of the observed sample vector (random reorderings) and calculating the corresponding "runs" statistics $r^{(1)}, \dotsc, r^{(k)}$ for these permutations. You can then obtain an estimated p-value for the test by using the distribution of these simulated values to calculate the probability of seeing a runs statistic at least as "extreme" as the one you actually observed.

Implementation in R: Consider an example where we observe the sample vector:

$$\boldsymbol{x} = (5, 1, 2, 1, 5, 6, 8, 2, 4, 8, 9, 10, 4, 2).$$

We want to test whether this came from an exchangeable distribution. This particular outcome has $R(\boldsymbol{x}) = 7$ runs up and down: the successive movements are down, up, down, up, up, up, down, up, up, up, up, down, down, which group into seven runs. We can use R to simulate from the distribution of this statistic under exchangeability and perform a runs test as follows:

# Define the vector of observed values
x <- c(5, 1, 2, 1, 5, 6, 8, 2, 4, 8, 9, 10, 4, 2)

# Define a function to compute the number of runs up and down for an input vector
RUNS <- function(x) {
  n <- length(x)
  S <- (x[2:n] >= x[1:(n-1)])          # direction of each successive movement
  1 + sum(S[1:(n-2)] != S[2:(n-1)])    # one plus the number of direction changes
}

# Simulate the runs statistic for k random permutations of the data
k <- 10^5
set.seed(12345)
RR <- rep(0, k)
for (i in 1:k) {
  x_perm <- sample(x, length(x), replace = FALSE)
  RR[i]  <- RUNS(x_perm)
}

# Generate the frequency table for the simulated runs
FREQS <- as.data.frame(table(RR))

# Calculate the p-value of the runs test (total probability of outcomes
# no more likely than the observed number of runs)
R      <- RUNS(x)
R_FREQ <- FREQS$Freq[match(R, FREQS$RR)]
p      <- sum(FREQS$Freq * (FREQS$Freq <= R_FREQ)) / k

# Plot the estimated null distribution of the runs statistic with the test result
library(ggplot2)
ggplot(data = FREQS, aes(x = RR, y = Freq / k, fill = (Freq <= R_FREQ))) +
  geom_bar(stat = 'identity') +
  geom_vline(xintercept = match(R, FREQS$RR)) +
  scale_fill_manual(values = c('Grey', 'Red')) +
  theme(legend.position = 'none') +
  labs(title    = 'Runs Test - Plot of Distribution of Runs',
       subtitle = paste0('(Observed runs is black line, p-value = ', p, ')'),
       x = 'Runs', y = 'Estimated Probability')

This generates the following plot, showing the estimated null distribution (under the assumption that the underlying distribution is exchangeable) and the p-value for the test:

[Plot: estimated null distribution of the runs statistic, with the observed value marked by a vertical line and the outcomes contributing to the p-value highlighted in red.]

In this case we see that the p-value is not very low, and hence there is insufficient evidence to reject the null hypothesis that this vector came from an exchangeable distribution.
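As noted at the start of this answer, some runs statistics have well-known approximate null distributions, so a normal approximation can stand in for the simulation when the sequence is long enough. For "runs up and down" in particular, a standard large-sample result (the same one behind the classical turning-point test) gives mean $(2n-1)/3$ and variance $(16n-29)/90$ under exchangeability. Here is a minimal sketch reusing the RUNS function above; the function name runs_test_normal is invented for this illustration, and for the short example vector ($n = 14$) the simulation above is more trustworthy than the approximation.

# Normal approximation to the null distribution of "runs up and down";
# for small n (such as the n = 14 example above), prefer the simulation.
runs_test_normal <- function(x) {
  n  <- length(x)
  r  <- RUNS(x)                    # observed runs up and down (function defined above)
  mu <- (2 * n - 1) / 3            # mean under exchangeability
  v  <- (16 * n - 29) / 90         # variance under exchangeability
  z  <- (r - mu) / sqrt(v)
  p  <- 2 * pnorm(-abs(z))         # two-sided p-value
  c(runs = r, z = z, p.value = p)
}

runs_test_normal(x)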


As others have alluded to, permutation testing can be useful here, but exactly which test statistic you permute matters a lot. Determining whether data are exchangeable (or IID) is theoretically impossible in its general form: from limited data, one can only detect certain classes of alternative hypotheses, and these are determined by your choice of test statistic (as is the power of the resulting test).
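To illustrate how much the choice of statistic matters, here is a minimal sketch in R of a generic permutation test in which the statistic is a plug-in argument; the names perm_test and lag1_acf are invented for this illustration. A lag-1 autocorrelation statistic, for example, has good power against serial dependence but can miss other departures from exchangeability.

# Generic permutation test of exchangeability: which alternatives it can detect
# is determined entirely by the statistic that is plugged in.
perm_test <- function(x, stat, k = 10000) {
  obs <- stat(x)
  sim <- replicate(k, stat(sample(x)))   # permuting x enforces the null
  # two-sided p-value based on distance from the permutation mean
  (1 + sum(abs(sim - mean(sim)) >= abs(obs - mean(sim)))) / (k + 1)
}

# One illustrative choice of statistic: lag-1 autocorrelation
lag1_acf <- function(x) cor(x[-length(x)], x[-1])

# Serially dependent data should give a small p-value with this statistic
set.seed(1)
x_ar <- as.numeric(arima.sim(model = list(ar = 0.7), n = 100))
perm_test(x_ar, lag1_acf)

Swapping in a different statistic changes which alternatives the test can see, which is exactly the point made by the publications below.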

Here are some publications on practical statistical tests you can run on a dataset:

  • Testing Independence of Exchangeable Random Variables
  • A General Test for Independent and Identically Distributed Hypothesis
  • Detecting Dataset Drift and Non-IID Sampling via k-Nearest Neighbors (this test is implemented in the cleanlab open-source library I helped build, which simultaneously checks a dataset for many kinds of statistical issues)
