
De Finetti's representation theorem is seen by some as motivation for the use of Bayesian and/or hierarchical modeling. In some settings it may be plausible to assume measurements are exchangeable, but in others this is not a straightforward assumption. How does one decide whether data are exchangeable, for instance with a test? A cursory search has not yielded much on this topic, but I'd appreciate pointers if such a literature exists.


3 Answers


The theorem in question tells us that exchangeability (of an infinite sequence) is equivalent to the observations being conditionally IID. Hence, in practice, data analysts consider the same things when deciding whether observations are exchangeable as when deciding whether they're (conditionally) independent. The basic approach is to treat as a covariate anything that might account for dependencies between observations, and hope that one has conditioned on enough things to make the observations sufficiently close to independence for one's purposes.
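To make this concrete, here is a minimal sketch of that workflow in R; the data and the covariate names (clinic, dose) are invented for illustration. The idea is to condition on the covariates thought to drive dependence and then inspect the residuals for any remaining structure.

# Hypothetical example: outcome y measured at several clinics, with a dose covariate.
# The clinic effect stands in for "anything that might account for dependence".
set.seed(1)
dat <- data.frame(
  clinic = factor(rep(1:5, each = 20)),
  dose   = runif(100)
)
dat$y <- rnorm(5)[dat$clinic] + 2 * dat$dose + rnorm(100)

# Condition on the suspected sources of dependence ...
fit <- lm(y ~ clinic + dose, data = dat)

# ... then check whether the residuals still show structure, e.g. across clinics
# or in the order in which the observations were collected.
boxplot(resid(fit) ~ dat$clinic, xlab = "clinic", ylab = "residual")
acf(resid(fit), main = "Residual autocorrelation in collection order")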

If this approach seems a little slapdash, keep in mind two things:

  1. As with most null hypotheses, it's virtually certain that the observations aren't actually independent. Hence, a hypothesis test would be of dubious value.

  2. Conditionally independent sampling, or something like it, is a basic philosophical requirement for scientific research, and, more generally, learning about the world. Without it, we wouldn't be able to observe anything more than once, and thus, we wouldn't be able to extrapolate beyond the literal facts we've already observed. Ultimately, conditionally independent sampling isn't something we demonstrate or observe but a basic epistemic commitment we have to make in order to reason meaningfully about the real world.

  • (+1) Especially: "Ultimately, conditionally independent sampling isn't something we demonstrate or observe but a basic epistemic commitment we have to make in order to reason meaningfully about the real world." Cannot be said often enough. Commented Aug 29, 2017 at 21:10
  • Although this is an insightful answer (+1), it can still be informative whether or not the data show clear evidence against independence, because the impact of a violation of independence on inference that assumes it depends on how strong the violation actually is (and what form it takes). True, the H0 of independence will not hold precisely, but the distinction between data that show no evidence against it and data that do is still a meaningful one. (The major implication of point 1 is that very large samples will reject the H0 even for minor violations that are in fact tolerable.) Commented Nov 14 at 10:22
  • I'd add that there is no guarantee that the distinction between "data show evidence against independence in a test" and "data don't show such evidence" coincides with the distinction between "the violation of independence is problematic" and "it is not". But this doesn't mean that the information in such a test is worthless. Commented Nov 14 at 10:25
  • We don't "have to make this epistemic commitment" all the time, as there are options for modelling dependence, and we may use data to decide whether to use them. (Of course, all of this applies equally to conditional independence, i.e., exchangeability.) Commented Nov 14 at 10:26

Exchangeability is generally tested by permutation tests (e.g., runs tests), which look at the number of "runs" in the sequence and compare it to its distribution under exchangeability. Remember that under the assumption of exchangeability, all $n!$ permutations of the $n$ observed values are equally probable, so we can use this fact to simulate the distribution of any "runs" statistic under that assumption. Runs tests define the "runs" statistic differently depending on whether you have discrete or continuous data. For some of these statistics, exact or approximate null distributions under exchangeability are well known, so you can test without simulation; for more complicated "runs" statistics you can proceed via simulation.

Simulation of a runs test with "runs up and down": One possible test that can be applied to any kind of data (though it is most sensible for data that are at least ordinal) is based on the "runs up and down" in an observed set of data values. For an observed sequence of values $x_1, \dotsc, x_n$ the number of "runs up and down" is defined as:

$$R(\boldsymbol{x}) = 1 + \sum_{i=3}^{n} \Big[ \mathbb{I}(x_{i} \geqslant x_{i-1}) \, \mathbb{I}(x_{i-1} < x_{i-2}) + \mathbb{I}(x_{i} < x_{i-1}) \, \mathbb{I}(x_{i-1} \geqslant x_{i-2}) \Big],$$

that is, one plus the number of times the direction of successive movement changes.

This statistic can be simulated under exchangeability by generating a large number of permutations $\boldsymbol{x}^{(1)}, \dotsc, \boldsymbol{x}^{(k)}$ of the observed sample vector (random reorderings) and calculating the corresponding "runs" statistics $r^{(1)}, \dotsc, r^{(k)}$ for these permutations. You can then obtain an estimated p-value for the test by using the distribution of these simulated values to calculate the probability of seeing a runs statistic at least as "extreme" as the one you actually observed.

Implementation in R: Consider an example where we observe the sample vector:

$$\boldsymbol{x} = (5, 1, 2, 1, 5, 6, 8, 2, 4, 8, 9, 10, 4, 2).$$

We want to test whether this came from an exchangeable distribution. This particular outcome has $R(\boldsymbol{x}) = 7$ runs up and down: the successive movements are down, up, down, up, up, up, down, up, up, up, up, down, down, which group into seven runs. We can use R to simulate from the distribution of this statistic under exchangeability and perform a runs test as follows:

# Define the vector of observed values
x <- c(5, 1, 2, 1, 5, 6, 8, 2, 4, 8, 9, 10, 4, 2)

# Define a function to compute the number of runs up and down for an input vector
RUNS <- function(x) {
  n <- length(x)
  S <- (x[2:n] >= x[1:(n-1)])          # direction of each successive movement
  1 + sum(S[1:(n-2)] != S[2:(n-1)])    # one plus the number of direction changes
}

# Simulate the runs statistic for k random permutations of the data
k <- 10^5
set.seed(12345)
RR <- rep(0, k)
for (i in 1:k) {
  x_perm <- sample(x, length(x), replace = FALSE)
  RR[i]  <- RUNS(x_perm)
}

# Generate the frequency table for the simulated runs
FREQS <- as.data.frame(table(RR))

# Calculate the p-value of the runs test (total probability of outcomes
# no more likely than the observed number of runs)
R      <- RUNS(x)
R_FREQ <- FREQS$Freq[match(R, FREQS$RR)]
p      <- sum(FREQS$Freq * (FREQS$Freq <= R_FREQ)) / k

# Plot the estimated null distribution of the runs statistic with the test result
library(ggplot2)
ggplot(data = FREQS, aes(x = RR, y = Freq / k, fill = (Freq <= R_FREQ))) +
  geom_bar(stat = 'identity') +
  geom_vline(xintercept = match(R, FREQS$RR)) +
  scale_fill_manual(values = c('Grey', 'Red')) +
  theme(legend.position = 'none') +
  labs(title    = 'Runs Test - Plot of Distribution of Runs',
       subtitle = paste0('(Observed runs is black line, p-value = ', p, ')'),
       x = 'Runs', y = 'Estimated Probability')

This generates the following plot, showing the estimated null distribution (under the assumption that the underlying distribution is exchangeable) and the p-value for the test:

[Plot: estimated null distribution of the runs statistic, with the observed value marked by a vertical line and the outcomes contributing to the p-value highlighted in red.]

In this case we see that the p-value is not very low, and hence there is insufficient evidence to reject the null hypothesis that this vector came from an exchangeable distribution.
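As noted at the start of this answer, some runs statistics have well-known approximate null distributions, so a normal approximation can stand in for the simulation when the sequence is long enough. For "runs up and down" in particular, a standard large-sample result (the same one behind the classical turning-point test) gives mean $(2n-1)/3$ and variance $(16n-29)/90$ under exchangeability. Here is a minimal sketch reusing the RUNS function above; the function name runs_test_normal is invented for this illustration, and for the short example vector ($n = 14$) the simulation above is more trustworthy than the approximation.

# Normal approximation to the null distribution of "runs up and down";
# for small n (such as the n = 14 example above), prefer the simulation.
runs_test_normal <- function(x) {
  n  <- length(x)
  r  <- RUNS(x)                    # observed runs up and down (function defined above)
  mu <- (2 * n - 1) / 3            # mean under exchangeability
  v  <- (16 * n - 29) / 90         # variance under exchangeability
  z  <- (r - mu) / sqrt(v)
  p  <- 2 * pnorm(-abs(z))         # two-sided p-value
  c(runs = r, z = z, p.value = p)
}

runs_test_normal(x)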


As others have alluded to, permutation testing can be useful here, but exactly which test statistic you permute matters a lot. Determining whether data are exchangeable (or IID) is theoretically impossible in its general form: from limited data, one can only detect certain classes of alternative hypotheses, and these are determined by your choice of test statistic (as is the power of the resulting test).
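To illustrate how much the choice of statistic matters, here is a minimal sketch in R of a generic permutation test in which the statistic is a plug-in argument; the names perm_test and lag1_acf are invented for this illustration. A lag-1 autocorrelation statistic, for example, has good power against serial dependence but can miss other departures from exchangeability.

# Generic permutation test of exchangeability: which alternatives it can detect
# is determined entirely by the statistic that is plugged in.
perm_test <- function(x, stat, k = 10000) {
  obs <- stat(x)
  sim <- replicate(k, stat(sample(x)))   # permuting x enforces the null
  # two-sided p-value based on distance from the permutation mean
  (1 + sum(abs(sim - mean(sim)) >= abs(obs - mean(sim)))) / (k + 1)
}

# One illustrative choice of statistic: lag-1 autocorrelation
lag1_acf <- function(x) cor(x[-length(x)], x[-1])

# Serially dependent data should give a small p-value with this statistic
set.seed(1)
x_ar <- as.numeric(arima.sim(model = list(ar = 0.7), n = 100))
perm_test(x_ar, lag1_acf)

Swapping in a different statistic changes which alternatives the test can see, which is exactly the point made by the publications below.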

Here are some publications on practical statistical tests you can run on a dataset:

  • Testing Independence of Exchangeable Random Variables
  • A General Test for Independent and Identically Distributed Hypothesis
  • Detecting Dataset Drift and Non-IID Sampling via k-Nearest Neighbors (this test is implemented in the cleanlab open-source library I helped build, which simultaneously checks a dataset for many kinds of statistical issues)
