
In metagenomics one typically collects data representing gene/species counts (or their proportions) in an individual. One then compares each gene/species across groups of individuals affected by different conditions, with the aim of determining which genes/species are characteristic of a specific condition. This necessitates correction for multiple comparisons / control of the false discovery rate.
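For concreteness, here is a minimal sketch of the workflow I have in mind, with a simulated count table standing in for real data (all numbers below are made up): one Mann-Whitney test per species, then Benjamini-Hochberg adjustment across the whole family.

```python
# Minimal sketch of per-species testing with BH FDR control.
# The data here are simulated stand-ins for a real count table.
import numpy as np
from scipy.stats import mannwhitneyu
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
n_species, n_per_group = 500, 30
cases = rng.negative_binomial(5, 0.5, size=(n_per_group, n_species))
controls = rng.negative_binomial(5, 0.5, size=(n_per_group, n_species))

# One test per species -> one p-value per species.
pvals = np.array([
    mannwhitneyu(cases[:, j], controls[:, j]).pvalue
    for j in range(n_species)
])

# Benjamini-Hochberg: controls the expected false discovery rate
# at 5% across all n_species comparisons.
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
print(f"{reject.sum()} species pass FDR < 0.05")
```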

One of the complicating issues here is that, as the depth of metagenomic sequencing increases, one is bound to discover more and more species, which increases the number of comparisons and renders the correction for multiple testing somewhat arbitrary. Alternatively, a common practice is to filter out species with low prevalence, thus reducing the number of comparisons.
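Continuing the sketch above, the prevalence filter looks something like this; the 20% cutoff is an arbitrary stand-in, which is exactly the arbitrariness that bothers me:

```python
# Sketch: prevalence filtering before testing (made-up threshold).
# Keeping only species present in at least 20% of samples shrinks
# the family of tests, which changes the BH-adjusted cutoff.
import numpy as np

counts = np.vstack([cases, controls])      # samples x species
prevalence = (counts > 0).mean(axis=0)     # fraction of samples carrying each species
keep = prevalence >= 0.20                  # arbitrary cutoff; this arbitrariness is the point
print(f"testing {keep.sum()} of {counts.shape[1]} species after filtering")
```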

I have recently found works that forgo the correction for multiple comparisons altogether, instead simply selecting features that pass a certain threshold and then ranking them in terms of importance (using some kind of regression or machine-learning approach). This struck me as particularly odd because of the use of unadjusted p-values for the initial feature selection, although I suppose this contradiction could be bypassed using a Bayesian approach.
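My reconstruction of that approach, continuing the same sketch (not taken from any particular paper): keep the features with unadjusted p < 0.05, then rank the survivors by an off-the-shelf importance score.

```python
# Sketch of the practice described above: select features by
# *unadjusted* p-value, then rank the survivors by importance.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

X = np.vstack([cases, controls]).astype(float)
y = np.array([1] * n_per_group + [0] * n_per_group)

selected = np.where(pvals < 0.05)[0]       # no multiplicity correction here
rf = RandomForestClassifier(n_estimators=500, random_state=0)
rf.fit(X[:, selected], y)

# Rank selected species by impurity-based importance.
ranking = selected[np.argsort(rf.feature_importances_)[::-1]]
print("top 10 species (indices):", ranking[:10])
```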

I would like to get a clearer perspective on this issue, from both the frequentist and the Bayesian point of view: when, whether, and how should one correct for multiple comparisons when the number of features is variable or may be subject to prior selection/filtering?

(It is worth mentioning that another complication is the compositionality of the data, due to the finite sequencing depth and/or working with relative abundances. Perhaps independent tests are simply not applicable here... but they are likely to persist, since the number of features is often larger than the number of samples and needs to be reduced.)
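For reference, the compositional aspect is often handled with a centered log-ratio (CLR) transform before any testing; a minimal sketch, with an arbitrary pseudocount:

```python
# Sketch: centered log-ratio (CLR) transform for compositional data.
# A pseudocount avoids log(0); the choice of 0.5 is arbitrary.
import numpy as np

def clr(counts, pseudocount=0.5):
    x = counts + pseudocount
    logx = np.log(x)
    # Subtract each sample's mean log-abundance (log geometric mean).
    return logx - logx.mean(axis=1, keepdims=True)

Z = clr(np.vstack([cases, controls]))
# Each row of Z now sums to ~0: one dimension is redundant,
# which is why per-species tests are not truly independent.
```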

  • Can you clarify what your question is? It sounds like you're asking for an overview of frequentist and Bayesian multiplicity adjustments. That would probably be too broad for a question here.
  • The first thing to have in mind is that tests are asymmetric and treat the null and alternative hypotheses in different ways. The sharper the multiple-comparison correction, the more power you lose to detect differences. So this is a trade-off, and therefore there is no general recommendation. The researchers need to decide what exactly they want to control, and this comes at a cost. Furthermore, it is very legitimate to wonder whether you need hypothesis tests at all. Choosing an optimal subset of features, say for prediction, is not governed by the standard error probabilities of tests.
  • It is puzzling that this question suggests the reason or basis for multiple comparisons is the number of features--but that has nothing at all to do with it. The objective is to control an overall decision rate, and this depends on how many decisions you will make with the data and (more subtly) on how interdependent those decisions might be. Given that the decisions are the fundamental objective of the analysis, how is it possible that you could not know how many are being made? Are you really probing for opinions about sequential or adaptive procedures?
  • @whuber I am looking for assistance that professional statisticians could give to a non-statistician, not for a technical discussion among experts (which I am not). If somebody can clarify this issue, or suggest references or more rigorous procedures, I would welcome it.
  • BTW, I don't see what's "technical" about pointing out that in multiple-comparisons considerations the number of decisions matters instead of the number of variables. That seems pretty straightforward and basic, requiring no jargon or special concepts to communicate or appreciate.

1 Answer


Following on from the exchange between Roger V and @whuber, there seems to be an easy answer. I am very glad that no answers to the question were given before it came out that the research in question is preliminary, exploratory, hypothesis-generating research. It has long been a frustration to me that so many questions regarding 'corrections' for multiplicity are answered with no knowledge of the type of research in question.

Where research is directed at hypothesis generation, there is never a need to 'correct' or adjust the statistical results for multiple tests. That is because there will naturally be future tests of the generated hypotheses (the most interesting ones, anyway) with new data from potentially better, or more efficient, experimental designs before any definitive 'decisions' are made.

The multiple testing in the original exploratory research will not lead to any inflation of false-positive errors or false discoveries, because it is the follow-on experiments that lead to the 'decisions' and thus to the errors. (I would suggest that the 'error' in choosing to run fresh experiments to test a hypothesis that comes from the preliminary exploratory research cannot be an error in the sense of frequentist type I and type II errors.)
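To illustrate (a toy simulation of my own, not from any study): screen purely null features at unadjusted p < 0.05, then 'replicate' the survivors with fresh data. The final error rate is set by the confirmatory stage, not by the size of the exploratory family.

```python
# Toy two-stage simulation: exploratory screen, then confirmation
# with new data. All features are null, so every "discovery" is false.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
n_features, n = 1000, 40

def pvalues(rng):
    a = rng.normal(size=(n, n_features))
    b = rng.normal(size=(n, n_features))
    return ttest_ind(a, b).pvalue   # one p-value per feature

screen = pvalues(rng) < 0.05        # exploratory stage, uncorrected
confirm = pvalues(rng) < 0.05       # fresh data for the survivors
print(f"screened in: {screen.sum()}, confirmed: {(screen & confirm).sum()}")
# Only ~5% of the followed-up nulls confirm: the error rate is set
# by the confirmatory test, not by the size of the exploratory family.
```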

  • This is reasonable, +1, but I take an intermediate view as hinted in one of my comments to the question: if we develop too many hypotheses--and experience tells us most of them will be false--there is a real cost to pursuing them all. Whence there's a need even during exploration to winnow and prioritize the options. But the very meaning of "prioritize" implies there is some system for evaluating the merits of the hypotheses (as well as the costs of testing them), and that goes to your initial point that there cannot be a solely statistical solution to that problem.
  • @whuber Yes, I agree. When preliminary research turns up a lot of hypotheses that might be tested with new experiments, the researchers have to decide which of them are most interesting or most compelling on the basis of all the available information. Usually they will not follow up on everything. That is not described often enough by statisticians.
  • Thank you, this clarifies a lot. Could you suggest a relevant book or an article?
