I am looking for a practical(!) way to create many different random partitions of a large set, and then to efficiently identify, for a given partition, the part to which each element of a small subset belongs.
Details
$\Omega$ is a Cartesian product of finite sets $\Omega_i=\{1,\ldots,k_i\}$, $i=1,\ldots,d$, and $S\subset\Omega$ is a subset. I would like to
- Create random partitions of $\Omega$ into parts $A_1, \ldots,A_m$ (i.e. the sets are mutually disjoint and their union is $\Omega$), where each part has roughly the same number of elements.
- Determine for each element $x\in S$ the number $j$ such that $x\in A_j.$
A natural way to implement this is to find functions $f_t:\Omega\rightarrow \{1,\ldots,m\}$ which are easy to evaluate and whose pre-images $f_t^{-1}(j)$ have roughly equal size. Examples of functions with exactly these properties are the projections $\pi_i:\Omega\rightarrow \Omega_i$ (with $m=k_i$). My problem is that there are only $d$ such projections, but I need many functions $f_t$.
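For concreteness, here is a minimal sketch (with made-up sizes $k_i$) of how a projection yields a balanced partition function; bucketing the projected coordinate modulo $m$ is my own choice to reduce to a smaller number of parts:

```python
import random

# Hypothetical problem sizes: d factors, each Omega_i = {1, ..., k_i}.
k = [10, 12, 16]          # cardinalities k_i (made up for illustration)
m = 2                     # number of parts A_1, ..., A_m

def f_projection(x, i, m):
    """Partition function built from the i-th projection pi_i:
    maps x to a part index in {1, ..., m} by bucketing x[i] mod m."""
    return (x[i] - 1) % m + 1

# The pre-images are balanced up to rounding; here they are exactly
# balanced since m = 2 divides every k_i above.
sample = [tuple(random.randint(1, ki) for ki in k) for _ in range(100_000)]
counts = {}
for x in sample:
    j = f_projection(x, 0, m)
    counts[j] = counts.get(j, 0) + 1
print(counts)  # roughly a 50/50 split
```

This makes the limitation visible: each coordinate $i$ gives essentially one such function, so only $d$ of them are available.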
I am not looking for theoretical solutions but for practical, implementable ones. The big issue is the large size of $\Omega$ due to the combinatorial explosion of elements. To give a rough idea of the orders of magnitude: the number of factors $d$ is maybe in the double digits, as is the cardinality of each factor $\Omega_i$; so $\Omega$ is huge ($>10^{20}$, say), but $S$ is much smaller, e.g. in the hundreds of thousands. Partitions with $m=2$ are completely sufficient; larger $m$ would just be nice to have. I would like to be able to generate maybe 1000 to 10'000 functions $f_t.$ For nearby indices $t$ the functions should not be too correlated.
All randomness is understood with respect to the counting measure, i.e. the probability of $A\subset\Omega$ is the number of elements of $A$ divided by the number of elements of $\Omega$.
Questions
- Are hash functions good candidates for my $f_t$?
- What family of hash functions would be best suited?
- And how would I generate those hash functions?
Or is there another, better approach to the problem?
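To make the questions concrete, here is the kind of construction I have in mind (a sketch, not a vetted choice): use a keyed hash such as BLAKE2b with the index $t$ as the key, applied to the encoded tuple, and reduce modulo $m$. Since I only ever evaluate on $S$, the size of $\Omega$ never matters. Whether this particular family is balanced enough and has sufficiently independent members is exactly what I am asking.

```python
import hashlib

def make_f(t, m):
    """Return f_t: Omega -> {1, ..., m}, built from a keyed BLAKE2b hash.
    The index t acts as the key, so different t should give nearly
    independent partition functions (this is the assumption in question)."""
    key = t.to_bytes(8, "big")
    def f(x):
        h = hashlib.blake2b(repr(x).encode(), key=key, digest_size=8)
        return int.from_bytes(h.digest(), "big") % m + 1
    return f

# Evaluate only on the (comparatively small) set S, never on all of Omega.
S = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
fs = [make_f(t, m=2) for t in range(5)]
labels = [[f(x) for x in S] for f in fs]
print(labels)  # part index in {1, 2} for each x in S, for each f_t
```

Generating 10'000 such functions is then just a matter of choosing 10'000 keys, but I don't know whether a cryptographic hash is overkill here, or whether a cheaper universal-hashing family would be the standard answer.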