We need find an intensity transformation $h$ s.t. $p_h(z) = \frac{1}{256}$ for all $z \in [0, 256]$.
This means the CDF (or integral) $F_h$ of $h$ is $F_h(z) = \int_0^z p_h(z')dz' = \int_0^z \frac{1}{256} = \frac{z}{256}$. I.e., it's (not surprisingly) linearly growing from 0 to 1.
Okay now, we have reformulated the problem in terms of the CDF and can work with that.
Let $z = h(y)$, where $y=f(x)$ for some pixel $x$.
$y$ is distributed according to the CDF $F_f$.
$F_h(z) = \int_0^z p_h(z')dz'$ can also be interpreted as the probability $Pr(z'=h(y') \in [0, z])$.
This is the same as $Pr(h^{-1}(z')=y'\in [h^{-1}(0), h^{-1}(z)])$, at least under the assumption that $h$ is invertible.
The latter probability is nothing else than $\int_{h^{-1}(0)}^{h^{-1}(z)} p_f(y')dy' $. For the assumption $h^{-1}(0) = 0$, we get $F_f(h^{-1}(z))$.
Let's recap what we got so far:
Under the constraints that $h$ is invertible and $h^{-1}(0) = 0$ (or equivalently $h(0) = 0$), we get:
$\frac{z}{256} = F_h(z) = F_f(h^{-1}(z))$
Now, we can solve s.t. we write this in terms of $h$, not $h^{-1}$.
This gives $h(F_f^{-1}(\frac{z}{256})) = z$.
Now, we can see how we must define $h$ s.t. it will fulfill this equation:
$h$ must first undo $F_f^{-1}$, and then undo the division by 256. This would give $z=z$, which is true.
So, $h(y) = (F_f^{-1})^{-1}(y)\cdot 256 = F_f(y)\cdot 256$.
Now, just make sure that all assumptions that we made on the way about $h$ are indeed true:
- $h$ is invertible: True, because it's a composition of F_f(y) and multiplication by 256, both of which are invertible.
- $h(0)=0$: True, because $F_f(0)=0$
Therefore, $\mathbf{h(y) = F_f(y)\cdot 256}$ is actually the correct and unique result.
The intuition behind this transformation is that intensities $y$ are mapped to its corresponding CDF value $F_f(y)\in [0,1]$ and then scaled to the range [0, 256]. The $F_f(y)$ can be interpreted as the percentage of pixels with intensities in the image that are at most $y$. It then makes sense to map the pixel with intensity $y$ to the value corresponding to this percentage. If we do that, each intensity level is used evenly.
Note that this only is true in the ideal case where each pixel maps to a different intensity value. If we have multiple pixels with the same intensity, they will also be transformed to the same value through $h$, and the resulting CDF of the transformed pixel intensities is not exactly a linear function, but more like an approximate linear function that looks like a staircase, with jumps depending on the number of pixels with the same original intensity value. An extreme case is if you have a black image, then histogram equalization will not spread out the pixel intensities at all.