The following diagram is from "Experimental Realization of a Quantum Autoencoder: The Compression of Qutrits via Machine Learning".

How this setup works:
What this setup yields is the state vector of the input light just before it enters stage (c), and its state vector as it exits stage (c). (I don't use the term 'photon' because it might not be necessary.) There are two polarization states, |H> and |V>, and two optical paths (upper and lower), but only three basis states rather than four, because the beam displacer deflects only |V>-polarized light. The state vector is therefore 3-dimensional at the input of (c). It is expected to be 2-dimensional at the output of (c), because the autoencoder stage is supposed to compress the state so that the amplitude of the upper-path basis state is always zero.
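As a rough illustration of the encoding described above, here is a minimal NumPy sketch. The basis ordering (lower-path H, lower-path V, upper path), the example subspace, and the helper `ideal_compressor` are hypothetical choices made for illustration. In particular, the unitary is built directly from a known 2-D subspace via a QR decomposition, not trained as in the paper. It only shows what a successful compression looks like: the upper-path amplitude goes to zero and the two lower-path amplitudes carry the whole state.

```python
import numpy as np

# Hypothetical basis ordering (an assumption for illustration only):
# index 0 -> |H, lower path>, index 1 -> |V, lower path>, index 2 -> |upper path>
LOWER_H, LOWER_V, UPPER = 0, 1, 2

def ideal_compressor(subspace: np.ndarray) -> np.ndarray:
    """Hypothetical helper: return a 3x3 unitary that maps the span of the two
    columns of `subspace` onto the two lower-path basis states."""
    # Complete the 2-D subspace to an orthonormal basis of C^3 using QR.
    rng = np.random.default_rng(0)
    extra = rng.normal(size=(3, 1)) + 1j * rng.normal(size=(3, 1))
    q, _ = np.linalg.qr(np.hstack([subspace, extra]))
    # The first two columns of q span the same subspace as `subspace`,
    # so q^dagger sends that subspace onto span{|0>, |1>} (the lower path).
    return q.conj().T

# Example: input states confined to a hypothetical 2-D subspace of the qutrit space.
v1 = np.array([1, 1, 1]) / np.sqrt(3)
v2 = np.array([1, -1, 0]) / np.sqrt(2)
U = ideal_compressor(np.column_stack([v1, v2]).astype(complex))

psi_in = 0.6 * v1 + 0.8j * v2              # a normalized 3-dimensional input state
psi_out = U @ psi_in                       # state after the (idealized) stage (c)
compressed = psi_out[[LOWER_H, LOWER_V]]   # the 2-dimensional compressed state
```

Any input that has a component outside the chosen subspace would leave a nonzero upper-path amplitude, which is what the autoencoder's training is meant to minimize.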
My question is: why is it necessary to use single photons for this experiment?
To distinguish the statistics of different state vectors, I could simply measure the intensity of the output light in the two optical paths. For the lower output path specifically, I would use another beam displacer to separate the V and H light and measure their intensities separately. Knowing those intensities, I can obtain the coefficient of each basis state and hence know the state vector exactly, or can I?
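The intensity-to-coefficient step in the previous paragraph can be sketched as follows, assuming ideal detectors whose readings are proportional to |c_k|^2 in each output port. The helper names `magnitudes_from_intensities` and `intensities_of` and the two example states are hypothetical. The sketch normalizes the three readings to the total intensity and takes square roots, which gives the magnitudes |c_k|.

```python
import numpy as np

def magnitudes_from_intensities(i_lower_h, i_lower_v, i_upper):
    """Hypothetical helper: convert three measured output intensities into
    basis-state magnitudes |c_k| by normalizing to the total intensity."""
    intensities = np.array([i_lower_h, i_lower_v, i_upper], dtype=float)
    probs = intensities / intensities.sum()   # relative intensities = |c_k|^2
    return np.sqrt(probs)

def intensities_of(state):
    # Assumption: ideal detectors, so each port's intensity is proportional to |c_k|^2.
    return np.abs(state) ** 2

# Two example states that differ only in the relative phase of one component.
psi_a = np.array([0.6, 0.8, 0.0], dtype=complex)
psi_b = np.array([0.6, 0.8j, 0.0], dtype=complex)

mags_a = magnitudes_from_intensities(*intensities_of(psi_a))
mags_b = magnitudes_from_intensities(*intensities_of(psi_b))
```

Both example states give the same intensity readings, so this particular measurement fixes only the magnitudes |c_k|; the relative phases between the basis states would need additional measurements, for example projections onto other polarization bases.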