Under the x86-16 architecture, segments overlap massively: consecutive segments start only 16 bytes apart, but each one spans 64 KiB. As a result, each memory location can be addressed by 4096 different combinations of segment address and offset.
The actual memory location can be calculated as
location = 16 * segment + offset
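A small Python sketch of this formula. It assumes 8086 behavior, where the result wraps around at 1 MiB; that wrap-around is what makes the count exactly 4096 for every location (with the A20 line enabled on later CPUs, low addresses have fewer combinations). The names physical and aliases are illustrative, not from any library.

```python
def physical(segment: int, offset: int) -> int:
    """Real-mode physical address: location = 16 * segment + offset.

    Masked to 20 bits to model the 8086 wrap-around at 1 MiB.
    """
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset are 16-bit values")
    return (16 * segment + offset) & 0xFFFFF


def aliases(address: int) -> list[tuple[int, int]]:
    """Every segment:offset pair that reaches `address` (8086 wrap-around model)."""
    pairs = []
    for segment in range(0x10000):
        # For a given segment there is only one candidate offset;
        # keep it if it fits into 16 bits.
        offset = (address - 16 * segment) % 0x100000
        if offset <= 0xFFFF:
            pairs.append((segment, offset))
    return pairs


if __name__ == "__main__":
    print(hex(physical(0x40, 0x0)), hex(physical(0x0, 0x400)))  # 0x400 0x400
    print(len(aliases(0x400)), len(aliases(0x12345)))            # 4096 4096
```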
Loading 0 into SS makes the first 64 KiB of memory accessible through the stack segment. The actual trick is multiplying BIOSSEG by 16 and using the result as the offset: this sets the stack pointer (SP), which holds the stack's offset address, to the beginning of the BIOS data segment. The stack grows downwards from there.
The BIOS data area (BDA) starts at memory location 400h = 16 * 40h + 0h.
So 40h:0h and 0h:400h refer to the same location. For the stack, however, only the second notation of the two works (as would e.g. 1h:3f0h or 10h:300h): SP needs a positive offset so that pushes can move downwards to lower addresses. With 40h:0h, SP would be 0, and the first push would wrap around within the segment to 40h:FFFEh, far above the BDA.
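A rough Python model of a 16-bit push shows why the offset matters. It assumes 8086 semantics (SP is decremented by 2 before the store, wrapping within the 64 KiB segment, and words are stored little-endian). BIOSSEG = 40h is inferred from the BDA address above, and push_word is a hypothetical helper for illustration only.

```python
MEMORY_SIZE = 0x100000  # 1 MiB real-mode address space


def push_word(memory: bytearray, ss: int, sp: int, value: int) -> int:
    """Model of a real-mode 16-bit PUSH; returns the new SP.

    SP is decremented by 2 (wrapping within the segment), then the word
    is stored little-endian at SS:SP.
    """
    sp = (sp - 2) & 0xFFFF
    linear = (16 * ss + sp) & 0xFFFFF
    memory[linear] = value & 0xFF
    memory[(linear + 1) & 0xFFFFF] = (value >> 8) & 0xFF
    return sp


if __name__ == "__main__":
    BIOSSEG = 0x40  # assumed value: segment of the BIOS data area

    mem = bytearray(MEMORY_SIZE)
    sp = push_word(mem, 0x0, BIOSSEG * 16, 0xBEEF)  # SS:SP = 0h:400h
    print(hex(sp), hex(16 * 0x0 + sp))              # 0x3fe 0x3fe -> just below the BDA

    sp = push_word(mem, BIOSSEG, 0x0, 0xBEEF)       # SS:SP = 40h:0h
    print(hex(sp), hex(16 * BIOSSEG + sp))          # 0xfffe 0x103fe -> far above the BDA
```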
Normally the IVT (Interrupt Vector Table) is stored at the very beginning of memory. It contains 256 addresses (one for each of the 256 interrupts), 4 bytes each (offset and segment), taking 256 * 4 bytes = 400h bytes, i.e. the range 0h to 3FFh.
So, in theory, there is no room left below 400h for a stack.
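The IVT layout can be sketched in a few lines of Python, assuming the table sits at its 8086 default base of 0 and each entry is a little-endian offset word followed by a segment word. The helper names and the sample vector bytes in the test are made up for illustration.

```python
IVT_BASE = 0x0      # assumed: IVT at its default real-mode location
VECTOR_SIZE = 4     # 16-bit offset followed by 16-bit segment
NUM_VECTORS = 256


def vector_address(n: int) -> int:
    """Physical address of the IVT entry for interrupt n."""
    if not 0 <= n < NUM_VECTORS:
        raise ValueError("interrupt number must be 0..FFh")
    return IVT_BASE + VECTOR_SIZE * n


def read_vector(memory: bytes, n: int) -> tuple[int, int]:
    """Decode IVT entry n as (segment, offset); both words are little-endian."""
    addr = vector_address(n)
    offset = int.from_bytes(memory[addr:addr + 2], "little")
    segment = int.from_bytes(memory[addr + 2:addr + 4], "little")
    return segment, offset


if __name__ == "__main__":
    print(hex(vector_address(0x00)))       # 0x0
    print(hex(vector_address(0xFF)))       # 0x3fc
    print(hex(NUM_VECTORS * VECTOR_SIZE))  # 0x400 -> IVT ends where the BDA starts
```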
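Either the IVT has been moved on this system in some way (the 8086 cannot relocate it, but on the 80286 and later the LIDT instruction can change its base even in real mode), or the higher interrupt vectors are overwritten by the stack. Because the stack grows downwards from 400h, it overwrites vector FFh first, then FEh, and so on. The higher interrupts are only used by software, whereas the CPU and hardware typically use interrupts 0h to Fh and 70h to 77h. The upper 88h vectors (78h to FFh) can therefore potentially be overwritten, as long as no software calls them with the int instruction.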
So the stack could grow to 4 bytes * 88h = 220h bytes (544 bytes).
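The arithmetic above can be checked with a short Python sketch. It takes the text's assumption at face value: only vectors 0h to Fh and 70h to 77h are needed, and no software calls 78h to FFh. The stack eats vectors from FFh downwards until it would hit a reserved one. Names are illustrative.

```python
VECTOR_SIZE = 4

# Vectors the text names as used by the CPU and hardware (inclusive ranges).
RESERVED = [range(0x00, 0x10), range(0x70, 0x78)]


def is_reserved(n: int) -> bool:
    return any(n in r for r in RESERVED)


def overwritable_from_top() -> list[int]:
    """Vectors a stack growing down from 400h can use before reaching a reserved one."""
    free = []
    for n in range(0xFF, -1, -1):  # the stack reaches vector FFh first, then FEh, ...
        if is_reserved(n):
            break
        free.append(n)
    return free


if __name__ == "__main__":
    free = overwritable_from_top()
    print(hex(len(free)), hex(min(free)))                         # 0x88 0x78
    print(len(free) * VECTOR_SIZE, hex(len(free) * VECTOR_SIZE))  # 544 0x220
```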
Typically, the 256 bytes starting at 30h:0 (physical 300h to 3FFh) are used as a stack during POST and early boot, overwriting the vectors for interrupts C0h to FFh.
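A quick Python check of which vectors that 256-byte region covers. It assumes the stack is set up as SS = 30h with SP = 100h, so its top lands exactly at 400h; vectors_covered is a hypothetical helper.

```python
def physical(segment: int, offset: int) -> int:
    """Real-mode physical address: 16 * segment + offset."""
    return 16 * segment + offset


def vectors_covered(start: int, length: int) -> tuple[int, int]:
    """First and last interrupt vector whose 4-byte entry overlaps [start, start + length)."""
    return start // 4, (start + length - 1) // 4


if __name__ == "__main__":
    start = physical(0x30, 0x0)                # bottom of the region: 0x300
    top = physical(0x30, 0x100)                # assumed initial SP of 100h -> 0x400
    first, last = vectors_covered(start, 256)
    print(hex(start), hex(top), hex(first), hex(last))  # 0x300 0x400 0xc0 0xff
```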
Alternatively, maskable hardware interrupts can be disabled temporarily by clearing the interrupt flag (IF), e.g. with cli. This does not block the non-maskable interrupt (NMI), which uses vector 2h (int 2h).
SP be set this way? Perhaps they do both: keeping the stack at a 'valid' address among the higher interrupt vectors while also writing to the BDA.