In principle, the diagram has all the right pieces at a block-diagram level, and it might even work in practice and output something that resembles audio.
The actual implementation is very poor, though, and is missing a couple of obvious things, so the output may be low in volume and poor in quality compared to what a proper design could deliver.
First of all, at a high level, the op-amp isn't strong enough to drive the speaker, and the DC bias is passed straight on to the speaker, which needs to be AC coupled instead. Technically there should be an amplifier stage capable of driving the speaker.
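To get a feel for both points, here is a rough sketch. The 8 ohm speaker, the 220 µF coupling capacitor and the 1 V peak swing are assumptions picked only for illustration; none of them come from the diagram. It estimates the high-pass corner that a series coupling capacitor forms with the speaker, and the peak current the speaker would demand from whatever drives it.

```python
import math

def highpass_cutoff_hz(c_farads, r_ohms):
    """-3 dB corner of a series capacitor feeding a resistive load."""
    return 1.0 / (2.0 * math.pi * r_ohms * c_farads)

def peak_load_current_a(v_peak, r_ohms):
    """Peak current a resistive load draws at a given peak voltage."""
    return v_peak / r_ohms

# Assumed example values, not taken from the schematic.
SPEAKER_OHMS = 8.0
COUPLING_CAP_F = 220e-6
V_PEAK = 1.0

print("cutoff [Hz]:", round(highpass_cutoff_hz(COUPLING_CAP_F, SPEAKER_OHMS), 1))
print("peak current [A]:", peak_load_current_a(V_PEAK, SPEAKER_OHMS))
```

Compare the peak current against the output current rating in the datasheet of the part you actually use; a speaker is a simple resistor only to a first approximation, so treat the numbers as ballpark figures.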
Then the choice of op-amp, the component values that bias it, and the gain it is set to may not suit the old LM324 very well. With a 5V supply, its input voltage range is only 0V to 3V, so the input bias should sit at 1.5V. On the other hand, at any appreciable output current, the output range will be about 1V to 3.25V, so the output DC bias should be about 2.125V.
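The bias figures above are just the midpoints of the usable ranges. A small sketch of that arithmetic, using the range limits quoted above (the function names are made up for this example):

```python
def midpoint(low, high):
    """DC bias that centres a signal inside the range [low, high]."""
    return (low + high) / 2.0

def max_peak_swing(bias, low, high):
    """Largest symmetric peak swing around `bias` that stays inside the range."""
    return min(bias - low, high - bias)

# Range limits quoted in the text for an LM324 on a 5 V supply.
IN_LOW, IN_HIGH = 0.0, 3.0
OUT_LOW, OUT_HIGH = 1.0, 3.25

in_bias = midpoint(IN_LOW, IN_HIGH)
out_bias = midpoint(OUT_LOW, OUT_HIGH)

print("input bias [V]:", in_bias)
print("output bias [V]:", out_bias)
print("max output peak swing [V]:", max_peak_swing(out_bias, OUT_LOW, OUT_HIGH))
```

The last line shows how little headroom is left: the output can only move a bit over a volt either side of its bias point before it hits a limit.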
Summing two 5V signals together at 1x gain means the output will clip, because the sum can reach 10V while the op-amp runs from a 5V supply.
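As a rough illustration of the clipping, the sketch below treats the summer as ideal with hard output limits at the 1V and 3.25V figures quoted earlier. That is a simplification, not a model of the actual circuit, and the helper names are invented for this example.

```python
def clip(value, low, high):
    """Limit `value` to the range [low, high], like a saturating output."""
    return max(low, min(high, value))

def gain_to_fit(input_span, out_low, out_high):
    """Gain that maps an input span onto the available output span."""
    return (out_high - out_low) / input_span

OUT_LOW, OUT_HIGH = 1.0, 3.25   # output range quoted in the text
summed_peak = 5.0 + 5.0         # two 0..5 V signals summed at 1x gain

print("ideal sum [V]:", summed_peak)
print("clipped output [V]:", clip(summed_peak, OUT_LOW, OUT_HIGH))
print("gain needed to fit:", gain_to_fit(summed_peak, OUT_LOW, OUT_HIGH))
```

Under these assumptions the gain would have to come down to somewhere around a quarter before the summed signal fits inside the output range at all.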
There is little point in working out exactly why the current values are poor, since they need to be recalculated anyway. It makes more sense to recalculate them properly from scratch, and then check whether what the AI told you makes sense or not.
It also makes sense to AC couple the input, but frankly, the LM324 is so weak that even the MCU PWM pins might drive the speaker with more current, and the audio could be mixed by simply wiring the PWM outputs together through resistors. It may make sense to replace the LM324 with something newer as well.
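For the resistor mixing idea, the voltage at the common node is a conductance-weighted average of the pin voltages (Millman's theorem). The sketch below uses made-up 1 kΩ resistors and treats each PWM pin as an ideal voltage source at its averaged level, which real MCU pins only approximate.

```python
def mix_node_voltage(voltages, resistors, load_ohms=None):
    """Voltage at the common node of a passive resistor mixer.

    Each source in `voltages` feeds the node through the matching
    entry in `resistors`; an optional load goes from the node to ground.
    """
    current_sum = sum(v / r for v, r in zip(voltages, resistors))
    conductance_sum = sum(1.0 / r for r in resistors)
    if load_ohms is not None:
        conductance_sum += 1.0 / load_ohms
    return current_sum / conductance_sum

# Hypothetical example: one pin high (5 V), one low (0 V), 1 kOhm each.
print(mix_node_voltage([5.0, 0.0], [1000.0, 1000.0]))
# Same mixer with an assumed 1 kOhm load on the node.
print(mix_node_voltage([5.0, 0.0], [1000.0, 1000.0], load_ohms=1000.0))
```

With equal resistors and no load the node simply sits at the average of the inputs, so two full-scale signals can never exceed the supply; any load on the node pulls the level down further.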
There are also some odd choices, such as one op-amp being used as a buffer for the reference voltage. Since the reference voltage goes to only one high-impedance node, the input of the second op-amp, the first op-amp can be removed without any functional difference.