From https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
It says "-ftree-slp-vectorize: Perform basic block vectorization on trees. This flag is enabled by default at -O2 and by -ftree-vectorize, -fprofile-use, and -fauto-profile."
However, it seems I have to pass a flag explicitly to turn on SIMD at -O2. Did I misunderstand something here? It is enabled at -O3, though.
-O3 is "too aggressive" and/or often not faster. Clang enables auto-vectorization at -O2, so GCC before 12 looked bad by comparison when people benchmarked at -O2. Simple loops on modern CPUs benefit significantly from SIMD.
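
One way to check what a particular GCC actually does is to compile a simple loop and ask for a vectorization report. The sketch below is only an illustration: the function name saxpy and the file name saxpy.c are made up for this example. Compiling with something like "gcc -O2 -fopt-info-vec -c saxpy.c", and then again with -ftree-vectorize or -O3 added, should show whether the loop was vectorized. Whether plain -O2 vectorizes it depends on the GCC version and its cost model, so the report is the thing to trust rather than any assumption made here.

```c
#include <stddef.h>

/* Hypothetical example: y[i] += a * x[i] for i in [0, n).
 * 'restrict' promises the compiler that x and y do not overlap,
 * which removes one common obstacle to auto-vectorization. */
void saxpy(float *restrict y, const float *restrict x, float a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

Adding -fopt-info-vec-missed instead prints the loops GCC declined to vectorize, which helps when nothing shows up in the -fopt-info-vec output.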