For monitoring memory bandwidth, there is pcm-memory on the Intel platform and AMDuProf on the AMD platform.
How do they calculate memory bandwidth usage? Which PMU counters or events do they read?
Do they use base 1024 or base 1000 when reporting GB/MB/KB? I think base 1000 is more suitable, but I'm not sure.
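
To make the unit question concrete, here is a minimal Python sketch of how the same counter reading gives different numbers in the two bases. It assumes each counted memory transaction transfers one 64-byte cache line. The function name `bandwidth` and the sample count are hypothetical and are not taken from either tool.

```python
# Assumption: each counted memory transaction moves one 64-byte cache line.
CACHE_LINE_BYTES = 64


def bandwidth(transactions: int, seconds: float, base: int = 1000) -> float:
    """Convert a transaction count over an interval to mega-units per second.

    base=1000 yields MB/s (decimal), base=1024 yields MiB/s (binary).
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    if base not in (1000, 1024):
        raise ValueError("base must be 1000 or 1024")
    total_bytes = transactions * CACHE_LINE_BYTES
    return total_bytes / seconds / base**2


if __name__ == "__main__":
    reads = 150_000_000  # hypothetical counter delta over a 1-second interval
    print(f"{bandwidth(reads, 1.0, 1000):.1f} MB/s")   # decimal megabytes
    print(f"{bandwidth(reads, 1.0, 1024):.1f} MiB/s")  # binary mebibytes
```

At the mega scale the two bases differ by about 4.9%, and at the giga scale by about 7.4%, so the choice matters when comparing a tool's output against a DIMM's rated bandwidth, which is normally quoted in decimal units.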