5

I want to write a shell script/command which uses commonly-available binaries, the /sys filesystem or other facilities to calculate the theoretical maximum bandwidth for the RAM available on a given machine.

Notes:

  • I don't care about latency, just bandwidth.
  • I'm not interested in the effects of caching (e.g. the CPU's last-level cache), but in the bandwidth of reading from RAM proper.
  • If it helps, you may assume a "vanilla" Intel platform, and that all memory DIMMs are identical; but I would rather you not make this assumption.
  • If it helps, you may rely on root privileges (e.g. using sudo)
7
  • which bandwidth are you interested in? CPU <--> RAM? I/O <--> RAM? and by RAM do we mean Virtual Memory or direct access to physical memory? What about L3 (or last) cache? Did you have a look at superuser.com/questions/827207/… ? Commented Jul 20, 2018 at 14:13
  • @diginoise: I asked about the RAM, not the CPU cache. I meant how much you can read from RAM to everywhere on the system; typically this would be how much you can read from the different memory banks to the various CPU sockets on the system. Commented Jul 20, 2018 at 14:24
  • Are you wanting to benchmark, like with time dd if=/dev/zero of=/dev/null bs=1g count=200 or something? If not, the [benchmarking] tag doesn't make sense. Commented Oct 27, 2018 at 1:58
  • You say you want the "theoretical" max bandwidth, which means not a benchmark, but rather reading the DRAM parameters and bus speed and simply multiplying out the resultant bandwidth (probably looking up the number of memory channels based on the CPU model). If you do want a benchmark, STREAM is one de-facto standard. Various benchmark packages offer their own memory bandwidth tests. TinyMemBench is another. Commented Oct 28, 2018 at 0:33
  • @BeeOnRope: I see what you mean. I'm dropping the [benchmarking] tag. Commented Oct 28, 2018 at 7:46

2 Answers

3

I'm not aware of any standalone tool that does it, but for Intel chips only, if you know the "ARK URL" for the chip, you could get the maximum bandwidth using a combination of a tool to query ARK, like curl, and something to parse the returned HTML, like xmllint --html --xpath.

For example, for my i7-6700HQ, the following works:

curl -s 'https://ark.intel.com/products/88967/Intel-Core-i7-6700HQ-Processor-6M-Cache-up-to-3_50-GHz' | \
xmllint --html --xpath '//li[@class="MaxMemoryBandwidth"]/span[@class="value"]/span/text()' - 2>/dev/null

This returns 34.1 GB/s which is the maximum theoretical bandwidth of my chip.

The primary difficulty is determining the ARK URL, which doesn't correspond in an obvious way to the CPU brand string. One solution would be to find the CPU model on an index page like this one, and follow the link.

This gives you the maximum theoretical bandwidth, which can be calculated as (number of memory channels) x (transfer width) x (data rate). The data rate is the number of transfers per unit time, and is usually the figure given in the name of the memory type, e.g., DDR4-2133 has a data rate of 2133 million transfers per second. Alternatively, you can calculate the data rate as the product of the bus speed (1067 MHz in this case) and the data rate multiplier (2 for DDR technologies).

For my CPU, this calculation gives 2 memory channels * 8 bytes/transfer * 2133 million transfers/second = 34.128 GB/s, consistent with the ARK figure.
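
The arithmetic above can be sketched as a small shell function. This is only an illustration of the formula: the function name peak_bw and its argument order are made up for this example, and the 8 bytes per transfer assumes a standard 64-bit DDR channel.

```shell
#!/bin/sh
# peak_bw CHANNELS BYTES_PER_TRANSFER MEGATRANSFERS_PER_SEC
# Hypothetical helper: prints the theoretical peak bandwidth in GB/s
# (1 GB = 10^9 bytes), i.e. channels x transfer width x data rate.
peak_bw() {
    awk -v ch="$1" -v width="$2" -v mts="$3" \
        'BEGIN { printf "%.3f\n", ch * width * mts / 1000 }'
}

peak_bw 2 8 2133   # dual-channel DDR4-2133, as in the example above
peak_bw 1 8 2133   # same memory with only one channel populated
```

The first call reproduces the 34.128 GB/s figure; the second shows the halving you get when only one of the two channels is populated.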

Note that the theoretical maximum reported by ARK might be lower or higher than the theoretical maximum on your particular system for various reasons, including:

  • Fewer memory channels populated than the maximum number of channels. For example, if I only populated one channel on my dual-channel system, theoretical bandwidth would be cut in half.
  • Not using the fastest supported RAM. My CPU supports several RAM types (DDR4-2133, LPDDR3-1866, DDR3L-1600) with varying speeds. The ARK figure assumes you use the fastest supported RAM, which is true in my case, but may not be true on other systems.
  • Over- or under-clocking of the memory bus, relative to the nominal speed.
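
To see what is actually installed, rather than what ARK assumes, one option is to parse the SMBIOS memory records. The following is a rough sketch, not a tested tool. It assumes dmidecode is installed (it needs root), that the firmware's tables are accurate, and that the per-DIMM speed field is labelled "Configured Memory Speed" or, in older dmidecode versions, "Configured Clock Speed". The function name dimm_summary is hypothetical. It counts populated modules, not channels: the number of channels in use still has to come from the CPU or board documentation.

```shell
#!/bin/sh
# dimm_summary: hypothetical helper. Reads `dmidecode -t memory` output on
# stdin and prints "<populated modules> <slowest configured speed in MT/s>".
# Assumption: within each Memory Device record, "Size:" appears before the
# configured-speed line, which is how dmidecode normally orders the fields.
dimm_summary() {
    awk '
        # Empty slots report "Size: No Module Installed".
        /^[ \t]*Size:/ { populated = ($0 !~ /No Module Installed/) }

        /^[ \t]*Configured (Memory|Clock) Speed:/ && populated {
            split($0, kv, ":")
            mts = kv[2] + 0        # " 2133 MT/s" -> 2133, "Unknown" -> 0
            if (mts > 0) {
                n++
                if (min == 0 || mts < min) min = mts
            }
        }

        END { printf "%d %d\n", n, min }
    '
}
```

Run it as sudo dmidecode -t memory | dimm_summary. The second number is the data rate to use in the formula above in place of the CPU's nominal maximum; taking the slowest module is a conservative guess, since mixed DIMMs generally run at a common speed.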

Once you get the correct theoretical figure, you won't actually reach this figure in practice, due to various factors including the following:

  • Inability to saturate the memory interface from one or more cores due to limited concurrency for outstanding requests, as described in the section "Latency Bound Platforms" in this answer.
  • Hidden doubling of memory traffic for ordinary writes, which need to read the cache line before writing it.
  • Various low-level factors relating to the DRAM interface that prevent 100% utilization, such as the cost to open pages, the read/write turnaround time, refresh cycles, and so on.

Still, using enough cores and non-temporal stores, you can often get very close to the theoretical bandwidth: 90% or more.

4 Comments

How do I determine the correct URL for a different Intel CPU?
@einpoklum - I'm not aware of any simple way. The per-CPU page names follow some structure, but it does vary from family to family (e.g., some mention the cache size, etc). If you really wanted to do this, you'd probably want to scrape all the product URLs (e.g., from the index pages), then do a fuzzy search e.g., for the model number, rather than trying to generate the URL directly. With a few rules this might produce something approaching a reliable result.
That seems to be the max bandwidth for all cores combined, right? Can I calculate the bandwidth for one core? In my test, the relationship was non-linear, though...
@fabian - it depends what you mean. You can get a core's "share" of the bandwidth by dividing the CPU bandwidth by the core count, but of course if the other cores aren't using their share you could get more than that from a single core. Intel doesn't publish those numbers, but you can find them via testing or in CPU reviews.
1

@einpoklum you should have a look at Performance Counter Monitor available at https://github.com/opcm/pcm. It will give you the measurements that you need. I do not know if it supports kernel 2.6.32

Alternatively you should also check Intel's EMON tool which promises support for kernels as far back as 2.6.32. The user guide is listed at https://software.intel.com/en-us/download/emon-user-guide, which implies that it is available for download somewhere on Intel's software forums.

1 Comment

While I appreciate the link, I was after an answer that uses binaries already available on most systems, not something I need to download and build (which in some cases I don't have the ability to do).
