
My single-threaded program allocates and initializes a volatile block of memory on an unspecified hardware architecture. It then writes into this block in a loop, using a stride equal to the cache-line size (usually 64 bytes). Each write transfers either a single byte or an entire long (8 bytes).

To be clear, the total number of writes is fixed. Only the number of bytes per write can vary.

There are no reads, no other threads, and nothing else is going on. Should I expect a performance difference between these two cases?
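A minimal sketch of the experiment described above, in C. It assumes a 64-byte cache line (`LINE_SIZE`), a C11 toolchain that provides `aligned_alloc`, and uses CPU time from `clock()` as the timer; the function names are hypothetical. Both variants perform the same number of stores (`lines * reps`); only the width of each store differs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define LINE_SIZE 64 /* assumed cache-line size; check your target */

/* One 1-byte store per cache line. volatile stops the compiler from
   eliding or merging the stores. */
static void write_bytes(volatile uint8_t *buf, size_t lines, size_t reps)
{
    for (size_t r = 0; r < reps; r++)
        for (size_t i = 0; i < lines; i++)
            buf[i * LINE_SIZE] = (uint8_t)r;
}

/* One 8-byte store per cache line; every offset is a multiple of 64,
   so each store is naturally aligned and stays within one line. */
static void write_longs(volatile uint8_t *buf, size_t lines, size_t reps)
{
    for (size_t r = 0; r < reps; r++)
        for (size_t i = 0; i < lines; i++)
            *(volatile uint64_t *)(buf + i * LINE_SIZE) = (uint64_t)r;
}

/* Returns CPU seconds spent on `reps` passes of stride-64 stores of
   `width` bytes (1 or 8) over `lines` cache lines, or -1.0 on
   allocation failure. */
static double time_stride_writes(size_t lines, size_t reps, int width)
{
    assert(width == 1 || width == 8);
    size_t bytes = lines * LINE_SIZE; /* multiple of the alignment */
    uint8_t *buf = aligned_alloc(LINE_SIZE, bytes);
    if (buf == NULL)
        return -1.0;
    memset(buf, 0, bytes); /* initialize and fault pages in before timing */

    clock_t t0 = clock();
    if (width == 1)
        write_bytes(buf, lines, reps);
    else
        write_longs(buf, lines, reps);
    clock_t t1 = clock();

    free(buf);
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

For example, comparing `time_stride_writes(1u << 20, 20, 1)` with `time_stride_writes(1u << 20, 20, 8)` walks a 64 MiB buffer, which on many machines exceeds the last-level cache, so the loop should be dominated by memory traffic. Any difference observed is specific to the machine, so each variant is worth running several times.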


My expectation is that there will be no difference. I believe this depends on the details of how the bus transfers data: if the bus's minimum transfer unit is at least 64 bits, both cases map to the same physical transfer. Otherwise, there could be a small difference, since the program is clearly bound by memory throughput. I believe virtually all common computing hardware has a bus wider than 64 bits.

  • On a 64-bit CPU, there will overall be no performance difference between aligned 1-byte and 8-byte writes, at least in usual (i.e. non-pathological) cases. Both will access the whole cache line, modify it, and store it. Note that an 8-byte region crossing two cache lines can take significantly longer to read or write. Data is generally aligned to avoid this issue (often by default in most programming languages). Note also that there can be other significant performance effects depending on exactly how data is read or written (i.e. the exact access sequence) on the target architecture. Commented Oct 8 at 17:06
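Whether an access crosses a cache-line boundary, as mentioned in the comment, comes down to simple arithmetic on the address. A minimal sketch in C, assuming the line size is a power of two (64 bytes on most current CPUs); `crosses_cache_line` is a hypothetical helper, not a library function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if an access of `size` bytes starting at `addr` spans two cache
   lines of `line` bytes. Assumes `line` is a power of two and
   1 <= size <= line. */
static bool crosses_cache_line(uintptr_t addr, size_t size, size_t line)
{
    assert(line != 0 && (line & (line - 1)) == 0);
    assert(size >= 1 && size <= line);
    /* offset within the line, plus the access size, must not pass its end */
    return (addr & (line - 1)) + size > line;
}
```

With a 64-byte line, an 8-byte store at offset 56 ends exactly at the line boundary and does not cross it, whereas one at offset 60 does. Any naturally aligned 8-byte address has an in-line offset of at most 56, which is why alignment avoids the split.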
