Thursday, April 20, 2023

DDR memory latency

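DDR memory latency for a single READ command, not including the memory controller. The timings in each case are listed in the order they elapse (">>" means "followed by"), so the total latency is their sum.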

READ to an already active row
tCL

READ with memory idle
tRCD >> tCL

READ to a different row in the same bank as an already active row
tRP >> tRCD >> tCL

READ right before or during a refresh
tRFC >> tRCD >> tCL

READ to a different row in the same bank in which a row was just activated
tRAS >> tRP >> tRCD >> tCL
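
As a rough sketch, the five cases above can be written as sums of timings in Python. The timing values below are illustrative assumptions, not taken from any particular memory kit, and the names TIMINGS, SCENARIOS and scenario_cycles are made up for this example. For the refresh and "just activated" cases the sum is the worst case, since part of tRFC or tRAS may already have elapsed when the READ arrives.

```python
# Timings in clock cycles. Illustrative values only (assumption),
# not the settings of any specific memory kit.
TIMINGS = {"tCL": 16, "tRCD": 18, "tRP": 18, "tRAS": 36, "tRFC": 560}

# Each case lists the timings that elapse one after another.
SCENARIOS = {
    "row already active": ["tCL"],
    "memory idle": ["tRCD", "tCL"],
    "different row, same bank": ["tRP", "tRCD", "tCL"],
    "right before or during refresh": ["tRFC", "tRCD", "tCL"],
    "different row, row just activated": ["tRAS", "tRP", "tRCD", "tCL"],
}

def scenario_cycles(name, timings=TIMINGS):
    """Return the total latency of a case in clock cycles (sum of its timings)."""
    return sum(timings[t] for t in SCENARIOS[name])

for name in SCENARIOS:
    print(f"{name}: {scenario_cycles(name)} cycles")
```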

Timings are measured in clock cycles, so at a higher memory clock the same latency in ns takes more cycles. That is why DDR4-3200 CL16 has the same CAS latency as DDR4-4000 CL20. The same goes for, say, DDR5-6000 CL30 and DDR5-7600 CL38.

To get the latency of a timing in ns, just divide the timing by the memory clock in GHz (which is half the data rate). For example:

DDR4-3200 CL16
memory clock = 1.6GHz
16/1.6 = 10ns

DDR5-7200 CL34
memory clock = 3.6GHz
34/3.6 = 9.44ns
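
The same conversion as a minimal Python sketch. The function names memory_clock_ghz and timing_ns are made up for illustration. The data rate is given in MT/s, which is the number in the DDR4-3200 style name.

```python
def memory_clock_ghz(data_rate_mts):
    """Memory clock in GHz: half the data rate, converted from MHz to GHz."""
    return data_rate_mts / 2 / 1000

def timing_ns(cycles, data_rate_mts):
    """Convert a timing in clock cycles to nanoseconds."""
    return cycles / memory_clock_ghz(data_rate_mts)

kits = [
    ("DDR4-3200 CL16", 3200, 16),
    ("DDR4-4000 CL20", 4000, 20),
    ("DDR5-6000 CL30", 6000, 30),
    ("DDR5-7200 CL34", 7200, 34),
    ("DDR5-7600 CL38", 7600, 38),
]

for name, rate, cl in kits:
    print(f"{name}: {timing_ns(cl, rate):.2f} ns")
```

All of these come out at 10ns except DDR5-7200 CL34, which comes out at 9.44ns.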

At this point it might seem that as long as the memory timing latency in ns is the same, the memory clock has no impact on overall memory latency. However, this is not true, because the RAM is not directly connected to the CPU cores. It is connected to a memory controller. The memory controller adds its own latency to the latency of the RAM, increasing the overall core to memory latency. How much latency the memory controller adds depends on its design and clock speed. The clock of the memory controller is directly related to the memory clock, usually in ratios like 1:1, 1:2 and 1:4. Therefore, to reduce the latency from the memory controller, it is necessary to raise the memory clock.
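
A toy model of this, as a sketch only. It assumes the memory controller spends a fixed number of its own clock cycles on each request. The figure of 20 cycles is purely hypothetical, as are the names total_latency_ns, mc_cycles and mc_ratio. Here mc_ratio is the controller clock divided by the memory clock, so 1.0 stands for 1:1 and 0.5 for 1:2.

```python
def total_latency_ns(cl_cycles, data_rate_mts, mc_cycles=20, mc_ratio=1.0):
    """Toy model: CAS latency plus a fixed-cycle memory controller delay.

    mc_cycles is a hypothetical controller cost in controller clock cycles.
    mc_ratio is controller clock / memory clock (1.0 for 1:1, 0.5 for 1:2).
    """
    mem_clock_ghz = data_rate_mts / 2 / 1000
    dram_ns = cl_cycles / mem_clock_ghz
    mc_ns = mc_cycles / (mem_clock_ghz * mc_ratio)
    return dram_ns + mc_ns

# Both kits have a 10ns CAS latency, but the controller part shrinks
# as the memory clock (and with it the controller clock) goes up.
print(total_latency_ns(16, 3200))
print(total_latency_ns(20, 4000))
# Same memory clock, but the controller at half speed (1:2).
print(total_latency_ns(20, 4000, mc_ratio=0.5))
```

In this model the two 1:1 cases come out at roughly 22.5ns and 20ns, while dropping DDR4-4000 to 1:2 gives roughly 30ns. Real controllers are more complicated than a fixed cycle count, so only the direction of the effect should be read from this.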

The memory controller itself is also not usually directly connected to the CPU cores. There is typically an interconnect between the cores and the memory controller, which adds even more latency. Again, how much latency is added depends on the design of the interconnect and its clock speed, so the latency can be reduced by raising the clock speed of the interconnect. The interconnect may or may not have a fixed ratio relationship with the memory controller clock. If it doesn't, there can be a latency penalty for having to buffer data between the two.

Finally, how quickly the cores can process the data arriving from memory will have some influence on any software memory latency test.

Basically on Intel
more core clock >> less latency
more ring clock >> less latency
more MC clock >> less latency
lower timings >> less latency *

on AMD Ryzen 7000
more core clock >> less latency
more IF clock >> less latency **
more MC clock >> less latency
lower timings >> less latency *

on AMD Ryzen 5000
more core clock >> less latency
more IF clock >> less latency
more MC clock >> less latency
synchronized IF and MC clock >> less latency
lower timings >> less latency *

 

* tREFI (the interval between refreshes) is the one timing that you need to increase in order to reduce memory latency. This is because increasing tREFI makes refreshes less frequent, which reduces the probability that a READ command will have to wait for a refresh to complete.

** Ryzen 7000 doesn't seem to have IF to MC synchronization because the IF runs at around 2GHz while the memory controller runs at 2.4-3.2GHz (3.2GHz if you're lucky), so the data between the MC and IF is always buffered and there's no "synchronization bonus".
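
The tREFI note (*) can be put into rough numbers with a simple model: if a refresh takes tRFC cycles and one is issued every tREFI cycles, the memory is busy refreshing for about tRFC/tREFI of the time, which is roughly the chance that a randomly timed READ has to wait for a refresh. This is a sketch under that assumption. The function name refresh_busy_fraction and the timing values are illustrative, not measured.

```python
def refresh_busy_fraction(trfc_cycles, trefi_cycles):
    """Approximate fraction of time spent refreshing (simple model: tRFC / tREFI)."""
    return trfc_cycles / trefi_cycles

# Illustrative values in clock cycles (assumption), same tRFC in both cases.
trfc = 560
for trefi in (12480, 65535):
    frac = refresh_busy_fraction(trfc, trefi)
    print(f"tREFI {trefi}: about {frac * 100:.1f}% of the time spent refreshing")
```

With these numbers, raising tREFI from 12480 to 65535 takes the share of time spent refreshing from about 4.5% to under 1%.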