A new method called 'Tailslayer' has emerged to reduce latency by working around a memory flaw that has existed for 60 years.



Google researcher Laurie Wired has released 'Tailslayer,' a library designed to overcome a DRAM flaw that has existed for 60 years.

GitHub - LaurieWired/tailslayer: Library for reducing tail latency in RAM reads · GitHub

https://github.com/LaurieWired/tailslayer

Your RAM Has a 60 Year Old Design Flaw. I Bypassed It. - YouTube


DRAM, widely used as computer memory, distinguishes between '0' and '1' based on the presence or absence of electric charge stored in a capacitor. Since the electric charge in the capacitor is lost over time, it is necessary to perform a process called 'refresh,' which involves reading and rewriting the charge, at regular intervals.

During the refresh process, memory cannot be accessed. If the CPU attempts to access memory while the DRAM is refreshing, a delay of several hundred nanoseconds to several microseconds will occur. This problem has existed for a long time, ever since DRAM was invented in 1966.

While it's difficult for most people to grasp just how long a few hundred nanoseconds is, in terms of CPU clock speed, it's equivalent to wasting thousands of clock cycles. This has become a problem in fields like finance, where real-time performance is particularly important. You can understand what a few hundred nanoseconds means internally in a computer by reading the following article.

What programmers should know about 'internal PC communication speeds' - GIGAZINE
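To make the scale concrete, the conversion from stall time to clock cycles can be sketched as follows. The 4 GHz clock and the sample delays are illustrative assumptions, not figures from Wired's measurements, and `stall_cycles` is a hypothetical helper name.

```python
def stall_cycles(delay_ns: float, clock_ghz: float) -> float:
    """Clock cycles that elapse while the CPU waits delay_ns.

    A clock of N GHz ticks N times per nanosecond, so the product
    of the two values is the number of cycles spent waiting.
    """
    return delay_ns * clock_ghz


if __name__ == "__main__":
    # Assumed values: a 4 GHz core and stalls from 300 ns to 3 microseconds.
    for delay_ns in (300.0, 1000.0, 3000.0):
        cycles = stall_cycles(delay_ns, clock_ghz=4.0)
        print(f"{delay_ns:.0f} ns at 4 GHz = {cycles:.0f} cycles")
```

Under these assumptions, even the shortest stall costs more than a thousand cycles in which the core could otherwise have been doing useful work.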



Tailslayer duplicates data across multiple independent DRAM channels when writing to memory. When reading, it issues read commands to every address that holds a copy and uses whichever copy returns first. This means that even if one copy sits in memory that is being refreshed, a long wait occurs only when all of the copies are being refreshed at the same time.
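Why taking the first response trims the tail can be illustrated with a small Monte Carlo model. This is a toy simulation, not Tailslayer's implementation: the timing constants are assumed round numbers, each copy is assumed to sit on a channel whose refresh phase is independent of the others, and all function names are hypothetical.

```python
import random

BASE_NS = 80.0        # assumed latency of an unobstructed read
REFRESH_NS = 350.0    # assumed time a channel is busy per refresh
INTERVAL_NS = 7800.0  # assumed spacing between refreshes on a channel


def read_latency(phase_ns: float) -> float:
    """Latency of one read that arrives phase_ns after a refresh began."""
    stall = max(0.0, REFRESH_NS - phase_ns)  # wait out the rest of the refresh
    return BASE_NS + stall


def hedged_latency(copies: int, rng: random.Random) -> float:
    """Read every copy and keep the fastest response.

    Each copy gets its own random refresh phase, which models the
    assumption that the channels refresh independently of each other.
    """
    return min(
        read_latency(rng.uniform(0.0, INTERVAL_NS)) for _ in range(copies)
    )


def percentile(samples, q: float) -> float:
    """Value below which roughly a fraction q of the samples fall."""
    ordered = sorted(samples)
    index = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[index]


def simulate(copies: int, reads: int = 100_000, seed: int = 1) -> float:
    """99.99th-percentile latency over many simulated hedged reads."""
    rng = random.Random(seed)
    samples = [hedged_latency(copies, rng) for _ in range(reads)]
    return percentile(samples, 0.9999)


if __name__ == "__main__":
    for copies in (1, 2, 4, 6):
        print(f"{copies} copies: p99.99 = {simulate(copies):.1f} ns")
```

In this model a single copy is stalled about 4.5% of the time (350 / 7800), while the chance that every one of N independent copies is stalled at once shrinks as that fraction raised to the power N. The extreme tail therefore collapses toward the base latency as copies are added. Real hardware will differ in the constants and in how independent the channels actually are.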

To place the copies on 'multiple independent DRAM channels,' Wired uses statistical timing measurements to identify how the memory controller maps physical addresses. According to Wired, the copies are then written at channel scrambling offsets so that each one lands on a different channel.

The difference in latency when using Tailslayer on the AMD EPYC 9255 is shown in the figure below. The horizontal axis represents latency, and the vertical axis represents the frequency with which that latency occurs. The red dotted line shows the latency distribution when Tailslayer is not used. The blue line shows the distribution when Tailslayer is used, with four levels of intensity indicating the range from simply duplicating to



The results of testing various CPU and memory combinations, including AMD, Intel, and Graviton, are shown in the figure below. In all cases, latency spikes are avoided when Tailslayer duplicates the data into six copies.



According to Wired, latency at the 99.99th percentile could be reduced by up to a factor of 15. Wired added that the technique could be used in areas where even extremely small delays have a significant impact, such as high-frequency trading (HFT).

in Hardware, Software, Posted by log1d_ts