How bad is the memory latency of CXL?

Release Date: May 10, 2023

  If Astera is to be trusted, the latency is not as bad as you might expect. The company's Leo CXL memory controller is designed to accept standard DDR5 DIMMs at up to 5600 MT/s, and Astera claims customers can expect latencies roughly equivalent to accessing memory attached to a second CPU, a single NUMA hop, which puts it around 170-250 nanoseconds. In fact, that's how these memory modules show up to the operating system: as another NUMA node.

  Most CXL memory controllers add around 200 nanoseconds of latency, plus or minus a few dozen nanoseconds depending on the distance from the device to the CPU as additional retimers are added, explained Tavallaei. This is broadly consistent with what other early CXL adopters have seen, though at the higher end of that range: Alan Benjamin, CEO of GigaIO, told The Next Platform that most CXL memory expansion modules he has seen have latencies around 250 nanoseconds, not 170.
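To put those numbers in perspective, here is a back-of-the-envelope sketch of how CXL-attached memory affects a workload's average access latency. The ~100 ns local-DRAM figure is a common ballpark assumption, not a number from this article; the 250 ns figure is the expander latency GigaIO reports seeing.

```python
# Back-of-the-envelope blended memory latency for a workload that
# spills part of its working set into CXL-attached memory.
# LOCAL_DRAM_NS is an assumed typical local DDR5 access latency;
# CXL_NS is the expander latency GigaIO reports observing.

LOCAL_DRAM_NS = 100
CXL_NS = 250

def avg_latency_ns(cxl_fraction: float) -> float:
    """Average latency when `cxl_fraction` of accesses hit CXL memory."""
    return (1 - cxl_fraction) * LOCAL_DRAM_NS + cxl_fraction * CXL_NS

for frac in (0.0, 0.1, 0.25, 0.5):
    print(f"{frac:.0%} of accesses on CXL -> {avg_latency_ns(frac):.0f} ns average")
```

If only a tenth of accesses land on the expander, the blended average rises from 100 ns to about 115 ns, which helps explain why plain capacity expansion may not need special tuning.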

  However, as Tavallaei notes, this still represents an improvement for four- or eight-socket systems where applications may be dealing with multiple NUMA hops simply to access memory. (To be fair, IBM and Intel have added more and faster links between CPUs to reduce hop counts and latency per hop.)

  That said, many chipmakers are quick to point out that the CXL ecosystem is still in its early days. Kurtis Bowman, who serves on the CXL Consortium board for AMD, told The Next Platform that many early CXL proofs of concept and products are based on FPGAs or first-generation ASICs that have not yet been optimized for latency. He expects latency to improve significantly over time.

  If CXL vendors can achieve latency on par with multi-socket systems outside of showroom demos, it should largely eliminate the need for application- or OS-specific tuning to take advantage of the added memory. Well, at least for memory expansion. As we've seen with Optane, CXL memory tiering will almost certainly require some level of OS or application support.
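To see why tiering needs software support where flat expansion may not, consider a minimal sketch of the bookkeeping involved: someone has to track which pages are hot and keep them in the fast tier. All names, capacities, and the ranking policy here are hypothetical; real OS tiering (such as Linux's NUMA-based demotion) is far more involved.

```python
# Minimal sketch of a two-tier page placement policy: the hottest pages
# stay in fast local DRAM, the rest are demoted to slower CXL memory.
# The capacity and the access-count ranking are illustrative only.

DRAM_CAPACITY = 4  # number of pages that fit in the fast tier

def place_pages(access_counts: dict[str, int]) -> tuple[set[str], set[str]]:
    """Keep the most frequently accessed pages in DRAM, demote the rest."""
    ranked = sorted(access_counts, key=access_counts.get, reverse=True)
    dram = set(ranked[:DRAM_CAPACITY])
    cxl = set(ranked[DRAM_CAPACITY:])
    return dram, cxl

counts = {"a": 90, "b": 75, "c": 60, "d": 50, "e": 5, "f": 2}
dram, cxl = place_pages(counts)
print("DRAM tier:", sorted(dram))
print("CXL tier: ", sorted(cxl))
```

The hardware alone cannot make this call: identifying hot pages means sampling access patterns over time, which is exactly the kind of work that fell to the OS and applications with Optane, and will again with CXL tiers.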

  That's becoming increasingly important as CPU sockets grow larger and it becomes harder to fit more DIMMs on the board. There are dual-socket systems that can accommodate 32 DIMMs, but that approach doesn't scale as chipmakers add more memory channels to satisfy the bandwidth demands of rising core counts.

  We've seen that to some degree with AMD's Genoa chip, which increased memory channel count to 12 but at launch only supported one DIMM per channel, limiting the number of DIMMs in a dual-socket configuration to 24. Even if you could connect two DIMMs to each channel, we're told installing 48 DIMMs in a standard chassis is impractical.

  Things get more complicated when you want to connect memory at greater distances, such as across racks, because the latency incurred by electrical or optical interconnects must be factored in. But for CXL memory expansion inside the chassis, latency doesn't seem to be as much of a concern as some feared.