Astera Labs

CXL-based Key-Value Cache Storage for LLM Serving

Pages: 7

Time to read: 18 mins

Publication: 10/28/24

Language: English

Summary

This research article explores the use of Compute Express Link (CXL) memory for storing the key-value (KV) cache in large language model (LLM) serving systems. The study demonstrates that the CXL-CPU interconnect can match the performance of the CPU-GPU interconnect, enabling significantly larger batch sizes and higher GPU utilization. The findings highlight the potential of CXL memory to alleviate memory constraints and increase throughput, making it a promising solution for high-demand LLM applications.
