
Astera Labs
CXL-based Key-Value Cache Storage for LLM Serving
Pages: 7
Time to read: 18 mins
Publication
Language: English

This research article explores the use of Compute Express Link (CXL) memory for storing the key-value (KV) cache in large language model (LLM) serving systems. The study reports that the CXL-CPU interconnect can match the performance of the CPU-GPU interconnect, so KV cache data held in CXL memory can be served without a transfer penalty, which in turn allows larger batch sizes and higher GPU utilization. The findings suggest that CXL memory can ease the memory constraints of LLM serving and raise throughput, making it a promising option for high-demand LLM applications.
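
To make the memory constraint concrete, the following is a minimal back-of-the-envelope sketch of how KV cache size limits batch size. It uses the standard estimate of two tensors (keys and values) per layer; the function names, the model shape, and the memory capacities are illustrative assumptions and are not taken from the article.

```python
GIB = 1024 ** 3


def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size, bytes_per_elem=2):
    """Estimate the bytes needed to cache keys and values for a batch.

    The leading factor of 2 accounts for the separate key and value
    tensors stored per layer. bytes_per_elem=2 assumes 16-bit values.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


def max_batch_size(memory_bytes, num_layers, num_kv_heads, head_dim,
                   seq_len, bytes_per_elem=2):
    """Largest batch whose KV cache fits in memory_bytes (memory-only bound)."""
    per_sequence = kv_cache_bytes(num_layers, num_kv_heads, head_dim,
                                  seq_len, 1, bytes_per_elem)
    return memory_bytes // per_sequence


# Hypothetical model shape and capacities, chosen only for illustration.
model = dict(num_layers=32, num_kv_heads=32, head_dim=128, seq_len=4096)

gpu_only = max_batch_size(40 * GIB, **model)           # assumed free GPU memory
with_cxl = max_batch_size((40 + 128) * GIB, **model)   # plus assumed CXL capacity

print(f"KV cache per sequence: {kv_cache_bytes(batch_size=1, **model) / GIB:.1f} GiB")
print(f"Max batch, GPU memory only: {gpu_only}")
print(f"Max batch, GPU + CXL memory: {with_cxl}")
```

Under these assumed numbers each sequence needs 2 GiB of KV cache, so the memory-only bound on batch size rises from 20 to 84 once the extra capacity is counted. This bound ignores model weights, activations, and transfer latency, so it only illustrates the capacity side of the argument.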