Vdura
AI Storage Reference Design for GPU-Accelerated AI
Pages
27
Time to read
36 mins
Publication
Language
English
This white paper presents the AI Storage Reference Design, developed to address the challenge of integrating components when building AI systems at different scales. It outlines the infrastructure required to train large AI models, which involves distributing massive datasets across multiple GPUs and nodes. The paper details the AI reference architecture, which includes a scalable unit of compute nodes equipped with AMD Instinct MI300 series GPUs and high-speed networking components. It explains the data processing phases, including data ingestion, data preparation, and Extract-Transform-Load (ETL) pipelines. It also discusses why high-performance compute, scalable storage, and robust networking matter for AI workloads, particularly Large Language Models. Finally, it provides a prescriptive full-rack reference design optimized for AI workloads, detailing the components, configurations, and network connectivity needed to deploy and scale AI infrastructure.