Presidio
Storm Reply Utilizes Amazon Instances for LLM Inference
Pages
3
Time to read
6 mins
Publication
Language
English
This case study details how Storm Reply implemented a cost-effective solution for deploying large language models (LLMs) and generative AI (GenAI) on Amazon EC2 C7i instances powered by 4th Gen Intel® Xeon® Scalable processors. Storm Reply needed a high-availability hosting environment that could efficiently support its LLM-based solutions for a major energy sector client. Its evaluation of GPU-based instances revealed limitations, which prompted it to explore CPU-based instances. Working with Intel and AWS, Storm Reply developed a tailored solution that provided flexibility, scalability, and a favorable price-performance ratio. The study outlines the technical advantages of Intel's software technologies, including the oneAPI Toolkit and Intel® Extension for PyTorch, which enhanced performance for AI workloads. The results indicate that the CPU-based instances matched the price-performance of GPU environments while offering customization options and improved response times, demonstrating the viability of CPU solutions for AI applications.