
Weaviate
Alto Framework for Distributed Compound AI Systems
Pages
14
Time to read
60 mins
Publication
Language
English

This technical report presents Alto, a framework that optimizes the execution of compound AI queries through techniques such as streaming and parallelism. It outlines why traditional system optimizations are difficult to apply to compound AI applications, which combine subcomponents such as generative language models and document retrievers. It introduces nested ancestry, a metadata hierarchy that tracks partial outputs and manages data across components with differing constraints. The report details how Alto lets developers express complex dataflow patterns without manually handling routing and aggregation. It also evaluates Alto against existing frameworks, showing improvements in latency and throughput across several applications. The report concludes by discussing Alto's contributions, including a programming model that treats streams as a fundamental data structure so that partial outputs can be processed efficiently in compound AI systems.
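
As a rough illustration of the nested ancestry idea summarized above, the following Python sketch tags every partial output with a tuple recording the upstream items it came from, then uses that tuple to regroup a stream at any level of the hierarchy. It is a minimal sketch under stated assumptions and not Alto's API: the names `Chunk`, `split_questions`, `stream_tokens`, and `aggregate` are hypothetical, and whitespace splitting stands in for a streaming language model.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical representation: an ancestry is a path of indices,
# e.g. (query_id, sub_question_index, token_position).
Ancestry = Tuple[int, ...]


@dataclass(frozen=True)
class Chunk:
    """A partial output tagged with the path of upstream items that produced it."""
    ancestry: Ancestry
    text: str


def split_questions(query_id: int, questions: List[str]) -> Iterator[Chunk]:
    """Emit one chunk per sub-question, adding one level below the query id."""
    for index, question in enumerate(questions):
        yield Chunk(ancestry=(query_id, index), text=question)


def stream_tokens(chunks: Iterable[Chunk]) -> Iterator[Chunk]:
    """Stand-in for a streaming generator: emit each word as its own partial output."""
    for chunk in chunks:
        for position, word in enumerate(chunk.text.split()):
            # Extend the parent's ancestry so the word stays traceable to its source.
            yield Chunk(ancestry=chunk.ancestry + (position,), text=word)


def aggregate(chunks: Iterable[Chunk], depth: int) -> Dict[Ancestry, List[str]]:
    """Group partial outputs by the first `depth` levels of their ancestry."""
    groups: Dict[Ancestry, List[str]] = defaultdict(list)
    for chunk in chunks:
        groups[chunk.ancestry[:depth]].append(chunk.text)
    return dict(groups)


if __name__ == "__main__":
    tokens = stream_tokens(split_questions(7, ["who wrote it", "when"]))
    print(aggregate(tokens, depth=2))
    # {(7, 0): ['who', 'wrote', 'it'], (7, 1): ['when']}
```

Because each chunk carries its full ancestry, a downstream stage can choose its own aggregation level (per sub-question with `depth=2`, per query with `depth=1`) without the developer writing routing logic for each case. How Alto itself encodes and propagates this metadata is described in the report, not here.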