How Apache Iceberg Enables Multi-Engine Data Platforms
Session Abstract
The session will cover operational best practices, including metadata management, file sizing, compaction strategies, and performance tuning at scale. Attendees will leave with practical guidance for designing and operating open, flexible, multi-engine data architectures built on Apache Iceberg, enabling faster analytics and lower operational complexity.
Session Description
Modern data platforms increasingly rely on multiple compute engines to serve diverse workloads, from batch analytics to interactive SQL and streaming. Without a shared table layer, this flexibility often leads to duplicated data, inconsistent results, and operational complexity.
Apache Iceberg provides a common table abstraction that decouples storage from compute, enabling multiple engines such as Spark, Trino, and Flink to operate safely on the same data. This talk explores the architectural patterns that make multi-engine platforms possible, including metadata-driven concurrency, snapshot isolation, and schema evolution.
We’ll discuss how to choose the right engine for different workloads, how catalogs act as the coordination layer, and what operational practices are required to maintain performance and consistency at scale. Attendees will leave with practical guidance for designing open, multi-engine data architectures built on Apache Iceberg.
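
To make the catalog's coordinating role concrete, here is a minimal in-memory sketch of optimistic, metadata-driven commits with snapshot isolation. It is a toy model, not Iceberg's API: `ToyCatalog`, `Snapshot`, and `append_files` are hypothetical names introduced only for illustration, and real catalogs persist a pointer to a table metadata file rather than holding Python objects.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Snapshot:
    """Immutable view of a table: an id plus the data files visible in it."""
    snapshot_id: int
    files: Tuple[str, ...]


class ToyCatalog:
    """Hypothetical catalog: maps each table name to its current snapshot."""

    def __init__(self) -> None:
        self._current: Dict[str, Snapshot] = {}

    def load(self, table: str) -> Optional[Snapshot]:
        return self._current.get(table)

    def commit(self, table: str, expected: Optional[Snapshot], new: Snapshot) -> bool:
        # Compare-and-swap: succeed only if no other writer has committed
        # since `expected` was read.
        if self._current.get(table) != expected:
            return False
        self._current[table] = new
        return True


def append_files(catalog: ToyCatalog, table: str, new_files: Iterable[str],
                 max_retries: int = 3) -> Snapshot:
    """Optimistic append: rebuild the candidate snapshot and retry on conflict."""
    for _ in range(max_retries):
        base = catalog.load(table)
        base_files = base.files if base else ()
        next_id = base.snapshot_id + 1 if base else 1
        candidate = Snapshot(next_id, base_files + tuple(new_files))
        if catalog.commit(table, base, candidate):
            return candidate
    raise RuntimeError("commit failed after retries")


catalog = ToyCatalog()
append_files(catalog, "events", ["a.parquet"])   # first writer, any engine
reader_view = catalog.load("events")             # a reader pins this snapshot
append_files(catalog, "events", ["b.parquet"])   # second writer commits later

# A writer still holding the old snapshot is rejected instead of overwriting.
stale_commit_ok = catalog.commit("events", reader_view, Snapshot(99, ("x.parquet",)))
```

In this sketch the pinned `reader_view` keeps seeing only `a.parquet` after the second append, and the stale commit returns `False`. That is the behavior the description attributes to snapshot isolation and catalog-mediated concurrency: readers get a consistent view, and conflicting writers must retry against the latest state.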