Constant-Time Aggregations with Star-Tree in OpenSearch
Session Abstract
Discover how OpenSearch breaks linear scaling. Inspired by Apache Pinot, the Star-Tree index moves performance dependency from document count to field cardinality. Learn how we extended Lucene’s DocValues to build multi-dimensional materialized views that deliver sub-second analytics on billion-scale datasets for observability workloads.
Session Description
Traditional distributed search engines face a significant bottleneck: aggregation latency scales linearly with document count. As datasets grow to billions of records, this “scan-on-query” model fails to meet real-time requirements. Inspired by Apache Pinot and star-cube research, OpenSearch introduced the Star-Tree index to decouple performance from raw data volume.
This session dives into the engineering behind this transition. We will explore how we extended Lucene’s DocValuesFormat to support multi-field materialized views directly within segment structures and shifted the performance dependency from total document count to the cardinality of indexed dimensions.
We will detail the implementation of “star nodes”—wildcard structures representing aggregates across all values of a dimension—and how they enable constant-time query pruning. Attendees will learn about the challenges of building a multi-field index in the single-field-centric ecosystems like Lucene, how to fine tune storage versus speed, and why this architectural shift achieved up to 1000x faster queries. Finally, we will discuss operational lessons learned, including limitations on why this structure is limited to append-only workloads and how it bridges the gap between traditional search and OLAP-style analytics.