Operations
Scale
Store

Floe: Policy-Based Table Maintenance for Apache Iceberg

Session Abstract

Iceberg maintenance procedures work. Orchestrating them across hundreds of tables is the problem. Floe is an open-source system that treats maintenance as policy: glob patterns, schedules, and health-driven triggers that gate operations on real table metrics. It supports seven catalogs and executes operations on Spark or Trino.

Session Description

Every Iceberg table needs maintenance, but catalogs don’t execute and engines don’t orchestrate. Teams end up with scripts that become DAGs that become technical debt. Nobody knows which tables are healthy, which are overdue, or what ran last.

Floe is an open-source, policy-based maintenance system for Iceberg. Define rules with glob patterns, schedules, and health-driven triggers that gate operations based on real table metrics: small file percentage, snapshot count, delete file ratio, and partition skew. When several patterns match the same table, priority decides which policy applies. A maintenance debt score ranks tables by urgency so the most critical work runs first within your resource budget.
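To make the policy model concrete, here is a minimal Python sketch of how glob matching, priority, health gating, and debt-score ranking could fit together. It is not Floe's implementation: the `Policy` and `TableHealth` classes, the threshold defaults, and the debt formula (the sum of how far each metric overshoots its threshold) are hypothetical, and partition skew is omitted for brevity. Operation names borrow Iceberg's Spark procedure names.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass
class TableHealth:
    """Hypothetical health snapshot for one table."""
    name: str
    small_file_pct: float     # share of data files below target size, 0-100
    snapshot_count: int
    delete_file_ratio: float  # delete files / data files, 0-1


@dataclass
class Policy:
    """Hypothetical policy: a glob pattern, a priority, and health thresholds."""
    name: str
    pattern: str
    priority: int
    max_small_file_pct: float = 30.0
    max_snapshots: int = 100
    max_delete_ratio: float = 0.10


def resolve_policy(table: str, policies: list) -> Optional[Policy]:
    """Return the highest-priority policy whose glob matches the table name."""
    matches = [p for p in policies if fnmatchcase(table, p.pattern)]
    return max(matches, key=lambda p: p.priority, default=None)


def gated_ops(h: TableHealth, p: Policy) -> list:
    """Schedule an operation only when its metric breaches the policy threshold."""
    ops = []
    if h.small_file_pct > p.max_small_file_pct:
        ops.append("rewrite_data_files")
    if h.snapshot_count > p.max_snapshots:
        ops.append("expire_snapshots")
    if h.delete_file_ratio > p.max_delete_ratio:
        ops.append("rewrite_position_delete_files")
    return ops


def debt_score(h: TableHealth, p: Policy) -> float:
    """Sum of relative overshoot past each threshold (0.0 when healthy)."""
    return (max(0.0, h.small_file_pct / p.max_small_file_pct - 1)
            + max(0.0, h.snapshot_count / p.max_snapshots - 1)
            + max(0.0, h.delete_file_ratio / p.max_delete_ratio - 1))


def plan(tables: list, policies: list, budget: int) -> list:
    """Rank tables that need work by debt score and keep the top `budget`."""
    candidates = []
    for h in tables:
        p = resolve_policy(h.name, policies)
        if p is None:
            continue  # no policy covers this table
        ops = gated_ops(h, p)
        if ops:
            candidates.append((debt_score(h, p), h.name, ops))
    candidates.sort(key=lambda c: c[0], reverse=True)
    return candidates[:budget]


if __name__ == "__main__":
    policies = [
        Policy("default", "*", priority=0),
        Policy("hot-events", "prod.events_*", priority=10, max_snapshots=20),
    ]
    tables = [
        TableHealth("prod.events_clicks", 45.0, 50, 0.02),
        TableHealth("prod.orders", 10.0, 250, 0.30),
        TableHealth("dev.scratch", 5.0, 10, 0.0),
    ]
    for score, name, ops in plan(tables, policies, budget=2):
        print(f"{name}: debt={score:.2f} ops={ops}")
```

With the sample data, `prod.events_clicks` matches both patterns and takes the stricter `hot-events` policy, `prod.orders` ranks first because its snapshot count and delete ratio overshoot their thresholds by more, and the healthy `dev.scratch` never enters the plan.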

Floe connects to REST, Polaris, Lakekeeper, Gravitino, DataHub, Hive Metastore, and Nessie catalogs, then delegates execution to Spark or Trino. A built-in dashboard shows table health trends, operation history, and policy coverage.
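Delegating to an engine ultimately means issuing the maintenance procedures Iceberg already exposes. The sketch below renders one planned operation either as Spark SQL (Iceberg's `CALL <catalog>.system.<procedure>` stored procedures) or as Trino SQL (`ALTER TABLE ... EXECUTE`). The `render_sql` helper, its parameters, and the seven-day default retention are hypothetical and not Floe's API; it covers only three operations and does not quote identifiers.

```python
from datetime import datetime, timedelta


def render_sql(engine: str, op: str, catalog: str, schema: str, table: str,
               now: datetime, retention_days: int = 7) -> str:
    """Hypothetical helper: render one maintenance operation as engine SQL.

    `now` is passed in explicitly so output is deterministic; for Spark it
    should be expressed in the Spark session time zone.
    """
    if engine == "spark":
        ident = f"{schema}.{table}"
        if op == "rewrite_data_files":
            return f"CALL {catalog}.system.rewrite_data_files(table => '{ident}')"
        if op in ("expire_snapshots", "remove_orphan_files"):
            # Both Spark procedures accept an `older_than` timestamp cutoff.
            cutoff = (now - timedelta(days=retention_days)).strftime("%Y-%m-%d %H:%M:%S")
            return (f"CALL {catalog}.system.{op}("
                    f"table => '{ident}', older_than => TIMESTAMP '{cutoff}')")
    elif engine == "trino":
        target = f"{catalog}.{schema}.{table}"
        if op == "rewrite_data_files":
            # Trino's Iceberg connector exposes compaction as `optimize`.
            return f"ALTER TABLE {target} EXECUTE optimize"
        if op in ("expire_snapshots", "remove_orphan_files"):
            return (f"ALTER TABLE {target} EXECUTE {op}"
                    f"(retention_threshold => '{retention_days}d')")
    raise ValueError(f"unsupported engine/operation: {engine}/{op}")


if __name__ == "__main__":
    now = datetime(2024, 6, 15, 12, 0, 0)
    for engine in ("spark", "trino"):
        print(render_sql(engine, "expire_snapshots", "lake", "prod", "orders", now))
```

Keeping the planner engine-agnostic and pushing engine differences into a single rendering step is one way to support both Spark and Trino from the same plan. Note that Trino rejects retention thresholds below the connector's configured minimum retention.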

This talk covers the policy model, health-driven maintenance planning, and a live demo.