Store
Stream
OTel + Apache Iceberg: The New Standard for Observability
Talk
Observability is moving from vendor stacks to open standards. This talk presents a design where OpenTelemetry provides collection and semantic context, and Apache Iceberg is the data layer for logs, metrics, and traces. We cover portability, governance, agent investigation, and write-path pitfalls: drift, small files, compaction.
Operations
Scale
GitOps for n8n: Treating Workflows as Code
Talk
n8n-gitops is an open-source CLI that applies GitOps principles to n8n workflows. This talk shows how workflows can be exported, reviewed, versioned, and deployed from Git instead of manually promoted via the UI. Through a live demo, we explore safer deployments, rollbacks, and lessons learned operating automation as code.
Scale
Store
Stream
Turning the database inside out again
Talk
We rethink data systems by putting streams at the center. Expanding on Martin Kleppmann's "Turning the Database Inside Out", this talk shows how Apache Kafka and Apache Iceberg together provide durable storage, indexing, and rich views that eliminate brittle ETL and unify real-time and historical analysis. A new way to see databases and streams.
Data Science
Operations
Scale
Sunset for the Wild West: Making ML disciplined by default
Talk
Many novel machine learning techniques started as clever hacks that just happened to work, but the demands of building real systems can be at odds with this creative culture. Learn about our open-source stack to improve quality-of-life for ML researchers and infrastructure teams alike — and how their concerns aren't as different as you might think.
Operations
Scale
Society, Ethics & Sustainability
Escaping the Cloud: High-Performance AI in your Browser
Talk
Server-side inference is the bottleneck of modern AI, creating costs and privacy hurdles. But what if the solution is scaling down to the browser? This session investigates Client-Side AI using WebGPU, ONNX Runtime, and Transformers.js. We’ll explore the reality of hardware access, model size, and the 2026 trade-offs of browser-based execution.
Operations
Scale
Store
Floe: Policy-Based Table Maintenance for Apache Iceberg
Talk
Iceberg maintenance procedures work. Orchestrating them across hundreds of tables is the problem. Floe is an open-source system that treats maintenance as policy: glob patterns, schedules, and health-driven triggers that gate operations on real table metrics. It supports 7 catalogs and executes via Spark or Trino.
Operations
Scale
Stream
Beyond the Hype: When Apache Flink Solves Real Problems
Talk
When does Apache Flink solve real problems versus add complexity? Explore use cases where Flink becomes essential, such as fraud detection, CDC, and real-time analytics, versus cases where batch or Kafka Streams suffice. Compare stream engines (Flink, Spark) with platforms (Kafka, Pulsar) to confidently decide when streaming delivers value.
Search
Apache Solr 10: What’s Coming up for Vector Search
Talk
With Apache Solr 10 out, there are plenty of goodies coming up for vector-search aficionados.
They range from scalar and binary quantization, which speed up your search and reduce the memory footprint, to early termination and hybrid approaches for navigating the HNSW graph.
Join us if you want to learn about the big steps forward of Apache Solr vector search!
Operations
Scale
Society, Ethics & Sustainability
SPRUCE it up! Open Source GreenOps at scale
Talk
GreenOps adoption is stalled by missing data from cloud providers. SPRUCE (https://opensourcegreenops.cloud/) is an open-source, scalable platform built on Apache Spark that enriches cloud usage reports with open models to quantify carbon impact, build insights, and help teams reduce both emissions and cloud spend.
Data Science
Operations
The Failures That Don’t Crash: MLOps for AI Agents
Talk
This talk takes four reliability patterns from distributed systems and shows what they look like inside an agent architecture. How to shadow-test an agent. Why your circuit breakers need confidence thresholds. What an eval harness looks like when your system is non-deterministic. And why human oversight degrades faster than anyone admits.
Data Science
Operations
Observability’s Sixth Sense: Detecting Anomalies in Metrics
Talk
In this talk, we look at anomaly detection as a complementary way of working with metrics. Instead of relying on predefined limits, anomaly detection focuses on identifying behavior that deviates from what is normally observed over time. The focus is on how developers can interpret these signals, where anomaly detection is useful, and where it is not.
Operations
Store
What you should know about constraints in PostgreSQL 18
Talk
This talk explains how constraints work in Postgres by exploring the pg_constraint catalog and core concepts like table vs. column constraints, constraint triggers, domains and constraint deferrability through SQL queries. It then covers what’s new in Postgres 18 including temporal keys, NOT NULL as a first-class constraint, NOT ENFORCED and more.
Scale
Search
Beyond Grep: Search for Reliable Coding Agents
Talk
Coding agents succeed in verifiable loops (compiler + tests), but large repos still expose retrieval weaknesses.
This session explores how lexical, structural, and semantic search can provide cleaner context for LLMs. We compare tradeoffs and evaluation approaches to improve reliability without inflating token cost.
Data Science
Scale
Store
Writes, 3 ways: Postgres, Apache Kafka® and Apache Iceberg™
Short Talk
Learning new things is hard, but a useful way to think about new things is by comparing them to things you already know. In this talk, we'll compare writes between 3 different popular data services: Postgres, Apache Kafka and Apache Iceberg. In doing so, we'll learn a bit about the evolution of how we've thought of data storage as developers.
Data Science
Scale
Store
Why Choose One: Multi-Engine Analytics with Apache Wayang
Talk
Choosing the best engine for each data task sounds right, but in modern data stacks doing so requires expertise and effort. Apache Wayang, a recently graduated TLP, addresses this by decoupling logical dataflows from execution engines. From big data platforms to SQL and ML engines, Wayang enables cross-platform execution that maximizes performance.
Operations
Scale
Stream
Event-driven Agents with Complex Event Processing in Flink
Talk
Event-driven Agents calling LLMs can be combined with Pattern Recognition and Anomaly Detection in Apache Flink in smart ways to increase cost efficiency, avoid hallucinations and enforce predictable, deterministic behavior. Specifically in a business process context, this architecture provides opportunities for continuous real-time process mining.
Search
Store
Stream
The Agent Era: How AI Agents Are Reshaping Data Platforms
Panel
AI agents have quietly become some of the most demanding users of modern data platforms, and most of those platforms weren't built with them in mind. In this panel, leaders from Snowflake, Elastic, ClickHouse, and Xata share what agentic workloads actually look like in production: what broke, what had to be rebuilt, and where the architecture is heading.
Scale
Search
Context-Aware Segments: Solving the “Scatter-Read” Problem
Talk
Traditional OpenSearch segments are context-blind, scattering data across multiple segments. We introduce Context-Aware Segments (CAS), an architecture that brings "sharding" logic to the segment level. By enforcing document locality during indexing, we slashed query latency and minimized data footprint through superior pruning and compression.
Operations
Scale
Store
How Apache Iceberg Enables Multi-Engine Data Platforms
Talk
The session will cover operational best practices, including metadata management, file sizing, compaction strategies, and performance tuning at scale. Attendees will leave with practical guidance for designing and operating open, flexible, multi-engine data architectures built on Apache Iceberg, enabling faster analytics, lower operational flexibility
Scale
Search
Store
From OLTP to OLAP: Is PostgreSQL Eating Analytics Too?
Short Talk
Can PostgreSQL become a serious analytics engine? With emerging columnar extensions, PostgreSQL is pushing beyond OLTP into OLAP territory. This talk explores the current columnar landscape, architectural trade-offs, and how far PostgreSQL can go compared to analytical engines like ClickHouse.
Store
DuckDB beyond the notebook
Talk
Most people know DuckDB as a fast analytics tool for notebooks and scripts. But embedded OLAP enables much more: browser-based analytics via WebAssembly, serverless data processing, and lightweight data apps — without heavy infrastructure. This talk shows how DuckDB changes the way we build data-driven applications.
Search
Reviving phonetic algorithms for better search relevance
Short Talk
Fuzzy search is a double-edged sword: it fixes typos but drowns users in noise on large corpora. At INA, we revived ancient phonetic algorithms to improve relevance. This session compares fuzzy vs. phonetic search on a massive archive, showing how "sounding right" beats "spelling close."
People & Community
Search
How to Survive the Vortex of LLM Change
Short Talk
The LLM ecosystem changes faster than most teams can adapt. This talk shares our experience and the practical lessons we’ve learned while building an intelligent search product in a world where models, tools, and best practices constantly evolve.
People & Community
Society, Ethics & Sustainability
AI Can Contribute. It Can’t Lead.
Short Talk
Today, AI writes code, reviews PRs, answers questions. Some communities ban it, others label it. Most will accept it eventually. But AI won't show up to community calls for two years. It won't mentor your next maintainer. We're losing maintainers faster than we're replacing them. Stop fighting it. Start investing in what it can't replace: people.
Scale
Search
Store
C++ Search for Database Kernels: Built In, Not Bolted On
Talk
IResearch is an Apache 2.0 C++ search engine built to live inside databases. We'll benchmark it against leading open-source search engines, show why vectorized scoring is the next frontier for information retrieval engines, share the mistakes we made over a decade of development and explore how database-native search fits modern query execution.
People & Community
Society, Ethics & Sustainability
OpenSearch Software Foundation: 1 Year of Open Governance
Short Talk
In this presentation, we will talk through moving a major open source project into a foundation and the benefits that open governance and a vendor-neutral home have proven through sustained growth in community contributions.
Scale
Stream
What If We’ve Been Scaling Stream Processing Wrong All Along
Talk
We’ve normalised extraordinary inefficiency in stream processing. Thousands of events/sec don't justify repartition storms, serialization overhead, and state migration. This talk explores a different path: the Kafka Streams DSL, Flink-like exactly-once semantics, Project Loom, and a challenge to the assumption that stream processing must be distributed.
Data Science
Store
Stream
Kafi Streams: Complex Stream Processing Made Simple
Talk
You can finally stop caring about co-partitioning, state stores and eventual consistency. Kafi Streams, built on (Py)DBSP, treats streaming like batch — strongly consistent, no special concepts. An Open Source Python library for the 80% of use cases that don't need extreme scale. Fully incremental stream processing for everyone, from day one.
Stream
Dynamic Broker-Side Filtering for Kafka
Talk
KAFKA-6020 has been open for 7 years. This talk demos broker-side filtering for Kafka with sub-millisecond latency (p99 < 25ms). Live demo with working code shows how it reduces network costs, simplifies consumers, and enables new use cases. Real-world validation from financial services and logistics deployments.
Data Science
Scale
Search
From Legacy Search to Vespa: What a Real PoC Taught Us
Short Talk
For years, Germany’s largest classifieds website relied on a search-first relevance approach because structured data was sparse. This talk shares how we introduced Vespa in the Motors category, enriched signals with embeddings and extracted attributes, and migrated step by step: what worked, what failed, and which lessons only a real PoC reveals.
Scale
Search
Stream
From Inverted Index to Columnar Vectorized Execution Search
Talk
Search engines are converging with analytical data systems. This talk explores how columnar data layouts, SIMD-accelerated execution, and bulk-oriented processing are reshaping search internals. We examine where traditional models fall short and how hardware-aware techniques from analytics engines are defining the next search infrastructure.
Data Science
Search
Text-to-Struct: Fine-tuning SLMs for Query Intent
Talk
Hybrid search fails on complex intent: vector search misses constraints, keywords miss nuance. This talk explores fine-tuning SLMs for 'Query Understanding'—transforming vague inputs into structured requests. Learn to extract metadata, expand terms, and route intent to build a search engine that does the hard work for your users.
Data Science
Search
Circular Dependency Fixes when Bootstrapping a Golden Set
Short Talk
For a golden set, you need queries. Even if you have them, you can’t judge all docs for each query. Only the top N. How do we rank the top N? See the circular dependency? We’ll talk about ways to untangle it: lexical search, significant terms, training an embedder from scratch, etc. By iteratively refining data and queries, we'll get there.
Data Science
Operations
Stream
Apache Spark Declarative Pipelines in Action
Talk
Learn Spark 4.1's brand-new Declarative Pipelines, a paradigm shift replacing imperative code with simple declarations. We'll build a real-time data pipeline together, processing streaming ADS-B flight data from tens of thousands of aircraft overhead.
People & Community
Society, Ethics & Sustainability
Mentoring In Open Source in the Age of AI
Talk
Open source mentorship changed overnight with AI tools. Contributors submitted polished code they couldn’t explain, making learning harder to assess. This talk shares what we learned mentoring Outreachy contributors—what failed, what worked, and what we’re still figuring out.
Operations
Store
Stream
Keeping data private in real-time pipelines
Talk
Real-time data is awesome… until you realize it’s leaking names, emails, and locations. In this talk, you’ll learn how to keep streaming data private, from simple masking to tricks that beat re-identification. All with live demos and some juicy real-world stories.
Operations
Correctness Too Cheap To Meter: Formal Verification and LLMs
Talk
Formal methods are powerful tools to verify software systems' correctness and reliability. However, manually writing system specs is time-consuming and hard to maintain. LLMs can help with this burden.
We'll share new research into tools to automate formal methods workflows and learnings from how LLMs currently perform.
Scale
Search
Agentic Retrieval: Building Self-Optimizing Search Systems
Talk
Relevance feedback loops used to take months. AI agents can now compress the process to seconds. This talk explores agentic retrieval: systems where agents adjust scoring models, schema, and indexing in real time. Learn how to build retrieval infrastructure with verifiable APIs that enable agents to optimize their own search context.
Search
Stream
The Three-Body Problem of Inverse Hybrid Search
Talk
When users expect alerts for new products matching an uploaded image, the problem becomes inverse hybrid search. Unlike top-K search, alerting must guarantee fetch-all semantics: zero missed matches across all saved searches, combining vector similarity, boolean filters, and lexical signals. We show why this breaks traditional scaling intuition.