PS DEV


Sessions

Store
Stream

OTel + Apache Iceberg: The New Standard for Observability

Talk
Observability is moving from vendor stacks to open standards. This talk presents a design where OpenTelemetry provides collection and semantic context, and Apache Iceberg is the data layer for logs, metrics, and traces. We cover portability, governance, agent investigation, and write-path pitfalls: drift, small files, compaction.
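Of the write-path pitfalls listed above, small files are the one most teams hit first, and the standard remedy is compaction: rewriting many small data files into a few target-sized ones. A minimal, hypothetical sketch of the grouping step (the 128 MB target and the greedy bin-packing are illustrative assumptions, not Iceberg's actual algorithm):

```python
def plan_compaction(file_sizes_mb, target_mb=128):
    """Greedily bin-pack small files into compaction groups of ~target size."""
    groups, current, current_size = [], [], 0
    for size in sorted(file_sizes_mb, reverse=True):
        if current and current_size + size > target_mb:
            groups.append(current)       # group full: start a new one
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        groups.append(current)
    return groups

# e.g. thirty 10 MB files produced by a streaming writer
print(plan_compaction([10] * 30))  # collapses into three groups
```

Real Iceberg compaction (e.g. the rewrite_data_files procedure) also has to respect partitions and delete files; the sketch only shows why a pile of small log files collapses into a handful of rewrites.
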
Operations
Scale

GitOps for n8n: Treating Workflows as Code

Talk
n8n-gitops is an open-source CLI that applies GitOps principles to n8n workflows. This talk shows how workflows can be exported, reviewed, versioned, and deployed from Git instead of manually promoted via the UI. Through a live demo, we explore safer deployments, rollbacks, and lessons learned operating automation as code.
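The CLI itself isn't reproduced here, but the core GitOps idea is that an exported workflow is just JSON, so promotion becomes a reviewable diff rather than a UI click. A toy sketch under that assumption (the field names are hypothetical, not n8n's real export schema):

```python
import json

def workflow_diff(old, new):
    """Compare two exported workflow definitions field by field,
    the way a Git review would surface changes before deployment."""
    keys = set(old) | set(new)
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}

# Hypothetical exports of the same workflow from two environments
staging = json.loads('{"name": "sync-crm", "schedule": "*/5 * * * *", "active": true}')
prod    = json.loads('{"name": "sync-crm", "schedule": "0 * * * *", "active": true}')
print(workflow_diff(prod, staging))  # only the changed field surfaces
```
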
Scale
Store
Stream

Turning the database inside out again

Talk
We rethink data systems by putting streams at the center. Expanding on Martin Kleppmann's "Turning the Database Inside Out", this talk shows how Apache Kafka and Apache Iceberg together provide durable storage, indexing, and rich views that eliminate brittle ETL and unify real-time and historical analysis. A new way to see databases, and streams.
Operations
Store
Stories

10x CouchDB Performance Gains for an AAA Game Launch

Talk
All software benchmarks and claims of performance are carefully crafted lies and this talk is no different. Instead of giving you a quick “do steps one, two, three for a magic speedup”, we aim to explain how we arrived at the changes we made and how we rigorously tested those changes to make sure we understand their impact.
Data Science
Operations
Scale

Sunset for the Wild West: Making ML disciplined by default

Talk
Many novel machine learning techniques started as clever hacks that just happened to work, but the demands of building real systems can be at odds with this creative culture. Learn about our open-source stack to improve quality-of-life for ML researchers and infrastructure teams alike — and how their concerns aren't as different as you might think.
Data Science
Store
Stories

Building Schema-Free Applications with RDF

Talk
RDF was designed for the semantic web, but it turns out to be a perfect fit for systems where structure emerges from user interaction, not upfront design. This talk covers how to build applications entirely on RDF triples, translate natural language to SPARQL with small, open source language models, and discover implicit knowledge in user input.
Operations
Stories
Stream

Streamling: Lightweight, Extensible Streaming on DataFusion

Talk
Apache DataFusion is moving beyond batch into streaming. We built Streamling, a Rust streaming engine that uses DataFusion planning and Arrow RecordBatch streams for real-time SQL/WASM transforms. This talk covers how we built it, highlights key features (FFI plugins, WASM transforms, and dynamic tables), and shares production lessons.
Operations
Scale
Society, Ethics & Sustainability

Escaping the Cloud: High-Performance AI in your Browser

Talk
Server-side inference is the bottleneck of modern AI, creating costs and privacy hurdles. But what if the solution is scaling down to the browser? This session investigates Client-Side AI using WebGPU, ONNX Runtime, and Transformers.js. We’ll explore the reality of hardware access, model size, and the 2026 trade-offs of browser-based execution.
Operations
Scale
Store

Floe: Policy-Based Table Maintenance for Apache Iceberg

Talk
Iceberg maintenance procedures work. Orchestrating them across hundreds of tables is the problem. Floe is an open-source system that treats maintenance as policy: glob patterns, schedules, and health-driven triggers that gate operations on real table metrics. It supports 7 catalogs and executes via Spark or Trino.
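Floe's real configuration format isn't shown in the abstract, so the following is only an illustrative sketch of the stated idea: glob patterns select tables, and health metrics gate whether an operation actually runs. The policy patterns and thresholds below are invented for the example:

```python
from fnmatch import fnmatch

# Hypothetical policies: glob pattern -> health thresholds that gate compaction.
POLICIES = [
    ("warehouse.events.*", {"max_small_files": 100}),  # hot tables, tight budget
    ("warehouse.*",        {"max_small_files": 500}),  # catch-all for the rest
]

def needs_compaction(table_name, small_file_count):
    """First matching policy wins; trigger only when metrics cross its threshold."""
    for pattern, policy in POLICIES:
        if fnmatch(table_name, pattern):
            return small_file_count > policy["max_small_files"]
    return False  # no policy matches: leave the table alone

print(needs_compaction("warehouse.events.clicks", 250))  # tight policy fires
print(needs_compaction("warehouse.dim.users", 250))      # catch-all says wait
```
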
Operations
Scale
Stream

Beyond the Hype: When Apache Flink Solves Real Problems

Talk
When does Apache Flink solve real problems versus add complexity? Explore use cases where Flink becomes essential such as fraud detection, CDC, real-time analytics versus when batch or Kafka Streams suffice. Compare stream engines (Flink, Spark) with platforms (Kafka, Pulsar) to confidently decide when streaming delivers value.
Operations
Scale
Society, Ethics & Sustainability

SPRUCE it up! Open Source GreenOps at scale

Talk
GreenOps adoption is stalled by missing data from cloud providers. SPRUCE (https://opensourcegreenops.cloud/) is an open-source, scalable platform built on Apache Spark that enriches cloud usage reports with open models to quantify carbon impact, build insights, and help teams reduce both emissions and cloud spend.
Data Science
Operations

The Failures That Don’t Crash: MLOps for AI Agents

Talk
This talk takes four reliability patterns from distributed systems and shows what they look like inside an agent architecture. How to shadow-test an agent. Why your circuit breakers need confidence thresholds. What an eval harness looks like when your system is non-deterministic. And why human oversight degrades faster than anyone admits.
Data Science
Operations

Observability’s Sixth Sense: Detecting Anomalies in Metrics

Talk
In this talk, we look at anomaly detection as a complementary way of working with metrics. Instead of relying on predefined limits, anomaly detection focuses on identifying behavior that deviates from what is normally observed over time. The focus is on how developers can interpret these signals, where anomaly detection is useful, and where it is not.
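As a concrete instance of "behavior that deviates from what is normally observed", a trailing-window z-score is about the simplest possible detector: flag a point when it sits several standard deviations away from its own recent history, rather than past a fixed limit. A minimal sketch (the window size and threshold are illustrative):

```python
from statistics import mean, stdev

def anomalies(series, window=20, threshold=3.0):
    """Flag points that deviate more than `threshold` standard deviations
    from their trailing window -- no predefined static limit involved."""
    flagged = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(series[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged

# steady latency around 100 ms, then one sudden spike
metric = [100.0, 101.0, 99.0, 100.5, 99.5] * 5 + [180.0]
print(anomalies(metric))  # -> [25]
```

This is also a neat demonstration of where the technique breaks: a slow drift re-normalizes itself into the window, which is exactly the kind of limitation the abstract promises to discuss.
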
Operations
Store

What you should know about constraints in PostgreSQL 18

Talk
This talk explains how constraints work in Postgres by exploring the pg_constraint catalog and core concepts like table vs. column constraints, constraint triggers, domains and constraint deferrability through SQL queries. It then covers what’s new in Postgres 18 including temporal keys, NOT NULL as a first-class constraint, NOT ENFORCED and more.
Data Science
Scale
Store

Writes, 3 ways: Postgres, Apache Kafka® and Apache Iceberg™

Short Talk
Learning new things is hard, but a useful way to think about new things is by comparing them to things you already know. In this talk, we'll compare writes between 3 different popular data services: Postgres, Apache Kafka and Apache Iceberg. In doing so, we'll learn a bit about the evolution of how we've thought of data storage as developers.
Data Science
Scale
Store

Why Choose One: Multi-Engine Analytics with Apache Wayang

Talk
Choosing the best engine for each data task sounds right, but in modern data stacks doing so requires expertise and effort. Apache Wayang, a recently graduated TLP, addresses this by decoupling logical dataflows from execution engines. From big data platforms to SQL and ML engines, Wayang enables cross-platform execution that maximizes performance.
Operations
Scale
Stream

Event-driven Agents with Complex Event Processing in Flink

Talk
Event-driven Agents calling LLMs can be combined with Pattern Recognition and Anomaly Detection in Apache Flink in smart ways to increase cost efficiency, avoid hallucinations and enforce predictable, deterministic behavior. Specifically in a business process context, this architecture provides opportunities for continuous real-time process mining.
Data Science
Stories

Let LLMs Wander: Engineering RL Environments

Talk
What if, instead of learning only from examples, Language Models could explore crafted Environments, little worlds where they can act and improve autonomously? Join me to see how Reinforcement Learning Environments work, how to build them with open-source tools, and how to use them to evaluate and train LLMs/Agents.
Data Science
Society, Ethics & Sustainability
Stories

No 0-day required, just target the AI coding assistant!

Talk
Discover how attackers can manipulate AI coding assistants through hidden text, typosquatting and code errors. Learn to detect concealed instructions and set up trusted dependencies to keep unsafe code out of your environment.
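Typosquatting in particular can be screened for mechanically: a dependency whose name is almost, but not exactly, a trusted one deserves suspicion. A small sketch using string similarity (the trusted list and cutoff are illustrative assumptions, not a vetted ruleset):

```python
from difflib import SequenceMatcher

TRUSTED = {"requests", "numpy", "pandas", "cryptography"}

def looks_like_typosquat(name, trusted=TRUSTED, cutoff=0.85):
    """A package that is *almost* a trusted name (but not exactly) is suspicious."""
    if name in trusted:
        return False  # exact match: the real thing
    return any(SequenceMatcher(None, name, t).ratio() >= cutoff for t in trusted)

print(looks_like_typosquat("requests"))   # exact trusted name: fine
print(looks_like_typosquat("reqeusts"))   # transposition: flagged
print(looks_like_typosquat("flask"))      # unrelated name: fine
```
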
Operations
Scale
Store

How Apache Iceberg Enables Multi-Engine Data Platforms

Talk
The session will cover operational best practices, including metadata management, file sizing, compaction strategies, and performance tuning at scale. Attendees will leave with practical guidance for designing and operating open, flexible, multi-engine data architectures built on Apache Iceberg, enabling faster analytics and lower operational overhead.
Store

DuckDB beyond the notebook

Talk
Most people know DuckDB as a fast analytics tool for notebooks and scripts. But embedded OLAP enables much more: browser-based analytics via WebAssembly, serverless data processing, and lightweight data apps — without heavy infrastructure. This talk shows how DuckDB changes the way we build data-driven applications.
People & Community
Society, Ethics & Sustainability

AI Can Contribute. It Can’t Lead.

Short Talk
Today, AI writes code, reviews PRs, answers questions. Some communities ban it, others label it. Most will accept it eventually. But AI won't show up to community calls for two years. It won't mentor your next maintainer. We're losing maintainers faster than we're replacing them. Stop fighting it. Start investing in what it can't replace: people.
People & Community
Society, Ethics & Sustainability

OpenSearch Software Foundation: 1 Year of Open Governance

Short Talk
In this presentation, we will talk through moving a major open source project into a foundation, and how the benefits of open governance and a vendor-neutral home have shown up as sustained growth in community contributions.
Scale
Stream

What If We’ve Been Scaling Stream Processing Wrong All Along

Talk
We’ve normalised extraordinary inefficiency in stream processing. Thousands of events/sec don't justify repartition storms, serialization overhead, or state migration. This talk explores a different path: the Kafka Streams DSL, Flink-like exactly-once semantics, Project Loom, and challenging the assumption that stream processing must be distributed.
Data Science
Store
Stream

Kafi Streams: Complex Stream Processing Made Simple

Talk
You can finally stop caring about co-partitioning, state stores and eventual consistency. Kafi Streams, built on (Py)DBSP, treats streaming like batch — strongly consistent, no special concepts. An Open Source Python library for the 80% of use cases that don't need extreme scale. Fully incremental stream processing for everyone, from day one.
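Kafi Streams' actual API isn't shown in the abstract; the sketch below only illustrates the DBSP idea it builds on: changes arrive as weighted deltas (+1 for an insert, -1 for a retraction) and views update incrementally instead of being recomputed from scratch:

```python
from collections import Counter

class IncrementalCount:
    """Maintain a per-key count from a stream of weighted changes (DBSP-style).
    An insert arrives as weight +1, a retraction as weight -1."""
    def __init__(self):
        self.counts = Counter()

    def apply(self, delta):
        for key, weight in delta:
            self.counts[key] += weight
            if self.counts[key] == 0:
                del self.counts[key]  # fully retracted keys vanish from the view
        return dict(self.counts)

view = IncrementalCount()
print(view.apply([("clicks", +1), ("clicks", +1), ("views", +1)]))
print(view.apply([("clicks", -1)]))  # a retraction updates the view in place
```

Because the same code path handles inserts and retractions, there is no separate notion of late data or state migration to reason about, which is the "no special concepts" claim in miniature.
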
Stream

Dynamic Broker-Side Filtering for Kafka

Talk
KAFKA-6020 has been open for 7 years. This talk demos broker-side filtering for Kafka with low latency (p99 < 25 ms). A live demo with working code shows how it reduces network costs, simplifies consumers, and enables new use cases, with real-world validation from financial services and logistics deployments.
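The demo code itself isn't reproduced here; the sketch below only illustrates the consumer-visible contract of broker-side filtering: the predicate runs before records leave the broker, so non-matching records never cross the network. The record shapes are invented for the example:

```python
def broker_fetch(log, predicate):
    """What a broker-side filter changes: the predicate runs *before* records
    leave the broker, so consumers only pay network cost for matches."""
    return [record for record in log if predicate(record)]

log = [
    {"type": "payment", "amount": 990},
    {"type": "heartbeat"},
    {"type": "payment", "amount": 12},
]

# The consumer subscribes with a filter instead of discarding records client-side.
matches = broker_fetch(log, lambda r: r.get("type") == "payment" and r.get("amount", 0) > 100)
print(matches)  # only one record crosses the wire
```
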
Data Science
People & Community
Stories

AI in the physical world: from observation to discovery

Short Talk
In 2026, AI is moving beyond digital tasks into the physical world. It increasingly interacts with instruments, experiments, and real-world data. Physicists stand at this frontier, using deep learning, LLMs, and agents to analyze nature itself. What have we learned about AI when it meets reality?
Data Science
Store
Stories

Ultraviolet: Turn Hidden Document Data into an AI Advantage

Short Talk
Every PDF hides a world of structure, metadata and embedded signals that can silently influence AI-based processing. With Ultraviolet, we reveal how these can be exploited for malicious purposes or even become powerful tools for smarter applications. Designing for both humans and machines becomes a vital aspect of AI experience design.
Operations
Scale
Stories

Time-Traveling Agents That Rewind, Retry, Recover

Short Talk
Enterprises need AI that won’t hallucinate, break rules, or cause revenue loss. This talk introduces Time-Traveling Agents: LLM systems built on event sourcing and replay, letting teams rewind decisions, inject fixes, and guarantee safe, compliant automation at scale.
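The talk's actual architecture isn't detailed in the abstract, but the event-sourcing mechanism it names can be sketched in a few lines: if every decision is an appended event, rewinding is truncating the log and replaying is re-running it with a fix injected. The event names below are hypothetical:

```python
class ReplayableAgent:
    """Event-sourced agent state: every decision is an appended event, so a
    run can be rewound to any point and replayed with a fix injected."""
    def __init__(self):
        self.events = []

    def record(self, event):
        self.events.append(event)

    def rewind(self, n):
        """Drop the last n decisions; everything before them is untouched."""
        self.events = self.events[:-n]

    def replay(self, patched=()):
        return self.events + list(patched)

agent = ReplayableAgent()
agent.record("looked_up_account")
agent.record("quoted_wrong_price")   # the decision we want to undo
agent.rewind(1)
print(agent.replay(patched=["quoted_approved_price"]))
```

A production system would persist the log durably and replay through the real LLM calls; the point of the sketch is only that rewind and retry fall out of the data model for free.
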
Data Science
Operations
Stories

How to Tell If Your Agent Used the Right Stuff

Talk
Many so-called “agent failures” are actually context failures in disguise. In this session, we’ll explore how to tell whether your agent truly saw and used the right context, using techniques like tracing and attribution, golden datasets for context-aware evaluation, and targeted probes to test retrieval quality.
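One of the probes described, checking whether the right context ever reached the agent, can be made concrete as a recall score against a golden dataset. A deliberately naive sketch (substring matching stands in for real attribution):

```python
def context_recall(gold_facts, retrieved_chunks):
    """Golden-dataset probe: what fraction of the facts the agent *needed*
    actually appeared in the context it was given?"""
    seen = " ".join(retrieved_chunks).lower()
    hits = [fact for fact in gold_facts if fact.lower() in seen]
    return len(hits) / len(gold_facts), hits

gold = ["refund window is 30 days", "premium tier"]
context = ["Policy: the refund window is 30 days for all plans."]
score, found = context_recall(gold, context)
print(score, found)  # half the needed facts never reached the agent
```

A score below 1.0 reframes the failure: the model may have answered badly, but retrieval never gave it a chance, which is the "context failure in disguise" the abstract describes.
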
Data Science
Society, Ethics & Sustainability
Stories

Low-Resource Languages as Stress Tests for NLP Data

Short Talk
Low-resource languages expose weaknesses in NLP systems that are often hidden by benchmark data. Drawing on experience annotating fieldwork data, this talk shows how ambiguity and annotation decisions reveal fundamental data quality issues relevant to real-world NLP pipelines.
Data Science
Operations
Stream

Apache Spark Declarative Pipelines in Action

Talk
Learn Spark 4.1's brand-new Declarative Pipelines, a paradigm shift replacing imperative code with simple declarations. We'll build a real-time data pipeline together, processing streaming ADS-B flight data from tens of thousands of aircraft overhead.
People & Community
Society, Ethics & Sustainability

Mentoring In Open Source in the Age of AI

Talk
Open source mentorship changed overnight with AI tools. Contributors submitted polished code they couldn’t explain, making learning harder to assess. This talk shares what we learned mentoring Outreachy contributors—what failed, what worked, and what we’re still figuring out.
Operations
Store
Stream

Keeping data private in real-time pipelines

Talk
Real-time data is awesome… until you realize it’s leaking names, emails, and locations. In this talk, you’ll learn how to keep streaming data private, from simple masking to tricks that beat re-identification. All with live demos and some juicy real-world stories.
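The "simple masking" end of that spectrum is genuinely simple: a redaction pass over each streaming record before it leaves the pipeline. A minimal sketch for emails (the regex is illustrative and nowhere near a complete PII detector):

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def mask_event(event):
    """Simple masking pass for a streaming record: redact emails, keep shape."""
    return {k: EMAIL.sub("<email>", v) if isinstance(v, str) else v
            for k, v in event.items()}

print(mask_event({"user": "Contact jane.doe@example.com", "latency_ms": 42}))
```

Defeating re-identification is much harder than this, since quasi-identifiers like location and timing survive masking, which is exactly the gap the talk's "tricks that beat re-identification" addresses.
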
Operations

Correctness Too Cheap To Meter: Formal Verification and LLMs

Talk
Formal methods are powerful tools to verify software systems' correctness and reliability. However, manually writing system specs is time-consuming and hard to maintain. LLMs can help with this burden. We'll share new research into tools to automate formal methods workflows and learnings from how LLMs currently perform.