Kafi Streams: Complex Stream Processing Made Simple
Session Abstract
You can finally stop caring about co-partitioning, state stores and eventual consistency. Kafi Streams, built on (Py)DBSP, treats streaming like batch — strongly consistent, no special concepts. An Open Source Python library for the 80% of use cases that don’t need extreme scale. Fully incremental stream processing for everyone, from day one.
Session Description
I will unveil Kafi Streams, an Open Source library for complex stream processing inspired by Kafka Streams but built on top of PyDBSP, a pure Python implementation of Feldera’s novel “Database Stream Processing” theory.
Why would we need yet another stream processing library? One whose name sounds so strikingly similar to the most popular stream processing library on the planet?
Because existing stream processing libraries are too complex, even Kafka Streams. Their engines have been prematurely optimized for maximum scale, not simplicity. You cannot easily do stream processing without understanding concepts like streams vs. tables, co-partitioning, windowing (hopping, tumbling, sessions…), state stores, etc. – all the “leaky abstractions” still prevailing in the stream processing world. These abstractions are what keep stream processing in a niche.
Kafi Streams, by contrast, aims to make stream processing simple. It does not (yet) aim for extreme scale and performance. Instead, it aims to enable complex stream processing – with full support for joins, aggregations, and more – for the less performance-heavy 80% of use cases in non-tech companies like mine, Migros, a $30B+ revenue retailer.
With Kafi Streams, anyone can do complex stream processing, even those who have never done it before. Right from the start. Because with DBSP as our basis, streaming is no different from batch any longer. Simple. Deterministic. Not just eventually but strongly consistent. Exactly what anyone coming from outside the streaming world would have hoped for.
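To make the “streaming = batch” claim concrete, here is a minimal, purely illustrative Python sketch of the core DBSP idea – it is not the Kafi Streams or PyDBSP API, and all names in it are made up for this example. Data and changes are both represented as Z-sets (maps from records to integer weights: +1 insert, −1 delete), so applying a linear operator such as a grouped count to a change (the streaming path) yields exactly the same result as recomputing over the full input (the batch path):

```python
from collections import defaultdict

def zset_add(a, b):
    """Combine two Z-sets by summing weights; drop zero-weight entries."""
    out = defaultdict(int)
    for zs in (a, b):
        for record, weight in zs.items():
            out[record] += weight
    return {r: w for r, w in out.items() if w != 0}

def count_by_key(zs):
    """A linear 'GROUP BY key, COUNT(*)' over a Z-set of (key, value) records."""
    counts = defaultdict(int)
    for (key, _value), weight in zs.items():
        counts[key] += weight
    return {k: c for k, c in counts.items() if c != 0}

# Batch view: the full input seen so far.
state = {("a", 1): 1, ("a", 2): 1, ("b", 3): 1}

# Streaming view: one change arrives -- an insert and a delete.
delta = {("b", 4): 1, ("a", 2): -1}

# Streaming path: feed only the delta through the operator,
# then merge the output delta into the previous result.
incremental = zset_add(count_by_key(state), count_by_key(delta))

# Batch path: recompute over the full, updated input.
batch = count_by_key(zset_add(state, delta))

assert incremental == batch == {"a": 1, "b": 2}
```

Because counting is linear over Z-sets, processing only the change is provably equivalent to reprocessing everything – the deterministic, strongly consistent behavior the talk attributes to DBSP, without windows or state-store tuning in the programming model.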