Flo: a Semantic Foundation for Progressive Stream Processing
Abstract
Streaming systems are present throughout modern applications, processing continuous data in real-time. Existing streaming languages have a variety of semantic models and guarantees that are often incompatible. Yet all these languages are considered "streaming" -- what do they have in common? In this paper, we identify two general yet precise semantic properties: streaming progress and eager execution. Together, they ensure that streaming outputs are deterministic and kept fresh with respect to streaming inputs. We formally define these properties in the context of Flo, a parameterized streaming language that abstracts over dataflow operators and the underlying structure of streams. It leverages a lightweight type system to distinguish bounded streams, which allow operators to block on termination, from unbounded ones. Furthermore, Flo provides constructs for dataflow composition and nested graphs with cycles. To demonstrate the generality of our properties, we show how key ideas from representative streaming and incremental computation systems -- Flink, LVars, and DBSP -- have semantics that can be modeled in Flo and guarantees that map to our properties.
- Publication:
-
arXiv e-prints
- Pub Date:
- November 2024
- DOI:
- arXiv:
- arXiv:2411.08274
- Bibcode:
- 2024arXiv241108274L
- Keywords:
-
- Computer Science - Programming Languages;
- Computer Science - Distributed;
- Parallel;
- and Cluster Computing