NSDI '22 - Data-Parallel Actors: A Programming Model for Scalable Query Serving Systems


Peter Kraft, Fiodar Kazhamiaka, Peter Bailis, and Matei Zaharia, Stanford University

We present data-parallel actors (DPA), a programming model for building distributed query serving systems. Query serving systems are an important class of applications characterized by low-latency data-parallel queries and frequent bulk data updates; they include data analytics systems like Apache Druid, full-text search engines like Elasticsearch, and time series databases like InfluxDB. They are challenging to build because they run at scale and need complex distributed functionality such as data replication, fault tolerance, and update consistency. DPA makes these systems easier to build by letting developers construct them from purely single-node components while automatically providing these critical properties. In DPA, we view a query serving system as a collection of stateful actors, each encapsulating a partition of data. DPA provides parallel operators that enable consistent, atomic, and fault-tolerant parallel updates and queries over data stored in actors. We have used DPA to build a new query serving system, a simplified data warehouse based on the single-node database MonetDB, and to enhance existing ones, such as Druid, Solr, and MongoDB, adding missing user-requested features such as load balancing and elasticity. We show that DPA can distribute a system in less than 1K lines of code (more than 10× fewer than typical implementations in current systems) while achieving state-of-the-art performance and adding rich functionality.
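To make the idea of actors that each encapsulate a partition more concrete, here is a minimal single-process sketch in Python. It is not the DPA API: the names `PartitionActor`, `parallel_query`, and `parallel_update` are hypothetical, threads stand in for distributed nodes, and the stage-then-commit update is only one simple way to approximate an all-or-nothing bulk update. Replication and fault tolerance are left out entirely.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List


class PartitionActor:
    """Hypothetical stateful actor that owns one partition of the data."""

    def __init__(self) -> None:
        self.rows: List[dict] = []    # committed state, visible to queries
        self.staged: List[dict] = []  # pending update, not yet visible

    def stage(self, rows: List[dict]) -> None:
        self.staged = list(rows)

    def commit(self) -> None:
        self.rows.extend(self.staged)
        self.staged = []

    def abort(self) -> None:
        self.staged = []


def parallel_query(
    actors: List[PartitionActor],
    local_fn: Callable[[PartitionActor], Any],
    combine_fn: Callable[[List[Any]], Any],
) -> Any:
    """Run local_fn on every actor concurrently, then combine the partial results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(local_fn, actors))
    return combine_fn(partials)


def parallel_update(actors: List[PartitionActor], rows: List[dict], key: str) -> None:
    """Hash-partition rows across actors; make them visible only if all actors stage."""
    buckets: List[List[dict]] = [[] for _ in actors]
    for row in rows:
        buckets[hash(row[key]) % len(actors)].append(row)
    try:
        for actor, bucket in zip(actors, buckets):
            actor.stage(bucket)
    except Exception:
        # Assumption: any staging failure discards the whole update.
        for actor in actors:
            actor.abort()
        raise
    for actor in actors:
        actor.commit()
```

A short usage example under the same assumptions: three actors receive ten rows through one bulk update, and a count query fans out to all of them and sums the partial counts.

```python
actors = [PartitionActor() for _ in range(3)]
parallel_update(actors, [{"id": i, "value": i * 2} for i in range(10)], key="id")
total = parallel_query(actors, lambda a: len(a.rows), sum)  # total == 10
```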
