filmov
tv
NSDI '22 - Data-Parallel Actors: A Programming Model for Scalable Query Serving Systems
![preview_player](https://i.ytimg.com/vi/1_OA63YZtNg/maxresdefault.jpg)
Показать описание
NSDI '22 - Data-Parallel Actors: A Programming Model for Scalable Query Serving Systems
Peter Kraft, Fiodar Kazhamiaka, Peter Bailis, and Matei Zaharia, Stanford University
We present data-parallel actors (DPA), a programming model for building distributed query serving systems. Query serving systems are an important class of applications characterized by low-latency data-parallel queries and frequent bulk data updates; they include data analytics systems like Apache Druid, full-text search engines like ElasticSearch, and time series databases like InfluxDB. They are challenging to build because they run at scale and need complex distributed functionality like data replication, fault tolerance, and update consistency. DPA makes building these systems easier by allowing developers to construct them from purely single-node components while automatically providing these critical properties. In DPA, we view a query serving system as a collection of stateful actors, each encapsulating a partition of data. DPA provides parallel operators that enable consistent, atomic, and fault-tolerant parallel updates and queries over data stored in actors. We have used DPA to build a new query serving system, a simplified data warehouse based on the single-node database MonetDB, and enhance existing ones, such as Druid, Solr, and MongoDB, adding missing user-requested features such as load balancing and elasticity. We show that DPA can distribute a system in less than 1K lines of code (greater than 10× less than typical implementations in current systems) while achieving state-of-the-art performance and adding rich functionality.
Peter Kraft, Fiodar Kazhamiaka, Peter Bailis, and Matei Zaharia, Stanford University
We present data-parallel actors (DPA), a programming model for building distributed query serving systems. Query serving systems are an important class of applications characterized by low-latency data-parallel queries and frequent bulk data updates; they include data analytics systems like Apache Druid, full-text search engines like ElasticSearch, and time series databases like InfluxDB. They are challenging to build because they run at scale and need complex distributed functionality like data replication, fault tolerance, and update consistency. DPA makes building these systems easier by allowing developers to construct them from purely single-node components while automatically providing these critical properties. In DPA, we view a query serving system as a collection of stateful actors, each encapsulating a partition of data. DPA provides parallel operators that enable consistent, atomic, and fault-tolerant parallel updates and queries over data stored in actors. We have used DPA to build a new query serving system, a simplified data warehouse based on the single-node database MonetDB, and enhance existing ones, such as Druid, Solr, and MongoDB, adding missing user-requested features such as load balancing and elasticity. We show that DPA can distribute a system in less than 1K lines of code (greater than 10× less than typical implementations in current systems) while achieving state-of-the-art performance and adding rich functionality.