Bay Area AI: Succinct: Enabling Queries on Compressed Data w/ Anurag Khandelwal
-----
Cloud services today need to perform fast, interactive queries on large data volumes. Several recent studies have shown that data is growing faster than memory capacity, making in-memory query execution increasingly challenging. At UC Berkeley, we have built Succinct, a distributed data store that overcomes this problem by enabling a wide range of interactive queries (e.g., search, random access, range queries, and even regular expressions) directly on compressed data.
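Succinct's search queries are built on the suffix-array family of indexes; its published design uses compressed suffix arrays. The sketch below is a rough, non-authoritative illustration of the underlying search primitive only. It answers a substring search with two binary searches over a plain, uncompressed suffix array, and it keeps the original text in memory, so it does not show querying compressed data. The function names `build_suffix_array` and `search` are hypothetical.

```python
def build_suffix_array(text: str) -> list[int]:
    """Return suffix start offsets sorted lexicographically by suffix.

    Naive construction (sorting full suffixes); fine for illustration only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def search(text: str, sa: list[int], pattern: str) -> list[int]:
    """Return the sorted start offsets of every occurrence of pattern in text."""
    m = len(pattern)
    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    # All matching suffixes form one contiguous range [start, lo) of the array.
    return sorted(sa[start:lo])
```

For example, `search("banana", build_suffix_array("banana"), "ana")` returns `[1, 3]`. Because every suffix that begins with the pattern sits in one contiguous range of the suffix array, a search costs two binary searches regardless of how many matches there are.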
Besides its ability to execute queries on compressed data, Succinct differs from existing data stores along several dimensions. First, Succinct unifies several powerful data models (key-value stores, document stores, tables, etc.) behind a single interface. Second, Succinct lets applications choose a desired compression factor, so an application can trade additional memory for improved query performance. Finally, Succinct allows applications to change the compression factor on the fly, enabling new approaches to handling skewed query distributions, time-varying loads, and failure tolerance.
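Compressed-suffix-array designs commonly expose a space/latency knob as a sampling rate. That is one plausible reading of the tunable compression factor described above, but it is an assumption and not a description of Succinct's code. In the hypothetical `SampledSuffixArray` sketch below, only suffix-array entries whose text position is a multiple of `alpha` are stored. Any other entry is recovered by following the ψ (`psi`) permutation until a stored entry is reached. A larger `alpha` stores fewer entries but needs up to `alpha - 1` extra steps per lookup. For simplicity `psi` itself is kept as a plain list; a real compressed index would encode it compactly.

```python
class SampledSuffixArray:
    """Hypothetical sketch: store only SA entries whose text position % alpha == 0."""

    def __init__(self, text: str, alpha: int):
        n = len(text)
        sa = sorted(range(n), key=lambda i: text[i:])  # full suffix array (build time only)
        isa = [0] * n  # inverse suffix array: text position -> rank
        for rank, pos in enumerate(sa):
            isa[pos] = rank
        # psi[rank] is the rank of the suffix starting one character later,
        # wrapping from the last text position back to position 0.
        self.psi = [isa[(pos + 1) % n] for pos in sa]
        # Keep SA values only at sampled text positions: about n / alpha entries.
        self.samples = {rank: pos for rank, pos in enumerate(sa) if pos % alpha == 0}
        self.n = n

    def lookup(self, rank: int) -> int:
        """Recover SA[rank] by walking psi until a sampled rank is reached."""
        steps = 0
        while rank not in self.samples:
            rank = self.psi[rank]  # move to the suffix one position further right
            steps += 1
        # Each psi step advanced the text position by one, so undo those steps.
        return (self.samples[rank] - steps) % self.n
```

For example, `SampledSuffixArray("banana", 2)` stores three of the six entries, and `lookup(0)` takes one ψ step to recover position 5. Raising `alpha` shrinks `samples` and lengthens the walk, which is the memory-versus-latency tradeoff that a tunable compression factor exposes.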
In this talk, I will describe Succinct's design, implementation and semantics. Succinct is completely open-sourced, and we have also released Succinct as a library that simplifies integration of Succinct data structures and techniques with existing data stores.
Scalæ By the Bay 2016 conference, held on November 11-13, 2016 at Twitter, San Francisco, shared best practices in building data pipelines across three tracks:
* Functional and Type-safe Programming
* Reactive Microservices and Streaming Architectures
* Data Pipelines for Machine Learning and AI