How Netflix Delivers Key-Value and Time-Series Storage at Any Scale - Joseph Lynch & Vidhya Arvind

At Netflix, the bread-and-butter workloads for Apache Cassandra are wide-column storage for Key-Value and Time-Series use cases. Anyone who operates Cassandra at scale knows, however, that naively structured tables make clusters unstable as partitions or columns grow in size. In this talk, we show how to design reliable APIs and how to lay out Key-Value and Time-Series data in Apache Cassandra for petabyte-scale datasets. For example, most Key-Value data is small, but for large partitions we present a novel technique to dynamically bucket data, giving users fast access to small data and latency that scales linearly for large values. Next, we show how to lay out Time-Series datasets with table sharding and with time and random bucketing, so that large partitions are split automatically while aggressive latency goals are still met. In addition, because we use tables, rather than compaction, to expire data, we can get up to 2x more storage out of the same disk space. By combining fully idempotent APIs, novel table layouts, bucketing algorithms, and compression schemes, Netflix has been able to scale its Apache Cassandra usage orders of magnitude further than it could before.
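The abstract does not spell out the actual bucketing scheme, so what follows is only a minimal Python sketch, under assumed parameters, of how time bucketing, random bucketing and time-based table sharding can fit together. Every name here (`PartitionKey`, `partition_for_write`, `partitions_for_read`, `droppable_tables`, the `ts_events_<window>` table naming) and every constant (one-week tables, one-hour time buckets, four random buckets) is hypothetical and is not Netflix's design. The idea shown: a write picks a table by time window, a partition by time bucket, and a random sub-bucket. A range read fans out over every sub-bucket in range. Expiry becomes a matter of dropping whole tables.

```python
"""Hypothetical sketch of time-series partition bucketing and table sharding.

All names and constants are illustrative assumptions, not the talk's design.
"""
import random
from dataclasses import dataclass

TABLE_WINDOW_SECONDS = 7 * 24 * 3600  # assumed: one table per week of data
TIME_BUCKET_SECONDS = 3600            # assumed: one-hour time buckets
RANDOM_BUCKETS = 4                    # assumed: 4 random sub-buckets per hour


@dataclass(frozen=True)
class PartitionKey:
    table: str          # hypothetical per-window table name
    series_id: str      # the caller's logical time-series identifier
    time_bucket: int    # fixed-width time slice the event falls into
    random_bucket: int  # spreads one hot time slice over several partitions


def _table_for(epoch_s: int) -> str:
    # Table windows are a whole multiple of time buckets, so a time bucket
    # never straddles two tables.
    return f"ts_events_{epoch_s // TABLE_WINDOW_SECONDS}"


def partition_for_write(series_id: str, event_time_s: int, rng=random) -> PartitionKey:
    """Choose the (table, partition) a single event is written to."""
    return PartitionKey(
        table=_table_for(event_time_s),
        series_id=series_id,
        time_bucket=event_time_s // TIME_BUCKET_SECONDS,
        random_bucket=rng.randrange(RANDOM_BUCKETS),
    )


def partitions_for_read(series_id: str, start_s: int, end_s: int) -> list[PartitionKey]:
    """Every partition a range read over [start_s, end_s) has to query."""
    first = start_s // TIME_BUCKET_SECONDS
    last = (end_s - 1) // TIME_BUCKET_SECONDS
    keys = []
    for tb in range(first, last + 1):
        table = _table_for(tb * TIME_BUCKET_SECONDS)
        for rb in range(RANDOM_BUCKETS):
            keys.append(PartitionKey(table, series_id, tb, rb))
    return keys


def droppable_tables(now_s: int, retention_s: int, oldest_window: int) -> list[str]:
    """Tables whose whole window [w*W, (w+1)*W) is older than the retention cutoff."""
    cutoff = now_s - retention_s
    # Window w is fully expired when (w + 1) * W <= cutoff, i.e. w < cutoff // W.
    return [f"ts_events_{w}" for w in range(oldest_window, cutoff // TABLE_WINDOW_SECONDS)]
```

In this sketch no single partition grows without bound, because a series is spread over `RANDOM_BUCKETS` partitions per hour. The trade-off is that a range read must query every sub-bucket in the range. Expired data leaves by dropping a whole table instead of waiting for compaction to purge it.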