USENIX ATC '22 - Building a High-performance Fine-grained Deduplication Framework for Backup Storage with High Deduplication Ratio
Xiangyu Zou and Wen Xia, Harbin Institute of Technology, Shenzhen; Philip Shilane, Dell Technologies; Haijun Zhang and Xuan Wang, Harbin Institute of Technology, Shenzhen
Fine-grained deduplication, which first removes identical chunks and then eliminates redundancies between similar but non-identical chunks (i.e., delta compression), can exploit workloads' compressibility to achieve a very high deduplication ratio, but it suffers from poor backup/restore performance, which has kept it less popular than chunk-level deduplication thus far. The root cause is that sharing more references among similar chunks further degrades spatial/temporal locality, which increases I/O overhead and slows down both backup and restore.
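To make the two stages concrete, here is a minimal sketch of such a pipeline. Everything in it is illustrative rather than MeGA's implementation: the `similarity_feature` stand-in (real systems compute super-features from rolling-hash samples), the `difflib`-based delta encoder (real systems use Xdelta-style encoders), and the in-memory `FineGrainedDedup` store are all assumptions made for the example.

```python
import hashlib
import zlib
from difflib import SequenceMatcher

def fingerprint(chunk: bytes) -> str:
    # Strong hash identifies identical chunks (stage 1).
    return hashlib.sha256(chunk).hexdigest()

def similarity_feature(chunk: bytes) -> int:
    # Toy stand-in for super-features built from rolling-hash samples.
    return zlib.crc32(chunk[::64])

def delta_encode(base: bytes, target: bytes):
    # Naive delta: COPY ranges out of the base plus literal INSERTs.
    ops = []
    sm = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == 'equal':
            ops.append(('COPY', i1, i2 - i1))
        elif j2 > j1:                       # 'replace' / 'insert' regions
            ops.append(('INSERT', target[j1:j2]))
    return ops

def delta_decode(base: bytes, ops) -> bytes:
    out = bytearray()
    for op in ops:
        if op[0] == 'COPY':
            _, off, n = op
            out += base[off:off + n]
        else:
            out += op[1]
    return bytes(out)

class FineGrainedDedup:
    def __init__(self):
        self.store = {}     # fp -> ('raw', bytes) | ('delta', base_fp, ops)
        self.features = {}  # similarity feature -> fp of a candidate base

    def write(self, chunk: bytes) -> str:
        fp = fingerprint(chunk)
        if fp in self.store:                 # stage 1: identical chunk
            return fp
        base_fp = self.features.get(similarity_feature(chunk))
        if base_fp is not None:              # stage 2: similar chunk -> delta
            base = self.read(base_fp)
            self.store[fp] = ('delta', base_fp, delta_encode(base, chunk))
        else:                                # unique chunk becomes a base
            self.store[fp] = ('raw', chunk)
            self.features[similarity_feature(chunk)] = fp
        return fp

    def read(self, fp: str) -> bytes:
        rec = self.store[fp]
        if rec[0] == 'raw':
            return rec[1]
        _, base_fp, ops = rec
        return delta_decode(self.read(base_fp), ops)

if __name__ == "__main__":
    store = FineGrainedDedup()
    a = b"A" * 4096
    b = b"A" * 4000 + b"B" * 96         # similar but not identical to a
    fa, fb = store.write(a), store.write(b)
    assert store.read(fb) == b          # delta against a, restores exactly
```

The sketch also shows where the locality problem comes from: `read` on a delta chunk must first fetch its base, so on disk each delta adds an extra, potentially random, I/O to a different container.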
In this paper, we address these different forms of poor locality with several techniques and propose MeGA, which achieves backup and restore speeds close to chunk-level deduplication while preserving fine-grained deduplication's significant deduplication-ratio advantage. Specifically, MeGA applies (1) a backup-workflow-oriented delta selector to address poor locality when reading base chunks, and (2) a delta-friendly data layout and "Always-Forward-Reference" traversing in the restore workflow to deal with the poor spatial/temporal locality of deduplicated data.
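The payoff of a one-directional reference order is that restore degenerates to a single sequential scan over containers with a small, precisely evictable chunk cache. The toy sketch below (reusing `delta_decode` from above) illustrates that property; the container format, the `ref_counts` bookkeeping, and the convention that deltas reference only already-scanned containers are assumptions for the example, not MeGA's actual AFR mechanics.

```python
def restore(recipe, containers, ref_counts):
    # recipe:      chunk fingerprints in logical file order
    # containers:  list of containers, each a list of (fp, record) pairs,
    #              where record is ('raw', bytes) or ('delta', base_fp, ops)
    # ref_counts:  fp -> number of later deltas that reference fp
    cache = {}      # bases kept only while a future delta still needs them
    restored = {}   # a real restore would stream data to file offsets instead
    for container in containers:           # one strictly sequential scan
        for fp, rec in container:
            if rec[0] == 'raw':
                data = rec[1]
            else:                          # base was scanned earlier, so the
                _, base_fp, ops = rec      # layout guarantees it is cached
                data = delta_decode(cache[base_fp], ops)
                ref_counts[base_fp] -= 1
                if ref_counts[base_fp] == 0:
                    del cache[base_fp]     # evict as soon as it is dead
            if ref_counts.get(fp, 0) > 0:
                cache[fp] = data           # this chunk is a future base
            restored[fp] = data
    return b"".join(restored[fp] for fp in recipe)
```

Without such an ordering guarantee, a delta's base may live in an arbitrary earlier or later container, forcing backward seeks or repeated container reads, which is exactly the restore penalty the abstract describes.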
Evaluations on four datasets show that MeGA achieves better performance than other fine-grained deduplication approaches. In particular, compared with the traditional greedy approach, MeGA achieves 4.47–34.45 times higher backup performance and 30–105 times higher restore performance while maintaining a very high deduplication ratio.