Efficient Encrypted Deduplication for Balancing Storage and Confidentiality

balancing storage efficiency and data n.w
1 / 15
Embed
Share

Explore the innovative approach of Tunable Encrypted Deduplication (TED) that harmonizes storage efficiency and data confidentiality. Learn about the encryption primitives, key derivation process, and the design overview of TED, enabling a balance between trade-offs in outsourced storage environments.

  • Encrypted Deduplication
  • Data Confidentiality
  • Storage Efficiency
  • Encryption Primitives
  • TED

Uploaded on | 1 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. Balancing Storage Efficiency and Data Confidentiality with Tunable Encrypted Deduplication Jingwei Li*, Zuoru Yang#, Yanjing Ren*, Patrick P. C. Lee#, Xiaosong Zhang* *University of Electronic Science and Technology of China (UESTC) #The Chinese University of Hong Kong (CUHK) EuroSys 2020 1

  2. Deduplication Deduplication coarse-grained compression Units: chunks (fixed- or variable-size) Stores only one copy of duplicate chunks Storage space saved by 5/12 = 42%! 2

  3. Encrypted Deduplication Augments deduplication with encryption for data confidentiality Application: outsourced storage Storage Provider Deduplication Encryption Which crypto primitive should be used? Encryption 3

  4. Encryption Primitives Symmetric-key encryption (SKE) Derives a random key for chunk encryption/decryption Ensures confidentiality, but prohibits deduplication of duplicate chunks Message-locked encryption (MLE) [Bellare et al., Eurocrypt 13] Derives a deterministic key from chunk content Supports deduplication, but leaks frequency distribution of plaintext chunks [Li et al., DSN 17] Pose a dilemma of choosing the right cryptographic primitive 4

  5. Our Contributions TED: a tunable encrypted deduplication primitive for balancing trade-off between storage efficiency and data confidentiality Includes three new designs Minimizes frequency leakage via a configurable storage blowup factor TEDStore: encrypted deduplication prototype based on TED TED incurs only limited performance overhead Extensive trace-driven analysis and prototype experiments 5

  6. Main Idea Key derivation with three inputs: chunk M, current frequency f, and balance parameter t f t Hash Function M K K = H(M || f/t ) f: cumulative and increases with number of duplicates of M t: controls maximum allowed number of duplicate copies for a ciphertext chunk Special cases: t = 1 SKE t MLE 6

  7. Design Overview Key Manager Clients Provider Deduplication Chunk Chunk TED builds on server-aided MLE architecture in DupLESS [Bellare et al., Security 13] Key generation by key manager to prevent offline brute-force attacks 7

  8. Questions Q1: How does the key manager learn chunk frequencies? Low overhead required even for many chunks Q2: How does the key manager generate keys for chunks? Distinct sequences of ciphertext chunks required for identical files Q3: How should the balance parameter t be configured in practice? Adaptive for different workloads 8

  9. Sketch-based Frequency Counting Count-Min Sketch +1 H1(M) +1 f = minimum counter indexed by (i, Hi(M)) M r rows +1 +1 Hr(M) w counters per row Key manager estimates f via Count-Min Sketch [Cormode 2005] Fixed memory usage with provable error bounds Client sends short hashes {Hi(M)} to key manager Key manager cannot readily infer M from short hashes 9

  10. Probabilistic Key Generation Selects K uniformly from candidate keys derived from 0, 1, , f/t Enables probabilistic encryption on identical files Maintains deduplication effectiveness Reason: f is cumulative; keys derived from 0, 1, , f/t -1 have been used to encrypt some old copies of M Already encrypted chunks K {K0, K1, K2, K3} K3 K2 K1 K0 K2 K1 K0 M M M M M M M M Processing sequence 10

  11. Automated Parameter Configuration Configure t by solving optimization problem, given: Frequency distribution for a batch of plaintext chunks Affordable storage blowup b over exact deduplication Goal: minimize frequency leakage Quantify frequency leakage by Kullback-Leibler distance (KLD) KLD: relative entropy to uniform distribution A lower KLD implies higher robustness against frequency analysis Configure t from the returned optimal frequency distribution of ciphertext chunks 11

  12. Evaluation TEDStore realizes TED in encrypted deduplication storage ~4.5K line of C++ code in Linux Trace analysis FSL: file system snapshots (42 backups; 3.08TB raw data) MS: windows file system snapshots (30 backups; 3.91TB raw data) Prototype experiments Local 10 GbE cluster 12

  13. Trade-off Analysis (FSL Dataset) Schemes MLE SKE MinHash [Li et al., DSN 17] Basic TED (varying t) Full TED (varying b) Basic TED and Full TED effectively balance trade-off Full TED readily configures actual storage blowup 13

  14. Prototype Experiments Fast Secure Steps (MD5, AES-128) (SHA-256, AES-256) 0.8ms 2.6ms 0.4ms 0.04ms 0.1ms 4.9ms Chunking Fingerprinting Hashing Key Seeding Key Derivation Encryption Computational time per 1MB of uploads 1.7ms 0.01ms 0.07ms 3.7ms TED operations TED incurs limited overhead (7.2% for Fast; 6.1% for Secure) More results in paper: TED achieves ~30X key generation speedup over existing approaches Multi-client upload/download performance 14

  15. Conclusion TED: encrypted deduplication primitive that enables controllable trade-off between storage efficiency and data confidentiality Sketch-based frequency counting Probabilistic key generation Automated parameter configuration Source code: http://adslab.cse.cuhk.edu.hk/software/ted 15

Related


More Related Content