The Dynamic Cuckoo Filter for Dynamic Set Representation

Download Presenatation
the dynamic cuckoo filter n.w
1 / 31
Embed
Share

The Dynamic Cuckoo Filter is introduced to address the challenges of representing large-scale dynamic sets, offering flexibility, reliability, and efficient operations. This article discusses the emergence of dynamic sets, related works like Bloom filters and Cuckoo filters, and the advantages of the Dynamic Cuckoo Filter in set representation.

  • Cuckoo Filter
  • Dynamic Set
  • Bloom Filter
  • Set Representation
  • Data Structures

Uploaded on | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. The Dynamic Cuckoo Filter Authors: Publisher: Presenter: Date: Hanhua Chen, Liangyi Liao, Hai Jin, Jie Wu 2017 ICNP Tzu-Chieh, Lin 106/11/08 National Cheng Kung University CSIE Computer & Internet Architecture Lab 1

  2. INTRODUCTION The emergence of large-scale dynamic sets in real applications creates stringent requirements for approximate set representation structures. (1) The capacity of the set representation structures should support flexibly extending or reducing to cope with dynamically changing of set size. (2) The set representation structures should support reliable delete operation. Existing techniques for approximate set representation e.g., the cuckoo filter, the Bloom filter and its variants cannot meet both the requirements of a dynamic set. 2 Computer & Internet Architecture Lab CSIE, National Cheng Kung University

  3. RELATED WORK Standard Bloom filter Counting Bloom filter Cuckoo filter Dynamic Bloom filter 3 Computer & Internet Architecture Lab CSIE, National Cheng Kung University

  4. Bloom filter(BF) 4 Computer & Internet Architecture Lab CSIE, National Cheng Kung University

  5. Counting Bloom filter(CBF) Counting Bloom filter (CBF) [9] replaces each bit of an BF with a counter of s bits to support item deletion. However, the space cost of a CBF is s times larger than the SBF. 5 Computer & Internet Architecture Lab CSIE, National Cheng Kung University

  6. Cuckoo Hash A basic cuckoo hash table consists of an array of buckets where each item has two candidate buckets determined by hash functions h1(x) and h2(x). The lookup procedure checks both buckets to see if either contains this item. Support insert and delete. 6 Computer & Internet Architecture Lab CSIE, National Cheng Kung University

  7. Cuckoo Filter(CF) An array of buckets Each bucket has b entries Fingerprint based matching Relocation when collision ?= ?? ?(?) 7 Computer & Internet Architecture Lab CSIE, National Cheng Kung University

  8. Dynamic Bloom Filter Linked list of s Counting Bloom Filters (CBF). Extends capacity by appending new building blocks of CBFs. Drawback: unreliable deletion. 8 Computer & Internet Architecture Lab CSIE, National Cheng Kung University

  9. Multiple Address Problem in DBF Give up deleting the element found in multiple building blocks. Raises up false positives. 9 Computer & Internet Architecture Lab CSIE, National Cheng Kung University

  10. 10 Computer & Internet Architecture Lab CSIE, National Cheng Kung University

  11. Dynamic Cuckoo Filter DCF uses Cuckoo Filter(CF) as building block. Extends capacity by appending new building blocks. Monopolistic fingerprint avoid multiple address problem. 11 Computer & Internet Architecture Lab CSIE, National Cheng Kung University

  12. 12 Computer & Internet Architecture Lab CSIE, National Cheng Kung University

  13. Insert The algorithm keeps two pointers, curCF and nextCF. The curCF points to the current CF and if the curCF is full, then new CF building block will be allocated and assigned to curCF. If the number of relocation reaches the specified maximum value, denote as MNK, the algorithm will record the last fingerprint kicked out, denote as victim. In order to avoid insert failure, the victim will be keep inserted into the nextCF iteratively. 13 Computer & Internet Architecture Lab CSIE, National Cheng Kung University

  14. Insert m i n k o l m i g j n k o l j 1(?) e f g h e f q h x 2(?) p q r s p x r s 14 Computer & Internet Architecture Lab CSIE, National Cheng Kung University

  15. Query Membership testing with a DCF needs to probe every CF in the DCF, i.e., 2bs entries, in the worst case. The time complexity of the membership query operation of a DCF and a CF are O(bs) and O(b), respectively. 15 Computer & Internet Architecture Lab CSIE, National Cheng Kung University

  16. Query m i g j n k o l 1(?) e f q h x 2(?) p x r s 16 Computer & Internet Architecture Lab CSIE, National Cheng Kung University

  17. Delete The deletion of an item x needs to first perform a membership query operation. If a corresponding fingerprint x is found, then the matched fingerprints will be removed. The time complexity of the delete operation is the same as those of the membership query operation, i.e., O(bs) for a DCF and O(b) for a CF. 17 Computer & Internet Architecture Lab CSIE, National Cheng Kung University

  18. Delete m i g j n k o l m i g j n k o l 1(?) x e f q h e f q h 2(?) p x r s p r s 18 Computer & Internet Architecture Lab CSIE, National Cheng Kung University

  19. Compact The compact operation of the DCF iteratively moves the fingerprints from sparse CFs to their counterpart buckets in other denser CFs. 19 Computer & Internet Architecture Lab CSIE, National Cheng Kung University

  20. Compact CF1(dense) CF2(sparse) m i g j n k o l m i t e u p g j n k o l t e f q h f q h u p x r s x r s 20 Computer & Internet Architecture Lab CSIE, National Cheng Kung University

  21. False Positive Rate The false positive rate is defined as the probability that at least one CF among all the CFs reports a false positive for x. 21 Computer & Internet Architecture Lab CSIE, National Cheng Kung University

  22. Multiple Value Diverse items have very low probability to be inserted with the same fingerprint in the same bucket address in the DCF design. Considering the case that the items x and y share the same bucket address and happen to collide in the same fingerprint ( x = y). Even if the DCF encounters multiple value, removing one matched fingerprints does not lead to redundant information left thanks to the monopolistic fingerprint. This guarantees that no false negative is introduced. 22 Computer & Internet Architecture Lab CSIE, National Cheng Kung University

  23. Dulpicates Assuming the item x has been inserted twice, there must be two copies of the fingerprint x inserted in the DCF. Obviously, deleting item x thoroughly requires removing fingerprint x twice. If inserting duplicated items is not allowed, the DCF can filter the duplicate by performing a query operation before insertion. This guarantees that items can be removed thoroughly by removing fingerprint once. At the same time, it might introduce slight false negatives. 23 Computer & Internet Architecture Lab CSIE, National Cheng Kung University

  24. Dulpicates Considering the case that we mentioned in multiple value, items x and y share the same bucket address and happen to collide in the same fingerprint ( x = y). When x and y both need to be inserted into the DCF, only one copy of the fingerprint is inserted to avoid duplicates. After inserting a single copy, a possible false negative may occur. For example, item y will no longer be found if we delete item x. 24 Computer & Internet Architecture Lab CSIE, National Cheng Kung University

  25. OPTIMIZATION 25 Computer & Internet Architecture Lab CSIE, National Cheng Kung University

  26. 26 Computer & Internet Architecture Lab CSIE, National Cheng Kung University

  27. PERFORMANCE We choose SHA1 to generate hash values. In the DCF implementation, we use the highest 32 bits and the lowest 32 bits to represent the fingerprint and one of the bucket address, respectively. In the experiment, we set the false positive rates of both the DCF and the DBF to a fixed value of 1.17 10 2. In the first experiment we fix the parameter s at six for both the DCF and the DBF, and configure both the DCF and the DBF with the space optimized parameter settings when varying the size of a dynamic set from zero to 64,512.. The second experiment evaluate the influence of the parameter s. We fix the set cardinality at 46,080 and vary the value of s to examine how it affects the operation performance 27 Computer & Internet Architecture Lab CSIE, National Cheng Kung University

  28. PERFORMANCE - Insert 28 Computer & Internet Architecture Lab CSIE, National Cheng Kung University

  29. PERFORMANCE - Query 29 Computer & Internet Architecture Lab CSIE, National Cheng Kung University

  30. PERFORMANCE - Delete 30 Computer & Internet Architecture Lab CSIE, National Cheng Kung University

  31. PERFORMANCE - Compact 31 Computer & Internet Architecture Lab CSIE, National Cheng Kung University

Related


More Related Content