Pseudorandom Graphs in Data Structures Overview

Pseudorandom Graphs in Data Structures Overview
Slide Note
Embed
Share

The concept of pseudorandom graphs in data structures, this content delves into the common assumption in DS about free access to a completely random function. It discusses the need for explicit and efficient hash functions and highlights the main results and techniques involved in analyzing pseudorandom graphs to save on random bits needed to sample for solutions like Cuckoo Hashing and The Power of Two Choices.

  • Data Structures
  • Pseudorandom Graphs
  • Hash Functions
  • Cuckoo Hashing
  • Efficiency

Uploaded on Mar 07, 2025 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. Pseudorandom Graphs in Data Structures Omer Reingold Ron Rothblum Udi Wieder MSR-SVC Weizmann MSR-SVC

  2. (Pseudo-)Randomness in Data Structures Common assumption in DS: free access to a completely random function. Typically assume that random function requires no space to store and takes unit time to evaluate. Large body of work dedicated for removing this assumption by constructing explicit and efficient hash functions.

  3. Save on random bits needed to sample This Work Main result: small explicit and efficient family of hash functions suffices for two solutions to the dictionary problem: Cuckoo Hashing The Power of Two Choices Using [MRRR14] Use variant of [CRSW11] construction: Seed length: (log? loglog?) Evaluation time: ( loglog?2) Main Technique: Analyzing pseudorandom graphs.

  4. Prior Work Hash Functions for 2-choice and Cuckoo Hashing Seed Length ?(log2?) Evaluation Time ?(log?) log?-wise independence

  5. Prior Work Hash Functions for 2-choice and Cuckoo Hashing Seed Length ?(log2?) Evaluation Time ?(log?) log?-wise independence ?(log?) ?(1) Best possible ( dream )

  6. Prior Work Hash Functions for 2-choice and Cuckoo Hashing Seed Length ?(log2?) Evaluation Time ?(log?) log?-wise independence ?(??) ?(1) [Seigel04] Hash Function ?(log?) ?(1) Best possible ( dream )

  7. Prior Work Hash Functions for 2-choice and Cuckoo Hashing Seed Length ?(log2?) Evaluation Time ?(log?) log?-wise independence ?(??) ?(1) [Seigel04] Hash Function ?( log log n2) ?(log? loglog?) This Work ?(log?) ?(1) Best possible ( dream )

  8. Cuckoo Hashing [PR04] A data structure for representing a set ? of ? items taken from a universe ?. Supports inserts and lookup. Create a table ? of size ? = ?(?) Sample functions ,?:? [?]. Invariant: item ? ? resides either in ?[ (?)] or ?[?(?)]

  9. Cuckoo Hashing [PR04] If invariant is maintained then: Lookup: ? 1 worst-case. Need to search only 2 locations When is the invariant maintained?

  10. The Cuckoo Graph Set ? ? containing ? keys ,? ? {1, ,?} Every node of the graph can store 2 items S is successfully stored if Every connected subgraph of size k has less than 2k edges Insertion time is bounded by size of the component. Space is linear, more wasteful than standard cuckoo hashing Edge ( (?),?(?)) for every ? ?

  11. Main Technical Result Lemma: If ,? ?????? then for every set ? of ? items whp every connected sub-graph of ? ,?(?) is small and sparse. ?(log?) and ?(1) in expectation. ? < 2 |?|

  12. The power of 2-choices Throw ? balls sequentially into ? bins by: Sampling two bins at random Place the ball in the less loaded of the two. Theorem [ABKU]: The max. load is loglog? w.h.p. Hashing: for ball ?, sample bins as 1? , 2? . Theorem: If induced graph has small and sparse components then load is ?(loglog?)

  13. The [CRSW] Hash Function Idea: gradually increasing independence. log? bits (log?)/2 bits (log?)/4 bits (log?)/2? bits ? 1(?) 2(?) ?(?) Each ? is 2?-wise (almost) independent. ?(loglog?)functions ? log? loglog? seed. [MRRR14]: ?( loglog?2) evaluation time.

  14. The [CRSW] Hash Function [CRSW]: Hash ? items into ? = ?(?) bins whp each bin has ?(log?) balls. W.h.p. More generally, every prefix of ? bits of the output, balances into 2?bins. log? bits (log?)/2 bits (log?)/4 bits (log?)/2? bits ? 1(?) 2(?) ?(?)

  15. High Level Idea Consider some connected graph (?,?) 1. Show that if |?| log? or |?| > 2|?| then whp (?,?) does not appear in ? ,?? . 2. Take union bound over all such graphs

  16. If we had log n-wise independence Consider some connected graph (?,?) 1. Show that if |?| log? or |?| > 2|?| then whp (?,?) does not appear in ? ,?? . Every specific mapping of (V,E) into the cuckoo graph is unlikely.

  17. If we had high independence ,? need to be correct for all edges. All edges are independent (?) ? ?(?)

  18. Main idea Look at blocks of size ? The prefix of ,? determines assignment to blocks. The suffix determines the assignment within a block. Prefix and suffix are independent! ? vertices ? vertices ? vertices (log?)/2 bits (log?)/4 bits (log?)/2? bits ? vertices 1(?) 2(?) ?(?)

  19. Proof Sketch How to bound the prob. of a mapping of ?,? into ? ,?(?)? View the mapping into blocks as a load balancing instance by the prefix of the hash function. Implies the degree of each block is bounded. ? vertices ? vertices Under this assumption, all mappings inside the blocks are independent by the suffix of the functions. ? vertices

  20. Concluding Remarks ? wise independence is not the catch-all notion for explicit hash functions. Load balancing is a strong property of functions that could be used in creative ways. Can we get rid of the loglog? factor in the seed length?

More Related Content