Lower bounds for approximate membership dynamic data structures

An overview of information theory's role in proving lower bounds for data structures. The talk covers static and dynamic data structures, the approximate set membership problem, and Bloom filters, and explains how these structures represent subsets of a large universe while saving space.

  • Data Structures
  • Information Theory
  • Dynamic Data
  • Approximate Membership
  • Bloom Filters


Presentation Transcript


  1. Lower bounds for approximate membership dynamic data structures. Shachar Lovett (IAS) and Ely Porat (Bar-Ilan University). Synergies in lower bounds, June 2011.

  2. Information-theoretic lower bounds. Information theory is a powerful tool for proving lower bounds, e.g. for data structures. We study the size of the data structure, with no limit on access time. Static data structures: pure information theory. Dynamic data structures: a communication game.

  3. Talk overview: the approximate set membership problem; Bloom filters (a simple, near-optimal solution); lower bounds for the static case; new dynamic lower bounds.

  4. Talk overview: the approximate set membership problem; Bloom filters (a simple, near-optimal solution); lower bounds for the static case; new dynamic lower bounds.

  5. Approximate set membership. Large universe U; represent a subset S ⊆ U. Query: is x ∈ S? A data structure representing S approximately: if x ∈ S, answer YES always; if x ∉ S, answer NO with high probability. Why approximately? To save space.
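
Stated formally, as a minimal formalization of the guarantee (the error parameter ε below is the one used on the later slides; this slide itself only says "with high probability"):

    \[
    x \in S \;\Rightarrow\; \Pr[\text{answer YES on } x] = 1,
    \qquad
    y \notin S \;\Rightarrow\; \Pr[\text{answer YES on } y] \le \varepsilon,
    \]

where the probability is over the internal randomness of the data structure (e.g. its hash functions).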

  6. Applications. Storage (or communication) is costly, but a small false positive error can be tolerated. Original applications of Bloom filters (1970s): dictionaries, databases. Nowadays: mainly network applications.

  7. Talk overview: the approximate set membership problem; Bloom filters (a simple, near-optimal solution); lower bounds for the static case; new dynamic lower bounds.

  8. Bloom filters. S = {x1, x2, …, xn}. Hash function h: U → {1, …, m}. Bit array of length m: 0 0 0 0 0 0 0 0 0 0

  9. Bloom filters. S = {x1, x2, …, xn}. Hash function h: U → {1, …, m}. Insert x1: h(x1) = 4. Bit array of length m: 0 0 0 1 0 0 0 0 0 0

  10. Bloom filters. S = {x1, x2, …, xn}. Hash function h: U → {1, …, m}. Insert x2: h(x2) = 1. Bit array of length m: 1 0 0 1 0 0 0 0 0 0

  11. Bloom filters. S = {x1, x2, …, xn}. Hash function h: U → {1, …, m}. Insert x3: h(x3) = 4 (bit 4 is already set). Bit array of length m: 1 0 0 1 0 0 0 0 0 0

  12. Bloom filters. S = {x1, x2, …, xn}. Hash function h: U → {1, …, m}. Query: is y ∈ S? Bit array of length m: 1 0 0 1 0 0 0 0 0 0

  13. Bloom filters. S = {x1, x2, …, xn}. Hash function h: U → {1, …, m}. Query: is y ∈ S? h(y) = 3. Bit array of length m: 1 0 0 1 0 0 0 0 0 0

  14. Bloom filters. S = {x1, x2, …, xn}. Hash function h: U → {1, …, m}. Query: is y ∈ S? h(y) = 3 and bit 3 is 0, so answer NO. Bit array of length m: 1 0 0 1 0 0 0 0 0 0

  15. Bloom filters: analysis. S = {x1, x2, …, xn} hashed into a bit array of length m (e.g. 1 0 0 1 0 0 0 0 0 0). Query: is y ∈ S? If y ∈ S: the filter returns YES always. If y ∉ S: it returns NO with probability equal to the fraction of 0's, i.e. (#0's)/m. For error ε = 1/2 this needs m ≈ 1.44·n; for error ε, use log2(1/ε) independent repetitions, giving m ≈ 1.44·n·log2(1/ε).
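
A minimal sketch in Python of the repetition-based filter described on this slide: each of roughly log2(1/ε) independent arrays has about 1.44·n bits and its own hash function, and a query answers YES only if every array has a 1 at the hashed position. The hash family (salted blake2b) and all names here are illustrative choices, not from the talk.

    import hashlib
    import math

    class RepetitionBloomFilter:
        """One hash function per array, log2(1/eps) independent arrays."""

        def __init__(self, n, eps):
            self.k = max(1, math.ceil(math.log2(1 / eps)))  # number of repetitions
            self.m = max(1, math.ceil(1.44 * n))            # bits per array
            self.arrays = [bytearray(self.m) for _ in range(self.k)]

        def _h(self, i, x):
            # i-th hash function: a position in {0, ..., m-1}
            d = hashlib.blake2b(repr(x).encode(), salt=i.to_bytes(8, "big")).digest()
            return int.from_bytes(d, "big") % self.m

        def insert(self, x):
            for i in range(self.k):
                self.arrays[i][self._h(i, x)] = 1

        def query(self, x):
            # No false negatives; each array wrongly holds a 1 with probability
            # about 1/2, so the false positive rate is roughly (1/2)**k = eps.
            return all(self.arrays[i][self._h(i, x)] for i in range(self.k))

    # Example: n = 1000 elements, 1% false positive rate.
    bf = RepetitionBloomFilter(n=1000, eps=0.01)
    for x in range(1000):
        bf.insert(x)
    print(bf.query(5), bf.query(123456))  # True, and usually False

A classic Bloom filter instead uses about log2(1/ε) hash functions into one shared array; the total space is the same ≈ 1.44·n·log2(1/ε) bits either way.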

  16. Known bounds. Upper bounds (algorithms): the Bloom filter uses m ≈ 1.44·n·log2(1/ε) bits; improved constructions [Porat-Matthias 03, Arbitman-Naor-Segev 10] use m = n·log2(1/ε) + O(n) bits. Lower bound (information-theoretic): m ≥ n·log2(1/ε), and it can be matched by static data structures [Charles-Chellapilla 08, Dietzfelbinger-Pagh 08, Porat 08]. This work: dynamic data structures.
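
To make the gap between these bounds concrete, a quick back-of-the-envelope computation; the parameters (n = 10^6, ε = 1%) and the constant inside the O(n) term are illustrative choices, not from the talk.

    import math

    n, eps = 10**6, 0.01
    entropy_lb = n * math.log2(1 / eps)  # information-theoretic lower bound, in bits
    bloom      = 1.44 * entropy_lb       # classic Bloom filter
    improved   = entropy_lb + 2 * n      # n*log2(1/eps) + O(n); the constant 2 is illustrative
    print(f"lower bound: {entropy_lb/8e6:.2f} MB, "
          f"Bloom: {bloom/8e6:.2f} MB, improved: {improved/8e6:.2f} MB")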

  17. Talk overview: the approximate set membership problem; Bloom filters (a simple, near-optimal solution); lower bounds for the static case; new dynamic lower bounds.

  18. Static lower bounds. Static setting: insert then query. Insert x1, …, xn into m bits of memory; then query y. Yao's min-max principle: it suffices to prove a lower bound for deterministic data structures on random inputs.
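
The direction of Yao's principle used here, spelled out (a standard averaging argument, not written on the slide): if a randomized data structure uses m bits and errs with probability at most ε on every input, then fixing the random coins that do best against a chosen input distribution μ gives a deterministic m-bit structure whose average error over μ is at most ε:

    \[
    \min_{r}\; \mathbb{E}_{(S,y)\sim\mu}\bigl[\mathrm{err}(D_r,S,y)\bigr]
    \;\le\;
    \mathbb{E}_{r}\,\mathbb{E}_{(S,y)\sim\mu}\bigl[\mathrm{err}(D_r,S,y)\bigr]
    \;\le\; \varepsilon.
    \]

So space lower bounds proved for deterministic structures with small average error on random inputs carry over to randomized structures with worst-case error ε.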

  19. Static lower bounds. Insert x1, …, xn into m bits; query y. A deterministic data structure acts as a compression: it maps all input sets into a small family of accept sets. Input: a random set S = {x1, …, xn}. Accept set: A(S) = { y ∈ U : Query(y, Insert(S)) = YES }; note that A(S) is determined by the m memory bits Insert(S). Properties: small memory: there are at most 2^m distinct accept sets; no false negatives: S ⊆ A(S); few false positives: E[|A(S)|] ≤ ε·|U| + n.

  20. Static lower bounds. Insert x1, …, xn into m bits; query y. A set S ⊆ U with |S| = n ≪ |U| is represented by its accept set A(S) ⊇ S, with E[|A(S)|] ≈ ε·|U|. Goal: show that the number of distinct accept sets #A(S) must be large, and hence that m must be large.

  21. Static lower bounds. Insert x1, …, xn into m bits; query y. Properties: S ⊆ A(S) and E[|A(S)|] ≤ ε·|U|. Assume for simplicity that |A(S)| = ε·|U| for every S. If A(S') = A(S), then S' ⊆ A(S), so #{S' : A(S') = A(S)} ≤ (ε·|U| choose n). Hence #A(S) ≥ (|U| choose n) / (ε·|U| choose n) ≈ (1/ε)^n, so 2^m ≥ (1/ε)^n and m ≥ n·log2(1/ε). The general case follows by convexity.
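
The counting step written out, as a sketch reconstructed from this slide under the simplifying assumption |A(S)| = ε·|U| for every S: each accept set A can serve at most (|A| choose n) = (ε|U| choose n) of the (|U| choose n) possible input sets, and there are at most 2^m memory states, hence at most 2^m distinct accept sets. Therefore

    \[
    2^m \;\ge\; \#\{A(S)\}
    \;\ge\; \frac{\binom{|U|}{n}}{\binom{\varepsilon|U|}{n}}
    \;\ge\; \left(\frac{|U|-n}{\varepsilon|U|}\right)^{\!n}
    \;\approx\; \left(\frac{1}{\varepsilon}\right)^{\!n}
    \qquad\Longrightarrow\qquad
    m \;\ge\; n\log_2(1/\varepsilon) - o(n),
    \]

where the last approximation uses n ≪ |U|.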

  22. Talk overview: the approximate set membership problem; Bloom filters (a simple, near-optimal solution); lower bounds for the static case; new dynamic lower bounds.

  23. Dynamic lower bounds. Basic dynamic setting: two inserts followed by a query. Break the input into chunks of sizes k and n−k. Insert x1, …, xk into m bits; then insert xk+1, …, xn into m bits; finally query y.

  24. Dynamic lower bounds. Insert x1, …, xk; then insert xk+1, …, xn; query y. Accepting sets: A(x1, …, xk) after the first chunk, and A(A(x1, …, xk), xk+1, …, xn) after the second, with {x1, …, xn} ⊆ A(A(x1, …, xk), xk+1, …, xn) and E[|A(A(x1, …, xk), xk+1, …, xn)|] ≤ ε·|U|. General approach: analyze the sizes of the accepting sets. The sets A(x1, …, xk) can't be too small (covering), and the sets A(A(x1, …, xk), xk+1, …, xn) can't be too large (error). On their own, these only yield the trivial lower bound again.

  25. Dynamic lower bounds. Insert x1, …, xk; then insert xk+1, …, xn; query y. Method of typical inputs. On a typical input: A(x1, …, xk) is not too small, and A(A(x1, …, xk), xk+1, …, xn) is not too large. Moreover, the inputs are uncorrelated with the data structure, so |A(x1, …, xk) ∩ {xk+1, …, xn}| ≈ (n−k)·|A(x1, …, xk)| / |U|. This yields an improved lower bound (note: "typical" can be as little as 1% of the inputs).
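
The "uncorrelated" estimate is just linearity of expectation; here is a sketch, assuming the second-chunk elements xk+1, …, xn are uniform in U and independent of the first-stage accept set:

    \[
    \mathbb{E}\bigl[\,\lvert A(x_1,\dots,x_k) \cap \{x_{k+1},\dots,x_n\}\rvert\,\bigr]
    \;=\; \sum_{i=k+1}^{n} \Pr\bigl[x_i \in A(x_1,\dots,x_k)\bigr]
    \;=\; (n-k)\,\frac{\lvert A(x_1,\dots,x_k)\rvert}{\lvert U\rvert}.
    \]

So on a typical input, about this many of the second-chunk elements already lie in the first-stage accept set; the slide uses this overlap estimate to go beyond the trivial covering/error bound.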

  26. Dynamic lower bounds. Insert x1, …, xk; then insert xk+1, …, xn; query y. Combining the covering bound, the error bound, and the overlap estimate yields a functional inequality relating m, n, k and ε. Free parameter: k, i.e. how to break the input. The optimal choice of k gives m ≥ n·log2(1/ε) + Ω(n / log(1/ε)). Extension: breaking the input into more than two parts doesn't seem to help much.

  27. Summary. The approximate membership problem. Static algorithms match the static information-theoretic lower bound of m ≥ n·log2(1/ε), using n·log2(1/ε) + O(n) bits. This work: a new dynamic information-theoretic lower bound, m ≥ n·log2(1/ε) + Ω(n / log(1/ε)). THANK YOU!
