Lower bounds for approximate membership dynamic data structures
Information theory plays a powerful role in proving lower bounds for data structures. This talk discusses static and dynamic data structures, including the approximate set membership problem and Bloom filters, and explores how these structures represent subsets of a large universe while saving space.
Lower bounds for approximate membership dynamic data structures. Shachar Lovett (IAS), Ely Porat (Bar-Ilan University). Synergies in Lower Bounds, June 2011.
Information theoretic lower bounds. Information theory is a powerful tool for proving lower bounds, e.g. for data structures. We study the size of the data structure, with unlimited computation: only space is measured. Static d.s.: pure information theory. Dynamic d.s.: a communication game.
Talk overview: the approximate set membership problem; Bloom filters (a simple near-optimal solution); lower bounds for the static case; new dynamic lower bounds.
Approximate set membership. Large universe U; represent a subset S ⊆ U. Query: is x ∈ S? A data structure representing S approximately: if x ∈ S, answer YES always; if x ∉ S, answer NO with high probability. Why approximately? To save space.
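As a reference point, this guarantee can be written as a small interface contract. Below is a sketch with an exact (non-approximate) baseline; the class name and storage choice are illustrative, not from the talk:

```python
class ExactMembership:
    """Trivial exact baseline for the membership contract (eps = 0).

    Any approximate structure must satisfy the same interface:
    query(x) is True for every x in S (no false negatives), and for
    x outside S it may wrongly return True with probability <= eps.
    Storing S exactly costs about |S| * log2|U| bits; the point of
    allowing eps > 0 is dropping toward |S| * log2(1/eps) bits.
    """

    def __init__(self, items):
        self._items = set(items)  # exact storage: no space savings

    def query(self, x):
        return x in self._items
```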
Applications. Storage (or communication) is costly, but a small false-positive error can be tolerated. Original applications (1970s): dictionaries, databases (Bloom filters). Nowadays: mainly network applications.
Next: Bloom filters (a simple near-optimal solution).
Bloom filters. S = {x1, x2, ..., xn}; hash function h: U → {1, ..., m}; a bit array of length m, initially all zeros: 0 0 0 0 0 0 0 0 0 0
Insert x1: h(x1) = 4, set bit 4: 0 0 0 1 0 0 0 0 0 0
Insert x2: h(x2) = 1, set bit 1: 1 0 0 1 0 0 0 0 0 0
Insert x3: h(x3) = 4, bit 4 is already set: 1 0 0 1 0 0 0 0 0 0
Query: is y ∈ S? Compute h(y) = 3; bit 3 is 0, so answer NO.
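The insertion and query steps above can be sketched in code. This is a minimal simulation of the single-hash filter drawn on the slides (one random hash, one bit array); the class name and the dictionary-based random hash are illustrative assumptions:

```python
import random

class SingleHashFilter:
    """Bit array of length m with one random hash h: U -> {0, ..., m-1}."""

    def __init__(self, m, seed=0):
        self.m = m
        self.bits = [0] * m
        self._rng = random.Random(seed)
        self._h = {}  # lazily sampled random function, stands in for a hash

    def _hash(self, x):
        if x not in self._h:
            self._h[x] = self._rng.randrange(self.m)
        return self._h[x]

    def insert(self, x):
        self.bits[self._hash(x)] = 1

    def query(self, y):
        # YES iff bit h(y) is set: members always pass; a non-member
        # passes only if h(y) collides with some inserted element's bit.
        return self.bits[self._hash(y)] == 1
```

With m = 10 and three inserts, at most three bits end up set, matching the arrays drawn above.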
Bloom filters: analysis. S = {x1, ..., xn} hashed into a bit array of length m. Query: y ∈ S? If y ∈ S: returns YES always. If y ∉ S: returns NO with probability (#0's)/m. With m ≈ 1.44·n, about half the bits remain 0, so the error is ≈ 1/2. Repetition with log₂(1/ε) independent arrays drives the error down to ε, using m ≈ 1.44·n·log₂(1/ε) bits in total.
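A rough simulation of the repetition argument, under the assumption that each repetition uses its own random hash into a fresh array of about 1.44·n bits; the false-positive figure below is a Monte Carlo estimate, not a proof:

```python
import random

def build_filter(items, m, rng):
    """One repetition: a bit array of length m with one random hash."""
    h = {}
    bits = [0] * m

    def hash_of(x):
        if x not in h:
            h[x] = rng.randrange(m)
        return h[x]

    for x in items:
        bits[hash_of(x)] = 1
    return bits, hash_of

def make_repeated_filter(items, m, t, seed=0):
    rng = random.Random(seed)
    reps = [build_filter(items, m, rng) for _ in range(t)]

    def query(y):
        # YES iff every repetition has the bit at its own hash of y set.
        return all(bits[h(y)] == 1 for bits, h in reps)
    return query

n, t = 1000, 3                 # t = log2(1/eps) repetitions, eps = 1/8
m = int(1.44 * n)              # ~half the bits stay 0 in each array
members = [f"x{i}" for i in range(n)]
query = make_repeated_filter(members, m, t)

# Members always answer YES (no false negatives).
no_false_negatives = all(query(x) for x in members)

# A non-member answers YES only if it passes all t repetitions,
# so its false-positive probability is roughly (1/2)^t = 1/8.
trials = 10_000
fp_rate = sum(query(f"y{i}") for i in range(trials)) / trials
```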
Known bounds. Upper bounds (algorithms): Bloom filter: m ≈ 1.44·n·log₂(1/ε). Improvements [Porat-Matthias 03, Arbitman-Naor-Segev 10]: m = n·log₂(1/ε) + O(n). Lower bound (information theoretic): m ≥ n·log₂(1/ε); this can be matched by static data structures [Charles-Chellapilla 08, Dietzfelbinger-Pagh 08, Porat 08]. This work: dynamic data structures.
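The gap between these bounds is easy to tabulate per stored element; a small sketch (the constant 1.44 is log₂e, the classic Bloom filter overhead factor):

```python
import math

def bloom_bits_per_element(eps):
    # Space of an optimally tuned Bloom filter, per stored element.
    return math.log2(math.e) * math.log2(1 / eps)

def lower_bound_bits_per_element(eps):
    # Information-theoretic lower bound, per stored element.
    return math.log2(1 / eps)

for eps in (1e-2, 1e-4, 1e-6):
    bloom = bloom_bits_per_element(eps)
    opt = lower_bound_bits_per_element(eps)
    print(f"eps={eps:g}: Bloom {bloom:5.1f} bits vs optimal {opt:5.1f} bits")
```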
Next: lower bounds for the static case.
Static lower bounds. Static setting: insert, then query. Insert x1, ..., xn into m bits; then query y. By Yao's minimax principle, it suffices to prove a lower bound for a deterministic data structure on randomized inputs.
Static lower bounds. A deterministic data structure is a compression: it maps all sets to a small family of accept sets. Input: a random set S = {x1, ..., xn}. Accept set: A(S) = {y : Query(y, Insert(S)) = true}. Properties: small memory: #{A(S)} ≤ 2^m; no false negatives: S ⊆ A(S); few false positives: E[|A(S)|] ≤ ε·|U|.
Static lower bounds. A set S ⊆ U with |S| = n is represented by A(S) ⊇ S with E[|A(S)|] ≤ ε·|U|. Goal: show that the number of distinct accept sets #{A(S)} is large.
Static lower bounds. Assume |A(S)| = ε·|U| for every S. If A(S) = A(S') then both S and S' are contained in A(S), so each accept set A serves at most C(ε|U|, n) sets. Hence #{A(S)} ≥ C(|U|, n) / C(ε|U|, n) ≈ (1/ε)^n, giving m ≥ n·log₂(1/ε). The general case follows by convexity.
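The counting step can be checked numerically. A sketch with illustrative parameters (|U| = 2^20, n = 100, ε = 2^-8; all my choices, not from the talk), using exact binomials via `math.comb`:

```python
import math

def counting_bound_bits(universe, n, eps):
    """log2( C(|U|, n) / C(eps|U|, n) ): minimum number of memory bits
    when every accept set has size exactly eps * |U|."""
    small = int(eps * universe)
    return math.log2(math.comb(universe, n)) - math.log2(math.comb(small, n))

U, n, eps = 2**20, 100, 2**-8
bits = counting_bound_bits(U, n, eps)
target = n * math.log2(1 / eps)  # n * log2(1/eps) = 800 here
```

For these parameters the exact count exceeds n·log₂(1/ε) only by a small lower-order term, matching the ≈ (1/ε)^n approximation above.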
Next: new dynamic lower bounds.
Dynamic lower bounds. Basic dynamic setting: two inserts and a query. Break the input into chunks of k and n−k elements: first insert x1, ..., xk (m bits of memory), then insert xk+1, ..., xn (m bits of memory), then query y.
Dynamic lower bounds. Accepting sets: A(x1, ..., xk) after the first chunk, and A(A(x1, ..., xk), xk+1, ..., xn) after the second, with E[|A(A(x1, ..., xk), xk+1, ..., xn)|] ≤ ε·|U|. General approach: analyze the size of the accepting sets. The sets A(x1, ..., xk) can't be too small (covering); the sets A(A(x1, ..., xk), xk+1, ..., xn) can't be too large (error). These alone yield only the trivial lower bound again.
Dynamic lower bounds. Method of typical inputs. On a typical input: A(x1, ..., xk) is not too small, and A(A(x1, ..., xk), xk+1, ..., xn) is not too large. Inputs are uncorrelated with the data structure: |A(x1, ..., xk)| ≈ ε·|U|, and |{xk+1, ..., xn} ∩ A(x1, ..., xk)| ≈ ε·(n−k). This yields an improved lower bound (note: "typical" can mean as little as 1% of inputs).
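The "uncorrelated" claim can be sanity-checked by simulation. A sketch with made-up parameters (|U| = 100000, ε = 0.01, k = 500, n = 1000), where the accept set after the first chunk is modeled as a uniformly random ε-fraction of U rather than the real data structure:

```python
import random

rng = random.Random(0)
U, eps = 100_000, 0.01
k, n = 500, 1000

# Modeling assumption: after inserting x1..xk, the accept set
# A(x1,...,xk) is a uniformly random eps-fraction of the universe.
accept = set(rng.sample(range(U), int(eps * U)))

# How many of the n-k later, independent inserts already fall in it?
trials = 2000
total = 0
for _ in range(trials):
    later = rng.sample(range(U), n - k)
    total += sum(1 for x in later if x in accept)
avg_overlap = total / trials
expected = eps * (n - k)  # the typical-input prediction, 5.0 here
```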
Dynamic lower bounds. Combining the covering and error constraints yields a functional inequality in m, k, n, and ε. Free parameter: k, how to break the input. The optimal choice of k gives m ≥ n·log₂(1/ε) + Ω(n / log(1/ε)). Extension: breaking the input into more parts doesn't seem to help much.
Summary. Approximate membership problem. Static algorithms match the static information-theoretic lower bound m ≥ n·log₂(1/ε). This work: a new dynamic information-theoretic lower bound, n·log₂(1/ε) + Ω(n / log(1/ε)) ≤ m, against the upper bound n·log₂(1/ε) + O(n). THANK YOU!