Lower bounds for approximate membership dynamic data structures
Information theory plays a powerful role in proving lower bounds for data structures. This talk discusses static and dynamic data structures, including the approximate set membership problem and Bloom filters, and explores how these structures represent subsets of a large universe while saving space.
Lower bounds for approximate membership dynamic data structures. Shachar Lovett (IAS), Ely Porat (Bar-Ilan University). Synergies in Lower Bounds, June 2011.
Information theoretic lower bounds. Information theory is a powerful tool for proving lower bounds, e.g. for data structures. We study the size of the data structure, with unlimited computation: only space is measured. Static d.s.: pure information theory. Dynamic d.s.: a communication game.
Talk overview: the approximate set membership problem; Bloom filters (a simple near-optimal solution); lower bounds for the static case; new dynamic lower bounds.
Approximate set membership. Large universe U; represent a subset S ⊆ U. Query: is x ∈ S? A data structure representing S approximately: if x ∈ S, answer YES always; if x ∉ S, answer NO with high probability. Why approximately? To save space.
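As a reference point, this guarantee can be written as a small interface contract. Below is a sketch with an exact (non-approximate) baseline; the class name and storage choice are illustrative, not from the talk:

```python
class ExactMembership:
    """Trivial exact baseline for the membership contract (eps = 0).

    Any approximate structure must satisfy the same interface:
    query(x) is True for every x in S (no false negatives), and for
    x outside S it may wrongly return True with probability <= eps.
    Storing S exactly costs about |S| * log2|U| bits; the point of
    allowing eps > 0 is dropping toward |S| * log2(1/eps) bits.
    """

    def __init__(self, items):
        self._items = set(items)  # exact storage: no space savings

    def query(self, x):
        return x in self._items
```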
Applications. Storage (or communication) is costly, but a small false-positive error can be tolerated. Original applications (1970s): dictionaries, databases (Bloom filters). Nowadays: mainly network applications.
Next: Bloom filters (a simple near-optimal solution).
Bloom filters. S = {x1, x2, ..., xn}; hash function h: U → {1, ..., m}; a bit array of length m, initially all zeros: 0 0 0 0 0 0 0 0 0 0
Insert x1: h(x1) = 4, set bit 4: 0 0 0 1 0 0 0 0 0 0
Insert x2: h(x2) = 1, set bit 1: 1 0 0 1 0 0 0 0 0 0
Insert x3: h(x3) = 4, bit 4 is already set: 1 0 0 1 0 0 0 0 0 0
Query: is y ∈ S? Compute h(y) = 3; bit 3 is 0, so answer NO.
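The insertion and query steps above can be sketched in code. This is a minimal simulation of the single-hash filter drawn on the slides (one random hash, one bit array); the class name and the dictionary-based random hash are illustrative assumptions:

```python
import random

class SingleHashFilter:
    """Bit array of length m with one random hash h: U -> {0, ..., m-1}."""

    def __init__(self, m, seed=0):
        self.m = m
        self.bits = [0] * m
        self._rng = random.Random(seed)
        self._h = {}  # lazily sampled random function, stands in for a hash

    def _hash(self, x):
        if x not in self._h:
            self._h[x] = self._rng.randrange(self.m)
        return self._h[x]

    def insert(self, x):
        self.bits[self._hash(x)] = 1

    def query(self, y):
        # YES iff bit h(y) is set: members always pass; a non-member
        # passes only if h(y) collides with some inserted element's bit.
        return self.bits[self._hash(y)] == 1
```

With m = 10 and three inserts, at most three bits end up set, matching the arrays drawn above.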
Bloom filters: analysis. S = {x1, ..., xn} hashed into a bit array of length m. Query: y ∈ S? If y ∈ S: returns YES always. If y ∉ S: returns NO with probability (#0's)/m. With m ≈ 1.44·n, about half the bits remain 0, so the error is ≈ 1/2. Repetition with log₂(1/ε) independent arrays drives the error down to ε, using m ≈ 1.44·n·log₂(1/ε) bits in total.
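A rough simulation of the repetition argument, under the assumption that each repetition uses its own random hash into a fresh array of about 1.44·n bits; the false-positive figure below is a Monte Carlo estimate, not a proof:

```python
import random

def build_filter(items, m, rng):
    """One repetition: a bit array of length m with one random hash."""
    h = {}
    bits = [0] * m

    def hash_of(x):
        if x not in h:
            h[x] = rng.randrange(m)
        return h[x]

    for x in items:
        bits[hash_of(x)] = 1
    return bits, hash_of

def make_repeated_filter(items, m, t, seed=0):
    rng = random.Random(seed)
    reps = [build_filter(items, m, rng) for _ in range(t)]

    def query(y):
        # YES iff every repetition has the bit at its own hash of y set.
        return all(bits[h(y)] == 1 for bits, h in reps)
    return query

n, t = 1000, 3                 # t = log2(1/eps) repetitions, eps = 1/8
m = int(1.44 * n)              # ~half the bits stay 0 in each array
members = [f"x{i}" for i in range(n)]
query = make_repeated_filter(members, m, t)

# Members always answer YES (no false negatives).
no_false_negatives = all(query(x) for x in members)

# A non-member answers YES only if it passes all t repetitions,
# so its false-positive probability is roughly (1/2)^t = 1/8.
trials = 10_000
fp_rate = sum(query(f"y{i}") for i in range(trials)) / trials
```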
Known bounds. Upper bounds (algorithms): Bloom filter: m ≈ 1.44·n·log₂(1/ε). Improvements [Porat-Matthias 03, Arbitman-Naor-Segev 10]: m = n·log₂(1/ε) + O(n). Lower bound (information theoretic): m ≥ n·log₂(1/ε); this can be matched by static data structures [Charles-Chellapilla 08, Dietzfelbinger-Pagh 08, Porat 08]. This work: dynamic data structures.
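The gap between these bounds is easy to tabulate per stored element; a small sketch (the constant 1.44 is log₂e, the classic Bloom filter overhead factor):

```python
import math

def bloom_bits_per_element(eps):
    # Space of an optimally tuned Bloom filter, per stored element.
    return math.log2(math.e) * math.log2(1 / eps)

def lower_bound_bits_per_element(eps):
    # Information-theoretic lower bound, per stored element.
    return math.log2(1 / eps)

for eps in (1e-2, 1e-4, 1e-6):
    bloom = bloom_bits_per_element(eps)
    opt = lower_bound_bits_per_element(eps)
    print(f"eps={eps:g}: Bloom {bloom:5.1f} bits vs optimal {opt:5.1f} bits")
```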
Next: lower bounds for the static case.
Static lower bounds. Static setting: insert, then query. Insert x1, ..., xn into m bits; then query y. By Yao's minimax principle, it suffices to prove a lower bound for a deterministic data structure on randomized inputs.
Static lower bounds. A deterministic data structure is a compression: it maps all sets to a small family of accept sets. Input: a random set S = {x1, ..., xn}. Accept set: A(S) = {y : Query(y, Insert(S)) = true}. Properties: small memory: #{A(S)} ≤ 2^m; no false negatives: S ⊆ A(S); few false positives: E[|A(S)|] ≤ ε·|U|.
Static lower bounds. A set S ⊆ U with |S| = n is represented by A(S) ⊇ S with E[|A(S)|] ≤ ε·|U|. Goal: show that the number of distinct accept sets #{A(S)} is large.
Static lower bounds. Assume |A(S)| = ε·|U| for every S. If A(S) = A(S') then both S and S' are contained in A(S), so each accept set A serves at most C(ε|U|, n) sets. Hence #{A(S)} ≥ C(|U|, n) / C(ε|U|, n) ≈ (1/ε)^n, giving m ≥ n·log₂(1/ε). The general case follows by convexity.
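The counting step can be checked numerically. A sketch with illustrative parameters (|U| = 2^20, n = 100, ε = 2^-8; all my choices, not from the talk), using exact binomials via `math.comb`:

```python
import math

def counting_bound_bits(universe, n, eps):
    """log2( C(|U|, n) / C(eps|U|, n) ): minimum number of memory bits
    when every accept set has size exactly eps * |U|."""
    small = int(eps * universe)
    return math.log2(math.comb(universe, n)) - math.log2(math.comb(small, n))

U, n, eps = 2**20, 100, 2**-8
bits = counting_bound_bits(U, n, eps)
target = n * math.log2(1 / eps)  # n * log2(1/eps) = 800 here
```

For these parameters the exact count exceeds n·log₂(1/ε) only by a small lower-order term, matching the ≈ (1/ε)^n approximation above.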
Next: new dynamic lower bounds.
Dynamic lower bounds. Basic dynamic setting: two inserts and a query. Break the input into chunks of k and n−k elements: first insert x1, ..., xk (m bits of memory), then insert xk+1, ..., xn (m bits of memory), then query y.
Dynamic lower bounds. Accepting sets: A(x1, ..., xk) after the first chunk, and A(A(x1, ..., xk), xk+1, ..., xn) after the second, with E[|A(A(x1, ..., xk), xk+1, ..., xn)|] ≤ ε·|U|. General approach: analyze the size of the accepting sets. The sets A(x1, ..., xk) can't be too small (covering); the sets A(A(x1, ..., xk), xk+1, ..., xn) can't be too large (error). These alone yield only the trivial lower bound again.
Dynamic lower bounds. Method of typical inputs. On a typical input: A(x1, ..., xk) is not too small, and A(A(x1, ..., xk), xk+1, ..., xn) is not too large. Inputs are uncorrelated with the data structure: |A(x1, ..., xk)| ≈ ε·|U|, and |{xk+1, ..., xn} ∩ A(x1, ..., xk)| ≈ ε·(n−k). This yields an improved lower bound (note: "typical" can mean as little as 1% of inputs).
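The "uncorrelated" claim can be sanity-checked by simulation. A sketch with made-up parameters (|U| = 100000, ε = 0.01, k = 500, n = 1000), where the accept set after the first chunk is modeled as a uniformly random ε-fraction of U rather than the real data structure:

```python
import random

rng = random.Random(0)
U, eps = 100_000, 0.01
k, n = 500, 1000

# Modeling assumption: after inserting x1..xk, the accept set
# A(x1,...,xk) is a uniformly random eps-fraction of the universe.
accept = set(rng.sample(range(U), int(eps * U)))

# How many of the n-k later, independent inserts already fall in it?
trials = 2000
total = 0
for _ in range(trials):
    later = rng.sample(range(U), n - k)
    total += sum(1 for x in later if x in accept)
avg_overlap = total / trials
expected = eps * (n - k)  # the typical-input prediction, 5.0 here
```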
Dynamic lower bounds. Combining the covering and error constraints yields a functional inequality in m, k, n, and ε. Free parameter: k, how to break the input. The optimal choice of k gives m ≥ n·log₂(1/ε) + Ω(n / log(1/ε)). Extension: breaking the input into more parts doesn't seem to help much.
Summary. Approximate membership problem. Static algorithms match the static information-theoretic lower bound m ≥ n·log₂(1/ε). This work: a new dynamic information-theoretic lower bound, n·log₂(1/ε) + Ω(n / log(1/ε)) ≤ m, against the upper bound n·log₂(1/ε) + O(n). THANK YOU!