Hashing in Randomized Algorithms - Overview and Applications

1 / 24

Embed Share

Explore the concept of hashing in randomized algorithms, understand its problem definition, solutions, collision handling, and why hashing works effectively in practice. Learn about the efficiency, guarantees, and complexities associated with hashing in maintaining data structures.

blanchette_a Follow

Uploaded on Apr 12, 2025 | 3 Views

Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

Download Presentation

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript

Randomized Algorithms CS648 Lecture 11 Hashing - I 1

Problem Definition ? = 1,2, ,? called universe ? ?and ? = |?| ? ? Examples: ? = 1018, ? = 103 Aim Maintain a data structure for storing ? to support the search query : Does? ?? for any given ? ?.

Solutions Solutions with worst case guarantees Solution for static? : Array storing ? in sorted order Solution for dynamic? : Height Balanced Search trees (AVL trees, Red-Black trees, ) Time per operation: O(log ?), Space: O(?) Alternative: Time per operation: O(1), Space: O(?) Solutions used in practice with no worst case guarantees Hashing.

Hashing Answering a Query: Does? ?? 1. ? ?(?); 2. Search the list stored at ?[?]. Hash table: Elements of ? ?: an array of size ?. Hash function ? : ? [?] ? 0 1 Properties of ? : ? ? computable in O(1) time. Space required by ?: O(1). ? ? How many bits needed to encode ? ?

Collision Definition: Two elements ?,? ?are said to collide under hash function ?if ? ? = ? ? ? 0 1 Worst case time complexity of searching an item ? : No. of elements in ? colliding with ?. A Discouraging fact: No hash function can be found which is good for all ?. Proof: At least ?/? elements from ? are mapped to a single index in ?. ? ?

Collision Definition: Two elements ?,? ?are said to collide under hash function ?if ? ? = ? ? ? 0 1 Worst case time complexity of searching an item ? : No. of elements in ? colliding with ?. A Discouraging fact: No hash function can be found which is good for all ?. Proof: At least ?/? elements from ? are mapped to a single index in ?. ?/? ? ?

Hashing A very popular heuristic since 1950 s Achieves O(1) search time in practice Worst case guarantee on search time: O(?) Question: Can we have a hashing ensuring O(1) worst case guarantee on search time. O(?) space. Expected O(?) preprocessing time. The following result gave an answer in affirmative Michael Fredman, Janos Komlos, Endre Szemeredy. Storing a Sparse Table with O(1) Worst Case Access Time. Journal of the ACM (Volume 31, Issue 3), 1984.

WHY DOES HASHING WORK SO WELL IN PRACTICE ?

Why does hashing work so well in Practice ? Question: What is the simplest hash function ? : ? [?] ? Answer: ? ? = ? ??? ? Hashing works so well in practice because the set ? is usually a uniformly random subset of ?. Let us give a theoretical reasoning for this fact.

Why does hashing work so well in Practice ? 1 2 Let ?1,?2, ,?? denote ? elements selected randomly uniformly from ? to form ?. Question: What is expected number of elements colliding with?1? Answer: Let ?1takes value ?. P(?? collides with ?1) = ?? ? ? ? How many possible values can ?? take ? How many possible values can collide with ? ? ? + ? ? + 2? ? 1 ? + 3? m

Why does hashing work so well in Practice ? 1 2 Let ?1,?2, ,?? denote ? elements selected randomly uniformly from ? to form ?. Question: What is expected number of elements colliding with?1? Answer: Let ?1takes value ?. P(?? collides with ?1) = ? ? ? Values which may collide with ? under the hash function ? ? = ? ??? ? ? ? ? 1 Expected number of elements of ? colliding with ?1 = ? + ? ? + 2? ? ? = ? 1(? 1) = ? 1 for ? = ?(?) ? + 3? m

Why does hashing work so well in Practice ? Conclusion ? ? = ? ??? ?works so well because for a uniformly random subset of ?, the expected number of collision at an index of ? is O(1). 1. 2. It is easy to fool this hash function such that it achieves O(s) search time. (do it as a simple exercise). This makes us think: How can we achieve worst case O(1) search time for a given set ?.

HOW TO ACHIEVE WORST CASE O(1) SEARCH TIME

Key idea to achieve worst case O(1) search time Observation: Of course, no single hash function is good for every possible ?. But we may strive for a hash function which is good for a given ?. A promising direction: Find out a set of hash functions Hsuch that For any given ?, many of them are good. The notion of goodness is captured formally by Universal hash family in the following slide. Select a function randomly from H and try for ?.

UNIVERSAL HASH FAMILY

Universal Hash Family Definition: A collection ? of hash-functions is said to be universal if there exists a constant ? such that for any ?,? ?, ? ? ?? ?? ? ? = ? ? Fact: Set of all functions from ? to [?] is a universal hash family (do it as homework). Question: Can we use the set of all functions as universal hash family in real life ? Answer: No. There are ??possible functions. Every pair of them must differ in at least one bit. At least one of them will require ?log? bits to encode. So the space occupied by a randomly chosen hash function is too large . Question: Does there exist a Universal hash family whose hash functions have a compact encoding?

Universal Hash Family Definition: A collection ? of hash-functions is said to be universal if there exists a constant ? such that for any ?,? ?, ? ? ?? ?? ? ? = ? ? There indeed exist many c-Universal hash families with compact hash function Example: Let ??: ? [?] defined as ??? = ?? ??? ? ??? ? ? = ?? ? ? ? ?} is ?-universal. This looks complicated. In the next class we shall show that it is very natural and intuitive. For today s lecture, you don t need it

STATIC HASHING WORST CASE O(1) SEARCH TIME

The Journey One Milestone in Our Journey: A perfect hash function using hash table of size O(?2) Tools Needed: Universal Hash Family where ? is a small constant Elementary Probability

Perfect hashing using O(??) space Let ? be Universal Hash Family. Let ? : the number of collisions for ? when ? ?? ? Question: What is ?[?] ? ??,?= ? if ? ? = ?(?) otherwise ? ? = ??,? ?<? ??? ?,? ? ? ? = ?[??,?] ?<? ??? ?,? ? = ?[??,?= ?] ?<? ??? ?,? ? ? ? ?<? ??? ?,? ? =? ? ?(? ?) ?

Perfect hashing using O(??) space Let ? be Universal Hash Family. Let ? : the number of collisions for ? when ? ?? ? ? ? ?(? ?) ? Lemma1: ?[?] = Question: How large should ? be to achieve no collision ? Question: How large should ? be to achieve ? ? =? ? ? Answer: Pick ? = ???.

Perfect hashing using O(??) space Let ? be Universal Hash Family. Let ? : the number of collisions for ? when ? ?? ? ? ? ?(? ?) ? Lemma1: ?[?] = Observation: ? ? ? ?when ? = ???. Question: What is the probability of no collision when ? = ???? Answer: No collision ? = ? P(No collision ) = P(? = ?) = ? P(? ?) ? ? ? =? ? Use Markov s Inequality to bound it.

Perfect hashing using O(??) space Let ? be Universal Hash Family. Lemma2:For ? = ???, there will be no collision with probability at least 1 2. Algorithm1: Perfect hashing for ? Repeat 1. Pick ? ?? ; 2. ? the number of collisions for ? under ?. Until ? = ?. Theorem: A perfect hash function can be computed for ? in expected O(??) time. Corollary: A hash table occupying O(??) space and worst case O(?) search time.

Hashing with O(?) space and O(1) worst case search time We have completed almost 90% of our journey. To achieve the goal of O(?) space and worst case O(?) search time, here is the sketch (the details will be given in the beginning of the next class) Use the same hashing scheme as used in Algorithm1 except that use ? = O(?). Of course, there will be collisions. Use an additional level of hash tables to take care of collisions. In the next class: We shall complete our algorithm for hashing with O(?) space and O(1) worst case search time We shall present a very natural way to design various Universal Hash Families.

Hashing in Randomized Algorithms - Overview and Applications

Download Presentation

Presentation Transcript

Related

More Related Content