Trade-offs in Hash Coding: Analyzing Computational Factors
Hash coding, Trade-offs, Computational analysis
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
Space/Time Trade-offs in Hash Coding with Allowable Errors Burton H Bloom Communications of the ACM, Vol. 13, Issue. 7, pp. 422-426, 1970 Presenter: Yen-Yu Chen Date: Aug.27, 2024
Abstract(1/3) In this paper trade-offs among certain computational factors in hash coding are analyzed. The paradigm problem considered is that of testing a series of messages one-by-one for membership in a given set of messages. Two new hash-coding methods are examined and compared with a particular conventional hash-coding method. The computational factors considered are the size of the hash area (space), the time required to identify a message as a nonmember of the given set (reject time), and an allowable error frequency. 2
Abstract(2/3) The new methods are intended to reduce the amount of space required to contain the hash-coded information from that associated with conventional methods. The reduction in space is accomplished by exploiting the possibility that a small fraction of errors of commission may be tolerable in some applications, in particular, applications in which a large amount of data is involved and a core resident hash area is consequently not feasible using conventional methods. 3
Abstract(3/3) In such applications, it is envisaged that overall performance could be improved by using a smaller core resident hash area in conjunction with the new methods and, when necessary, by using some secondary and perhaps time-consuming test to catch the small fraction of errors associated with the new methods. An example is discussed which illustrates possible areas of application for the new methods. Analysis of the paradigm problem demonstrates that allowing a small number of test messages to be falsely identified as members of the given set will permit a much smaller hash area to be used without increasing reject time. 4
Problem set 8 8 9 7 algorithm 5 3 negative positive 5
Conventional Hash-Coding Method set ?message PRNG function: ?() ?bit 8 9 7 5 3 > ? 0 1 2 3 4 5 6 7 8 9 0 0 0 0 0 0 0 0 0 0 ? + 1bit 6
Conventional Hash-Coding Method set ?message add 7 1. ?bit 2. ?(7) = 9 8 9 7 5 3 > ? 0 1 2 3 4 5 6 7 8 9 0 0 0 0 0 0 0 0 0 1 7 ? + 1bit 7
Conventional Hash-Coding Method set ?message 1. add 8 ?bit 2. ?(8) = 0 8 9 7 5 3 > ? 0 1 2 3 4 5 6 7 8 9 1 8 0 0 0 0 0 0 0 0 1 7 ? + 1bit 8
Conventional Hash-Coding Method set ?message 1. add 9 ?bit 2. ?(9) = 6 8 9 7 5 3 > ? 0 1 2 3 4 5 6 7 8 9 1 8 0 0 0 0 0 1 9 0 0 1 7 ? + 1bit 9
Conventional Hash-Coding Method set ?message 1. add 3 ?bit 2. ?(3) = 6 8 9 7 3. ?(6) = 0 5 4. ?(0) = 2 3 > ? 0 1 2 3 4 5 6 7 8 9 1 8 0 1 3 0 0 0 1 9 0 0 1 7 ? + 1bit 10
Conventional Hash-Coding Method set ?message 1. Add 5 ?bit 2. ? 5 = 9 8 3. ?(9) = 6 9 7 4. ? 6 = 0 5 5. ?(0) = 2 3 6. ?(2) = 4 > ? 0 1 2 3 4 5 6 7 8 9 1 8 0 1 3 0 1 5 0 1 9 0 0 1 7 ? + 1bit 11
Conventional Hash-Coding Method set ?message 1. Search 7 ?bit 2. ? 7 = 9 8 9 7 3. positive 5 3 > ? 0 1 2 3 4 5 6 7 8 9 1 8 0 1 3 0 1 5 0 1 9 0 0 1 7 ? + 1bit 12
Conventional Hash-Coding Method set ?message 1. Search 5 ?bit 2. ? 5 = 9 8 3. ?(9) = 6 9 7 4. ? 6 = 0 5. ?(0) = 2 5 3 6. ?(2) = 4 7. positive > ? 0 1 2 3 4 5 6 7 8 9 1 8 0 1 3 0 1 5 0 1 9 0 0 1 7 ? + 1bit 13
Conventional Hash-Coding Method set ?message 1. Search 6 ?bit 2. ? 6 = 0 8 3. ?(0) = 2 9 7 4. ?(2) = 4 5 5. 6. ?(4) = 1 negative 3 > ? 0 1 2 3 4 5 6 7 8 9 1 8 0 1 3 0 1 5 0 1 9 0 0 1 7 ? + 1bit 14
Method 1 set ?message PRNG function: ?() ?bit Encode Hash function: ?() 8 9 7 5 3 > ? 0 1 2 3 4 5 6 7 8 9 0 0 0 0 0 0 0 0 0 0 ?bit 15
Method 1 set ?message 1. Add 9 ?bit 2. ?(9) = 10010 8 9 7 ?(9) = 6 3. 5 3 > ? 0 1 2 3 4 5 6 7 8 9 0 0 0 0 0 0 1 0010 0 0 0 ?bit 16
Method 1 set ?message 1. Add 3 ?bit 2. ?(3) = 11011 8 9 7 ?(3) = 6 3. 5 ?(6) = 0 4. 3 > ? 0 1 2 3 4 5 6 7 8 9 1 1011 0 0 0 0 0 1 0010 0 0 0 ?bit 17
Method 1 set ?message 1. Add 5 ?bit 2. ?(5) = 11000 8 9 7 ?(5) = 9 3. 5 3 > ? 0 1 2 3 4 5 6 7 8 9 1 1011 0 0 0 0 0 1 0010 0 0 1 1000 ?bit 18
Method 1 false positive set ?message 1. Search 1 ?bit 2. ? 1 = 11011 = ?(3) 8 ?(1) = 9 3. 9 7 ?(9) = 6 4. 5 ?(6) = 0 5. 6. 3 positive > ? 0 1 2 3 4 5 6 7 8 9 1 1011 0 0 0 0 0 1 0010 0 0 1 1000 ?bit 19
Method 2 set ?message 1. Number of d Hash function ?bit 2. ?1(),?2(), ,??() 8 9 7 5 3 ? 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1bit 20
Method 2 set ?message 1. Add 3 ?bit 2. ?13 = 2 8 9 7 3. ?23 = 7 5 4. ?33 = 3 3 ? 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 1bit 21
Method 2 set ?message 1. Add 9 ?bit 2. ?19 = 6 8 9 7 3. ?29 = 14 5 4. ?39 = 12 3 ? 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 0 0 1 1 0 0 1 1 0 0 0 0 1 0 1 1bit 22
Method 2 set ?message 1. Search 3 ?bit 2. ?13 = 2 8 9 7 3. ?23 = 7 5 4. ?33 = 3 5. positive 3 ? 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 0 0 1 1 0 0 1 1 0 0 0 0 1 0 1 1bit 23
Method 2 set ?message 1. Search 4 ?bit 2. ?14 = 3 8 9 7 3. ?24 = 5, negative 5 4. ?34 = 7 3 ? 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 0 0 1 1 0 0 1 1 0 0 0 0 1 0 1 1bit 24
Method 2 false positive set ?message 1. Search 6 ?bit 2. ?16 = 14 8 9 7 3. ?26 = 3 5 4. ?36 = 2 5. positive 3 ? 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 0 0 1 1 0 0 1 1 0 0 0 0 1 0 1 1bit 25
Fraction of Error Space M: ?? ? =(?? ?) (?? ?) ? ?: ?? = Set: ? 26
Conventional Algorithm Factor Relationship Space M: ??, Array Size: ? bits cells ? + 1 bits each , Fraction of empty cell: ? ? =? ?+1 : cell Space Set 1 ? 3 ? 2 : Reject Time cell T = ? = ? ? + 1 ?+2 ? 1: ConventionalAlgorithm Space/Time trade-offs 27
Relationship ? + 2 3 ? + 2 ? 1 ? = ? log2? + 1 + log2 log2? log2 1 1 ? = ? log2? 1 log2 ? ? ? Normalized Space Measure ? = ? log2?, ? ? , ? ? 28
Example Space: 500,000 words Set: 50,000 words 30
Conditional Probability ? ?? ? Minimize ? = 1 1 1 1 ? ?? ? ? ? =? ?ln2 31