Extensible Hashing in Database Systems
Explore the concept of extensible hashing in database systems through discussions on hash functions, bucket addresses, hash function research, and examples illustrating inserts, overflows, and deletions. Dive into better hash functions, expected keys per bucket, and practical scenarios to deepen your understanding.
Download Presentation
Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
CSCE-608 Database Systems Spring 2025 Instructor: Jianer Chen Office: PETR 428 Phone: 845-4259 Email: chen@cse.tamu.edu Notes 28: Extensible hashing
in tables (relations) lock table DDL language DDL complier database administrator concurrency control file logging & recovery manager transaction manager index/file manager buffer manager DML (query) language query execution engine database programmer DML complier main memory buffers secondary storage (disks) DBMS
Another Index Structure: Hash Tables buckets hash function h h(k) search key k A bucket is typically a disk block (probably with overflow blocks) h(k), 0 k b-1, gives an easy way to compute the bucket address (direct: address from h(k); indirect: h(k) is the index in a directory. 3
There are better hash functions There is deep research that shows how to have a good hash function, which always exists. Read Knuth Vol. 3 if you are interested in Good hash function: Expected number of keys/bucket is roughly the same for all buckets 4
Next: example to illustrate inserts, overflows, deletes h(key) 5
EXAMPLE (two records/bucket) INSERT: h(a) = 1 h(b) = 2 h(c) = 1 h(d) = 0 0 d 1 a c b 2 3 6
EXAMPLE (two records/bucket) INSERT: h(a) = 1 h(b) = 2 h(c) = 1 h(d) = 0 h(e) = 1 0 d 1 a c b e 2 3 7
EXAMPLE: deletion DELETE: 0 a 1 b c e d 2 f g 3 8
EXAMPLE: deletion DELETE: h(e) = 2 0 a 1 b c e d 2 f g 3 9
EXAMPLE: deletion DELETE: h(e) = 2 0 a 1 b c e d 2 f g 3 10
EXAMPLE: deletion DELETE: h(e) = 2 0 a 1 b c e d h(f) = 3 2 f g 3 11
EXAMPLE: deletion DELETE: h(e) = 2 0 a 1 b c e d h(f) = 3 2 f g 3 maybe move g up 12
EXAMPLE: deletion DELETE: h(e) = 2 0 a 1 b c e d h(f) = 3 2 fg 3 13
EXAMPLE: deletion DELETE: h(e) = 2 0 a 1 b c e d h(f) = 3 h(c) = 1 2 fg 3 14
EXAMPLE: deletion DELETE: h(e) = 2 0 a 1 b c e d h(f) = 3 d h(c) = 1 2 g f 3 15
EXAMPLE: deletion DELETE: h(e) = 2 0 a 1 b c e d h(f) = 3 d h(c) = 1 2 g f 3 16
EXAMPLE: deletion DELETE: h(e) = 2 0 a 1 b d h(f) = 3 h(c) = 1 2 g 3 17
Rule of thumb: Try to keep space utilization between 50% and 80% Utilization = total # keys that fit # keys used . 18
Rule of thumb: Try to keep space utilization between 50% and 80% Utilization = total # keys that fit # keys used . If < 50%, wasting space If > 80%, overflows significantly depend on how good hash function is & on # keys/bucket 19
But: We may not be able to predict the size of the relation The size of the relation may dynamically change 20
But: We may not be able to predict the size of the relation The size of the relation may dynamically change Thus, a fixed hash table may sometimes waste space and may sometimes cause much overflows It would be nice if the hash table can dynamically change. 21
But: We may not be able to predict the size of the relation The size of the relation may dynamically change Thus, a fixed hash table may sometimes waste space and may sometimes cause much overflows It would be nice if the hash table can dynamically change. Some critical issues: (1) the cost of the change should be minimized (2) the old hash values can still be used. 22
How do we cope with growth? Overflows and reorganizations Dynamic hashing Extensible Linear 23
How do we cope with growth? Overflows and reorganizations Dynamic hashing Extensible Linear 24
Extensible hashing: basic ideas (1) Fix a hash function h that can handle a very large hash table, but also use h for smaller hash tables (so the hash value can be used even when the hash table is changed) 26
Extensible hashing: basic ideas (1) Fix a hash function h that can handle a very large hash table, but also use h for smaller hash tables (so the hash value can be used even when the hash table is changed) h(x) 00110101 b is large, and h can handle hash tables of size up to 2b b 27
Extensible hashing: basic ideas (1) Fix a hash function h that can handle a very large hash table, but also use h for smaller hash tables (so the hash value can be used even when the hash table is changed) h(x) 00110101 b is large, and h can handle hash tables of size up to 2b b (2) When the set is small, use only part of the value h(x). 28
Extensible hashing: basic ideas (1) Fix a hash function h that can handle a very large hash table, but also use h for smaller hash tables (so the hash value can be used even when the hash table is changed) h(x) 00110101 b is large, and h can handle hash tables of size up to 2b b use i bits when the hash table is of size 2i i (2) When the set is small, use only part of the value h(x). 29
Extensible hashing: basic ideas (1) Fix a hash function h that can handle a very large hash table, but also use h for smaller hash tables (so the hash value can be used even when the hash table is changed) h(x) 00110101 b is large, and h can handle hash tables of size up to 2b b use i bits when the hash table is of size 2i i (2) When the set is small, use only part of the value h(x). (3) If local population is even sparser, we use even fewer bits of h(x). Adjacent elements in the hash table can be merged. 30
Extensible hashing: basic ideas (1) Fix a hash function h that can handle a very large hash table, but also use h for smaller hash tables (so the hash value can be used even when the hash table is changed) h(x) 00110101 b is large, and h can handle hash tables of size up to 2b b use i bits when the hash table is of size 2i i (2) When the set is small, use only part of the value h(x). (3) If local population is even sparser, we use even fewer bits of h(x). Adjacent elements in the hash table can be merged. (4) Use a directory. h(x)i to buckets 2i 31
Extensible Hashing: General framework # bits used by the buckets (block index) i # bits used by the directory (hash index) j1 00 00 00 01 j2 h i h(x) h(x)i x j2 i ... 11 11 i directory buckets 32
A more concrete example h(x) = a1 a2 a3 a4 b = 4, i = 3 2 0000 0010 0000 000 0001 0010 001 2 0111 0011 010 3 011 1000 1001 100 3 1010 101 110 2 1100 1110 111 1111 33
Constructing an Extensible Hashing for a Given Set S Input: A set S of elements, hash function h into b bits (b is large), 1. make S a group G0 with group index 0; \\ each group Gris associated with a unique string sr of r bits such that all \\ elements in Gr have their hash values with the first r bits equal to sr 2. WHILE there is a group Grthat cannot fit in a single block DO split Gr into two groups Gr+1and Gr+1 using the (r+1)st bit in the hash values of the elements; 3. FOR each group Gr DO put all elements in Gr into a block, with a block index r; 4. the largest block index is the hashing index i; i (1) (2) i 5. make a directory H[00 0 .. 11 1] of size 2i with H[k] pointing to the block for group Gr, if the first r bits of k is equal to the group string sr. 34
Constructing an Extensible Hashing for a Given Set S Input: A set S of elements, hash function h into b bits (b is large), 1. make S a group G0 with group index 0; \\ each group Gris associated with a unique string sr of r bits such that all \\ elements in Gr have their hash values with the first r bits equal to sr 2. WHILE there is a group Grthat cannot fit in a single block DO split Gr into two groups Gr+1and Gr+1 using the (r+1)st bit in the hash values of the elements; 3. FOR each group Gr DO put all elements in Gr into a block, with a block index r; 4. the largest block index is the hashing index i; i (1) (2) i 5. make a directory H[00 0 .. 11 1] of size 2i with H[k] pointing to the block for group Gr, if the first r bits of k is equal to the group string sr. 35
Constructing an Extensible Hashing for a Given Set S Input: A set S of elements, hash function h into b bits (b is large), 1. make S a group G0 with group index 0; \\ each group Gris associated with a unique string sr of r bits such that all \\ elements in Gr have their hash values with the first r bits equal to sr 2. WHILE there is a group Grthat cannot fit in a single block DO split Gr into two groups Gr+1and Gr+1 using the (r+1)st bit in the hash values of the elements; 3. FOR each group Gr DO put all elements in Gr into a block, with a block index r; 4. the largest block index is the hashing index i; i (1) (2) i 5. make a directory H[00 0 .. 11 1] of size 2i with H[k] pointing to the block for group Gr, if the first r bits of k is equal to the group string sr. 36
Constructing an Extensible Hashing for a Given Set S Input: A set S of elements, hash function h into b bits (b is large), 1. make S a group G0 with group index 0; \\ each group Gris associated with a unique string sr of r bits such that all \\ elements in Gr have their hash values with the first r bits equal to sr 2. WHILE there is a group Grthat cannot fit in a single block DO split Gr into two groups Gr+1and Gr+1 using the (r+1)st bit in the hash values of the elements; 3. FOR each group Gr DO put all elements in Gr into a block, with a block index r; 4. the largest block index is the hashing index i; i (1) (2) i 5. make a directory H[00 0 .. 11 1] of size 2i with H[k] pointing to the block for group Gr, if the first r bits of k is equal to the group string sr. 37
Constructing an Extensible Hashing for a Given Set S Input: A set S of elements, hash function h into b bits (b is large), 1. make S a group G0 with group index 0; \\ each group Gris associated with a unique string sr of r bits such that all \\ elements in Gr have their hash values with the first r bits equal to sr 2. WHILE there is a group Grthat cannot fit in a single block DO split Gr into two groups Gr+1and Gr+1 using the (r+1)st bit in the hash values of the elements; 3. FOR each group Gr DO put all elements in Gr into a block, with a block index r; 4. the largest block index is the hashing index i; i (1) (2) i 5. make a directory H[00 0 .. 11 1] of size 2i with H[k] pointing to the block for group Gr, if the first r bits of k is equal to the group string sr. 38
Constructing an Extensible Hashing for a Given Set S Input: A set S of elements, hash function h into b bits (b is large), 1. make S a group G0 with group index 0; \\ each group Gris associated with a unique string sr of r bits such that all \\ elements in Gr have their hash values with the first r bits equal to sr 2. WHILE there is a group Grthat cannot fit in a single block DO split Gr into two groups Gr+1and Gr+1 using the (r+1)st bit in the hash values of the elements; 3. FOR each group Gr DO put all elements in Gr into a block, with a block index r; 4. the largest block index is the hashing index i; i (1) (2) i 5. make a directory H[00 0 .. 11 1] of size 2i with H[k] pointing to the block for group Gr, if the first r bits of k is equal to the group string sr. 39
Constructing an Extensible Hashing for a Given Set S Input: A set S of elements, hash function h into b bits (b is large), 1. make S a group G0 with group index 0; \\ each group Gris associated with a unique string sr of r bits such that all \\ elements in Gr have their hash values with the first r bits equal to sr 2. WHILE there is a group Grthat cannot fit in a single block DO split Gr into two groups Gr+1and Gr+1 using the (r+1)st bit in the hash values of the elements; 3. FOR each group Gr DO put all elements in Gr into a block, with a block index r; 4. the largest block index is the hashing index i; i (1) (2) i 5. make a directory H[00 0 .. 11 1] of size 2i with H[k] pointing to the block for group Gr, if the first r bits of k is equal to the group string sr. 40
Constructing an Extensible Hashing for a Given Set S Input: A set S of elements, hash function h into b bits (b is large), 1. make S a group G0 with group index 0; \\ each group Gris associated with a unique string sr of r bits such that all \\ elements in Gr have their hash values with the first r bits equal to sr 2. WHILE there is a group Grthat cannot fit in a single block DO split Gr into two groups Gr+1and Gr+1 using the (r+1)st bit in the hash values of the elements; 3. FOR each group Gr DO put all elements in Gr into a block, with a block index r; 4. the largest block index is the hashing index i; i (1) (2) i 5. make a directory H[00 0 .. 11 1] of size 2i with H[k] pointing to the block for group Gr, if the first r bits of k is equal to the group string sr. 41
An Example (b = 5, h(x) = a1 a2 a3 a4 a5, each block holds 3 tuples) 0 00000 00101 00110 01000 01001 01010 01011 01100 01101 01110 01111 10000 10010 10011 10100 10101 10110 10111 11000 11001 11010 11011 11100 11101 11111 42
An Example (b = 5, h(x) = a1 a2 a3 a4 a5, each block holds 3 tuples) 1 00000 00101 00110 01000 01001 01010 01011 01100 01101 01110 01111 1 10000 10010 10011 10100 10101 10110 10111 11000 11001 11010 11011 11100 11101 11111 43
An Example (b = 5, h(x) = a1 a2 a3 a4 a5, each block holds 3 tuples) 2 00000 00101 00110 2 01000 01001 01010 01011 01100 01101 01110 2 01111 10000 10010 10011 10100 10101 10110 10111 2 11000 11001 11010 11011 11100 11101 11111 44
An Example (b = 5, h(x) = a1 a2 a3 a4 a5, each block holds 3 tuples) 2 00000 00101 00110 2 01000 01001 01010 01011 01100 01101 01110 2 01111 10000 10010 10011 10100 10101 10110 10111 2 11000 11001 11010 11011 11100 11101 11111 45
An Example (b = 5, h(x) = a1 a2 a3 a4 a5, each block holds 3 tuples) 2 00000 00101 00110 3 01000 01001 01010 01011 3 01100 01101 01110 01111 2 10000 10010 10011 10100 10101 10110 10111 2 11000 11001 11010 11011 11100 11101 11111 46
An Example (b = 5, h(x) = a1 a2 a3 a4 a5, each block holds 3 tuples) 2 00000 00101 00110 3 01000 01001 01010 01011 3 01100 01101 01110 01111 2 10000 10010 10011 10100 10101 10110 10111 2 11000 11001 11010 11011 11100 11101 11111 47
An Example (b = 5, h(x) = a1 a2 a3 a4 a5, each block holds 3 tuples) 2 00000 00101 00110 3 01000 01001 01010 01011 3 01100 01101 01110 01111 3 10000 10010 10011 3 10100 10101 10110 10111 3 11000 11001 11010 3 11011 11100 11101 11111 48
An Example (b = 5, h(x) = a1 a2 a3 a4 a5, each block holds 3 tuples) 2 00000 00101 00110 3 01000 01001 01010 01011 3 01100 01101 01110 01111 3 10000 10010 10011 3 10100 10101 10110 10111 3 11000 11001 11010 3 11011 11100 11101 11111 49
An Example (b = 5, h(x) = a1 a2 a3 a4 a5, each block holds 3 tuples) 2 00000 00101 00110 3 01000 01001 01010 01011 3 01100 01101 01110 01111 3 10000 10010 10011 4 10100 4 10101 10110 10111 3 11000 11001 11010 3 11011 11100 11101 11111 50