Extensible Hashing in Database Systems: Techniques and Implementations

csce 608 database systems n.w
1 / 66
Embed
Share

Learn about Extensible Hashing, a dynamic hashing technique used in database systems to handle growth, overflows, and reorganizations efficiently. Explore the general framework, searching process, insertion operations, and how it copes with increasing data volumes.

  • Extensible Hashing
  • Database Systems
  • Dynamic Hashing
  • Index Structure
  • Growth Handling

Uploaded on | 1 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. CSCE-608 Database Systems Spring 2025 Instructor: Jianer Chen Office: PETR 428 Phone: 845-4259 Email: chen@cse.tamu.edu Notes 29: Extensible hashing: deletion

  2. in tables (relations) lock table DDL language DDL complier database administrator concurrency control file logging & recovery manager transaction manager index/file manager buffer manager DML (query) language query execution engine database programmer DML complier main memory buffers secondary storage (disks) DBMS

  3. Another Index Structure: Hash Tables buckets hash function h h(k) search key k A bucket is typically a disk block (probably with overflow blocks) h(k), 0 k b-1, gives an easy way to compute the bucket address (direct: address from h(k); indirect: h(k) is the index in a directory. 3

  4. How do we cope with growth? Overflows and reorganizations Dynamic hashing Extensible Linear 4

  5. Extensible Hashing: General framework # bits used by the buckets (block index) i # bits used by the directory (hash index) j1 00 00 00 01 j2 h i h(x) h(x)i x j2 i ... 11 11 i directory buckets 5

  6. Extensible hashing: Searching input: a search key x \\ h is the hash function, H is the directory, i is the current hash index. 1. m = the first i bits of h(x); 2. read in the disk block B with the address H[m]. 6

  7. Extensible hashing: Insertion input: a tuple t with search key x \\ h is the hash function, H is the directory, i is the current hash index. 1. m = the first i bits of h(x); 2. let the block with the address H[m] be B; 3. IF B has room THEN add t in B 4. ELSE let j be the block index of B IF i = j THEN {double the size of H to 2i+1, i = i + 1; let the pointers in the new H[2k]and H[2k+1] both equal to that in the old H[k], 0 k 2i-1} split B U{t} into B0 and B1 (with block index j+1) using the j+1st bit, let H[mj0**] point to B0 and H[mj1**] point to B1. b h(x) m i b h(x) mj i 7

  8. Extensible hashing: Insertion input: a tuple t with search key x \\ h is the hash function, H is the directory, i is the current hash index. 1. m = the first i bits of h(x); 2. let the block with the address H[m] be B; 3. IF B has room THEN add t in B 4. ELSE let j be the block index of B IF i = j THEN {double the size of H to 2i+1, i = i + 1; let the pointers in the new H[2k]and H[2k+1] both equal to that in the old H[k], 0 k 2i-1} split B U{t} into B0 and B1 (with block index j+1) using the j+1st bit, let H[mj0**] point to B0 and H[mj1**] point to B1. b h(x) m i 8

  9. Extensible hashing: Insertion input: a tuple t with search key x \\ h is the hash function, H is the directory, i is the current hash index. 1. m = the first i bits of h(x); 2. let the block with the address H[m] be B; 3. IF B has room THEN add t in B 4. ELSE let j be the block index of B \\ B has no room for t IF i = j THEN {double the size of H to 2i+1, i = i + 1; let the pointers in the new H[2k]and H[2k+1] both equal to that in the old H[k], 0 k 2i-1} split B U{t} into B0 and B1 (with block index j+1) using the j+1st bit, let H[mj0**] point to B0 and H[mj1**] point to B1. b h(x) m i 9

  10. Extensible hashing: Insertion input: a tuple t with search key x \\ h is the hash function, H is the directory, i is the current hash index. 1. m = the first i bits of h(x); 2. let the block with the address H[m] be B; 3. IF B has room THEN add t in B 4. ELSE let j be the block index of B \\ B has no room for t IF i = j THEN {double the size of H to 2i+1, i = i + 1; let the pointers in the new H[2k]and H[2k+1] both equal to that in the old H[k], 0 k 2i-1} split B U{t} into B0 and B1 (with block index j+1) using the j+1st bit, let H[mj0**] point to B0 and H[mj1**] point to B1. b h(x) m i i > j b h(x) mj i 10

  11. Extensible hashing: Insertion input: a tuple t with search key x \\ h is the hash function, H is the directory, i is the current hash index. 1. m = the first i bits of h(x); 2. let the block with the address H[m] be B; 3. IF B has room THEN add t in B 4. ELSE let j be the block index of B \\ B has no room for t IF i = j THEN {double the size of H to 2i+1, i = i + 1; let the pointers in the new H[2k]and H[2k+1] both equal to that in the old H[k], 0 k 2i-1} split B U{t} into B0 and B1 (with block index j+1) using the j+1st bit, let H[mj0**] point to B0 and H[mj1**] point to B1. b h(x) m i i > j b h(x) mj i 11

  12. Extensible hashing: Insertion input: a tuple t with search key x \\ h is the hash function, H is the directory, i is the current hash index. 1. m = the first i bits of h(x); 2. let the block with the address H[m] be B; 3. IF B has room THEN add t in B 4. ELSE let j be the block index of B \\ B has no room for t IF i = j THEN {double the size of H to 2i+1, i = i + 1; let the pointers in the new H[2k]and H[2k+1] both equal to that in the old H[k], 0 k 2i-1} split B U{t} into B0 and B1 (with block index j+1) using the j+1st bit, let H[mj0**] point to B0 and H[mj1**] point to B1. b h(x) m i i > j b h(x) mj i 12

  13. Insertion in Extensible Hashing 13

  14. h(a) = 1000 Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 1 b0001 h(k) = 1100 i =1 0 Insert: 1 1 c1001 k1100 14

  15. h(a) = 1000 Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 1 b0001 h(k) = 1100 i =1 0 Insert: g1010 1 1 c1001 k1100 15

  16. h(a) = 1000 Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 1 b0001 h(k) = 1100 i =1 0 Insert: g1010 1 1 c1001 k1100 g1010 16

  17. h(a) = 1000 Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 1 b0001 h(k) = 1100 i =1 0 Insert: g1010 1 1 c1001 k1100 g1010 Split the block, and increase the block index 17

  18. h(a) = 1000 Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 1 b0001 i = 2 h(k) = 1100 00 i =1 0 01 Insert: g1010 1 10 1 c1001 k1100 11 g1010 Split the block, and increase the block index if the block index is equal to the hash index, first double the directory size 18

  19. h(a) = 1000 Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 1 b0001 i = 2 h(k) = 1100 00 i =1 0 01 Insert: g1010 1 10 2 1 c1001 k1100 11 g1010 Split the block, and increase the block index 2 if the block index is equal to the hash index, first double the directory size 19

  20. h(a) = 1000 Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 1 b0001 i = 2 h(k) = 1100 00 i =1 0 01 Insert: g1010 1 10 2 1 c1001 k1100 11 g1010 Split the block, and increase the block index 2 if the block index is equal to the hash index, first double the directory size 20

  21. h(a) = 1000 Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 1 b0001 i = 2 h(k) = 1100 00 i =1 0 01 Insert: g1010 1 10 2 1 c1001 k1100 11 g1010 Split the block, and increase the block index 2 if the block index is equal to the hash index, first double the directory size 21

  22. h(a) = 1000 Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 1 b0001 i = 2 h(k) = 1100 00 i =1 0 01 Insert: g1010 1 10 2 1 c1001 k1100 11 g1010 Split the block, and increase the block index 2 if the block index is equal to the hash index, first double the directory size k1100 22

  23. h(a) = 1000 Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 1 b0001 i = 2 h(k) = 1100 00 01 Insert: g1010 10 2 c1001 g1010 11 2 k1100 23

  24. h(a) = 1000 Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 1 b0001 i = 2 h(k) = 1100 00 01 Insert: g1010 10 2 d0111 c1001 g1010 11 2 k1100 24

  25. h(a) = 1000 Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 1 b0001 i = 2 h(k) = 1100 d0111 00 01 Insert: g1010 10 2 d0111 c1001 g1010 11 2 k1100 25

  26. h(a) = 1000 Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 1 b0001 i = 2 h(k) = 1100 d0111 00 01 Insert: g1010 10 2 d0111 c1001 g1010 11 2 k1100 26

  27. h(a) = 1000 Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 1 b0001 i = 2 h(k) = 1100 d0111 00 01 Insert: g1010 10 2 d0111 c1001 g1010 11 e0000 2 k1100 27

  28. h(a) = 1000 Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 h(d) = 0111 h(e) = 0000 h(g) = 1010 1 b0001 e0000 i = 2 h(k) = 1100 d0111 00 01 Insert: g1010 10 2 d0111 c1001 g1010 11 e0000 2 k1100 28

  29. h(a) = 1000 Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 Split the block, and increase the block index h(d) = 0111 h(e) = 0000 h(g) = 1010 1 b0001 e0000 i = 2 h(k) = 1100 d0111 00 01 Insert: g1010 10 2 d0111 c1001 g1010 11 e0000 2 k1100 29

  30. h(a) = 1000 Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 Split the block, and increase the block index 2 h(d) = 0111 h(e) = 0000 h(g) = 1010 2 1 b0001 e0000 i = 2 h(k) = 1100 d0111 00 01 Insert: g1010 10 2 d0111 c1001 g1010 11 e0000 2 k1100 30

  31. h(a) = 1000 Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 Split the block, and increase the block index 2 e0000 b0001 h(d) = 0111 h(e) = 0000 h(g) = 1010 2 1 b0001 d0111 i = 2 h(k) = 1100 d0111 00 01 Insert: g1010 10 2 d0111 c1001 g1010 11 e0000 2 k1100 31

  32. h(a) = 1000 Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 2 e0000 b0001 h(d) = 0111 h(e) = 0000 h(g) = 1010 2 d0111 i = 2 h(k) = 1100 00 01 Insert: g1010 10 2 d0111 c1001 g1010 11 e0000 2 k1100 32

  33. h(a) = 1000 Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 2 e0000 b0001 h(d) = 0111 h(e) = 0000 h(g) = 1010 2 d0111 i = 2 h(k) = 1100 00 01 Insert: g1010 10 2 d0111 c1001 g1010 11 e0000 a1000 2 k1100 33

  34. h(a) = 1000 Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 2 e0000 b0001 h(d) = 0111 h(e) = 0000 h(g) = 1010 2 d0111 i = 2 h(k) = 1100 00 01 Insert: g1010 10 2 d0111 c1001 g1010 11 a1000 e0000 a1000 2 k1100 Split the block, and increase the block index 34

  35. h(a) = 1000 Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 i = 3 2 e0000 b0001 h(d) = 0111 000 h(e) = 0000 h(g) = 1010 001 2 d0111 i = 2 h(k) = 1100 010 00 if the block index is equal to the hash index, first double the directory size 01 011 Insert: g1010 10 100 2 d0111 c1001 g1010 11 101 a1000 e0000 110 a1000 2 k1100 Split the block, and increase the block index 111 35

  36. h(a) = 1000 Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 i = 3 2 e0000 b0001 h(d) = 0111 000 h(e) = 0000 h(g) = 1010 001 2 d0111 i = 2 h(k) = 1100 010 00 3 01 011 Insert: g1010 10 100 3 2 d0111 c1001 g1010 11 101 a1000 e0000 110 a1000 2 k1100 Split the block, and increase the block index 111 36

  37. h(a) = 1000 Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 i = 3 2 e0000 b0001 h(d) = 0111 000 h(e) = 0000 h(g) = 1010 001 2 d0111 i = 2 h(k) = 1100 010 00 3 01 011 a1000 c1001 Insert: g1010 10 100 3 2 d0111 c1001 g1010 g1010 11 101 e0000 110 a1000 2 k1100 111 37

  38. h(a) = 1000 Insertion in Extensible Hashing h(b) = 0001 h(c) = 1001 i = 3 2 e0000 b0001 h(d) = 0111 000 h(e) = 0000 h(g) = 1010 001 2 d0111 i = 2 h(k) = 1100 010 00 3 01 011 a1000 c1001 Insert: g1010 10 100 3 d0111 g1010 11 101 e0000 110 a1000 2 k1100 111 38

  39. Extensible hashing: Deletion No merging of blocks Merge blocks and cut directory if possible (Reverse insert procedure) 39

  40. Extensible hashing: Deletion No merging of blocks Simple, just search then delete Merge blocks and cut directory if possible (Reverse insert procedure) 40

  41. Extensible hashing: Deletion input: a tuple t with search key x \\ h is the hash function, H is the directory, i is the current hash index. 1. m = the first i bits of h(x); 2. let the block with the address H[m] be B; 3. IF t is in B THEN delete t from B ELSE return ( tnot found ); 4. let j be the block index of B; 5. WHILE B has a buddy block B mergeable with B DO move all tuples in B to B; free block B ; set the block index of B to j = j 1 ; mj-1 = the first j-1 bits of mj ; Let all H[mj-1**] point to B ; 6. IF the largest block index jmax is smaller than i THEN construct a new directory H0of size 2jmax ; let the hash index i = jmax ; FOR each string m0of jmax bits DO let H0[m0] = any H[m0**]; 41

  42. Extensible hashing: Deletion input: a tuple t with search key x \\ h is the hash function, H is the directory, i is the current hash index. 1. m = the first i bits of h(x); 2. let the block with the address H[m] be B; 3. IF t is in B THEN delete t from B ELSE return ( tnot found ); 4. let j be the block index of B; 5. WHILE B has a buddy block B mergeable with B DO move all tuples in B to B; free block B ; set the block index of B to j = j 1 ; mj-1 = the first j-1 bits of mj ; Let all H[mj-1**] point to B ; 6. IF the largest block index jmax is smaller than i THEN construct a new directory H0of size 2jmax ; let the hash index i = jmax ; FOR each string m0of jmax bits DO let H0[m0] = any H[m0**]; b h(x) m i 42

  43. Extensible hashing: Deletion input: a tuple t with search key x \\ h is the hash function, H is the directory, i is the current hash index. 1. m = the first i bits of h(x); 2. let the block with the address H[m] be B; 3. IF t is in B THEN delete t from B ELSE return ( tnot found ); 4. let j be the block index of B; 5. WHILE B has a buddy block B mergeable with B DO move all tuples in B to B; free block B ; set the block index of B to j = j 1 ; mj-1 = the first j-1 bits of mj ; Let all H[mj-1**] point to B ; 6. IF the largest block index jmax is smaller than i THEN construct a new directory H0of size 2jmax ; let the hash index i = jmax ; FOR each string m0of jmax bits DO let H0[m0] = any H[m0**]; b h(x) m i b h(x) mj i 43

  44. Extensible hashing: Deletion input: a tuple t with search key x \\ h is the hash function, H is the directory, i is the current hash index. 1. m = the first i bits of h(x); 2. let the block with the address H[m] be B; 3. IF t is in B THEN delete t from B ELSE return ( tnot found ); 4. let j be the block index of B; 5. WHILE B has a buddy block B mergeable with B DO move all tuples in B to B; free block B ; set the block index of B to j = j 1 ; mj-1 = the first j-1 bits of mj ; Let all H[mj-1**] point to B ; 6. IF the largest block index jmax is smaller than i THEN construct a new directory H0of size 2jmax ; let the hash index i = jmax ; FOR each string m0of jmax bits DO let H0[m0] = any H[m0**]; b h(x) m i b h(x) mj i 44

  45. Extensible hashing: Deletion input: a tuple t with search key x \\ h is the hash function, H is the directory, i is the current hash index. 1. m = the first i bits of h(x); 2. let the block with the address H[m] be B; 3. IF t is in B THEN delete t from B ELSE return ( tnot found ); 4. let j be the block index of B; 5. WHILE B has a buddy block B mergeable with B DO move all tuples in B to B; free block B ; set the block index of B to j = j 1 ; mj-1 = the first j-1 bits of mj ; Let all H[mj-1**] point to B ; 6. IF the largest block index jmax is smaller than i THEN construct a new directory H0of size 2jmax ; let the hash index i = jmax ; FOR each string m0of jmax bits DO let H0[m0] = any H[m0**]; b h(x) m i b h(x) mj i Suppose the first j bits of h(x) is mj. Let mj be mj with the last bit flipped. The buddy block B of B is the block pointed by any H[mj **] whose block index is j. Two blocks are mergeable if they can fit into a single block. 45

  46. Extensible hashing: Deletion input: a tuple t with search key x \\ h is the hash function, H is the directory, i is the current hash index. 1. m = the first i bits of h(x); 2. let the block with the address H[m] be B; 3. IF t is in B THEN delete t from B ELSE return ( tnot found ); 4. let j be the block index of B; 5. WHILE B has a buddy block B mergeable with B DO move all tuples in B to B; free block B ; set the block index of B to j = j 1 ; mj-1 = the first j-1 bits of mj ; Let all H[mj-1**] point to B ; 6. IF the largest block index jmax is smaller than i THEN construct a new directory H0of size 2jmax ; let the hash index i = jmax ; FOR each string m0of jmax bits DO let H0[m0] = any H[m0**]; b h(x) m i b h(x) mj i Suppose the first j bits of h(x) is mj. Let mj be mj with the last bit flipped. The buddy block B of B is the block pointed by any H[mj **] whose block index is j. Two blocks are mergeable if they can fit into a single block. 46

  47. Extensible hashing: Deletion input: a tuple t with search key x \\ h is the hash function, H is the directory, i is the current hash index. 1. m = the first i bits of h(x); 2. let the block with the address H[m] be B; 3. IF t is in B THEN delete t from B ELSE return ( tnot found ); 4. let j be the block index of B; 5. WHILE B has a buddy block B mergeable with B DO move all tuples in B to B; free block B ; set the block index of B to j = j 1 ; mj-1 = the first j-1 bits of mj ; Let all H[mj-1**] point to B ; 6. IF the largest block index jmax is smaller than i THEN construct a new directory H0of size 2jmax ; let the hash index i = jmax ; FOR each string m0of jmax bits DO let H0[m0] = any H[m0**]; b h(x) m i b h(x) mj i Suppose the first j bits of h(x) is mj. Let mj be mj with the last bit flipped. The buddy block B of B is the block pointed by any H[mj **] whose block index is j. Two blocks are mergeable if they can fit into a single block. 47

  48. 1. To find the buddy block B,we need a disk read; 2. The result of merging B and B can be placed in main memory temporarily; Extensible hashing: Deletion so only when B has no mergeable buddy block, we write B back to disk; 3. In the worst case, we may start with j = i, and go all the way up until j = 0; 4. jmax can be computed by looking at the values of H[**]; 5. The total # of disk I/O s = i + 2 (where 2 is for read/write the block B). input: a tuple t with search key x \\ h is the hash function, H is the directory, i is the current hash index. 1. m = the first i bits of h(x); 2. let the block with the address H[m] be B; 3. IF t is in B THEN delete t from B ELSE return ( tnot found ); 4. let j be the block index of B; 5. WHILE B has a buddy block B mergeable with B DO move all tuples in B to B; free block B ; set the block index of B to j = j 1 ; mj-1 = the first j-1 bits of mj ; Let all H[mj-1**] point to B ; 6. IF the largest block index jmax is smaller than i THEN construct a new directory H0of size 2jmax ; let the hash index i = jmax ; FOR each string m0of jmax bits DO let H0[m0] = any H[m0**]; b h(x) m i b h(x) mj i Suppose the first j bits of h(x) is mj. Let mj be mj with the last bit flipped. The buddy block B of B is the block pointed by any H[mj **] whose block index is j. Two blocks are mergeable if they can fit into a single block. 48

  49. Note: Still need overflow chains Example: many records with duplicate keys insert 1100 1 1101 1100 49

  50. Note: Still need overflow chains Example: many records with duplicate keys if we split: 2 insert 1100 For 10** 1 1101 1100 2 1101 1100 For 11** ? 1100 50

Related


More Related Content