Database Systems with Linear Hashing Structures and Algorithms

csce 608 database systems n.w
1 / 78
Embed
Share

Explore the concepts of linear hashing and dynamic hashing schemes in database systems. Learn about extensible hashing, hash tables, search operations, and insertion techniques using hash functions and bucket addresses. Discover how linear hashing handles search and insertion processes efficiently in database management systems.

  • Database Systems
  • Linear Hashing
  • Hash Functions
  • Search Operations
  • Hash Tables

Uploaded on | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. CSCE-608 Database Systems Spring 2025 Instructor: Jianer Chen Office: PETR 428 Phone: 845-4259 Email: chen@cse.tamu.edu Notes 30: Algorithms for relational operations

  2. in tables (relations) lock table DDL language DDL complier database administrator concurrency control file logging & recovery manager transaction manager index/file manager buffer manager DML (query) language query execution engine database programmer DML complier main memory buffers secondary storage (disks) DBMS

  3. Another Index Structure: Hash Tables buckets hash function h h(k) search key k A bucket is typically a disk block (probably with overflow blocks) h(k), 0 k b-1, gives an easy way to compute the bucket address (direct: address from h(k); indirect: h(k) is the index in a directory. 3

  4. Dynamic Hashing Schemes Extensible Hashing Linear Hashing 4

  5. Linear hashing (Another dynamic hashing scheme) Ideas: (a) Use the same hash function h; (b) Use only part of h when the hash table is smaller (use the i low order bits of h. b h(x) = 01110101 i grows Similar to Extensible hash (c) Hash table size n grows linearly Main difference n 00..0 (|n| = i) (d) Use overflow blocks. 5

  6. Linear hashing b h(x) = 01110101 i grows Hash table size n grows linearly (n is a parameter for the hash structure (backet n is the first unused bucket) b h(x) = h(x)i n 00..0 (|n| = i) i = |n| Where does x go if h(x)i n? Put x in h(x)i 2i-1 (< n)!! (h(x)i 2i-1 = h(x)i with the leading bit 1 replaced with 0) 6

  7. How Do We Search x ? Linear Hashing: Searching input: a search key x \\ h is the hash function, n is the current upper bound, i = |n| 1. m = the last i bits of h(x); 2. IF m n THEN m = m 2i-1; 3. read in the disk block(s) with the address m \\ you should check overflow blocks in the address m. 7

  8. How Do We Insert x ? Linear Hashing: Insertion input: a tuple t with search key x \\ h is the hash function, n is the current upper bound, i = |n| 1. m = the last i bits of h(x); 2. IF m n THEN m = m 2i-1; 3. insert t in the disk block B with the address m; \\ If B is full, you need to use an overflow block. 8

  9. How Do We Delete x ? Linear Hashing: Deletion input: a search key x \\ h is the hash function, n is the current upper bound, i = |n| 1. m = the last i bits of h(x); 2. IF m n THEN m = m 2i-1; 3. delete t in the disk block B with the address m; \\ you may need to check overflow blocks. 9

  10. Expanding the Hash Table When Do We Expand the Hash Table? needed space available space R = Keep track of: If R > threshold (e.g., 80%) then increase n Linear Hashing: Increasing Hash Table Size input: the current upper bound n \\ h is the hash function, i = |n| 1. read in the disk block(s) B of address n 2i -1 ; 2. split (properly) the tuples in B and put them in the block B and the block B with address n; 3. n = n + 1; 10

  11. Expanding the Hash Table When Do We Expand the Hash Table? needed space available space R = Keep track of: If R > threshold (e.g., 80%) then increase n Linear Hashing: Increasing Hash Table Size input: the current upper bound n \\ h is the hash function, i = |n| 1. read in the disk block(s) B of address n 2i -1 ; 2. split (properly) the tuples in B and put them in the block B and the block B with address n; 3. n = n + 1; 11

  12. Expanding the Hash Table When Do We Expand the Hash Table? needed space available space R = Keep track of: If R > threshold (e.g., 80%) then increase n Linear Hashing: Increasing Hash Table Size input: the current upper bound n \\ h is the hash function, i = |n| 1. read in the disk block(s) B of address n 2i -1 ; 2. split (properly) the tuples in B and put them in the block B and the block B with address n; 3. n = n + 1; 12

  13. Expanding the Hash Table When Do We Expand the Hash Table? needed space available space R = Keep track of: If R > threshold (e.g., 80%) then increase n Linear Hashing: Increasing Hash Table Size input: the current upper bound n \\ h is the hash function, i = |n| 1. read in the disk block(s) B of address n 2i -1 ; 2. split (properly) the tuples in B and put them in the block B and the block B with address n; 3. n = n + 1; 13

  14. Expanding the Hash Table When Do We Expand the Hash Table? needed space available space R = Keep track of: If R > threshold (e.g., 80%) then increase n Linear Hashing: Increasing Hash Table Size input: the current upper bound n \\ h is the hash function, i = |n| 1. read in the disk block(s) B of address n 2i -1 ; 2. split (properly) the tuples in B and put them in the block B and the block B with address n; 3. n = n + 1; 14

  15. Expanding the Hash Table When Do We Expand the Hash Table? needed space available space R = Keep track of: If R > threshold (e.g., 80%) then increase n Linear Hashing: Increasing Hash Table Size input: the current upper bound n \\ h is the hash function, i = |n| 1. read in the disk block(s) B of address n 2i -1 ; 2. split (properly) the tuples in B and put them in the block B and the block B with address n; 3. n = n + 1; 15

  16. Shrinking the Hash Table 16

  17. Shrinking the Hash Table When? When R is smaller than a threshold (e.g., 50%) 17

  18. Shrinking the Hash Table When? When R is smaller than a threshold (e.g., 50%) Linear Hashing: Decreasing Hash Table Size input: the current upper bound n \\ h is the hash function, i = |n| 1. 2. n = n 1; move the tuples in the block(s) of address n to the block(s) of address n 2i -1 (here i is the length of the new n). 18

  19. Shrinking the Hash Table When? When R is smaller than a threshold (e.g., 50%) Linear Hashing: Decreasing Hash Table Size input: the current upper bound n \\ h is the hash function, i = |n| 1. 2. n = n 1; move the tuples in the block(s) of address n to the block(s) of address n 2i -1 (here i is the length of the new n). 19

  20. Shrinking the Hash Table When? When R is smaller than a threshold (e.g., 50%) Linear Hashing: Decreasing Hash Table Size input: the current upper bound n \\ h is the hash function, i = |n| 1. 2. n = n 1; move the tuples in the block(s) of address n to the block(s) of address n 2i -1 (here i is the length of the new n). 20

  21. Shrinking the Hash Table When? When R is smaller than a threshold (e.g., 50%) Linear Hashing: Decreasing Hash Table Size input: the current upper bound n \\ h is the hash function, i = |n| 1. 2. n = n 1; move the tuples in the block(s) of address n to the block(s) of address n 2i -1 (here i is the length of the new n). 21

  22. Shrinking the Hash Table When? When R is smaller than a threshold (e.g., 50%) Linear Hashing: Decreasing Hash Table Size input: the current upper bound n \\ h is the hash function, i = |n| 1. 2. n = n 1; move the tuples in the block(s) of address n to the block(s) of address n 2i -1 (here i is the length of the new n). 22

  23. Linear Hashing: General framework 00 00 00 01 h i x h(x) h(x)i ... i b 1* ** = n no tuples i grow linearly buckets 23

  24. Example b=4 bits, 2 keys/bucket Future growth buckets 0000 1010 0101 1111 00 01 10 24

  25. Example b=4 bits, 2 keys/bucket n = 10 (1 + the largest index of the used buckets) future growth buckets 0000 1010 0101 1111 00 01 n =10 25

  26. Example b=4 bits, 2 keys/bucket n = 10 (1 + the largest index of the used buckets) i = |n| = 2 (# used bits) Future growth buckets 0000 1010 0101 1111 00 01 n =10 26

  27. Example b=4 bits, 2 keys/bucket n = 10 (1 + the largest index of the used buckets) i = |n| = 2 (# used bits) Future growth buckets 0000 1010 0101 1111 00 01 n =10 Search(x) If h(x)i < n, then look at bucket h(x)i If h(x)i n, then look at bucket h(x)i 2i -1 27

  28. Example b=4 bits, 2 keys/bucket n = 10 (1 + the largest index of the used buckets) i = |n| = 2 (# used bits) Future growth buckets 0000 1010 0101 1111 00 01 n =10 Search(x) If h(x)i < n, then look at bucket h(x)i If h(x)i n, then look at bucket h(x)i 2i -1 28

  29. Insertion: b=4 bits, 2 keys/bucket, n=10, i=2 insert 1101 (can have overflow chains!) Future growth buckets 0000 1010 0101 1111 00 01 n =10 29

  30. Insertion: b=4 bits, 2 keys/bucket, n=10, i=2 insert 1101 (can have overflow chains!) Future growth buckets 0000 1010 0101 1111 00 01 n =10 Insert(x) input: a tuple t with search key x \\ n is the current upper bound, i = |n| 1. m = the last i bits of h(x); 2. IF m n THEN m = m 2i-1; 3. insert t in the disk block B with the address m; \\ If B is full, use an overflow block. 30

  31. Insertion: b=4 bits, 2 keys/bucket, n=10, i=2 insert 1101 (can have overflow chains!) Future growth buckets 0000 1010 0101 1111 00 01 n =10 Insert(x) input: a tuple t with search key x \\ n is the current upper bound, i = |n| 1. m = the last i bits of h(x); 2. IF m n THEN m = m 2i-1; 3. insert t in the disk block B with the address m; \\ If B is full, use an overflow block. 31

  32. Insertion: b=4 bits, 2 keys/bucket, n=10, i=2 insert 1101 1101 (can have overflow chains!) Future growth buckets 0000 1010 0101 1111 00 01 n =10 Insert(x) input: a tuple t with search key x \\ n is the current upper bound, i = |n| 1. m = the last i bits of h(x); 2. IF m n THEN m = m 2i-1; 3. insert t in the disk block B with the address m; \\ If B is full, use an overflow block. 32

  33. Increase size n: b=4 bits, n=10, i=2 0000 1010 0101 1111 n = 10 00 01 33

  34. Increase size n: b=4 bits, n=10, i=2 0000 1010 0101 1111 n = 10 00 01 34

  35. Increase size n: b=4 bits, n=10, i=2 0000 1010 0101 1111 n = 10 00 01 35

  36. Increase size n: b=4 bits, n=11, i=2 0000 1010 0101 1111 1010 01 n = 10 00 36

  37. Increase size n: b=4 bits, n=11, i=2 Future growth buckets 0000 0101 1111 1010 10 00 01 n =11 37

  38. Increase size n: b=4 bits, n=11, i=2 insert1101 1101 Future growth buckets 0000 0101 1111 1010 00 01 n =11 10 38

  39. Increase size n: b=4 bits, n=11, i=2 1101 0000 0101 1111 1010 10 00 01 n =11 39

  40. Increase size n: b=4 bits, n=11, i=2 1101 0000 0101 1111 1010 10 00 01 n =11 40

  41. Increase size n: b=4 bits, n=11, i=2 1101 0000 0101 1111 1010 10 00 01 n =11 41

  42. Increase size n: b=4 bits, n=100, i=3 1101 0000 0101 1111 1010 1111 10 00 01 n =11 42

  43. Increase size n: b=4 bits, n=100, i=3 1101 0000 0101 1010 1111 10 00 01 11 n =100 43

  44. Increase size n: b=4 bits, n=100, i=3 0101 0000 0101 1101 1010 1111 10 00 01 11 n =100 44

  45. Linear hashing: Summary + Can handle growing files - with less wasted space - with no full reorganizations + No indirection like extensible hashing 45

  46. Linear hashing: Summary + Can handle growing files - with less wasted space - with no full reorganizations + No indirection like extensible hashing - Overflow chains 46

  47. Linear Hashing: BAD CASE Very full Very empty Need to move nhere Would waste space 47

  48. Hashing: Summary How it works Dynamic hashing - Extensible - Linear 48

  49. in tables (relations) lock table DDL language DDL complier database administrator concurrency control file logging & recovery manager transaction manager index/file manager buffer manager DML (query) language query execution engine database programmer DML complier main memory buffers secondary storage (disks) DBMS

  50. in tables (relations) lock table DDL language DDL complier database administrator concurrency control file logging & recovery manager transaction manager What is still missing? index/file manager buffer manager DML (query) language query execution engine database programmer DML complier main memory buffers secondary storage (disks) DBMS

Related


More Related Content