
Database Systems with Linear Hashing Structures and Algorithms
Explore the concepts of linear hashing and dynamic hashing schemes in database systems. Learn about extensible hashing, hash tables, search operations, and insertion techniques using hash functions and bucket addresses. Discover how linear hashing handles search and insertion processes efficiently in database management systems.
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
CSCE-608 Database Systems Spring 2025 Instructor: Jianer Chen Office: PETR 428 Phone: 845-4259 Email: chen@cse.tamu.edu Notes 30: Algorithms for relational operations
in tables (relations) lock table DDL language DDL complier database administrator concurrency control file logging & recovery manager transaction manager index/file manager buffer manager DML (query) language query execution engine database programmer DML complier main memory buffers secondary storage (disks) DBMS
Another Index Structure: Hash Tables buckets hash function h h(k) search key k A bucket is typically a disk block (probably with overflow blocks) h(k), 0 k b-1, gives an easy way to compute the bucket address (direct: address from h(k); indirect: h(k) is the index in a directory. 3
Dynamic Hashing Schemes Extensible Hashing Linear Hashing 4
Linear hashing (Another dynamic hashing scheme) Ideas: (a) Use the same hash function h; (b) Use only part of h when the hash table is smaller (use the i low order bits of h. b h(x) = 01110101 i grows Similar to Extensible hash (c) Hash table size n grows linearly Main difference n 00..0 (|n| = i) (d) Use overflow blocks. 5
Linear hashing b h(x) = 01110101 i grows Hash table size n grows linearly (n is a parameter for the hash structure (backet n is the first unused bucket) b h(x) = h(x)i n 00..0 (|n| = i) i = |n| Where does x go if h(x)i n? Put x in h(x)i 2i-1 (< n)!! (h(x)i 2i-1 = h(x)i with the leading bit 1 replaced with 0) 6
How Do We Search x ? Linear Hashing: Searching input: a search key x \\ h is the hash function, n is the current upper bound, i = |n| 1. m = the last i bits of h(x); 2. IF m n THEN m = m 2i-1; 3. read in the disk block(s) with the address m \\ you should check overflow blocks in the address m. 7
How Do We Insert x ? Linear Hashing: Insertion input: a tuple t with search key x \\ h is the hash function, n is the current upper bound, i = |n| 1. m = the last i bits of h(x); 2. IF m n THEN m = m 2i-1; 3. insert t in the disk block B with the address m; \\ If B is full, you need to use an overflow block. 8
How Do We Delete x ? Linear Hashing: Deletion input: a search key x \\ h is the hash function, n is the current upper bound, i = |n| 1. m = the last i bits of h(x); 2. IF m n THEN m = m 2i-1; 3. delete t in the disk block B with the address m; \\ you may need to check overflow blocks. 9
Expanding the Hash Table When Do We Expand the Hash Table? needed space available space R = Keep track of: If R > threshold (e.g., 80%) then increase n Linear Hashing: Increasing Hash Table Size input: the current upper bound n \\ h is the hash function, i = |n| 1. read in the disk block(s) B of address n 2i -1 ; 2. split (properly) the tuples in B and put them in the block B and the block B with address n; 3. n = n + 1; 10
Expanding the Hash Table When Do We Expand the Hash Table? needed space available space R = Keep track of: If R > threshold (e.g., 80%) then increase n Linear Hashing: Increasing Hash Table Size input: the current upper bound n \\ h is the hash function, i = |n| 1. read in the disk block(s) B of address n 2i -1 ; 2. split (properly) the tuples in B and put them in the block B and the block B with address n; 3. n = n + 1; 11
Expanding the Hash Table When Do We Expand the Hash Table? needed space available space R = Keep track of: If R > threshold (e.g., 80%) then increase n Linear Hashing: Increasing Hash Table Size input: the current upper bound n \\ h is the hash function, i = |n| 1. read in the disk block(s) B of address n 2i -1 ; 2. split (properly) the tuples in B and put them in the block B and the block B with address n; 3. n = n + 1; 12
Expanding the Hash Table When Do We Expand the Hash Table? needed space available space R = Keep track of: If R > threshold (e.g., 80%) then increase n Linear Hashing: Increasing Hash Table Size input: the current upper bound n \\ h is the hash function, i = |n| 1. read in the disk block(s) B of address n 2i -1 ; 2. split (properly) the tuples in B and put them in the block B and the block B with address n; 3. n = n + 1; 13
Expanding the Hash Table When Do We Expand the Hash Table? needed space available space R = Keep track of: If R > threshold (e.g., 80%) then increase n Linear Hashing: Increasing Hash Table Size input: the current upper bound n \\ h is the hash function, i = |n| 1. read in the disk block(s) B of address n 2i -1 ; 2. split (properly) the tuples in B and put them in the block B and the block B with address n; 3. n = n + 1; 14
Expanding the Hash Table When Do We Expand the Hash Table? needed space available space R = Keep track of: If R > threshold (e.g., 80%) then increase n Linear Hashing: Increasing Hash Table Size input: the current upper bound n \\ h is the hash function, i = |n| 1. read in the disk block(s) B of address n 2i -1 ; 2. split (properly) the tuples in B and put them in the block B and the block B with address n; 3. n = n + 1; 15
Shrinking the Hash Table When? When R is smaller than a threshold (e.g., 50%) 17
Shrinking the Hash Table When? When R is smaller than a threshold (e.g., 50%) Linear Hashing: Decreasing Hash Table Size input: the current upper bound n \\ h is the hash function, i = |n| 1. 2. n = n 1; move the tuples in the block(s) of address n to the block(s) of address n 2i -1 (here i is the length of the new n). 18
Shrinking the Hash Table When? When R is smaller than a threshold (e.g., 50%) Linear Hashing: Decreasing Hash Table Size input: the current upper bound n \\ h is the hash function, i = |n| 1. 2. n = n 1; move the tuples in the block(s) of address n to the block(s) of address n 2i -1 (here i is the length of the new n). 19
Shrinking the Hash Table When? When R is smaller than a threshold (e.g., 50%) Linear Hashing: Decreasing Hash Table Size input: the current upper bound n \\ h is the hash function, i = |n| 1. 2. n = n 1; move the tuples in the block(s) of address n to the block(s) of address n 2i -1 (here i is the length of the new n). 20
Shrinking the Hash Table When? When R is smaller than a threshold (e.g., 50%) Linear Hashing: Decreasing Hash Table Size input: the current upper bound n \\ h is the hash function, i = |n| 1. 2. n = n 1; move the tuples in the block(s) of address n to the block(s) of address n 2i -1 (here i is the length of the new n). 21
Shrinking the Hash Table When? When R is smaller than a threshold (e.g., 50%) Linear Hashing: Decreasing Hash Table Size input: the current upper bound n \\ h is the hash function, i = |n| 1. 2. n = n 1; move the tuples in the block(s) of address n to the block(s) of address n 2i -1 (here i is the length of the new n). 22
Linear Hashing: General framework 00 00 00 01 h i x h(x) h(x)i ... i b 1* ** = n no tuples i grow linearly buckets 23
Example b=4 bits, 2 keys/bucket Future growth buckets 0000 1010 0101 1111 00 01 10 24
Example b=4 bits, 2 keys/bucket n = 10 (1 + the largest index of the used buckets) future growth buckets 0000 1010 0101 1111 00 01 n =10 25
Example b=4 bits, 2 keys/bucket n = 10 (1 + the largest index of the used buckets) i = |n| = 2 (# used bits) Future growth buckets 0000 1010 0101 1111 00 01 n =10 26
Example b=4 bits, 2 keys/bucket n = 10 (1 + the largest index of the used buckets) i = |n| = 2 (# used bits) Future growth buckets 0000 1010 0101 1111 00 01 n =10 Search(x) If h(x)i < n, then look at bucket h(x)i If h(x)i n, then look at bucket h(x)i 2i -1 27
Example b=4 bits, 2 keys/bucket n = 10 (1 + the largest index of the used buckets) i = |n| = 2 (# used bits) Future growth buckets 0000 1010 0101 1111 00 01 n =10 Search(x) If h(x)i < n, then look at bucket h(x)i If h(x)i n, then look at bucket h(x)i 2i -1 28
Insertion: b=4 bits, 2 keys/bucket, n=10, i=2 insert 1101 (can have overflow chains!) Future growth buckets 0000 1010 0101 1111 00 01 n =10 29
Insertion: b=4 bits, 2 keys/bucket, n=10, i=2 insert 1101 (can have overflow chains!) Future growth buckets 0000 1010 0101 1111 00 01 n =10 Insert(x) input: a tuple t with search key x \\ n is the current upper bound, i = |n| 1. m = the last i bits of h(x); 2. IF m n THEN m = m 2i-1; 3. insert t in the disk block B with the address m; \\ If B is full, use an overflow block. 30
Insertion: b=4 bits, 2 keys/bucket, n=10, i=2 insert 1101 (can have overflow chains!) Future growth buckets 0000 1010 0101 1111 00 01 n =10 Insert(x) input: a tuple t with search key x \\ n is the current upper bound, i = |n| 1. m = the last i bits of h(x); 2. IF m n THEN m = m 2i-1; 3. insert t in the disk block B with the address m; \\ If B is full, use an overflow block. 31
Insertion: b=4 bits, 2 keys/bucket, n=10, i=2 insert 1101 1101 (can have overflow chains!) Future growth buckets 0000 1010 0101 1111 00 01 n =10 Insert(x) input: a tuple t with search key x \\ n is the current upper bound, i = |n| 1. m = the last i bits of h(x); 2. IF m n THEN m = m 2i-1; 3. insert t in the disk block B with the address m; \\ If B is full, use an overflow block. 32
Increase size n: b=4 bits, n=10, i=2 0000 1010 0101 1111 n = 10 00 01 33
Increase size n: b=4 bits, n=10, i=2 0000 1010 0101 1111 n = 10 00 01 34
Increase size n: b=4 bits, n=10, i=2 0000 1010 0101 1111 n = 10 00 01 35
Increase size n: b=4 bits, n=11, i=2 0000 1010 0101 1111 1010 01 n = 10 00 36
Increase size n: b=4 bits, n=11, i=2 Future growth buckets 0000 0101 1111 1010 10 00 01 n =11 37
Increase size n: b=4 bits, n=11, i=2 insert1101 1101 Future growth buckets 0000 0101 1111 1010 00 01 n =11 10 38
Increase size n: b=4 bits, n=11, i=2 1101 0000 0101 1111 1010 10 00 01 n =11 39
Increase size n: b=4 bits, n=11, i=2 1101 0000 0101 1111 1010 10 00 01 n =11 40
Increase size n: b=4 bits, n=11, i=2 1101 0000 0101 1111 1010 10 00 01 n =11 41
Increase size n: b=4 bits, n=100, i=3 1101 0000 0101 1111 1010 1111 10 00 01 n =11 42
Increase size n: b=4 bits, n=100, i=3 1101 0000 0101 1010 1111 10 00 01 11 n =100 43
Increase size n: b=4 bits, n=100, i=3 0101 0000 0101 1101 1010 1111 10 00 01 11 n =100 44
Linear hashing: Summary + Can handle growing files - with less wasted space - with no full reorganizations + No indirection like extensible hashing 45
Linear hashing: Summary + Can handle growing files - with less wasted space - with no full reorganizations + No indirection like extensible hashing - Overflow chains 46
Linear Hashing: BAD CASE Very full Very empty Need to move nhere Would waste space 47
Hashing: Summary How it works Dynamic hashing - Extensible - Linear 48
in tables (relations) lock table DDL language DDL complier database administrator concurrency control file logging & recovery manager transaction manager index/file manager buffer manager DML (query) language query execution engine database programmer DML complier main memory buffers secondary storage (disks) DBMS
in tables (relations) lock table DDL language DDL complier database administrator concurrency control file logging & recovery manager transaction manager What is still missing? index/file manager buffer manager DML (query) language query execution engine database programmer DML complier main memory buffers secondary storage (disks) DBMS