
Efficient Techniques for Searching and Managing Tables in Computer Science
Learn about different methods such as sequential search, binary search, and hashing for efficient table searching. Understand the concepts of hash functions, collisions, and handling sparse tables with hash tables, separate chaining, and buckets. Dive into practical applications like managing files on disk and implementing hash tables with separate chaining.
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
Searching Tables Table: sequence of (key,information) pairs (key,information) pair is a record key uniquely identifies information, so no duplicate records Sometimes the key is the whole record Searching a table Given a key k and a table T = (k1,i1), , (kn,in), find the pair (kj,ij) in T such that k=kj(if it exists) Possible approaches: Sequential search (O(n)): simple, but only effective for small tables Binary search (O(log n)): fast, but table must be sorted Hashing (O(1)) COSC 2P03 Week 11 1
Hash Tables Idea: use a function h such that for every possible key k, h(k) = index of record with key k: O(1) search time Hash Function: maps keys to addresses Build the table using the hash function: if h(k)=1 then put record (k,i) in cell 1 of table (etc) Tables are generally sparse Hash functions should ideally be: 1. Easy to compute, and 2. Ensure different keys are always mapped to different cells Perfect hash functions are not always possible COSC 2P03 Week 11 2
Hashing and Collisions Collision: the effect of more than one key being mapped to the same cell Given kx ky, we have f(kx) = f(ky) Ideally collisions would never happen (this is not realistic) Approaches to dealing with collisions: 1. Allow >1 record to be stored in each table index Buckets: each index is a fixed-size bucket of records Separate chaining: each index has a linked list of records 2. Open addressing: allow only 1 record at each index When a collision occurs, use a collision resolution policy to find a new index for the item, e.g. linear probing etc. COSC 2P03 Week 11 3
Hash tables with buckets Generally used for storing files on disk Each index has a bucket of fixed size (block) Within the bucket, records are in unsorted order To insert record (k,i): Apply hash function h(k) to determine in which bucket (k,i) belongs and add to next empty space in bucket To search for record (k,i): Compute h(k) and read corresponding block from disk Perform linear search of block to find record with key k COSC 2P03 Week 11 4
Hash tables separate chaining Each cell of the table is a separate linked list To insert record (k,i): Apply hash function h(k) to determine in which linked list the record belongs and add to front of list To search for record (k,i): Compute h(k) and access corresponding linked list Perform linear search of linked list to find record with key k COSC 2P03 Week 11 5
Hash table with Open Addressing findPos Linear Probing If a record is hashed to index j, which is already occupied, then look in index j+1, j+2, and put the record in the next available index (each attempt is called a probe) int findPos(int k) // search for index that should store // record with key k { current = hash(k, tableSize); while(array[current] != null && array[current].record.key != k) { current = (current+1) % tableSize; } return current; } COSC 2P03 Week 11 6
Hash Table search int find(int k) // search for record with key k { current = findPos(k); if(isActive(current)) return current; else return -1; } // not found COSC 2P03 Week 11 7
Hash Table insertion void insert(record R) // insert R if not already in table { current = findPos(R.key); if(isActive(current)) // already in hash table return; else { array[current].record = R; array[current].isActive = true; } } COSC 2P03 Week 11 8
Hash table deletion void remove(int k) // delete record with key k { current = findPos(R.key); if(isActive(currentPos)) array[current].isActive = false; // lazy deletion } COSC 2P03 Week 11 9
Open addressing collision resolution policies Linear probing: If a record is hashed to index j, which is already occupied, then look in index j+1, j+2, and put the record in the next available index Clusters can form when items are hashed to the same address Anything hashed to any index within the cluster makes the cluster bigger the bigger it gets, the faster it grows Sparse tables reduce the problem but it still exists This is called primary clustering. COSC 2P03 Week 11 10
Open addressing collision resolution policies Quadratic probing: Attempts to avoid clustering problem by checking indices that are further apart If a record is hashed to index j, which is already occupied, then look in index j+1, j+4, j+9, , and put the record in the next available index. Items hashed to the same index will all check the same sequence of indices (secondary clustering) COSC 2P03 Week 11 11
Open addressing collision resolution policies Double hashing: Attempts to avoid both primary and secondary clustering by using a second hashing function to determine which other indices should be tried after a collision Uses 2 hash functions, h1(k) h2(k) If h1(k) hashes a record to index j, which is already occupied, then use h2(k) as a step size for subsequent probes COSC 2P03 Week 11 12