
Generic Libraries and Hash Dictionaries in Imperative Computation
Learn about generic libraries and hash dictionaries in imperative computation, including discussions on binary search trees, written assignments, programming deadlines, and upcoming exams. Explore the implementation and client interface, along with key functions for hash dictionaries. The lecture delves into structuring chains, keys, insertions, lookups, and new hash dictionaries.
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
15-122: Principles of Imperative Computation Lecture 16: Binary Search Trees March 13, 2023
Today Last lecture: Make hash dictionaries generic using void* and function pointers Today s lecture: Recap: Generic libraries Binary Search Trees (BSTs) Announcements: Written assignments are now due every Monday Programming assignments are now due every Thursday Midterm 2 is on Thursday, March 30 1
The Hash Dictionary Library for (chain* p = H->table[i]; p != NULL; p = p->next) if (key_equiv(entry_key(p->data), k)) return p->data; return NULL; } // Implementation-side types typedef struct chain_node chain; struct chain_node { entry data; chain* next; }; struct hdict_header { int size; int capacity; chain*[] table; }; typedef struct hdict_header hdict; Implementation Client Interface // typedef ______* entry; // typedef ______ key; // data != NULL void hdict_insert(hdict* H, entry e) //@requires is_hdict(H) && e != NULL; //@ensures hdict_lookup(H, entry_key(e)) == e; //@ensures is_hdict(H); { key k = entry_key(e); int i = index_of_key(H, k); for (chain* p = H->table[i]; p != NULL; p = p->next) { if (key_equiv(entry_key(p->data), k)) { p->data = e; return; } } chain* p = alloc(chain); p->data = e; p->next = H->table[i]; H->table[i] = p; (H->size)++; } key entry_key(entry e) /*@requires e != NULL; @*/ ; // size >= 0 // capacity > 0 // \length(table) == capacity int key_hash(key k); bool key_equiv(key k1, key k2); // Representation invariant bool is_hdict (hdict* H) { return H != NULL && H->size >= 0 && H->capacity > 0 && is_array_expected_length(H->table, H->capacity) && is_valid_hashtable(H); } Library Interface // typedef ______* hdict_t; hdict_t hdict_new(int capacity) /*@requires capacity > 0; /*@ensures \result != NULL; @*/ @*/ ; // Implementation of interface functions int index_of_key(hdict* H, key k) //@requires is_hdict(H); //@ensures 0 <= \result && \result < H->capacity; { return abs(key_hash(k) % H->capacity); } entry hdict_lookup(hdict_t D, key k) /*@requires D != NULL; /*@ensures \result != NULL || key_equiv(entry_key(\result), k); hdict* hdict_new(int capacity) //@requires capacity > 0; //@ensures is_hdict(\result); { hdict* H = alloc(hdict); H->size = 0; H->capacity = capacity; H->table = alloc_array(chain*, capacity); return H; } @*/ @*/ ; entry hdict_lookup(hdict* H, key k) //@requires is_hdict(H); //@ensures \result == NULL || key_equiv(entry_key(\result), k); { int i = index_of_key(H, k); void hdict_insert(hdict_t D, entry e) /*@requires D != NULL && e != NULL; /*@ensures hdict_lookup(D, entry_key(e)) == e; @*/ ; @*/ // Client type typedef hdict* hdict_t; How What 3
Is this Library Generic? Generic libraries o A single implementation that Allows clients to choose the types of their data Allows multiple instances of the data structure with different data types in the same application It is semi-generic: o It allows clients to choose the types of their data o It doesn t allow multiple instances of the data structure with different data types in the same application Towards fully-generic: o Use void* in the client interface o Upgrade the client definitions and client application 4
Client Interface typedef void* entry; typedef void* key; Upgrading the Client Definitions key entry_key(entry e) /*@requires e != NULL; @*/ ; int key_hash(key k); bool key_equiv(key k1, key k2); // What the client wants to store struct inventory_item { string fruit; // key int quantity; }; // What the client wants struct inventory_item { string fruit; // key int quantity; }; /******* Fulfilling the library interface *******/ key entry_key(entry e) //@requires e != NULL && \hastag(struct inventory_item*, e); //@ensures \result != NULL && \hastag(string*, \result); { struct inventory_item* E = (struct inventory_item*)e; string* K = to_string_ptr(E->fruit); return (key)K; } /******* Fulfilling the library interface *******/ typedef struct inventory_item* entry; typedef string key; key entry_key(entry e) //@requires e != NULL; { return e->fruit; } bool key_equiv(key k1, key k2) //@requires k1 != NULL && \hastag(string*, k1); //@requires k2 != NULL && \hastag(string*, k2); { return string_equal(*(string*)k1, *(string*)k2); } bool key_equiv(key k1, key k2) { return string_equal(k1, k2); } int key_hash(key k) //@requires k != NULL && \hastag(string*, k); { return lcg_hash_string(*(string*)k); } int key_hash(key k) { return lcg_hash_string(k); } 5
Upgrading the Client Application Cast entries and keys before calling the library operations Turn values of type string to string* prior to using them as keys int main () { struct inventory_item* A = make_inventory_item("apple", 20); struct inventory_item* B = make_inventory_item("banana", 10); struct inventory_item* C = make_inventory_item("pumpkin", 50); struct inventory_item* D = make_inventory_item("banana", 20); int main () { struct inventory_item* A = make_inventory_item("apple", 20); struct inventory_item* B = make_inventory_item("banana", 10); struct inventory_item* C = make_inventory_item("pumpkin", 50); struct inventory_item* D = make_inventory_item("banana", 20); hdict_t H = hdict_new(10); hdict_insert(H, A); hdict_insert(H, B); hdict_insert(H, C); assert(hdict_lookup(H, "apple") != NULL); assert(hdict_lookup(H, "lime") == NULL); hdict_insert(H, D); hdict_t H = hdict_new(10); hdict_insert(H, (entry)A); hdict_insert(H, (entry)B); hdict_insert(H, (entry)C); assert(hdict_lookup(H, (key)to_string_ptr("apple")) != NULL); assert(hdict_lookup(H, (key)to_string_ptr("lime")) == NULL); hdict_insert(H, (entry)D); return 0; } return 0; } 6
Generic Hash Dictionaries Let s try it on our examples Linux Terminal # cc0 -dx hdict.c1 produce.c1 produce-main.c1 All produce tests passed! 0 # cc0 -dx hdict.c1 lib/*.c0 words.c1 words-main.c1 All word count tests passed! 0 o We can now put the library before all client files o This means we could merge produce.c1 and produce-main.c1 into a single file Same thing for word.c1 and words-main.c1 7
Generic Hash Dictionaries Let s try combining multiple instances in one application Linux Terminal # cc0 -dx hdict.c1 produce.c1 lib/*.c0 words.c1 combined-main.c1 words.c1:38.1-45.2:error:function 'entry_key' defined more than once previous definition at produce.c1:31.1-38.2 key entry_key(entry x) ... } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Compilation failed This still doesn t work! o Both produce.c1 and words.c1 define entry_key And key_equiv and key_hash o This is not allowed in C0/C1 o Even if it were, the library wouldn t know which version to use with what data 8
How to Avoid Duplicate Definitions? This is all we need to do in the client definition file We avoid duplicate client definition function names by renaming them E.g., Rename key_hash as key_hash_produce Similarly for the word count application // What the client wants to store struct inventory_item { string fruit; // key int quantity; }; /******* Fulfilling the library interface *******/ key entry_key_produce(entry e) //@requires e != NULL && \hastag(struct inventory_item*, e); //@ensures \result != NULL && \hastag(string*, \result); { struct inventory_item* E = (struct inventory_item*)e; string* K = to_string_ptr(E->fruit); return (key)K; } But how to tell the library which function to use? bool key_equiv_produce(key k1, key k2) //@requires k1 != NULL && \hastag(string*, k1); //@requires k2 != NULL && \hastag(string*, k2); { return string_equal(*(string*)k1, *(string*)k2); } int key_hash_produce(key k) //@requires k != NULL && \hastag(string*, k); { return lcg_hash_string(*(string*)k); } 9
The Hash Dictionary Library for (chain* p = H->table[i]; p != NULL; p = p->next) if (key_equiv(entry_key(p->data), k)) return p->data; return NULL; } // Implementation-side types typedef struct chain_node chain; struct chain_node { entry data; chain* next; }; struct hdict_header { int size; int capacity; chain*[] table; }; typedef struct hdict_header hdict; Implementation Client Interface typedef void* entry; typedef void* key; // data != NULL void hdict_insert(hdict* H, entry e) //@requires is_hdict(H) && e != NULL; //@ensures hdict_lookup(H, entry_key(e)) == e; //@ensures is_hdict(H); { key k = entry_key(e); int i = index_of_key(H, k); for (chain* p = H->table[i]; p != NULL; p = p->next) { if (key_equiv(entry_key(p->data), k)) { p->data = e; return; } } chain* p = alloc(chain); p->data = e; p->next = H->table[i]; H->table[i] = p; (H->size)++; } key entry_key(entry e) /*@requires e != NULL; @*/ ; // size >= 0 // capacity > 0 // \length(table) == capacity int key_hash(key k); bool key_equiv(key k1, key k2); // Representation invariant bool is_hdict (hdict* H) { return H != NULL && H->size >= 0 && H->capacity > 0 && is_array_expected_length(H->table, H->capacity) && is_valid_hashtable(H); } Library Interface // typedef ______* hdict_t; hdict_t hdict_new(int capacity) /*@requires capacity > 0; /*@ensures \result != NULL; @*/ @*/ ; // Implementation of interface functions int index_of_key(hdict* H, key k) //@requires is_hdict(H); //@ensures 0 <= \result && \result < H->capacity; { return abs(key_hash(k) % H->capacity); } entry hdict_lookup(hdict_t D, key k) /*@requires D != NULL; /*@ensures \result != NULL || key_equiv(entry_key(\result), k); hdict* hdict_new(int capacity) //@requires capacity > 0; //@ensures is_hdict(\result); { hdict* H = alloc(hdict); H->size = 0; H->capacity = capacity; H->table = alloc_array(chain*, capacity); return H; } @*/ @*/ ; entry hdict_lookup(hdict* H, key k) //@requires is_hdict(H); //@ensures \result == NULL || key_equiv(entry_key(\result), k); { int i = index_of_key(H, k); void hdict_insert(hdict_t D, entry e) /*@requires D != NULL && e != NULL; /*@ensures hdict_lookup(D, entry_key(e)) == e; @*/ ; @*/ // Client type typedef hdict* hdict_t; return abs(key_hash(k) % H->capacity); 10
How to Avoid Duplicate Definitions? This is all we need to do in the client definition file We avoid duplicate client definition function names by renaming them E.g., Rename key_hash as key_hash_produce Similarly for the word count application // What the client wants to store struct inventory_item { string fruit; // key int quantity; }; /******* Fulfilling the library interface *******/ key entry_key_produce(entry e) //@requires e != NULL && \hastag(struct inventory_item*, e); //@ensures \result != NULL && \hastag(string*, \result); { struct inventory_item* E = (struct inventory_item*)e; string* K = to_string_ptr(E->fruit); return (key)K; } But how to tell the library which function to use? o By using function pointers! bool key_equiv_produce(key k1, key k2) //@requires k1 != NULL && \hastag(string*, k1); //@requires k2 != NULL && \hastag(string*, k2); { return string_equal(*(string*)k1, *(string*)k2); } int key_hash_produce(key k) //@requires k != NULL && \hastag(string*, k); { return lcg_hash_string(*(string*)k); } 11
0xFFFF OS Accessing the Right Functions main A H 0xBB8 0xD04 hdict_lookup H k i 0xD04 STACK During execution, functions live in the TEXT segment of memory o & allows us to store their addresses and pass them around as function pointers o We can call a function through a pointer to it 0x0AC 1 key_hash k 0x0AC 0xBB8 0x080 20 0xD04 5 3 0x090 10 Idea: Make pointers to the appropriate client functions available to the library o But how to do so? 0 HEAP 1 2 3 4 0x088 50 DATA "apple" "lime" 0x0AC main hdict_lookup key_hash_produce key_hash_wcount TEXT OS 0x0 12
Client Function Types We need to define types for the client functions entry_key_fn key_hash_fn key_equiv_fn These definitions go in the client interface Client Interface Client Interface typedef void* entry; typedef void* key; typedef void* entry; typedef void* key; typedef key entry_key_fn(entry e) /*@requires e != NULL; @*/ ; typedef int key_hash_fn(key k); typedef bool key_equiv_fn(key k1, key k2); key entry_key(entry e) /*@requires e != NULL; @*/ ; int key_hash(key k); bool key_equiv(key k1, key k2); 13
The Hash Dictionary Library for (chain* p = H->table[i]; p != NULL; p = p->next) if (key_equiv(entry_key(p->data), k)) return p->data; return NULL; } // Implementation-side types typedef struct chain_node chain; struct chain_node { entry data; chain* next; }; struct hdict_header { int size; int capacity; chain*[] table; }; typedef struct hdict_header hdict; Client Interface ? typedef void* entry; typedef void* key; // data != NULL void hdict_insert(hdict* H, entry e) //@requires is_hdict(H) && e != NULL; //@ensures hdict_lookup(H, entry_key(e)) == e; //@ensures is_hdict(H); { key k = entry_key(e); int i = index_of_key(H, k); for (chain* p = H->table[i]; p != NULL; p = p->next) { if (key_equiv(entry_key(p->data), k)) { p->data = e; return; } } chain* p = alloc(chain); p->data = e; p->next = H->table[i]; H->table[i] = p; (H->size)++; } typedef key entry_key_fn(entry e) /*@requires e != NULL; @*/ ; typedef int key_hash_fn(key k); typedef bool key_equiv_fn(key k1, key k2); // size >= 0 // capacity > 0 // \length(table) == capacity // Representation invariant bool is_hdict (hdict* H) { return H != NULL && H->size >= 0 && H->capacity > 0 && is_array_expected_length(H->table, H->capacity) && is_valid_hashtable(H); } Library Interface // typedef ______* hdict_t; hdict_t hdict_new(int capacity) /*@requires capacity > 0; /*@ensures \result != NULL; @*/ @*/ ; // Implementation of interface functions int index_of_key(hdict* H, key k) //@requires is_hdict(H); //@ensures 0 <= \result && \result < H->capacity; { return abs(key_hash(k) % H->capacity); } entry hdict_lookup(hdict_t D, key k) /*@requires D != NULL; /*@ensures \result != NULL || key_equiv(entry_key(\result), k); hdict* hdict_new(int capacity) //@requires capacity > 0; //@ensures is_hdict(\result); { hdict* H = alloc(hdict); H->size = 0; H->capacity = capacity; H->table = alloc_array(chain*, capacity); return H; } @*/ @*/ ; entry hdict_lookup(hdict* H, key k) //@requires is_hdict(H); //@ensures \result == NULL || key_equiv(entry_key(\result), k); { int i = index_of_key(H, k); void hdict_insert(hdict_t D, entry e) /*@requires D != NULL && e != NULL; /*@ensures hdict_lookup(D, entry_key(e)) == e; @*/ ; @*/ // Client type typedef hdict* hdict_t; 14
Option I Pass the right client functions to the library functions that use them o So, entry A = make_inventory_item("apple", 20); hdict_insert(H, A); becomes entry A = make_inventory_item("apple", 20); hdict_insert(H, A, &entry_key_produce, &key_hash_produce, &key_equiv_produce); Then do this for every use of hdict_insert and hdict_lookup o We will potentially make mistakes 15
Option II This is all we need to do in the client application file A better option is to pass the right client functions when we create a dictionary o I.e., In hdict_new hdict_t H = hdict_new(10, &entry_key_produce &key_hash_produce, &key_equiv_produce); hdict_t H = hdict_new(10); hdict_new needs to store the client functions in H itself o Modify the internal representation of H o Upgrade the representation invariant of H o Upgrade hdict_new Step 1 16
Modifying the Internal Representation Extend struct hdict_header with three additional fields struct hdict_header { int size; int capacity; chain*[] table; entry_key_fn* key; key_hash_fn* hash; // != NULL key_equiv_fn* equiv; // != NULL }; typedef struct hdict_header hdict; struct hdict_header { int size; int capacity; chain*[] table; }; typedef struct hdict_header hdict; // size >= 0 // capacity > 0 // \length(table) == capacity // != NULL // size >= 0 // capacity > 0 // \length(table) == capacity Storing both data and functions in a struct is a fundamental concept in object-oriented programming o These structs are called objects o The functions are called methods There is a lot more to object-oriented programming however 17
Option II This is all we need to do in the client application file A better option is to pass the right client functions when we create a dictionary o I.e., In hdict_new hdict_t H = hdict_new(10, &entry_key_produce &key_hash_produce, &key_equiv_produce); hdict_t H = hdict_new(10); hdict_new needs to store the client functions in H itself o Modify the internal representation of H o Upgrade the representation invariant of H o Upgrade hdict_new Step 2 18
Upgrading the Representation Invariant A valid hdict cannot have NULL in the added fields bool is_hdict (hdict* H) { return H != NULL && H->size >= 0 && H->capacity > 0 && is_array_expected_length(H->table, H->capacity) && H->key != NULL && H->hash != NULL && H->equiv != NULL && is_valid_hashtable(H); } bool is_hdict (hdict* H) { return H != NULL && H->size >= 0 && H->capacity > 0 && is_array_expected_length(H->table, H->capacity) && is_valid_hashtable(H); } 19
Option II This is all we need to do in the client application file A better option is to pass the right client functions when we create a dictionary o I.e., In hdict_new hdict_t H = hdict_new(10, &entry_key_produce &key_hash_produce, &key_equiv_produce); hdict_t H = hdict_new(10); hdict_new needs to store the client functions in H itself o Modify the internal representation of H o Upgrade the representation invariant of H o Upgrade hdict_new Step 3 20
Upgrading hdict_new hdict_new o Takes the client functions as inputs o Expects them to be non-NULL o Stores them in the added fields of the concrete type hdict* hdict_new(int capacity) //@requires capacity > 0; //@ensures is_hdict(\result); { hdict* H = alloc(hdict); H->size = 0; H->capacity = capacity; H->table = alloc_array(chain*, capacity); return H; } hdict* hdict_new(int capacity, //@requires capacity > 0; //@requires entry_key != NULL && hash != NULL && equiv != NULL; //@ensures is_hdict(\result); { hdict* H = alloc(hdict); H->size = 0; H->capacity = capacity; H->table = alloc_array(chain*, capacity); H->key = entry_key; H->hash = hash; H->equiv = equiv; return H; } entry_key_fn* entry_key, key_hash_fn* hash, key_equiv_fn* equiv) 21
Calling the Client Functions E.g., key_hash Whenever we need a client function, we call the function pointer in the data structure Here, H->hash For example, int index_of_key(hdict* H, key k) //@requires is_hdict(H); //@ensures 0 <= \result && \result < H->capacity; { return abs(key_hash(k) % H->capacity); } int index_of_key(hdict* H, key k) //@requires is_hdict(H); //@ensures 0 <= \result && \result < H->capacity; { return abs((*H->hash)(k) % H->capacity); } This is the same as (*(H->hash))( ) You do this for every call to a client function in the library implementation 22
Is it Generic? Linux Terminal # cc0 -dx hdict.c1 produce.c1 lib/*.c0 words.c1 combined-main.c1 All word count tests passed! All produce tests passed! 0 Yes! 23
Cost Complexity of various implementations of dictionaries o Assuming it contains n entries Unsorted array Array sorted by key Linked list Hash Table O(1) average O(n) O(log n) O(n) lookup O(1) amortized O(1) O(n) O(1) insert average and amortized Hash dictionaries are clearly the best implementation o O(1) lookup and insertion are hard to beat! o Or are they? 25
Cost For lookup, it s O(1) average o We could be (very) unlucky and incur an O(n) cost E.g., If we use a poor hash function For insert, it s O(1) amortized o From time to time, we need to resize the table Then insert costs O(n) Operations like finding the entry with the smallest key would cost O(n) o We have to check every entry Using hash dictionaries is too risky or not good enough for applications that require a guaranteed (short) response time But they are great for applications that don t have such a constraint 26
Goal Develop a data structure that has guaranteed O(log n) worst-case complexity for lookup, insert and find_min o Always! o O(1) would be great, but we can t get that Returns the entry with the smallest key Unsorted array Array sorted by key Linked list Hash Table O(1) average O(n) O(log n) O(n) O(log n) lookup O(1) amortized O(1) O(n) O(1) O(log n) insert average and amortized O(n) O(1) O(n) O(n) O(log n) find_min 27
Getting Started The only O(log n) so far is lookup in sorted arrays Unsorted array Array sorted by key Linked list Hash Table O(1) average O(n) O(log n) O(n) O(log n) lookup O(1) amortized O(1) O(n) O(1) O(log n) insert average and amortized O(n) O(1) O(n) O(n) O(log n) find_min That s binary search o Let s start there 28
Searching for a Number Consider the following sorted array 0 1 2 3 4 5 6 7 8 9 -2 0 4 7 12 19 22 42 65 When searching for a number x using binary search, we always start by looking at the midpoint, index 4 We always look at this element 12 Then, 3 things can happen o x = 12 (and we are done) o x < 12 o x > 12 30
Searching for a Number If x < 12, the next index we look at is necessarily 2 If x > 12, the next index we look at is necessarily 7 12 if x < 12 if x > 12 4 42 Next, we may look at these elements 0 1 2 3 4 5 6 7 8 9 -2 0 4 7 12 19 22 42 65 31
Searching for a Number Assume x < 12, so we look at index 2 o If x = 4, we are done o If x < 4, we necessarily look at index 1 o If x > 4, we necessarily look at index 3 12 if x < 12 if x > 12 4 42 if x < 4 if x > 4 Then, we may look at these elements 0 7 0 1 2 3 4 5 6 7 8 9 -2 0 4 7 12 19 22 42 65 32
Searching for a Number Assume x < 4, so we look at index 1 o If x = 0, we are done o If x < 0, we necessarily look at index 0 12 if x < 12 if x > 12 4 42 if x < 4 if x > 4 0 7 if x < 0 Then, we may look at this element -2 0 1 2 3 4 5 6 7 8 9 -2 0 4 7 12 19 22 42 65 33
Searching for a Number We can map out all possible sequences of elements binary search may examine, for any x We are essentially hoisting the array by its midpoint, its two sides by their midpoint, etc., This is called a decision tree: At every step, it tells us how to decide what to do next 12 if x < 12 if x > 12 4 42 if x < 4 if x > 4 if x < 42 if x > 42 0 7 22 65 if x < 0 if x < 22 -2 19 0 1 2 3 4 5 6 7 8 9 -2 0 4 7 12 19 22 42 65 34
Searching for a Number An array provides direct access to all elements o This is an overkill for binary search o At any point, it needs direct access to at most two elements 12 if x < 12 if x > 12 4 42 if x < 4 if x > 4 if x < 42 if x > 42 0 7 22 65 if x < 0 if x < 22 -2 19 0 1 2 3 4 5 6 7 8 9 -2 0 4 7 12 19 22 42 65 35
Searching for a Number We can achieve the same access pattern by pairing up each element with twopointers o One to each of the two elements that may be examined next 12 4 42 0 7 22 65 -2 19 Arrays gave us more power than needed! We are losing direct access to arbitrary elements o But retaining access to the elements that matter to binary search 36
Towards an Implementation We can capture this idea with this type declaration: typedef struct tree_node tree; struct tree_node { tree* left; int data; tree* right; }; Note that it is note that it is recursive recursive A struct tree_node left data right Or simply a node 12 o A data element o Pointers to the 2 elements we may look at next 4 42 0 7 22 65 This struct is called a node -2 19 This arrangement of data in memory is called a tree 37
typedef struct tree_node tree; struct tree_node { tree* left; int data; tree* right; }; Constructing this Tree left data right T 12 Let s build the first few nodes of this example 4 42 0 7 22 65 tree* T = alloc(tree); T->data = 12; T->left = alloc(tree); T->left->data = 4; T->right= alloc(tree); T->right->data = 42; T->left->left = alloc(tree); -2 19 38
typedef struct tree_node tree; struct tree_node { tree* left; int data; tree* right; }; The End of the Line left data right What should the grey left/right fields point to? 12 4 42 0 7 22 65 -2 19 o NULL Each sequence of left/right pointers works like a NULL-terminated list o A dummy node Not very useful here We used dummy nodes to get direct access to the end of a list 39
Searching Follow the pointer in the left field left data right Search for 7 o 7 < 12: Go left o 7 > 4: Go right o 7 = 7: Found 12 4 42 0 7 22 65 -2 19 We are doing the same steps as binary search Starting from an n-element array, the cost is O(log n) If the tree is obtained as in this example 40
Searching left data right Search for 5 o 5 < 12: Go left o 5 > 4: Go right o 5 < 7: Go left Nowhere to go o Not there 12 4 42 0 7 22 65 -2 19 We are doing the same steps as binary search Starting from an n-element array, the cost is O(log n) If the tree is obtained as in this example 41
Recall our Goal Develop a data structure that has guaranteed O(log n) worst-case complexity for lookup, insert and find_min o Always! o lookup has cost O(log n) Target data structure In this setup O(log n) lookup O(log n) insert O(log n) find_min o What about insert and find_min? 42
Insertion left data right Insert 5 o 5 < 12: Go left o 5 > 4: Go right o 5 < 7: Go left Put it there 12 4 42 0 7 22 65 5 -2 19 We are pursuing the same steps as search and then putting it where it should be o So that we find it when searching for it next time For an n-element array, this costs O(log n) If the tree is obtained as in this example We couldn t get this with sorted arrays 43
Finding the Smallest Key left data right Keep going left o Left from 12 to 4 o Left from 4 to 0 o Left from 0 to -2 Nothing to its left o The smallest key is -2 12 4 42 0 7 22 65 -2 19 Starting from an n-element array, we can go left at most O(log n) times o The cost is O(log n) If the tree is obtained as in this example 44
Recall our Goal Develop a data structure that has guaranteed O(log n) worst-case complexity for lookup, insert and find_min o Always! o lookup, insert and find_min all have cost O(log n) Target data structure In this setup O(log n) lookup O(log n) insert O(log n) find_min 45
Trees 46
Terminology This arrangement of data is called a (binary) tree o Each item in it is called a node o The part of a tree hanging from a node is called a branch Or subtree a node a node A node 12 4 42 A tree 0 7 22 65 -2 19 A branch (or subtree) 47
Terminology The node at the top is called the root of the tree o The nodes at the bottom are the leaves of the tree o The other nodes are called inner nodes Trees grow upside down in Computer Science! The root 12 an inner node An inner node 4 42 A tree 0 7 22 65 -2 19 a leaf A leaf 48
Terminology Given any node o The node to its left is its left child o The node to its right is its right child o The node above it is its parent .. and Computer Science mixes botanical trees and family trees! 12 A node 4 42 Its left child Its right child 0 7 22 65 Their parent -2 19 49