Database Systems: B+ Trees and Index Structures

Database Systems: B+ Trees and Index Structures
Slide Note
Embed
Share

The role of B+ trees and index structures in database systems, focusing on speeding up search processes, handling dynamic changes efficiently, and supporting fast, range, and dynamic searches. Learn about the structure and search mechanisms of B+ trees, including their ability to quickly identify blocks containing desired tuples and their support for both dense and sparse configurations. Discover the characteristics of B+ tree nodes and the significance of the order parameter in optimizing disk I/O operations.

  • Database Systems
  • B+ Trees
  • Index Structures
  • Search Optimization
  • Dynamic Changes

Uploaded on Apr 13, 2025 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. CSCE-608 Database Systems Spring 2025 Instructor: Jianer Chen Office: PETR 428 Phone: 845-4259 Email: chen@cse.tamu.edu Notes 25: B+ trees: structure and search

  2. in tables (relations) lock table DDL language DDL complier database administrator concurrency control file logging & recovery manager transaction manager index/file manager buffer manager DML (query) language query execution engine database programmer DML complier main memory buffers secondary storage (disks) DBMS

  3. The Main Purpose of Index Structures Speedup the search process quickly figure out blocks containing the desired tuples index a=6(R) disks 3

  4. The Main Purpose of Index Structures Speedup the search process quickly figure out blocks containing the desired tuples index a=6(R) otherwise have to scan the entire R disks 4

  5. The Main Purpose of Index Structures Speedup the search process quickly figure out blocks containing the desired tuples index a=6(R) otherwise have to scan the entire R disks But also need to handle dynamic changes of R 5

  6. B+Trees Support fast search Support range search Support dynamic changes Could be either dense or sparse dense: pointers to all records sparse: one pointer per block 6

  7. B+Trees A B+tree node of order n pn+1 kn k2 p1 p2 k1 p3 where ph are pointers (disk addresses) and kh are search-keys (values of the attributes in the index) 7

  8. B+Trees A B+tree node of order n pn+1 kn k2 p1 p2 k1 p3 where ph are pointers (disk addresses) and kh are search-keys (values of the attributes in the index) How big is n? Basically we want each B+tree node to fit in a disk block so that a B+tree node can be read/written by a single disk I/O. Typically, n ~ 100-200. 8

  9. B+Tree Example order n = 3 root 100 120 150 180 30 3 5 11 30 35 100 101 110 120 130 150 156 179 180 200 9

  10. A B+Tree of order n Each node has: n keys and n+1 pointers These are fixed To keep the nodes not too empty, also for the operations to be applied efficiently: * Non-leaf: at least (n+1)/2 pointers (to children) * Leaf: at least (n+1)/2 pointers to data (plus a sequence pointer to the next leaf) Basically: use at least one half of the pointers 10

  11. Sample non-leaf order n = 3 57 81 95 To keys 81 k<95 To keys k 95 To keys k < 57 To keys 57 k<81 11

  12. Sample leaf node order n = 3 From non-leaf node To next leaf in sequence 57 81 95 To record with key 57 To record with key 95 To record with key 81 12

  13. Example (B+ tree of order n=3) Min. node Full node 120 150 180 30 Non-leaf 30 35 3 5 11 Leaf 13

  14. B+tree rules Rule 1. All leaves are at same lowest level (balanced tree) Rule 2. Pointers in leaves point to records except for sequence pointer Rule 3. Number of keys/pointers in nodes: Max. # pointers n+1 n+1 n+1 Max. # keys n n n Min. # pointers (n+1)/2 (n+1)/2 + 1 2 Min. # keys (n+1)/2 1 (n+1)/2 1 Non-leaf Leaf Root 14

  15. B+tree rules Rule 1. All leaves are at same lowest level (balanced tree) Rule 2. Pointers in leaves point to records except for sequence pointer Rule 3. Number of keys/pointers in nodes: Max. # pointers n+1 n+1 n+1 Max. # keys n n n Min. # pointers (n+1)/2 (n+1)/2 + 1 2 Min. # keys (n+1)/2 1 (n+1)/2 1 Non-leaf Leaf Root could be 1 15

  16. Search in a B+tree 1. Start from the root 2. Search in a leaf block 3. May not have to go to the data file Search(ptr, k); \\ search a record of key value k in the subtree rooted at ptr \\ assume the B+tree is a dense index of order n Case 1. ptr is a leaf \\ pn+1 is the sequence pointer IF (k = ki) for a key ki in *ptr THEN return(pi); ELSE return(Null); Case 2. ptr is not a leaf find a key ki in *ptr such that ki-1 k < ki; return(Search(pi, k)); 16

  17. Search(ptr, k); \\ search a record of key value k in the subtree rooted at ptr \\ assume the B+tree is a dense index of order n Case 1. ptr is a leaf \\ pn+1 is the sequence pointer IF (k = ki) for a key ki in *ptr THEN return(pi); ELSE return(Null); Case 2. ptr is not a leaf find a key ki in *ptr such that ki-1 k < ki; return(Search(pi, k)); Search in B+tree k1 k2 kn p1 p2 p3 pn+1 pn A tree node 17

  18. Search(ptr, k); \\ search a record of key value k in the subtree rooted at ptr \\ assume the B+tree is a dense index of order n Case 1. ptr is a leaf \\ pn+1 is the sequence pointer IF (k = ki) for a key ki in *ptr THEN return(pi); ELSE return(Null); Case 2. ptr is not a leaf find a key ki in *ptr such that ki-1 k < ki; return(Search(pi, k)); Search in B+tree k1 k2 kn p1 p2 p3 pn+1 pn A tree node root 100 120 150 180 30 3 5 11 30 35 100 101 110 120 130 150 156 179 180 200 18

  19. Search(ptr, k); \\ search a record of key value k in the subtree rooted at ptr \\ assume the B+tree is a dense index of order n Case 1. ptr is a leaf \\ pn+1 is the sequence pointer IF (k = ki) for a key ki in *ptr THEN return(pi); ELSE return(Null); Case 2. ptr is not a leaf find a key ki in *ptr such that ki-1 k < ki; return(Search(pi, k)); Search in B+tree k1 k2 kn p1 p2 p3 pn+1 pn A tree node Search(*prt, 130) root 100 120 150 180 30 3 5 11 30 35 100 101 110 120 130 150 156 179 180 200 19

  20. Search(ptr, k); \\ search a record of key value k in the subtree rooted at ptr \\ assume the B+tree is a dense index of order n Case 1. ptr is a leaf \\ pn+1 is the sequence pointer IF (k = ki) for a key ki in *ptr THEN return(pi); ELSE return(Null); Case 2. ptr is not a leaf find a key ki in *ptr such that ki-1 k < ki; return(Search(pi, k)); Search in B+tree k1 k2 kn p1 p2 p3 pn+1 pn A tree node Search(*prt, 130) root 100 120 150 180 30 3 5 11 30 35 100 101 110 120 130 150 156 179 180 200 20

  21. Search(ptr, k); \\ search a record of key value k in the subtree rooted at ptr \\ assume the B+tree is a dense index of order n Case 1. ptr is a leaf \\ pn+1 is the sequence pointer IF (k = ki) for a key ki in *ptr THEN return(pi); ELSE return(Null); Case 2. ptr is not a leaf find a key ki in *ptr such that ki-1 k < ki; return(Search(pi, k)); Search in B+tree k1 k2 kn p1 p2 p3 pn+1 pn A tree node Search(*prt, 130) root 100 120 150 180 30 3 5 11 30 35 100 101 110 120 130 150 156 179 180 200 21

  22. Search(ptr, k); \\ search a record of key value k in the subtree rooted at ptr \\ assume the B+tree is a dense index of order n Case 1. ptr is a leaf \\ pn+1 is the sequence pointer IF (k = ki) for a key ki in *ptr THEN return(pi); ELSE return(Null); Case 2. ptr is not a leaf find a key ki in *ptr such that ki-1 k < ki; return(Search(pi, k)); Search in B+tree k1 k2 kn p1 p2 p3 pn+1 pn A tree node Search(*prt, 130) root 100 120 150 180 30 3 5 11 30 35 100 101 110 120 130 150 156 179 180 200 22

  23. Search(ptr, k); \\ search a record of key value k in the subtree rooted at ptr \\ assume the B+tree is a dense index of order n Case 1. ptr is a leaf \\ pn+1 is the sequence pointer IF (k = ki) for a key ki in *ptr THEN return(pi); ELSE return(Null); Case 2. ptr is not a leaf find a key ki in *ptr such that ki-1 k < ki; return(Search(pi, k)); Search in B+tree k1 k2 kn p1 p2 p3 pn+1 pn A tree node Search(*prt, 130) root 100 120 150 180 30 3 5 11 30 35 100 101 110 120 130 150 156 179 180 200 130 =130 return 23

  24. Range Search in B+tree To research all records whose key values are between k1 and k2: 24

  25. Range Search in B+tree To research all records whose key values are between k1 and k2: Range-Search(ptr, k1, k2) 1. Call Search(ptr, k1); 2. Follow the sequence pointers until the search key value is larger than k2. 25

  26. Search(ptr, k); \\ search a record of key value k in the subtree rooted at ptr \\ assume the B+tree is a dense index of order n Case 1. ptr is a leaf \\ pn+1 is the sequence pointer IF (k = ki) for a key ki in *ptr THEN return(pi); ELSE return(Null); Case 2. ptr is not a leaf find a key ki in *ptr such that ki-1 k < ki; return(Search(pi, k)); Range Search in B+tree k1 k2 kn p1 p2 p3 pn+1 pn Range-Search(ptr, k1, k2) 1. Call Search(ptr, k1); 2. Follow the sequence pointers until the search key value is larger than k2. A tree node root 100 120 150 180 30 3 5 11 30 35 100 101 110 120 130 150 156 179 180 200 26

  27. Search(ptr, k); \\ search a record of key value k in the subtree rooted at ptr \\ assume the B+tree is a dense index of order n Case 1. ptr is a leaf \\ pn+1 is the sequence pointer IF (k = ki) for a key ki in *ptr THEN return(pi); ELSE return(Null); Case 2. ptr is not a leaf find a key ki in *ptr such that ki-1 k < ki; return(Search(pi, k)); Range Search in B+tree k1 k2 kn p1 p2 p3 pn+1 pn Range-Search(ptr, k1, k2) 1. Call Search(ptr, k1); 2. Follow the sequence pointers until the search key value is larger than k2. Range-Search(*prt, 50, 125) A tree node root 100 120 150 180 30 3 5 11 30 35 100 101 110 120 130 150 156 179 180 200 27

  28. Search(ptr, k); \\ search a record of key value k in the subtree rooted at ptr \\ assume the B+tree is a dense index of order n Case 1. ptr is a leaf \\ pn+1 is the sequence pointer IF (k = ki) for a key ki in *ptr THEN return(pi); ELSE return(Null); Case 2. ptr is not a leaf find a key ki in *ptr such that ki-1 k < ki; return(Search(pi, k)); Range Search in B+tree k1 k2 kn p1 p2 p3 pn+1 pn Range-Search(ptr, k1, k2) 1. Call Search(ptr, k1); 2. Follow the sequence pointers until the search key value is larger than k2. Range-Search(*prt, 50, 125) Search(*prt, 50) A tree node root 100 120 150 180 30 3 5 11 30 35 100 101 110 120 130 150 156 179 180 200 28

  29. Search(ptr, k); \\ search a record of key value k in the subtree rooted at ptr \\ assume the B+tree is a dense index of order n Case 1. ptr is a leaf \\ pn+1 is the sequence pointer IF (k = ki) for a key ki in *ptr THEN return(pi); ELSE return(Null); Case 2. ptr is not a leaf find a key ki in *ptr such that ki-1 k < ki; return(Search(pi, k)); Range Search in B+tree k1 k2 kn p1 p2 p3 pn+1 pn Range-Search(ptr, k1, k2) 1. Call Search(ptr, k1); 2. Follow the sequence pointers until the search key value is larger than k2. Range-Search(*prt, 50, 125) Search(*prt, 50) A tree node root 100 120 150 180 30 30 50 3 5 11 30 35 100 101 110 120 130 150 156 179 180 200 29

  30. Search(ptr, k); \\ search a record of key value k in the subtree rooted at ptr \\ assume the B+tree is a dense index of order n Case 1. ptr is a leaf \\ pn+1 is the sequence pointer IF (k = ki) for a key ki in *ptr THEN return(pi); ELSE return(Null); Case 2. ptr is not a leaf find a key ki in *ptr such that ki-1 k < ki; return(Search(pi, k)); Range Search in B+tree k1 k2 kn p1 p2 p3 pn+1 pn Range-Search(ptr, k1, k2) 1. Call Search(ptr, k1); 2. Follow the sequence pointers until the search key value is larger than k2. Range-Search(*prt, 50, 125) Search(*prt, 50) A tree node root 100 120 150 180 30 30 50 3 5 11 30 35 100 101 110 120 130 150 156 179 180 200 35 < 50 Not Return 135 > 125 STOP 110 125 return 100 125 return 101 125 return 120 125 return 30

  31. Insert into B+tree 31

  32. Insert into B+tree Basic idea: 1. Find the leaf L where the record r should be inserted; 2. If L has further room, then insert r into L, and return; 3. If L is full, spilt L plus r into two leaves (each is about half full): this causes an additional child for the parent P of L, thus we need to add a child to P; 4. If P is already full, then we have to split P and add an additional child to P s parent (recursively) 32

  33. Insert into B+tree Simple case (space available for new child) Leaf overflow Non-leaf overflow New root 33

  34. Insert into B+tree Simple case (space available for new child) Leaf overflow Non-leaf overflow New root 34

  35. I. Simple case: Insert key 32 order n=3 100 30 40 3 5 11 30 31 35

  36. I. Simple case: Insert key 32 order n=3 Insert(prt, 32) 100 30 40 3 5 11 30 31 36

  37. I. Simple case: Insert key 32 order n=3 Insert(prt, 32) 100 30 40 3 5 11 30 31 37

  38. I. Simple case: Insert key 32 order n=3 Insert(prt, 32) 100 30 40 3 5 11 30 31 room for 32 38

  39. I. Simple case: Insert key 32 order n=3 Insert(prt, 32) 100 30 40 3 5 11 30 31 32 39

  40. Insert into B+tree Simple case (space available for new child) Leaf overflow Non-leaf overflow New root 40

  41. Insert into B+tree Simple case (space available for new child) Leaf overflow Non-leaf overflow New root 41

  42. Insert into B+tree Simple case (space available for new child) Leaf overflow Non-leaf overflow New root Idea: when there is no space for a new child (all pointers are in use), split the node into two nodes, with both at least half full. 42

  43. Complication: node overflow 43

  44. Complication: Leaf overflow 44

  45. Complication: Leaf overflow p k p k pn kn p* ? + 1 /2 +1 p1 k1 p kp ? + 1 /2 ? + 1 /2 +1 ? + 1 /2 pn+1kn+1 45

  46. Complication: Leaf overflow p k k p p1 k1 p k --- p** ? + 1 /2 pk pn kn pn+1 kn+1 --- p* ? + 1 /2 +1 ? + 1 /2 +1 ? + 1 /2 p k p k pn kn p* ? + 1 /2 +1 p1 k1 p kp ? + 1 /2 ? + 1 /2 +1 ? + 1 /2 pn+1kn+1 46

  47. Complication: Leaf overflow p k k p p1 k1 p k --- p** ? + 1 /2 pk pn kn pn+1 kn+1 --- p* ? + 1 /2 +1 ? + 1 /2 +1 ? + 1 /2 p k p k pn kn p* ? + 1 /2 +1 p1 k1 p kp ? + 1 /2 ? + 1 /2 +1 ? + 1 /2 pn+1kn+1 47

  48. Complication: Leaf overflow q p k k p p1 k1 p k --- p** ? + 1 /2 pk pn kn pn+1 kn+1 --- p* ? + 1 /2 +1 ? + 1 /2 +1 ? + 1 /2 p k p k pn kn p* ? + 1 /2 +1 p1 k1 p kp ? + 1 /2 ? + 1 /2 +1 ? + 1 /2 pn+1kn+1 48

  49. Complication: Leaf overflow q p k k p ? What is the key here? p1 k1 p k --- p** ? + 1 /2 pk pn kn pn+1 kn+1 --- p* ? + 1 /2 +1 ? + 1 /2 +1 ? + 1 /2 p k p k pn kn p* ? + 1 /2 +1 p1 k1 p kp ? + 1 /2 ? + 1 /2 +1 ? + 1 /2 pn+1kn+1 49

  50. Complication: Leaf overflow q p k k p ? What is the key here? p1 k1 p k --- p** ? + 1 /2 pk pn kn pn+1 kn+1 --- p* ? + 1 /2 +1 ? + 1 /2 +1 ? + 1 /2 p k p k pn kn p* ? + 1 /2 +1 p1 k1 p kp ? + 1 /2 ? + 1 /2 +1 ? + 1 /2 pn+1kn+1 50

More Related Content