Understanding Multidimensional Access Structures in Advanced Databases

slide1 n.w
1 / 78
Embed
Share

Explore the concept of multidimensional access structures in advanced databases with Dr. Nicholas Gibbins. Learn about conventional indexes, hash-like grid files, hierarchical indexing, and various tree structures like kd-trees and r-trees. Discover applications in geographic information systems and practical scenarios for database queries. Delve into different approaches for finding specific data records efficiently.

  • Databases
  • Multidimensional Structures
  • Advanced
  • Indexing
  • Geographic Information

Uploaded on | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. Multidimensional Access Structures COMP3211 Advanced Databases Dr Nicholas Gibbins nmg@ecs.soton.ac.uk

  2. Overview Conventional indexes Hash-like grid files, partitioned hashing Hierarchical indexes multiple key, kd-trees, quad trees, r-trees, ub-trees Bitmap indexes 3

  3. Multidimensional Access Structures Indexes discussed so far are one-dimensional assume a single search key require a single linear order for keys (B-trees) require that the key be completely known for any lookup (hash tables) 4

  4. Applications Geographic information systems partial match queries range queries nearest-neighbour queries 5

  5. Conventional Indexes

  6. Scenario Personnel database EMPLOYEE table with attributes dept salary How can we find employees who work in the sales department and have salaries greater than 40,000? 7

  7. Approach #1 1. Get all matching records using an index on one attribute 2. Check values of other attribute on those records dept=sales scan for salary>40000 Idept ... 8

  8. Approach #2 1. Use secondary indexes on each attribute to get two sets of record pointers 2. Take intersection of sets salary>40000 dept=sales Idept Isalary compare for intersection ... ... 9

  9. Approach #3 1. Use secondary index on one attribute to select suitable index on other attribute 2. Get all matching records using selected index sales Isalary dept=sales research Idept Isalary ... Isalary production 10

  10. For which queries is this index good? dept=sales salary=40000 dept=sales salary>40000 dept=sales salary = 40000 11

  11. Grid Files

  12. Grid File 100k Partition multi-dimensional space with a grid Grid lines partition space into stripes Intersections of stripes from different dimensions define regions salary 40k 20k 0k 0 40 55 100 age 13

  13. Grid File 100k Partition multi-dimensional space with a grid age < 40 salary < 100k salary >= 40k Grid lines partition space into stripes Intersections of stripes from different dimensions define regions salary 40k 20k 0k 0 40 55 100 age 14

  14. Grid File 100k Each region associated with a pointer to a bucket of record pointers Attribute values for record determine region and therefore bucket salary Fixed number of regions overflow blocks used to increase bucket size as necessary 40k Can index grid on value ranges 20k 0k 0 40 55 100 age 15

  15. Grid files Pro Good for multiple-key search Supports partial-match, range and nearest-neighbour queries Con - Space, management overhead (nothing is free) - Need partitioning ranges that evenly split keys 16

  16. Partitioned Hash

  17. Partitioned Hash Hash function takes a list of attribute values as arguments 1 0 0 1 1 0 0 0 1 1 Bits of hash value divided between attributes Effectively, a hash function per attribute hash1 hash2 attribute 1 attribute 2 18

  18. Example 000 hash1(sales) = 0 hash1(research) = 1 001 010 hash2(10000) = 00 011 hash2(20000) = 01 hash2(40000) = 10 100 hash2(100000) = 11 101 110 111 19

  19. Insertion 000 hash1(sales) = 0 hash1(research) = 1 001 010 Fred hash2(10000) = 00 011 hash2(20000) = 01 hash2(40000) = 10 100 hash2(100000) = 11 101 Fred works in sales Fred s salary is 40,000 110 111 20

  20. Retrieval 000 hash1(sales) = 0 hash1(research) = 1 001 010 hash2(10000) = 00 011 hash2(20000) = 01 hash2(40000) = 10 100 hash2(100000) = 11 101 dept=sales salary=40000 110 111 21

  21. Retrieval 000 hash1(sales) = 0 hash1(research) = 1 001 010 hash2(10000) = 00 011 hash2(20000) = 01 hash2(40000) = 10 100 hash2(100000) = 11 101 salary=20000 110 111 22

  22. Retrieval 000 hash1(sales) = 0 hash1(research) = 1 001 010 hash2(10000) = 00 011 hash2(20000) = 01 hash2(40000) = 10 100 hash2(100000) = 11 101 dept=sales 110 111 23

  23. Partitioned hash Pro Good hash function will evenly distribute records between buckets Supports partial-match queries Con No good for nearest-neighbour or range queries 24

  24. kd-Tree

  25. kd-Tree Multidimensional binary search tree Each node splits the k-dimensional space along a hyperplane Nodes contain an attribute-value pair a pair of pointers All nodes at the same level discriminate for the same attribute Levels rotate between attributes of all dimensions 26

  26. Example, k=2 100k salary 55k 45k 0k 0 40 70 100 age 27

  27. Example, k=2 100k age=40 salary 55k 45k 0k 0 40 70 100 age 28

  28. Example, k=2 100k age=40 salary=45k salary 55k 45k 0k 0 40 70 100 age 29

  29. Example, k=2 100k age=40 salary=45k salary=55k salary 55k 45k 0k 0 40 70 100 age 30

  30. Example, k=2 100k age=40 salary=45k salary=55k salary 55k 45k age=70 0k 0 40 70 100 age 31

  31. Example, k=2 100k age=40 salary=45k salary=55k salary 55k 20,20k 25,80k 35,45k 50,55k 80,60k 45k age=70 40,35k 55,45k 70,20k 0k 0 40 70 100 age 32

  32. Partial-Match Queries If we know value of attribute, we can choose which branch to explore If we don t know value of attribute, must explore both branches 33

  33. Adapting kd-Trees to Secondary Storage Average path length from root to leaf: log2n Disk accesses should be kept as few as possible Two approaches: 1. Multiway nodes (split values into n ranges) 2. Group nodes in blocks (node plus descendants to a given ply) 34

  34. Quad-Tree

  35. Quad-Trees Two main types: Region quad-tree Point quad-tree 36

  36. Region Quad-tree 100k Each partition divides the space into four equal area sub-regions ne, nw, se, sw Split regions if they contain more records than will fit into a block salary 50k Operations similar to those for kd-trees 25k 0k 25 0 50 100 age 37

  37. Region Quad-tree 100k salary 50k 25k 0k 25 0 50 100 age 38

  38. Region Quad-tree 100k 50,50k nw sw ne se salary 50k 25k 0k 25 0 50 100 age 39

  39. Region Quad-tree 100k 50,50k nw sw ne se 25,25k salary 50k nw sw ne se 25k 0k 25 0 50 100 age 40

  40. Region Quad-tree 100k 50,50k nw sw ne se 25,80k 50,55k 80,60k 55,45k 70,20k 25,25k salary 50k nw sw ne se 25k 20,20k 35,45k 40,35k 0k 25 0 50 100 age 41

  41. Point Quad-Tree 100k Partitions are not equal area Split lines centred on data points ne/nw/se/sw sub-regions Otherwise, equivalent to region quad- tree salary 55k 45k 0k 0 50 100 35 age 42

  42. Point Quad-Tree 100k salary 55k 45k 0k 0 35 50 100 age 43

  43. Point Quad-Tree 100k 50,55k nw sw ne se salary 55k 45k 0k 0 35 50 100 age 44

  44. Point Quad-Tree 100k 50,55k nw sw ne se 35,45k salary 55k 45k nw sw ne se 0k 0 35 50 100 age 45

  45. Point Quad-Tree 100k 50,55k nw sw ne se 25,80k 50,55k 80,60k 55,45k 70,20k 35,45k salary 55k 45k nw sw ne se 20,20k 35,45k 40,35k 0k 0 35 50 100 age 46

  46. R-Tree

  47. R-Trees Used to represent data that consists of k-dimensional data regions r3 r1 d6 Internal nodes of tree represent regions that contain data regions d5 d1 r2 d4 Regions typically defined as top-right, bottom-left coordinates d2 d3 48

  48. R-Trees root d6 d5 d1 d4 d2 d3 49

  49. R-Trees root r3 r1 d6 d5 d1 r1 r2 r3 r2 d4 d2 d3 50

More Related Content