Understanding Multidimensional Access Structures in Advanced Databases
Explore the concept of multidimensional access structures in advanced databases with Dr. Nicholas Gibbins. Learn about conventional indexes, hash-like grid files, hierarchical indexing, and various tree structures like kd-trees and r-trees. Discover applications in geographic information systems and practical scenarios for database queries. Delve into different approaches for finding specific data records efficiently.
Download Presentation
Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
Multidimensional Access Structures COMP3211 Advanced Databases Dr Nicholas Gibbins nmg@ecs.soton.ac.uk
Overview Conventional indexes Hash-like grid files, partitioned hashing Hierarchical indexes multiple key, kd-trees, quad trees, r-trees, ub-trees Bitmap indexes 3
Multidimensional Access Structures Indexes discussed so far are one-dimensional assume a single search key require a single linear order for keys (B-trees) require that the key be completely known for any lookup (hash tables) 4
Applications Geographic information systems partial match queries range queries nearest-neighbour queries 5
Scenario Personnel database EMPLOYEE table with attributes dept salary How can we find employees who work in the sales department and have salaries greater than 40,000? 7
Approach #1 1. Get all matching records using an index on one attribute 2. Check values of other attribute on those records dept=sales scan for salary>40000 Idept ... 8
Approach #2 1. Use secondary indexes on each attribute to get two sets of record pointers 2. Take intersection of sets salary>40000 dept=sales Idept Isalary compare for intersection ... ... 9
Approach #3 1. Use secondary index on one attribute to select suitable index on other attribute 2. Get all matching records using selected index sales Isalary dept=sales research Idept Isalary ... Isalary production 10
For which queries is this index good? dept=sales salary=40000 dept=sales salary>40000 dept=sales salary = 40000 11
Grid File 100k Partition multi-dimensional space with a grid Grid lines partition space into stripes Intersections of stripes from different dimensions define regions salary 40k 20k 0k 0 40 55 100 age 13
Grid File 100k Partition multi-dimensional space with a grid age < 40 salary < 100k salary >= 40k Grid lines partition space into stripes Intersections of stripes from different dimensions define regions salary 40k 20k 0k 0 40 55 100 age 14
Grid File 100k Each region associated with a pointer to a bucket of record pointers Attribute values for record determine region and therefore bucket salary Fixed number of regions overflow blocks used to increase bucket size as necessary 40k Can index grid on value ranges 20k 0k 0 40 55 100 age 15
Grid files Pro Good for multiple-key search Supports partial-match, range and nearest-neighbour queries Con - Space, management overhead (nothing is free) - Need partitioning ranges that evenly split keys 16
Partitioned Hash Hash function takes a list of attribute values as arguments 1 0 0 1 1 0 0 0 1 1 Bits of hash value divided between attributes Effectively, a hash function per attribute hash1 hash2 attribute 1 attribute 2 18
Example 000 hash1(sales) = 0 hash1(research) = 1 001 010 hash2(10000) = 00 011 hash2(20000) = 01 hash2(40000) = 10 100 hash2(100000) = 11 101 110 111 19
Insertion 000 hash1(sales) = 0 hash1(research) = 1 001 010 Fred hash2(10000) = 00 011 hash2(20000) = 01 hash2(40000) = 10 100 hash2(100000) = 11 101 Fred works in sales Fred s salary is 40,000 110 111 20
Retrieval 000 hash1(sales) = 0 hash1(research) = 1 001 010 hash2(10000) = 00 011 hash2(20000) = 01 hash2(40000) = 10 100 hash2(100000) = 11 101 dept=sales salary=40000 110 111 21
Retrieval 000 hash1(sales) = 0 hash1(research) = 1 001 010 hash2(10000) = 00 011 hash2(20000) = 01 hash2(40000) = 10 100 hash2(100000) = 11 101 salary=20000 110 111 22
Retrieval 000 hash1(sales) = 0 hash1(research) = 1 001 010 hash2(10000) = 00 011 hash2(20000) = 01 hash2(40000) = 10 100 hash2(100000) = 11 101 dept=sales 110 111 23
Partitioned hash Pro Good hash function will evenly distribute records between buckets Supports partial-match queries Con No good for nearest-neighbour or range queries 24
kd-Tree Multidimensional binary search tree Each node splits the k-dimensional space along a hyperplane Nodes contain an attribute-value pair a pair of pointers All nodes at the same level discriminate for the same attribute Levels rotate between attributes of all dimensions 26
Example, k=2 100k salary 55k 45k 0k 0 40 70 100 age 27
Example, k=2 100k age=40 salary 55k 45k 0k 0 40 70 100 age 28
Example, k=2 100k age=40 salary=45k salary 55k 45k 0k 0 40 70 100 age 29
Example, k=2 100k age=40 salary=45k salary=55k salary 55k 45k 0k 0 40 70 100 age 30
Example, k=2 100k age=40 salary=45k salary=55k salary 55k 45k age=70 0k 0 40 70 100 age 31
Example, k=2 100k age=40 salary=45k salary=55k salary 55k 20,20k 25,80k 35,45k 50,55k 80,60k 45k age=70 40,35k 55,45k 70,20k 0k 0 40 70 100 age 32
Partial-Match Queries If we know value of attribute, we can choose which branch to explore If we don t know value of attribute, must explore both branches 33
Adapting kd-Trees to Secondary Storage Average path length from root to leaf: log2n Disk accesses should be kept as few as possible Two approaches: 1. Multiway nodes (split values into n ranges) 2. Group nodes in blocks (node plus descendants to a given ply) 34
Quad-Trees Two main types: Region quad-tree Point quad-tree 36
Region Quad-tree 100k Each partition divides the space into four equal area sub-regions ne, nw, se, sw Split regions if they contain more records than will fit into a block salary 50k Operations similar to those for kd-trees 25k 0k 25 0 50 100 age 37
Region Quad-tree 100k salary 50k 25k 0k 25 0 50 100 age 38
Region Quad-tree 100k 50,50k nw sw ne se salary 50k 25k 0k 25 0 50 100 age 39
Region Quad-tree 100k 50,50k nw sw ne se 25,25k salary 50k nw sw ne se 25k 0k 25 0 50 100 age 40
Region Quad-tree 100k 50,50k nw sw ne se 25,80k 50,55k 80,60k 55,45k 70,20k 25,25k salary 50k nw sw ne se 25k 20,20k 35,45k 40,35k 0k 25 0 50 100 age 41
Point Quad-Tree 100k Partitions are not equal area Split lines centred on data points ne/nw/se/sw sub-regions Otherwise, equivalent to region quad- tree salary 55k 45k 0k 0 50 100 35 age 42
Point Quad-Tree 100k salary 55k 45k 0k 0 35 50 100 age 43
Point Quad-Tree 100k 50,55k nw sw ne se salary 55k 45k 0k 0 35 50 100 age 44
Point Quad-Tree 100k 50,55k nw sw ne se 35,45k salary 55k 45k nw sw ne se 0k 0 35 50 100 age 45
Point Quad-Tree 100k 50,55k nw sw ne se 25,80k 50,55k 80,60k 55,45k 70,20k 35,45k salary 55k 45k nw sw ne se 20,20k 35,45k 40,35k 0k 0 35 50 100 age 46
R-Trees Used to represent data that consists of k-dimensional data regions r3 r1 d6 Internal nodes of tree represent regions that contain data regions d5 d1 r2 d4 Regions typically defined as top-right, bottom-left coordinates d2 d3 48
R-Trees root d6 d5 d1 d4 d2 d3 49
R-Trees root r3 r1 d6 d5 d1 r1 r2 r3 r2 d4 d2 d3 50