
Improving Efficiency in Nearest Neighbor Queries
"Learn about SLICE, Reverse k-Nearest Neighbors, and Region-Based Pruning techniques for optimizing query performance in computer science and engineering. Explore related work on pruning strategies like Six-regions and Half-space pruning. Discover innovative methods for finding k-closest facilities and users efficiently."
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
SLICE: Reviving SLICE: Reviving Regions Reverse k Nearest Reverse k Nearest Neighbors Regions- -Based Pruning Based Pruning for Neighbors Queries for Queries Click to edit Present s Name Never Stand Still Never Stand Still Faculty of Engineering Faculty of Engineering Computer Science and Engineering Computer Science and Engineering Shiyu Yang1, Muhammad Aamir Cheema2,1, Xuemin Lin1,3, Ying Zhang4,1 1The University of New South Wales, Australia 2 Monash University, Australia 3 East China Normal University, China 4University of Technology, Sydney, Australia
Introduction k Nearest Neighbor Query Find the facility that is one of k-closest facilities to the query user. u2 Reverse k Nearest Neighbor Query Find every user for which the query facility is one of the k-closest facilities. u1 f3 u3 f1 RkNNs are the potential customers of a facility f2 K=1 School of Computer Science and Engineering 2
Related Work Pruning Verification Six-regions (SIGMOD 2000) Six-regions (SIGMOD 2000) Region-based TPL (VLDB 2004) FINCH (VLDB 2008) Boost (SIGMOD 2010) InfZone (ICDE2011) TPL (VLDB 2004), FINCH (VLDB 2008), InfZone (ICDE 2011) Half-space School of Computer Science and Engineering 3
Related Work k=2 -Six-regions(SIGMOD 2000) Regions-based Pruning: u2 b a c u1 1. Divide the whole space centred at the query q into six equal regions q d 2. Find the k-th nearest neighbor in each Partition. 3. The k-th nearest facility of q in each region defines the area that can be pruned The user points that cannot be pruned should be verified by range query School of Computer Science and Engineering 4
Related Work k=2 Half-space Pruning: the space that is contained by k half- spaces can be pruned -TPL(VLDB 2004) b a c q 1. Find the nearest facility f in the unpruned area. d 2. Draw a bisector between q and f, prune by using the half-space 3. Iteratively access the nearest facility in unpruned area. School of Computer Science and Engineering 5
Related Work k=2 Half-space Pruning: -InfZone(ICDE 2011) b a c 1. The influence zone corresponds to the unpruned area when the bisectors of all the facilities have been considered for pruning. q d 2. A point p is a RkNN of q if and only if p lies inside unpruned area. 3. No verification phase. Half-space pruning is expensive especially when k is large. School of Computer Science and Engineering 6
Related Work Regions-based SLICE Half-space O(km2) O(m log m) O(m log k) Pruning Cost m is the # of facilities considered for pruning High High Pruning Power Low Verification Cost O(log m) O(k) Range query Can regions-based pruning do better? School of Computer Science and Engineering 7
Notations Partition: P P Upper p f Subtended angle: a min Lower ????(?,?) ????( ???) max Maximum (minimal) subtended angle w.r.t P ( ???, ???) a ????(?,?) ????( ???) q Upper (lower) arc Center: q Radius: ????(?,?) 2cos(?) ? = ??? or ??? School of Computer Science and Engineering 8
Observation -- Pruning p P for which dist(p,q) > ????(?,?) 2cos( ???) (UpperArc) ???< 90 We can prove a < b. a2=b2+c2-2bc cos(?) b>????(?,?) cos( ???) = 2cos( ???) c2-2bc cos(?) < c2-2 A facility f prunes every point P a p Upper f ????(?,?) ????( ???) b c ? ? 2cos( ???)c cos(?) = q cos(?) ???( ???)) <0 c2(1- Facility prunes area outside the upper arc of f for every partition P for which ??? < 90 School of Computer Science and Engineering 9
Comparison with Six-regions Six-region SLICE f ????(?,?) 2cos(?) Area pruned dist(f,q) q Partitions Pruned One ?max< 90o No. of Partitions any 6 School of Computer Science and Engineering 10
Pruning Algorithm Divide space into t partitions u1 f1 Compute the upper arc of each partition for facilities. f2 u2 The area outside the k-th smallest upper arc (rB) in each partition can be pruned. q Users in the pruned area can be pruned Users in the unpruned area will be verified by accessing significant facilities k=2 School of Computer Science and Engineering 11
Significant Facility Verification P Significant facility: A facility f that prunes at least one point p P lying inside the bounding arc of P. 2?? ?? ?? Verification for a candidate SLICE Regions-based ?? Issuing range query for each candidate Accessing significant facilities (O(k)) q Significant facility cannot be in red area High I/O cost No additional I/O cost School of Computer Science and Engineering 12
Theoretical Analyses Number of significant facilities I/O Cost Pruning phase: Same as circular range query centered at q with radius 2rB 2.34k ( 0) 9k ( = 60o) Verification phase: Same as circular range query centered at q with radius rB More analyses can be found in paper School of Computer Science and Engineering 13
Experiments Data Set : Synthetic data : Size:50000, 100000, 150000 or 200000 Distribution: Uniform or Normal Algorithms: Six-regions, InfZone and SLICE Page size 4KB and number of buffers for Six-regions is 10 Real data: The real data set consists of 175, 812 points in North America Number of partitions for SLICE is 12 School of Computer Science and Engineering 14
Experiments Effect of different values of k CPU I/O School of Computer Science and Engineering 15
Experiments Effect of data distribution Effect of % users School of Computer Science and Engineering 16
Experiments Effect of partitions Number of significant facilities Number of partitions Value of k School of Computer Science and Engineering 17
Thanks! Q&A