Reverse k Nearest Neighbors Query Processing: Experiments and Analysis

Slide Note

This study delves into the intricate process of reverse k-nearest neighbors query processing, featuring experiments and analysis by Muhammad Aamir Cheema, Xuemin Lin, Wei Wang, and Shiyu Yang. The research focuses on frameworks, algorithms like TPL++, and explores the motivations, contributions, and optimizations in the field. Dive deep into the realm of locating the k-closest facilities to a query facility, and uncover the applications in computation, marketing, clustering, outlier analysis, and decision support systems.

mali_7 Follow

Uploaded on Feb 15, 2025 | 0 Views

Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

Download Presentation

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript

Reverse k Nearest Neighbors Query Processing: Experiments and Analysis , Muhammad Aamir Cheema , Xuemin Lin , Wei Wang Shiyu Yang The University of New South Wales, Australia Monash University, Australia

Outline Introduction Framework Motivation Algorithms TPL++ Experiments Conclusion 2

Introduction f1 Reverse k Nearest Neighbors Find every user u for which the query facility q is one of its k-closest facilities RNN of f1 is {u1,u2} RNN of f2 is {u3} Applications Influence computation, marketing, cluster and outlier analysis and decision support systems u2 u1 f2 u3 K=1 3

Framework Pruning Verification Six-regions (SIGMOD 2000) SLICE (ICDE 2014) Six-regions (Stanoi et al., UCSB, SIGMOD 2000) Region-based TPL (Tao et al., CUHK, VLDB 2004) FINCH (Wu et al., NUS, VLDB 2008) InfZone (Cheema et al., UNSW, ICDE2011) SLICE (Yang et al.,UNSW, ICDE 2014) Half-space TPL (VLDB 2004), TPL++ (VLDB2015) FINCH (VLDB 2008), InfZone (ICDE 2011) 4

Motivation and Contributions 10ms per I/O 400 Motivation 300 200 Reporting overall cost may be misleading 100 0 Buffer size affects the ranking Six TPL FINCH INF SLICE All in one comparison 0.1ms per I/O 27.5 Contributions 22 16.5 Comprehensive experiments 11 5.5 Improved TPL algorithm: TPL++ 0 Six TPL FINCH INF SLICE 5

Algorithms --TPL Half-space Pruning Property of perpendicular bisector b c TPL [Tao et al., CUHK, VLDB 2004] p Sort facilities according Hilbert value q Draw bisectors between facilities and query a d A point can be filtered if it is in a combination of the half-spaces of k facilities. u Pruning power <> Pruning cost k=2 6

Algorithms --TPL++ Optimisation 1: A point can be filtered if it is pruned by k facilities. Add a counter for point to be pruned. Iterate over facilities. O(mk) -> O(m) b c p q a e Optimisation 2: Include more facilities for pruning k=2 7

Effect of Optimisations I/O cost CPU cost (ms) 40 240 TPL TPL++ TPL TPL++ 30 180 20 120 10 60 0 0 2 5 10 15 20 25 2 5 10 15 20 25 k k 20 times better 2 times better 8

Algorithms Half-space Pruning Region-based Pruning SLICE FINCH InfZone Six-regions Yang et al., UNSW, ICDE 2014 Wu et al., NUS, VLDB 2008 Cheema et al., UNSW, ICDE 2011 Stanoi et al., UCSB, SIGMOD 2000 9

Experiments Setup Intel Xeon 2.66 GHz CPU, 4GB Memory and Hard disk Index: R*-tree 100 buffers I/O cost and CPU cost Average cost per query Data sets Three real data sets (up to 25M points) CA, LA and NA Synthetic data sets follows different distributions (up to 20M points) Source code and data sets are available online 10

Experiments Ranking Criteria 1st 2nd 3rd 4th 5th 6th I/O (no buffer) TPL++,InfZone SLICE TPL FINCH SIX I/O (small buffer) TPL++,InfZone FINCH SLICE TPL,SIX CPU (k<10) SLICE InfZone TPL++ FINCH SIX,TPL CPU (10<k<25) SLICE InfZone, TPL++ FINCH SIX TPL CPU (25<k<200) SLICE TPL++ SIX FINCH InfZone TPL Implementation SIX,SLICE TPL, TPL++ FINCH, InfZone 11

More in the paper Lower bound algorithm Complexity Six-regions TPL TPL++ FINCH InfZone SLICE Pruning node O(1) O(km) O(m) O(m) O(m) O(t) point O(1) O(km) O(m) O(logm) O(m) O(1) Add O(log k) O(logm) O(logm) O(m2) O(m2) O(tlogm) Verification node O(1) O(km) O(m) O(m) O(m) O(t) point O(1) O(km) O(m) O(logm) O(logm) O(1) Verify Cand O(logm) O(m) Range query Range query Range query Range query 12