Adaptive Processing for Distributed Skyline Queries
This study delves into adaptive processing for distributed skyline queries over uncertain data, focusing on the challenges of efficiency and progressiveness. The research addresses the growing importance of distributed environments in skyline query optimization, presenting a novel framework for enhancing the Distributed Skyline query over Uncertain Data (DSUD). By introducing an adaptive algorithm, significant improvements in efficiency and progressiveness are demonstrated through evaluation. The work contributes to the advancement of distributed skyline query techniques, offering insights for real-life applications in uncertain data scenarios.
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
Adaptive Processing for Distributed Skyline Queries over Uncertain Data Xu Zhou Kenli Li Yantao Zhou Keqin Li 1
Motivation Increased data uncertainty Increased distributed data storage systems More attention towards skyline queries over uncertain data in distributed environments(DSUD query) Also, DSUD(Distributed Skyline query over Uncertain Data) is a vital research topic with many potential real-life application 2
Problem Statement Many Research achievement on Uncertain Data. But Most of them focused on single and centralized storage database Lack adaptations or optimization specific to Distributed environment Distributed skyline queries are available but mostly in certain data DSUD query and enhanced-DSUD was first formulated by Ding and Jin[1] with minimized bandwidth consumption and progressiveness But DSUD query still needs to be improved in three aspects PROGRESSIVENESS, EFFICIENCY, AND UNIVERSALITY3
Contribution by Authors 1. Review DSUD query and summarize its objectives 2. Propose an improved framework for DSUD query with local site pruning 3. Present an adaptive ( ADSUD) algorithm based on IDSUD framework 4. Present evaluation of ADSUD algorithm which showed much better efficiency and progressiveness compared to e-DSUD. 4
Skyline query In a database, a Skyline = set of points which stand out among the others because are of special interest to us Skyline: those points which are not dominated by any other point. A point dominates another point if it is as good or better in all dimensions and better in at least one dimension 5
Uncertain Data - Data which either exist or doesn t exist in the real world - Has an existential probability value based on which we decide whether it exist or not in the real world - Here, world1 has existential probability of 0.001. - 0.001% chance it exists actual database - 99.99% chance it does not exist in the actual database. 6 Fig 1
DSUD Query At first, each local site computes its local skylines, respectively. Given two local skyline tuples t1 and t2 from local site S1and t1 t2, local skyline probability of tuple t2 is PrLSky(t2)= Pr(t2)(1- (Pr(t1)) t t2,t UDBi-t1(1-Pr(t)) Uncertain DataBase UDBk UDB2 7
DSUD Query In server H, let tuple t1 t1 in priority queue, then Local skyline probability of tuple t1 is refreshed as PrLSky(t1 )= PrLSky(t1 )(1-(Pr(t1)) t t1 ,t L-t1(1-Pr(t)) At the start of the second iteration,assume that tuple t2 is sent to H and t2 t1 . The approximate global probability of tuple t1 can be computed by Pr GSky(t1 )= PrLSky(t1 ) x (1- PrLSky(t1))2 x (1-Pr(t2)) x t L,t t1(1-Pr(t)) x t t2,t UDBi-t1(1-Pr(t)) x t t1 ,t UDBx L-t2[PrLSky(t)(1 - Pr (t) / Pr(t))] 8
DSUD Query Finally, DSUD query returns the tuples at H whose Global Skyline probabilities are not less than . Problem with DSUD It doesn t take Total Query Time into consideration Its progressiveness also has room to be improved 9
The Improved-DSUD(IDSUD) Framework GOAL: Minimize TQT(Total Query Time) & Perform better progressiveness for DSUD query Improvements: First, Query-Routing Phase Introduced. Includes: Site pruning in Query-Routing Phase Progressive Pruning at each local site Second, improved PR-tree (IPR-tree) To boost DSUD 10 Finally, in To-Server Phase, only one local site representative tuple each time.
Adaptive-DSUD(ADSUD) Algorithm Local Sorting Strategies t1 t2 Probability Decreases Old method to calculate GSky t3 effective to choose most dominant tuple Less effective for Local Skyline answers have dominant relationship If PrNewGSky(t) < , prune t and refresh the probabilities at H and arrange in descending order. New method to calculate approx. global skyline probability Let UGPrune be the set of unqualified tuples at H tN PrNewGSky(t)= PrLSky(t,UDBi) x t t[(1-Pr(t )) x Local Site 11 PrLSky(t ,UDBk) x t t 1/(1-Pr(t ))]
Minimum Probabilistic Bounding Rectangle(MPBR) Good skyline algorithm = minm. transfer of unqualified skyline points Therefore, selection of Local skyline algorithm is very important MBR of R-tree MPBR(their local algorithm) 1. Created usually by clustering the near points 1. Generated according to the probability threshold 2. Utilized in local-computation phase for improving pruning capacity 2. Used to help choosing the local multiple representative tuples & helps to gain the abstracted info for the site pruning in Query-Routing Phase 12
MPBR Definition: Set of tuples that satisfy the condition Prnonexist(BR) = tj BR(1-Pr(tj)) < MPBR-Dominance: Given two MPBRs BR= (tmin,tmax) and BR = (t min, t max), we have BR BR , if tmax t min.(tmin,tmaxare the minimum and maximum corner of BR) Lemma 1: Given a MPBR BR = (tmin,tmax) and a tuple t, if tmax t, then t can be safely pruned. Lemma 2: Given two MPBRs BR =(tmin, tmax) and BR =(t min, t max), if BR BR , then the tuples contained in BR can be safely pruned. 13
MPBR Constrained Space(MPCS) Definition: For a MPBR set BRS = {1 i |BRS||BRi= (timin, timax)} , its MPCS consists of the union of all the regions which are dominated by timax. Based on the property of MPCS: Lemma 3: Given a MPCS CS of a MPBR set BRS = {1 i |BRS||BRi= (timin, timax)} and a tuple t, if t CS, the tuple t can be safely pruned Lemma 4: Given a MPCS CS and a MPBR BR = (tmin,tmax), if BR CS, the tuples contained in BR can be safely pruned 14
The Local Algorithms - Reduced Local individual processing time = reduced TQT - State-of-the-art Centralized algorithm The Reuse Mechanism Reduces I/O operation Boosts performance Traditional Methods: Applies window query over R-tree to find skyline result each time 15 Results in visiting same node many times(larger I/O operation)
Prnonexist(PE3) x Prnonexist(PE4) max(Prmax(PE3),Prmax(PE4)) The IPR-Tree Indices are built on UDB to improve Query Efficiency by reducing the Processing time In this solution, IPR-Tree is used Existential Probabilities Skyline Probability of tuple depends on: Its existential probability Fig: Example of an IPR-Tree Non-existential probability of each entry that dominates it 16
The LUSQ ALgorithm Computes Skyline at each local site Uses Progressive pruning Strategy to reduce the Search Space Two Phases: Pruning & Refining Pruning: Uses different Lemma for pruning the tuples: Lemma 5 Given an entity E, if Prmax(E) < , then tuples within E can be safely pruned. Lemma 6 Given two entries Eiand Ej, if Ei Ejand Prmax(Ej) Prnonexist(Ei) < , the tuples contained in Ejcan be safely pruned. Lemma 7. Given an entry E, in case PrUBLSky(E) < , tuples within E can be safely pruned. And others. 17
The GUSQ Algorithm Uncertain database UDB0 - used to find global candidate skylines 18
Conclusion To accelerate DSUD, improved DSUD framework and new algorithm ADSUD In ADSUD, several efficient technologies: IPR-Tree Reuse Technology MPBR Collecting global abstract information Selecting local representative tuples 22