XQuery Workload Materialized View Selection

materialized view selection for xquery workloads n.w
1 / 39
Embed
Share

Learn about materialized view selection for XQuery workloads, including problem definitions, algorithms, and experimental evaluation. Explore how to minimize workload evaluation costs within a space budget and the anatomy of query and view languages.

  • XQuery
  • Materialized View
  • Workloads
  • Query Language
  • Database

Uploaded on | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. Materialized View Selection for XQuery Workloads Asterios Katsifodimos1, Ioana Manolescu1& Vasilis Vassalos2 1Inria Saclay & Universit Paris-Sud, 2Athens University of Economics and Business Athens University of Economics and Business

  2. View selection in XML databases Problem definition Find a set of materialized views that minimizes workload evaluation costs not exceeding a space budget. 2 Materialized View Selection for XQuery Workloads

  3. Materialized View Selection for XQuery Workloads Contributions View selection for multiple-views XQuery rewriting Rich subset of XQuery Tree patterns with multiple return nodes and value joins We provide Candidate view pruning methods View selection algorithms: Utility-Based Greedy (UDG) Reduce-Optimize Algorithm (ROA) Extensive experimental evaluation Outperforming & extending state-of-the-art works 3 Materialized View Selection for XQuery Workloads

  4. Outline The View Selection Problem View Language & Candidate Views View Selection Algorithms Related Work & Experimentation - 4 Materialized View Selection for XQuery Workloads

  5. Query and view language Anatomy of a query ID of book book paper ID author conference[= SIGMOD ] textcont authorval cont=subtree of the text element Value-join Return the ID of every book along with its text and author if the book author has a paper in the SIGMOD conference. 5 Materialized View Selection for XQuery Workloads

  6. Candidate Views Rewriting Query Candidate Views PROJECT [textcont, authorval] Example: v1 v2 JOIN book [v1.authorID>v2.bookID] authorID,val bookID textcont authorval textID,cont SCAN(v1) SCAN(v2) Candidate views: views that can participate in a rewriting of a query. Property: candidate views are exactly those embeddable in a query. 6 Materialized View Selection for XQuery Workloads

  7. Candidate Views Number of candidate views k tpi-1 2m 25 For query of m value joins and k tree patterns: i=1 Early pruning is needed Rules of thumb for pruning Drop all views that can be replaced by others Views should not store anything extraneous Challenge: remove maximum number of views Preserve low cost and/or small size rewriting possibilities. 7 Materialized View Selection for XQuery Workloads

  8. Candidate Views Pruning techniques Query Candidate Views Do not store unnecessary data Annotate all nodes with ID i.e. useless cont, val or //-axis Maximize rewriting opportunities Avoid expensive rewritings Save space v2 v2 v1 v1 authorID,val authorID, cont book book bookID authorval textcont authorID authorID v3 v3 bookID bookID textID,cont textID,cont Materialized View Selection for XQuery Workloads 8

  9. Outline The View Selection Problem View Language & Candidate Views View Selection Algorithms Related Work & Experimentation - 9 Materialized View Selection for XQuery Workloads

  10. Materializing a set of views View set benefit Benefit of materializing a set of views benefit (V, Q)=(cost of evaluating Q over D) (cost of evaluating Q over V) Computation of benefit requires invoking rewriting algorithm Expensive! Space occupancy of a view set V Total size (in bytes) 10 Materialized View Selection for XQuery Workloads

  11. View Selection Algorithms Knapsack-inspired view selection High similarity with the classic 0-1 knapsack problem Knapsack View Selection Weight View Size Profit Benefit (evaluation cost savings) Typical element of the greedy algorithms for knapsack: utility(v,Q)=benefit({v} U V, Q)/size(v) 11 Materialized View Selection for XQuery Workloads

  12. View Selection Algorithms S=12 Utility-Driven Greedy (UDG) Algorithm 1. Enumerate candidate views 2. Compute view utilities 3. Order views by utility 4. Select the view of largest utility fitting in budget U=60 S=5 U=50 S=4 U=10 S=7 U=8 S=2 5. Repeat 2-4 until budget exhausted Candidate Views Space Budget U=Utility(=benefit/size) S=Space occupancy 12 Materialized View Selection for XQuery Workloads

  13. View Selection Algorithms S=12 Utility-Driven Greedy (UDG) Algorithm 1. Enumerate candidates 2. Compute utilities 3. Order by utility 4. Select the view of largest utility fitting in budget U=40 S=5 U=64 S=4 U=12 S=7 U=9 S=2 5. Repeat 2-4 until budget exhausted Candidate Views Space Budget U=Utility(=benefit/size) S=Space occupancy 13 Materialized View Selection for XQuery Workloads

  14. View Selection Algorithms S=12 Utility-Driven Greedy (UDG) Algorithm 1. Enumerate candidates Greedy algorithms for knapsack not a perfect fit for our problem U=64 S=4 2. Compute utilities 3. Order by utility Utility of a view may change after every round depends on other views already selected 4. Select the view of largest utility fitting in budget U=10 S=5 U=13 S=7 U=4 S=2 5. Repeat 2-4 until budget exhausted Candidate Views Space Budget U=Utility(=benefit/size) S=Space occupancy 14 Materialized View Selection for XQuery Workloads

  15. View Selection Algorithms State space search (state=candidate view set) Initial state: query workload S1 transform(S1) S8 S3 S6 S7 S8 S4 S5 S9 S15 S16 S10 S14 Best state: largest benefit under space budget S11 S12 S13 15 Materialized View Selection for XQuery Workloads

  16. View Selection Algorithms State Transformations: Break, Join, Generalize, Adapt View Break: break a view in smaller parts Reveals common sub-expressions of views Can reduce or increase space occupancy Increases query evaluation costs book paper conference[= SIGMOD ] author textcont authorval 16 Materialized View Selection for XQuery Workloads

  17. View Selection Algorithms State Transformations: Break, Join, Generalize, Adapt Join: opposite to Break, join two views into one Reduces evaluation costs Joined views can be smaller in size book paper ID conference[= SIGMOD ] author textcont authorval val ,ID 17 Materialized View Selection for XQuery Workloads

  18. View Selection Algorithms State Transformations: Break, Join, Generalize, Adapt Generalize: generalization/relaxation of a view Reveals common sub-expressions of views Can reduce or increase space occupancy Increases query evaluation costs book paper cont textcont authorval conference author val [= SIGMOD ] val 18 Materialized View Selection for XQuery Workloads

  19. View Selection Algorithms State Transformations: Break, Join, Generalize, Adapt Adapt: specialization of views by 1. Conversion of //-axis to /-axis 2. Addition of existential nodes Reduces evaluation costs Adapted views can be smaller in size book paper cont text author conference author val [= SIGMOD ] Break, Join, Generalize, Adapt Allow to generate all states Guaranteed not to generate pruned views 19 Materialized View Selection for XQuery Workloads

  20. View Selection Algorithms The Reduce-Optimize algorithm (ROA) Huge number of states Call rewriting algorithm after every state transition Need for heuristics Proposal: heuristic three-phase algorithm ROA Reduce Jump Optimize 20 Materialized View Selection for XQuery Workloads

  21. View Selection Algorithms The Reduce-Optimize algorithm (ROA) Intermediary State Best State Solution Revisited State Space Occupancy Space Budget Time Benefit Time Reduce Reduce Reduce ... Optimize Jump Optimize 21 Materialized View Selection for XQuery Workloads

  22. View Selection Algorithms Reducing ROA search time - heuristics 1. Some transitions may apply several transformations at once 2. Stop the rewriting algorithm early After k rewritings found or At a timeout 3. Consider only the lowest cost rewritings 22 Materialized View Selection for XQuery Workloads

  23. Outline The View Selection Problem View Language & Candidate Views View Selection Algorithms Related Work & Experimentation - 23 Materialized View Selection for XQuery Workloads

  24. Related Work Algorithm Rewriting power [Mandhani, Suciu VLDB05] 1-view rewritings [Tang et. al. DASFAA09] 1-view rewritings Utility-Driven Greedy Multiple view rewritings Reduce-Optimize Multiple view rewritings 24 Materialized View Selection for XQuery Workloads

  25. Experimental Evaluation Settings Space budget Queries S=size(Q) Workloads Tested space budgets: Tree patterns: Q1(14), Q2(50), Q3(100) Tree patterns + joins: Q4(50), 20% joins Query Selectivity S, S/2, S/4, S/6 Algorithms UDG and ROA low, medium, high Competitors: Database: [Mandhani & Suciu VLDB05] 1GB XMark (10x100MB documents) [Tang et al. DASFAA09] Implementation ViP2P*, Java *http://vip2p.saclay.inria.fr 25 Materialized View Selection for XQuery Workloads

  26. Experimental Evaluation Workload Evaluation Time of Q1(14 queries) 100 60 Evaluation time versus docs evaluating queries from docs % of evaluation time versus 80 % of queries answered 50 Hit Ratio using views 40 60 30 40 20 20 10 0 0 S/6 S/4 S/2 S S/6 S/4 S/2 S Reduce-Optimize (ROA) Space/Time Greedy [Tang et al. DASFAA09] Set-Cover Greedy [Mandhani & Suciu VLDB05] Utility-Driven Greedy (UDG) Space Optimal [Tang et al. DASFAA09] 26 Materialized View Selection for XQuery Workloads

  27. Experimental Evaluation Evaluation Time & hit ratio for Q3(100 queries) 90 100 Evaluation time versus docs 80 evaluating queries from docs % of evaluation time versus 80 % of queries answered 70 60 using views Hit Ratio 60 50 40 40 30 20 20 10 0 0 S/6 S/4 S/2 S S/6 S/4 S/2 S Reduce-Optimize (ROA) Set-Cover Greedy [Mandhani & Suciu VLDB05] 27 Materialized View Selection for XQuery Workloads

  28. Experimental Evaluation ROA evaluation for Q4(50 queries, 20% value-joined) 100 80 60 40 20 0 S/6 S/4 S/2 S % of evaluation time vs. documents Hit Ratio 28 Materialized View Selection for XQuery Workloads

  29. Conclusions Automatic selection of XQuery views for multiple-views rewritings Reduction of candidate views By orders of magnitude ROA performs better than related work Scales and manages to find good solutions relatively fast 80% of the benefits attained in ~2 minutes Maximum benefit attained within 25 minutes. Algorithms of [Tang et. al. DASFAA09] did not scale beyond 14 queries Utility Drive Greedy (UDG) did not scale beyond 50 queries 29 Materialized View Selection for XQuery Workloads

  30. Thank you Questions? ? - 30

  31. BACKUP - 31 Materialized View Selection for XQuery Workloads

  32. Cost of algebraic plans Estimating the evaluation cost of a rewriting Data Statistics Algebraic Plan cost DataGuide of every document Execution cost of an operator has Enriched with information: A CPU execution cost and # of instances of a path An IO cost Average path val size (bytes) Both depend on input Average path cont size (bytes) Evaluation cost of a plan: Distinct values of each path Calculated bottom-up Used to estimate Cardinality & size of a view 32 Materialized View Selection for XQuery Workloads

  33. Cost of algebraic plans Cost estimation example OUTPUT=25 View Size Cardinality IO=600 | CPU=320+25 v1 v2 500KB 50 PROJECT [textcont, authorval] 100KB 10 OUTPUT=25 (50*5*0.1) IO=500+100 | CPU=70+50*5 JOIN [v1.author=v2.author] OUTPUT=50 OUTPUT=5 IO= 500 | CPU=50 IO=100 | CPU=10+10 SCAN(v1) SELECT [conference= SIGMOD ] OUTPUT=10 IO=100 | CPU=10 SCAN(v2) 33 Materialized View Selection for XQuery Workloads

  34. Experimental Evaluation ROA time to attain increasing benefits (minutes) 34 Materialized View Selection for XQuery Workloads

  35. Experimental Evaluation Candidate views pruning 16 Number of candidate views 14 12 (log10-scale) 10 8 6 4 2 Q1 Q2 Q3 Q4 max min CS1 CS2 CS0 CS0 CS0max CS0min CS1 CS2 Maximum estimated number of candidate views Minimum estimated number of candidate views Pruned candidate view set Pruned candidate view set only linear path candidates 35 Materialized View Selection for XQuery Workloads

  36. Candidate Views Size of the set of candidate views for a tree pattern The cardinality of the set of candidate views of a tree pattern query q of |q| nodes is bounded by: q |q| q-1 CS0(q)= 2k 12k=25 k k=1 Example: q=/a/bval/c Combinations of nodes of q: ({a},{b},{c},{a,b},{a,c},{a ,b,c}) Edge combinations: how to connect nodes with (/, //) e.g. /a/b, //a/b, /a//b, //a//b}. There are 12 return node variations for each node in a pattern e.g. (aID,cont,aval,aID,val ) 36 Materialized View Selection for XQuery Workloads

  37. Candidate Views Size of the set of candidate views for a joined pattern Given a joined pattern q with: k tree patterns and m value-joins The candidate view set size of q is bounded by: k 2m CS0(tpi) i=1 Number of views resulting from all possible cartesian products of k tree patterns Value join combinations 37 Materialized View Selection for XQuery Workloads

  38. View Selection Algorithms Benefit of materializing a set of views The benefit of materializing a view set V is The difference in cost of evaluating the workload over V vs. evaluating from the documents benefit(V,Q)= (fq (cost(q| )-cost(q|V))) q Q Cost of evaluating query q given the set of materialized views V Cost of evaluating query q from the documents Frequency of query q 38 Materialized View Selection for XQuery Workloads

  39. Tree Pattern query of |q| nodes Joined Pattern query with m value joins & k tree patterns k tpi-1 2m q-1 25 25 i=1 39 Materialized View Selection for XQuery Workloads

More Related Content