Query Processing in Relational Algebra: Transforming, Estimating Costs, and Generating Plans

Query Processing in Relational Algebra: Transforming, Estimating Costs, and Generating Plans
Slide Note
Embed
Share

This content delves into the intricacies of query processing at the relational algebra level. It covers transformations, estimation of costs, detailed query plan generation, and plan comparison. The process involves estimating the cost and size of query plans, calculating the number of IO operations, and evaluating result sizes using statistical data. Examples and detailed explanations aid in understanding the methods used for these estimations and analyses.

  • Query Processing
  • Relational Algebra
  • Cost Estimation
  • Query Plans
  • Database Management

Uploaded on Feb 17, 2025 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. Outline - Query Processing Relational algebra level transformations good transformations Detailed query plan level estimate costs generate and compare plans 1

  2. Estimating cost of query plan (1) Estimating size of results (2) Estimating # of IOs 2

  3. Estimating result size Keep statistics for relation R - T(R) : # tuples in R - L(R) : # of bytes in each R tuple - B(R): # of blocks to hold all R tuples - V(R, A) : # distinct values in R for attribute A - b: block size - bf(R) (blocking factor): # of tuples in a block bf(R) = b/L(R) 3

  4. Example R bat 1 A: 20 byte string B: 4 byte integer C: 8 byte date D: 5 byte string A B C 10 20 30 40 50 D a b a c d cat 1 cat 1 dog 1 dog 1 T(R) = 5 L(R) = 37 V(R,A) = 3 V(R,B) = 1 V(R,C) = 5 V(R,D) = 4 4

  5. Size estimates for W = R x S T(W) = T(R) T(S) L(W) = L(R) + L(S) bf(W) = b/(L(R)+L(S)) B(W) = T(R)*T(S)/bf(W) = = T(R)*T(S)*L(S)/b + T(S)*T(R)*L(R)/b = = T(R)*T(S)/bf(S) + T(S)*T(R)/bf(R) = = T(R)*B(S) + T(S)*B(R) 5

  6. Size estimate for W = A=a(R) L(W) = L(R) T(W) = ? 6

  7. Example R 40 50 V(R,A)=3 V(R,B)=1 V(R,C)=5 V(R,D)=4 A B C 10 20 30 D a b a c d cat 1 cat 1 dog 1 dog 1 bat 1 T(R) V(R,Z) W = z=val(R) T(W) = 7

  8. Selection cardinality SC(R,A) = average # records that satisfy equality condition on R.A SC(R,A) = T(R) / V(R,A) 8

  9. What about W = z val (R) ? T(W) = ? Solution # 1: T(W) = T(R)/2 Solution # 2: T(W) = T(R)/3 9

  10. Solution # 3: Estimate values in range Example R Z Min=1 V(R,Z)=10 W= z 15 (R) Max=20 f = 20-15+1 = 6 (fraction of range) 20-1+1 20 T(W) = f T(R) 10

  11. Equivalently: f V(R,Z) = fraction of distinct values T(W)= [f V(Z,R)] T(R)= f T(R) V(Z,R) 11

  12. Size estimate for W = R1 R2 Let x = attributes of R1 y = attributes of R2 Case 1 X Y = Same as R1 x R2 12

  13. W = R1 R2 X Y = A Case 2 R1 A B C R2 A D Assumption: V(R1,A) V(R2,A) Every A value in R1 is in R2 V(R2,A) V(R1,A) Every A value in R2 is in R1 containment of value sets 13

  14. Computing T(W)when V(R1,A) V(R2,A) R1 A B C R2 A D Take 1 tuple Match 1 tuple matches with T(R2)tuples... V(R2,A) so T(W) = T(R2) T(R1) V(R2, A) 14

  15. V(R1,A) V(R2,A) T(W) = T(R2) T(R1) V(R2,A) V(R2,A) V(R1,A) T(W) = T(R2) T(R1) V(R1,A) [A is common attribute] 15

  16. In general W = R1 R2 T(W) = T(R2) T(R1) max{ V(R1,A), V(R2,A) } 16

  17. Size Estimation Summary (1/2) A=v(R) SC(R,A) (--> SC(R,A) = T(R)/ V(R,A)) min( R , ) v A R A v(R) ( ) * T R max( , ) min( , ) A A R 1 2 ... n(R) multiplying probabilities ( ) * [( ( )) * ( ( )) * ...( ( ))] T R sc T R sc T R sc T R 1 2 n 1 2v... n(R) probability that a record satisfy none of : [( 1 ( )) 1 ( * ( )) * . .. * 1 ( ( ))] sc T R sc T R sc T R 1 2 n ( ) 1 ( * [( 1 ( )) 1 ( * ( )) * . .. * 1 ( ( ))] ) T R sc T R sc T R sc T R 1 2 n 17

  18. Size Estimation Summary(2/2) R x S T(RxS) = T(R)*T(S) R S R S = : T(R) * T(S) R S key for R: maximum output size is T(S) R S foreign key for R: T(S) R S = {A}, neither key of R nor S T(R) * T(S)/ V(S,A) T(R) * T(S) / V(R,A) 18

  19. A Note on Histograms 40 number of tuples in R with A value in given range 30 20 10 20 30 10 40 A=val(R) = ? 19

  20. Summary Estimating size of results is an art Don t forget: Statistics must be kept up to date (cost?) 20

Related


More Related Content