
Improving Database Query Optimization: Cost Estimation and Logic Plan Enhancement
Enhance database query optimization by estimating cost, implementing efficient algorithms, and optimizing logic plans via relation size. Learn key steps like collecting size parameters, setting up estimation rules, and converting logic queries to minimize intermediate relation sizes. Understand how to calculate size parameters precisely and optimize queries using algorithms and cost considerations for better performance.
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
CSCE-608 Database Systems Spring 2025 Instructor: Jianer Chen Office: PETR 428 Phone: 845-4259 Email: chen@cse.tamu.edu Notes 21: Cost Estimation for Query Plans II
Query Optimization An input database program P Prepare a collection C of efficient algorithms for operations in relational algebra; parser View processing, Semantic checking parse tree preprocessing parse tree parse tree-lqp convertor logic query plan push selections, group joins apply logic laws logic query plan reduce the size of intermediate results Optimization via logic and size logic query plan Lqp-pqp convertor physical query plan take care of issues in optimization and security. choices of algorithms, data structures, and computational modes Optimization via algorithms and cost Machine executable code
Improving logic plan via relation size Major Steps: 1. Collect size parameters for stored relations: T(R), B(R), V(R,A) (the # of different values on attribute A) 2. Set up estimation rules for size parameters on relational algebraic operators; 3. Using logic laws to convert a logic query into the one that minimizes the (estimated) sizes of intermediate relations. 150 500 1500 R S S R 2000 5000 5000 2000
Estimating size parameters (T,B,V) , , : size parameters can be calculated precisely : T( (R)) = min{ T(R)/2, AV(R,A) } : T( (R)) = min{ T(R)/2, grouping AV(R,A) }
Estimating size parameters (T,B,V) , , : size parameters can be calculated precisely : T( (R)) = min{ T(R)/2, AV(R,A) } : T( (R)) = min{ T(R)/2, grouping AV(R,A) } A=c: T( A=c(R)) = T(R)/V(R,A)
Estimating size parameters (T,B,V) , , : size parameters can be calculated precisely : T( (R)) = min{ T(R)/2, AV(R,A) } : T( (R)) = min{ T(R)/2, grouping AV(R,A) } A=c: T( A=c(R)) = T(R)/V(R,A) Each value of the attribute A takes about 1/V(R,A) of the tuples.
Estimating size parameters (T,B,V) , , : size parameters can be calculated precisely : T( (R)) = min{ T(R)/2, AV(R,A) } : T( (R)) = min{ T(R)/2, grouping AV(R,A) } A=c: T( A=c(R)) = T(R)/V(R,A) Each value of the attribute A takes about 1/V(R,A) of the tuples.
Estimating size parameters (T,B,V) , , : size parameters can be calculated precisely : T( (R)) = min{ T(R)/2, AV(R,A) } : T( (R)) = min{ T(R)/2, grouping AV(R,A) } A=c: T( A=c(R)) = T(R)/V(R,A) A<c: T( A<c(R)) = T(R)/3
Estimating size parameters (T,B,V) , , : size parameters can be calculated precisely : T( (R)) = min{ T(R)/2, AV(R,A) } : T( (R)) = min{ T(R)/2, grouping AV(R,A) } A=c: T( A=c(R)) = T(R)/V(R,A) A<c: T( A<c(R)) = T(R)/3 In average, about one half of the tuples have their values of A smaller than a given c. So T( A<c(R)) = T(R)/2 ? But practically, one may be more interested in a smaller fraction of the tuples. So T( A<c(R)) = T(R)/3 ?
Estimating size parameters (T,B,V) , , : size parameters can be calculated precisely : T( (R)) = min{ T(R)/2, AV(R,A) } : T( (R)) = min{ T(R)/2, grouping AV(R,A) } A=c: T( A=c(R)) = T(R)/V(R,A) A<c: T( A<c(R)) = T(R)/3 In average, about one half of the tuples have their values of A smaller than a given c. So T( A<c(R)) = T(R)/2 ? But practically, one may be more interested in a smaller fraction of the tuples. So T( A<c(R)) = T(R)/3 ?
Estimating size parameters (T,B,V) , , : size parameters can be calculated precisely : T( (R)) = min{ T(R)/2, AV(R,A) } : T( (R)) = min{ T(R)/2, grouping AV(R,A) } A=c: T( A=c(R)) = T(R)/V(R,A) A<c: T( A<c(R)) = T(R)/3 In average, about one half of the tuples have their values of A smaller than a given c. So T( A<c(R)) = T(R)/2 ? But practically, one may be more interested in a smaller fraction of the tuples. So T( A<c(R)) = T(R)/3 ?
Estimating size parameters (T,B,V) , , : size parameters can be calculated precisely : T( (R)) = min{ T(R)/2, AV(R,A) } : T( (R)) = min{ T(R)/2, grouping AV(R,A) } A=c: T( A=c(R)) = T(R)/V(R,A) A<c: T( A<c(R)) = T(R)/3 In average, about one half of the tuples have their values of A smaller than a given c. So T( A<c(R)) = T(R)/2 ? But practically, one may be more interested in a smaller fraction of the tuples. So T( A<c(R)) = T(R)/3 ?
Estimating size parameters (T,B,V) , , : size parameters can be calculated precisely : T( (R)) = min{ T(R)/2, AV(R,A) } : T( (R)) = min{ T(R)/2, grouping AV(R,A) } A=c: T( A=c(R)) = T(R)/V(R,A) A<c: T( A<c(R)) = T(R)/3 In average, about one half of the tuples have their values of A smaller than a given c. So T( A<c(R)) = T(R)/2 ? But practically, one may be more interested in a smaller fraction of the tuples. So T( A<c(R)) = T(R)/3 ?
Estimating size parameters (T,B,V) , , : size parameters can be calculated precisely : T( (R)) = min{ T(R)/2, AV(R,A) } : T( (R)) = min{ T(R)/2, grouping AV(R,A) } A=c: T( A=c(R)) = T(R)/V(R,A) A<c: T( A<c(R)) = T(R)/3 : T(R S) = T(S)/2 (assume S is small)
Estimating size parameters (T,B,V) , , : size parameters can be calculated precisely : T( (R)) = min{ T(R)/2, AV(R,A) } : T( (R)) = min{ T(R)/2, grouping AV(R,A) } A=c: T( A=c(R)) = T(R)/V(R,A) A<c: T( A<c(R)) = T(R)/3 : T(R S) = T(S)/2 (assume S is smaller) Assuming T(R) T(S). T(R S) can be as large as T(S), and can be as small as 0. So in average, take T(R S) = T(S)/2?
Estimating size parameters (T,B,V) , , : size parameters can be calculated precisely : T( (R)) = min{ T(R)/2, AV(R,A) } : T( (R)) = min{ T(R)/2, grouping AV(R,A) } A=c: T( A=c(R)) = T(R)/V(R,A) A<c: T( A<c(R)) = T(R)/3 : T(R S) = T(S)/2 (assume S is smaller) Assuming T(R) T(S). T(R S) can be as large as T(S), and can be as small as 0. So in average, take T(R S) = T(S)/2?
Estimating size parameters (T,B,V) , , : size parameters can be calculated precisely : T( (R)) = min{ T(R)/2, AV(R,A) } : T( (R)) = min{ T(R)/2, grouping AV(R,A) } A=c: T( A=c(R)) = T(R)/V(R,A) A<c: T( A<c(R)) = T(R)/3 : T(R S) = T(S)/2 (assume S is smaller) Assuming T(R) T(S). T(R S) can be as large as T(S), and can be as small as 0. So in average, take T(R S) = T(S)/2?
Estimating size parameters (T,B,V) , , : size parameters can be calculated precisely : T( (R)) = min{ T(R)/2, AV(R,A) } : T( (R)) = min{ T(R)/2, grouping AV(R,A) } A=c: T( A=c(R)) = T(R)/V(R,A) A<c: T( A<c(R)) = T(R)/3 : T(R S) = T(S)/2 (assume S is smaller) Assuming T(R) T(S). T(R S) can be as large as T(S), and can be as small as 0. So in average, take T(R S) = T(S)/2?
Estimating size parameters (T,B,V) , , : size parameters can be calculated precisely : T( (R)) = min{ T(R)/2, AV(R,A) } : T( (R)) = min{ T(R)/2, grouping AV(R,A) } A=c: T( A=c(R)) = T(R)/V(R,A) A<c: T( A<c(R)) = T(R)/3 : T(R S) = T(S)/2 (assume S is smaller) U : T(R U S) = T(R) + T(S)/2 (assume S is small)
Estimating size parameters (T,B,V) , , : size parameters can be calculated precisely : T( (R)) = min{ T(R)/2, AV(R,A) } : T( (R)) = min{ T(R)/2, grouping AV(R,A) } A=c: T( A=c(R)) = T(R)/V(R,A) A<c: T( A<c(R)) = T(R)/3 : T(R S) = T(S)/2 (assume S is smaller) U : T(R U S) = (T(R)+T(S))/2 (assume S is smaller) Assuming T(R) T(S). T(R U S) can be as large as T(R)+T(S), and as small as T(S). So in average, take T(R U S) = (T(R)+T(S))/2 ?
Estimating size parameters (T,B,V) , , : size parameters can be calculated precisely : T( (R)) = min{ T(R)/2, AV(R,A) } : T( (R)) = min{ T(R)/2, grouping AV(R,A) } A=c: T( A=c(R)) = T(R)/V(R,A) A<c: T( A<c(R)) = T(R)/3 : T(R S) = T(S)/2 (assume S is smaller) U : T(R U S) = (T(R)+T(S))/2 (assume S is smaller) Assuming T(R) T(S). T(R U S) can be as large as T(R)+T(S), and as small as T(R). So in average, take T(R U S) = (T(R)+T(S))/2 ?
Estimating size parameters (T,B,V) , , : size parameters can be calculated precisely : T( (R)) = min{ T(R)/2, AV(R,A) } : T( (R)) = min{ T(R)/2, grouping AV(R,A) } A=c: T( A=c(R)) = T(R)/V(R,A) A<c: T( A<c(R)) = T(R)/3 : T(R S) = T(S)/2 (assume S is smaller) U : T(R U S) = (T(R)+T(S))/2 (assume S is smaller) Assuming T(R) T(S). T(R U S) can be as large as T(R)+T(S), and as small as T(R). So in average, take T(R U S) = T(R)+T(S)/2 ?
Estimating size parameters (T,B,V) , , : size parameters can be calculated precisely : T( (R)) = min{ T(R)/2, AV(R,A) } : T( (R)) = min{ T(R)/2, grouping AV(R,A) } A=c: T( A=c(R)) = T(R)/V(R,A) A<c: T( A<c(R)) = T(R)/3 : T(R S) = T(S)/2 (assume S is smaller) U : T(R U S) = T(R)+T(S)/2 (assume S is smaller) Assuming T(R) T(S). T(R U S) can be as large as T(R)+T(S), and as small as T(R). So in average, take T(R U S) = T(R)+T(S)/2 ?
Estimating size parameters (T,B,V) , , : size parameters can be calculated precisely : T( (R)) = min{ T(R)/2, AV(R,A) } : T( (R)) = min{ T(R)/2, grouping AV(R,A) } A=c: T( A=c(R)) = T(R)/V(R,A) A<c: T( A<c(R)) = T(R)/3 : T(R S) = T(S)/2 (assume S is smaller) U : T(R U S) = T(R)+T(S)/2 (assume S is smaller) : T(R S) = T(R) T(S)/2
Estimating size parameters (T,B,V) , , : size parameters can be calculated precisely : T( (R)) = min{ T(R)/2, AV(R,A) } : T( (R)) = min{ T(R)/2, grouping AV(R,A) } A=c: T( A=c(R)) = T(R)/V(R,A) A<c: T( A<c(R)) = T(R)/3 : T(R S) = T(S)/2 (assume S is smaller) U : T(R U S) = T(R)+T(S)/2 (assume S is smaller) : T(R S) = T(R) T(S)/2 T(R S) can be as large as T(R), and as small as 0. So take T(R S) = T(R)/2 ? S has no impact ! So take T(R S) = T(R) max{T(R)/2, T(S)/2} ?
Estimating size parameters (T,B,V) , , : size parameters can be calculated precisely : T( (R)) = min{ T(R)/2, AV(R,A) } : T( (R)) = min{ T(R)/2, grouping AV(R,A) } A=c: T( A=c(R)) = T(R)/V(R,A) A<c: T( A<c(R)) = T(R)/3 : T(R S) = T(S)/2 (assume S is smaller) U : T(R U S) = T(R)+T(S)/2 (assume S is smaller) : T(R S) = T(R) T(S)/2 T(R S) can be as large as T(R), and as small as 0. So take T(R S) = T(R)/2 ? S has no impact ! So take T(R S) = T(R) max{T(R)/2, T(S)/2} ?
Estimating size parameters (T,B,V) , , : size parameters can be calculated precisely : T( (R)) = min{ T(R)/2, AV(R,A) } : T( (R)) = min{ T(R)/2, grouping AV(R,A) } A=c: T( A=c(R)) = T(R)/V(R,A) A<c: T( A<c(R)) = T(R)/3 : T(R S) = T(S)/2 (assume S is smaller) U : T(R U S) = T(R)+T(S)/2 (assume S is smaller) : T(R S) = T(R) T(S)/2 T(R S) can be as large as T(R), and as small as 0. So take T(R S) = T(R)/2 ? S has no impact ! So take T(R S) = T(R) max{T(R)/2, T(S)/2} ?
Estimating size parameters (T,B,V) , , : size parameters can be calculated precisely : T( (R)) = min{ T(R)/2, AV(R,A) } : T( (R)) = min{ T(R)/2, grouping AV(R,A) } A=c: T( A=c(R)) = T(R)/V(R,A) A<c: T( A<c(R)) = T(R)/3 : T(R S) = T(S)/2 (assume S is smaller) U : T(R U S) = T(R)+T(S)/2 (assume S is smaller) : T(R S) = T(R) T(S)/2 T(R S) can be as large as T(R), and as small as 0. So take T(R S) = T(R)/2 ? S has no impact ! So take T(R S) = T(R) max{T(R)/2, T(S)/2} ?
Estimating size parameters (T,B,V) , , : size parameters can be calculated precisely : T( (R)) = min{ T(R)/2, AV(R,A) } : T( (R)) = min{ T(R)/2, grouping AV(R,A) } A=c: T( A=c(R)) = T(R)/V(R,A) A<c: T( A<c(R)) = T(R)/3 : T(R S) = T(S)/2 (assume S is smaller) U : T(R U S) = T(R)+T(S)/2 (assume S is smaller) : T(R S) = T(R) max{T(R)/2, T(S)/2} T(R S) can be as large as T(R), and as small as 0. So take T(R S) = T(R)/2 ? S has no impact ! So take T(R S) = T(R) max{T(R)/2, T(S)/2} ?
Estimating size parameters (T,B,V) , , : size parameters can be calculated precisely : T( (R)) = min{ T(R)/2, AV(R,A) } : T( (R)) = min{ T(R)/2, grouping AV(R,A) } A=c: T( A=c(R)) = T(R)/V(R,A) A<c: T( A<c(R)) = T(R)/3 : T(R S) = T(S)/2 (assume S is smaller) U : T(R U S) = T(R)+T(S)/2 (assume S is smaller) : T(R S) = T(R) T(S)/2 : T(R S) = T(R)T(S)/max{V(R,A),V(S,A)}
Estimating size parameters (T,B,V) , , : size parameters can be calculated precisely : T( (R)) = min{ T(R)/2, AV(R,A) } : T( (R)) = min{ T(R)/2, grouping AV(R,A) } A=c: T( A=c(R)) = T(R)/V(R,A) A<c: T( A<c(R)) = T(R)/3 : T(R S) = T(S)/2 (assume S is smaller) U : T(R U S) = T(R)+T(S))/2 (assume S is smaller) : T(R S) = T(R) T(S)/2 : T(R S) = T(R)T(S)/max{V(R,A),V(S,A)} Assumption. If V(R,A) > V(S,A), then every A-value in S will appear in R. Thus, for a tuple t with an A-value a in S, in average a appears T(R)/V(R,A) times in R. Thus, t can be joined with T(R)/V(R,A) tuples in R. This gives T(R S) = T(R)T(S)/max{V(R,A),V(S,A)}
Estimating size parameters (T,B,V) , , : size parameters can be calculated precisely : T( (R)) = min{ T(R)/2, AV(R,A) } : T( (R)) = min{ T(R)/2, grouping AV(R,A) } A=c: T( A=c(R)) = T(R)/V(R,A) A<c: T( A<c(R)) = T(R)/3 : T(R S) = T(S)/2 (assume S is smaller) U : T(R U S) = T(R)+T(S))/2 (assume S is smaller) : T(R S) = T(R) T(S)/2 : T(R S) = T(R)T(S)/max{V(R,A),V(S,A)} Assumption. If V(R,A) > V(S,A), then every A-value in S will appear in R. Thus, for a tuple t with an A-value a in S, in average a appears T(R)/V(R,A) times in R. Thus, t can be joined with T(R)/V(R,A) tuples in R. This gives T(R S) = T(R)T(S)/max{V(R,A),V(S,A)}
Estimating size parameters (T,B,V) , , : size parameters can be calculated precisely : T( (R)) = min{ T(R)/2, AV(R,A) } : T( (R)) = min{ T(R)/2, grouping AV(R,A) } A=c: T( A=c(R)) = T(R)/V(R,A) A<c: T( A<c(R)) = T(R)/3 : T(R S) = T(S)/2 (assume S is smaller) U : T(R U S) = T(R)+T(S))/2 (assume S is smaller) : T(R S) = T(R) T(S)/2 : T(R S) = T(R)T(S)/max{V(R,A),V(S,A)} Assumption. If V(R,A) > V(S,A), then every A-value in S will appear in R. Thus, for a tuple t with an A-value a in S, in average a appears T(R)/V(R,A) times in R. Thus, t can be joined with T(R)/V(R,A) tuples in R. This gives T(R S) = T(R)T(S)/max{V(R,A),V(S,A)}
Estimating size parameters (T,B,V) , , : size parameters can be calculated precisely : T( (R)) = min{ T(R)/2, AV(R,A) } : T( (R)) = min{ T(R)/2, grouping AV(R,A) } A=c: T( A=c(R)) = T(R)/V(R,A) A<c: T( A<c(R)) = T(R)/3 : T(R S) = T(S)/2 (assume S is smaller) U : T(R U S) = T(R)+T(S))/2 (assume S is smaller) : T(R S) = T(R) T(S)/2 : T(R S) = T(R)T(S)/max{V(R,A),V(S,A)} Assumption. If V(R,A) > V(S,A), then every A-value in S will appear in R. Thus, for a tuple t with an A-value a in S, in average a appears T(R)/V(R,A) times in R. Thus, t can be joined with T(R)/V(R,A) tuples in R. This gives T(R S) = T(R)T(S)/max{V(R,A),V(S,A)}
Estimating size parameters (T,B,V) , , : size parameters can be calculated precisely : T( (R)) = min{ T(R)/2, AV(R,A) } : T( (R)) = min{ T(R)/2, grouping AV(R,A) } A=c: T( A=c(R)) = T(R)/V(R,A) A<c: T( A<c(R)) = T(R)/3 : T(R S) = T(S)/2 (assume S is smaller) U : T(R U S) = T(R)+T(S)/2 (assume S is smaller) : T(R S) = T(R) T(S)/2 : T(R S) = T(R)T(S)/max{V(R,A),V(S,A)} C : T(R CS) = T( C(R S))
Estimating size parameters (T,B,V) , , : size parameters can be calculated precisely : T( (R)) = min{ T(R)/2, AV(R,A) } : T( (R)) = min{ T(R)/2, grouping AV(R,A) } A=c: T( A=c(R)) = T(R)/V(R,A) A<c: T( A<c(R)) = T(R)/3 : T(R S) = T(S)/2 (assume S is smaller) U : T(R U S) = T(R)+T(S)/2 (assume S is smaller) : T(R S) = T(R) T(S)/2 : T(R S) = T(R)T(S)/max{V(R,A),V(S,A)} C : T(R CS) = T( C(R S))
Estimating size parameters (T,B,V) , , : size parameters can be calculated precisely : T( (R)) = min{ T(R)/2, AV(R,A) } : T( (R)) = min{ T(R)/2, grouping AV(R,A) } A=c: T( A=c(R)) = T(R)/V(R,A) A<c: T( A<c(R)) = T(R)/3 : T(R S) = T(S)/2 (assume S is smaller) U : T(R U S) = T(R)+T(S)/2 (assume S is smaller) : T(R S) = T(R) T(S)/2 : T(R S) = T(R)T(S)/max{V(R,A),V(S,A)} C : T(R CS) = T( C(R S))
Estimating size parameters (T,B,V) , , : size parameters can be calculated precisely : T( (R)) = min{ T(R)/2, AV(R,A) } : T( (R)) = min{ T(R)/2, grouping AV(R,A) } A=c: T( A=c(R)) = T(R)/V(R,A) A<c: T( A<c(R)) = T(R)/3 : T(R S) = T(S)/2 (assume S is smaller) U : T(R U S) = T(R)+T(S)/2 (assume S is smaller) : T(R S) = T(R) T(S)/2 : T(R S) = T(R)T(S)/max{V(R,A),V(S,A)} C : T(R CS) = T( C(R S))
Estimating size parameters (T,B,V) , , : size parameters can be calculated precisely : T( (R)) = min{ T(R)/2, AV(R,A) } : T( (R)) = min{ T(R)/2, grouping AV(R,A) } A=c: T( A=c(R)) = T(R)/V(R,A) A<c: T( A<c(R)) = T(R)/3 : T(R S) = T(S)/2 (assume S is smaller) U : T(R U S) = T(R)+T(S)/2 (assume S is smaller) : T(R S) = T(R) T(S)/2 : T(R S) = T(R)T(S)/max{V(R,A),V(S,A)} C : T(R CS) = T( C(R S)) Assume that the tables are stored in a clustered way. If we know the schemas of relations R and S, we will also know the schema of the relation W obtained by applying an operation on R and/or S, from which we know how much space a tuple in W will take. Therefore, the value B(W) can be computed from the value T(W). B(W) = T(W)/#tuples-per-block
Estimating size parameters (T,B,V) , , : size parameters can be calculated precisely : T( (R)) = min{ T(R)/2, AV(R,A) } : T( (R)) = min{ T(R)/2, grouping AV(R,A) } A=c: T( A=c(R)) = T(R)/V(R,A) A<c: T( A<c(R)) = T(R)/3 : T(R S) = T(S)/2 (assume S is smaller) U : T(R U S) = T(R)+T(S)/2 (assume S is smaller) : T(R S) = T(R) T(S)/2 : T(R S) = T(R)T(S)/max{V(R,A),V(S,A)} C : T(R CS) = T( C(R S)) Assume that the tables are stored in a clustered way. If we know the schemas of relations R and S, we will also know the schema of the relation W obtained by applying an operation on R and/or S, from which we know how much space a tuple in W will take. Therefore, the value B(W) can be computed from the value T(W). B(W) = T(W)/#tuples-per-block
Estimating size parameters (T,B,V) , , : size parameters can be calculated precisely : T( (R)) = min{ T(R)/2, AV(R,A) } : T( (R)) = min{ T(R)/2, grouping AV(R,A) } A=c: T( A=c(R)) = T(R)/V(R,A) A<c: T( A<c(R)) = T(R)/3 : T(R S) = T(S)/2 (assume S is smaller) U : T(R U S) = T(R)+T(S)/2 (assume S is smaller) : T(R S) = T(R) T(S)/2 : T(R S) = T(R)T(S)/max{V(R,A),V(S,A)} C : T(R CS) = T( C(R S)) Assume that the tables are stored in a clustered way. If we know the schemas of relations R and S, we will also know the schema of the relation W obtained by applying an operation on R and/or S, from which we know how much space a tuple in W will take. Therefore, the value B(W) can be computed from the value T(W). B(W) = T(W)/#tuples-per-block
Estimating size parameters (T,B,V) , , : size parameters can be calculated precisely : T( (R)) : T( (R)) A=c: T( A=c(R)) A<c: T( A<c(R)) : T(R S) U : T(R U S) : T(R S) : T(R S) C : T(R CS) B( (R)) B( (R)) B( A=c(R)) B( A<c(R)) B(R S) B(R U S) B(R S) B(R S) B(R CS) B(W) = T(W)/#tuples-per-block
Estimating size parameters (T,B,V) , , : size parameters can be calculated precisely : T( (R)) : T( (R)) A=c: T( A=c(R)) A<c: T( A<c(R)) : T(R S) U : T(R U S) : T(R S) : T(R S) C : T(R CS) B( (R)) B( (R)) B( A=c(R)) B( A<c(R)) B(R S) B(R U S) B(R S) B(R S) B(R CS)
Estimating size parameters (T,B,V) Similar to that for the parameter T
Estimating size parameters (T,B,V) , , : size parameters can be calculated precisely : V( (R),A) = V(R,A) : V( (R),A) = V((R,A) (A is a grouping attribute) A=c: V( A=c(R),B) = V(R,B), V( A=c(R),A) = 1 A<c: V( A<c(R),B) = V(R,B), V( A<c(R),A) = V(R,A)/3 : V(R S,A) = V(S,A)/2 (assume V(R,A) V(S,A)) U : V(RUS,A) = V(R,A)+V(S,A)/2 (assume V(R,A) V(S,A)) : V(R S,A) = V(R,A) max{V(R,A)/2,V(S,A)/2} : V(R S,A) = min{V(R,A),V(S,A)} (A is a shared attribute) V(R S,A) = max{V(R,A),V(S,A)} (A is non-shared) C: V(R CS,A) = V( C(R S),A) Similar to that for the parameter T
Estimating size parameters (T,B,V) , , : size parameters can be calculated precisely : V( (R),A) = V(R,A) : V( (R),A) = V((R,A) (A is a grouping attribute) A=c: V( A=c(R),B) = V(R,B), V( A=c(R),A) = 1 A<c: V( A<c(R),B) = V(R,B), V( A<c(R),A) = V(R,A)/3 : V(R S,A) = V(S,A)/2 (assume V(R,A) V(S,A)) U : V(RUS,A) = V(R,A)+V(S,A)/2 (assume V(R,A) V(S,A)) : V(R S,A) = V(R,A) max{R(R,A)/2,T(S,A)/2} : V(R S,A) = min{V(R,A),V(S,A)} (A is a shared attribute) V(R S,A) = max{V(R,A),V(S,A)} (A is non-shared) C: V(R CS,A) = V( C(R S),A) Containment Law: if V(R,A) > V(S,A), then all A-values in S are in R
Estimating size parameters (T,B,V) , , : size parameters can be calculated precisely : V( (R),A) = V(R,A) : V( (R),A) = V((R,A) (A is a grouping attribute) A=c: V( A=c(R),B) = V(R,B), V( A=c(R),A) = 1 A<c: V( A<c(R),B) = V(R,B), V( A<c(R),A) = V(R,A)/3 : V(R S,A) = V(S,A)/2 (assume V(R,A) V(S,A)) U : V(RUS,A) = V(R,A)+V(S,A)/2 (assume V(R,A) V(S,A)) : V(R S,A) = V(R,A) max{R(R,A)/2,T(S,A)/2} : V(R S,A) = min{V(R,A),V(S,A)} (A is a shared attribute) V(R S,A) = max{V(R,A),V(S,A)} (A is non-shared) C: V(R CS,A) = V( C(R S),A) Preservation Law: if attribute A is not involved in the operation, then the # of A-values is unchanged.
Estimating size parameters (T,B,V) , , : size parameters can be calculated precisely : V( (R),A) = V(R,A) : V( (R),A) = V((R,A) (A is a grouping attribute) A=c: V( A=c(R),B) = V(R,B), V( A=c(R),A) = 1 A<c: V( A<c(R),B) = V(R,B), V( A<c(R),A) = V(R,A)/3 : V(R S,A) = V(S,A)/2 (assume V(R,A) V(S,A)) U : V(RUS,A) = V(R,A)+V(S,A)/2 (assume V(R,A) V(S,A)) : V(R S,A) = V(R,A) max{R(R,A)/2,T(S,A)/2} : V(R S,A) = min{V(R,A),V(S,A)} (A is a shared attribute) V(R S,A) = max{V(R,A),V(S,A)} (A is non-shared) C: V(R CS,A) = V( C(R S),A) The formulas for set/bag operations may depend on applications.
Improving logic plan via relation size Major Steps: 1. Collect size parameters for stored relations: T(R), B(R), V(R,A) (the # of different values on attribute A) 2. Set up estimation rules for size parameters on relational algebraic operators; 3. Using logic laws to convert a logic query into the one that minimizes the (estimated) sizes of intermediate relations.
Improving logic plan via relation size Major Steps: 1. Collect size parameters for stored relations: T(R), B(R), V(R,A) (the # of different values on attribute A) 2. Set up estimation rules for size parameters on relational algebraic operators; 3. Using logic laws to convert a logic query into the one that minimizes the (estimated) sizes of intermediate relations.