
Distributed Database Design and Query Processing Techniques
Explore the intricacies of designing distributed databases, integrating databases, controlling semantic data, and processing queries efficiently. Learn about query optimization, distributed transaction management, data replication, parallel database systems, and more in the realm of distributed data management.
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
1 Outline Introduction Background Distributed Database Design Database Integration Semantic Data Control Distributed Query Processing Overview Query decomposition and localization Distributed query optimization Multidatabase query processing Distributed Transaction Management Data Replication Parallel Database Systems Distributed Object DBMS Peer-to-Peer Data Management Web Data Management Current Issues
2 Query Optimization Issues Replicated Fragments A distributed relation is usually divided into relation fragments. Localization: Distributed queries expressed on global relations are mapped into queries on physical fragments of relations by translating relations into fragments.
3 Distributed Query Processing Methodology Calculus Query on Distributed Relations Query Decomposition GLOBAL SCHEMA Algebraic Query on Distributed Relations CONTROL SITE Data FRAGMENT SCHEMA Localization Fragment Query Global Optimization STATS ON FRAGMENTS Optimized Fragment Query with Communication Operations Local LOCAL SCHEMAS LOCAL SITES Optimization Optimized Local Queries
4 Step 1 Query Decomposition Input : Calculus query on global relations Process: Normalization manipulate query quantifiers and qualification Analysis detect and reject incorrect queries possible for only a subset of relational calculus Simplification eliminate redundant predicates Restructuring calculus query algebraic query NOTE: More than one translation may be possible. use transformation rules 1) 2) 3) 4)
5 Normalization Lexical and syntactic analysis check validity (similar to compilers) check for attributes and relations type checking on the qualification Put into normal form Conjunctive normal form (p11 p12 p1n) (pm1 pm2 pmn) Disjunctive normal form (p11 p12 p1n) (pm1 pm2 pmn) OR's mapped into union AND's mapped into join or selection
6 Analysis Refute incorrect queries Type incorrect If any of its attribute or relation names are not defined in the global schema If operations are applied to attributes of the wrong type Semantically incorrect Components do not contribute in any way to the generation of the result Only a subset of relational calculus queries can be tested for correctness Those that do not contain disjunction and negation To detect connection graph (query graph) join graph
7 Analysis Example SELECT ENAME,RESP FROM EMP, ASG, PROJ WHERE EMP.ENO = ASG.ENO AND ASG.PNO = PROJ.PNO AND PNAME = "CAD/CAM" AND DUR 36 AND TITLE = "Programmer" Query graph Join graph DUR 36 ASG ASG EMP.ENO=ASG.ENO ASG.PNO=PROJ.PNO ASG.PNO=PROJ.PNO EMP.ENO=ASG.ENO PROJ EMP TITLE = Programmer PROJ EMP RESP ENAME RESULT PNAME= CAD/CAM
8 Analysis If the query graph is not connected, the query may be wrong or use Cartesian product. SELECT ENAME,RESP FROM EMP, ASG, PROJ WHERE EMP.ENO = ASG.ENO AND PNAME = "CAD/CAM" AND DUR > 36 AND TITLE = "Programmer" ASG EMP PROJ RESP ENAME RESULT PNAME= CAD/CAM
9 Simplification Why simplify? Remember the example How? Use transformation rules Elimination of redundancy idempotency rules Application of transitivity Use of integrity rules
10 Simplification Example SELECT FROM WHERE OR AND OR AND TITLE EMP EMP.ENAME = "J. Doe" (NOT(EMP.TITLE = "Programmer") (EMP.TITLE = "Programmer" EMP.TITLE = "Elect. Eng.") NOT(EMP.TITLE = "Elect. Eng.")) SELECT FROM WHERE TITLE EMP EMP.ENAME = "J. Doe"
11 Restructuring Convert relational calculus (declarative) to relational algebra (procedural) Make use of query trees Example query: Find the names of employees other than J. Doe who worked on the CAD/CAM project for either 1 or 2 years. SELECT ENAME FROM EMP, ASG, PROJ WHERE EMP.ENO = ASG.ENO AND ASG.PNO = PROJ.PNO AND ENAME "J. Doe" AND PNAME = "CAD/CAM" AND (DUR = 12 OR DUR = 24) ENAME Project DUR=12 OR DUR=24 PNAME= CAD/CAM Select ENAME J. DOE PNO ENO Join PROJ ASG EMP
12 Restructuring Transformation Rules Commutativity of binary operations R S S R R S S R R S S R Associativity of binary operations ( R S) T R (S T) (R S) T R (S T) Idempotence of unary operations A ( A (R)) A (R) p1(A1)( p2(A2)(R)) p1(A1) p2(A2)(R) where R[A] and A' A, A" A and A' A" Commuting selection with projection
13 Restructuring Transformation Rules Commuting selection with binary operations p(A)(R S) ( p(A) (R)) S p(Ai)(R (Aj,Bk)S) ( p(Ai) (R)) (Aj,Bk)S p(Ai)(R T) p(Ai) (R) p(Ai) (T) where Ai belongs to R and T Commuting projection with binary operations C(R S) A (R) B (S) C(R (Aj,Bk)S) A (R) (Aj,Bk) B (S) C(R S) C(R) C(S) where R[A] and S[B]; C = A' B' where A' A, B' B
14 Example ENAME Recall the previous example: Project Find the names of employees other than J. Doe who worked on the CAD/CAM project for either one or two years. DUR=12 DUR=24 PNAME= CAD/CAM Select SELECT ENAME ENAME J. DOE FROM PROJ, ASG, EMP WHERE ASG.ENO=EMP.ENO PNO AND ASG.PNO=PROJ.PNO AND ENAME "J. Doe" ENO Join AND PROJ.PNAME="CAD/CAM" AND (DUR=12 OR DUR=24) PROJ ASG EMP
15 Equivalent Queries ENAME PNAME= CAD/CAM (DUR=12 DUR=24) ENAME J. Doe PNO,ENO ASG EMP PROJ
16 Restructuring ENAME PNO PNO,ENAME ENO PNO PNO,ENO PNO,ENAME PNAME = "CAD/CAM" DUR =12 DUR=24 ENAME "J. Doe" PROJ ASG EMP
17 Step 2 Data Localization Input: Algebraic query on distributed relations Process: Determine which fragments are involved Localization program substitute for each global query its materialization program optimize
18 Example Assuming ENAME EMP is fragmented into EMP1, EMP2, and EMP3 as follows: DUR=12 DUR=24 EMP1= ENO E3 (EMP) PNAME= CAD/CAM EMP2= E3 <ENO E6 (EMP) EMP3= ENO E6 (EMP) ASG fragmented into ASG1 and ASG2 as follows: ENAME J. DOE PNO ASG1= ENO E3 (ASG) ENO ASG2= ENO> E3 (ASG) PROJ Replace EMP by (EMP1 EMP2 EMP3) and ASG by (ASG1 ASG2) in any query. EMP1 EMP2 EMP3 ASG1 ASG2
19 Provides Parallellism ENO ENO ENO ENO EMP1 ASG1 EMP2 ASG2 EMP3 ASG1 EMP3 ASG2
20 Eliminates Unnecessary Work ENO ENO ENO EMP1 ASG1 EMP2 ASG2 EMP3 ASG2
21 Reduction for Primary Horizontal Fragmentation (PHF) Reduction with selection Relation R and FR={R1, R2, , Rw} where Rj= pj(R) pi(Rj)= if x in R: (pi(x) pj(x)) Example SELECT * FROM WHERE ENO="E5" EMP ENO= E5 ENO= E5 EMP1 EMP2 EMP3 EMP2
22 Reduction for PHF Reduction with join Possible if fragmentation is done on join attribute Distribute join over union (R1 R2) S (R1 S) (R2 S) Given Ri = pi(R) and Rj = pj(R) Ri Rj = if x in Ri, y in Rj: (pi(x) pj(y))
23 Reduction for PHF ENO Assume EMP is fragmented as before and ASG1: ENO "E3"(ASG) ASG2: ENO > "E3"(ASG) Consider the query EMP1EMP2EMP3 ASG1 ASG2 SELECT * FROM WHERE EMP.ENO=ASG.ENO EMP,ASG Distribute join over unions Apply the reduction rule ENO ENO ENO EMP1 ASG1EMP2 ASG2EMP3 ASG2
24 Reduction for VF Find useless (not empty) intermediate relations Relation R defined over attributes A = {A1, ..., An} vertically fragmented as Ri = A'(R) where A' A: D,K(Ri) is useless if the set of projection attributes D is not in A' Example: EMP1= ENO,ENAME (EMP); EMP2= ENO,TITLE (EMP) SELECT ENAME FROM ENAME EMP ENAME ENO EMP1 EMP1 EMP2
25 Reduction for DHF Rule : Distribute joins over unions Apply the join reduction for horizontal fragmentation Example ASG1: ASG ENO EMP1 ASG2: ASG ENO EMP2 EMP1: TITLE= Programmer (EMP) EMP2: TITLE= Programmer (EMP) Query SELECT * FROM EMP, ASG WHERE ASG.ENO = EMP.ENO AND EMP.TITLE = "Mech. Eng."
26 Reduction for DHF Generic query ENO TITLE= Mech. Eng. ASG1 ASG2 EMP1 EMP2 Selections first ENO TITLE= Mech. Eng. ASG1 ASG2 EMP2
27 Reduction for DHF Joins over unions ENO ENO TITLE= Mech. Eng. TITLE= Mech. Eng. ASG1 EMP2 ASG2 EMP2 Elimination of the empty intermediate relations (left sub-tree) ENO TITLE= Mech. Eng. ASG2 EMP2
28 Reduction for Hybrid Fragmentation Combine the rules already specified: Remove empty relations generated by contradicting selections on horizontal fragments; Remove useless relations generated by projections on vertical fragments; Distribute joins over unions in order to isolate and remove useless joins.
29 Reduction for HF Example ENAME Consider the following hybrid fragmentation: ENAME ENO= E5 EMP1= ENO "E4" ( ENO,ENAME (EMP)) EMP2= ENO>"E4" ( ENO,ENAME (EMP)) ENO= E5 ENO EMP3= ENO,TITLE (EMP) and the query SELECT FROM EMP WHERE ENAME EMP2 ENO="E5" EMP1 EMP2 EMP3