
Exploring the Layers of Query Processing in Distributed Systems
Delve into the intricacies of query processing in distributed systems, covering topics such as types of query processors, optimization strategies, decision sites, network topology exploitation, and more. Learn about the languages used in object DBMSs, optimization timing, statistics, and the exploitation of replicated fragments for efficient data processing.
Presentation Transcript
Query Processing - Group #3: #6 Sujay Sanglikar, #11 Darshil Varia, #15 Aakib Ladhani, #18 Purvish Oza
Agenda
o Types of Query Processor
o Layers of Query Processing
Types of Query Processor
Query processors can be characterized along the following dimensions:
o Languages
o Types of Optimization
o Optimization Timing
o Statistics
o Decision Sites
o Exploitation of the Network Topology
o Exploitation of Replicated Fragments
o Use of Semijoins
o Languages: In object DBMSs, the input language is based on object calculus, an extension of relational calculus. XML has its own languages, primarily XPath and XQuery. The output language of a distributed query processor is an internal form of relational algebra augmented with communication primitives. Query processing must map the input language to the output language efficiently.
o Types of Optimization: Query optimization aims at choosing the best point in the solution space of all possible execution strategies. Exhaustive search of this space finds the best strategy with respect to the cost model, but its optimization cost can be prohibitive when the space is large. Randomized strategies such as iterative improvement and simulated annealing avoid this cost by searching for a very good, though not necessarily optimal, solution.
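The contrast between exhaustive search and iterative improvement can be sketched on join ordering, a typical solution space. This is a minimal illustration only. The relation names, cardinalities, selectivities, and the cost model (the sum of estimated result sizes along a left-deep join order) are hypothetical. Iterative improvement is shown in one simple variant: pairwise-swap local search with random restarts. Simulated annealing is not shown.

```python
import itertools
import random

# Hypothetical statistics: cardinalities and pairwise join selectivities.
CARD = {"EMP": 400, "ASG": 1000, "PROJ": 50, "PAY": 10}
SEL = {
    frozenset({"EMP", "ASG"}): 1 / 400,
    frozenset({"ASG", "PROJ"}): 1 / 50,
    frozenset({"EMP", "PAY"}): 1 / 10,
}


def cost(order):
    """Hypothetical cost of a left-deep join order: the sum of the
    estimated sizes of all intermediate (and final) results."""
    joined, card, total = [order[0]], CARD[order[0]], 0.0
    for rel in order[1:]:
        sf = 1.0  # no join predicate means a Cartesian product
        for prev in joined:
            sf *= SEL.get(frozenset({prev, rel}), 1.0)
        card = card * CARD[rel] * sf
        total += card
        joined.append(rel)
    return total


def exhaustive(relations):
    """Examine all n! orders: optimal under the cost model, but costly."""
    return min(itertools.permutations(relations), key=cost)


def iterative_improvement(relations, restarts=5, seed=0):
    """Randomized local search: from random starting orders, keep applying
    swaps that lower the cost until a local minimum is reached."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        order = list(relations)
        rng.shuffle(order)
        improved = True
        while improved:
            improved = False
            for i, j in itertools.combinations(range(len(order)), 2):
                cand = order[:]
                cand[i], cand[j] = cand[j], cand[i]
                if cost(cand) < cost(order):
                    order, improved = cand, True
        if best is None or cost(order) < cost(best):
            best = order
    return tuple(best)


rels = list(CARD)
print(exhaustive(rels), cost(exhaustive(rels)))
print(iterative_improvement(rels))
```

The exhaustive version evaluates all 24 orders of four relations, and that count grows factorially with the number of relations. The local search evaluates far fewer orders but may stop at a local minimum.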
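o Optimization Timing: Optimization can be done statically, before the query is executed, or dynamically, as the query is executed. Static optimization is done at query compilation time, so its cost can be amortized over multiple executions of the query. Dynamic optimization proceeds at query execution time. Hybrid query optimization tries to keep the advantages of static optimization while avoiding the problems caused by inaccurate estimates. The approach is basically static, but the query can be re-optimized at run time when the actual size of an intermediate relation differs widely from its predicted size.
o Statistics: In a distributed database, the statistics used for query optimization typically bear on fragments, such as fragment cardinality and size and the number of distinct values of each attribute. To minimize the probability of estimation error, more detailed statistics such as histograms of attribute values are sometimes used, at the expense of a higher management cost. Statistics are kept accurate by periodic updates.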
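To see why histograms reduce estimation error, the sketch below estimates how many tuples of one fragment satisfy a predicate A < c in two ways. The first uses only the cardinality and the attribute's domain bounds, which assumes values are spread uniformly. The second uses an equi-width histogram, which assumes uniformity only within each bucket. The fragment contents, domain bounds, and bucket count are hypothetical, and equi-width is just one of several histogram kinds.

```python
def build_histogram(values, lo, hi, buckets):
    """Equi-width histogram over [lo, hi): tuple counts per bucket."""
    width = (hi - lo) / buckets
    counts = [0] * buckets
    for v in values:
        counts[min(int((v - lo) / width), buckets - 1)] += 1
    return counts, width


def estimate_lt_uniform(card, lo, hi, c):
    """Estimate |A < c| from cardinality and domain bounds alone (uniform values)."""
    return card * (c - lo) / (hi - lo)


def estimate_lt(hist, lo, c):
    """Estimate |A < c| from the histogram: uniform only within each bucket."""
    counts, width = hist
    total = 0.0
    for i, n in enumerate(counts):
        b_lo = lo + i * width
        if c >= b_lo + width:
            total += n  # the whole bucket qualifies
        elif c > b_lo:
            total += n * (c - b_lo) / width  # part of the bucket qualifies
    return total


# Hypothetical skewed fragment: 80 tuples with A = 10, 20 with A = 90.
fragment = [10] * 80 + [90] * 20
hist = build_histogram(fragment, 0, 100, 4)
print(estimate_lt_uniform(len(fragment), 0, 100, 50))  # 50.0 (true answer: 80)
print(estimate_lt(hist, 0, 50))                         # 80.0
```

The extra accuracy has a price: the histogram's bucket counts must be stored and refreshed for every fragment, which is the management cost mentioned above.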
o Decision Sites: Most systems use the centralized decision approach, in which a single site generates the execution strategy. The centralized approach is simpler but requires knowledge of the entire distributed database. The distributed approach, in which several sites cooperate to produce the strategy, requires only local information. Hybrid approaches, in which one site makes the major decisions and other sites make local decisions, are also common.
o Exploitation of the Network Topology: The network topology is generally exploited by the distributed query processor. With wide area networks, the cost function to be minimized can be restricted to the data communication cost, which is considered the dominant factor. With local area networks, communication costs are comparable to I/O costs, so it is reasonable to increase parallel execution at the expense of communication. In a client-server environment, the power of the client workstation can be exploited to perform database operators using data shipping.
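One common way to write the cost function is as a weighted sum of CPU, I/O, and communication time, where communication is split into a per-message cost and a per-byte transfer cost. The sketch below assumes that formulation. The weights (in milliseconds) and the resource counts of the two plans are invented for illustration. It shows that the same two plans can rank differently on a slow wide area network and on a fast local area network.

```python
from dataclasses import dataclass


@dataclass
class Weights:
    """Hypothetical per-unit costs, in milliseconds."""
    c_cpu: float  # per instruction
    c_io: float   # per disk I/O
    c_msg: float  # fixed cost of sending one message
    c_tr: float   # per byte transmitted


@dataclass
class PlanUsage:
    """Hypothetical resource counts for one execution plan."""
    insts: int
    ios: int
    msgs: int
    bytes_sent: int


def communication_cost(w: Weights, p: PlanUsage) -> float:
    """Message setup plus transfer time; the only term kept on a WAN."""
    return w.c_msg * p.msgs + w.c_tr * p.bytes_sent


def total_cost(w: Weights, p: PlanUsage) -> float:
    """CPU + I/O + communication: needed on a LAN, where I/O is not negligible."""
    return w.c_cpu * p.insts + w.c_io * p.ios + communication_cost(w, p)


WAN = Weights(c_cpu=1e-6, c_io=10, c_msg=100, c_tr=0.1)
LAN = Weights(c_cpu=1e-6, c_io=10, c_msg=1, c_tr=0.0001)

# Plan A ships little data but does more I/O; plan B ships a whole fragment.
plan_a = PlanUsage(insts=1_000_000, ios=2_000, msgs=4, bytes_sent=10_000)
plan_b = PlanUsage(insts=1_000_000, ios=500, msgs=2, bytes_sent=1_000_000)

for name, w in [("WAN", WAN), ("LAN", LAN)]:
    best = min([("A", plan_a), ("B", plan_b)], key=lambda item: total_cost(w, item[1]))
    print(name, "prefers plan", best[0])  # WAN -> A, LAN -> B
```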
o Exploitation of Replicated Fragments: Queries on global relations are mapped onto queries on physical fragments. This process is called localization because its main function is to localize the data involved in the query. For higher reliability and better read performance, it is useful to have fragments replicated at different sites. Some algorithms exploit the existence of replicated fragments at run time in order to minimize communication times.
o Use of Semijoins: The semijoin operator has the important property of reducing the size of the operand relation, which cuts the amount of data exchanged between sites when communication is the dominant cost. However, using semijoins may increase the number of messages and the local processing time. Some query processing algorithms therefore aim at selecting an optimal combination of joins and semijoins.
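The size-reducing effect of the semijoin can be sketched as follows. R is stored at site 1 and S at site 2, and they are joined on attribute A. Plan 1 ships all of R to site 2. Plan 2 ships only the projection of S on A to site 1, reduces R there, and ships the reduced R back. The fragments are hypothetical, and counting attribute values stands in for bytes transferred.

```python
def semijoin(r, s, attr):
    """R semijoin S: the tuples of R that match at least one tuple of S on attr."""
    keys = {t[attr] for t in s}
    return [t for t in r if t[attr] in keys]


def join(r, s, attr):
    """Hash join of R and S on a shared attribute."""
    index = {}
    for t in s:
        index.setdefault(t[attr], []).append(t)
    return [{**tr, **ts} for tr in r for ts in index.get(tr[attr], [])]


def size(rel):
    """Hypothetical transfer-size measure: number of attribute values."""
    return sum(len(t) for t in rel)


# Hypothetical fragments: R (1000 tuples) at site 1, S (10 tuples) at site 2.
R = [{"A": i, "B": 2 * i, "C": f"r{i}"} for i in range(1000)]
S = [{"A": i, "D": f"s{i}"} for i in range(10)]

# Strategy 1: ship all of R to site 2 and join there (one transfer).
cost_join = size(R)

# Strategy 2: ship pi_A(S) to site 1, reduce R there, then ship R ⋉ S to
# site 2 and join (two transfers, so one extra message).
proj_s = [{"A": a} for a in sorted({t["A"] for t in S})]
reduced_r = semijoin(R, proj_s, "A")
cost_semijoin = size(proj_s) + size(reduced_r)

# Both strategies produce the same join result.
assert join(reduced_r, S, "A") == join(R, S, "A")
print(cost_join, cost_semijoin)  # 3000 40
```

The semijoin pays off here because few tuples of R match S. If nearly every tuple of R matched, the projection and the extra message would add cost without reducing the transfer. This is the trade-off that algorithms mixing joins and semijoins try to balance.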
Layers of Query Processing
In query processing, each layer solves a well-defined subproblem. Four main layers are involved in distributed query processing:
o Query Decomposition
o Data Localization
o Global Query Optimization
o Distributed Query Execution
Query Decomposition
The first layer decomposes the calculus query into an algebraic query on global relations, using the global conceptual schema that describes those relations. Query decomposition can be viewed as four successive steps. First, the calculus query is rewritten in a normalized form that is suitable for subsequent manipulation. Second, the normalized query is analyzed semantically so that incorrect queries are detected and rejected as early as possible. Third, the correct query is simplified, for example by eliminating redundant predicates. Fourth, the calculus query is restructured as an algebraic query.
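One way the semantic analysis step can detect an incorrect query uses a query graph. The graph has one node per relation plus a node for the result. Each join predicate adds an edge between two relations, and each relation whose attributes appear in the output is linked to the result node. For queries without disjunction or negation, a query whose graph is not connected is semantically incorrect, because some relation contributes nothing to the result. The sketch below assumes that restricted class of queries. The relation names and predicates are hypothetical.

```python
from collections import defaultdict


def is_connected(relations, join_predicates, output_relations):
    """Build the query graph and test whether every node is reachable.

    Nodes: one per relation, plus a RESULT node.
    Edges: one per join predicate between two relations, and one from
    RESULT to each relation whose attributes appear in the output.
    """
    graph = defaultdict(set)
    for a, b in join_predicates:
        graph[a].add(b)
        graph[b].add(a)
    for r in output_relations:
        graph["RESULT"].add(r)
        graph[r].add("RESULT")

    seen, stack = set(), ["RESULT"]
    while stack:  # depth-first traversal from the RESULT node
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node] - seen)
    return seen == set(relations) | {"RESULT"}


# Hypothetical query over EMP, ASG, PROJ that outputs EMP attributes.
rels = ["EMP", "ASG", "PROJ"]
print(is_connected(rels, [("EMP", "ASG"), ("ASG", "PROJ")], ["EMP"]))  # True
# The ASG-PROJ join predicate is missing, so PROJ is disconnected.
print(is_connected(rels, [("EMP", "ASG")], ["EMP"]))  # False
```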
Data Localization
The input to the second layer is an algebraic query on global relations. The main role of this layer is to localize the query's data using the data distribution information in the fragment schema. It determines which fragments are involved in the query and transforms the distributed query into a query on fragments. A global relation can be reconstructed by applying the fragmentation rules and then deriving a program of relational algebra operators, called a localization program, that acts on fragments. The query is then mapped into a fragment query by substituting each global relation with its localization program. The fragment query is then simplified and restructured to produce a better query.
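For a horizontally fragmented relation, the localization program is the union of its fragments. Simplification can then drop every fragment whose defining predicate contradicts the query's selection. The sketch below assumes a hypothetical EMP relation, range-fragmented on ENO across three sites. It shows the reduction and checks that the union of all fragments reconstructs the global relation.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """Horizontal fragment holding the tuples with lo <= key < hi."""
    name: str
    site: int
    lo: int
    hi: int
    rows: list


def localize(fragments, q_lo, q_hi):
    """Reduction: drop fragments whose range cannot satisfy q_lo <= key < q_hi."""
    return [f for f in fragments if f.lo < q_hi and q_lo < f.hi]


def execute(fragments, key, q_lo, q_hi):
    """Reduced localization program: union of the selections on the kept fragments."""
    result = []
    for f in localize(fragments, q_lo, q_hi):
        result.extend(r for r in f.rows if q_lo <= r[key] < q_hi)
    return result


# Hypothetical global relation EMP, range-fragmented on ENO over three sites.
emp = [{"ENO": i, "ENAME": f"e{i}"} for i in range(300)]
frags = [
    Fragment(f"EMP{k + 1}", k + 1, 100 * k, 100 * (k + 1),
             [r for r in emp if 100 * k <= r["ENO"] < 100 * (k + 1)])
    for k in range(3)
]

print([f.name for f in localize(frags, 150, 180)])  # ['EMP2']
print(len(execute(frags, "ENO", 150, 180)))         # 30
```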
Global Query Optimization
The input to the third layer is an algebraic query on fragments. The goal of query optimization is to find an execution strategy for the query that is close to optimal. The previous layers have already optimized the query, for example by eliminating redundant expressions. However, that optimization is independent of fragment characteristics such as fragment allocation and cardinalities, and this layer takes those characteristics into account. Query optimization consists of finding the best ordering of operators in the query, including communication operators, that minimizes a cost function. An important aspect of query optimization is join ordering, since permutations of the joins within the query may lead to improvements of orders of magnitude. The output of this layer is an optimized algebraic query on fragments with communication operators included.
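Placing communication operators means deciding where each operator runs and which data moves. The sketch below considers a single join of two fragments stored at different sites, with the query issued from a third site. It assumes a cost function that counts only bytes transferred, and it evaluates running the join at each candidate site. The site numbers and sizes are hypothetical.

```python
def placement_cost(site, operands, result_size, result_site):
    """Bytes transferred if the join runs at `site`: every operand not
    already stored there is shipped in, and the result is shipped to the
    site that issued the query."""
    shipped = sum(size for op_site, size in operands if op_site != site)
    if site != result_site:
        shipped += result_size
    return shipped


def best_site(operands, result_size, result_site):
    """Pick the candidate site with the lowest transfer cost (ties: lowest id)."""
    candidates = sorted({s for s, _ in operands} | {result_site})
    return min(candidates,
               key=lambda s: placement_cost(s, operands, result_size, result_site))


# Hypothetical: fragment R (10,000 bytes) at site 1, S (2,000 bytes) at site 2,
# query issued at site 3.
operands = [(1, 10_000), (2, 2_000)]
print(best_site(operands, result_size=1_500, result_site=3))   # 1
print(best_site(operands, result_size=50_000, result_site=3))  # 3
```

A small result favors joining where the larger fragment already lives. A large result favors shipping both operands to the query site. A real optimizer combines such placement decisions with join ordering over the whole query.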
Distributed Query Execution
The last layer is performed by all the sites that have fragments involved in the query. Each subquery executing at one site, called a local query, is optimized using the local schema of the site and then executed. At this time, the algorithms that perform the relational operators may be chosen.
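A minimal sketch of this last layer, under stated assumptions. Each hypothetical site holds one horizontal fragment of ASG, and its local schema records which attributes are indexed. Each site picks its own access method for the local query, either an index lookup or a full scan, and a coordinator combines the partial results. Because the fragments here are horizontal, combining means taking the union. Threads merely stand in for remote sites.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Site:
    """A site holding one horizontal fragment; indexed_attrs is its local schema info."""
    name: str
    rows: list
    indexed_attrs: set = field(default_factory=set)

    def __post_init__(self):
        # Build the local indexes described by the (hypothetical) local schema.
        self.indexes = {a: {} for a in self.indexed_attrs}
        for r in self.rows:
            for a, idx in self.indexes.items():
                idx.setdefault(r[a], []).append(r)

    def run_local_query(self, attr, value):
        """Local optimization: choose the access method from the local schema."""
        if attr in self.indexes:
            return "index", list(self.indexes[attr].get(value, []))
        return "scan", [r for r in self.rows if r[attr] == value]


def execute_distributed(sites, attr, value):
    """Run the local queries concurrently and union the partial results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda s: s.run_local_query(attr, value), sites))
    methods = {s.name: method for s, (method, _) in zip(sites, partials)}
    rows = [r for _, part in partials for r in part]
    return methods, rows


# Hypothetical fragments of ASG at two sites; only site1 indexes PNO.
site1 = Site("site1", [{"ENO": "E1", "PNO": "P1"}, {"ENO": "E2", "PNO": "P2"}], {"PNO"})
site2 = Site("site2", [{"ENO": "E3", "PNO": "P1"}, {"ENO": "E4", "PNO": "P3"}])
methods, rows = execute_distributed([site1, site2], "PNO", "P1")
print(methods)                   # {'site1': 'index', 'site2': 'scan'}
print([r["ENO"] for r in rows])  # ['E1', 'E3']
```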