Query Processing in Database Systems: Understanding SQL Query Execution

Explore the fundamentals of query processing in database systems with a focus on SQL query evaluation. Learn about single-table queries, query optimization, execution, and executor architecture. Understand how to process basic queries and utilize distinct selections, ordering, and grouping for efficient data retrieval.

  • Database Systems
  • SQL Queries
  • Query Optimization
  • Query Execution
  • Relational Operators


Presentation Transcript


  1. Database Systems Query Evaluation Based on slides by Feifei Li, University of Utah

  2. A Slice Through Query Processing We'll start with single-table queries. The layered stack, top to bottom: SQL Query → Query Optimization and Execution (query executor architecture, simple query optimization) → Relational Operators → Files and Access Methods → Buffer Management → Disk Space Management → DB. 2

  3. Basic Single-Table Queries SELECT [DISTINCT] <column expression list> FROM <single table> [WHERE <predicate>] [GROUP BY <column list> [HAVING <predicate>] ] [ORDER BY <column list>] 3

  4. Basic Single-Table Queries SELECT [DISTINCT] <column expression list> FROM <single table> [WHERE <predicate>] [GROUP BY <column list> [HAVING <predicate>] ] [ORDER BY <column list>] Simplest version is straightforward Produce all tuples in the table that satisfy the predicate Output the expressions in the SELECT list Expression can be a column reference, or an arithmetic expression over column refs 4

  5. Basic Single-Table Queries SELECT S.name, S.gpa FROM Students S WHERE S.dept = 'CS' [GROUP BY <column list> [HAVING <predicate>] ] [ORDER BY <column list>] Simplest version is straightforward: produce all tuples in the table that satisfy the predicate, and output the expressions in the SELECT list. An expression can be a column reference, or an arithmetic expression over column refs. 5

  6. SELECT DISTINCT SELECT DISTINCT S.name, S.gpa FROM Students S WHERE S.dept = 'CS' [GROUP BY <column list> [HAVING <predicate>] ] [ORDER BY <column list>] The DISTINCT flag specifies removal of duplicates before output. 6

  7. ORDER BY SELECT DISTINCT S.name, S.gpa, S.age*2 AS a2 FROM Students S WHERE S.dept = 'CS' [GROUP BY <column list> [HAVING <predicate>] ] ORDER BY S.gpa, S.name, a2; The ORDER BY clause specifies that output should be sorted. Lexicographic ordering again! Obviously must refer to columns in the output. Note the AS clause for naming output columns! 7

  8. ORDER BY SELECT DISTINCT S.name, S.gpa FROM Students S WHERE S.dept = 'CS' [GROUP BY <column list> [HAVING <predicate>] ] ORDER BY S.gpa DESC, S.name ASC; Ascending order by default, but can be overridden: DESC flag for descending, ASC for ascending. Can mix and match, lexicographically. 8

  9. Aggregates SELECT [DISTINCT] AVG(S.gpa) FROM Students S WHERE S.dept = 'CS' [GROUP BY <column list> [HAVING <predicate>] ] [ORDER BY <column list>] Before producing output, compute a summary (a.k.a. an aggregate) of some arithmetic expression. Produces 1 row of output with one column in this case. Other aggregates: SUM, COUNT, MAX, MIN. Note: can use DISTINCT inside the agg function: SELECT COUNT(DISTINCT S.name) FROM Students S vs. SELECT DISTINCT COUNT(S.name) FROM Students S; 9
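The distinction at the end of the slide is easy to see in miniature. A sketch in Python, using an in-memory list of (name, gpa) rows as a stand-in for a hypothetical Students table:

```python
# Hypothetical Students rows: (name, gpa)
students = [("Ann", 3.9), ("Bob", 3.2), ("Ann", 3.5), ("Eve", 4.0)]

# SELECT COUNT(DISTINCT S.name): deduplicate names first, then count them.
count_distinct = len({name for name, _ in students})

# SELECT DISTINCT COUNT(S.name): count all names (producing one row),
# then deduplicate that single-row result -- which changes nothing.
distinct_count = len([name for name, _ in students])
```

Here `count_distinct` is 3 (Ann, Bob, Eve) while `distinct_count` is 4: DISTINCT inside the aggregate changes what is counted; DISTINCT outside merely deduplicates the (single-row) output.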

  10. GROUP BY SELECT [DISTINCT] AVG(S.gpa), S.dept FROM Students S [WHERE <predicate>] GROUP BY S.dept [HAVING <predicate>] [ORDER BY <column list>] Partition the table into groups that have the same value on the GROUP BY columns. Can group by a list of columns. Produce an aggregate result per group. Cardinality of output = # of distinct group values. Note: can put grouping columns in the SELECT list. For aggregate queries, the SELECT list can contain aggs and GROUP BY columns only! What would it mean if we said SELECT S.name, AVG(S.gpa) above?? 10

  11. HAVING SELECT [DISTINCT] AVG(S.gpa), S.dept FROM Students S [WHERE <predicate>] GROUP BY S.dept HAVING COUNT(*) > 5 [ORDER BY <column list>] The HAVING predicate is applied after grouping and aggregation. Hence it can contain anything that could go in the SELECT list, i.e. aggs or GROUP BY columns. It's an optional clause. 11

  12. Putting it all together SELECT S.dept, AVG(S.gpa), COUNT(*) FROM Students S WHERE S.gender = 'F' GROUP BY S.dept HAVING COUNT(*) > 5 ORDER BY S.dept; 12

  13. Query Processing Overview The query optimizer translates SQL to a special internal language: Query Plans. The query executor is an interpreter for query plans. Think of query plans as box-and-arrow dataflow diagrams: each box implements a relational operator, and edges represent a flow of tuples (columns as specified). For single-table queries, these diagrams are straight-line graphs. Example: the optimizer turns SELECT DISTINCT name, gpa FROM Students into the plan HeapScan → Sort → Distinct, with (name, gpa) flowing along each edge. 13

  14. Iterators iterator The relational operators are all subclasses of the class iterator: class iterator { void init(); tuple next(); void close(); iterator &inputs[]; // additional state goes here } Note: Edges in the graph are specified by inputs (max 2, usually) Encapsulation: any iterator can be input to any other! When subclassing, different iterators will keep different kinds of state information 14
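The iterator interface above is given in C++-style pseudocode; a minimal Python sketch makes the encapsulation point concrete. The class and operator names below (ListScan, Filter) are illustrative, not from the slides; ListScan plays the role of a HeapScan over an in-memory list, and EOF is signaled by returning None:

```python
class Iterator:
    """The init/next/close interface from the slide."""
    def __init__(self, *inputs):
        self.inputs = list(inputs)  # edges in the plan graph (max 2, usually)

    def init(self):
        for i in self.inputs:
            i.init()

    def next(self):  # returns one tuple, or None at EOF
        raise NotImplementedError

    def close(self):
        for i in self.inputs:
            i.close()


class ListScan(Iterator):
    """Leaf operator: an in-memory list stands in for a HeapScan."""
    def __init__(self, rows):
        super().__init__()
        self.rows = rows

    def init(self):
        self.pos = 0  # per-operator state, as the slide notes

    def next(self):
        if self.pos >= len(self.rows):
            return None
        row = self.rows[self.pos]
        self.pos += 1
        return row


class Filter(Iterator):
    """Applies a WHERE-style predicate to its single input."""
    def __init__(self, pred, child):
        super().__init__(child)
        self.pred = pred

    def next(self):
        while (t := self.inputs[0].next()) is not None:
            if self.pred(t):
                return t
        return None


# Encapsulation: any iterator can be the input of any other.
plan = Filter(lambda t: t[1] > 3.5, ListScan([("Ann", 3.9), ("Bob", 3.2)]))
plan.init()
```

Pulling on `plan.next()` until it returns None drains the plan, yielding only `("Ann", 3.9)`.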

  15. Example: Sort class Sort extends iterator { void init(); tuple next(); void close(); iterator &inputs[1]; int numberOfRuns; DiskBlock runs[]; RID nextRID[]; } init(): generate the sorted runs on disk. Allocate the runs[] array and fill it in with disk pointers; initialize numberOfRuns; allocate the nextRID array and initialize it to NULLs. next(): the nextRID array tells us where we're up to in each run; find the next tuple to return based on the nextRID array, advance the corresponding nextRID entry, and return the tuple (or EOF -- End of Fun -- if no tuples remain). close(): deallocate the runs and nextRID arrays. 15

  16. Sort GROUP BY: Naïve Solution The Sort iterator (could be external sorting, as explained before) naturally permutes its input so that all tuples of a group are output consecutively. The Aggregate iterator keeps running info ('transition values') on the agg functions in the SELECT list, per group. E.g., for COUNT, it keeps count-so-far; for SUM, it keeps sum-so-far; for AVG it keeps sum-so-far and count-so-far. As soon as the Aggregate iterator sees a tuple from a new group: 1. It produces an output for the old group based on the agg function (e.g. for AVG it returns sum-so-far/count-so-far). 2. It resets its running info. 3. It updates the running info with the new tuple's info. (Plan: Sort feeding Aggregate.) 16
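The sort-then-aggregate scheme above can be sketched in a few lines of Python. The function name is illustrative; input rows are (group key, value) pairs, sorting plays the role of the Sort iterator, and (sum-so-far, count-so-far) are the transition values for AVG:

```python
def sort_groupby_avg(rows):
    """Naive sort-based GROUP BY with AVG; rows are (group_key, value)."""
    rows = sorted(rows)               # the Sort iterator's job
    out, cur, s, c = [], None, 0.0, 0
    for key, val in rows:
        if key != cur:                # a tuple from a new group
            if cur is not None:
                out.append((cur, s / c))  # 1. output the old group
            cur, s, c = key, 0.0, 0       # 2. reset the running info
        s += val                      # 3. update running info with new tuple
        c += 1
    if cur is not None:
        out.append((cur, s / c))      # flush the final group
    return out
```

For example, `sort_groupby_avg([("CS", 3.0), ("EE", 4.0), ("CS", 4.0)])` yields `[("CS", 3.5), ("EE", 4.0)]`.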

  17. An Alternative to Sorting: Hashing! Idea: many of the things we use sort for don't exploit the order of the sorted data. E.g.: forming groups in GROUP BY; removing duplicates in DISTINCT. Often it is good enough to match all tuples with equal field-values. Hashing does this! And it may be cheaper than sorting! But how to do it for data sets bigger than memory? 17

  18. General Idea Two phases: Partition: use a hash function hp to split tuples into partitions on disk. We know that all matches live in the same partition. Partitions are spilled to disk via output buffers ReHash: for each partition on disk, read it into memory and build a main-memory hash table based on a hash function hr Then go through each bucket of this hash table to bring together matching tuples 18
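The two phases above can be sketched for duplicate elimination, with on-disk partitions simulated as Python lists. Here hp is `hash(t) % num_partitions` and the ReHash phase's in-memory table is a set (so hr is Python's built-in hash); both choices are stand-ins, not what a real system would use:

```python
def two_phase_distinct(tuples, num_partitions=3):
    """Two-phase external hashing for DISTINCT, partitions simulated in memory."""
    # Phase 1 (Partition): route each tuple by hp; all matches land together.
    partitions = [[] for _ in range(num_partitions)]
    for t in tuples:
        partitions[hash(t) % num_partitions].append(t)   # hp

    # Phase 2 (ReHash): per-partition in-memory hash table brings
    # matching tuples together; emit each distinct tuple once.
    result = []
    for p in partitions:
        seen = set()                                     # hr-based table
        for t in p:
            if t not in seen:
                seen.add(t)
                result.append(t)
    return result
```

Because equal tuples always hash to the same partition, deduplicating each partition independently is enough: `two_phase_distinct([1, 2, 2, 3, 1])` returns each of 1, 2, 3 exactly once.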

  19. Two Phases [Diagram: in the Partition phase, the original relation is read through an INPUT buffer and routed by hash function hp into B-1 OUTPUT buffers, which spill partitions 1..B-1 to disk, using B main memory buffers. In the ReHash phase, each on-disk partition Ri (k <= B pages) is read back and a hash table is built for it with hash fn hr, producing the result.] 19

  20. Analysis How big of a relation (file) can we hash in one pass? Phase 1 produces B-1 spill partitions, and to rehash each one in memory, each should be no more than B blocks big. Answer: B(B-1) — this is the size of the file in pages. Said differently: we can hash a table of N blocks in about square-root(N) buffers of memory. Much like sorting! Have a bigger relation? Recursive partitioning! In the ReHash phase, if a partition b is bigger than B, then recurse: pretend that b is a relation we need to hash, run the Partitioning phase on b, and then the ReHash phase on each of its (sub)partitions. 20

  21. Hash GROUP BY: Naïve Solution The Hash iterator permutes its input so that all tuples of a group are output consecutively. The Aggregate iterator keeps running info ('transition values') on the agg functions in the SELECT list, per group. E.g., for COUNT, it keeps count-so-far; for SUM, it keeps sum-so-far; for AVG it keeps sum-so-far and count-so-far. When the Aggregate iterator sees a tuple from a new group: 1. It produces an output for the old group based on the agg function (e.g. for AVG it returns sum-so-far/count-so-far). 2. It resets its running info. 3. It updates the running info with the new tuple's info. (Plan: Hash feeding Aggregate.) 21

  22. We Can Do Better! Combine the summarization into the hashing process. During the ReHash phase, don't store tuples; store pairs of the form <GroupVals, TransVals>. When we want to insert a new tuple into the hash table: if we find a matching GroupVals, just update the TransVals appropriately; else insert a new <GroupVals, TransVals> pair. What's the benefit? Q: How many pairs will we have to handle? A: The number of distinct values of the GroupVals columns — not the number of tuples! The pairs are also probably narrower than the tuples. Can we play the same trick during sorting? (Operator: HashAgg.) 22
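Folding the aggregation into the hash table can be sketched directly: the table maps GroupVals to TransVals (here sum-so-far and count-so-far for AVG), so its size tracks the number of distinct groups rather than the number of tuples. The function name is illustrative:

```python
def hash_groupby_avg(rows):
    """HashAgg sketch: rows are (group_key, value); AVG per group."""
    table = {}                        # GroupVals -> [sum-so-far, count-so-far]
    for key, val in rows:
        tv = table.setdefault(key, [0.0, 0])  # insert new pair on first sight
        tv[0] += val                  # matching GroupVals: just update TransVals
        tv[1] += 1
    # Finalize: AVG = sum-so-far / count-so-far, one output row per group.
    return {k: s / c for k, (s, c) in table.items()}
```

A million input tuples with only three departments keep the table at three entries, which is exactly why this beats storing raw tuples.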

  23. Even Better: Hybrid Hashing What if the set of <GroupVals, TransVals> pairs fits in memory? It would be a waste to spill it to disk and read it all back! Recall this could be true even if there are tons of tuples! Idea: keep a smaller 1st partition in memory during phase 1, and output its contents at the end of Phase 1. Q: how do we choose the number k? [Diagram: of B main memory buffers, k hold an in-memory hashtable (via hr) for partition 1, while hh routes the rest through B-k output buffers into partitions 2, 3, ... on disk.] 23

  24. A Hash Function for Hybrid Hashing Assume we'd like to use the hash-partition function hp. Define hh operationally as follows: hh(x) = 1 if the in-memory hashtable is not yet full; hh(x) = 1 if x is already in the hashtable; hh(x) = hp(x) otherwise. This ensures that: Bucket 1 fits in the k pages of memory; and if the entire set of distinct hashtable entries is smaller than k, we do no spilling! [Diagram: as on the previous slide — a k-buffer hashtable via hr, with the remaining B-k buffers partitioning to disk via hh.] 24

  25. Query Optimization: a quick preview A deep subject that focuses on multi-table queries; we will only need a cookbook version for now. Build the dataflow bottom up: Choose an Access Method (HeapScan or IndexScan) — non-trivial, we'll learn about this later! Next apply any WHERE clause filters. Next apply GROUP BY and aggregation — can choose between sorting and hashing! Next apply any HAVING clause filters. Next Sort to help with ORDER BY and DISTINCT. In the absence of ORDER BY, can do DISTINCT via hashing! Note: where did the SELECT clause go? Implicit! (Example plan, bottom to top: HeapScan → Filter → HashAgg → Filter → Sort → Distinct.) 25

  26. Summary Single-table SQL, in detail. Exposure to the query processing architecture: the query optimizer translates SQL to a query plan; the query executor interprets the plan; query plans are graphs of iterators. Three ways to read the input relation R to perform some operation: read R in sequential order of pages (linear scan); sort R and then perform the operation; or use hashing to break R into smaller partitions (Ri), each of which fits in memory, and then perform the operation on each partition independently (this works for specific operators). 26

  27. Relational Operations We will consider how to implement: Selection (σ) — selects a subset of rows from a relation. Projection (π) — deletes unwanted columns from a relation. Join (⋈) — allows us to combine two relations. Set-difference (−) — tuples in reln. 1, but not in reln. 2. Union (∪) — tuples in reln. 1 and in reln. 2. Aggregation (SUM, MIN, etc.) and GROUP BY. Since each op returns a relation, ops can be composed! After we cover the operations, we will discuss how to optimize queries formed by composing them. 27

  28. Schema for Examples Sailors (sid: integer, sname: string, rating: integer, age: real) Reserves (sid: integer, bid: integer, day: dates, rname: string) Reserves: Each tuple is 40 bytes long, 100 tuples per page, 1000 pages. Sailors: Each tuple is 50 bytes long, 80 tuples per page, 500 pages. 28

  29. Simple Selections SELECT * FROM Reserves R WHERE R.rname < 'C%' Of the form σ_{R.attr op value}(R). Question: how best to perform? Depends on: what indexes/access paths are available, and what the expected size of the result is (in terms of number of tuples and/or number of pages). Size of result (cardinality) is approximated as size of R * reduction factor; the reduction factor is usually called selectivity. Estimates of reduction factors are based on statistics we will discuss later. 29
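The cardinality estimate is just arithmetic on the running example. A sketch, where the 10% selectivity is an assumed figure (matching the example used two slides later), not something derived here:

```python
# Cardinality estimate: size of R scaled by the selectivity (reduction factor).
pages_in_reserves = 1000       # Reserves occupies 1000 pages (from the schema slide)
tuples_per_page   = 100
selectivity       = 0.10       # assumption: 10% of tuples satisfy the predicate

est_result_tuples = pages_in_reserves * tuples_per_page * selectivity
est_result_pages  = pages_in_reserves * selectivity
```

This gives an estimate of 10,000 qualifying tuples spread over roughly 100 pages — the figures the index-cost discussion below relies on.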

  30. Simple Selections (cont) With no index, unsorted: must essentially scan the whole relation; cost is M (#pages in R). For Reserves, 1000 I/Os. With no index, sorted: cost of binary search + number of pages containing results. For Reserves, 10 I/Os + selectivity * #pages. With an index on the selection attribute: use the index to find qualifying data entries, then retrieve the corresponding data records. Cost? 30

  31. Using an Index for Selections Cost depends on #qualifying tuples, and clustering. Cost: finding qualifying data entries (typically small) plus cost of retrieving records (could be large w/o clustering). In example reserves relation, if 10% of tuples qualify (100 pages, 10000 tuples). With a clustered index, cost is little more than 100 I/Os; If unclustered, could be up to 10000 I/Os! Unless you get fancy 31

  32. Selections using Index (cont) Important refinement for unclustered indexes: 1. Find qualifying data entries. 2. Sort the rids of the data records to be retrieved. 3. Fetch rids in order. This ensures that each data page is looked at just once (though the # of such pages is likely to be higher than with clustering). [Diagram: index entries direct the search for data entries; in a CLUSTERED index the order of data records in the data file matches the order of data entries in the index file, while in an unclustered index it does not.] 32
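The rid-sorting refinement can be sketched with a simulated page cache of size one. The function and helper names are illustrative; `read_page` stands in for one page I/O, and the counter shows that a page touched by several rids is read only once:

```python
def fetch_in_rid_order(rids, read_page):
    """Steps 2-3 of the refinement: sort rids, then fetch pages in order."""
    results, cached_page, cached_id = [], None, None
    for page_id, slot in sorted(rids):        # step 2: sort the rids
        if page_id != cached_id:              # step 3: each page read at most once
            cached_page, cached_id = read_page(page_id), page_id
        results.append(cached_page[slot])
    return results


# Toy "data file": page id -> list of records, plus an I/O counter.
pages = {7: ["a", "b"], 3: ["c", "d"]}
io_count = 0

def read_page(pid):
    global io_count
    io_count += 1          # simulate one page I/O
    return pages[pid]

# Two of the three rids land on page 7, but page 7 is read only once:
out = fetch_in_rid_order([(7, 0), (3, 1), (7, 1)], read_page)
# out == ["d", "a", "b"], io_count == 2
```

Without the sort, the access pattern 7, 3, 7 would have cost three page reads with a one-page cache.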

  33. General Selection Conditions (day<8/9/94 AND rname='Paul') OR bid=5 OR sid=3 Such selection conditions are first converted to conjunctive normal form (CNF): (day<8/9/94 OR bid=5 OR sid=3) AND (rname='Paul' OR bid=5 OR sid=3). We only discuss the case with no ORs (a conjunction of terms of the form attr op value). A B-tree index matches (a conjunction of) terms that involve only attributes in a prefix of the search key. An index on <a, b, c> matches a=5 AND b=3, but not b=3 alone. (For a Hash index, the terms must cover all attrs in the search key.) 33
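The prefix-match rule can be stated as a tiny predicate. A sketch with an illustrative function name; the search key is an ordered tuple of attribute names, and the condition is the set of attributes constrained by equality terms:

```python
def btree_matches(search_key, constrained_attrs):
    """A conjunction of equality terms matches a B-tree index only if the
    constrained attributes form a prefix of the search key."""
    prefix = search_key[:len(constrained_attrs)]
    return set(prefix) == set(constrained_attrs)


# Index on <a, b, c>:
# a=5 AND b=3 constrains {a, b}, a prefix of <a, b, c>  -> matches
# b=3 alone constrains {b}, not a prefix                -> does not match
```

So `btree_matches(("a", "b", "c"), {"a", "b"})` is True while `btree_matches(("a", "b", "c"), {"b"})` is False, mirroring the slide's example.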

  34. Summary Single-table SQL, in detail. Exposure to the query processing architecture: the query optimizer translates SQL to a query plan; the query executor interprets the plan; query plans are graphs of iterators. Three ways to read the input relation R to perform some operation: read R in sequential order of pages (linear scan); sort R and then perform the operation; or use hashing to break R into smaller partitions (Ri), each of which fits in memory, and then perform the operation on each partition independently (this works for specific operators). 34

  35. Schema for Examples Sailors (sid: integer, sname: string, rating: integer, age: real) Reserves (sid: integer, bid: integer, day: dates, rname: string) Reserves: Each tuple is 40 bytes long, 100 tuples per page, 1000 pages. Sailors: Each tuple is 50 bytes long, 80 tuples per page, 500 pages. 35

  36. Two Approaches to General Selections General Selection: Term1 AND Term2 AND Term3 AND ... E.g., 100K < R.salary < 150K AND 30 < R.age < 40 First approach: find the most selective access path, retrieve tuples using it, and apply any remaining terms that don't match the index. Most selective access path: the index or file scan that we estimate will require the fewest page I/Os. Terms that match this index reduce the number of tuples retrieved; other terms are used to discard some retrieved tuples, but do not affect the number of tuples/pages fetched. 36

  37. Most Selective Index - Example Consider day<8/9/94 AND bid=5 AND sid=3. A B+ tree index on day can be used; then, bid=5 and sid=3 must be checked for each retrieved tuple. Similarly, a hash index on <bid, sid> could be used; Then, day<8/9/94 must be checked. How about a B+tree on <rname,day>? How about a B+tree on <day, rname>? How about a Hash index on <day, rname>? 37

  38. Intersection of Rids Second approach: if we have 2 or more matching indexes (w/Alternatives (2) or (3) for data entries): Get sets of rids of data records using each matching index. Then intersect these sets of rids. Retrieve the records and apply any remaining terms. Consider day<8/9/94 AND bid=5 AND sid=3. With a B+ tree index on day and an index on sid, we can retrieve rids of records satisfying day<8/9/94 using the first, rids of recs satisfying sid=3 using the second, intersect, retrieve records and check bid=5. Note: commercial systems use various tricks to do this: bit maps, bloom filters, index joins 38

  39. Sailors (sid: integer, sname: string, rating: integer, age: real) Reserves (sid: integer, bid: integer, day: dates, rname: string) Projection (DupElim) The issue is removing duplicates. SELECT DISTINCT R.sid, R.bid FROM Reserves R The basic approach is to use sorting: 1. Scan R, extracting only the needed attrs (why do this 1st?) 2. Sort the resulting set 3. Remove adjacent duplicates Cost: Reserves with size ratio 0.25 = 250 pages. With 20 buffer pages we can sort in 2 passes, so 1000 + 250 + 2 * 2 * 250 + 250 = 2500 I/Os. Can improve by modifying the external sort algorithm (see chapter 12): modify Pass 0 of external sort to eliminate unwanted fields, and modify the merging passes to eliminate duplicates. Cost for the above case: read 1000 pages, write out 250 in runs of 40 pages, merge runs = 1000 + 250 + 250 = 1500. 39
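The three-step sort-based projection can be sketched in miniature, with rows as Python dicts standing in for tuples on disk. The function name is illustrative; step 1 is done first because the projected tuples are narrower and therefore cheaper to sort:

```python
def project_distinct(rows, attrs):
    """Sort-based projection with duplicate elimination."""
    projected = [tuple(row[a] for a in attrs) for row in rows]  # 1. extract attrs
    projected.sort()                                            # 2. sort
    out = []
    for t in projected:                                         # 3. drop adjacent dups
        if not out or t != out[-1]:
            out.append(t)
    return out


reserves = [{"sid": 1, "bid": 9, "day": "x"},
            {"sid": 1, "bid": 9, "day": "y"}]
# project_distinct(reserves, ("sid", "bid")) -> [(1, 9)]
```

Two reservations that differ only in `day` collapse to a single (sid, bid) pair, since after sorting all duplicates are adjacent.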

  40. DupElim Based on Hashing Just like our discussion of GROUP BY and aggregation from before — but with the aggregation function missing: SELECT DISTINCT R.sid, R.bid FROM Reserves R is equivalent to SELECT R.sid, R.bid FROM Reserves R GROUP BY R.sid, R.bid Cost for hashing? Without hybrid, assuming the partitions fit in memory (i.e. #bufs >= square root of the # of pages of projected tuples): read 1000 pages and write out partitions of projected tuples (250 pages); do dup elim on each partition (total 250 page reads). Total: 1500 I/Os. With hybrid hash: subtract the I/O costs of the 1st partition. 40

  41. DupElim & Indexes If an index on the relation contains all wanted attributes in its search key, can do index-only scan. Apply projection techniques to data entries (much smaller!) If an ordered (i.e., tree) index contains all wanted attributes as prefix of search key, can do even better: Retrieve data entries in order (index-only scan), discard unwanted fields, compare adjacent tuples to check for duplicates. Same tricks apply to GROUP BY/Aggregation 41

  42. Joins Joins are very common Joins are very expensive (worst case: cross product!) Many approaches to reduce join cost 42

  43. Equality Joins With One Join Column SELECT * FROM Reserves R1, Sailors S1 WHERE R1.sid=S1.sid In algebra: R ⋈ S. Common! Must be carefully optimized. R × S is large; so computing R × S followed by a selection is inefficient. Note: join is commutative and associative. Assume: N1/pR pages in R, with pR tuples per page and N1 records in R; N2/pS pages in S, with pS tuples per page and N2 records in S. In our examples, R is Reserves and S is Sailors. We will consider more complex join conditions later. Cost metric: # of I/Os. We will ignore output costs. 43

  44. Simple Nested Loops Join foreach tuple r in R do foreach tuple s in S do if ri == sj then add <r, s> to result For each tuple in the outer relation R, we scan the entire inner relation S. How much does this cost? (pR * N1/pR) * N2/pS + N1/pR = 100*1000*500 + 1000 I/Os. At 10ms/IO, Total: ??? What if the smaller relation (S) were the outer? What assumptions are being made here? Q: What is the cost if one relation can fit entirely in memory? 44

  45. Page-Oriented Nested Loops Join foreach page bR in R do foreach page bS in S do foreach tuple r in bR do foreach tuple s in bS do if ri == sj then add <r, s> to result For each page of R, get each page of S, and write out matching pairs of tuples <r, s>, where r is in the R-page and s is in the S-page. What is the cost of this approach? N1/pR * N2/pS + N1/pR = 1000*500 + 1000. If the smaller relation (S) is the outer, cost = 500*1000 + 500. 45

  46. Index Nested Loops Join foreach tuple r in R do foreach tuple s in S where ri == sj do add <r, s> to result If there is an index on the join column of one relation (say S), we can make it the inner and exploit the index. Cost: N1/pR + N1 * (cost of finding matching S tuples). For each R tuple, the cost of probing the S index is about 2-4 I/Os for a B+ tree. The cost of then finding the S tuples (assuming Alt. (2) or (3) for data entries) depends on clustering. Clustered index: 1 I/O per page of matching S tuples. Unclustered: up to 1 I/O per matching S tuple. 46

  47. Examples of Index Nested Loops B+-tree index (Alt. 2) on sid of Sailors (as inner): Scan Reserves: 1000 page I/Os, 100*1000 tuples. For each Reserves tuple: 2 I/Os to get data entry in index, plus 1 I/O to get (the exactly one) matching Sailors tuple. Total: B+-Tree index (Alt. 2) on sid of Reserves (as inner): Scan Sailors: 500 page I/Os, 80*500 tuples. For each Sailors tuple: 2 I/Os to find index page with data entries, plus cost of retrieving matching Reserves tuples. Assuming uniform distribution, 2.5 reservations per sailor (100,000 / 40,000). Cost of retrieving them is 1 or 2.5 I/Os depending on whether the index is clustered. Total: 47

  48. Block Nested Loops Join Page-oriented NL doesn't exploit extra buffers. Alternative approach: use one page as an input buffer for scanning the inner S, one page as the output buffer, and use all remaining pages to hold a "block" (think "chunk") of the outer R. For each matching tuple r in the R-chunk and s in the S-page, add <r, s> to the result. Then read the next R-chunk, scan S, etc. [Diagram: a buffer of B pages — a chunk of R tuples (k <= B-2 pages), one input buffer for S, and one output buffer, feeding the join result.] 48
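The chunking idea can be sketched directly: hold one chunk of the outer in an in-memory table and scan the inner once per chunk rather than once per tuple. The function name is illustrative, and an `inner_scans` counter is added to expose the cost structure:

```python
def block_nested_loops_join(R, S, key, chunk_size):
    """Block NL join on equal `key`; returns (result, number of scans of S)."""
    result, inner_scans = [], 0
    for i in range(0, len(R), chunk_size):
        # Build an in-memory table over this chunk of the outer R
        # (the "k <= B-2 pages" of buffer in the slide's diagram).
        chunk = {}
        for r in R[i:i + chunk_size]:
            chunk.setdefault(r[key], []).append(r)
        inner_scans += 1                  # one full scan of S per chunk
        for s in S:
            for r in chunk.get(s[key], []):
                result.append((r, s))
    return result, inner_scans
```

With `len(R) == 3` and `chunk_size == 2`, S is scanned twice; shrinking the chunk toward one tuple degenerates back into simple nested loops, which is the whole point of using the extra buffers.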

  49. Examples of Block Nested Loops Cost: Scan of outer + #outer chunks * scan of inner #outer chunks = #pages outer/ chunksize With Reserves (R) as outer, and 100 pages of R: Cost of scanning R is 1000 I/Os; a total of 10 chunks. Per chunk of R, we scan Sailors (S); 10*500 I/Os. If space for just 90 pages of R, we would scan S 12 times. With 100-page chunk of Sailors as outer: Cost of scanning S is 500 I/Os; a total of 5 chunks. Per chunk of S, we scan Reserves; 5*1000 I/Os. If you consider seeks, it may be best to divide buffers evenly between R and S. Disk arm jogs between read of S and write of output If output is not going to disk, this is not an issue 49

  50. Sort-Merge Join (R ⋈ S on Ri=Sj) Sort R and S on the join column, then scan them to do a "merge" (on the join col.), and output result tuples. Useful if one or both inputs are already sorted on the join attribute(s), or the output should be sorted on the join attribute(s). General scheme: Do { Advance scan of R until current R-tuple >= current S tuple; Advance scan of S until current S-tuple >= current R tuple; } Until current R tuple = current S tuple. At this point, all R tuples with the same value in Ri (the current R group) and all S tuples with the same value in Sj (the current S group) match; output <r, s> for all pairs of such tuples — like a mini nested loops. Then resume scanning R and S. R is scanned once; each S group is scanned once per matching R tuple. (Multiple scans of an S group will probably find the needed pages in the buffer.) 50
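The general scheme above can be sketched as code: sort both inputs, advance whichever cursor is behind, and run a mini nested loop over the current R-group and S-group when the keys meet. The function name is illustrative; rows are dicts joined on equal values of `key`:

```python
def sort_merge_join(R, S, key):
    """Sort-merge join on equal `key`, including the group-rescan step."""
    R = sorted(R, key=lambda t: t[key])
    S = sorted(S, key=lambda t: t[key])
    out, i, j = [], 0, 0
    while i < len(R) and j < len(S):
        if R[i][key] < S[j][key]:
            i += 1                          # advance scan of R
        elif R[i][key] > S[j][key]:
            j += 1                          # advance scan of S
        else:
            k, j_start = R[i][key], j
            # Mini nested loops: every R tuple in the group rescans the S group.
            while i < len(R) and R[i][key] == k:
                j = j_start
                while j < len(S) and S[j][key] == k:
                    out.append((R[i], S[j]))
                    j += 1
                i += 1
    return out
```

The rescan of the S group per matching R tuple is the slide's "each S group is scanned once per matching R tuple"; those pages are typically still in the buffer pool, so the rescans rarely cost extra I/O.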
