SQL Joins, Aggregations, and Query Optimization

Explore the fundamentals of SQL, including joins, aggregations, and query optimization. Learn about formal semantics of queries, subqueries, union, intersection, and difference operations. Dive into bag semantics, controlling duplicate elimination, and more in the realm of structured query language.

  • SQL
  • Database Systems
  • Query Optimization
  • Joins
  • Aggregations

Presentation Transcript


  1. CSCE-608 Database Systems, Spring 2025. Instructor: Jianer Chen. Office: PETR 428. Phone: 845-4259. Email: chen@cse.tamu.edu. Notes 12: SQL Joins and Aggregations.

  2. SQL: Structured Query Language. SQL is a very-high-level language:
     * Say what to do rather than how to do it.
     * Avoid many of the data-manipulation details needed in procedural languages like C++ or Java.
     The database management system figures out the best way to execute a query; this is called query optimization. SQL is used for both data definition and data manipulation.

  3. Queries: Formal Semantics. For a query of the form SELECT L FROM R1, R2, ..., Rk WHERE C:
     1. Start with the product of all the relations R1, ..., Rk in the FROM clause.
     2. Apply the selection condition C from the WHERE clause.
     3. Project onto the list L of attributes and expressions in the SELECT clause.
     In relational algebra: $\pi_L(\sigma_C(R_1 \times R_2 \times \cdots \times R_k))$
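
The three steps above can be traced by hand. The following is a minimal sketch, assuming small hypothetical contents for Sells(bar, beer, price) and Frequents(drinker, bar); it uses Python's built-in sqlite3 module only as a convenient engine for comparison, which is an assumption of this sketch and not something the notes prescribe.

```python
import sqlite3
from itertools import product

# Hypothetical sample data (not from the notes).
sells = [("Joe's", "Bud", 2.5), ("Sue's", "Bud", 3.0), ("Sue's", "Miller", 3.5)]
frequents = [("Sally", "Joe's"), ("Sally", "Sue's"), ("Fred", "Sue's")]

# Step 1: product of all relations in the FROM clause.
pairs = product(frequents, sells)
# Step 2: keep only the combinations satisfying the WHERE condition C.
selected = [(f, s) for f, s in pairs if f[1] == s[0] and s[1] == "Bud"]
# Step 3: project onto the SELECT list L = (drinker, price).
by_hand = sorted((f[0], s[2]) for f, s in selected)

# The same query evaluated by SQLite.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sells(bar TEXT, beer TEXT, price REAL)")
conn.execute("CREATE TABLE Frequents(drinker TEXT, bar TEXT)")
conn.executemany("INSERT INTO Sells VALUES (?, ?, ?)", sells)
conn.executemany("INSERT INTO Frequents VALUES (?, ?)", frequents)
by_sql = sorted(conn.execute(
    "SELECT drinker, price FROM Frequents, Sells "
    "WHERE Frequents.bar = Sells.bar AND beer = 'Bud'"
).fetchall())
```

Both `by_hand` and `by_sql` hold the same three (drinker, price) pairs. A real system would not materialize the full product; the steps only define what the answer must be.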

  4. Subqueries. A parenthesized SELECT-FROM-WHERE statement (subquery) can be used as a value in a number of places, including FROM and WHERE clauses. Example: in place of a relation in the FROM clause, we can place another query, and then query its result.
     * It is better to use a tuple variable to name the tuples of the result.
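
A subquery in the FROM clause might look like the sketch below. The table contents, the tuple-variable name `JoeMenu`, and the use of Python's sqlite3 module are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sells(bar TEXT, beer TEXT, price REAL)")
# Hypothetical sample data.
conn.executemany("INSERT INTO Sells VALUES (?, ?, ?)", [
    ("Joe's", "Bud", 2.5), ("Joe's", "Miller", 3.5), ("Sue's", "Bud", 2.0)])

# The inner query builds the menu of one bar; the outer query treats that
# result as a relation and names its tuples with the tuple variable JoeMenu.
cheap_at_joes = conn.execute(
    "SELECT JoeMenu.beer "
    "FROM (SELECT beer, price FROM Sells WHERE bar = ?) AS JoeMenu "
    "WHERE JoeMenu.price < 3.0",
    ("Joe's",),
).fetchall()
```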

  5. Union, Intersection, and Difference. Union, intersection, and difference of relations are expressed by the following forms, each involving subqueries:
     (subquery) UNION (subquery)
     (subquery) INTERSECT (subquery)
     (subquery) EXCEPT (subquery)

  6. Bag Semantics. Although the SELECT-FROM-WHERE statement uses bag semantics, the default for union, intersection, and difference is set semantics; that is, duplicates are eliminated. Why? For efficiency:
     * When doing projection, it is easier to avoid eliminating duplicates: just work one tuple at a time.
     * For intersection or difference, it is most efficient to sort the relations first, so you may as well eliminate the duplicates anyway.

  8. Controlling Duplicate Elimination. Force the result to be a set with SELECT DISTINCT . . . Force the result to be a bag (i.e., don't eliminate duplicates) with ALL, as in . . . UNION ALL . . .
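
The difference between bag and set results can be observed directly. This is a minimal sketch with hypothetical data, run through Python's sqlite3 module (an assumption of the sketch). SQLite does not accept parentheses around the operands of UNION, so the subqueries are written without them here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sells(bar TEXT, beer TEXT, price REAL)")
# Hypothetical sample data: two bars charge the same price.
conn.executemany("INSERT INTO Sells VALUES (?, ?, ?)", [
    ("Joe's", "Bud", 2.5), ("Sue's", "Bud", 2.5), ("Sue's", "Miller", 3.0)])

# Plain SELECT-FROM-WHERE yields a bag: 2.5 is listed twice.
bag = sorted(r[0] for r in conn.execute("SELECT price FROM Sells"))
# DISTINCT forces the result to be a set.
as_set = sorted(r[0] for r in conn.execute("SELECT DISTINCT price FROM Sells"))
# UNION eliminates duplicates by default; UNION ALL keeps every copy.
union_set = conn.execute(
    "SELECT price FROM Sells UNION SELECT price FROM Sells").fetchall()
union_bag = conn.execute(
    "SELECT price FROM Sells UNION ALL SELECT price FROM Sells").fetchall()
```

With this data, `union_set` has two rows and `union_bag` has six.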

  9. Example: DISTINCT. From Sells(bar, beer, price), find all the different prices charged for beers:
     SELECT DISTINCT price FROM Sells;
     Without DISTINCT, each price would be listed as many times as there were bar/beer pairs at that price.

  10. Example: ALL. Using relations Frequents(drinker, bar) and Likes(drinker, beer), list the drinkers who frequent more bars than they like beers, listing each such drinker as many times as the difference of those counts:
      (SELECT drinker FROM Frequents) EXCEPT ALL (SELECT drinker FROM Likes);
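
The bag difference computed by EXCEPT ALL can be modeled with multiset counts. The sketch below uses Python's `collections.Counter` on hypothetical data rather than running SQL, since not every engine supports EXCEPT ALL (SQLite, for one, does not).

```python
from collections import Counter

# Hypothetical sample data.
frequents = [("Sally", "Joe's"), ("Sally", "Sue's"), ("Sally", "Moe's"),
             ("Fred", "Sue's")]
likes = [("Sally", "Bud"), ("Fred", "Bud"), ("Fred", "Miller")]

# Bags of drinker names: one copy per Frequents tuple and per Likes tuple.
left = Counter(drinker for drinker, _ in frequents)   # Sally: 3, Fred: 1
right = Counter(drinker for drinker, _ in likes)      # Sally: 1, Fred: 2

# Counter subtraction keeps only positive counts, which matches bag
# difference: a drinker appears max(0, bars - beers) times.
result = sorted((left - right).elements())
```

Sally frequents three bars and likes one beer, so she appears twice; Fred likes more beers than he frequents bars, so he does not appear.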

  11. Join Expressions. SQL provides several versions of (bag) joins. These expressions can be stand-alone queries or used in place of relations in a FROM clause.

  13. Products and Natural Joins. Cross join (Cartesian product): R CROSS JOIN S;

      R:  A  B        S:  B  C  D
          1  2            2  5  6
          3  4            4  7  8

      R CROSS JOIN S:
          A  R.B  S.B  C  D
          1  2    2    5  6
          1  2    4    7  8
          3  4    2    5  6
          3  4    4    7  8

  15. Products and Natural Joins. Natural join (join tuples agreeing on common attributes): R NATURAL JOIN S;

      R:  A  B        S:  B  C  D
          1  2            2  5  6
          3  4            4  7  8

      R NATURAL JOIN S:
          A  B  C  D
          1  2  5  6
          3  4  7  8
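
The cross join and natural join of the relations R(A, B) and S(B, C, D) from the slides can be reproduced as follows. This is a sketch using Python's sqlite3 module, which is an assumption of the example and not part of the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R(A INTEGER, B INTEGER)")
conn.execute("CREATE TABLE S(B INTEGER, C INTEGER, D INTEGER)")
conn.executemany("INSERT INTO R VALUES (?, ?)", [(1, 2), (3, 4)])
conn.executemany("INSERT INTO S VALUES (?, ?, ?)", [(2, 5, 6), (4, 7, 8)])

# Cartesian product: every R tuple paired with every S tuple,
# giving columns A, R.B, S.B, C, D.
cross = sorted(conn.execute("SELECT * FROM R CROSS JOIN S").fetchall())

# Natural join: only pairs agreeing on the common attribute B,
# with B reported once, giving columns A, B, C, D.
natural = sorted(conn.execute("SELECT * FROM R NATURAL JOIN S").fetchall())
```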

  16. Theta Join. R JOIN S ON <condition>

      R:  A  B        S:  B  C  D
          1  2            2  5  6
          4  5            4  7  2

      R JOIN S ON A < D;
          A  R.B  S.B  C  D
          1  2    2    5  6
          1  2    4    7  2
          4  5    2    5  6

  17. Theta Join. R JOIN S ON <condition>. Schemas: Drinkers(name, addr), Frequents(drinker, bar). Example, using Drinkers and Frequents:
      Drinkers JOIN Frequents ON name = drinker;
      gives all (d, a, d, b) quadruples such that drinker d lives at address a and frequents bar b.
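
Both theta joins above can be checked with a short sketch. The R and S contents follow the slide; the Drinkers and Frequents rows are hypothetical, and Python's sqlite3 module is used as an assumed engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R(A INTEGER, B INTEGER)")
conn.execute("CREATE TABLE S(B INTEGER, C INTEGER, D INTEGER)")
conn.executemany("INSERT INTO R VALUES (?, ?)", [(1, 2), (4, 5)])
conn.executemany("INSERT INTO S VALUES (?, ?, ?)", [(2, 5, 6), (4, 7, 2)])

# Theta join: keep the pairs from the product that satisfy A < D.
theta = sorted(conn.execute("SELECT * FROM R JOIN S ON A < D").fetchall())

# Hypothetical data for the Drinkers/Frequents example.
conn.execute("CREATE TABLE Drinkers(name TEXT, addr TEXT)")
conn.execute("CREATE TABLE Frequents(drinker TEXT, bar TEXT)")
conn.executemany("INSERT INTO Drinkers VALUES (?, ?)",
                 [("Sally", "123 Elm St"), ("Fred", "9 Oak Ave")])
conn.executemany("INSERT INTO Frequents VALUES (?, ?)",
                 [("Sally", "Joe's"), ("Sally", "Sue's")])

# Each result row is a (d, a, d, b) quadruple.
quads = sorted(conn.execute(
    "SELECT * FROM Drinkers JOIN Frequents ON name = drinker").fetchall())
```

Fred frequents no bar in this data, so he contributes no quadruple; the outerjoin variants keep such dangling tuples.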

  18. Outerjoins. R OUTER JOIN S is the core of an outerjoin expression. It is modified by:
      1. Optional NATURAL in front of OUTER.
      2. Optional ON <condition> after JOIN.
      3. Optional LEFT, RIGHT, or FULL before OUTER.
         * LEFT = pad dangling tuples of R only.
         * RIGHT = pad dangling tuples of S only.
         * FULL = pad both; these notes treat this choice as the default, although standard SQL and most systems expect one of LEFT, RIGHT, or FULL to be written explicitly.

  19. Outerjoins (Examples).
      R NATURAL FULL OUTER JOIN S
      R NATURAL LEFT OUTER JOIN S
      R NATURAL RIGHT OUTER JOIN S

  20. Outerjoins (Examples). R NATURAL FULL OUTER JOIN S

      R:  A  B        S:  B  C  D
          1  2            2  5  6
          3  9            4  7  8

      R NATURAL FULL OUTER JOIN S:
          A     B  C     D
          1     2  5     6
          3     9  NULL  NULL
          NULL  4  7     8

  21. Outerjoins (Examples). R NATURAL LEFT OUTER JOIN S

      R:  A  B        S:  B  C  D
          1  2            2  5  6
          3  9            4  7  8

      R NATURAL LEFT OUTER JOIN S:
          A  B  C     D
          1  2  5     6
          3  9  NULL  NULL

  22. Outerjoins (Examples). R NATURAL RIGHT OUTER JOIN S

      R:  A  B        S:  B  C  D
          1  2            2  5  6
          3  9            4  7  8

      R NATURAL RIGHT OUTER JOIN S:
          A     B  C  D
          1     2  5  6
          NULL  4  7  8
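
The three outerjoin results can be reproduced with the sketch below, which uses Python's sqlite3 module as an assumed engine. SQLite only gained native RIGHT and FULL OUTER JOIN in version 3.39, so for portability this sketch runs the left outerjoin directly and emulates the other two: the right outerjoin by swapping the operands, and the full outerjoin by appending the dangling S tuples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R(A INTEGER, B INTEGER)")
conn.execute("CREATE TABLE S(B INTEGER, C INTEGER, D INTEGER)")
conn.executemany("INSERT INTO R VALUES (?, ?)", [(1, 2), (3, 9)])
conn.executemany("INSERT INTO S VALUES (?, ?, ?)", [(2, 5, 6), (4, 7, 8)])

# LEFT: dangling tuples of R are padded with NULL (None in Python).
left = set(conn.execute("SELECT * FROM R NATURAL LEFT OUTER JOIN S"))

# RIGHT, emulated by a left outerjoin with the operands swapped.
right = set(conn.execute(
    "SELECT A, S.B, C, D FROM S LEFT OUTER JOIN R ON R.B = S.B"))

# FULL, emulated as the left outerjoin plus the dangling tuples of S.
full = set(conn.execute(
    "SELECT A, R.B, C, D FROM R LEFT OUTER JOIN S ON R.B = S.B "
    "UNION ALL "
    "SELECT NULL, S.B, C, D FROM S WHERE S.B NOT IN (SELECT B FROM R)"))
```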

  25. Aggregations. SUM, AVG, COUNT, MIN, and MAX can be applied to a column in a SELECT clause to produce that aggregation on the column. Also, COUNT(*) counts the number of tuples. Example: from Sells(bar, beer, price), find the average price of Bud:
      SELECT AVG(price) FROM Sells WHERE beer = 'Bud';

  26. Eliminating Duplicates in Aggregation. Schema: Sells(bar, beer, price). Use DISTINCT inside an aggregation. Example: find the number of different prices charged for Bud:
      SELECT COUNT(DISTINCT price) FROM Sells WHERE beer = 'Bud';

  27. NULL is Ignored in Aggregation. NULL never contributes to a sum, average, or count, and can never be the minimum or maximum of a column. But if there are no non-NULL values in a column, then the result of the aggregation is NULL (except for COUNT, whose result is 0).

  28. Example: Effect of NULLs. Schema: Sells(bar, beer, price).
      SELECT count(*) FROM Sells WHERE beer = 'Bud';
      gives the number of bars that sell Bud.
      SELECT count(price) FROM Sells WHERE beer = 'Bud';
      gives the number of bars that sell Bud at a known price.
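
The effect of NULL on the aggregation operators can be seen in a small sketch. The rows are hypothetical and Python's sqlite3 module is an assumed engine; a NULL price is written as `None` on the Python side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sells(bar TEXT, beer TEXT, price REAL)")
# Hypothetical data: Sue's sells Bud at an unknown (NULL) price.
conn.executemany("INSERT INTO Sells VALUES (?, ?, ?)", [
    ("Joe's", "Bud", 2.0), ("Sue's", "Bud", None),
    ("Moe's", "Bud", 2.0), ("Al's", "Bud", 5.0)])

# COUNT(*) counts tuples; COUNT(price) and AVG(price) skip the NULL;
# COUNT(DISTINCT price) also removes the repeated 2.0.
count_all, count_price, count_distinct, avg_price = conn.execute(
    "SELECT COUNT(*), COUNT(price), COUNT(DISTINCT price), AVG(price) "
    "FROM Sells WHERE beer = 'Bud'").fetchone()

# No tuple matches, so there are no non-NULL values to aggregate:
# SUM and MIN give NULL, while COUNT gives 0.
empty = conn.execute(
    "SELECT SUM(price), MIN(price), COUNT(price) "
    "FROM Sells WHERE beer = 'Miller'").fetchone()
```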

  29. Grouping. We may follow a SELECT-FROM-WHERE expression by GROUP BY and a list of attributes. The relation that results from the SELECT-FROM-WHERE is grouped according to the values of all those attributes, and any aggregation is applied only within each group.

  31. Example: Grouping. From Sells(bar, beer, price), find the average price for each beer:
      SELECT beer, AVG(price)
      FROM Sells
      GROUP BY beer;
      The query outputs one tuple for each group.

  33. Example: Grouping. Schemas: Sells(bar, beer, price), Frequents(drinker, bar). From Sells and Frequents, find for each drinker the average price of Bud at the bars they frequent:
      SELECT drinker, AVG(price)
      FROM Frequents, Sells
      WHERE beer = 'Bud' AND Frequents.bar = Sells.bar
      GROUP BY drinker;
      Compute the drinker-bar-price tuples for Bud first, then group by drinker.
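
Both grouping queries can be run on a small instance. The sketch below assumes hypothetical table contents and uses Python's sqlite3 module as the engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sells(bar TEXT, beer TEXT, price REAL)")
conn.execute("CREATE TABLE Frequents(drinker TEXT, bar TEXT)")
# Hypothetical sample data.
conn.executemany("INSERT INTO Sells VALUES (?, ?, ?)", [
    ("Joe's", "Bud", 2.0), ("Sue's", "Bud", 3.0), ("Sue's", "Miller", 4.0)])
conn.executemany("INSERT INTO Frequents VALUES (?, ?)", [
    ("Sally", "Joe's"), ("Sally", "Sue's"), ("Fred", "Sue's")])

# One output tuple per beer group.
per_beer = dict(conn.execute(
    "SELECT beer, AVG(price) FROM Sells GROUP BY beer"))

# Join and filter first (drinker-bar-price tuples for Bud), then group.
per_drinker = dict(conn.execute(
    "SELECT drinker, AVG(price) FROM Frequents, Sells "
    "WHERE beer = 'Bud' AND Frequents.bar = Sells.bar "
    "GROUP BY drinker"))
```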

  34. Restriction on SELECT Lists With Aggregation. If any aggregation is used, then each element of the SELECT list must be either:
      1. Aggregated, or
      2. An attribute on the GROUP BY list.

  35. Illegal Query Example. Schema: Sells(bar, beer, price). You might think you could find the bar that sells Bud the cheapest by:
      SELECT bar, MIN(price)
      FROM Sells
      WHERE beer = 'Bud';
      But this query is illegal in SQL: bar is neither aggregated nor an attribute on a GROUP BY list.
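
One legal way to ask the same question is to compute the minimum price in a subquery and compare against it. This rewrite is not shown in the notes; it is a sketch under that assumption, with hypothetical data and Python's sqlite3 module as the engine. The illegal form is not executed here because SQLite happens to accept it as a non-standard extension, which would hide the problem.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sells(bar TEXT, beer TEXT, price REAL)")
# Hypothetical data: two bars tie for the cheapest Bud.
conn.executemany("INSERT INTO Sells VALUES (?, ?, ?)", [
    ("Joe's", "Bud", 2.0), ("Sue's", "Bud", 3.0), ("Moe's", "Bud", 2.0)])

# The subquery yields a single value, the lowest Bud price; the outer
# query then selects every bar charging exactly that price.
cheapest = sorted(conn.execute(
    "SELECT bar FROM Sells "
    "WHERE beer = 'Bud' "
    "AND price = (SELECT MIN(price) FROM Sells WHERE beer = 'Bud')"
).fetchall())
```

Because the outer SELECT list contains no aggregation, the restriction on SELECT lists does not apply to it, and ties are all reported.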

  36. HAVING Clauses. HAVING <condition> may follow a GROUP BY clause. If so, the condition applies to each group, and groups not satisfying the condition are eliminated.

  42. Example: HAVING. Schemas: Sells(bar, beer, price), Beers(name, manf). From Sells and Beers, find the average price of those beers that are either served in at least three bars or are manufactured by Pete's:
      SELECT beer, AVG(price)
      FROM Sells
      GROUP BY beer
      HAVING COUNT(bar) >= 3
          OR beer IN (SELECT name FROM Beers WHERE manf = 'Pete''s');
      How it works: GROUP BY beer groups the (bar, beer, price) tuples of Sells by beer. COUNT(bar) >= 3 holds when at least 3 bars appear in the beer's group. The subquery returns the beers made by Pete's, so beer IN (...) holds when the beer is made by Pete's.
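
The HAVING query can be tried on a small instance. In this sketch the rows, the beer name Summerbrew, and the manufacturers other than Pete's are hypothetical, and Python's sqlite3 module is an assumed engine. The manufacturer name is passed as a bound parameter so the apostrophe needs no escaping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sells(bar TEXT, beer TEXT, price REAL)")
conn.execute("CREATE TABLE Beers(name TEXT, manf TEXT)")
# Hypothetical data: Bud is sold at three bars, the other beers at one.
conn.executemany("INSERT INTO Sells VALUES (?, ?, ?)", [
    ("Joe's", "Bud", 2.0), ("Sue's", "Bud", 3.0), ("Moe's", "Bud", 4.0),
    ("Joe's", "Miller", 5.0), ("Sue's", "Summerbrew", 6.0)])
conn.executemany("INSERT INTO Beers VALUES (?, ?)", [
    ("Bud", "Maker A"), ("Miller", "Maker B"), ("Summerbrew", "Pete's")])

# A group survives if it has at least 3 bars or its beer is made by Pete's.
kept = dict(conn.execute(
    "SELECT beer, AVG(price) FROM Sells "
    "GROUP BY beer "
    "HAVING COUNT(bar) >= 3 "
    "OR beer IN (SELECT name FROM Beers WHERE manf = ?)",
    ("Pete's",)))
```

Miller is dropped because it satisfies neither condition; Bud qualifies by count and Summerbrew by manufacturer.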

  43. Requirements on HAVING Conditions. These conditions may refer to any relation or tuple variable in the FROM clause. They may refer to attributes of those relations, as long as the attribute makes sense within a group; i.e., it is either:
      1. A grouping attribute, or
      2. Aggregated.

  45. Requirements on HAVING Conditions. It is easier to understand this from an implementation viewpoint; the clauses are written in one order but evaluated in another:
      SELECT    step 5, compute the output
      FROM      step 1, input
      WHERE     step 2, pick the proper tuples
      GROUP BY  step 3, group the picked tuples
      HAVING    step 4, pick the proper groups
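
The five evaluation steps can be written out as an explicit pipeline. The sketch below evaluates a hypothetical query, SELECT beer, AVG(price) FROM Sells WHERE price IS NOT NULL GROUP BY beer HAVING COUNT(bar) >= 2, in plain Python on made-up data. It illustrates the logical order only and is not how a real engine is implemented.

```python
from collections import defaultdict

# Step 1 (FROM): the input relation Sells(bar, beer, price), hypothetical rows.
rows = [("Joe's", "Bud", 2.0), ("Sue's", "Bud", 3.0),
        ("Moe's", "Miller", 4.0), ("Al's", "Miller", None)]

# Step 2 (WHERE): pick the proper tuples, here those with a known price.
picked = [r for r in rows if r[2] is not None]

# Step 3 (GROUP BY beer): group the picked tuples.
groups = defaultdict(list)
for bar, beer, price in picked:
    groups[beer].append((bar, price))

# Step 4 (HAVING COUNT(bar) >= 2): pick the proper groups.
kept = {beer: members for beer, members in groups.items() if len(members) >= 2}

# Step 5 (SELECT beer, AVG(price)): compute one output tuple per group.
output = sorted(
    (beer, sum(price for _, price in members) / len(members))
    for beer, members in kept.items())
```

Since HAVING runs after grouping, it only sees whole groups, which is why it may refer only to grouping attributes and aggregates.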

  47. Database Modifications. A modification command does not return a result (as a query does), but changes the database in some way. Three kinds of modifications:
      1. Insert a tuple or tuples.
      2. Delete a tuple or tuples.
      3. Update the value(s) of an existing tuple or tuples.

  50. Insertion. Schema: Likes(drinker, beer). To insert a single tuple:
      INSERT INTO <relation> VALUES (<list of values>);
      Example: add to Likes(drinker, beer) the fact that Sally likes Bud:
      INSERT INTO Likes VALUES ('Sally', 'Bud');
      We may add a list of attributes to <relation>. Two reasons for doing so:
      1. We forget the standard order of attributes for the relation.
      2. We don't have values for all attributes, and want the system to fill in the missing ones with default values.
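
The three insertion forms can be compared in a short sketch. The default value 'unknown' declared for beer, the drinkers Fred and Ann, and the use of Python's sqlite3 module are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The DEFAULT clause is a hypothetical choice for this sketch.
conn.execute("CREATE TABLE Likes(drinker TEXT, beer TEXT DEFAULT 'unknown')")

# Positional form: values must follow the declared attribute order.
conn.execute("INSERT INTO Likes VALUES ('Sally', 'Bud')")

# Attribute list, reason 1: the order is stated explicitly, so the
# declared order of the relation does not need to be remembered.
conn.execute("INSERT INTO Likes(beer, drinker) VALUES ('Bud', 'Fred')")

# Attribute list, reason 2: beer is omitted, so the system fills in
# its default value.
conn.execute("INSERT INTO Likes(drinker) VALUES ('Ann')")

likes = sorted(conn.execute("SELECT drinker, beer FROM Likes").fetchall())
```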
