
Understanding SQL Joins Explained by Tony Hasler at UKOUG TECH14 (December 2014)
Explore the comprehensive guide on SQL joins presented by Tony Hasler at UKOUG TECH14 in December 2014. Learn about different join types, syntax, methods, and potential pitfalls in join operations.
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
All about Joins Tony Hasler UKOUG TECH14 December 2014
Win a copy of the book - SQL QUIZ http://tonyhasler.wordpress.com L
Agenda What is a join? ANSI join syntax and outer joins Left lateral joins Join methods Hash join input swapping
Warning and disclaimer! Some statements made in the forthcoming slides are not true when hash join input swapping is considered! We will consider hash join input swapping later in the presentation
What is a join in SQL? A join is an operation on an ordered pair of row sources I will refer to the first element of the pair as the driving row source and the second element the probed row source Row sources can be elements in the FROM clause of an SQL query block Tables Data dictionary views (unmerged) Inline views (unmerged or created by CBO transformations) Factored subqueries Results from the TABLE and XMLTABLE operators Or unnested subqueries ( EXISTS, NOT EXISTS, IN, NOT IN, etc.) Or intermediate results generated from other joins
The simplest possible join SELECT * FROM T1, T2; If there are M rows in T1 and N rows in T2 there are M x N rows in the result set. In this case the result set will have 5 x 5 = 25 rows
A more complex case SELECT * FROM T1, T2, T3, T4 WHERE T1.C1 > 1 AND T1.C1=T2.C2 AND T2.C2 = T3.C3 AND T3.C3 > T4.C4; The number of joins is always one less than the number of row sources (in this case 3 joins for 4 row sources) One possible join tree can be depicted as: (((T1 T2) T3) T4)
Possible join trees There are120 possible join trees for four tables The CBO will only consider the 24 join trees where the driving row source of the second and subsequent joins is the intermediate result from previous joins. The CBO may consider (((T3 T4) T1) T2) The CBO will not consider ((T3 T4) (T1 T2)) Using the restricted set of join trees considered by the CBO we can consider the join order as fully specifying the join tree.
The original vision for joins in the SQL language SQL is a declarative language. You specify what you want to do not how to do it. Join order is not a programmers concern All predicates are equal under Codd! Logically, all predicates in the WHERE clause of a query block can be evaluated after all the joins have completed. The comma-separated syntax of the FROM clause and the separation of the FROM and WHERE clauses reflects that vision.
ANSI join syntax SELECT * FROM T1 JOIN T2 ON T1.C1 = T2.C2 CROSS JOIN T3 JOIN T4 ON T3.C3 > T4.C4 WHERE T1.C1 > T4.C3 ANSI syntax includes join predicates in the FROM clause and selection predicates in the WHERE clause. A join order is explicitly specified In the above SQL statement, the CBO will ignore the distinction between predicates and ignore the specified join order! Heresy?
Outer Joins An outer join has a preserved row source and an optional row source Any rows in the preserved rows source that do not match rows in the optional row source are included in the join results These extra rows have no value (NULL) for columns from the optional row source Implications The operands of an outer join are semantically different The preserved row source must precede the optional row source in the join order. There is now a distinction between join predicates and selection predicates
SELECT * FROM t1 LEFTJOIN t2 ON t1.c1 = t2.c2 LEFTJOIN t3 ON t1.c1 = t3.c3 WHERE t1.c1 != 3 ORDER BY t1.c1; C1 C2 C3 1 2 2 4 4 4 5 5 5
SELECT * FROM t1 LEFT JOIN t2 ON t1.c1 = t2.c2 LEFT JOIN t3 ON t1.c1 = t3.c3 and t1.c1 != 3 ORDER BY t1.c1; C1 C2 C3 1 2 2 3 3 4 4 4 5 5 5 Only legal join orders are ((T1 T2) T3) and ((T1 T3) T2)
SELECT * FROM t1 LEFT JOIN (t2 LEFT JOIN t3 ON t2.c2 = t3.c3) ON t1.c1 = t2.c2 ORDER BY t1.c1; C1 C2 C3 1 2 2 3 3 3 4 4 4 5 5 5
SELECT c1,c2,c3 FROM t2 LEFT JOIN t3 ON t2.c2 = t3.c3 RIGHT JOIN t1 ON t1.c1 = t2.c2 ORDER BY t1.c1; C1 C2 C3 1 2 2 3 3 3 4 4 4 5 5 5 Theoretically no legal join order! The CBO has to transform this query.
WITH q1 AS (SELECT * FROM t2 LEFT JOIN t3 ON t2.c2 = t3.c3) SELECT * FROM t1 LEFT JOIN q1 ON t1.c1 = q1.c2 ORDER BY t1.c1; -------------------------------------- | Id | Operation | Name | -------------------------------------- | 0 | SELECT STATEMENT | | | 1 | SORT ORDER BY | | |* 2 | HASH JOIN OUTER | | | 3 | TABLE ACCESS FULL | T1 | | 4 | VIEW | | |* 5 | HASH JOIN OUTER | | | 6 | TABLE ACCESS FULL| T2 | | 7 | TABLE ACCESS FULL| T3 | --------------------------------------
Outer joins with extensions to traditional syntax SELECT * FROM t1, t2, t3 WHERE t1.c1 = t2.c2(+) AND t3.c3 = t2.c2(+) ORDER BY t1.c1; SELECT * FROM t1 LEFT JOIN t2 ON t1.c1 = t2.c2 RIGHT JOIN t3 ON t3.c3 = t2.c2; SELECT * FROM t1 CROSS JOIN t3 LEFT JOIN t2 ON t1.c1 = t2.c2 AND t2.c2 = t3.c3;
Lateral joins with the TABLE operator SELECT p.* FROM v$session s, TABLE (DBMS_XPLAN.display_cursor (s.sql_id, s.sql_child_number)) p WHERE sid in (123,456); LEGAL ENTITY, department or author (Click Insert | Header & Footer) Month Day, Year
Lateral joins in 12c SELECT * FROM t1 ,LATERAL ( SELECT t2.c2, MEDIAN (t2.c2) OVER () med FROM t2 WHERE t2.c2 BETWEEN t1.c1 - 3 AND t1.c1) v WHERE t1.c1 = v.c2 ORDER BY t1.c1; In ANSI syntax use the keywords CROSSAPPLY (for inner lateral joins) and OUTERAPPLY (for outer lateral joins)
Other types of join Full outer joins (two preserved row sources) Partitioned outer joins (potentially multiple rows in the result set from one preserved row). Anti joins (standard and null aware in 11g onwards) Semi joins (standard and null accepting in 12c onwards) NOTE: A Partial Join is a 12c optimizer transformation not a special type of join
There are 3 join methods Nested loops join Hash join Merge join Merge join Cartesian (a variation on a merge join) Variations on a theme: Full and partial partition-wise joins Buffered joins for parallel queries Bloom filtering can be applied, primarily in parallel queries Nested loops can be pre-fetched or batched
Nested loops join For every row in the driving row source, find corresponding rows in the probed row source. SELECT /*+ leading(t1) use_nl(t2) */ * FROM t1, t2 WHERE t1.c1 = t2.c2; ----------------------------------- | Id | Operation | Name | ----------------------------------- | 0 | SELECT STATEMENT | | | 1 | NESTED LOOPS | | | 2 | TABLE ACCESS FULL| T1 | |* 3 | TABLE ACCESS FULL| T2 | -----------------------------------
Nested loops join with a hash cluster (1) CREATE CLUSTER cluster_hash ( ck INTEGER ) HASHKEYS 3 HASH IS ck; CREATE TABLE tch1 ( ck INTEGER ,c1 INTEGER ) CLUSTER cluster_hash ( ck );
Nested loops join with a hash cluster (2) SELECT /*+ leading(t1) use_nl(tch1) */ * FROM t1, tch1 WHERE t1.c1 = tch1.ck; ----------------------------------- | Id | Operation | Name | ----------------------------------- | 0 | SELECT STATEMENT | | | 1 | NESTED LOOPS | | | 2 | TABLE ACCESS FULL| T1 | |* 3 | TABLE ACCESS HASH| TCH1 | -----------------------------------
Nested loops join with an index CREATE INDEX t2_i1 ON t2 (c2); SELECT /*+ leading(t1) use_nl_with_index(t2 (c2)) */ * FROM t1, t2 WHERE t1.c1 = t2.c2;------------------------------------ | Id | Operation | Name | ------------------------------------ | 0 | SELECT STATEMENT | | | 1 | NESTED LOOPS | | | 2 | TABLE ACCESS FULL| T1 | |* 3 | INDEX RANGE SCAN | T2_I1 | ------------------------------------
Pros and cons of nested loops joins Advantages Assuming a suitable index is available, nested loops joins scale linearly Only blocks containing rows to be selected are accessed The only join method that support lateral joins Supports theta joins (e.g. t1.c1 > t2.c2) No in-memory workarea Disadvantages Usually works poorly without an index and indexes are expensive! Indexes access a table with single block reads Index range scans may access the same block multiple times Interview sound bite: Used with two small tables
Hash join Create an in-memory hash cluster from the contents of the driving row source. Use an upside-down nested loops join from the probe row source into the in-memory hash cluster. SELECT /*+ leading(t1) use_hash(t2) no_swap_join_inputs(t2) */ * FROM t1, t2 WHERE t1.c1 = t2.c2; | Id | Operation | Name | ------------------------------------ | 0 | SELECT STATEMENT | | |* 1 | HASH JOIN | | | 2 | TABLE ACCESS FULL| T1 | | 3 | INDEX FULL SCAN | T2_I1 | ------------------------------------ ------------------------------------
Pros and cons of hash joins Advantages No index on the join column is needed! When full table scans are used blocks are only visited once (excluding visits to the workarea) When full table scans are used multi-block reads are possible More tolerant of cardinality errors that nested loops joins Supports hash join input swapping Disadvantages Requires a workarea that limits scalability Only equijoins are supported (e.g. t1.c1 = t2.c2) May visit blocks in the probed row source that contain no rows in the result set. Interview sound bite: used when joining a small table with a large table
Hash join input swapping Exchange the two operands of the hash join. According to the hints the join order is not affected by the swap! SELECT /*+ leading(t2 t1) use_hash(t1) swap_join_inputs(t1) */ * FROM t1, t2 WHERE t1.c1 = t2.c2; ------------------------------------ | Id | Operation | Name | ------------------------------------ | 0 | SELECT STATEMENT | | |* 1 | HASH JOIN | | | 2 | TABLE ACCESS FULL| T1 | | 3 | INDEX FULL SCAN | T2_I1 | ------------------------------------
The three benefits of hash join input swapping The intermediate result of a join operation can be the probed row source of a hash join. The optional row source in an outer join can be the driving row source of a hash join. The subquery in a semi-join or an anti-join can be the driving row source of a hash join. However,hash join input swapping does not allow the intermediate result set from one join to be the optional row source in a later outer join. Furthermore, hash join input swapping doesn t facilitate bushy joins like the one shown earlier: ((T3 T4) (T1 T2))
SELECT /*+ leading(t3 t4 t1 t2) use_hash(t4) use_hash(t1) use_hash(t2) swap_join_inputs(t4) swap_join_inputs(t1) swap_join_inputs(t2) */ * FROM t1 LEFT JOIN t2 ON t1.c1 = t2.c2 JOIN t3 ON t1.c1 = t3.c3 JOIN t4 ON t3.c3 = t4.c4; --------------------------------------- | Id | Operation | Name | --------------------------------------- | 0 | SELECT STATEMENT | | |* 1 | HASH JOIN RIGHTOUTER| | | 2 | INDEX FULL SCAN | T2_I1 | |* 3 | HASH JOIN | | | 4 | TABLE ACCESS FULL | T1 | |* 5 | HASH JOIN | | | 6 | TABLE ACCESS FULL | T4 | | 7 | TABLE ACCESS FULL | T3 | --------------------------------------- ((T2 (T1 (T4 T3)))
Questions? http://tonyhasler.wordpress.com