Efficient Algorithms for Relational Database Systems Operations

csce 608 database systems n.w
1 / 58
Embed
Share

Explore efficient algorithms for implementing relational algebraic operations like projection, selection, set/bag operations, join operations, and more in relational database systems. Learn about one-pass and multi-pass algorithms, as well as two-pass techniques such as sort-based and hash-based approaches for handling large relations that do not fit into main memory.

  • Database Systems
  • Algorithms
  • Relational Operations
  • Efficiency
  • Two-pass

Uploaded on | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. CSCE-608 Database Systems Spring 2025 Instructor: Jianer Chen Office: PETR 428 Phone: 845-4259 Email: chen@cse.tamu.edu Notes 32: Multi-pass algorithms

  2. in tables (relations) lock table DDL language DDL complier database administrator concurrency control file logging & recovery manager transaction manager Efficient Algorithms for Relational algebriac operations index/file manager buffer manager DML (query) language query execution engine database programmer DML complier main memory buffers secondary storage (disks) DBMS

  3. Algorithms Implementing Relational Algebraic Operations Projection and selection , Set/bag operations US, S, S, UB, B, B Join operations , , C Extended operations , , , table-scan 3

  4. Algorithms Implementing Relational Algebraic Operations Quick Review Operations requiring almost no space: , , UB, table-scan Nested-loop Algorithms For binary operations: US, S, S, Memory: M 2 One-pass Algorithms: , , , US, S, S, B, B, Unary: Memory: M B(R) Cost: B(R) , , , , C C Memory: M = 2 Cost: (R), (R), table-scan(R): B(R) Cost: B(R)*B(S)/M + B(R) Binary: Memory: M B(Rsmall) Cost: B(Rsmall) + B(Rlarge) UB(R, S): B(R) + B(S) 4 , , US, S, S, UB, B, B, , , , table-scan, , , C

  5. Two-pass algorithms Condition: large relations that cannot fit into the main memory M, but not extremely large. Basic Ideas: 1. Break relations into smaller pieces that fit in the main memory, make them more organized, and store them back to disk; 2. Apply operation based on merging the blocks from the smaller and more organized pieces. Two main techniques: 1. Sort-based; 2. Hash-based. 5 5 , , US, S, S, UB, B, B, , , , table-scan, , , C

  6. Two-pass sort-based algorithms General framework: 1. Repeat (Phase 1. making sorted sublists) Fill the main memory with tuples of a relation; Make them a sorted sublist; Write the sorted sublist back to disk. 2. Repeat (Phase 2. Merging) Bring in a block from each of the sorted sublists; apply operation by merging the sorted blocks; R 6 6 , , US, S, S, UB, B, B, , , , table-scan, , , C

  7. Two-pass sort-based algorithms General framework: 1. Repeat (Phase 1. making sorted sublists) Fill the main memory with tuples of a relation; Make them a sorted sublist; Write the sorted sublist back to disk. 2. Repeat (Phase 2. Merging) Bring in a block from each of the sorted sublists; apply operation by merging the sorted blocks; Unary operations (R), (R), (R) main memory R 7 7 , , US, S, S, UB, B, B, , , , table-scan, , , C

  8. Two-pass sort-based algorithms General framework: 1. Repeat (Phase 1. making sorted sublists) Fill the main memory with tuples of a relation; Make them a sorted sublist; Write the sorted sublist back to disk. 2. Repeat (Phase 2. Merging) Bring in a block from each of the sorted sublists; apply operation by merging the sorted blocks; Unary operations (R), (R), (R) main memory R Phase I 8 8 , , US, S, S, UB, B, B, , , , table-scan, , , C

  9. Two-pass sort-based algorithms General framework: 1. Repeat (Phase 1. making sorted sublists) Fill the main memory with tuples of a relation; Make them a sorted sublist; Write the sorted sublist back to disk. 2. Repeat (Phase 2. Merging) Bring in a block from each of the sorted sublists; apply operation by merging the sorted blocks; Unary operations (R), (R), (R) main memory R R End of phase I 9 9 , , US, S, S, UB, B, B, , , , table-scan, , , C

  10. Two-pass sort-based algorithms General framework: 1. Repeat (Phase 1. making sorted sublists) Fill the main memory with tuples of a relation; Make them a sorted sublist; Write the sorted sublist back to disk. 2. Repeat (Phase 2. Merging) Bring in a block from each of the sorted sublists; apply operation by merging the sorted blocks; Unary operations (R), (R), (R) main memory R Phase II 10 10 , , US, S, S, UB, B, B, , , , table-scan, , , C

  11. Two-pass sort-based algorithms General framework: 1. Repeat (Phase 1. making sorted sublists) Fill the main memory with tuples of a relation; Make them a sorted sublist; Write the sorted sublist back to disk. 2. Repeat (Phase 2. Merging) Bring in a block from each of the sorted sublists; apply operation by merging the sorted blocks; Unary operations (R), (R), (R) main memory R R Phase II 11 11 , , US, S, S, UB, B, B, , , , table-scan, , , C

  12. Two-pass sort-based algorithms General framework: 1. Repeat (Phase 1. making sorted sublists) Fill the main memory with tuples of a relation; Make them a sorted sublist; Write the sorted sublist back to disk. 2. Repeat (Phase 2. Merging) Bring in a block from each of the sorted sublists; apply operation by merging the sorted blocks; Unary operations (R), (R), (R) (R) 1. Remove all minimums except one; 2. Output the minimum main memory R R Phase II 12 12 , , US, S, S, UB, B, B, , , , table-scan, , , C

  13. Two-pass sort-based algorithms General framework: 1. Repeat (Phase 1. making sorted sublists) Fill the main memory with tuples of a relation; Make them a sorted sublist; Write the sorted sublist back to disk. 2. Repeat (Phase 2. Merging) Bring in a block from each of the sorted sublists; apply operation by merging the sorted blocks; Unary operations (R), (R), (R) (R) 1. Remove all minimums except one; 2. Output the minimum main memory R R Phase II Remark. read in the next block from a sublist if its block is exhausted (Apply to all algorithms) 13 13 , , US, S, S, UB, B, B, , , , table-scan, , , C

  14. Two-pass sort-based algorithms General framework: 1. Repeat (Phase 1. making sorted sublists) Fill the main memory with tuples of a relation; Make them a sorted sublist; Write the sorted sublist back to disk. 2. Repeat (Phase 2. Merging) Bring in a block from each of the sorted sublists; apply operation by merging the sorted blocks; Unary operations (R), (R), (R) (R) \\ sublists are sorted by \\ the grouping attributes main memory 1. Group all tuples with the minimum grouping attributes; 2. Calculate the aggregation value; 3. Output a grouping tuple. R R Phase II 14 14 , , US, S, S, UB, B, B, , , , table-scan, , , C

  15. Two-pass sort-based algorithms General framework: 1. Repeat (Phase 1. making sorted sublists) Fill the main memory with tuples of a relation; Make them a sorted sublist; Write the sorted sublist back to disk. 2. Repeat (Phase 2. Merging) Bring in a block from each of the sorted sublists; apply operation by merging the sorted blocks; Unary operations (R), (R), (R) Summary: Memory: M B(R) Cost: 3B(R) main memory R R Phase II 15 15 , , US, S, S, UB, B, B, , , , table-scan, , , C

  16. Two-pass sort-based algorithms General framework: 1. Repeat (Phase 1. making sorted sublists) Fill the main memory with tuples of a relation; Make them a sorted sublist; Write the sorted sublist back to disk. 2. Repeat (Phase 2. Merging) Bring in a block from each of the sorted sublists; apply operation by merging the sorted blocks; Binary operations on two relations R and S: US, S, S, B, B, , C , main memory R R S 16 16 , , US, S, S, UB, B, B, , , , table-scan, , , C

  17. Two-pass sort-based algorithms General framework: 1. Repeat (Phase 1. making sorted sublists) Fill the main memory with tuples of a relation; Make them a sorted sublist; Write the sorted sublist back to disk. 2. Repeat (Phase 2. Merging) Bring in a block from each of the sorted sublists; apply operation by merging the sorted blocks; Binary operations on two relations R and S: US, S, S, B, B, , C , main memory R R Phase I S 17 17 , , US, S, S, UB, B, B, , , , table-scan, , , C

  18. Two-pass sort-based algorithms General framework: 1. Repeat (Phase 1. making sorted sublists) Fill the main memory with tuples of a relation; Make them a sorted sublist; Write the sorted sublist back to disk. 2. Repeat (Phase 2. Merging) Bring in a block from each of the sorted sublists; apply operation by merging the sorted blocks; Binary operations on two relations R and S: US, S, S, B, B, , C , main memory R R End of phase I S 18 18 , , US, S, S, UB, B, B, , , , table-scan, , , C

  19. Two-pass sort-based algorithms General framework: 1. Repeat (Phase 1. making sorted sublists) Fill the main memory with tuples of a relation; Make them a sorted sublist; Write the sorted sublist back to disk. 2. Repeat (Phase 2. Merging) Bring in a block from each of the sorted sublists; apply operation by merging the sorted blocks; Binary operations on two relations R and S: US, S, S, B, B, , C , main memory R Phase II S 19 19 , , US, S, S, UB, B, B, , , , table-scan, , , C

  20. Two-pass sort-based algorithms General framework: 1. Repeat (Phase 1. making sorted sublists) Fill the main memory with tuples of a relation; Make them a sorted sublist; Write the sorted sublist back to disk. 2. Repeat (Phase 2. Merging) Bring in a block from each of the sorted sublists; apply operation by merging the sorted blocks; Binary operations on two relations R and S: US, S, S, B, B, , C , main memory R Phase II S 20 20 , , US, S, S, UB, B, B, , , , table-scan, , , C

  21. Two-pass sort-based algorithms General framework: 1. Repeat (Phase 1. making sorted sublists) Fill the main memory with tuples of a relation; Make them a sorted sublist; Write the sorted sublist back to disk. 2. Repeat (Phase 2. Merging) Bring in a block from each of the sorted sublists; apply operation by merging the sorted blocks; Binary operations on two relations R and S: US, S, S, B, B, , C , main memory R May build an efficient data structure for searching the minimum. Phase II S 21 21 , , US, S, S, UB, B, B, , , , table-scan, , , C

  22. Two-pass sort-based algorithms General framework: 1. Repeat (Phase 1. making sorted sublists) Fill the main memory with tuples of a relation; Make them a sorted sublist; Write the sorted sublist back to disk. 2. Repeat (Phase 2. Merging) Bring in a block from each of the sorted sublists; apply operation by merging the sorted blocks; Binary operations on two relations R and S: US, S, S, B, B, , C , RUS S main memory REPEAT IF Rmin = Smin THEN send one copy to output; delete both; ELSE send the smaller to output, and delete it. R Phase II S 22 22 , , US, S, S, UB, B, B, , , , table-scan, , , C

  23. Two-pass sort-based algorithms General framework: 1. Repeat (Phase 1. making sorted sublists) Fill the main memory with tuples of a relation; Make them a sorted sublist; Write the sorted sublist back to disk. 2. Repeat (Phase 2. Merging) Bring in a block from each of the sorted sublists; apply operation by merging the sorted blocks; Binary operations on two relations R and S: US, S, S, B, B, , C , R S S main memory REPEAT IF Rmin = Smin THEN send one copy to output; delete both; ELSE delete the smaller. R Phase II S 23 23 , , US, S, S, UB, B, B, , , , table-scan, , , C

  24. Two-pass sort-based algorithms General framework: 1. Repeat (Phase 1. making sorted sublists) Fill the main memory with tuples of a relation; Make them a sorted sublist; Write the sorted sublist back to disk. 2. Repeat (Phase 2. Merging) Bring in a block from each of the sorted sublists; apply operation by merging the sorted blocks; Binary operations on two relations R and S: US, S, S, B, B, , C , R S S main memory REPEAT IF Rmin = Smin THEN send one copy to output; delete both; ELSE delete the smaller. R Phase II S The same algorithm works for R B S! 24 24 , , US, S, S, UB, B, B, , , , table-scan, , , C

  25. Two-pass sort-based algorithms General framework: 1. Repeat (Phase 1. making sorted sublists) Fill the main memory with tuples of a relation; Make them a sorted sublist; Write the sorted sublist back to disk. 2. Repeat (Phase 2. Merging) Bring in a block from each of the sorted sublists; apply operation by merging the sorted blocks; Binary operations on two relations R and S: US, S, S, B, B, , C , R S S main memory REPEAT IF Rmin = Smin THEN delete both; ELSE IF Rmin > Smin THEN delete Smin; ELSE \\ Rmin < Smin send Rmin to output; and delete Rmin. R Phase II S 25 25 , , US, S, S, UB, B, B, , , , table-scan, , , C

  26. Two-pass sort-based algorithms General framework: 1. Repeat (Phase 1. making sorted sublists) Fill the main memory with tuples of a relation; Make them a sorted sublist; Write the sorted sublist back to disk. 2. Repeat (Phase 2. Merging) Bring in a block from each of the sorted sublists; apply operation by merging the sorted blocks; Binary operations on two relations R and S: US, S, S, B, B, , C , R S S main memory REPEAT IF Rmin = Smin THEN delete both; ELSE IF Rmin > Smin THEN delete Smin; ELSE \\ Rmin < Smin send Rmin to output; and delete Rmin. R Phase II S The same algorithm works for R B S! 26 26 , , US, S, S, UB, B, B, , , , table-scan, , , C

  27. Two-pass sort-based algorithms General framework: 1. Repeat (Phase 1. making sorted sublists) Fill the main memory with tuples of a relation; Make them a sorted sublist; Write the sorted sublist back to disk. 2. Repeat (Phase 2. Merging) Bring in a block from each of the sorted sublists; apply operation by merging the sorted blocks; Binary operations on two relations R and S: US, S, S, B, B, , C , R S main memory R Phase II S 27 27 , , US, S, S, UB, B, B, , , , table-scan, , , C

  28. Two-pass sort-based algorithms General framework: 1. Repeat (Phase 1. making sorted sublists) Fill the main memory with tuples of a relation; Make them a sorted sublist; Write the sorted sublist back to disk. 2. Repeat (Phase 2. Merging) Bring in a block from each of the sorted sublists; apply operation by merging the sorted blocks; Binary operations on two relations R and S: US, S, S, B, B, , C , R S main memory Sorted sublists do not seem to help computing R S: each tuple of R should be joined with all tuples of S R Phase II S 28 28 , , US, S, S, UB, B, B, , , , table-scan, , , C

  29. Two-pass sort-based algorithms General framework: 1. Repeat (Phase 1. making sorted sublists) Fill the main memory with tuples of a relation; Make them a sorted sublist; Write the sorted sublist back to disk. 2. Repeat (Phase 2. Merging) Bring in a block from each of the sorted sublists; apply operation by merging the sorted blocks; Binary operations on two relations R and S: US, S, S, B, B, , C , CS R S R main memory Sorted sublists do not seem to help computing R S: each tuple of R should be joined with all tuples of S The same is true for R C S R Phase II S 29 29 , , US, S, S, UB, B, B, , , , table-scan, , , C

  30. Two-pass sort-based algorithms General framework: 1. Repeat (Phase 1. making sorted sublists) Fill the main memory with tuples of a relation; Make them a sorted sublist; Write the sorted sublist back to disk. 2. Repeat (Phase 2. Merging) Bring in a block from each of the sorted sublists; apply operation by merging the sorted blocks; Binary operations on two relations R and S: US, S, S, B, B, , C , CS R S R main memory Simply use the Nested-Loop algorithm Memory: M 2 cost: B(R)B(S)/M + B(R) R Phase II S 30 30 , , US, S, S, UB, B, B, , , , table-scan, , , C

  31. Two-pass sort-based algorithms General framework: 1. Repeat (Phase 1. making sorted sublists) Fill the main memory with tuples of a relation; Make them a sorted sublist; Write the sorted sublist back to disk. 2. Repeat (Phase 2. Merging) Bring in a block from each of the sorted sublists; apply operation by merging the sorted blocks; Binary operations on two relations R and S: US, S, S, B, B, , C , R S How about R S? main memory R Phase II S 31 31 , , US, S, S, UB, B, B, , , , table-scan, , , C

  32. Two-pass sort-based algorithms General framework: 1. Repeat (Phase 1. making sorted sublists) Fill the main memory with tuples of a relation; Make them a sorted sublist; Write the sorted sublist back to disk. 2. Repeat (Phase 2. Merging) Bring in a block from each of the sorted sublists; apply operation by merging the sorted blocks; Binary operations on two relations R and S: US, S, S, B, B, , C , R S How about R S? main memory R In extreme cases (a large number of tuples are joinable), the sort-based algorithm does not work for . On the other hand, most practical and interesting cases are not that extreme Phase II S 32 32 , , US, S, S, UB, B, B, , , , table-scan, , , C

  33. Two-pass sort-based algorithms General framework: 1. Repeat (Phase 1. making sorted sublists) Fill the main memory with tuples of a relation; Make them a sorted sublist; Write the sorted sublist back to disk. 2. Repeat (Phase 2. Merging) Bring in a block from each of the sorted sublists; apply operation by merging the sorted blocks; Binary operations on two relations R and S: US, S, S, B, B, , C , R \\ sublists are sorted by the join \\ attributes. REPEAT IF Rmin = Smin THEN collect all Rmin-tuples and all Smin-tuples, and send their join to the output; delete all Rmin and Smintuples; ELSE delete the smaller; S main memory R Phase II S 33 33 , , US, S, S, UB, B, B, , , , table-scan, , , C

  34. Two-pass sort-based algorithms General framework: 1. Repeat (Phase 1. making sorted sublists) Fill the main memory with tuples of a relation; Make them a sorted sublist; Write the sorted sublist back to disk. 2. Repeat (Phase 2. Merging) Bring in a block from each of the sorted sublists; apply operation by merging the sorted blocks; Binary operations on two relations R and S: US, S, S, B, B, , C , Summary: main memory Memory: M B(R) + B(S) Cost: 3(B(R) + B(S)) R Phase II S 34 34 , , US, S, S, UB, B, B, , , , table-scan, , , C

  35. Two-pass sort-based algorithms General framework: 1. Repeat (Phase 1. making sorted sublists) Fill the main memory with tuples of a relation; Make them a sorted sublist; Write the sorted sublist back to disk. 2. Repeat (Phase 2. Merging) Bring in a block from each of the sorted sublists; apply operation by merging the sorted blocks; Binary operations on two relations R and S: US, S, S, B, B, , C , Summary: main memory Comments: Memory: M B(R) + B(S) Cost: 3(B(R) + B(S)) R Phase II S 35 35 , , US, S, S, UB, B, B, , , , table-scan, , , C

  36. Two-pass sort-based algorithms General framework: 1. Repeat (Phase 1. making sorted sublists) Fill the main memory with tuples of a relation; Make them a sorted sublist; Write the sorted sublist back to disk. 2. Repeat (Phase 2. Merging) Bring in a block from each of the sorted sublists; apply operation by merging the sorted blocks; Binary operations on two relations R and S: US, S, S, B, B, , C , Summary: main memory Comments: 1. Applicable unless the relations are extremely large, in that case we can extend the method to multiway pass; Memory: M B(R) + B(S) Cost: 3(B(R) + B(S)) R Phase II S 36 36 , , US, S, S, UB, B, B, , , , table-scan, , , C

  37. Two-pass sort-based algorithms General framework: 1. Repeat (Phase 1. making sorted sublists) Fill the main memory with tuples of a relation; Make them a sorted sublist; Write the sorted sublist back to disk. 2. Repeat (Phase 2. Merging) Bring in a block from each of the sorted sublists; apply operation by merging the sorted blocks; Binary operations on two relations R and S: US, S, S, B, B, , C , Summary: main memory Comments: 1. Applicable unless the relations are extremely large, in that case we can extend the method to multiway pass; 2. The output is sorted. Memory: M B(R) + B(S) Cost: 3(B(R) + B(S)) R Phase II S 37 37 , , US, S, S, UB, B, B, , , , table-scan, , , C

  38. Two-pass algorithms Condition: large relations that cannot fit into the main memory M, but not extremely large. Basic Ideas: 1. Break relations into smaller pieces that fit in the main memory, make them more organized, and store them back to disk; 2. Apply operation based on merging the blocks from the smaller and more organized pieces. Two-pass sort-based algorithms: , , , US, S, S, B, B, Unary: Memory: M B(R) Cost: 3B(R) Two main techniques: 1. Sort-based; 2. Hash-based. Binary: Memory: M B(R) + B(S) Cost: 3(B(R) + B(S)) 38 38 , , US, S, S, UB, B, B, , , , table-scan, , , C

  39. Two-pass algorithms Condition: large relations that cannot fit into the main memory M, but not extremely large. Basic Ideas: 1. Break relations into smaller pieces that fit in the main memory, make them more organized, and store them back to disk; 2. Apply operation based on merging the blocks from the smaller and more organized pieces. Two main techniques: 1. Sort-based; 2. Hash-based. 39 39 , , US, S, S, UB, B, B, , , , table-scan, , , C

  40. Two-pass hash-based algorithms General framework: 1. (Phase 1. making hash buckets) Hash tuples into M buckets (using one block for each bucket); write the buckets back to disk. 2. (Phase 2. bucketwise operation) apply the operation based on buckets. R 40 40 , , US, S, S, UB, B, B, , , , table-scan, , , C

  41. Two-pass hash-based algorithms General framework: 1. (Phase 1. making hash buckets) Hash tuples into M buckets (using one block for each bucket); write the buckets back to disk. 2. (Phase 2. bucketwise operation) apply the operation based on buckets. main memory Phase I 41 41 , , US, S, S, UB, B, B, , , , table-scan, , , C

  42. Two-pass hash-based algorithms General framework: 1. (Phase 1. making hash buckets) Hash tuples into M buckets (using one block for each bucket); write the buckets back to disk. 2. (Phase 2. bucketwise operation) apply the operation based on buckets. main memory Phase I 42 42 , , US, S, S, UB, B, B, , , , table-scan, , , C

  43. Two-pass hash-based algorithms General framework: 1. (Phase 1. making hash buckets) Hash tuples into M buckets (using one block for each bucket); write the buckets back to disk. 2. (Phase 2. bucketwise operation) apply the operation based on buckets. main memory Phase I 43 43 , , US, S, S, UB, B, B, , , , table-scan, , , C

  44. Two-pass hash-based algorithms General framework: 1. (Phase 1. making hash buckets) Hash tuples into M buckets (using one block for each bucket); write the buckets back to disk. 2. (Phase 2. bucketwise operation) apply the operation based on buckets. R 44 44 , , US, S, S, UB, B, B, , , , table-scan, , , C

  45. Two-pass hash-based algorithms General framework: 1. (Phase 1. making hash buckets) Hash tuples into M buckets (using one block for each bucket); write the buckets back to disk. 2. (Phase 2. bucketwise operation) apply the operation based on buckets. One-pass Algorithms: , , , US, S, S, B, B, Unary: Memory: M B(R) Cost: B(R) , , C R Binary: Memory: M B(Rsmall) Cost: B(Rsmall) + B(Rlarge) 45 45 , , US, S, S, UB, B, B, , , , table-scan, , , C

  46. Two-pass hash-based algorithms General framework: 1. (Phase 1. making hash buckets) Hash tuples into M buckets (using one block for each bucket); write the buckets back to disk. 2. (Phase 2. bucketwise operation) apply the operation based on buckets. Unary operations (R), (R), (R) main memory R 46 46 , , US, S, S, UB, B, B, , , , table-scan, , , C

  47. Two-pass hash-based algorithms General framework: 1. (Phase 1. making hash buckets) Hash tuples into M buckets (using one block for each bucket); write the buckets back to disk. 2. (Phase 2. bucketwise operation) apply the operation based on buckets. Unary operations (R), (R), (R) main memory R Phase I 47 47 , , US, S, S, UB, B, B, , , , table-scan, , , C

  48. Two-pass hash-based algorithms General framework: 1. (Phase 1. making hash buckets) Hash tuples into M buckets (using one block for each bucket); write the buckets back to disk. 2. (Phase 2. bucketwise operation) apply the operation based on buckets. Unary operations (R), (R), (R) main memory Bucket 1 Bucket 2 R Phase I Bucket M 48 48 , , US, S, S, UB, B, B, , , , table-scan, , , C

  49. Two-pass hash-based algorithms General framework: 1. (Phase 1. making hash buckets) Hash tuples into M buckets (using one block for each bucket); write the buckets back to disk. 2. (Phase 2. bucketwise operation) apply the operation based on buckets. Unary operations (R), (R), (R) main memory Bucket 1 Bucket 2 R End of phase I Bucket M 49 49 , , US, S, S, UB, B, B, , , , table-scan, , , C

  50. Two-pass hash-based algorithms General framework: 1. (Phase 1. making hash buckets) Hash tuples into M buckets (using one block for each bucket); write the buckets back to disk. 2. (Phase 2. bucketwise operation) apply the operation based on buckets. Unary operations (R), (R), (R) main memory Bucket 1 Bucket 2 R One bucket per time Phase II Bucket M 50 50 , , US, S, S, UB, B, B, , , , table-scan, , , C

Related


More Related Content