Efficient Algorithms for Relational Algebraic Operations in Database Systems

csce 608 database systems n.w
1 / 73
Embed
Share

Explore one-pass and nested loop algorithms, table locking, query languages, and more in the context of database systems. Learn about efficient operations, memory management, and relational algebraic operations implementing projections, selections, joins, and more.

  • Algorithms
  • Database Systems
  • Relational Algebra
  • Efficient Operations
  • Memory Management

Uploaded on | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. CSCE-608 Database Systems Spring 2025 Instructor: Jianer Chen Office: PETR 428 Phone: 845-4259 Email: chen@cse.tamu.edu Notes 31: One-pass algorithms and nested loop algorithms

  2. in tables (relations) lock table DDL language DDL complier database administrator concurrency control file logging & recovery manager transaction manager Efficient Algorithms for Relational algebriac operations index/file manager buffer manager DML (query) language query execution engine database programmer DML complier main memory buffers secondary storage (disks) DBMS

  3. Algorithms Implementing Relational Algebraic Operations Projection and selection , Set/bag operations US, S, S, UB, B, B Join operations , , C Extended operations , , , table-scan 3

  4. Operations requiring (almost) no M Tuple-based operations: , , UB, table-scan General framework: 1. Read in a block; 2. Process; 3. Send to the output Memory: M = 2 Cost: (R), (R), table-scan(R): B(R) UB(R, S): B(R) + B(S) main memory process disk 4 , , US, S, S, UB, B, B, , , , table-scan, , , C

  5. One-pass algorithms Condition: the main memory M is sufficiently large General framework: 1. Read in an entire relation R; 2. Process R; 3. Read in the other relation S block by block; 4. Sent the results to an output block Unary operations (R), (R), (R) Summary: Memory: M B(R) Cost: B(R) main memory R R process 5 5 , , US, S, S, UB, B, B, , , , table-scan, , , C

  6. One-pass algorithms Condition: the main memory M is sufficiently large General framework: 1. Read in an entire relation Rsmall; 2. Process Rsmall; 3. Read in the other relation Rlarge block by block; 4. Sent the results to an output block Binary operations on two relations R and S: US, S, S, B, B, , C , main memory Rsmall Rsmall Rlarge process disk 6 6 , , US, S, S, UB, B, B, , , , table-scan, , , C

  7. One-pass algorithms Condition: the main memory M is sufficiently large General framework: 1. Read in an entire relation Rsmall; 2. Process Rsmall; 3. Read in the other relation Rlarge block by block; 4. Sent the results to an output block Binary operations on two relations R and S: US, S, S, B, B, , C , main memory Rsmall Rsmall Build an efficient data structure for Rsmall (e.g., sort Rsmall) Rlarge process disk 7 7 , , US, S, S, UB, B, B, , , , table-scan, , , C

  8. One-pass algorithms Condition: the main memory M is sufficiently large General framework: 1. Read in an entire relation Rsmall; 2. Process Rsmall; 3. Read in the other relation Rlarge block by block; 4. Sent the results to an output block Binary operations on two relations R and S: US, S, S, B, B, , C , main memory Rlarge US Rsmall Rsmall 1. Sort Rsmall; 2. FOR each tuple t in Rlarge DO IF t is not in Rsmall THEN put t to the output; 3. Send Rsmall to the output. Rsmall Rlarge process disk 8 8 , , US, S, S, UB, B, B, , , , table-scan, , , C

  9. One-pass algorithms Condition: the main memory M is sufficiently large General framework: 1. Read in an entire relation Rsmall; 2. Process Rsmall; 3. Read in the other relation Rlarge block by block; 4. Sent the results to an output block Binary operations on two relations R and S: US, S, S, B, B, , C , main memory Rlarge S Rsmall Rsmall 1. Sort Rsmall; 2. FOR each tuple t in Rlarge DO IF t is in Rsmall THEN put t to the output; Rsmall Rlarge process disk 9 9 , , US, S, S, UB, B, B, , , , table-scan, , , C

  10. One-pass algorithms Condition: the main memory M is sufficiently large General framework: 1. Read in an entire relation Rsmall; 2. Process Rsmall; 3. Read in the other relation Rlarge block by block; 4. Sent the results to an output block Binary operations on two relations R and S: US, S, S, B, B, , C , main memory Rsmall Rsmall Rlarge process disk 10 10 , , US, S, S, UB, B, B, , , , table-scan, , , C

  11. One-pass algorithms Condition: the main memory M is sufficiently large General framework: 1. Read in an entire relation Rsmall; 2. Process Rsmall; 3. Read in the other relation Rlarge block by block; 4. Sent the results to an output block Binary operations on two relations R and S: US, S, S, B, B, , C , S is not commutative main memory Rsmall Rsmall Rlarge process disk 11 11 , , US, S, S, UB, B, B, , , , table-scan, , , C

  12. One-pass algorithms Condition: the main memory M is sufficiently large General framework: 1. Read in an entire relation Rsmall; 2. Process Rsmall; 3. Read in the other relation Rlarge block by block; 4. Sent the results to an output block Binary operations on two relations R and S: US, S, S, B, B, , C , S is not commutative main memory Rlarge S Rsmall Rsmall Rsmall 1. sort Rsmall; 2. FOR each tuple t in Rlarge DO IF t is not in Rsmall THEN put t to the output. Rlarge process disk 12 12 , , US, S, S, UB, B, B, , , , table-scan, , , C

  13. One-pass algorithms Condition: the main memory M is sufficiently large General framework: 1. Read in an entire relation Rsmall; 2. Process Rsmall; 3. Read in the other relation Rlarge block by block; 4. Sent the results to an output block Binary operations on two relations R and S: US, S, S, B, B, , C , S is not commutative main memory Rsmall S Rlarge Rsmall Rsmall 1. sort Rsmall; 2. FOR each tuple t in Rlarge DO IF t is in Rsmall THEN remove t from Rsmall; 3. send Rsmall to the output Rlarge process disk 13 13 , , US, S, S, UB, B, B, , , , table-scan, , , C

  14. One-pass algorithms Condition: the main memory M is sufficiently large General framework: 1. Read in an entire relation Rsmall; 2. Process Rsmall; 3. Read in the other relation Rlarge block by block; 4. Sent the results to an output block Binary operations on two relations R and S: US, S, S, B, B, , C , main memory Rlarge B Rsmall Rsmall 1. Make Rsmall a balance tree; 2. FOR each tuple t in Rlarge DO IF t is in Rsmall THEN output t; and remove a copy of t from Rsmall Rsmall Rlarge process disk 14 14 , , US, S, S, UB, B, B, , , , table-scan, , , C

  15. One-pass algorithms Condition: the main memory M is sufficiently large General framework: 1. Read in an entire relation Rsmall; 2. Process Rsmall; 3. Read in the other relation Rlarge block by block; 4. Sent the results to an output block Binary operations on two relations R and S: US, S, S, B, B, , C , B is not commutative main memory Rsmall Rsmall Rlarge process disk 15 15 , , US, S, S, UB, B, B, , , , table-scan, , , C

  16. One-pass algorithms Condition: the main memory M is sufficiently large General framework: 1. Read in an entire relation Rsmall; 2. Process Rsmall; 3. Read in the other relation Rlarge block by block; 4. Sent the results to an output block Binary operations on two relations R and S: US, S, S, B, B, , C , B is not commutative main memory Rlarge B Rsmall Rsmall Rsmall 1. Make Rsmall a balance tree; 2. FOR each tuple t in Rlarge DO IF t is not in Rsmall THEN output t ELSE remove a copy of t from Rsmall; Rlarge process disk 16 16 , , US, S, S, UB, B, B, , , , table-scan, , , C

  17. One-pass algorithms Condition: the main memory M is sufficiently large General framework: 1. Read in an entire relation Rsmall; 2. Process Rsmall; 3. Read in the other relation Rlarge block by block; 4. Sent the results to an output block Binary operations on two relations R and S: US, S, S, B, B, , C , B is not commutative main memory Rsmall B Rlarge Rsmall Rsmall 1. Make Rsmall a balance tree; 2. FOR each tuple t in Rlarge DO IF t is in Rsmall THEN remove a copy of t from Rsmall; 3. Output Rsmall. Rlarge process disk 17 17 , , US, S, S, UB, B, B, , , , table-scan, , , C

  18. One-pass algorithms Condition: the main memory M is sufficiently large General framework: 1. Read in an entire relation Rsmall; 2. Process Rsmall; 3. Read in the other relation Rlarge block by block; 4. Sent the results to an output block Binary operations on two relations R and S: US, S, S, B, B, , C , main memory Rlarge Rsmall Rsmall 1. FOR each tuple t in Rlarge DO cross join t and each tuple in Rsmall andsendto the output. Rsmall Rlarge process disk 18 18 , , US, S, S, UB, B, B, , , , table-scan, , , C

  19. One-pass algorithms Condition: the main memory M is sufficiently large General framework: 1. Read in an entire relation Rsmall; 2. Process Rsmall; 3. Read in the other relation Rlarge block by block; 4. Sent the results to an output block Binary operations on two relations R and S: US, S, S, B, B, , C , main memory Rlarge Rsmall C Rsmall 1. FOR each tuple t in Rlarge DO cross join t and each tuple in Rsmall ; IF the join satisfies C THENsendto the output. Rsmall Rlarge process disk 19 19 , , US, S, S, UB, B, B, , , , table-scan, , , C

  20. One-pass algorithms Condition: the main memory M is sufficiently large General framework: 1. Read in an entire relation Rsmall; 2. Process Rsmall; 3. Read in the other relation Rlarge block by block; 4. Sent the results to an output block Binary operations on two relations R and S: US, S, S, B, B, , C , main memory Rlarge Rsmall Rsmall 1. sort Rsmall by join attributes A; 2. FOR each tuple t in Rlarge DO find the tuples in Rsmall with the same A-value; join them with t and put in the output block Rsmall Rlarge process disk 20 20 , , US, S, S, UB, B, B, , , , table-scan, , , C

  21. One-pass algorithms Condition: the main memory M is sufficiently large General framework: 1. Read in an entire relation Rsmall; 2. Process Rsmall; 3. Read in the other relation Rlarge block by block; 4. Sent the results to an output block Binary operations on two relations R and S: US, S, S, B, B, , C , main memory Summary: Memory: M B(Rsmall) Cost: B(Rsmall) + B(Rlarge) Rsmall Rsmall Rlarge process disk 21 21 , , US, S, S, UB, B, B, , , , table-scan, , , C

  22. Algorithms Implementing Relational Algebraic Operations Quick Review Operations requiring almost no space: , , UB, table-scan One-pass Algorithms: , , , US, S, S, B, B, Unary: Memory: M B(R) Cost: B(R) Binary: Memory: M B(Rsmall) Cost: B(R) + B(S) , , C Memory: M = 2 Cost: (R), (R), table-scan(R): B(R) UB(R, S): B(R) + B(S) 22 , , US, S, S, UB, B, B, , , , table-scan, , , C

  23. For larger relations When relations cannot fit in main memory, one- pass algorithms cannot be used. 23 23 , , US, S, S, UB, B, B, , , , table-scan, , , C

  24. For larger relations When relations cannot fit in main memory, one- pass algorithms cannot be used. A generic algorithm for binary operations: Nested-loop (R S): FOR each tuple tR in R DO FOR each tuple tS in S DO Apply the operation on tR and tS 24 24 , , US, S, S, UB, B, B, , , , table-scan, , , C

  25. For larger relations When relations cannot fit in main memory, one- pass algorithms cannot be used. A generic algorithm for binary operations: Nested-loop (R S): FOR each tuple tR in R DO FOR each tuple tS in S DO Apply the operation on tR and tS Binary operations on two relations R and S: US, S, S, B, B, , C , 25 25 , , US, S, S, UB, B, B, , , , table-scan, , , C

  26. For larger relations When relations cannot fit in main memory, one- pass algorithms cannot be used. A generic algorithm for binary operations: Nested-loop (R S): FOR each tuple tR in R DO FOR each tuple tS in S DO Apply the operation on tR and tS R US S 1. \\ in the first execution of the \\ tR-loop, output tS; 2. \\ in an execution of the tR-loop IF tR = tS THEN mark tR; 3. \\ at the end of the tR-loop IF tR is unmarked THEN output tR. Binary operations on two relations R and S: US, S, S, B, B, , C , 26 26 , , US, S, S, UB, B, B, , , , table-scan, , , C

  27. For larger relations When relations cannot fit in main memory, one- pass algorithms cannot be used. A generic algorithm for binary operations: Nested-loop (R S): FOR each tuple tR in R DO FOR each tuple tS in S DO Apply the operation on tR and tS R S S 1. \\ in an execution of the tR-loop IF tR = tS THEN mark tR; 2. \\ at the end of the tR-loop IF tR is marked THEN output tR. Binary operations on two relations R and S: US, S, S, B, B, , C , 27 27 , , US, S, S, UB, B, B, , , , table-scan, , , C

  28. For larger relations When relations cannot fit in main memory, one- pass algorithms cannot be used. A generic algorithm for binary operations: Nested-loop (R S): FOR each tuple tR in R DO FOR each tuple tS in S DO Apply the operation on tR and tS R S S 1. \\ in an execution of the tR-loop IF tR = tS THEN mark tR; 2. \\ at the end of the tR-loop IF tR is unmarked THEN output tR. Binary operations on two relations R and S: US, S, S, B, B, , C , 28 28 , , US, S, S, UB, B, B, , , , table-scan, , , C

  29. For larger relations When relations cannot fit in main memory, one- pass algorithms cannot be used. A generic algorithm for binary operations: Nested-loop (R S): FOR each tuple tR in R DO FOR each tuple tS in S DO Apply the operation on tR and tS R S S 1. \\ in an execution of the tR-loop IF tR = tS THEN mark tR; 2. \\ at the end of the tR-loop IF tR is unmarked THEN output tR. Binary operations on two relations R and S: US, S, S, B, B, , C , Not working for S S R because S is not commutative 29 29 , , US, S, S, UB, B, B, , , , table-scan, , , C

  30. For larger relations When relations cannot fit in main memory, one- pass algorithms cannot be used. A generic algorithm for binary operations: Nested-loop (R S): FOR each tuple tR in R DO FOR each tuple tS in S DO Apply the operation on tR and tS Binary operations on two relations R and S: US, S, S, B, B, , C , 30 30 , , US, S, S, UB, B, B, , , , table-scan, , , C

  31. For larger relations When relations cannot fit in main memory, one- pass algorithms cannot be used. A generic algorithm for binary operations: Nested-loop (R S): FOR each tuple tR in R DO FOR each tuple tS in S DO Apply the operation on tR and tS R B S and R B S Nested-loop does not seem to be effective for R B S and R B S Remark: we cannot simply mark tR. Binary operations on two relations R and S: US, S, S, B, B, , C , 31 31 , , US, S, S, UB, B, B, , , , table-scan, , , C

  32. For larger relations When relations cannot fit in main memory, one- pass algorithms cannot be used. A generic algorithm for binary operations: Nested-loop (R S): FOR each tuple tR in R DO FOR each tuple tS in S DO Apply the operation on tR and tS R B S and R B S Nested-loop does not seem to be effective for R B S and R B S Remark: we cannot simply mark tR. Binary operations on two relations R and S: US, S, S, B, B, , C , 32 32 , , US, S, S, UB, B, B, , , , table-scan, , , C

  33. For larger relations When relations cannot fit in main memory, one- pass algorithms cannot be used. A generic algorithm for binary operations: Nested-loop (R S): FOR each tuple tR in R DO FOR each tuple tS in S DO Apply the operation on tR and tS Binary operations on two relations R and S: US, S, S, B, B, , C , 33 33 , , US, S, S, UB, B, B, , , , table-scan, , , C

  34. For larger relations When relations cannot fit in main memory, one- pass algorithms cannot be used. A generic algorithm for binary operations: Nested-loop is particularly simple for join operations Nested-loop (R S): FOR each tuple tR in R DO FOR each tuple tS in S DO Apply the operation on tR and tS Binary operations on two relations R and S: US, S, S, B, B, , C , 34 34 , , US, S, S, UB, B, B, , , , table-scan, , , C

  35. For larger relations When relations cannot fit in main memory, one- pass algorithms cannot be used. A generic algorithm for binary operations: Nested-loop is particularly simple for join operations Nested-loop (R S): FOR each tuple tR in R DO FOR each tuple tS in S DO Apply the operation on tR and tS R join S IF tR and tS are joinable THEN Join tR and tS; IF (the join is or ) THEN output the join; ELSE \\ the join is C output the join if it satisfies C Binary operations on two relations R and S: US, S, S, B, B, , C , 35 35 , , US, S, S, UB, B, B, , , , table-scan, , , C

  36. For larger relations When relations cannot fit in main memory, one- pass algorithms cannot be used. A generic algorithm for binary operations: Summary: Nested-loop (R S): FOR each tuple tR in R DO FOR each tuple tS in S DO Apply the operation on tR and tS Memory: M 2 Cost: T(R)*T(S) + T(R) Binary operations on two relations R and S: US, S, S, B, B, , C , 36 36 , , US, S, S, UB, B, B, , , , table-scan, , , C

  37. For larger relations When relations cannot fit in main memory, one- pass algorithms cannot be used. A generic algorithm for binary operations: Summary: Nested-loop (R S): FOR each tuple tR in R DO FOR each tuple tS in S DO Apply the operation on tR and tS Memory: M 2 Cost: T(R)*T(S) + T(R) Binary operations on two relations R and S: US, S, S, B, B, , C , Very bad 37 37 , , US, S, S, UB, B, B, , , , table-scan, , , C

  38. For larger relations When relations cannot fit in main memory, one- pass algorithms cannot be used. A generic algorithm for binary operations: Summary: Nested-loop (R S): FOR each tuple tR in R DO FOR each in S DO Apply the operation on tR and the tuples in bS Binary operations on two relations R and S: US, S, S, B, B, , C , Memory: M 2 Cost: T(R)*B(S) + T(R) block bS 38 38 , , US, S, S, UB, B, B, , , , table-scan, , , C

  39. For larger relations When relations cannot fit in main memory, one- pass algorithms cannot be used. A generic algorithm for binary operations: Summary: Nested-loop (R S): FOR each tuple tR in R DO FOR each in S DO Apply the operation on tR and the tuples in bS Binary operations on two relations R and S: US, S, S, B, B, , C , Memory: M 2 Cost: T(R)*B(S) + T(R) block bS Still large 39 39 , , US, S, S, UB, B, B, , , , table-scan, , , C

  40. For larger relations When relations cannot fit in main memory, one- pass algorithms cannot be used. A generic algorithm for binary operations: Summary: Nested-loop (R S): FOR each in R DO FOR each in S DO Apply the operation on the tuples in bR and the tuples in bS Binary operations on two relations R and S: US, S, S, B, B, , C , block bR Memory: M 2 Cost: B(R)*B(S) + B(R) block bS 40 40 , , US, S, S, UB, B, B, , , , table-scan, , , C

  41. For larger relations When relations cannot fit in main memory, one- pass algorithms cannot be used. A generic algorithm for binary operations: Summary: Nested-loop (R S): FOR each in R DO FOR each in S DO Apply the operation on the tuples in bR and the tuples in bS Binary operations on two relations R and S: US, S, S, B, B, , C , block bR Memory: M 2 Cost: B(R)*B(S) + B(R) block bS Can it be further improved? 41 41 , , US, S, S, UB, B, B, , , , table-scan, , , C

  42. For larger relations When relations cannot fit in main memory, one- pass algorithms cannot be used. A generic algorithm for binary operations: Summary: Nested-loop (R S): FOR in R DO FOR each in S DO Apply the operation on the tuples in Rand the tuples in bS Binary operations on two relations R and S: US, S, S, B, B, , C , max # blocks fitting in M Memory: M 2 Cost: B(R)*B(S)/M + B(R) block bS 42 42 , , US, S, S, UB, B, B, , , , table-scan, , , C

  43. For larger relations When relations cannot fit in main memory, one- pass algorithms cannot be used. A generic algorithm for binary operations: Summary: Nested-loop (R S): FOR in R DO FOR each in S DO Apply the operation on the tuples in Rand the tuples in bS Binary operations on two relations R and S: US, S, S, B, B, , C , max # blocks fitting in M Memory: M 2 Cost: B(R)*B(S)/M + B(R) block bS Very good if B(R) or B(S) is only slightly larger than M 43 43 , , US, S, S, UB, B, B, , , , table-scan, , , C

  44. For larger relations When relations cannot fit in main memory, one- pass algorithms cannot be used. A generic algorithm for binary operations: Summary: Nested-loop (R S): FOR in R DO FOR each in S DO Apply the operation on the tuples in Rand the tuples in bS Binary operations on two relations R and S: US, S, S, B, B, , C , max # blocks fitting in M Memory: M 2 Cost: B(R)*B(S)/M + B(R) block bS Should pick the smaller relation for the outer loop (not working for S ) 44 44 , , US, S, S, UB, B, B, , , , table-scan, , , C

  45. Algorithms Implementing Relational Algebraic Operations Quick Review Operations requiring almost no space: , , UB, table-scan Nested-loop Algorithms For binary operations: US, S, S, Memory: M 2 One-pass Algorithms: , , , US, S, S, B, B, Unary: Memory: M B(R) Cost: B(R) , , , , C C Memory: M = 2 Cost: (R), (R), table-scan(R): B(R) Cost: B(R)*B(S)/M + B(R) Binary: Memory: M B(Rsmall) Cost: B(Rsmall) + B(Rlarge) UB(R, S): B(R) + B(S) 45 , , US, S, S, UB, B, B, , , , table-scan, , , C

  46. Two-pass algorithms Condition: large relations that cannot fit into the main memory M, but not extremely large. 46 46 , , US, S, S, UB, B, B, , , , table-scan, , , C

  47. Two-pass algorithms Condition: large relations that cannot fit into the main memory M, but not extremely large. Basic Ideas: 1. Break relations into smaller pieces that fit in the main memory, make them more organized, and store them back to disk; 2. Apply operation based on merging the blocks from the smaller and more organized pieces. 47 47 , , US, S, S, UB, B, B, , , , table-scan, , , C

  48. Two-pass algorithms Condition: large relations that cannot fit into the main memory M, but not extremely large. Basic Ideas: 1. Break relations into smaller pieces that fit in the main memory, make them more organized, and store them back to disk; 2. Apply operation based on merging the blocks from the smaller and more organized pieces. Two main techniques: 1. Sort-based; 2. Hash-based. 48 48 , , US, S, S, UB, B, B, , , , table-scan, , , C

  49. Two-pass algorithms Condition: large relations that cannot fit into the main memory M, but not extremely large. Basic Ideas: 1. Break relations into smaller pieces that fit in the main memory, make them more organized, and store them back to disk; 2. Apply operation based on merging the blocks from the smaller and more organized pieces. Two main techniques: 1. Sort-based; 2. Hash-based. 49 49 , , US, S, S, UB, B, B, , , , table-scan, , , C

  50. Two-pass sort-based algorithms 50 50 , , US, S, S, UB, B, B, , , , table-scan, , , C

Related


More Related Content