Matrix Computations in Convex Optimization Lecture by Ying Yuan

Explore matrix computations in convex optimization through lecture slides covering key concepts, direct vs. iterative methods, error and residual analysis, and more at the University of California, San Diego.

Presentation Transcript


  1. CSE203B Convex Optimization Lecture 4: Matrix Computations Ying Yuan Dept. of Computer Science and Engineering University of California, San Diego 1

  2. Outline Introduction Key Concepts Error and Residual Condition Number Machine Precision Iterative Methods Categories Conjugate Gradient Preconditioning GMRES 2

  3. Introduction: Direct vs. Iterative Methods Direct Methods Gaussian Elimination LU Decomposition Iterative Methods Stationary Methods Krylov Subspace Methods Multigrid Methods Preconditioned Iterations Domain Decomposition Methods 3

  4. Introduction: Direct vs. Iterative Methods Direct Methods (Gaussian Elimination, LU Decomposition): general and robust, but can become complicated when N >= 1M. Iterative Methods (Stationary Methods, Krylov Subspace Methods, Multigrid Methods, Preconditioned Iterations, Domain Decomposition Methods): an excellent choice for SPD matrices; remain an art for arbitrary matrices. 4

  5. Outline Introduction Key Concepts Error and Residual Condition Number Machine Precision Iterative Methods Categories Conjugate Gradient Preconditioning GMRES 5

  6. Key Concepts: Error and Residual Given Ax = b. Exact Solution vs. Approximate Solution Exact Solution: x* with Ax* = b. Approximation: x̂. Error vs. Residual Error: e = x̂ - x*. Measures how far the approximation is from the true solution. Residual: r = b - Ax̂. Measures how well the approximation satisfies the original system. Relative Error vs. Relative Residual Measures how large that error is compared to the true solution (or right-hand side). Relative Error: ||e|| / ||x*||. Relative Residual: ||r|| / ||b||. 6

  7. Key Concepts: Error and Residual Given Ax = b. Exact Solution vs. Approximate Solution Exact Solution: x* with Ax* = b. Approximation: x̂. Error vs. Residual Error: e = x̂ - x*. Measures how far the approximation is from the true solution. Q: How sensitive is x to small changes in b? Residual: r = b - Ax̂. Measures how well the approximation satisfies the original system. Relative Error vs. Relative Residual Measures how large that error is compared to the true solution (or right-hand side). Relative Error: ||e|| / ||x*||. Relative Residual: ||r|| / ||b||. 7
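
A minimal NumPy sketch of these definitions; the 2x2 system and the approximate solution are made-up values for illustration, not taken from the slides:

```python
import numpy as np

# Illustrative (assumed) system: x_star is the exact solution, x_hat an approximation.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
x_star = np.array([1.0, 1.0])
b = A @ x_star

x_hat = np.array([1.001, 0.999])        # approximate solution
e = x_hat - x_star                      # error
r = b - A @ x_hat                       # residual

rel_error = np.linalg.norm(e) / np.linalg.norm(x_star)
rel_residual = np.linalg.norm(r) / np.linalg.norm(b)
print(rel_error, rel_residual)
```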

  8. Key Concepts: Condition Number If b changes by a small amount δb, how much could x change, i.e., how large is δx? Given Ax = b as a non-singular system of linear equations, we have A(x + δx) = b + δb, hence δx = A⁻¹δb. 8

  9. Key Concepts: Condition Number If b changes by a small amount δb, how much could x change, i.e., how large is δx? To measure the change, we look at relative changes ||δx|| / ||x|| and ||δb|| / ||b||. 9

  11. Key Concepts: Condition Number If b changes by a small amount δb, how much could x change, i.e., how large is δx? To measure the change, we look at relative changes: from δx = A⁻¹δb we get ||δx|| <= ||A⁻¹||·||δb||, and from b = Ax we get ||b|| <= ||A||·||x||, so ||δx|| / ||x|| <= ||A||·||A⁻¹|| · ||δb|| / ||b||. We define the condition number of A as κ(A) = ||A||·||A⁻¹||. Thus, we have ||δx|| / ||x|| <= κ(A) · ||δb|| / ||b||. 11

  12. Key Concepts: Condition Number In a similar way, we can also see how sensitive x is to a perturbation δA of the matrix A. The error of the solution is then proportional to the errors in both A and b, by a factor of the condition number κ(A). 12

  13. Key Concepts: Condition Number Recall Matrix Norms. Matrix 1-norm ||A||_1: maximum absolute column sum of the matrix. Matrix ∞-norm ||A||_∞: maximum absolute row sum of the matrix. Matrix 2-norm (Spectral Norm) ||A||_2: the largest singular value of A. 13
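
These three norms map directly to np.linalg.norm; a short check on an arbitrary example matrix (assumed, not from the slides):

```python
import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  4.0]])

print(np.linalg.norm(A, 1))                  # 1-norm: max absolute column sum -> 6.0
print(np.linalg.norm(A, np.inf))             # inf-norm: max absolute row sum -> 7.0
print(np.linalg.norm(A, 2))                  # 2-norm: largest singular value
print(np.linalg.svd(A, compute_uv=False)[0]) # same value taken from the SVD
```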

  14. Key Concepts: Condition Number - Summary Definition The condition number of matrix A: κ(A) = ||A||·||A⁻¹||. Interpretation Measures sensitivity of the solution x to small changes in A and b. A large κ(A) means the matrix is ill-conditioned, so small perturbations in b (or A) can lead to large changes in x. 14
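
A quick sketch confirming that κ(A) = ||A||·||A⁻¹|| matches NumPy's built-in np.linalg.cond; the nearly singular matrix is an assumed example chosen to make κ(A) large:

```python
import numpy as np

# Illustrative nearly singular matrix -> large condition number.
A = np.array([[1.0, 2.0],
              [2.0, 4.001]])

kappa = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2)
print(kappa)                 # kappa(A) = ||A|| * ||A^-1||
print(np.linalg.cond(A, 2))  # NumPy's built-in gives the same value
```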

  15. Key Concepts: Condition Number Example Use a direct method, LU decomposition, to solve Ax = b, where A is an n x n Hilbert matrix (H_ij = 1/(i + j - 1)), a classic ill-conditioned matrix. 15
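
A hedged Python sketch of this experiment; the slide's actual dimension is not shown, so n = 12 is an assumed value. Build the Hilbert matrix, solve by LU, and compare relative error against relative residual:

```python
import numpy as np
from scipy.linalg import hilbert, lu_factor, lu_solve

n = 12                                   # assumed size; the slide's n is not shown
A = hilbert(n)                           # Hilbert matrix H_ij = 1/(i+j-1)
x_true = np.ones(n)
b = A @ x_true

lu, piv = lu_factor(A)                   # LU decomposition with partial pivoting
x_hat = lu_solve((lu, piv), b)

rel_error = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
rel_residual = np.linalg.norm(b - A @ x_hat) / np.linalg.norm(b)
print(np.linalg.cond(A), rel_error, rel_residual)
```

The typical outcome is a tiny relative residual but a relative error inflated roughly by cond(A), consistent with the error bound above.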

  16. Key Concepts: Machine Precision Floating-Point Representation (IEEE 754 Standard): Single (32 bits) = 1 sign bit + 8 exponent bits + 23 mantissa bits; Double (64 bits) = 1 sign bit + 11 exponent bits + 52 mantissa bits. Machine Precision Machine epsilon is the upper bound on relative error due to rounding in floating-point arithmetic. It is effectively the smallest number ε such that 1 + ε ≠ 1 in the floating-point system. 32-bit (single) precision: ε ≈ 1.2e-7 (2^-23). 64-bit (double) precision: ε ≈ 2.2e-16 (2^-52). 16

  17. Key Concepts: Machine Precision Floating-Point Representation (IEEE 754 Standard): Single (32 bits) = 1 sign bit + 8 exponent bits + 23 mantissa bits; Double (64 bits) = 1 sign bit + 11 exponent bits + 52 mantissa bits. Every arithmetic operation can introduce tiny errors that accumulate. Machine Precision Machine epsilon is the upper bound on relative error due to rounding in floating-point arithmetic. It is effectively the smallest number ε such that 1 + ε ≠ 1 in the floating-point system. 32-bit (single) precision: ε ≈ 1.2e-7 (2^-23). 64-bit (double) precision: ε ≈ 2.2e-16 (2^-52). 17

  19. Key Concepts: Machine Precision Machine epsilon is the upper bound on relative error due to rounding in floating-point arithmetic. It is effectively the smallest number ε such that 1 + ε ≠ 1 in the floating-point system (32-bit single precision: ε ≈ 1.2e-7; 64-bit double precision: ε ≈ 2.2e-16). Why it's important Rounding Errors: every arithmetic operation can introduce tiny errors that accumulate. Ill-Conditioned Problems: if κ(A) or the condition number of the function is large, even tiny floating-point errors can lead to big inaccuracies in the final result. 19

  20. Key Concepts: Machine Precision Machine epsilon is the upper bound on relative error due to rounding in floating-point arithmetic. It is effectively the smallest number ε such that 1 + ε ≠ 1 in the floating-point system (32-bit single precision: ε ≈ 1.2e-7; 64-bit double precision: ε ≈ 2.2e-16). Why it's important Rounding Errors: every arithmetic operation can introduce tiny errors that accumulate. Ill-Conditioned Problems: if κ(A) or the condition number of the function is large, even tiny floating-point errors can lead to big inaccuracies in the final result. Thus, remember that an exactly zero residual is rare. 20
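
A short NumPy check of the single- and double-precision epsilons quoted above:

```python
import numpy as np

eps32 = np.finfo(np.float32).eps        # ~1.19e-07 = 2**-23
eps64 = np.finfo(np.float64).eps        # ~2.22e-16 = 2**-52
print(eps32, eps64)

# Rounding: an increment well below eps is lost entirely when added to 1.0.
print(np.float64(1.0) + eps64 / 4 == 1.0)   # True
print(np.float64(1.0) + eps64     == 1.0)   # False
```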

  21. Key Concepts: Machine Precision Example Use a direct method, LU decomposition, to solve Ax = b, where A is an n x n Hilbert matrix. 21

  22. Key Concepts: Example Recall the example: use a direct method, LU decomposition, to solve Ax = b, where A is an n x n Hilbert matrix. 22

  23. Key Concepts: Example Recall the example: use a direct method, LU decomposition, to solve Ax = b, where A is an n x n Hilbert matrix. What if we continue increasing the dimension? 23

  25. Key Concepts: Example Example: Direct vs. Iterative. Try using Conjugate Gradient to solve Ax = b. Conjugate Gradient uses less memory and CPU time than the direct method. 25
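
A sketch of this comparison using SciPy's built-in CG; the sparse SPD test matrix (a diagonally dominant tridiagonal system) is an assumption for illustration, not the slide's example:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg

# Assumed sparse SPD test system, well conditioned so CG converges quickly.
n = 100_000
main = 4.0 * np.ones(n)
off = -1.0 * np.ones(n - 1)
A = sp.diags([off, main, off], offsets=[-1, 0, 1], format="csr")
b = np.ones(n)

x, info = cg(A, b)                      # info == 0 means CG converged
print(info, np.linalg.norm(b - A @ x) / np.linalg.norm(b))
```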

  26. Outline Introduction Key Concepts Error and Residual Condition Number Machine Precision Iterative Methods Categories Conjugate Gradient Preconditioning GMRES 26

  27. Iterative Methods: Categories Stationary Methods Jacobi, Gauss-Seidel, SOR, etc. Krylov Subspace Methods Conjugate Gradient (CG), Generalized Minimal Residual (GMRES), BiCGSTAB, etc. Multigrid Methods Hierarchical approach using coarser/finer grids. Preconditioned Iterations Domain Decomposition Methods 27

  28. Iterative Methods: Categories Stationary Methods: x_{k+1} = B·x_k + c, where B and c do not depend on the iteration count k. Non-Stationary Methods: x_{k+1} is computed from information that changes at each iteration (for example, iteration-dependent search directions and step sizes). 28
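
As a concrete stationary method, here is a minimal Jacobi iteration sketch in the splitting form x_{k+1} = D⁻¹(b - (A - D)x_k); the matrix and right-hand side are illustrative, not from the slides:

```python
import numpy as np

def jacobi(A, b, x0=None, iters=200):
    """Stationary Jacobi iteration: x_{k+1} = D^-1 (b - (A - D) x_k)."""
    D = np.diag(A)                       # diagonal of A
    R = A - np.diag(D)                   # off-diagonal part
    x = np.zeros_like(b) if x0 is None else x0.copy()
    for _ in range(iters):
        x = (b - R @ x) / D
    return x

# Small diagonally dominant example so Jacobi converges.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 4.0, 1.0],
              [0.0, 1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
print(jacobi(A, b), np.linalg.solve(A, b))
```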

  30. Iterative Methods: Conjugate Gradient Originally developed for solving the quadratic problem of vector x: f(x) = (1/2)·xᵀAx - bᵀx + c. Let f(x) be this quadratic form; then the derivative is f'(x) = (1/2)Aᵀx + (1/2)Ax - b. If A is symmetric, f'(x) = Ax - b. If A is symmetric positive-definite, f(x) is minimized by setting f'(x) = 0, which is the solution to Ax = b. 30
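
A small numeric sanity check (an assumed random example) that the gradient of f(x) = (1/2)xᵀAx - bᵀx + c reduces to Ax - b when A is symmetric:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
A = 0.5 * (A + A.T)                     # symmetrize A
b = rng.standard_normal(4)
c = 1.0
f = lambda x: 0.5 * x @ A @ x - b @ x + c

x = rng.standard_normal(4)
h = 1e-6
# Central-difference gradient, compared against the closed form A x - b.
num_grad = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(4)])
print(np.allclose(num_grad, A @ x - b, atol=1e-5))   # True
```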

  31. Iterative Methods: Conjugate Gradient Originally developed for solving the quadratic problem of vector x: f(x) = (1/2)·xᵀAx - bᵀx + c. (Figure) Graph of the quadratic form f(x). The minimum point of this surface is the solution to Ax = b. The gradient f'(x) points in the direction of steepest increase of f(x). 31

  32. Iterative Methods: Conjugate Gradient Originally developed for solving the quadratic problem of vector x: f(x) = (1/2)·xᵀAx - bᵀx + c. (Figure: shape of the quadratic form for different A) a) Positive-definite matrix b) Negative-definite matrix c) Singular (positive-indefinite) matrix d) Indefinite matrix 32

  33. Iterative Methods: Conjugate Gradient Originally developed for solving the quadratic problem of vector x: f(x) = (1/2)·xᵀAx - bᵀx + c. Assumption: A is a symmetric positive-definite (SPD) matrix. Minimizing f(x) is then equivalent to solving Ax = b. 33

  34. Iterative Methods: Conjugate Gradient One goal of conjugate gradient is to accelerate steepest descent. Steepest descent Main Idea: move in the direction of the steepest negative gradient at each iteration. Iteration: compute the negative gradient as the search direction r_k = b - A·x_k; compute the optimal step size α_k = (r_kᵀr_k) / (r_kᵀA·r_k); update the solution x_{k+1} = x_k + α_k·r_k. Drawback: previously used search directions can repeat, and zigzag behavior can occur, leading to slow convergence. 34
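
A minimal steepest-descent sketch following the three steps above; the 2x2 SPD matrix is an illustrative example, not taken from the slides:

```python
import numpy as np

def steepest_descent(A, b, x0, iters=100):
    """Minimal steepest-descent sketch for SPD A (assumed names/structure)."""
    x = x0.copy()
    for _ in range(iters):
        r = b - A @ x                      # negative gradient = residual
        alpha = (r @ r) / (r @ (A @ r))    # optimal step along r
        x = x + alpha * r
    return x

A = np.array([[3.0, 2.0], [2.0, 6.0]])     # small SPD example
b = np.array([2.0, -8.0])
print(steepest_descent(A, b, np.zeros(2)), np.linalg.solve(A, b))
```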

  37. Iterative Methods: Conjugate Gradient Conjugate Gradient Main Idea: make the search directions A-orthogonal (A-conjugate): d_iᵀA·d_j = 0 for i ≠ j. Each new direction is chosen to be A-orthogonal to all previous directions. This ensures each direction is new and not re-doing work from the previous iterations. Iteration: start with x_0, residual r_0 = b - A·x_0, and direction d_0 = r_0. At each step k: 1. α_k = (r_kᵀr_k) / (d_kᵀA·d_k) 2. x_{k+1} = x_k + α_k·d_k 3. r_{k+1} = r_k - α_k·A·d_k 4. β_{k+1} = (r_{k+1}ᵀr_{k+1}) / (r_kᵀr_k) 5. d_{k+1} = r_{k+1} + β_{k+1}·d_k. Drawback: only for SPD matrices; may still need advanced preconditioners for ill-conditioned problems. (Figure: steepest descent vs. conjugate gradient iterates.) 37
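
A minimal textbook-style implementation of steps 1-5; the variable names and test system are illustrative assumptions (in practice scipy.sparse.linalg.cg is the usual choice):

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=None):
    """Minimal CG sketch for SPD A."""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.copy()
    r = b - A @ x                       # initial residual
    d = r.copy()                        # initial search direction
    rs_old = r @ r
    for _ in range(max_iter or n):
        Ad = A @ d
        alpha = rs_old / (d @ Ad)       # 1. step size
        x = x + alpha * d               # 2. update solution
        r = r - alpha * Ad              # 3. update residual
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        beta = rs_new / rs_old          # 4. coefficient for the new direction
        d = r + beta * d                # 5. new A-conjugate direction
        rs_old = rs_new
    return x

A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])
print(conjugate_gradient(A, b))         # ~ [2, -2]
```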

  38. Iterative Methods: Preconditioners Recall the condition number: a high κ(A) means slow (or unstable) convergence for iterative methods. Preconditioning Main Idea: solve M⁻¹Ax = M⁻¹b instead of Ax = b, where M is a matrix that approximates A but is easier to invert. Goal Reduce κ(M⁻¹A). Speed up convergence. 38
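
A hedged sketch of diagonal (Jacobi) preconditioning with SciPy's CG; the ill-conditioned test matrix and the choice M ≈ diag(A)⁻¹ are assumptions for illustration:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg, LinearOperator

# Assumed ill-conditioned SPD system: tridiagonal, diagonal spanning 1 .. 1e6.
n = 2000
d = np.logspace(0, 6, n)                       # kappa(A) is roughly 1e6
off = -0.1 * np.ones(n - 1)
A = sp.diags([off, d, off], offsets=[-1, 0, 1], format="csr")
b = np.ones(n)

# Jacobi (diagonal) preconditioner: apply M ~ diag(A)^-1 as a LinearOperator.
M = LinearOperator((n, n), matvec=lambda v: v / d)

x_plain, info_plain = cg(A, b, maxiter=500)    # without preconditioning
x_prec, info_prec = cg(A, b, M=M, maxiter=500) # with preconditioning
print(info_plain, info_prec)                   # info == 0 means converged
```

With the diagonal scaling, the preconditioned system is close to the identity, so CG typically converges in a handful of iterations while the unpreconditioned run hits the iteration cap.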

  39. Iterative Methods: GMRES GMRES Main Idea: Works for general matrices (not necessarily SPD). Builds an orthonormal basis of the Krylov subspace via Arnoldi iteration and minimizes the residual over each subspace. Equivalently, if we view the problem as a least-squares objective (minimizing ||Ax - b||²), the Hessian ends up being AᵀA, whose condition number is κ(A)². Iteration: for each iteration k, find x_k in the Krylov subspace such that ||b - A·x_k|| is minimized. Drawback: need to store multiple previous directions, so memory usage grows with the iteration count; usually restarted every k iterations (GMRES(k)) to use less space. 39
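
A short usage sketch of restarted GMRES via SciPy; the non-symmetric tridiagonal system is an assumed example, not from the slides:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import gmres

# Assumed non-symmetric system: GMRES does not require SPD.
n = 5000
main = 3.0 * np.ones(n)
upper = -1.0 * np.ones(n - 1)
lower = 0.5 * np.ones(n - 1)                   # asymmetry: A != A^T
A = sp.diags([lower, main, upper], offsets=[-1, 0, 1], format="csr")
b = np.ones(n)

# restart=30 bounds the stored Krylov basis (GMRES(30)) to limit memory growth.
x, info = gmres(A, b, restart=30, maxiter=1000)
print(info, np.linalg.norm(b - A @ x) / np.linalg.norm(b))
```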
