Understanding Singular Value Decomposition (SVD) and Principal Component Analysis (PCA)

Learn about Singular Value Decomposition (SVD) and Principal Component Analysis (PCA) techniques for matrix analysis, including eigenvector-eigenvalue decomposition, the application of the SVD to non-square matrices, and the calculation of principal components. Explore the steps of the SVD, the orthogonality of eigenvector sets, and the sorting of eigenvalues. Gain insights into the SVD of matrices and singular values. Worked examples are included for practical understanding.

  • SVD
  • PCA
  • Matrix Analysis
  • Eigenvectors
  • Eigenvalues


Presentation Transcript


  1. 673 8. Component Analysis
  Section 8.1 Singular Value Decomposition (SVD)
  Section 8.2 Principal Component Analysis (PCA)

  2. 674 8.1 Singular Value Decomposition
  If A is a square matrix, then we can perform the eigenvector-eigenvalue decomposition of A:
  $A = EDE^{-1} = \lambda_1 e_1 f_1^H + \lambda_2 e_2 f_2^H + \cdots + \lambda_{N-1} e_{N-1} f_{N-1}^H + \lambda_N e_N f_N^H$
  where $E = [e_1\; e_2\; \cdots\; e_N]$, the rows of $E^{-1}$ are $f_1^H, f_2^H, \ldots, f_N^H$, and $A e_m = \lambda_m e_m$.
  If $|\lambda_m|$ is the largest, then $\lambda_m e_m f_m^H$ is the most important component of A.
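
The expansion above can be checked numerically. Below is a minimal numpy sketch (assuming a generic diagonalizable matrix; the random test matrix is illustrative, not from the slides) that rebuilds a square matrix from its eigen-components $\lambda_n e_n f_n^H$:

```python
import numpy as np

# Rebuild a square matrix from its eigen-components: A = sum_n lambda_n e_n f_n^H,
# where e_n are the columns of E and f_n^H are the rows of E^{-1}.
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))              # a generic (diagonalizable) matrix
lam, E = np.linalg.eig(A)
F_H = np.linalg.inv(E)                       # its rows are f_1^H, ..., f_N^H
A_rebuilt = sum(lam[n] * np.outer(E[:, n], F_H[n, :]) for n in range(4))
print(np.allclose(A_rebuilt, A))             # True (up to round-off)
```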

  3. 675 8.1.1 Singular Value Decomposition Process
  Q: How do we perform an eigenvector-eigenvalue decomposition for A if A is not a square matrix (size M x N, M != N)?
  We can apply the singular value decomposition (SVD) process as follows.
  (1) Generate B and C:
  $B = A^H A, \qquad C = A A^H$
  (Note): Since B is an N x N square matrix and C is an M x M square matrix, it is possible to derive the eigenvector sets of B and C.

  4. 676 (2) Perform the eigenvector-eigenvalue decomposition of B and C:
  $B = VDV^{-1}, \qquad C = U\Lambda U^{-1}$
  (Note): Since $B^H = B$ and $C^H = C$, B and C have orthogonal eigenvector sets, and U and V are orthogonal (unitary) matrices.
  (i) It is proper to normalize U and V such that $U^H U = I$ and $V^H V = I$; then
  $B = VDV^{H}, \qquad C = U\Lambda U^{H}$
  (ii) It is proper to sort the eigenvalues of B and C from large to small. The eigenvectors are also sorted according to their eigenvalues.

  5. 677 (3) Then, we calculate
  $S_1 = U^H A V$
  $S_1$ will be an M x N diagonal matrix: $S_1[m, n] = 0$ if $m \neq n$.
  (4) Varying the signs of $S_1$ and U:
  if $S_1[n, n] \geq 0$: $S[m, n] = S_1[m, n]$ and $U[m, n]$ is unchanged;
  if $S_1[n, n] < 0$: $S[m, n] = -S_1[m, n]$ and $U[m, n] = -U[m, n]$ (i.e., flip the sign of the n-th column of U).
  (Note): With the sign change, $C = U\Lambda U^{H}$ and $S = U^H A V$ are still satisfied.

  6. 678 (5) Then, the SVD of A is
  $A = USV^{H}$
  where U is the eigenvector matrix of $AA^H$ (size M x M), S is a diagonal matrix (size M x N), and V is the eigenvector matrix of $A^H A$ (size N x N).
  If $U = [u_1\; u_2\; \cdots\; u_M]$ and $V = [v_1\; v_2\; \cdots\; v_N]$, then
  $A = s_1 u_1 v_1^H + s_2 u_2 v_2^H + \cdots + s_{K-1} u_{K-1} v_{K-1}^H + s_K u_K v_K^H$
  where $K = \min(M, N)$ and $s_n = S[n, n]$.
  ($s_1 u_1 v_1^H$ is the most significant component, $s_2 u_2 v_2^H$ is the second most significant, and so on.)
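
As a sanity check of steps (1)-(5), here is a minimal numpy sketch that builds the SVD from the eigendecompositions of $A^H A$ and $AA^H$ exactly as described above; the function name svd_via_eig and the random test matrix are illustrative choices, not from the slides:

```python
import numpy as np

def svd_via_eig(A):
    """SVD of a real M x N matrix A following steps (1)-(5) above (generic case)."""
    M, N = A.shape
    B = A.conj().T @ A                       # (1) B = A^H A  (N x N)
    C = A @ A.conj().T                       #     C = A A^H  (M x M)
    d, V = np.linalg.eigh(B)                 # (2) orthonormal eigenvectors,
    V = V[:, ::-1]                           #     re-ordered from large to small
    lam, U = np.linalg.eigh(C)
    U = U[:, ::-1]
    S1 = U.conj().T @ A @ V                  # (3) diagonal, possibly with negative entries
    for n in range(min(M, N)):               # (4) flip columns of U where S1[n, n] < 0
        if S1[n, n] < 0:
            U[:, n] = -U[:, n]
    S = U.conj().T @ A @ V                   # (5) now A = U S V^H, diagonal of S >= 0
    return U, S, V

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
U, S, V = svd_via_eig(A)
print(np.allclose(U @ S @ V.conj().T, A))    # True
print(np.round(np.diag(S), 4))               # the singular values, sorted large to small
```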

  7. 679
  $S = \begin{bmatrix} s_1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ 0 & s_2 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & & \ddots & & & & \vdots \\ 0 & 0 & \cdots & s_M & 0 & \cdots & 0 \end{bmatrix}$ if M < N,
  $S = \begin{bmatrix} s_1 & 0 & \cdots & 0 \\ 0 & s_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & s_N \\ 0 & 0 & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix}$ if M > N.
  $s_k$ is called a singular value of A.

  8. 680 [Example 1] Perform the SVD of the following matrix:
  $A = \begin{bmatrix} 2 & 2 \\ -1 & 1 \\ -1 & 1 \end{bmatrix}$
  (Solution): First, we determine
  $B = A^H A = \begin{bmatrix} 6 & 2 \\ 2 & 6 \end{bmatrix}, \qquad C = A A^H = \begin{bmatrix} 8 & 0 & 0 \\ 0 & 2 & 2 \\ 0 & 2 & 2 \end{bmatrix}$
  Then, we perform the eigenvector-eigenvalue decomposition of B:
  $B = VDV^H$ where $D = \begin{bmatrix} 8 & 0 \\ 0 & 4 \end{bmatrix}$, $V = \begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & -1/\sqrt{2} \end{bmatrix}$

  9. 681
  $C = U\Lambda U^H$ where $\Lambda = \begin{bmatrix} 8 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 0 \end{bmatrix}$, $U = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1/\sqrt{2} & 1/\sqrt{2} \\ 0 & 1/\sqrt{2} & -1/\sqrt{2} \end{bmatrix}$
  Note: The eigenvectors should be (i) normalized and (ii) sorted according to the magnitudes of the eigenvalues.
  Then,
  $S_1 = U^H A V = \begin{bmatrix} \sqrt{8} & 0 \\ 0 & -2 \\ 0 & 0 \end{bmatrix}$

  10. 682 Since $S_1[2, 2] = -2 < 0$, we change the sign of the 2nd column of U and obtain
  $U = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1/\sqrt{2} & 1/\sqrt{2} \\ 0 & -1/\sqrt{2} & -1/\sqrt{2} \end{bmatrix}, \qquad S = \begin{bmatrix} \sqrt{8} & 0 \\ 0 & 2 \\ 0 & 0 \end{bmatrix}$
  Therefore, $A = USV^{H}$ with this U and S and with $V = \begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & -1/\sqrt{2} \end{bmatrix}$.

  11. 683 Note that
  $A = s_1 u_1 v_1^H + s_2 u_2 v_2^H = \sqrt{8}\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}\begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix} + 2\begin{bmatrix} 0 \\ -1/\sqrt{2} \\ -1/\sqrt{2} \end{bmatrix}\begin{bmatrix} 1/\sqrt{2} & -1/\sqrt{2} \end{bmatrix}$
  The first term is the principal component and the second term is the minor component.

  12. 684 (Note): (1) In fact, the eigenvalues of B and C have a close relation to the singular values of A:
  $D = S^H S$ and $\Lambda = S S^H$, i.e., $D[n, n] = (S[n, n])^2 = s_n^2$
  since, from $A = USV^{H}$,
  $B = A^H A = V S^H U^H U S V^H = V S^H S V^H$

  13. 685 (Note): (2) Even when M = N (i.e., A is a square matrix), the SVD may not be the same as the eigenvector-eigenvalue decomposition. For the SVD, U and V are both orthonormal matrices and the singular values are non-negative. However, for a square matrix, the eigenvectors may not be orthogonal and the eigenvalues can be negative (or even complex).
  (3) Moreover, since U and V are usually different and $V^H \neq U^{-1}$, one cannot use the SVD to compute the power of a matrix.

  14. 686 [Example 2] Determine the eigenvector-eigenvalue decomposition and the SVD of A:
  $A = \begin{bmatrix} 2 & 1 \\ 0 & -1 \end{bmatrix}$
  (Solution): The eigenvalues of A are 2 and -1. The eigenvector corresponding to 2 is $[1\ \ 0]^T$ and the eigenvector corresponding to -1 is $[1\ \ {-3}]^T$. Therefore,
  $A = \begin{bmatrix} 1 & 1 \\ 0 & -3 \end{bmatrix}\begin{bmatrix} 2 & 0 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} 1 & 1/3 \\ 0 & -1/3 \end{bmatrix}$

  15. 687 To perform the SVD of A, we compute
  $B = A^H A = \begin{bmatrix} 4 & 2 \\ 2 & 2 \end{bmatrix}, \qquad C = A A^H = \begin{bmatrix} 5 & -1 \\ -1 & 1 \end{bmatrix}$
  $B = V D V^H = \begin{bmatrix} 0.8507 & -0.5257 \\ 0.5257 & 0.8507 \end{bmatrix}\begin{bmatrix} 5.2361 & 0 \\ 0 & 0.7639 \end{bmatrix}\begin{bmatrix} 0.8507 & 0.5257 \\ -0.5257 & 0.8507 \end{bmatrix}$
  $C = U \Lambda U^H = \begin{bmatrix} 0.9732 & 0.2298 \\ -0.2298 & 0.9732 \end{bmatrix}\begin{bmatrix} 5.2361 & 0 \\ 0 & 0.7639 \end{bmatrix}\begin{bmatrix} 0.9732 & -0.2298 \\ 0.2298 & 0.9732 \end{bmatrix}$
  After the sign correction of step (4), the SVD is
  $A = \begin{bmatrix} 0.9732 & -0.2298 \\ -0.2298 & -0.9732 \end{bmatrix}\begin{bmatrix} 2.2882 & 0 \\ 0 & 0.8740 \end{bmatrix}\begin{bmatrix} 0.8507 & 0.5257 \\ -0.5257 & 0.8507 \end{bmatrix}$
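
A quick numerical check of Example 2 (using the matrix as reconstructed above; some signs were garbled in the source) contrasts the eigendecomposition with the SVD:

```python
import numpy as np

A = np.array([[2., 1.], [0., -1.]])          # Example 2 matrix as reconstructed above
w, E = np.linalg.eig(A)                      # eigenvalues can be negative,
print(w)                                     # [ 2. -1.]; eigenvectors need not be orthogonal
U, s, Vh = np.linalg.svd(A)                  # singular values are non-negative,
print(np.round(s, 4))                        # [2.2882 0.874 ]; U and V are orthogonal
print(np.allclose(U @ np.diag(s) @ Vh, A))   # True
```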

  16. 688 8.1.2 Generalized Inverse Using the SVD
  Suppose that the SVD of A is $A = USV^{H}$. Then the generalized inverse of A is
  $A^{+} = V S^{+} U^H$
  where
  $S^{+}[n, n] = 1/S[n, n]$ if $S[n, n] \neq 0$, $\qquad S^{+}[n, n] = 0$ if $S[n, n] = 0$,
  and $\mathrm{size}(S^{+}) = N \times M$ if $\mathrm{size}(S) = M \times N$.

  17. 689 (Proof):
  $A A^{+} A = USV^H \, VS^{+}U^H \, USV^H = U S S^{+} S V^H$
  If $S_2 = S S^{+}$, then
  $S_2[n, n] = 1$ if $S[n, n] \neq 0$, $\qquad S_2[n, n] = 0$ if $S[n, n] = 0$
  Therefore, $S S^{+} S = S$ and
  $A A^{+} A = USV^H = A$
  so property (1) $A A^{+} A = A$ is satisfied.

  18. 690 Note: The generalized inverse derived from the SVD is in fact the pseudo-inverse, since
  (2) $A^{+} A A^{+} = A^{+}$
  (3) $(A A^{+})^H = A A^{+}$
  (4) $(A^{+} A)^H = A^{+} A$
  are all satisfied. (Try to prove them.)
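
A minimal sketch of the generalized inverse $A^{+} = V S^{+} U^H$; the helper name pinv_via_svd, the tolerance tol, and the rank-deficient test matrix are illustrative choices, not from the slides:

```python
import numpy as np

def pinv_via_svd(A, tol=1e-12):
    """Generalized (Moore-Penrose) inverse A+ = V S+ U^H, following Section 8.1.2."""
    U, s, Vh = np.linalg.svd(A)                      # A = U diag(s) V^H
    s_plus = np.array([1.0 / x if x > tol else 0.0 for x in s])
    S_plus = np.zeros((A.shape[1], A.shape[0]))      # N x M
    S_plus[:len(s), :len(s)] = np.diag(s_plus)
    return Vh.conj().T @ S_plus @ U.conj().T

A = np.array([[1., 0., 1.],                          # rank-deficient: column 3 = column 1,
              [0., 1., 0.],                          # so (A^T A)^(-1) does not exist
              [1., 1., 1.],
              [2., 0., 2.]])
Ap = pinv_via_svd(A)
print(np.allclose(A @ Ap @ A, A), np.allclose(Ap @ A @ Ap, Ap))            # (1), (2)
print(np.allclose((A @ Ap).T, A @ Ap), np.allclose((Ap @ A).T, Ap @ A))    # (3), (4)
print(np.allclose(Ap, np.linalg.pinv(A)))                                  # matches numpy
```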

  19. 691 [Example 3] Determine the generalized inverse of the following matrix:
  $A = \begin{bmatrix} 2 & 2 & 2 \\ 2 & 2 & 2 \\ 1 & -2 & 1 \\ 1 & -2 & 1 \end{bmatrix}$
  Note: Since the 1st and the 3rd columns are dependent, we cannot use $(A^T A)^{-1} A^T$ to determine the generalized inverse. Instead, we should apply the SVD method:
  $A = USV^{H}, \qquad A^{+} = V S^{+} U^H$

  20. 692 (Solution): Since
  $A = \begin{bmatrix} 2 & 2 & 2 \\ 2 & 2 & 2 \\ 1 & -2 & 1 \\ 1 & -2 & 1 \end{bmatrix}$, we have
  $B = A^H A = \begin{bmatrix} 10 & 4 & 10 \\ 4 & 16 & 4 \\ 10 & 4 & 10 \end{bmatrix}, \qquad C = A A^H = \begin{bmatrix} 12 & 12 & 0 & 0 \\ 12 & 12 & 0 & 0 \\ 0 & 0 & 6 & 6 \\ 0 & 0 & 6 & 6 \end{bmatrix}$
  $B = VDV^H$ where
  $V = \begin{bmatrix} 1/\sqrt{3} & 1/\sqrt{6} & 1/\sqrt{2} \\ 1/\sqrt{3} & -2/\sqrt{6} & 0 \\ 1/\sqrt{3} & 1/\sqrt{6} & -1/\sqrt{2} \end{bmatrix}, \qquad D = \begin{bmatrix} 24 & 0 & 0 \\ 0 & 12 & 0 \\ 0 & 0 & 0 \end{bmatrix}$

  21. 693 $C = U \Lambda U^H$ where
  $U = \begin{bmatrix} 1/\sqrt{2} & 0 & 0 & 1/\sqrt{2} \\ 1/\sqrt{2} & 0 & 0 & -1/\sqrt{2} \\ 0 & 1/\sqrt{2} & 1/\sqrt{2} & 0 \\ 0 & 1/\sqrt{2} & -1/\sqrt{2} & 0 \end{bmatrix}, \qquad \Lambda = \begin{bmatrix} 24 & 0 & 0 & 0 \\ 0 & 12 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$
  Then
  $S_1 = U^H A V = \begin{bmatrix} \sqrt{24} & 0 & 0 \\ 0 & \sqrt{12} & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$
  Since all entries of $S_1$ are non-negative, $S = S_1$ and U is unchanged.

  22. 694 Therefore, $A = USV^{H}$ with
  $U = \begin{bmatrix} 1/\sqrt{2} & 0 & 0 & 1/\sqrt{2} \\ 1/\sqrt{2} & 0 & 0 & -1/\sqrt{2} \\ 0 & 1/\sqrt{2} & 1/\sqrt{2} & 0 \\ 0 & 1/\sqrt{2} & -1/\sqrt{2} & 0 \end{bmatrix}, \quad S = \begin{bmatrix} \sqrt{24} & 0 & 0 \\ 0 & \sqrt{12} & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \quad V = \begin{bmatrix} 1/\sqrt{3} & 1/\sqrt{6} & 1/\sqrt{2} \\ 1/\sqrt{3} & -2/\sqrt{6} & 0 \\ 1/\sqrt{3} & 1/\sqrt{6} & -1/\sqrt{2} \end{bmatrix}$
  $S^{+} = \begin{bmatrix} 1/\sqrt{24} & 0 & 0 & 0 \\ 0 & 1/\sqrt{12} & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$
  $A^{+} = V S^{+} U^H = \begin{bmatrix} 1/12 & 1/12 & 1/12 & 1/12 \\ 1/12 & 1/12 & -1/6 & -1/6 \\ 1/12 & 1/12 & 1/12 & 1/12 \end{bmatrix}$
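
A short check of Example 3 (with the matrix entries and signs as reconstructed above):

```python
import numpy as np

A = np.array([[2., 2., 2.],
              [2., 2., 2.],
              [1., -2., 1.],
              [1., -2., 1.]])
print(np.round(np.linalg.pinv(A) * 12, 4))   # rows: [1 1 1 1], [1 1 -2 -2], [1 1 1 1]
```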

  23. 695 8.2 Principal Component Analysis
  Principal component analysis (PCA) finds the principal components of a set of data.
  Principal components: the components corresponding to the larger singular values in the SVD.

  24. 696 [Process of PCA] Suppose that there is a set of data. The number of data vectors is M and each data vector has length N:
  $x_m = [x_{m,1}, x_{m,2}, x_{m,3}, \ldots, x_{m,N}], \qquad m = 1, 2, \ldots, M$
  (Usually, M >> N.)
  (1) First, we subtract each entry by the mean $\bar{x}_n = \frac{1}{M}\sum_{m=1}^{M} x_{m,n}$:
  $a_m = [a_{m,1}, a_{m,2}, a_{m,3}, \ldots, a_{m,N}]$ where $a_{m,n} = x_{m,n} - \bar{x}_n$

  25. 697 (2) Then, construct an M x N matrix A:
  $A = \begin{bmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,N} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,N} \\ \vdots & & & \vdots \\ a_{M,1} & a_{M,2} & \cdots & a_{M,N} \end{bmatrix}$
  (3) Then, perform the SVD of A: $A = USV^{H}$.
  (4) Then
  $A = s_1 u_1 v_1^H + s_2 u_2 v_2^H + \cdots + s_{N-1} u_{N-1} v_{N-1}^H + s_N u_N v_N^H$
  where $s_n = S[n, n]$, $U = [u_1\; u_2\; \cdots\; u_M]$, $V = [v_1\; v_2\; \cdots\; v_N]$.
  ($s_1 u_1 v_1^H$ is the most important component, $s_2 u_2 v_2^H$ is the second most important, ..., and $s_N u_N v_N^H$ is the least important.)

  26. 698 If we want to reduce the number of components from N to L due to the consideration of compression or feature selection, then
  $A \approx s_1 u_1 v_1^H + s_2 u_2 v_2^H + \cdots + s_L u_L v_L^H$
  Note:
  $x_m \approx \bar{x} + c_{m,1} v_1^H + c_{m,2} v_2^H + \cdots + c_{m,L} v_L^H$
  where $c_{m,n} = s_n u_n[m]$ ($u_n[m]$ is the m-th entry of $u_n$) and $\bar{x} = [\bar{x}_1, \ldots, \bar{x}_N]$.
  $v_1^H, v_2^H, \ldots, v_L^H$ can be viewed as the most important L axes. In general,
  $x \approx \bar{x} + c_1 v_1^H + c_2 v_2^H + \cdots + c_L v_L^H \qquad (c_n \in \mathbb{R})$
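
A minimal sketch of the four-step PCA process above; the function name pca and the random test data are illustrative, not from the slides:

```python
import numpy as np

def pca(X, L):
    """PCA via the SVD. X: M x N data matrix (one data vector per row); keep L axes."""
    mean = X.mean(axis=0)                    # per-column means x_bar_n
    A = X - mean                             # steps (1)-(2): mean-removed data matrix
    U, s, Vh = np.linalg.svd(A, full_matrices=False)   # step (3)
    axes = Vh[:L, :]                         # v_1^H, ..., v_L^H: the L principal axes
    coeffs = U[:, :L] * s[:L]                # c_{m,n} = s_n u_n[m], step (4)
    return mean, axes, coeffs

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))            # M = 100 data vectors of length N = 5
mean, axes, coeffs = pca(X, L=2)
X_approx = mean + coeffs @ axes              # x_m ~ x_bar + sum_n c_{m,n} v_n^H
print(X_approx.shape)                        # (100, 5)
```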

  27. 699 Main Applications of the PCA
  (1) Dimensionality reduction (i.e., feature selection) for pattern recognition and machine learning
  (2) Data compression
  (3) Data mining
  (4) Identifying the principal axis of an object in an image
  (5) Line approximation

  28. 700 Example of PCA From 2022

  29. 701 [Example 1] Suppose that there are 5 points in a 2-D space and their coordinates are (7, 8), (9, 8), (10, 10), (11, 12), (13, 12). Try to find a line that approximates these points. (Note): M = 5, N = 2.
  (Solution): First, since the mean of these 5 points is (10, 10), we subtract (10, 10) from these points and obtain (-3, -2), (-1, -2), (0, 0), (1, 2), (3, 2).

  30. 702 (-3, -2), (-1, -2), (0, 0), (1, 2), (3, 2)
  Then, we construct a 5 x 2 matrix A:
  $A = \begin{bmatrix} -3 & -2 \\ -1 & -2 \\ 0 & 0 \\ 1 & 2 \\ 3 & 2 \end{bmatrix}$
  Then, we perform the SVD of A: $A = USV^{H}$ with
  $S = \begin{bmatrix} 5.8416 & 0 \\ 0 & 1.3695 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}, \quad U = \begin{bmatrix} -0.6116 & 0.3549 & 0 & 0.0393 & 0.7060 \\ -0.3549 & -0.6116 & 0 & 0.7060 & -0.0393 \\ 0 & 0 & 1 & 0 & 0 \\ 0.3549 & 0.6116 & 0 & 0.7060 & -0.0393 \\ 0.6116 & -0.3549 & 0 & 0.0393 & 0.7060 \end{bmatrix}, \quad V = \begin{bmatrix} 0.7497 & -0.6618 \\ 0.6618 & 0.7497 \end{bmatrix}$

  31. 703 Then, A can be expanded as
  $A = 5.8416\begin{bmatrix} -0.6116 \\ -0.3549 \\ 0 \\ 0.3549 \\ 0.6116 \end{bmatrix}\begin{bmatrix} 0.7497 & 0.6618 \end{bmatrix} + 1.3695\begin{bmatrix} 0.3549 \\ -0.6116 \\ 0 \\ 0.6116 \\ -0.3549 \end{bmatrix}\begin{bmatrix} -0.6618 & 0.7497 \end{bmatrix}$
  The first term is the principal component and the second term is the secondary component. Therefore,
  $A \approx 5.8416\begin{bmatrix} -0.6116 \\ -0.3549 \\ 0 \\ 0.3549 \\ 0.6116 \end{bmatrix}\begin{bmatrix} 0.7497 & 0.6618 \end{bmatrix} = \begin{bmatrix} -3.5726 \\ -2.0733 \\ 0 \\ 2.0733 \\ 3.5726 \end{bmatrix}\begin{bmatrix} 0.7497 & 0.6618 \end{bmatrix}$

  32. 704
  $\begin{bmatrix} 7 & 8 \\ 9 & 8 \\ 10 & 10 \\ 11 & 12 \\ 13 & 12 \end{bmatrix} \approx \begin{bmatrix} 10 & 10 \\ 10 & 10 \\ 10 & 10 \\ 10 & 10 \\ 10 & 10 \end{bmatrix} + \begin{bmatrix} -3.5726 \\ -2.0733 \\ 0 \\ 2.0733 \\ 3.5726 \end{bmatrix}\begin{bmatrix} 0.7497 & 0.6618 \end{bmatrix}$
  Approximation line: $(10, 10) + c\,(0.7497, 0.6618)$, $\quad c \in (-\infty, \infty)$
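
The line found in Example 1 can be reproduced with a few lines of numpy (the singular values and axis should match the values above, up to sign):

```python
import numpy as np

pts = np.array([[7., 8.], [9., 8.], [10., 10.], [11., 12.], [13., 12.]])
mean = pts.mean(axis=0)                       # (10, 10)
U, s, Vh = np.linalg.svd(pts - mean, full_matrices=False)
print(np.round(s, 4))                         # [5.8416 1.3695]
print(np.round(Vh[0], 4))                     # principal axis, +/- [0.7497 0.6618]
```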

  33. 705 [Simplification for Computation] Suppose that we only want to find the most important L axes of the data (this is usually the case in practical applications). If M is very large, then the M x M matrix U does not need to be computed. One only has to perform the eigenvector-eigenvalue decomposition of B and obtain the N x N matrix V:
  $B = A^H A, \qquad B = VDV^{-1}$
  If D[n, n] is larger than the other diagonal entries of D, then the n-th column of V is the principal axis.
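
A sketch of this simplification (illustrative data; only the N x N matrix B is formed):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((10000, 3))          # M = 10000 >> N = 3
A = X - X.mean(axis=0)
B = A.T @ A                                  # N x N; the M x M matrix U is never formed
d, V = np.linalg.eigh(B)                     # eigenvalues in ascending order
principal_axis = V[:, -1]                    # column of V with the largest eigenvalue
print(np.round(principal_axis, 4))
```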

  34. 706 Some Common Mathematical Notations
  (1) Commutator: $[A, B] = AB - BA$
  (2) Trace: $\mathrm{tr}(A) = \sum_{n=1}^{N} A[n, n]$
  (3) Bra and ket notations (A and B are column vectors):
  $\langle A| = A^H = [a_1^*\; a_2^*\; \cdots\; a_N^*], \qquad |B\rangle = B = [b_1\; b_2\; \cdots\; b_N]^T$
  $\langle A|B\rangle = A^H B = \sum_{n} a_n^* b_n$

  35. 707 (4) sup: supremum (the least upper bound)
  $\sup\{x \mid 1 \leq x < 2\} = 2, \qquad \sup\{1 - 1/n \mid n \in \mathbb{N}\} = 1$
  (5) inf: infimum (the greatest lower bound)
  $\inf\{x \mid 1 < x \leq 2\} = 1, \qquad \inf\{e^x \mid x \in \mathbb{R}\} = 0$
  (6) card: the number of elements in a set
  $\mathrm{card}(\{x, y\}) = 2, \qquad \mathrm{card}(\{x, y, xy, x^2, y^2, 1\}) = 6$

  36. 708 9. Advanced Probability
  Section 9.1 Moment and Correlation
  Section 9.2 Probability Model
  Section 9.3 Entropy
  Section 9.4 Kullback-Leibler Divergence (KL Divergence)
  Section 9.5 Basic Concepts of Random Process
  Section 9.6 Independent Component Analysis

  37. 709 9.1 Moment and Correlation
  9.1.1 Review of Probability
  1. Discrete Case
  (a) Probability (Probability Mass Function, PMF)
  $P_X(n) = P(X = n) = \dfrac{\text{number of cases where } X = n}{\text{total number of cases}}$
  Note: $\sum_n P_X(n) = 1$
  (b) Joint Probability
  $P_{X,Y}(n, m) = P((X = n) \text{ and } (Y = m)) = \dfrac{\text{number of cases where } X = n \text{ and } Y = m}{\text{total number of cases}}$

  38. 710 (c) Conditional Probability
  $P_{X|Y}(n \mid m) = P((X = n) \mid (Y = m)) = \dfrac{\text{number of cases where } X = n \text{ and } Y = m}{\text{number of cases where } Y = m}$
  $P_{X|Y}(n \mid m) = \dfrac{P_{X,Y}(n, m)}{P_Y(m)}$
  An Example of the Discrete Probability Distribution: the Binomial Distribution
  $P_X(n) = \binom{N}{n} p^n (1-p)^{N-n}$ for $n = 0, 1, 2, \ldots, N$; $\quad P_X(n) = 0$ otherwise.
  Other examples are shown in Section 9.2.1.
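
A small check of the binomial PMF (the values N = 10 and p = 0.3 are arbitrary illustrative choices):

```python
import math

N, p = 10, 0.3
pmf = [math.comb(N, n) * p**n * (1 - p)**(N - n) for n in range(N + 1)]
print(round(sum(pmf), 12))                   # 1.0 -- the probabilities sum to one
print(round(pmf[3], 4))                      # P(X = 3) = 0.2668
```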

  39. 711 2. Continuous Case
  (a) Cumulative Distribution Function (CDF)
  $F_X(x) = \mathrm{Prob}(X \leq x)$
  (b) Probability Density Function (PDF)
  $f_X(x) = \dfrac{d}{dx} F_X(x)$
  Note: $F_X(x) = \int_{-\infty}^{x} f_X(u)\,du = \mathrm{Prob}(X \leq x)$, $\qquad \int_{-\infty}^{\infty} f_X(x)\,dx = \lim_{x \to \infty} F_X(x) = 1$
  $f_X(x) = \lim_{\Delta \to 0} \dfrac{F_X(x + \Delta/2) - F_X(x - \Delta/2)}{\Delta} = \lim_{\Delta \to 0} \dfrac{\mathrm{Prob}(x - \Delta/2 < X \leq x + \Delta/2)}{\Delta}$

  40. 712 (c) Joint Cumulative Distribution Function
  $F_{X,Y}(x, y) = \mathrm{Prob}((X \leq x) \text{ and } (Y \leq y))$
  (d) Joint Probability Density Function
  $f_{X,Y}(x, y) = \dfrac{\partial^2}{\partial x\,\partial y} F_{X,Y}(x, y)$
  (e) Conditional Probability
  $f_{X|Y}(x \mid y) = \dfrac{f_{X,Y}(x, y)}{f_Y(y)}$ where $f_Y(y) = \dfrac{dF_Y(y)}{dy}$
  (f) Integral (Marginal) Probability
  $f_X(x) = \int f_{X,Y}(x, y)\,dy, \qquad f_Y(y) = \int f_{X,Y}(x, y)\,dx$

  41. 713 Examples of the Probability Density Function in the Continuous Case
  Uniform Distribution
  $f_X(x) = \dfrac{1}{2B}$ when $|x| < B$, $\qquad f_X(x) = 0$ otherwise
  (Quantization error is a special case with B = 0.5.)
  Normal Distribution ($\mu$: mean, $\sigma$: standard deviation, $-\infty < x < \infty$)
  $f_X(x) = \dfrac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\dfrac{(x - \mu)^2}{2\sigma^2}\right)$
  [Figure: the case $\mu = 0$, $\sigma = 1$.]
  Other examples are shown in Section 9.2.2.

  42. 714 3. Expected Value
  (discrete case) $E[g(X)] = \sum_n g(n) P_X(n)$
  (continuous case) $E[g(X)] = \int g(x) f_X(x)\,dx$
  It is usually written as $E[g(X)] = \int g(x)\,dF_X(x)$.

  43. 715 9.1.2 Moments
  Mean
  $\mu_X = E(X) = \sum_n n\,P_X(n)$ (discrete case), $\qquad \mu_X = E(X) = \int x f_X(x)\,dx$ (continuous case)
  Variance (var)
  $\sigma_X^2 = \mathrm{var}(X) = E\big((X - \mu_X)^2\big) = \sum_n (n - \mu_X)^2 P_X(n)$ (discrete case)
  $\sigma_X^2 = \mathrm{var}(X) = E\big((X - \mu_X)^2\big) = \int (x - \mu_X)^2 f_X(x)\,dx$ (continuous case)
  Standard Deviation (std)
  $\sigma_X = \sqrt{\mathrm{var}(X)} = \sqrt{E\big((X - \mu_X)^2\big)}$

  44. 716 [Example 1]
  (1) $P_X(1) = 0.5$, $P_X(2) = 0.5$:
  $\mu_X = 0.5 \cdot 1 + 0.5 \cdot 2 = 1.5, \qquad \mathrm{var}(X) = 0.5\,(1 - 1.5)^2 + 0.5\,(2 - 1.5)^2 = 0.25, \qquad \sigma_X = 0.5$
  (2) $P_X(n) = 0.25$ for $n = 1, 2, 3, 4$:
  $\mu_X = 2.5, \qquad \mathrm{var}(X) = 1.25, \qquad \sigma_X = \sqrt{5}/2$
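
Part (1) of the example can be verified directly from the definitions:

```python
import numpy as np

n = np.array([1., 2.])
P = np.array([0.5, 0.5])                     # P_X(1) = P_X(2) = 0.5
mean = np.sum(n * P)                         # 1.5
var = np.sum((n - mean) ** 2 * P)            # 0.25
print(mean, var, np.sqrt(var))               # 1.5 0.25 0.5
```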

  45. 717 (1) Moment (raw form)
  $m_k = E(X^k) = \sum_n n^k P_X(n)$ (discrete case), $\qquad m_k = E(X^k) = \int x^k f_X(x)\,dx$ (continuous case)
  (2) Moment (central form)
  $\bar{m}_k = E\big((X - \mu_X)^k\big) = \sum_n (n - \mu_X)^k P_X(n)$ (discrete case)
  $\bar{m}_k = E\big((X - \mu_X)^k\big) = \int (x - \mu_X)^k f_X(x)\,dx$ (continuous case)

  46. 718 (3) Moment (standardized form)
  $v_k = \dfrac{E\big((X - \mu_X)^k\big)}{\big[E\big((X - \mu_X)^2\big)\big]^{k/2}} = \dfrac{\sum_n (n - \mu_X)^k P_X(n)}{\big[\sum_n (n - \mu_X)^2 P_X(n)\big]^{k/2}}$ (discrete case)
  $v_k = \dfrac{E\big((X - \mu_X)^k\big)}{\big[E\big((X - \mu_X)^2\big)\big]^{k/2}} = \dfrac{\int (x - \mu_X)^k f_X(x)\,dx}{\big[\int (x - \mu_X)^2 f_X(x)\,dx\big]^{k/2}}$ (continuous case)

  47. 719
  Order   Moment (raw)   Moment (central)   Moment (standardized)
  k = 1   Mean           0                  0
  k = 2                  Variance           1
  k = 3                                     Skewness
  k = 4                                     Kurtosis
  k = 5                                     Hyperskewness

  48. 720 Skewness
  It indicates the relative locations of the high-probability region and the long tail with respect to the mean.
  $\text{skewness} = \dfrac{\sum_n (n - \mu_X)^3 P_X(n)}{\big[\sum_n (n - \mu_X)^2 P_X(n)\big]^{3/2}}$ or $\dfrac{\int (x - \mu_X)^3 f_X(x)\,dx}{\big[\int (x - \mu_X)^2 f_X(x)\,dx\big]^{3/2}}$
  skewness > 0: The high-probability region is to the left of the mean (the long tail is to the right).
  skewness < 0: The high-probability region is to the right of the mean (the long tail is to the left).
  skewness = 0: Symmetry.

  49. 721 [Figure: three example PDFs, all with mean = 0, with skewness = 0, 0.4992, and -0.4992.]

  50. 722 Kurtosis
  It indicates how sharp the high-probability region is.
  $\text{kurtosis} = \dfrac{\sum_n (n - \mu_X)^4 P_X(n)}{\big[\sum_n (n - \mu_X)^2 P_X(n)\big]^{2}}$ or $\dfrac{\int (x - \mu_X)^4 f_X(x)\,dx}{\big[\int (x - \mu_X)^2 f_X(x)\,dx\big]^{2}}$
  kurtosis >= 0. A large kurtosis means that the high-probability region is sharp.
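
The standardized third and fourth moments can be estimated from samples with a few lines of numpy. Below is a sketch using an exponential sample (an illustrative choice), whose theoretical skewness is 2 (long right tail) and whose kurtosis is 9:

```python
import numpy as np

def standardized_moment(x, k):
    """Sample estimate of the k-th standardized moment, per the definitions above."""
    c = x - x.mean()
    return np.mean(c ** k) / np.mean(c ** 2) ** (k / 2)

rng = np.random.default_rng(3)
x = rng.exponential(size=100_000)            # right-tailed distribution
print(round(standardized_moment(x, 3), 2))   # skewness, close to 2
print(round(standardized_moment(x, 4), 2))   # kurtosis, close to 9
```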
