Understanding PCA and Mahalanobis Distance in Multivariate Gaussian Analysis
Delve into the concepts of Principal Component Analysis (PCA) and Mahalanobis Distance as they relate to the covariance matrix, positive definite matrices, and multivariate Gaussian distributions. Explore examples and applications to gain insights into these fundamental statistical techniques.
Presentation Transcript
ECE 417, Lecture 7: PCA. Mark Hasegawa-Johnson, 9/19/2017
Content:
- Mahalanobis distance review
- PCA = eigenvectors of the covariance matrix
- Positive semi-definite matrices; pseudo-inverse
- PCA = left singular vectors of the data matrix
Mahalanobis Distance Review
Mahalanobis form of the multivariate Gaussian, dependent dimensions: If the dimensions are dependent, and jointly Gaussian, then we can still write the multivariate Gaussian as

$p_X(\vec{x}) = \mathcal{N}(\vec{x};\vec{\mu},\Sigma) = \frac{1}{(2\pi)^{D/2}|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(\vec{x}-\vec{\mu})^T \Sigma^{-1} (\vec{x}-\vec{\mu})\right)$

We call this the Mahalanobis form because the exponent is the squared Mahalanobis distance (with weight matrix $\Sigma$) between $\vec{x}$ and $\vec{\mu}$:

$d_\Sigma^2(\vec{x},\vec{\mu}) = (\vec{x}-\vec{\mu})^T \Sigma^{-1} (\vec{x}-\vec{\mu})$
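As a quick numerical check (not part of the original slides), here is a minimal numpy sketch that evaluates the density through its Mahalanobis form and compares it against scipy's implementation. It assumes the 2-D example parameters introduced on the next slide, and the test point x is an arbitrary choice.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Example parameters (the slides' 2-D case: means 1 and -1, variances 1 and 4, covariance 1)
mu = np.array([1.0, -1.0])
Sigma = np.array([[1.0, 1.0],
                  [1.0, 4.0]])
x = np.array([2.0, 0.5])          # arbitrary test point

# Squared Mahalanobis distance: (x - mu)^T Sigma^{-1} (x - mu)
diff = x - mu
d2 = diff @ np.linalg.inv(Sigma) @ diff

# Mahalanobis form of the Gaussian pdf
D = len(mu)
pdf = np.exp(-0.5 * d2) / ((2 * np.pi) ** (D / 2) * np.sqrt(np.linalg.det(Sigma)))

# Cross-check against scipy's multivariate normal
assert np.isclose(pdf, multivariate_normal.pdf(x, mean=mu, cov=Sigma))
print(d2, pdf)
```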
Example: Suppose that $X_1$ and $X_2$ are linearly correlated Gaussians with means 1 and -1, respectively, with variances 1 and 4, and covariance 1:

$\vec{\mu} = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$

Remember the definitions of variance and covariance:

$\sigma_1^2 = E\left[(X_1-\mu_1)^2\right] = 1$
$\sigma_2^2 = E\left[(X_2-\mu_2)^2\right] = 4$
$\sigma_{12} = \sigma_{21} = E\left[(X_1-\mu_1)(X_2-\mu_2)\right] = 1$

so

$\Sigma = \begin{bmatrix} 1 & 1 \\ 1 & 4 \end{bmatrix}$
Example: The contour lines of this Gaussian are the lines of constant Mahalanobis distance between $\vec{x}$ and $\vec{\mu}$. For example, to plot the $d_\Sigma(\vec{x},\vec{\mu}) = 1$ and $d_\Sigma(\vec{x},\vec{\mu}) = 2$ ellipses, we find the solutions of

$d_\Sigma^2(\vec{x},\vec{\mu}) = (\vec{x}-\vec{\mu})^T \Sigma^{-1} (\vec{x}-\vec{\mu}) = 1$

and

$d_\Sigma^2(\vec{x},\vec{\mu}) = (\vec{x}-\vec{\mu})^T \Sigma^{-1} (\vec{x}-\vec{\mu}) = 4$
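Here is a short matplotlib sketch (not from the slides) that draws those two contour ellipses by evaluating the squared Mahalanobis distance on a grid; the plotting ranges are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt

mu = np.array([1.0, -1.0])
Sigma = np.array([[1.0, 1.0],
                  [1.0, 4.0]])
Sigma_inv = np.linalg.inv(Sigma)

# Evaluate the squared Mahalanobis distance on a grid of (x1, x2) points
x1, x2 = np.meshgrid(np.linspace(-3, 5, 200), np.linspace(-7, 5, 200))
diff = np.stack([x1 - mu[0], x2 - mu[1]], axis=-1)           # shape (200, 200, 2)
d2 = np.einsum('...i,ij,...j->...', diff, Sigma_inv, diff)   # (x-mu)^T Sigma^{-1} (x-mu)

# d_Sigma = 1 and d_Sigma = 2 correspond to d^2 = 1 and d^2 = 4
plt.contour(x1, x2, d2, levels=[1.0, 4.0])
plt.scatter(*mu, marker='x')
plt.xlabel('x1')
plt.ylabel('x2')
plt.title('Constant-Mahalanobis-distance contours')
plt.show()
```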
PCA = Eigenvectors of the Covariance Matrix
Symmetric positive definite matrices: If $\Sigma$ is symmetric and positive semi-definite we can write

$\Sigma = V \Lambda V^T$ and $V^T \Sigma V = \Lambda$

where $\Lambda$ is a diagonal matrix of the eigenvalues, and $V$ is an orthonormal matrix of the eigenvectors.
Inverse of a positive definite matrix: The inverse of a positive definite matrix is

$\Sigma^{-1} = V \Lambda^{-1} V^T$

Proof: $\Sigma \Sigma^{-1} = (V \Lambda V^T)(V \Lambda^{-1} V^T) = V \Lambda \Lambda^{-1} V^T = V V^T = I$, where

$\Lambda^{-1} = \mathrm{diag}\!\left(\frac{1}{\lambda_1}, \frac{1}{\lambda_2}, \ldots, \frac{1}{\lambda_D}\right)$
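A minimal numpy sketch of this identity (not from the slides), using the 2-D example covariance from above; np.linalg.eigh returns exactly the $V$ and $\Lambda$ of this slide, with eigenvalues in ascending order.

```python
import numpy as np

Sigma = np.array([[1.0, 1.0],
                  [1.0, 4.0]])

# Eigendecomposition of a symmetric matrix: Sigma = V Lambda V^T
lam, V = np.linalg.eigh(Sigma)       # eigh returns eigenvalues in ascending order
Lambda_inv = np.diag(1.0 / lam)

Sigma_inv = V @ Lambda_inv @ V.T     # Sigma^{-1} = V Lambda^{-1} V^T
assert np.allclose(Sigma_inv, np.linalg.inv(Sigma))
assert np.allclose(V.T @ V, np.eye(2))   # V is orthonormal
```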
Mahalanobis distance again: Remember that

$d_\Sigma^2(\vec{x},\vec{\mu}) = (\vec{x}-\vec{\mu})^T \Sigma^{-1} (\vec{x}-\vec{\mu})$

But we can write this as

$d_\Sigma^2(\vec{x},\vec{\mu}) = (\vec{x}-\vec{\mu})^T V \Lambda^{-1} V^T (\vec{x}-\vec{\mu}) = \vec{y}^T \Lambda^{-1} \vec{y}$

where the vector $\vec{y}$ is defined to be the principal components of $\vec{x}$:

$\vec{y} = V^T (\vec{x}-\vec{\mu}) = \begin{bmatrix} \vec{v}_1^T(\vec{x}-\vec{\mu}) \\ \vdots \\ \vec{v}_D^T(\vec{x}-\vec{\mu}) \end{bmatrix}$
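The same example, sketched in numpy (not part of the slides): compute the principal components $\vec{y}$ and confirm that the two expressions for the squared Mahalanobis distance agree. The test point is arbitrary.

```python
import numpy as np

mu = np.array([1.0, -1.0])
Sigma = np.array([[1.0, 1.0],
                  [1.0, 4.0]])
x = np.array([2.0, 0.5])             # arbitrary test point

lam, V = np.linalg.eigh(Sigma)

# Principal components of x: y = V^T (x - mu)
y = V.T @ (x - mu)

# Mahalanobis distance computed two ways
d2_direct = (x - mu) @ np.linalg.inv(Sigma) @ (x - mu)
d2_pca = np.sum(y**2 / lam)          # y^T Lambda^{-1} y
assert np.isclose(d2_direct, d2_pca)
```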
Facts about ellipses: The formula

$1 = \vec{y}^T \Lambda^{-1} \vec{y}$, or equivalently $1 = \frac{y_1^2}{\lambda_1} + \cdots + \frac{y_D^2}{\lambda_D}$,

is the formula for an ellipsoid. If $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_D$ then the biggest main axis of the ellipse is the direction in which $y_1 \ne 0$ and all of the other principal components are $y_i = 0$. This happens when $(\vec{x}-\vec{\mu}) \propto \vec{v}_1$, because in that case:

$y_i = \vec{v}_i^T(\vec{x}-\vec{\mu}) \propto \vec{v}_i^T \vec{v}_1 = 0, \quad i \ne 1$
Example: Suppose that $\Sigma = \begin{bmatrix} 1 & 1 \\ 1 & 4 \end{bmatrix}$. We get the eigenvalues from the determinant equation:

$|\Sigma - \lambda I| = (1-\lambda)(4-\lambda) - 1 = \lambda^2 - 5\lambda + 3$

which equals zero for $\lambda = \frac{5 \pm \sqrt{13}}{2}$. We get the eigenvectors by solving $\Sigma \vec{v} = \lambda \vec{v}$, which gives

$\vec{v}_1 \propto \begin{bmatrix} 1 \\ \frac{3+\sqrt{13}}{2} \end{bmatrix}, \quad \vec{v}_2 \propto \begin{bmatrix} 1 \\ \frac{3-\sqrt{13}}{2} \end{bmatrix}$

where the constant of proportionality is whatever's necessary to make the vectors unit length; we don't really care what it is.
Example: So the principal axes of the ellipse are in the directions

$\vec{v}_1 \propto \begin{bmatrix} 1 \\ \frac{3+\sqrt{13}}{2} \end{bmatrix}, \quad \vec{v}_2 \propto \begin{bmatrix} 1 \\ \frac{3-\sqrt{13}}{2} \end{bmatrix}$
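A quick numerical confirmation of the closed-form eigenvalues and eigenvector directions (a sketch, not from the slides); note that np.linalg.eigh orders eigenvalues in ascending order and may flip eigenvector signs.

```python
import numpy as np

Sigma = np.array([[1.0, 1.0],
                  [1.0, 4.0]])
lam, V = np.linalg.eigh(Sigma)        # ascending order: lam[0] <= lam[1]

# Closed-form eigenvalues from the slide: (5 +/- sqrt(13)) / 2
assert np.allclose(np.sort(lam),
                   np.sort([(5 - np.sqrt(13)) / 2, (5 + np.sqrt(13)) / 2]))

# Closed-form eigenvector directions, normalized to unit length
v1 = np.array([1.0, (3 + np.sqrt(13)) / 2]); v1 /= np.linalg.norm(v1)
v2 = np.array([1.0, (3 - np.sqrt(13)) / 2]); v2 /= np.linalg.norm(v2)

# eigh may flip signs, so compare up to sign via |cosine| = 1
assert np.isclose(abs(v1 @ V[:, 1]), 1.0)   # largest eigenvalue is last in eigh's ordering
assert np.isclose(abs(v2 @ V[:, 0]), 1.0)
```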
Example: In fact, another way to write this ellipse is

$1 = \frac{\left(\vec{v}_1^T(\vec{x}-\vec{\mu})\right)^2}{\lambda_1} + \frac{\left(\vec{v}_2^T(\vec{x}-\vec{\mu})\right)^2}{\lambda_2}$
Example: In fact, it's useful to talk about $\Sigma$ in this way:
- The first principal component, $y_1$, is the part of $\vec{x}-\vec{\mu}$ that's in the $\vec{v}_1$ direction. It has a variance of $\lambda_1$.
- The second principal component, $y_2$, is the part of $\vec{x}-\vec{\mu}$ that's in the $\vec{v}_2$ direction. It has a variance of $\lambda_2$.
- The principal components are uncorrelated with each other. If $\vec{x}$ is Gaussian, then $y_1$ and $y_2$ are independent Gaussian random variables.
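To see the decorrelation claim numerically, here is a small Monte Carlo sketch (not from the slides): draw samples from the example Gaussian, project onto the eigenvectors, and check that the sample covariance of the principal components is approximately $\mathrm{diag}(\lambda_1, \lambda_2)$.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -1.0])
Sigma = np.array([[1.0, 1.0],
                  [1.0, 4.0]])

# Draw correlated Gaussian samples and project them onto the eigenvectors
samples = rng.multivariate_normal(mu, Sigma, size=100000)   # shape (100000, 2)
lam, V = np.linalg.eigh(Sigma)
Y = (samples - mu) @ V                                      # each row is y^T = (x - mu)^T V

# Sample covariance of the principal components is approximately diag(lam):
# the PCs are uncorrelated, with variances equal to the eigenvalues.
print(np.cov(Y.T))
print(lam)
```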
Positive semi-definite matrices
Symmetric positive semi-definite matrices: Positive semi-definite ($\Sigma \succeq 0$) means that for any vector $\vec{x}$, $\vec{x}^T \Sigma \vec{x} \ge 0$. This is equivalent to saying that all of the eigenvalues of $\Sigma$ are non-negative ($\lambda_i \ge 0$). This kind of thing often happens if $D > n$ (the vector dimension is larger than the number of training tokens, as in MP2). Now it will turn out that some of the eigenvalues are zero. In fact, only $M$ of the eigenvalues will be nonzero, for some number $M \le \min(D,n)$. The number of zero eigenvalues depends on how many different values of $\vec{x}$ cause $\vec{x}^T \Sigma \vec{x} = 0$. Suppose that there is a $(D-M)$-dimensional subspace of vectors $\vec{x}$ such that any $\vec{x}$ from that subspace causes $\vec{x}^T \Sigma \vec{x} = 0$. Then there are $(D-M)$ zero-valued eigenvalues. If we've sorted the eigenvalues so that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_D$, then $\lambda_i = 0$ for $i > M$, and

$\Sigma = \sum_{i=1}^{D} \lambda_i \vec{v}_i \vec{v}_i^T = \sum_{i=1}^{M} \lambda_i \vec{v}_i \vec{v}_i^T$
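A small numpy sketch of this situation (synthetic data, not the MP2 data): with more dimensions than training tokens, the sample covariance has at most $n-1$ nonzero eigenvalues after centering.

```python
import numpy as np

rng = np.random.default_rng(0)
D, n = 10, 4                                   # more dimensions than training tokens

X = rng.standard_normal((D, n))
X = X - X.mean(axis=1, keepdims=True)          # center each dimension
Sigma = X @ X.T / (n - 1)                      # D x D sample covariance

lam = np.linalg.eigvalsh(Sigma)
print(np.sum(lam > 1e-10))     # at most n - 1 = 3 nonzero eigenvalues; the rest are ~0
```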
Symmetric positive semi-definite matrices: It's useful now to define the eigenvalue matrix $\Lambda$ to be only $M \times M$, and to define the eigenvector matrix $V$ to be $D \times M$, where $D > M$. That way we can keep the idea that $\Lambda$ is all zeros off the diagonal, and all positive elements on the main diagonal:

$\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_M), \quad V = \left[\vec{v}_1, \ldots, \vec{v}_M\right]$

With these definitions, we can still write

$\Sigma = V \Lambda V^T$ (a $D \times D$ matrix)
$V^T \Sigma V = \Lambda$ (an $M \times M$ matrix)
$V^T V = I_{M \times M}$, but $V V^T \ne I_{D \times D}$.
PCA = eigenvectors corresponding to nonzero eigenvalues: When $\Sigma$ is positive semi-definite, it's most useful to define PCA for the nonzero eigenvalues (you can define PCA for the other eigenvectors, but it's not really useful). Thus $\vec{y}$ is an $M$-dimensional vector:

$\vec{y} = V^T (\vec{x}-\vec{\mu}) = \begin{bmatrix} \vec{v}_1^T(\vec{x}-\vec{\mu}) \\ \vdots \\ \vec{v}_M^T(\vec{x}-\vec{\mu}) \end{bmatrix}$
Pseudo-inverse of a positive semi-definite matrix: For a positive semi-definite matrix, it's useful to define a pseudo-inverse:

$\Sigma^\dagger = V \Lambda^{-1} V^T$

The dagger $\dagger$ is a special character that means pseudo-inverse. It's not a true inverse ($\Sigma^\dagger \Sigma \ne I$), but it has some other properties that make it behave almost like an inverse. For example, $\Sigma \Sigma^\dagger \Sigma = \Sigma$ and $\Sigma^\dagger \Sigma \Sigma^\dagger = \Sigma^\dagger$.
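A sketch of the pseudo-inverse built from the nonzero eigenpairs, checked against numpy's np.linalg.pinv (synthetic rank-deficient covariance; the 1e-10 threshold and the rcond value are arbitrary tolerances chosen for this example).

```python
import numpy as np

rng = np.random.default_rng(0)
D, n = 10, 4
X = rng.standard_normal((D, n))
X = X - X.mean(axis=1, keepdims=True)
Sigma = X @ X.T / (n - 1)                     # rank-deficient D x D covariance

lam, V = np.linalg.eigh(Sigma)
keep = lam > 1e-10                            # keep only the nonzero eigenvalues
lam_M, V_M = lam[keep], V[:, keep]

Sigma_dagger = V_M @ np.diag(1.0 / lam_M) @ V_M.T   # pseudo-inverse from nonzero eigenpairs
assert np.allclose(Sigma_dagger, np.linalg.pinv(Sigma, rcond=1e-8))
assert np.allclose(Sigma @ Sigma_dagger @ Sigma, Sigma)           # behaves "almost like" an inverse
assert np.allclose(Sigma_dagger @ Sigma @ Sigma_dagger, Sigma_dagger)
```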
Mahalanobis distance uses the pseudo-inverse, since the true inverse doesn't exist: In particular, the only sensible definition of Mahalanobis distance, in this case, is

$d_\Sigma^2(\vec{x},\vec{\mu}) = (\vec{x}-\vec{\mu})^T \Sigma^\dagger (\vec{x}-\vec{\mu}) = \vec{y}^T \Lambda^{-1} \vec{y} = \frac{y_1^2}{\lambda_1} + \cdots + \frac{y_M^2}{\lambda_M}$

Notice what this means. It means that any component of $\vec{x}-\vec{\mu}$ in a direction outside of the $M$-dimensional space $V = \left[\vec{v}_1, \ldots, \vec{v}_M\right]$ is just completely ignored. $\vec{x}-\vec{\mu}$ is first projected into that subspace as $\vec{y} = V^T(\vec{x}-\vec{\mu})$, then $d_\Sigma^2(\vec{x},\vec{\mu})$ just calculates distance in the subspace.
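And a sketch of the corresponding distance computation (synthetic data again): using the pseudo-inverse gives the same answer as computing the distance entirely inside the $M$-dimensional subspace, which shows that components outside the subspace contribute nothing.

```python
import numpy as np

rng = np.random.default_rng(1)
D, n = 10, 4
data = rng.standard_normal((D, n))
mu = data.mean(axis=1)
Xc = data - mu[:, None]
Sigma = Xc @ Xc.T / (n - 1)

lam, V = np.linalg.eigh(Sigma)
keep = lam > 1e-10
lam_M, V_M = lam[keep], V[:, keep]            # D x M basis of the "visible" subspace

Sigma_dagger = V_M @ np.diag(1.0 / lam_M) @ V_M.T
x = rng.standard_normal(D)                    # arbitrary test point
d2_pinv = (x - mu) @ Sigma_dagger @ (x - mu)

# Same distance from the principal components alone: components of (x - mu)
# outside span(V_M) simply do not contribute.
y = V_M.T @ (x - mu)
d2_sub = np.sum(y**2 / lam_M)
assert np.isclose(d2_pinv, d2_sub)
```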
PCA = left singular vectors of the data matrix
Normalized Data Matrix: Outer product = sample covariance matrix. Define the normalized data matrix to be

$X = \frac{1}{\sqrt{n-1}}\left[\vec{x}_1 - \vec{\mu}, \; \vec{x}_2 - \vec{\mu}, \; \ldots, \; \vec{x}_n - \vec{\mu}\right]$

That way we get the unbiased sample covariance matrix as

$\Sigma = X X^T = \frac{1}{n-1} \sum_{m=1}^{n} (\vec{x}_m - \vec{\mu})(\vec{x}_m - \vec{\mu})^T$

$\Sigma$ is a $D \times D$ matrix. Its $(i,j)^{\text{th}}$ element is the sample covariance of $x_i$ and $x_j$:

$\Sigma_{ij} = \frac{1}{n-1} \sum_{m=1}^{n} (x_{mi} - \mu_i)(x_{mj} - \mu_j)$
Inner Product = Gram Matrix: Instead of the outer product $X X^T$, suppose we compute the inner product $X^T X$. That's called the gram matrix:

$G = X^T X$

$G$ is an $n \times n$ matrix. Its $(i,j)^{\text{th}}$ element is the dot product of $\vec{x}_i - \vec{\mu}$ and $\vec{x}_j - \vec{\mu}$:

$G_{ij} = \frac{1}{n-1} (\vec{x}_i - \vec{\mu})^T (\vec{x}_j - \vec{\mu})$
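Here is a numpy sketch of both constructions (synthetic data; the $1/\sqrt{n-1}$ normalization is folded into $X$ as above), confirming that the nonzero eigenvalues of $\Sigma$ and $G$ coincide.

```python
import numpy as np

rng = np.random.default_rng(0)
D, n = 50, 8                                  # high-dimensional data, few tokens
data = rng.standard_normal((D, n))

mu = data.mean(axis=1, keepdims=True)
X = (data - mu) / np.sqrt(n - 1)              # normalized data matrix, one column per token

Sigma = X @ X.T                               # D x D sample covariance (outer product)
G = X.T @ X                                   # n x n gram matrix (inner product)

# The nonzero eigenvalues of Sigma and G are identical.
eig_S = np.sort(np.linalg.eigvalsh(Sigma))[-n:]   # n largest of the D eigenvalues
eig_G = np.sort(np.linalg.eigvalsh(G))
assert np.allclose(eig_S, eig_G)
```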
Eigenvalues of the Gram Matrix and the Sample Covariance: Both the gram matrix and the sample covariance are symmetric positive semi-definite matrices, so we can write

$\Sigma = V \Lambda V^T, \quad G = U \Lambda U^T$

where both share the same $M \times M$ diagonal matrix of nonzero eigenvalues,

$\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_M)$

$V = \left[\vec{v}_1, \ldots, \vec{v}_M\right]$ is $D \times M$
$U = \left[\vec{u}_1, \ldots, \vec{u}_M\right]$ is $n \times M$
$V^T V = U^T U = I_{M \times M}$, but $V V^T \ne I_{D \times D}$ and $U U^T \ne I_{n \times n}$.
Square Root Matrix: Define the square root matrix of $\Lambda$ to be some matrix $\Lambda^{1/2}$ such that

$\Lambda^{1/2} \Lambda^{1/2} = \Lambda$

In fact, for a diagonal matrix like $\Lambda$, the square root matrix is easy to find. It's just

$\Lambda^{1/2} = \mathrm{diag}\!\left(\sqrt{\lambda_1}, \sqrt{\lambda_2}, \ldots, \sqrt{\lambda_M}\right)$
Inserting an Identity Matrix: Covariance. Remember that $U^T U = I$. Therefore we can write

$X X^T = \Sigma = V \Lambda V^T = V \Lambda^{1/2} \Lambda^{1/2} V^T = V \Lambda^{1/2} I \Lambda^{1/2} V^T = V \Lambda^{1/2} U^T U \Lambda^{1/2} V^T = (V \Lambda^{1/2} U^T)(V \Lambda^{1/2} U^T)^T$
Inserting an Identity Matrix: Gram Matrix. Remember that $V^T V = I$. Therefore we can write

$X^T X = G = U \Lambda U^T = U \Lambda^{1/2} \Lambda^{1/2} U^T = U \Lambda^{1/2} I \Lambda^{1/2} U^T = U \Lambda^{1/2} V^T V \Lambda^{1/2} U^T = (V \Lambda^{1/2} U^T)^T (V \Lambda^{1/2} U^T)$
Singular Value Decomposition (SVD): Thus we have

$X^T X = (V \Lambda^{1/2} U^T)^T (V \Lambda^{1/2} U^T)$

and

$X X^T = (V \Lambda^{1/2} U^T)(V \Lambda^{1/2} U^T)^T$

The only way these things can both be true is if

$X = V S U^T$

where $V$ ($D \times M$) and $U$ ($n \times M$) are both inner-orthonormal, and

$S = \Lambda^{1/2} = \mathrm{diag}(s_1, \ldots, s_M), \quad s_i = \sqrt{\lambda_i}$
Singular Value Decomposition (SVD): $X = V S U^T$, where $V$ ($D \times M$) and $U$ ($n \times M$) are both inner-orthonormal, and

$S = \mathrm{diag}(s_1, \ldots, s_M), \quad s_i = \sqrt{\lambda_i}$

$V$ and $U$ are called the left and right singular vectors of $X$; the $s_i$ are called the singular values. There's nothing special about $X$: EVERY matrix has singular values and singular vectors, but the only way to find the singular values is by finding the eigenvalues of $X^T X$ or $X X^T$, whichever is smaller.
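A sketch using numpy's SVD on the same kind of normalized data matrix (synthetic data, not from the slides); note that np.linalg.svd names its outputs (u, s, vh), where numpy's u plays the role of the slides' $V$ and vh plays the role of $U^T$.

```python
import numpy as np

rng = np.random.default_rng(0)
D, n = 50, 8
data = rng.standard_normal((D, n))
X = (data - data.mean(axis=1, keepdims=True)) / np.sqrt(n - 1)

# Thin SVD of the normalized data matrix: X = V S U^T
V, s, Ut = np.linalg.svd(X, full_matrices=False)     # V is D x n, s has length n, Ut is n x n

# Squared singular values are the (nonzero) eigenvalues of X X^T and of X^T X
lam = np.sort(np.linalg.eigvalsh(X @ X.T))[-n:]
assert np.allclose(np.sort(s**2), lam)
```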
Why is this useful for MP2? The only way to find the singular values is by finding the eigenvalues of $X^T X$ or $X X^T$, whichever is smaller. In MP2, the sample covariance is $D \times D$, which is huge. The gram matrix is $n \times n$, which is actually a lot smaller. So you want to start out by computing $G = X^T X$, not $\Sigma = X X^T$. To find the principal components, though, you need $V$. How do you find it, if you already know $G$ and $U$? Answer: use the data matrix:

$X = V S U^T$
$X U = V S U^T U = V S$
$X U S^{-1} = V$
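A sketch of this gram-matrix trick (synthetic data standing in for MP2): eigendecompose the small $n \times n$ matrix $G$, then recover $V = X U S^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(0)
D, n = 50, 8                                   # covariance would be 50 x 50; gram is only 8 x 8
data = rng.standard_normal((D, n))
X = (data - data.mean(axis=1, keepdims=True)) / np.sqrt(n - 1)

# Step 1: eigendecompose the small gram matrix G = X^T X = U Lambda U^T
G = X.T @ X
lam, U = np.linalg.eigh(G)
keep = lam > 1e-10                             # keep only the nonzero eigenvalues
lam_M, U_M = lam[keep], U[:, keep]
S = np.diag(np.sqrt(lam_M))                    # singular values s_i = sqrt(lambda_i)

# Step 2: recover the principal component directions, V = X U S^{-1}
V = X @ U_M @ np.linalg.inv(S)
assert np.allclose(V.T @ V, np.eye(V.shape[1]))            # columns are orthonormal
assert np.allclose(V @ np.diag(lam_M) @ V.T, X @ X.T)      # reconstructs the covariance
```

The S-weighted PCA mentioned on the next slide would simply be X @ U_M here, that is, $VS$ without the final $S^{-1}$.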
Why is this useful for MP2? But actually, in past semesters we've discovered that, instead of using the orthonormal PCA $V$, you actually get better results if you use the S-weighted PCA:

$V S = \left[s_1 \vec{v}_1, \ldots, s_M \vec{v}_M\right]$
Summary:
- Principal components are what you get by projecting the data onto the principal component directions.
- Principal component directions are eigenvectors of the covariance.
- Variance of a PC is the eigenvalue of the covariance, which is also the eigenvalue of the gram matrix.
- Standard deviation of a PC is the singular value of the data matrix (square root of the variance).
- Once you have eigenvalues and eigenvectors of the gram matrix, you can find the eigenvectors of the covariance by multiplying through the data matrix.