
Singular Value Decomposition and Best-Fit Subspaces Overview
Explore the concepts of Singular Value Decomposition (SVD) and best-fit subspaces in data science, including how to find optimal subspaces by minimizing squared distances, the greedy construction of singular vectors from best-fit lines and projections, how singular values relate to eigenvalues, and low-rank approximation.
Presentation Transcript
CSCI B609: Foundations of Data Science
Lecture 6/7: Best-Fit Subspaces and Singular Value Decomposition
Slides at http://grigory.us/data-science-class.html
Grigory Yaroslavtsev, http://grigory.us
Singular Value Decomposition: Intro
n × d data matrix A (n rows and d columns); each row a_i is a d-dimensional vector
Find the best-fit k-dimensional subspace V_k for the rows of A
Minimize the sum of squared distances from the a_i to V_k
SVD: Greedy Strategy
Find the best-fit 1-dimensional line; repeat r times
When k = r = rank(A) we get the SVD: A = UDV^T
A = UDV^T: Basic Properties
D = diagonal matrix with positive real entries σ_i
U and V have orthonormal columns: v_1, …, v_r in R^d (the best-fit lines) and u_1, …, u_r in R^n (scaled projections of the rows of A onto the v_i)
⟨v_i, v_j⟩ = δ_ij and ⟨u_i, u_j⟩ = δ_ij
A = Σ_{i=1}^r σ_i u_i v_i^T
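A minimal numerical sketch (not from the slides) of these properties, using NumPy on a random matrix: it checks A = UDV^T and the orthonormality of the columns of U and V.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))            # n x d data matrix, one point per row

U, s, Vt = np.linalg.svd(A, full_matrices=False)
D = np.diag(s)                             # singular values sigma_1 >= ... >= sigma_r > 0

print(np.allclose(A, U @ D @ Vt))          # A = U D V^T
print(np.allclose(U.T @ U, np.eye(4)))     # <u_i, u_j> = delta_ij
print(np.allclose(Vt @ Vt.T, np.eye(4)))   # <v_i, v_j> = delta_ij
```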
Singular Values vs. Eigenvalues
If A is a square matrix: a vector v such that Av = λv is an eigenvector, and λ is the corresponding eigenvalue
For symmetric real matrices the eigenvectors are orthonormal: A = VDV^T, where the columns of V are eigenvectors of A and the diagonal entries of D are the eigenvalues λ_1, …, λ_d
The SVD is defined for all matrices (not just square ones), and orthogonality of the singular vectors is automatic
Av_i = σ_i u_i and A^T u_i = σ_i v_i (will show)
A^T A v_i = σ_i^2 v_i, so the v_i are eigenvectors of A^T A
Projections and Distances
Minimizing distance = maximizing projection
For each row: ||a_i||^2 = (length of projection of a_i onto V)^2 + dist(a_i, V)^2, so minimizing the total squared distance is the same as maximizing the total squared projection
SVD: First Singular Vector
Find the best-fit 1-dimensional line; v = unit vector along the best-fit line
a_i = i-th row of A; the length of its projection onto v is ⟨a_i, v⟩
Sum of squared projection lengths: ||Av||^2
First singular vector: v_1 = argmax_{||v||_2 = 1} ||Av||_2 (if there are ties, break them arbitrarily)
σ_1(A) = ||Av_1||_2 is the first singular value
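As a quick sanity check (my own example, not from the lecture), one can compare ||Av_1|| against ||Av|| over many random unit directions v; no sampled direction should beat v_1.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 5))
v1 = np.linalg.svd(A, full_matrices=False)[2][0]     # first right singular vector
sigma1 = np.linalg.norm(A @ v1)                      # sigma_1(A) = ||A v_1||_2

dirs = rng.standard_normal((10000, 5))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)  # random unit vectors
best_random = np.max(np.linalg.norm(dirs @ A.T, axis=1))

print(sigma1, best_random, sigma1 >= best_random)    # v_1 attains the maximum
```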
SVD: Greedy Construction
Find the best-fit 1-dimensional line; repeat r times (until the maximum projection is 0)
Second singular vector and value: v_2 = argmax_{v ⊥ v_1, ||v||_2 = 1} ||Av||_2 and σ_2(A) = ||Av_2||_2
k-th singular vector and value: v_k = argmax_{v ⊥ v_1, …, v_{k-1}, ||v||_2 = 1} ||Av||_2 and σ_k(A) = ||Av_k||_2
Will show: span(v_1, v_2, …, v_k) is the best-fit k-dimensional subspace
Best-Fit Subspace Proof: k = 2
W = best-fit 2-dimensional subspace with orthonormal basis (w_1, w_2); its sum of squared projections is ||Aw_1||^2 + ||Aw_2||^2
Key observation: choose the basis so that w_2 ⊥ v_1
If W ⊥ v_1 then any orthonormal basis of W works; otherwise let w_1 be the (normalized) projection of v_1 onto W and choose w_2 ⊥ v_1 inside W
Then ||Aw_1||^2 ≤ ||Av_1||^2 by definition of v_1, and ||Aw_2||^2 ≤ ||Av_2||^2 by definition of v_2 since w_2 ⊥ v_1
Hence ||Aw_1||^2 + ||Aw_2||^2 ≤ ||Av_1||^2 + ||Av_2||^2, so span(v_1, v_2) is a best-fit 2-dimensional subspace
Best-Fit Subspace Proof: General k
V = best-fit k-dimensional subspace
By induction, V_{k-1} = span(v_1, …, v_{k-1}) is a best-fit (k-1)-dimensional subspace
Choose an orthonormal basis w_1, …, w_k of V with w_k ⊥ V_{k-1}
Σ_{i=1}^{k-1} ||Aw_i||^2 ≤ Σ_{i=1}^{k-1} ||Av_i||^2 because V_{k-1} is a best-fit (k-1)-dimensional subspace
||Aw_k||^2 ≤ ||Av_k||^2 by definition of v_k, since w_k ⊥ v_1, …, v_{k-1}
Hence Σ_{i=1}^{k} ||Aw_i||^2 ≤ Σ_{i=1}^{k} ||Av_i||^2, so span(v_1, …, v_k) is a best-fit k-dimensional subspace
Singular Values and Frobenius Norm
v_1, …, v_r span the row space of A, so ⟨a_i, v⟩ = 0 for every v ⊥ v_1, …, v_r
||a_i||^2 = Σ_{j=1}^r ⟨a_i, v_j⟩^2
Σ_{i=1}^n ||a_i||^2 = Σ_{i=1}^n Σ_{j=1}^r ⟨a_i, v_j⟩^2 = Σ_{j=1}^r ||Av_j||^2 = Σ_{j=1}^r σ_j^2(A)
||A||_F = (Σ_{i,j} a_ij^2)^{1/2} (Frobenius norm), so ||A||_F^2 = Σ_{j=1}^r σ_j^2(A)
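A one-line numerical check of this identity (sketch, random matrix):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 3))
s = np.linalg.svd(A, compute_uv=False)              # singular values only

print(np.isclose(np.sum(A ** 2), np.sum(s ** 2)))   # ||A||_F^2 = sum_j sigma_j^2(A)
```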
Singular Value Decomposition
v_1, …, v_r are the right singular vectors
σ_i(A) = ||Av_i||_2 are the singular values
u_i = (1/σ_i) Av_i, for i = 1, …, r, are the left singular vectors
Singular Value Decomposition
Will prove that A = Σ_{i=1}^r σ_i u_i v_i^T
Lemma: A = B iff Av = Bv for all v
Any v is a linear combination of the v_i plus a component orthogonal to all of them; on each v_j both A and Σ_i σ_i u_i v_i^T give σ_j u_j, and on the orthogonal component both give 0
Duplicate singular values: the singular vectors are then not unique, but one can always choose them orthogonal
Best Rank-k Approximation
A_k = Σ_{i=1}^k σ_i u_i v_i^T is the best rank-k approximation in the Frobenius norm
Lemma: the rows of A_k are the projections of the rows of A onto span(v_1, …, v_k)
Recall: if the v_j are an orthonormal basis, the projection of a_i is Σ_{j=1}^k ⟨a_i, v_j⟩ v_j^T; for all of A this is Σ_{j=1}^k A v_j v_j^T = Σ_{j=1}^k σ_j u_j v_j^T = A_k
For any matrix B of rank at most k: ||A − A_k||_F ≤ ||A − B||_F, since the rows of B lie in a subspace of dimension at most k and span(v_1, …, v_k) maximizes the sum of squared projections (this is the sense in which the greedy construction converges to the best subspace)
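A sketch of the truncation A_k = Σ_{i≤k} σ_i u_i v_i^T, with a single random trial (not a proof) comparing it against an arbitrary rank-k competitor:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((30, 10))
k = 3

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]          # best rank-k approximation

B = rng.standard_normal((30, k)) @ rng.standard_normal((k, 10))   # arbitrary rank-k matrix

print(np.linalg.norm(A - A_k, "fro") <= np.linalg.norm(A - B, "fro"))       # A_k wins
print(np.isclose(np.linalg.norm(A - A_k, "fro") ** 2, np.sum(s[k:] ** 2)))  # error = sum of tail sigma_i^2
```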
Rank-k Approximation and Similarity
Database A: n × d matrix (document × term)
Preprocess to answer similarity queries: query x in R^d = new document; output Ax = vector of similarities
Naive approach takes O(nd) time
If we first construct A_k = Σ_{i=1}^k σ_i u_i v_i^T, then A_k x = Σ_{i=1}^k σ_i u_i (v_i^T x) takes O(kd + kn) time
Error: max_{||x||_2 = 1} ||(A − A_k) x||_2 = σ_{k+1}(A)
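A sketch of the query speed-up (the function name and sizes are mine): A_k x is evaluated as Σ_i σ_i u_i (v_i^T x), touching only k vectors of length n and d.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, k = 1000, 200, 10
A = rng.standard_normal((n, d))                      # document-term matrix
U, s, Vt = np.linalg.svd(A, full_matrices=False)

def similarity_query(x):
    """Approximate Ax via the rank-k SVD in O(k(n + d)) time instead of O(nd)."""
    coeffs = s[:k] * (Vt[:k, :] @ x)                 # sigma_i * <v_i, x> for i = 1..k
    return U[:, :k] @ coeffs                         # sum_i sigma_i u_i <v_i, x>

x = rng.standard_normal(d)                           # new document
err = np.linalg.norm(A @ x - similarity_query(x))
print(err <= s[k] * np.linalg.norm(x) + 1e-9)        # error bounded by sigma_{k+1} ||x||
```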
Left Singular Vectors and Spectral Norm
See Section 3.6 for proofs
The left singular vectors u_1, …, u_r are orthogonal
Av_i = σ_i u_i and A^T u_i = σ_i v_i
For any rank-k matrix B: ||A − A_k||_2 ≤ ||A − B||_2, and ||A − A_k||_2^2 = σ_{k+1}^2
Power Method
B = A^T A is a d × d matrix
B = (Σ_i σ_i u_i v_i^T)^T (Σ_j σ_j u_j v_j^T) = Σ_{i,j} σ_i σ_j v_i (u_i^T u_j) v_j^T = Σ_i σ_i^2 v_i v_i^T
B^2 = Σ_i σ_i^4 v_i v_i^T, and in general B^k = Σ_i σ_i^{2k} v_i v_i^T
If σ_1 > σ_2, then B^k ≈ σ_1^{2k} v_1 v_1^T for large k: take a scaled first row to read off v_1
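A bare-bones sketch of this power method (my own toy example): repeated multiplication by B drives B^k toward σ_1^{2k} v_1 v_1^T, so a scaled row or column of the iterate is close to v_1, assuming σ_1 > σ_2.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((40, 8))
B = A.T @ A

M = B.copy()
for _ in range(50):
    M = M @ B                                        # M is now B^k for growing k
    M /= np.linalg.norm(M)                           # rescale to avoid overflow

j = np.argmax(np.linalg.norm(M, axis=0))             # pick a column with large norm
v_est = M[:, j] / np.linalg.norm(M[:, j])
v1 = np.linalg.svd(A)[2][0]
print(min(np.linalg.norm(v_est - v1), np.linalg.norm(v_est + v1)))  # small: v_1 up to sign
```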
Faster Power Method
PM drawback: A^T A can be dense even when A is sparse
Instead, pick a random Gaussian x and compute B^k x = (A^T A)^k x by alternately multiplying by A and A^T
Write x = Σ_{i=1}^d c_i v_i (augment the v_i to an orthonormal basis if r < d); then B^k x = Σ_i σ_i^{2k} c_i v_i
Theorem: let x be a unit d-vector with |x^T v_1| ≥ δ, and let V = subspace spanned by the v_i with σ_i ≥ (1 − ε) σ_1; then the unit vector w obtained after k = (1/2ε) ln(1/εδ) iterations of PM has a component of at most ε orthogonal to V
Faster Power Method: Analysis
x = Σ_{i=1}^d c_i v_i and B^k x = Σ_{i=1}^d σ_i^{2k} c_i v_i
||B^k x||^2 = Σ_i σ_i^{4k} c_i^2 ≥ σ_1^{4k} c_1^2 ≥ σ_1^{4k} δ^2
(Squared) component of B^k x orthogonal to V: Σ_{i : σ_i < (1−ε)σ_1} σ_i^{4k} c_i^2 ≤ (1−ε)^{4k} σ_1^{4k} Σ_i c_i^2 = (1−ε)^{4k} σ_1^{4k}
Component of w = B^k x / ||B^k x|| orthogonal to V is at most (1−ε)^{2k}/δ ≤ e^{−2kε}/δ = ε for k = (1/2ε) ln(1/εδ)
Choice of x
x = random spherical Gaussian with unit variance in each coordinate
Pr[||x|| ≥ 2√d] ≤ 3e^{−d/64} (Gaussian Annulus Theorem)
x^T v_1 ~ N(0,1), so Pr[|x^T v_1| ≤ 1/10] ≤ 1/10
Hence Pr[|x^T v_1| / ||x|| ≤ 1/(20√d)] ≤ 1/10 + 3e^{−d/64}
Can set δ = 1/(20√d) in the faster power method
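Below is a sketch of the faster power method just described (sizes and iteration count are mine): start from a random Gaussian x and apply A^T(A·) repeatedly, never forming the dense matrix A^T A.

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((200, 20))

x = rng.standard_normal(20)               # random Gaussian start: |x^T v_1| >= delta w.h.p.
for _ in range(100):                      # k iterations of multiplying by B = A^T A
    x = A.T @ (A @ x)                     # two sparse-friendly matrix-vector products
    x /= np.linalg.norm(x)                # keep the iterate a unit vector

v1 = np.linalg.svd(A)[2][0]
print(abs(x @ v1))                        # close to 1: x aligns with v_1 (up to sign)
```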
Singular Vectors and Eigenvectors
Right singular vectors v_i are eigenvectors of A^T A, with eigenvalues σ_i^2
Left singular vectors u_i are eigenvectors of AA^T
B = A^T A satisfies x^T B x ≥ 0 for all x, since x^T A^T A x = (Ax)^T (Ax) = ||Ax||^2 ≥ 0
Such matrices are called positive semi-definite
Any p.s.d. matrix can be decomposed as A^T A
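A quick verification of this correspondence (sketch, random matrix): the v_i are eigenvectors of A^T A with eigenvalues σ_i^2, and A^T A is positive semi-definite.

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((20, 6))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

B = A.T @ A
eigvals = np.linalg.eigvalsh(B)                          # eigenvalues of the symmetric matrix B

print(np.allclose(np.sort(eigvals), np.sort(s ** 2)))    # eigenvalues of A^T A are sigma_i^2
print(np.all(eigvals >= -1e-10))                         # p.s.d.: x^T B x = ||Ax||^2 >= 0
print(np.allclose(B @ Vt[0], s[0] ** 2 * Vt[0]))         # A^T A v_1 = sigma_1^2 v_1
```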
Application of SVD: Centering Data
Minimize the sum of squared distances from the a_i to V_k
The SVD gives the best-fitting V_k among subspaces through the origin, i.e., when the data is centered
What if it is not? Thm: the affine subspace V_k that minimizes the squared distance passes through the centroid of the point set, (1/n) Σ_i a_i
Will only prove this for k = 1; the proof for arbitrary k is analogous (see the textbook)
Application of SVD: Centering Data
Thm: the line that minimizes the squared distance passes through the centroid
Line: ℓ = {a + λv : λ in R} with ||v|| = 1; by the Pythagorean theorem, ||a_i − a||^2 = dist(a_i, ℓ)^2 + ⟨v, a_i − a⟩^2
Center the data so that Σ_{i=1}^n a_i = 0 by subtracting the centroid
Σ_i dist(a_i, ℓ)^2 = Σ_i (||a_i − a||^2 − ⟨v, a_i − a⟩^2)
= Σ_i (||a_i||^2 + ||a||^2 − 2⟨a_i, a⟩) − Σ_i (⟨v, a_i⟩^2 − 2⟨v, a_i⟩⟨v, a⟩ + ⟨v, a⟩^2)
= Σ_i ||a_i||^2 − Σ_i ⟨v, a_i⟩^2 + n||a||^2 − n⟨v, a⟩^2   (using Σ_i a_i = 0)
≥ Σ_i ||a_i||^2 − Σ_i ⟨v, a_i⟩^2   (since ||a||^2 ≥ ⟨v, a⟩^2)
Minimized when a = 0, i.e., when the line passes through the centroid
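A small illustration of the centering step (my own 2-D toy data, not from the slides): without subtracting the centroid, the top singular vector is dominated by the offset from the origin rather than by the direction of spread.

```python
import numpy as np

rng = np.random.default_rng(8)
# points spread along direction (1, 0.5) but centered far from the origin, at (5, 5)
A = np.array([5.0, 5.0]) + rng.standard_normal((200, 1)) * np.array([1.0, 0.5])

centroid = A.mean(axis=0)
A_centered = A - centroid                      # rows now sum to (approximately) zero

v1_raw = np.linalg.svd(A)[2][0]                # pulled roughly toward the offset (5, 5)
v1_centered = np.linalg.svd(A_centered)[2][0]  # direction of the actual spread, ~(1, 0.5) normalized
print(v1_raw, v1_centered)
```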
Principal Component Analysis
n × d matrix A: customer × movie preference matrix; n = #customers, d = #movies
A_ij = how much customer i likes movie j
Assumption: A_ij can be described with k factors; customers and movies are vectors in R^k and A_ij = ⟨u_i, v_j⟩
Solution: the best rank-k approximation A_k
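A toy sketch of this latent-factor view (the sizes and variable names are mine): a preference matrix generated from k hidden factors is recovered exactly by its rank-k SVD truncation.

```python
import numpy as np

rng = np.random.default_rng(9)
n_customers, n_movies, k = 100, 40, 3
cust = rng.standard_normal((n_customers, k))     # customer factor vectors in R^k
mov = rng.standard_normal((n_movies, k))         # movie factor vectors in R^k
A = cust @ mov.T                                 # A_ij = <u_i, v_j>, a rank-k matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(np.allclose(A, A_k))                       # True: k factors suffice to describe A
```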
Separating a Mixture of k Gaussians
Sample origin problem: given samples from k well-separated spherical Gaussians, did two samples come from the same Gaussian?
Δ = distance between centers
For two Gaussians in d dimensions, naive (distance-based) separation requires Δ = ω(d^{1/4})
Idea: project onto a k-dimensional subspace through the centers, then apply the naive algorithm; Thm: Δ = Ω(k^{1/4}) then suffices
Key fact: this subspace can be found via the SVD
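A toy illustration of the projection idea (my own construction, two Gaussians only): the top right singular vector roughly spans the line through the centers, after which a simple 1-dimensional threshold separates the samples.

```python
import numpy as np

rng = np.random.default_rng(10)
d, n = 200, 500
mu = np.zeros(d)
mu[0] = 6.0                                          # distance between the two centers
X = np.vstack([rng.standard_normal((n, d)),          # samples from N(0, I)
               mu + rng.standard_normal((n, d))])    # samples from N(mu, I)
labels = np.array([0] * n + [1] * n)

v1 = np.linalg.svd(X, full_matrices=False)[2][0]     # roughly the direction of the centers
proj = X @ v1                                        # 1-dimensional projection of all samples
guess = (proj > proj.mean()).astype(int)

accuracy = max(np.mean(guess == labels), np.mean(guess != labels))   # account for label swap
print(accuracy)                                      # close to 1.0
```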
Separating a Mixture of k Gaussians
Easy fact: projection onto a subspace preserves the property of being a unit-variance spherical Gaussian
Def: if p is a probability distribution, the best-fit line {λv : λ in R} is given by v = argmax_{||v|| = 1} E_{x ~ p}[⟨v, x⟩^2]
Thm: the best-fit line for a Gaussian centered at μ passes through μ and the origin
Best-Fit Line for a Gaussian
Thm: the best-fit line for a Gaussian centered at μ passes through μ and the origin
E_{x ~ p}[(v^T x)^2] = E[(v^T (x − μ) + v^T μ)^2]
= E[(v^T (x − μ))^2 + 2 (v^T μ)(v^T (x − μ)) + (v^T μ)^2]
= E[(v^T (x − μ))^2] + 2 (v^T μ) E[v^T (x − μ)] + (v^T μ)^2
= E[(v^T (x − μ))^2] + (v^T μ)^2
= σ^2 + (v^T μ)^2
where we used E[v^T (x − μ)] = 0 and E[(v^T (x − μ))^2] = σ^2
The best-fit line maximizes (v^T μ)^2, so v is parallel to μ
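A numerical check of this calculation (sketch, unit-variance spherical Gaussian): E[(v^T x)^2] = 1 + (v^T μ)^2, which is maximized by taking v parallel to μ.

```python
import numpy as np

rng = np.random.default_rng(11)
d = 5
mu = rng.standard_normal(d)
X = mu + rng.standard_normal((200_000, d))      # samples from N(mu, I)

v = rng.standard_normal(d)
v /= np.linalg.norm(v)                          # arbitrary unit direction
print(np.mean((X @ v) ** 2), 1.0 + (v @ mu) ** 2)       # agree up to sampling noise

v_best = mu / np.linalg.norm(mu)                # best-fit direction: parallel to mu
print(np.mean((X @ v_best) ** 2), 1.0 + mu @ mu)        # the maximum value, sigma^2 + ||mu||^2
```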