Sketching for Least Squares Regression and Linear Algebra Algorithms

Explore how sketching techniques can speed up least squares regression and other linear algebra computations. Learn about JL matrices, oblivious subspace embeddings, and leverage score sampling in the context of big data algorithms.


Presentation Transcript


  1. CIS 700: Algorithms for Big Data. Lecture 7: Sketching for Linear Algebra. Slides at http://grigory.us/big-data-class.html. Grigory Yaroslavtsev, http://grigory.us

  2. Least Squares Regression. Solving an overconstrained linear system Ax = b: for n ≫ d, given a matrix A ∈ ℝ^{n×d} and a vector b ∈ ℝ^n, find x ∈ ℝ^d that minimizes ‖Ax − b‖₂. Normal equation: AᵀAx = Aᵀb. If A has rank d, then x = (AᵀA)⁻¹Aᵀb. Takes O(nd²) time to compute (using naïve matrix multiplication).
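
As a concrete reference point, here is a minimal numpy sketch of the exact solver just described (normal equations versus a library least-squares call); the sizes and random data are illustrative assumptions, not from the slides.

```python
import numpy as np

# Overconstrained system: n >> d (sizes chosen for illustration)
n, d = 10_000, 50
rng = np.random.default_rng(0)
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

# Normal equations: A^T A x = A^T b, valid when A has rank d
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Numerically safer equivalent: library least-squares solver
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.linalg.norm(x_normal - x_lstsq))  # should be tiny
```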

  3. Sketching for Least Squares Regression. Use a JL matrix S ∈ ℝ^{k×n} with k = O(d/ε²) and solve min_x ‖SAx − Sb‖₂ instead. Standard JL: time O(nkd + kd²) > O(nd²). Sparse JL: time O(nd²/ε + kd²). Fast JL: time O(nd·log n + kd²). Subspace embeddings from JL: JL only gives a guarantee for a fixed vector, while we need the guarantee for the entire column space of A.
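
A minimal sketch-and-solve illustration of this idea, assuming a dense Gaussian JL matrix with k = O(d/ε²) rows; the constant in k and the data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, eps = 20_000, 30, 0.5
k = int(4 * d / eps**2)          # k = O(d / eps^2), constant chosen for illustration

A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

# Dense Gaussian JL / subspace-embedding matrix S in R^{k x n}
S = rng.standard_normal((k, n)) / np.sqrt(k)

# Solve the sketched problem min_x ||S A x - S b||_2 instead of the full one
x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)

# Compare costs of the two solutions on the *original* problem
print(np.linalg.norm(A @ x_sketch - b), np.linalg.norm(A @ x_exact - b))
```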

  4. Oblivious Subspace Embeddings. Subspace embedding (SE) for A: ‖SAx‖₂² = (1 ± ε)‖Ax‖₂² for all x. SE for A ⟺ SE for U, where U is an orthonormal basis for the column space of A. Least Squares Regression: use an SE for (A, b), since min_x ‖SAx − Sb‖₂ = min_x ‖S(Ax − b)‖₂. Oblivious Subspace Embedding (OSE): the matrix S is chosen independently of A and works for any fixed A. JL transforms can be used as oblivious subspace embeddings.

  5. JLT(ε, δ, f). JLT(ε, δ, f): S ∈ ℝ^{k×n} such that for any f-element subset V ⊆ ℝ^n and all v, v' ∈ V: |⟨Sv, Sv'⟩ − ⟨v, v'⟩| ≤ ε‖v‖₂‖v'‖₂. For unit vectors v, v': 2⟨Sv, Sv'⟩ = ‖S(v + v')‖₂² − ‖Sv‖₂² − ‖Sv'‖₂² ≥ (1 − ε)‖v + v'‖₂² − (1 + ε)‖v‖₂² − (1 + ε)‖v'‖₂² = 2⟨v, v'⟩ − ε(‖v + v'‖₂² + 2) ≥ 2⟨v, v'⟩ − 6ε, and the matching upper bound is analogous, so ⟨Sv, Sv'⟩ = ⟨v, v'⟩ ± O(ε). It suffices to take a regular JL matrix of dimension k = O((1/ε²)·log(f/δ)).
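
A small numerical check of the inner-product guarantee for a plain Gaussian sketch; the dimensions and the fixed vector set are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, f = 1_000, 400, 20

# A fixed f-element set of unit vectors in R^n
V = rng.standard_normal((f, n))
V /= np.linalg.norm(V, axis=1, keepdims=True)

# Gaussian JL matrix S in R^{k x n}
S = rng.standard_normal((k, n)) / np.sqrt(k)
SV = V @ S.T                                    # rows are the sketched vectors S v

# Largest deviation |<Sv, Sv'> - <v, v'>| over all pairs in the set
err = np.max(np.abs(SV @ SV.T - V @ V.T))
print(err)                                      # small, on the order of 1/sqrt(k)
```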

  6. OSE construction. Take S = JLT(ε, δ, f); when is SA a subspace embedding for A? ε-net argument: find a set V ⊆ {y = Ax : ‖y‖₂ = 1} such that if ⟨Sv, Sv'⟩ = ⟨v, v'⟩ ± ε for all v, v' ∈ V, then ‖Sy‖₂² = 1 ± O(ε) for every y = Ax with ‖y‖₂ = 1. Take V = 1/2-net of {y : y = Ax, ‖y‖₂ = 1}: every such y can be written as y = y₀ + y₁ + y₂ + ⋯, where ‖yᵢ‖₂ ≤ 1/2ⁱ and each yᵢ is a multiple of a vector in V.

  7. Net argument. Write y = y₀ + y₁ + y₂ + ⋯, where ‖yᵢ‖₂ ≤ 1/2ⁱ and each yᵢ is a multiple of a vector in V: y = y₀ + (y − y₀) with y₀ ∈ V and ‖y − y₀‖₂ ≤ 1/2; then y − y₀ = y₁ + (y − y₀ − y₁) with y₁ a multiple of a vector in V and ‖y − y₀ − y₁‖₂ ≤ 1/4, and so on. Then ‖Sy‖₂² = ‖S(y₀ + y₁ + y₂ + ⋯)‖₂² = Σᵢ ‖Syᵢ‖₂² + 2·Σ_{0≤i<j} ⟨Syᵢ, Syⱼ⟩ = Σᵢ ‖yᵢ‖₂² + 2·Σ_{0≤i<j} ⟨yᵢ, yⱼ⟩ ± O(ε)·(Σᵢ ‖yᵢ‖₂)² = ‖y‖₂² ± O(ε) = 1 ± O(ε).

  8. ε-Net construction. For 0 < γ < 1 there is a γ-net for the unit sphere S^{d−1} of size (1 + 2/γ)^d. Choose a maximal set N of points on S^{d−1} such that no two points are within distance γ of each other. Balls of radius γ/2 around these points are disjoint, and the ball of radius 1 + γ/2 around the origin contains all of them, so # points ≤ (1 + γ/2)^d / (γ/2)^d = (1 + 2/γ)^d. Size of a 1/2-net: at most 5^d. A JLT of dimension O((d + log(1/δ))/ε²) therefore gives an OSE.
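
The counting step, written out as a volume comparison (a standard argument, spelled out here for completeness; the notation B_r for the ball of radius r and B_r(x) for the ball around x is introduced here, not taken from the slides):

```latex
\Big(1+\tfrac{\gamma}{2}\Big)^{d}\,\mathrm{vol}(B_1)
  = \mathrm{vol}\big(B_{1+\gamma/2}\big)
  \;\ge\; \sum_{x \in N}\mathrm{vol}\big(B_{\gamma/2}(x)\big)
  = |N|\,\Big(\tfrac{\gamma}{2}\Big)^{d}\,\mathrm{vol}(B_1)
\quad\Longrightarrow\quad
|N| \le \Big(\tfrac{1+\gamma/2}{\gamma/2}\Big)^{d} = \Big(1+\tfrac{2}{\gamma}\Big)^{d}.
```

For γ = 1/2 this gives the 5^d bound used above.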

  9. OSE constructions: running times. nnz(A) = # non-zero entries of A. OSE from Sparse JL: time O(nnz(A)·d/ε). Fast JL: time O(nd·log n). [Clarkson, Woodruff '13]: it is possible to construct an OSE computable in O(nnz(A)) time.

  10. Leverage Score Sampling. Def (Leverage Score): for an n × d matrix Z with orthonormal columns, the leverage score of row i is ℓᵢ = ‖eᵢᵀZ‖₂², the squared norm of the i-th row of Z. Note: Σᵢ ℓᵢ = d, so the normalized leverage scores ℓᵢ/d form a distribution. If A doesn't have orthonormal columns we can still pick an orthonormal basis Z for its column space. The choice of Z doesn't matter: Z' = ZR with R orthonormal gives the same leverage scores, since ‖eᵢᵀZR‖₂ = ‖eᵢᵀZ‖₂. All ℓᵢ are at most 1.
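
A minimal numpy sketch of exact leverage scores via a thin QR of A, matching the definition above; sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 5_000, 20
A = rng.standard_normal((n, d))

# Orthonormal basis Z for the column space of A (thin QR; a thin SVD works equally well)
Z, _ = np.linalg.qr(A)

# Leverage score of row i = squared l2 norm of the i-th row of Z
lev = np.sum(Z ** 2, axis=1)

print(lev.max() <= 1 + 1e-12, np.isclose(lev.sum(), d))   # each score <= 1, and they sum to d
```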

  11. Leverage Score Sampling. Given: β > 0 and a distribution (q₁, …, q_n) with qᵢ ≥ β·ℓᵢ/d. LSS(Z, s, q) constructs matrices Ω ∈ ℝ^{n×s} and D ∈ ℝ^{s×s}: for each column j ∈ [s] independently, with replacement, pick a row index i with probability qᵢ, and set Ω_{i,j} = 1 and D_{j,j} = 1/√(qᵢ·s).
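
A minimal construction of the sampling matrices Ω and D following this definition; the function name and the sizes are illustrative assumptions.

```python
import numpy as np

def leverage_score_sampling(q, s, rng):
    """Build Omega (n x s) and diagonal D (s x s): sample s row indices i.i.d. from q."""
    n = len(q)
    idx = rng.choice(n, size=s, replace=True, p=q)   # row picked for sampled column j
    Omega = np.zeros((n, s))
    Omega[idx, np.arange(s)] = 1.0                   # Omega[i, j] = 1 for the chosen row i
    D = np.diag(1.0 / np.sqrt(q[idx] * s))           # D[j, j] = 1 / sqrt(q_i * s)
    return Omega, D

# Example: q_i = l_i / d, i.e. exact normalized leverage scores (beta = 1)
rng = np.random.default_rng(4)
n, d, s = 2_000, 20, 800
A = rng.standard_normal((n, d))
Z, _ = np.linalg.qr(A)
q = np.sum(Z ** 2, axis=1) / d
Omega, D = leverage_score_sampling(q, s, rng)
print((D @ Omega.T @ Z).shape)                       # the sampled-and-rescaled matrix D * Omega^T * Z
```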

  12. LSS as a Subspace Embedding. Thm.: If Z ∈ ℝ^{n×d} has orthonormal columns, s > 144·d·log(2d/δ)/(βε²), and Ω, D are constructed via LSS(Z, s, q), then w.p. 1 − δ, for all i: 1 − ε ≤ σᵢ²(DΩᵀZ) ≤ 1 + ε. (Matrix Chernoff): If X₁, …, X_s are i.i.d. copies of a symmetric random matrix X ∈ ℝ^{d×d} with E[X] = 0, ‖X‖₂ ≤ γ and ‖E[XᵀX]‖₂ ≤ σ², then for W = (1/s)·Σ_{i=1}^{s} Xᵢ and any ε > 0: Pr[‖W‖₂ > ε] ≤ 2d·exp(−s·ε²/(2σ² + 2γε/3)).
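
An empirical sanity check of the theorem's conclusion: under leverage-score sampling, the squared singular values of DΩᵀZ concentrate around 1. The sizes and oversampling here are illustrative, not the exact constants from the theorem.

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, s = 20_000, 10, 4_000

A = rng.standard_normal((n, d))
Z, _ = np.linalg.qr(A)                     # orthonormal columns
q = np.sum(Z ** 2, axis=1) / d             # sampling probabilities q_i = l_i / d

idx = rng.choice(n, size=s, replace=True, p=q)
# Row j of D * Omega^T * Z equals Z[idx[j]] / sqrt(q[idx[j]] * s), so build it directly
DOZ = Z[idx] / np.sqrt(q[idx, None] * s)

sv = np.linalg.svd(DOZ, compute_uv=False)
print(sv.min() ** 2, sv.max() ** 2)        # squared singular values should be close to 1
```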

  13. Proof: LSS as a Subspace Embedding. Let zⱼ = j-th row of Z and let j(i) be the row sampled for the i-th column in LSS(Z, s, q). Define Xᵢ = I_d − z_{j(i)}·z_{j(i)}ᵀ/q_{j(i)}. Then E[Xᵢ] = I_d − Σⱼ qⱼ·(zⱼzⱼᵀ/qⱼ) = I_d − ZᵀZ = 0. Each zⱼzⱼᵀ/qⱼ is a rank-1 matrix with operator norm ‖zⱼ‖₂²/qⱼ ≤ d/β, so ‖Xᵢ‖₂ ≤ 1 + d/β.

  14. Proof: LSS as a Subspace Embedding. E[XᵢᵀXᵢ] = E[(I_d − zⱼzⱼᵀ/qⱼ)²] = I_d − 2·E[zⱼzⱼᵀ/qⱼ] + E[(‖zⱼ‖₂²/qⱼ²)·zⱼzⱼᵀ] = Σⱼ (‖zⱼ‖₂²/qⱼ)·zⱼzⱼᵀ − I_d ⪯ (d/β)·ZᵀZ − I_d, so σ² ≤ d/β. Also (1/s)·Σ_{i=1}^{s} Xᵢ = I_d − (DΩᵀZ)ᵀ(DΩᵀZ). By Matrix Chernoff with γ = 1 + d/β and σ² ≤ d/β, for s = Ω(d·log(d/δ)/(βε²)): Pr[‖I_d − (DΩᵀZ)ᵀ(DΩᵀZ)‖₂ > ε] ≤ δ.

  15. LSS as a Subspace Embedding. Let A = UΣVᵀ (SVD of A). ‖I_d − (DΩᵀU)ᵀ(DΩᵀU)‖₂ ≤ ε means all squared singular values of DΩᵀU are in [1 − ε, 1 + ε], and hence ‖DΩᵀAx‖₂² = (1 ± ε)‖Ax‖₂² for all x. How can we compute q in O(nnz(A)·log n + poly(d)) time?

  16. Thin Singular Value Decomposition. For A ∈ ℝ^{n×d}: U ∈ ℝ^{n×d}, Σ ∈ ℝ^{d×d}, V ∈ ℝ^{d×d} with A = UΣVᵀ (computed in O(nd²) time). U has orthonormal columns, Σ is diagonal, V is unitary (VᵀV = VVᵀ = I). Σᵢᵢ = σᵢ is the i-th singular value. vᵢ = i-th column of V is the i-th right singular vector: Avᵢ = σᵢuᵢ, so ‖Avᵢ‖₂ = σᵢ. Moore-Penrose pseudoinverse: A⁺ = VΣ⁻¹Uᵀ. Least squares solution: x* = A⁺b = VΣ⁻¹Uᵀb, so Ax* = UUᵀb, the projection of b onto the column space of A.
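
A small numpy illustration of the thin SVD, the pseudoinverse, and the least-squares solution it yields; the data is random and only for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
n, d = 2_000, 15
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

# Thin SVD: A = U diag(sigma) V^T with U (n x d), sigma (d,), V^T (d x d)
U, sigma, Vt = np.linalg.svd(A, full_matrices=False)

# Moore-Penrose pseudoinverse A+ = V diag(1/sigma) U^T
A_pinv = Vt.T @ np.diag(1.0 / sigma) @ U.T

# Least-squares solution x* = A+ b; A x* is the projection of b onto the column space of A
x_star = A_pinv @ b
print(np.allclose(A @ x_star, U @ (U.T @ b)))
print(np.allclose(x_star, np.linalg.lstsq(A, b, rcond=None)[0]))
```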

  17. Approximating Leverage Scores. Thm. A constant-factor approximation of the leverage score distribution for A ∈ ℝ^{n×d} can be computed, with constant probability, in O(nnz(A)·log n + poly(d)) time. Let S be a sparse embedding matrix with k = O(d²/ε²) rows for constant ε (Count-Sketch matrix). One non-zero entry per column of S ⟹ SA is computed in nnz(A) time. QR-factorization: QR = SA, where Q has orthonormal columns and R is upper triangular (takes O(kd²) time using e.g. Gram-Schmidt). Approximate scores: ℓ̃ᵢ = ‖eᵢᵀAR⁻¹G‖₂², where G ∈ ℝ^{d×t} is a matrix of i.i.d. N(0, 1/t) random variables with t = O(log n/ε²); compute R⁻¹G in O(d²·log n/ε²) time and A(R⁻¹G) in O(nnz(A)·log n/ε²) time.
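
A rough end-to-end sketch of this pipeline in numpy/scipy, assuming a Count-Sketch embedding and a dense Gaussian G; the constants in k and t are illustrative, the failure-probability bookkeeping from the theorem is omitted, and the exact-score comparison at the end is only for validation (and is slow).

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(7)
n, d = 50_000, 20
A = sparse.random(n, d, density=0.05, format='csr', random_state=7)

# Count-Sketch embedding S (k x n): one +/-1 entry per column, placed in a random row
k = 4 * d * d                                   # k = O(d^2) for constant eps (illustrative constant)
rows = rng.integers(0, k, size=n)
signs = rng.choice([-1.0, 1.0], size=n)
S = sparse.csr_matrix((signs, (rows, np.arange(n))), shape=(k, n))

SA = (S @ A).toarray()                          # SA computed in nnz(A) time
_, R = np.linalg.qr(SA)                         # QR of the small k x d sketch

# Gaussian JL matrix G (d x t) with t = O(log n); scores = squared row norms of A R^{-1} G
t = int(8 * np.log(n))
G = rng.standard_normal((d, t)) / np.sqrt(t)
RG = np.linalg.solve(R, G)                      # R^{-1} G, a small d x t solve
approx_lev = np.sum(np.asarray(A @ RG) ** 2, axis=1)

# Validation against exact leverage scores (only to check the approximation)
Z, _ = np.linalg.qr(A.toarray())
exact_lev = np.sum(Z ** 2, axis=1)
mask = exact_lev > 1e-10
print(np.median(approx_lev[mask] / exact_lev[mask]))   # should be a constant close to 1
```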

  18. Approximating Leverage Scores. Since S is a subspace embedding for A, ‖SAR⁻¹x‖₂² = (1 ± ε)‖AR⁻¹x‖₂² for all x, and SAR⁻¹ = Q has orthonormal columns, so ‖SAR⁻¹x‖₂² = ‖x‖₂²; hence the singular values of AR⁻¹ lie in [1 − ε, 1 + ε]. Write AR⁻¹ = UT, where U is an orthonormal basis for the column space of A. The singular values of T are in [1 − 2ε, 1 + 2ε]: otherwise for some unit vector x we would have ‖SAR⁻¹x‖₂² = 1 but ‖AR⁻¹x‖₂² = ‖Tx‖₂² outside [1 − 2ε, 1 + 2ε], contradicting the embedding guarantee. Therefore ‖eᵢᵀAR⁻¹‖₂² = ‖eᵢᵀUT‖₂² = (1 ± 2ε)·‖eᵢᵀU‖₂² = (1 ± 2ε)·ℓᵢ, and the Gaussian matrix G preserves these row norms up to another (1 ± ε) factor. Thus ℓ̃ᵢ ≥ (1 − ε)(1 − 2ε)·ℓᵢ, with the analogous upper bound, i.e. a constant-factor approximation.

  19. Least Squares Regression. The sketch dimension O(d²/ε²) can be reduced to O(d/ε²) by using the composed sketch matrix S' = Π·S, where Π is a dense OSE. Instead of using leverage scores we could also just use S as an OSE and solve LSR in O(nnz(A) + poly(d/ε)) time. Implementations of these sketches are available in Skylark: https://github.com/xdata-skylark/libskylark

  20. ℓ₁-regression. ℓ₂-regression is too sensitive to outliers. Instead minimize ‖Ax − b‖₁ = Σ_{i=1}^{n} |⟨Aᵢ, x⟩ − bᵢ|, where Aᵢ is the i-th row of A. No closed-form solution; the best running time via linear programming is poly(n, d). Maximum Likelihood Estimators for noisy data: ℓ₂ is the MLE if the noise is Gaussian, ℓ₁ is the MLE if the noise is Laplacian. ℓ₁ subspace embedding S: ‖SAx‖₁ = (1 ± ε)‖Ax‖₁. Next time: approximate ℓ₁ regression in O(n·poly(d)) time.
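
For contrast with the sketching approach, here is a minimal LP formulation of ℓ₁ regression using scipy (variables x and slacks t with −t ≤ Ax − b ≤ t); the sizes and data are illustrative assumptions.

```python
import numpy as np
from scipy import sparse
from scipy.optimize import linprog

rng = np.random.default_rng(8)
n, d = 500, 5
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + rng.laplace(scale=0.5, size=n)

# Variables z = [x (d entries), t (n entries)]; minimize sum(t) s.t. -t <= Ax - b <= t
c = np.concatenate([np.zeros(d), np.ones(n)])
I = sparse.eye(n)
A_ub = sparse.bmat([[sparse.csr_matrix(A), -I],
                    [sparse.csr_matrix(-A), -I]], format='csr')
b_ub = np.concatenate([b, -b])
bounds = [(None, None)] * d + [(0, None)] * n    # x free, t >= 0

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method='highs')
x_l1 = res.x[:d]
print(np.sum(np.abs(A @ x_l1 - b)))              # optimal l1 residual
```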
