
Efficient Algorithms: From Robustness to Efficiency
Discover how the design of robust algorithms can lead to efficient solutions in the context of iterative methods corrupted by noise. Explore the relationship between robustness and efficiency in algorithms like gradient descent and the ball walk method, considering their resilience to different types of noise and potential efficiency improvements.
Presentation Transcript
From Robustness to Efficiency
Yin Tat Lee, University of Washington
Theme
How does designing robust algorithms lead to fast algorithms? Consider an iterative method corrupted by noise:
$x \leftarrow x + h \cdot (F(x + \mathrm{noise}_{\mathrm{input}}) + \mathrm{noise}_{\mathrm{output}})$,
where $F$ is a vector function and $h$ is the step size. An algorithm is robust to a certain noise if the noise does not hurt its convergence rate.
Warning: this talk focuses on asymptotically faster, weakly polynomial-time algorithms.
Example 1: Gradient Descent
Problem: $\min_x f(x)$. Question: how is robustness related to efficiency?
Gradient descent: $x \leftarrow x - h \nabla f(x)$.
Fact*: gradient descent is robust under unbiased output $\ell_2$ noise. Namely, the algorithm $x \leftarrow x - h(\nabla f(x) + \eta)$ gives a similar result to gradient descent if $\mathbb{E}[\eta] = 0$ and $\|\eta\|_2$ is bounded.
Why is this useful for efficiency? We can replace $\nabla f(x)$ by some random $g$ as long as $\mathbb{E}[g] = \nabla f(x)$ and $\|g\|_2$ is bounded. This leads to stochastic gradient descent (by sampling the examples) and coordinate descent (by sampling the coordinates).
* Not a precise statement, because the convergence rate depends on the function class.
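For concreteness, here is a minimal stochastic gradient descent sketch in Python for least squares (the example matrix A and targets b are hypothetical inputs, not from the talk). Sampling one row gives an unbiased gradient estimate, which is exactly the noise model the fact above tolerates.

```python
import numpy as np

def sgd_least_squares(A, b, steps=1000, h=0.01, seed=0):
    """Minimize f(x) = 1/(2m) ||Ax - b||^2 by sampling one example per step."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.zeros(n)
    for _ in range(steps):
        i = rng.integers(m)               # sample one example
        g = A[i] * (A[i] @ x - b[i])      # E[g] = grad f(x): unbiased noise
        x -= h * g                        # x <- x - h(grad f(x) + noise)
    return x
```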
Example 2: Ball Walk
Problem: sample $x$ uniformly from $\{Ax \le b\}$, where $m$ = # of constraints and $n$ = # of variables.
Consider the ball walk (Lovász 90): loop: generate $y \sim N(0, I)$; if $A(x + \delta y) \le b$, set $x \leftarrow x + \delta y$.
Each step takes $O(mn)$ time, since we check $a_i^\top x \le b_i$ for all $i \in [m]$. Is this algorithm robust? If $a_i^\top x \ll b_i$, changing $x$ a lot does not affect this particular constraint, so the walk is robust to biased input noise.
Why is this useful for efficiency? It leads to $O(n)$ amortized time per step via lazy updates: when the algorithm is robust, we can delay some calculations and avoid checking every $a_i^\top x \le b_i$ at every step. Let $s_i = b_i - a_i^\top x$ (normalize $\|a_i\|_2 = 1$). If $s_i = t$, constraint $i$ need not be checked again until roughly $t^2$ iterations later (Mangoubi-Vishnoi 19).
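Below is a simplified lazy ball walk sketch. For safety it uses a deterministic skip rule (a constraint with slack $s$ cannot be violated within $s/\delta$ steps of length $\delta$), which is a cruder stand-in for the $t^2$ random-walk schedule of Mangoubi-Vishnoi; A, b, x0, and delta are assumed inputs.

```python
import numpy as np

def lazy_ball_walk(A, b, x0, delta, steps, seed=0):
    """Ball walk on {Ax <= b} with lazily checked constraints (a sketch).

    Assumes rows are normalized, ||a_i||_2 = 1.  Each step moves x by at
    most delta, so lb[i] below is always a valid lower bound on the slack
    b_i - a_i^T x, and only constraints with a small bound need the O(n)
    exact recomputation.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = x0.copy()
    lb = b - A @ x                            # exact slacks at the start
    for _ in range(steps):
        y = rng.standard_normal(n)
        y *= delta / np.linalg.norm(y)        # proposal of length delta
        risky = np.flatnonzero(lb < 2 * delta)    # only these can fail
        if np.all(A[risky] @ (x + y) <= b[risky]):
            x += y
            lb -= delta                       # every slack shrinks by <= delta
            lb[risky] = b[risky] - A[risky] @ x   # refresh exact values
    return x
```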
Newton Method
Gradient descent is a first-order method and the ball walk is a zeroth-order method. How about Newton's method?
$x \leftarrow x - (\nabla^2 f(x))^{-1} \nabla f(x)$.
Theorem [Nesterov-Nemirovskii 89]: Newton's method converges if $f$ is self-concordant, i.e. $|f'''(x)| \le 2 f''(x)^{3/2}$ along every line. Any convex function can be approximated by a sequence of self-concordant functions.
Our Result (Robust Interior Point Method): if $f(x) = \sum_i \phi_i(a_i^\top x)$ for some self-concordant $\phi_i$, then Newton's method is robust to biased input noise. This robustness leads to efficiency improvements.
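A minimal damped-Newton sketch (grad and hess are assumed callables supplied by the user). The step size $1/(1+\lambda)$, with $\lambda$ the Newton decrement, is the classical choice that makes Newton's method globally convergent on self-concordant functions; once $\lambda < 1$, convergence becomes quadratic.

```python
import numpy as np

def damped_newton(grad, hess, x, iters=50):
    """Damped Newton iteration for a self-concordant f (a sketch)."""
    for _ in range(iters):
        g, H = grad(x), hess(x)
        dx = np.linalg.solve(H, g)       # Newton direction H^{-1} g
        lam = np.sqrt(g @ dx)            # Newton decrement
        x = x - dx / (1.0 + lam)         # damped step, safe from any start
    return x
```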
Consequence of Robust Interior Point Method
Linear programs $\min_{Ax = b,\, x \ge 0} c^\top x$ (Dantzig), extended to generalized linear models and semidefinite programs.
If we can solve the linear system $Ax = b$ in $n^\omega$ time, the linear program can be solved in roughly the same time.
Robust Interior Point Method
Newton's method can be made robust to input noise. (Cohen-Lee-Song 19, v.d.Brand 20)
Formula for Newton Method
Consider the problem $\min_x \sum_i \phi_i(a_i^\top x)$, where $a_i$ is the $i$-th row of the data matrix $A$ and each $\phi_i$ is self-concordant. We compute the gradient and Hessian of $f$:
$\nabla f(x) = A^\top g(x)$, where $g_i(x) = \phi_i'(a_i^\top x)$;
$\nabla^2 f(x) = A^\top D(x) A$, where $D(x)$ is diagonal with $D_{ii}(x) = \phi_i''(a_i^\top x)$.
Newton Method: $x \leftarrow x - (A^\top D(x) A)^{-1} A^\top g(x)$.
Interior Point Method: for $k = 1, 2, \ldots, O(\sqrt{m} \log(1/\epsilon))$, take one Newton step; output $x$.
Theorem [Nesterov-Nemirovskii 89]: this outputs an $\epsilon$-accurate solution. For simplicity, you can think of all the $\phi_i$ as the same function $\phi$.
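A sketch of this Newton step in Python; phi1 and phi2 are assumed callables returning the elementwise derivatives $\phi'$ and $\phi''$, and the commented log-barrier lines show one concrete choice.

```python
import numpy as np

def newton_step_glm(A, x, phi1, phi2):
    """One Newton step for f(x) = sum_i phi_i(a_i^T x) (a sketch)."""
    s = A @ x
    g = phi1(s)                          # g_i = phi_i'(a_i^T x)
    d = phi2(s)                          # D_ii = phi_i''(a_i^T x)
    H = A.T @ (d[:, None] * A)           # A^T D A without forming diag(d)
    return x - np.linalg.solve(H, A.T @ g)

# Log-barrier example for the constraint Ax < b:
#   phi1 = lambda s: 1.0 / (b - s)
#   phi2 = lambda s: 1.0 / (b - s) ** 2
```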
Robust Interior Point Method
It is easier to work with $s = Ax$: each $\phi_i(a_i^\top x)$ becomes $\phi_i(s_i)$. The $\ell_2$ version of the following is well-known.
Loop for $O(\sqrt{m} \log(1/\epsilon))$ iterations: update $\bar{s}$ so that it stays close to $s$, and implicitly perform the Newton step using $\bar{s}$. Output $x$ at the end (using $s = Ax$).
Theorem [Cohen-Lee-Song 19]: this outputs an $\epsilon$-accurate solution.
Advantage: if we required $\bar{s}$ to stay $\ell_2$-close to $s$, the updates would be dense; with $\ell_\infty$-closeness the updates to $\bar{s}$ can be sparse, and we don't need to compute $s$ exactly. We only need a data structure for $\bar{s}$ that tells us which $s_i$ changed a lot and outputs $x$ at the end.
Total time: $n^{2.38}$ in the general case, and faster in special cases. The rest of the talk focuses on how to use RIPM.
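A minimal sketch of such a data structure, using the greedy $\ell_\infty$-threshold rule described two slides below; the class name and the explicit copy of the true vector are illustrative, not the paper's actual implementation (which tracks $s$ only implicitly).

```python
import numpy as np

class ApproxVector:
    """Maintains s_bar with ||s_bar - s||_inf <= eps (a sketch)."""

    def __init__(self, s, eps=0.5):
        self.s = s.copy()          # true vector (kept explicitly here)
        self.s_bar = s.copy()      # sparsely-changing approximation
        self.eps = eps

    def add(self, ds):
        """Apply the step s += ds; return the coordinates refreshed in s_bar."""
        self.s += ds
        stale = np.flatnonzero(np.abs(self.s - self.s_bar) > self.eps)
        self.s_bar[stale] = self.s[stale]     # sparse update
        return stale
```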
Example: $n^{2.38}$ Time
Update when necessary, and batch the updates when possible. (Cohen-Lee-Song 19, v.d.Brand 20)
Why is $\bar{s}$ useful?
Setting: assume $m \approx n$.
Classical version (without fast matrix multiplication): each step computes the term $(A^\top D A)^{-1}$, which takes $n^3$ time. In total, this takes $n^3 \cdot \#\text{steps} = n^{3.5}$ time.
Robust version: each step maintains the term $(A^\top \bar{D} A)^{-1}$ instead. By the Sherman-Morrison formula, a rank-one update of the inverse takes $n^2$ time, so each coordinate change in $\bar{s}$ costs $n^2$ time. In total, this takes $n^2 \cdot \#\text{changes}$ time.
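A sketch of the Sherman-Morrison update (function and variable names are illustrative). A change of $d$ in one diagonal entry $D_{ii}$ changes $A^\top D A$ by $d \cdot a_i a_i^\top$, i.e. $u = d \cdot a_i$ and $v = a_i$ below.

```python
import numpy as np

def sherman_morrison(Minv, u, v):
    """Given Minv = M^{-1}, return (M + u v^T)^{-1} in O(n^2) time."""
    Minv_u = Minv @ u                        # M^{-1} u
    denom = 1.0 + v @ Minv_u                 # 1 + v^T M^{-1} u
    return Minv - np.outer(Minv_u, v @ Minv) / denom
```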
# of coordinate changes
Question: does the runtime depend on the number of steps? What if we use small steps?
Lemma: each step satisfies $\|x^{(\text{new})} - x\|_{\nabla^2 f(x)} \le 1$, which (after normalization) means $\|s^{(\text{new})} - s\|_2 \le 1$.
Answer: the runtime depends on the number of changes to $\bar{s}$. That depends on the $\ell_2$ movement of $s$, and is independent of the number of steps. Recall that our robust method only needs $\bar{s}$ with $\|\bar{s} - s\|_\infty \le 1/2$.
Greedy schedule for $\bar{s}$: at every step, if $|\bar{s}_i - s_i| \ge 1/2$, set $\bar{s}_i = s_i$.
Question: what is the total number of updates to the coordinates of $\bar{s}$?
Answer: $\tilde{O}(n)$. This is INSANE: each coordinate is updated only a few times! This gives roughly $n^3 \log(1/\epsilon)$ total time. (We ignored normalization in the norms and many log factors.)
Batch Update
How many updates happen at each step? Consider two cases: $k$ steps that each update the inverse by a rank-1 change, versus 1 step that updates the inverse by a rank-$k$ change. Which is faster? The batched version, thanks to fast matrix multiplication (or GPUs, or communication costs, or ...); see the sketch below.
Preemptive schedule for $\bar{s}$: update preemptively so that updates can be batched. At the $k$-th step, for each $j$ such that $k$ is a multiple of $2^j$: if $|s_i^{(k)} - s_i^{(k - 2^j)}| \ge 1/\log n$, set $\bar{s}_i = s_i^{(k)}$. Namely, if $s_i$ moved a lot during the last $2^j$ steps, update $\bar{s}_i$. (Compare the greedy schedule: at every step, if $|\bar{s}_i - s_i| \ge 1/2$, set $\bar{s}_i = s_i$.)
This gives $n^{2.38}$ time for generalized linear models.
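The batched rank-$k$ update can be done with the Woodbury identity (a standard linear-algebra fact, not the paper's exact data structure; names are illustrative). Replacing $k$ separate $n^2$-cost rank-one updates by a few $n \times k$ matrix products is where fast matrix multiplication or a GPU pays off.

```python
import numpy as np

def woodbury_update(Minv, U, V):
    """Given Minv = M^{-1}, return (M + U V^T)^{-1} for n x k matrices U, V."""
    k = U.shape[1]
    Minv_U = Minv @ U                            # n x k
    S = np.eye(k) + V.T @ Minv_U                 # k x k capacitance matrix
    return Minv - Minv_U @ np.linalg.solve(S, V.T @ Minv)
```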
Other Applications / Related Work
Even faster algorithms for linear programs (if matrix multiplication takes $n^2$ time):
$n^{2+1/6}$ [Cohen-Lee-Song 19, Lee-Song-Zhang 19, v.d.Brand 20]
$n^{2+1/18}$ [Jiang-Song-Weinstein-Zhang 21, v.d.Brand 21]
Nearly linear time for LP with small treewidth [Dong-Lee-Ye 21]
Faster simplex method [v.d.Brand 21]
Faster algorithms for semidefinite programming [JKLPS 20, HJST 21]
Faster algorithms for bipartite matching [BLNPSSSW 20]
Faster algorithms for min-cost flow [BLLSSSW 21, DGGLPSY 22]
Using their $\ell_1$ robust IPM, [CKLPGS 22] gives the first $m^{1+o(1)}$-time algorithm for min-cost flow!
Open Problem: LP in input sparsity time
Given a linear program $\min_{Ax \le b} c^\top x$ with $m$ constraints and $n$ variables, where $\|a_i\|_2 = 1$, the feasible domain contains a unit ball, and it is contained in a ball of radius $R$:
find $x$ with $Ax \le b$ and $c^\top x \le \min_{Ax \le b} c^\top x + \epsilon$ in $\|A\|_0^{1+o(1)} \log(R/\epsilon)$ time.
The current bound is $\tilde{O}(\|A\|_0 \sqrt{n} + n^{2.5})$.