Distributed Ridge Regression with Random Projections in Large-scale Optimization
In large-scale optimization, datasets are often too large to store and process on a single machine, making distributed computing clusters essential. This article walks through distributed ridge regression with random projections, addressing the twin challenges of how to distribute the problem and how to limit communication. It covers the Loco and CoCoA algorithms, the subsampled randomized Hadamard transform (SRHT), and an analysis of how closely the resulting coefficient estimates match the full ridge regression solution.
Loco: Distributing Ridge Regression with Random Projections
Yang Song, Department of Statistics
Introduction In the last few years there has been great interest in solving large-scale optimization and estimation problems. Some datasets are so large that they are impractical to store and process on a single machine, so the problem must be solved in a distributed manner on a computing cluster. Two questions immediately arise: 1. Distribution: how should the data and computation be split across workers? 2. Communication: how can the amount of data exchanged between workers be kept small?
Ridge Regression Linear regression model: $Y = X\beta + \varepsilon$, with $X \in \mathbb{R}^{n \times p}$ and $Y \in \mathbb{R}^n$. To minimize the squared error $\|Y - X\beta\|_2^2$, OLS gives $\hat{\beta}_{\mathrm{OLS}} = (X^\top X)^{-1} X^\top Y$. Ridge regression adds an $\ell_2$ penalty,
$$\hat{\beta}_{\lambda} = \arg\min_{\beta} \|Y - X\beta\|_2^2 + \lambda \|\beta\|_2^2 = (X^\top X + \lambda I_p)^{-1} X^\top Y,$$
which is well defined even when $p > n$.
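As a quick sanity check on the formulas above, here is a minimal NumPy sketch of the closed-form OLS and ridge solutions; the names (ols, ridge, lam) are illustrative, not from the presentation.

```python
import numpy as np

def ols(X, Y):
    # OLS solution: beta = (X^T X)^{-1} X^T Y (requires X^T X invertible, i.e. n >= p)
    return np.linalg.solve(X.T @ X, X.T @ Y)

def ridge(X, Y, lam):
    # Ridge solution: beta = (X^T X + lam * I)^{-1} X^T Y; well-defined even when p > n
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
beta_true = rng.standard_normal(10)
Y = X @ beta_true + 0.1 * rng.standard_normal(50)
print(ridge(X, Y, lam=1.0))
```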
Distributed ridge regression $K$ workers. $P = \{1, 2, \ldots, p\}$: the set of feature indices. $\tau = p/K$: the size of each subset. $P = \bigcup_{k=1}^{K} P_k$, a partition into $K$ non-overlapping subsets of size $\tau$. $X_k \in \mathbb{R}^{n \times \tau}$: the sub-matrix of $X$ whose columns are indexed by $P_k$.
CoCoA Each worker $k$ uses $X_k$ and $Y$ to estimate the coefficient sub-vector $\hat{\beta}_k$. Concatenate all the $\hat{\beta}_k$ to get $\hat{\beta}$.
Subsampled Randomized Hadamard Transform (SRHT) The SRHT maps a $\tau$-dimensional block down to $\tau_{\mathrm{subs}}$ dimensions via the projection matrix $\Pi = \sqrt{\tau / \tau_{\mathrm{subs}}}\, D H S$, where $D$ is a $\tau \times \tau$ diagonal matrix of independent Rademacher ($\pm 1$) signs, $H$ is the $\tau \times \tau$ Walsh-Hadamard matrix normalized by $1/\sqrt{\tau}$, and $S$ selects $\tau_{\mathrm{subs}}$ columns uniformly at random.
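A minimal sketch of an SRHT projection under the definition above, assuming the block width is a power of two (required by SciPy's dense Hadamard construction; production code would use an O(n tau log tau) fast Walsh-Hadamard transform instead):

```python
import numpy as np
from scipy.linalg import hadamard

def srht(X_block, t_subs, rng):
    """Project the columns of X_block (n x tau) down to t_subs columns via SRHT."""
    n, tau = X_block.shape                              # tau must be a power of two here
    D = rng.choice([-1.0, 1.0], size=tau)               # random Rademacher signs
    H = hadamard(tau) / np.sqrt(tau)                    # normalized Walsh-Hadamard matrix
    idx = rng.choice(tau, size=t_subs, replace=False)   # uniform column subsampling
    # Compute X D H, keep t_subs columns, and rescale so squared norms stay unbiased
    return np.sqrt(tau / t_subs) * ((X_block * D) @ H)[:, idx]

rng = np.random.default_rng(0)
X_block = rng.standard_normal((100, 64))
Z = srht(X_block, t_subs=16, rng=rng)
print(Z.shape)  # (100, 16)
```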
Computational, memory and communication costs Using the fast Walsh-Hadamard transform, the cost of computing the random projection in each block is $O(n \tau \log \tau)$, versus $O(n \tau \tau_{\mathrm{subs}})$ for a dense Gaussian projection. The memory cost per worker is $O(n(\tau + \tau_{\mathrm{subs}}))$: the raw local block plus the projected features received from the other workers. Each worker communicates only an $n \times \tau_{\mathrm{subs}}$ matrix of random features.
Benefits of Loco 1. The problem each worker solves becomes easier in a computational sense. 2. Each local problem becomes easier in a statistical sense. 3. The size of the random projections to be communicated by each worker decreases. A single-process sketch of the full pipeline is given below.
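To make the pipeline concrete, here is a single-process sketch that simulates the K workers. The specific design choice of appending the sum of the other workers' projections to each raw block is an assumption consistent with the description above, not code from the talk, and a Gaussian projection stands in for the SRHT for brevity.

```python
import numpy as np

def loco_ridge(X, Y, K, t_subs, lam, rng):
    """Sketch of Loco: partition features, project each block, solve local
    ridge problems, and concatenate the raw-feature coefficients."""
    n, p = X.shape
    blocks = np.array_split(np.arange(p), K)   # non-overlapping feature subsets P_k
    # Each worker projects its own block down to t_subs dimensions.
    # (A Gaussian projection stands in for the SRHT to keep the sketch short.)
    proj = [X[:, b] @ (rng.standard_normal((len(b), t_subs)) / np.sqrt(t_subs))
            for b in blocks]
    beta = np.empty(p)
    for k, b in enumerate(blocks):
        # Worker k: its own raw features plus the other workers' projections
        others = sum(proj[j] for j in range(K) if j != k)  # assumed: summed projections
        Xbar = np.hstack([X[:, b], others])
        d = Xbar.shape[1]
        coef = np.linalg.solve(Xbar.T @ Xbar + lam * np.eye(d), Xbar.T @ Y)
        beta[b] = coef[:len(b)]                # keep only the raw-feature coefficients
    return beta

rng = np.random.default_rng(0)
n, p = 200, 400
X = rng.standard_normal((n, p))
Y = X @ rng.standard_normal(p) + 0.1 * rng.standard_normal(n)
beta_hat = loco_ridge(X, Y, K=4, t_subs=50, lam=1.0, rng=rng)
print(beta_hat.shape)  # (400,)
```

Each worker returns only the coefficients of its own raw features; concatenating them gives the final estimate, with the projected features serving only to approximate the influence of the other blocks.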
Analysis Are the coefficients estimated by Loco close to the full ridge regression solution? We compare the risk, $R(\hat{\beta}) = \frac{1}{n}\, \mathbb{E}\|X\beta - X\hat{\beta}\|_2^2$. Natural assumption: most of the important signal lies in the direction of the first $J$ principal components of $X$.
Experimental Results Simulated data: n = 4,000; p = 150,000; rank r = 150; n_test = 1,000; within-block correlation 0.7; signal-to-noise ratio 1. Variants run: Loco 1, Loco 5, Loco 10.
A larger simulation: n = 8,000; p = 500,000; rank r = 500. Variants run: Loco 1, Loco 2.
Climate data The data are part of the CMIP5 climate modeling ensemble; specifically, they are taken from control simulations of the GISS global circulation model. p = 10,368; n = 1,062 (n_train = 849, n_test = 213).
Conclusion When p >> n, ridge regression should be used rather than ordinary least squares. Loco is a distributed algorithm that greatly reduces time and memory costs at the price of only a small additional prediction error. Loco can be generalized to a larger class of estimation problems.