Implementing Randomized Matrix Algorithms in Parallel and Distributed Environments, Michael Mahoney

Motivated by problems in large-scale data analysis, randomized algorithms for matrix problems such as regression and low-rank matrix approximation have been the focus of a great deal of attention in recent years. These algorithms exploit novel random sampling and random projection methods; and implementations of these algorithms have already proven superior to traditional state-of-the-art algorithms, as implemented in LAPACK and high-quality scientific computing software, for moderately-large problems stored in RAM on a single machine. Here, we describe the extension of these methods to computing high-precision solutions in parallel and distributed environments that are more common in very large-scale data analysis applications.
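To make the random-projection idea concrete, here is a minimal sketch-and-solve illustration in NumPy. It is not the paper's algorithm, only the basic primitive it builds on: compress a tall least-squares problem with a Gaussian random projection and solve the much smaller sketched problem. The dimensions and sketch size `s` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, s = 10_000, 50, 500  # tall problem; sketch size s satisfies n << s << m
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n) + 0.01 * rng.standard_normal(m)

# Gaussian random projection: compress the m rows down to s rows,
# then solve the small s-by-n least-squares problem instead.
S = rng.standard_normal((s, m)) / np.sqrt(s)
x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)

# Compare residuals against the exact solution of the full problem.
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
rel_err = np.linalg.norm(A @ x_sketch - b) / np.linalg.norm(A @ x_exact - b)
```

The sketched solution's residual is within a modest factor of the optimum, which is the low-precision guarantee that the high-precision methods in the abstract then refine.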
In particular, we consider both the Least Squares Approximation problem and the Least Absolute Deviation problem, and we develop and implement randomized algorithms that take advantage of modern computer architectures in order to achieve improved communication profiles. Our iterative least-squares solver, LSRN, is competitive with state-of-the-art implementations on moderately-large problems; and, when coupled with the Chebyshev semi-iterative method, scales well for solving large problems on clusters that have high communication costs, such as on an Amazon Elastic Compute Cloud cluster. Our iterative least-absolute-deviations solver is based on fast ellipsoidal rounding, random sampling, and interior-point cutting-plane methods; and we demonstrate significant improvements over traditional algorithms on MapReduce. In addition, this algorithm can also be extended to solve more general convex problems on MapReduce.
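The core mechanism behind LSRN-style solvers is randomized preconditioning: sketch the matrix, factor the small sketch, and use that factor as a right preconditioner so a subsequent iterative solver (e.g. LSQR or Chebyshev semi-iteration) converges in few, communication-friendly iterations. The sketch below, a simplified illustration rather than the LSRN implementation, shows only the preconditioning step on an assumed ill-conditioned test matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 5_000, 40

# Build a deliberately ill-conditioned tall matrix (condition number ~1e6).
U, _ = np.linalg.qr(rng.standard_normal((m, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U @ np.diag(np.logspace(0, 6, n)) @ V.T

# Randomized preconditioning: sketch A with a Gaussian matrix G,
# take the SVD of the small sketch G @ A, and use N = V * Sigma^{-1}
# as a right preconditioner for A.
s = 2 * n
G = rng.standard_normal((s, m))
_, sigma, Vt = np.linalg.svd(G @ A, full_matrices=False)
N = Vt.T / sigma  # divide column i of V by sigma[i]

cond_before = np.linalg.cond(A)
cond_after = np.linalg.cond(A @ N)
```

After preconditioning, `A @ N` has a condition number of order one, so an iterative least-squares solver on the preconditioned system needs only a small, predictable number of iterations, which is what makes the approach attractive when each iteration incurs high communication cost.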