Finding Structures of Large Matrices Through Compression


Kai Zhang (Lawrence Berkeley National Laboratory)
Chuanren Liu (Drexel University)
Jie Zhang (Fudan University)
Hui Xiong (Rutgers University)
Eric Xing (Carnegie Mellon University)
Jieping Ye (University of Michigan)

The problem of matrix decomposition is to learn a compact representation of a matrix while simultaneously preserving most of its properties; it is a fundamental building block in modern scientific computing and big data applications. Currently, even state-of-the-art solutions still require the entire input matrix to generate the desired factorizations, causing a major computational and memory bottleneck. In this paper, we uncover an interesting theoretical connection between matrix low-rank decomposition and \emph{lossy data compression}, based on which a cascaded compression sampling framework is devised to approximate an $m\times n$ matrix in only $\mathcal{O}(m+n)$ time and space. The proposed method accesses only a small number of matrix rows and columns, which significantly reduces the memory footprint. Meanwhile, by cascading two rounds of approximation procedures and upgrading the sampling strategy from uniform probabilities to a more sophisticated, encoding-oriented sampling, the algorithm gains a significant boost in accuracy and uncovers more granular structures in the data. Empirical results on a wide spectrum of real-world, large-scale matrices show that, while taking only linear time and space, our method rivals the accuracy of state-of-the-art randomized algorithms that consume a quadratic, $\mathcal{O}(mn)$, amount of resources.
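To make the row/column sampling idea concrete, below is a minimal NumPy sketch of a two-round, CUR-style approximation in the spirit of the cascaded framework described above. It is an illustration under assumptions, not the authors' exact algorithm: the function name `cascaded_low_rank`, the pseudoinverse core, and the energy-based probabilities used in the second round (a simple stand-in for the paper's encoding-oriented sampling) are all hypothetical choices.

```python
import numpy as np

def cascaded_low_rank(A, k, seed=0):
    """Two-round row/column sampling sketch of a CUR-style low-rank
    approximation (illustrative; not the paper's exact procedure).

    Round 1 samples uniformly to get a coarse approximation; round 2
    re-samples with probabilities derived from that approximation's
    row/column energies. Only k rows and k columns of A are kept per
    round, so the returned factors need O((m + n) * k) storage.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape

    # Round 1: uniform row/column sampling.
    cols = rng.choice(n, size=k, replace=False)
    rows = rng.choice(m, size=k, replace=False)
    C, R = A[:, cols], A[rows, :]
    U = np.linalg.pinv(A[np.ix_(rows, cols)])   # k x k core
    # Coarse approximation (formed densely here for clarity;
    # a linear-time variant would avoid materializing it).
    A1 = C @ U @ R

    # Round 2: importance sampling guided by the coarse approximation.
    p = np.sum(A1**2, axis=0); p /= p.sum()     # column energies
    q = np.sum(A1**2, axis=1); q /= q.sum()     # row energies
    cols2 = rng.choice(n, size=k, replace=False, p=p)
    rows2 = rng.choice(m, size=k, replace=False, p=q)
    C2, R2 = A[:, cols2], A[rows2, :]
    U2 = np.linalg.pinv(A[np.ix_(rows2, cols2)])
    return C2, U2, R2                           # A ~= C2 @ U2 @ R2
```

For example, `C2, U2, R2 = cascaded_low_rank(A, k=50)` yields factors whose product approximates $A$ while storing only $\mathcal{O}((m+n)k)$ numbers; note that this sketch builds the dense intermediate $A_1$ for readability, whereas a genuinely linear-time implementation would derive the second-round probabilities from the sampled factors directly.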
