Josh Alman. Algorithms and Barriers for Fast Matrix Multiplication

Josh Alman. Algorithms and Barriers for Fast Matrix Multiplication. 11/19/2021

Matrix multiplication is one of the most basic algebraic operations. Since Strassen's surprising breakthrough algorithm from 1969, which showed that matrices can be multiplied faster than the most straightforward algorithm, algorithmic problems from nearly every area of computer science have been sped up by clever reductions to matrix multiplication. It is popularly conjectured that two n × n matrices can be multiplied in roughly O(n^2) time, but we don't yet have the algorithmic techniques needed to achieve this.
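Strassen's trick is to multiply 2 × 2 (block) matrices with seven products instead of the eight that the straightforward formula uses; applied recursively, this gives the O(n^2.81) bound. A minimal sketch in Python (the function name and scalar entries are illustrative; the same seven identities apply when the entries are matrix blocks):

```python
def strassen_2x2(A, B):
    """Multiply 2x2 matrices using Strassen's seven products."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    # The seven Strassen products (vs. eight in the naive formula).
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    # Reassemble the product from the seven terms.
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]
```

For example, `strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]])` returns `[[19, 22], [43, 50]]`, matching the ordinary product.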

In this talk, I'll give an overview of what's known about matrix multiplication, including some exciting recent progress. I'll talk about the techniques used to design these algorithms, leading up to the fastest known algorithm for matrix multiplication, which runs in time O(n^2.37286), which I recently designed with Virginia Vassilevska Williams. I'll then describe new barrier results which help to explain why we've been stuck for so long on this problem, and describe what kinds of insights are needed for further improvements.
Comments

All these people seem to forget that for a 4096×4096 matrix (relatively small) multiplied with another one of the same size, Strassen makes 7 recursive calls per level just to skip one multiplication. If you want real-world speed, don't do recursion; that's like adding loops within loops. And without recursion you can't divide both matrices into blocks to run the algorithm in the first place.
Recursion creates more overhead than just doing n^3 multiplications. Function calls aren't free.
In a real-world scenario you are better off *optimizing the multiplication itself* using bitwise operations, basically telling your CPU explicitly how to do it fast.
For example, in JavaScript 142*2 takes a minimum of 0.080 ms, and rewritten as a bitwise for loop it consistently takes ~0.004 ms. I know JavaScript is not treated well because "it's not performant" or whatever, but math scientists keep using Python, which has the exact same problems.

danser_theplayer

Josh, you seem to be laughing all the time. Sure, decreasing ω from 2.37287 to 2.37286 sounds ridiculous, so it deserves a good laugh. 😂 Good work, by the way.

paulbizard

Big constants make it meh. Speed-ups come from SIMD intrinsics, cache blocking, prefetching, parallelization, tensors on tailored hardware accelerators, etc.

axe
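One of the techniques this comment lists, cache blocking (tiling), can at least be shown structurally in Python. This is an illustrative sketch only; the actual speedup comes from writing the same loop structure in C with SIMD intrinsics, where each tile of A, B, and C stays resident in cache:

```python
def blocked_matmul(A, B, block=32):
    """Cache-blocked (tiled) O(n^3) multiplication of n-by-n matrices.

    The i/k/j tile ordering reuses each loaded tile many times;
    in a compiled language this is what keeps the working set in cache.
    """
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for ii in range(0, n, block):
        for kk in range(0, n, block):
            for jj in range(0, n, block):
                # Multiply one tile pair into the corresponding C tile.
                for i in range(ii, min(ii + block, n)):
                    for k in range(kk, min(kk + block, n)):
                        a = A[i][k]
                        row_b = B[k]
                        row_c = C[i]
                        for j in range(jj, min(jj + block, n)):
                            row_c[j] += a * row_b[j]
    return C
```

The block size is hardware-dependent: it is chosen so that three `block × block` tiles fit in the fastest cache level being targeted.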