Can someone please explain strassen's algorithm for matrix multiplication in an intuitive way? I've gone through (well, tried to go through) the explanation in the book and wiki but it's not clicking upstairs. Any links on the web that use a lot of English rather than formal notation etc. would be helpful, too. Are there any analogies which might help me build this algorithm from scratch without having to memorize it?
Strassen's algorithm for matrix multiplication
26.7k views Asked by confused_by_strassen AtThere are 3 answers
Took a quick look at the Wikipedia and it appears to me that this algorithm is a slight reduction in the number of multiplications required by rearranging the equations.
Here's an analogy. How many multiplications in x*x + 5*x + 6
? Two, right? How many multiplications in (x+2)(x+3)
? One, right? But they're the same expression!
Note, I do not expect this to provide a deep understanding of the algorithm, just an intuitive way in which you can understand how the algorithm can possibly lead to an improvement in calculation complexity.
Consider multiplying two 2x2 matrices, as follows:
A B * E F = AE+BG AF+BH
C D G H CE+DG CF+DH
The obvious way to compute the right side is just to do the 8 multiplies and 4 additions. But imagine multiplies are a lot more expensive than additions, so we want to reduce the number of multiplications if at all possible. Strassen uses a trick to compute the right hand side with one less multiply and a lot more additions (and some subtractions).
Here are the 7 multiplies:
M1 = (A + D) * (E + H) = AE + AH + DE + DH
M2 = (A + B) * H = AH + BH
M3 = (C + D) * E = CE + DE
M4 = A * (F - H) = AF - AH
M5 = D * (G - E) = DG - DE
M6 = (C - A) * (E + F) = CE + CF - AE - AF
M7 = (B - D) * (G + H) = BG + BH - DG - DH
So to compute AE+BG, start with M1+M7 (which gets us the AE and BG terms), then add/subtract some of the other Ms until AE+BG is all we are left with. Miraculously, the M's are chosen so that M1+M7-M2+M5 works. Same with the other 3 results required.
Now just realize this works not just for 2x2 matrices, but for any (even) sized matrices where the A..H are submatrices.
In my opinion there are 3 ideas that you need to get:
You can split a matrix into blocks and operate on the resulting matrix of blocks like you would on a matrix of numbers. In particular you can multiply two such block matrices (of course as long as the number of block rows in one matches the number of block columns in the other) and get the same result as you would when multiplying original matrices of numbers.
The blocks necessary to express the result of 2x2 block matrix multiplication have enough common factors to allow computing them in fewer multiplications than the original formula implies. This is the trick described in Tony's answer.
Recursion.
Strassen algorithm is just an application of the above. To understand the analysis of its complexity, you need to read "Concrete Mathematics" by Ronald Graham, Donald Knuth, and Oren Patashnik or a similar book.