Bill Hart: Toom 7
<<TableOfContents>>
Motivation: be fast for certain ranges of integer multiplication. On Core 2, it will kick "their pants".
We work with a 64-bit computer.
A "limb" is a "machine word".
To multiply integers
- split into polynomials
- evaluate polynomials
- multiplications
- interpolate --> product polynomial
- reconstruct our product
Now, write in symbols: $a = f(2^B)$ and $b = g(2^B)$, for sufficiently big $B$, where $f(x) = a_0 + a_1 x + \cdots + a_m x^m$ and $g(x) = b_0 + b_1 x + \cdots + b_n x^n$, and the $a_i$ and $b_j$ are the "digits in base $2^B$".
Then $ab = f(2^B)\,g(2^B) = (fg)(2^B)$. Evaluating $(fg)(2^B)$ (the reconstruction, step 5 above) is easy bit shifting. So now we focus on polynomial multiplication.
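Here is a minimal Python sketch of steps 1 and 5 (splitting and reconstruction); the helper names `split` and `reconstruct` are illustrative, not GMP routines, and $B$ is taken to be the limb size.

```python
# Steps 1 and 5: split an integer into base-2^B digits (the a_i), and
# reconstruct an integer from coefficients by shifts and adds.

def split(a, B, m):
    """Digits of a in base 2^B, least significant first, so a = f(2^B)."""
    mask = (1 << B) - 1
    return [(a >> (B * i)) & mask for i in range(m)]

def reconstruct(coeffs, B):
    """Evaluate a coefficient list at x = 2^B: pure bit shifting."""
    return sum(c << (B * i) for i, c in enumerate(coeffs))

B = 64                                   # one 64-bit limb per digit
a = 12345678901234567890123456789
f = split(a, B, 3)                       # three base-2^64 digits
assert reconstruct(f, B) == a
```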
Karatsuba (1962)
$$(a_0 + a_1 x)(b_0 + b_1 x) = a_0 b_0 + \big((a_0 + a_1)(b_0 + b_1) - a_0 b_0 - a_1 b_1\big)\,x + a_1 b_1\,x^2$$
- 3 distinct multiplications
- 5 additions/subtractions
For larger polynomials, divide and conquer.
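A minimal sketch of a single Karatsuba step on two-piece operands (`karatsuba_step` is an illustrative name; recursion and carry handling are omitted):

```python
# One Karatsuba step: 3 multiplications instead of the schoolbook 4.

def karatsuba_step(a0, a1, b0, b1):
    """Coefficients of (a0 + a1*x) * (b0 + b1*x)."""
    low  = a0 * b0
    high = a1 * b1
    mid  = (a0 + a1) * (b0 + b1) - low - high   # the Karatsuba trick
    return [low, mid, high]

# (3 + 5x)(7 + 2x) = 21 + 41x + 10x^2
assert karatsuba_step(3, 5, 7, 2) == [21, 41, 10]
```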
Knuth Multiplication
Very very hard to find in the literature (impossible?). I couldn't find a reference.
The idea: replace the sum $(a_0 + a_1)(b_0 + b_1)$ with the difference $(a_0 - a_1)(b_0 - b_1)$, so the middle coefficient becomes $a_0 b_0 + a_1 b_1 - (a_0 - a_1)(b_0 - b_1)$. This is better, since some intermediate results are smaller: overflowing into the next limb in Karatsuba is potentially quite nasty, and worrying about a sign is less work than dealing with an extra limb. You have to use signed representations -- there is extra branching in both cases, but it really does make a difference. GMP really uses this, and it does make a difference. At this level, saving one cycle per limb is a big improvement.
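A sketch of the subtractive variant, under the same assumptions as the sketch above: the point is that $|a_0 - a_1|$ always fits in one limb plus a sign bit, whereas $a_0 + a_1$ can carry into an extra limb.

```python
# Subtractive ("Knuth") variant: same 3 multiplications, but the middle
# one is a product of differences, which stays within one (signed) limb.

def knuth_step(a0, a1, b0, b1):
    """Coefficients of (a0 + a1*x) * (b0 + b1*x), subtractive form."""
    low  = a0 * b0
    high = a1 * b1
    mid  = low + high - (a0 - a1) * (b0 - b1)   # = a0*b1 + a1*b0
    return [low, mid, high]

assert knuth_step(3, 5, 7, 2) == [21, 41, 10]
```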
Another way of thinking
The above is difficult to generalize without being more systematic. Toom 3 is the next logical thing, where we break into 3 pieces instead of 2.
Consider polynomials $f$ of length $m$ and $g$ of length $n$, where the length is the degree plus $1$. Then $h = fg$ has length $m + n - 1$. To find the coefficients of $h$ we only need values of $h$ at $m + n - 1$ distinct points.
Example: Karatsuba
Evaluate at $x = 0, 1, \infty$ (evaluating at $\infty$ means taking the leading coefficient).
$h(0) = a_0 b_0$, $\;h(1) = (a_0 + a_1)(b_0 + b_1)$, $\;h(\infty) = a_1 b_1$.
So we end up doing the same three multiplies as with Karatsuba.
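The same step recast in the evaluate/interpolate language, as a quick check that the pointwise products at $0, 1, \infty$ are exactly the three Karatsuba multiplies (hypothetical helper names):

```python
# Karatsuba as evaluation at {0, 1, oo}: p(oo) means the leading coeff.

def eval_points(c0, c1):
    """(p(0), p(1), p(oo)) for p = c0 + c1*x."""
    return c0, c0 + c1, c1

f0, f1, finf = eval_points(3, 5)
g0, g1, ginf = eval_points(7, 2)
h0, h1, hinf = f0 * g0, f1 * g1, finf * ginf    # the 3 multiplications
# Interpolate: h = h0 + (h1 - h0 - hinf)*x + hinf*x^2.
assert [h0, h1 - h0 - hinf, hinf] == [21, 41, 10]
```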
Example: Knuth
Same, but evaluate at $x = 0, -1, \infty$; now the middle product is $h(-1) = (a_0 - a_1)(b_0 - b_1)$.
Interpolation
Now generalizing: we evaluate $h = fg$ at $n$ points and need to find the resulting polynomial.
Write $h = h_0 + h_1 x + \cdots + h_{n-1} x^{n-1}$, and we have the values $y_i = h(c_i)$ for $n$ distinct points $c_1, \ldots, c_n$. We recover $h$ by solving a linear system with $n$ rows: $V \mathbf{h} = \mathbf{y}$, where $V = (c_i^{\,j})$ is the evaluation (Vandermonde) matrix, so $\mathbf{h} = V^{-1} \mathbf{y}$.
Theorem: generically $V$ is invertible (for distinct points the Vandermonde determinant is nonzero).
So solve the system explicitly, once and for all, and get a fixed set of linear expressions for the coefficients of $h$.
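An illustrative evaluate/interpolate multiplier that solves the Vandermonde system exactly over the rationals. Real code never runs Gaussian elimination at multiply time (the points are fixed, so the inverse is hard-wired), but this shows where the linear algebra lives; all names here are mine.

```python
# Generic evaluate/interpolate multiplication over exact rationals.

from fractions import Fraction

def poly_mul(f, g, points):
    """Multiply coefficient lists f and g using n = len(f)+len(g)-1 points."""
    n = len(f) + len(g) - 1
    assert len(points) == n and len(set(points)) == n
    ev = lambda p, c: sum(cj * c**j for j, cj in enumerate(p))
    y = [Fraction(ev(f, c) * ev(g, c)) for c in points]   # pointwise products
    V = [[Fraction(c) ** j for j in range(n)] for c in points]
    # Gauss-Jordan on (V | y); afterwards y holds V^(-1) * y, i.e. h.
    for k in range(n):
        p = next(i for i in range(k, n) if V[i][k])       # pivot row
        V[k], V[p], y[k], y[p] = V[p], V[k], y[p], y[k]
        piv = V[k][k]
        y[k] /= piv
        V[k] = [v / piv for v in V[k]]
        for i in range(n):
            if i != k and V[i][k]:
                t = V[i][k]
                y[i] -= t * y[k]
                V[i] = [a - t * b for a, b in zip(V[i], V[k])]
    return [int(h) for h in y]

# (1 + 2x + 3x^2)(4 + 5x + 6x^2) via 5 distinct points
assert poly_mul([1, 2, 3], [4, 5, 6], [0, 1, -1, 2, -2]) == [4, 13, 28, 27, 18]
```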
Toom 3 (Toom 1963, Cook 1966)
Split into 3 pieces, so $h = fg$ has length 5; evaluate at 5 points. One gets a messy identity.
New Idea: Winograd (1980)
Winograd made some suggestions:
Evaluate at small values and fractions. Amazingly, nobody thought of this for quite a while -- sometimes the good ideas are the simple ones. Say we evaluate at $1/2$, so $16\,h(1/2) = 16 h_0 + 8 h_1 + 4 h_2 + 2 h_3 + h_4$, which involves only integers.
This greatly improves the linear algebra we have to do.
Toom 3 matrix. Evaluate at $0, \tfrac{1}{2}, -1, 1, \infty$ (the row for $\tfrac{1}{2}$ scaled by $16$ to clear denominators). The matrix is: $$\left(\begin{array}{rrrrr} 1 & 0 & 0 & 0 & 0 \\ 16 & 8 & 4 & 2 & 1 \\ 1 & -1 & 1 & -1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 \end{array}\right)$$ One then worries about finding a minimal list of row operations to reduce this to the identity.
Evaluate at $0, -1, 1, 2, \infty$: $$\left(\begin{array}{rrrrr} 1 & 0 & 0 & 0 & 0 \\ 1 & -1 & 1 & -1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 2 & 4 & 8 & 16 \\ 0 & 0 & 0 & 0 & 1 \end{array}\right)$$ One can solve the resulting system more efficiently on actual hardware with this matrix.
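As a sketch of what such a list of row operations looks like, here is Toom 3 with the point set $\{0, 1, -1, 2, \infty\}$ and one possible sequence of exact divisions by 2 and 3; this particular sequence is my own working, not necessarily the minimal one the talk has in mind.

```python
# Toom 3: 5 pointwise multiplications, then interpolation by explicit
# row operations (all the divisions below are exact).

def toom3(f, g):
    """Multiply length-3 coefficient lists f and g."""
    ev = lambda p, c: p[0] + p[1] * c + p[2] * c * c
    w0, w1, w2, w3 = (ev(f, c) * ev(g, c) for c in (0, 1, -1, 2))
    w4 = f[2] * g[2]                       # h(oo): leading coefficients
    h0, h4 = w0, w4
    h2 = (w1 + w2) // 2 - h0 - h4          # (h(1) + h(-1))/2 = h0 + h2 + h4
    odd = (w1 - w2) // 2                   # (h(1) - h(-1))/2 = h1 + h3
    h3 = ((w3 - h0 - 16 * h4) // 2 - 2 * h2 - odd) // 3
    h1 = odd - h3
    return [h0, h1, h2, h3, h4]

# (1 + 2x + 3x^2)(4 + 5x + 6x^2) = 4 + 13x + 28x^2 + 27x^3 + 18x^4
assert toom3([1, 2, 3], [4, 5, 6]) == [4, 13, 28, 27, 18]
```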
Unbalanced operands
Suppose $f$ has length $m$ and $g$ has length $n$, with $m \neq n$.
Compute $h = fg$ simply by evaluating at $m + n - 1$ distinct points.
Toom 4 2
For example, suppose $f$ has length $4$ and $g$ has length $2$. Then $h = fg$ has length $5$, so evaluate at $5$ points. Pretty much the same as Toom 3... the interpolation (the most involved part) is exactly the same as for Toom 3.
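A sketch of the unbalanced case, reusing the interpolation sequence from the Toom 3 sketch above unchanged; only the evaluations differ (names again illustrative):

```python
# Unbalanced 4-by-2 split: h = f*g still has length 5, so the same
# 5-point interpolation as Toom 3 applies verbatim.

def toom42(f, g):
    """Multiply a length-4 coefficient list f by a length-2 list g."""
    ev = lambda p, c: sum(cj * c**j for j, cj in enumerate(p))
    w0, w1, w2, w3 = (ev(f, c) * ev(g, c) for c in (0, 1, -1, 2))
    w4 = f[3] * g[1]                       # h(oo): leading coefficients
    h0, h4 = w0, w4
    h2 = (w1 + w2) // 2 - h0 - h4
    odd = (w1 - w2) // 2                   # h1 + h3
    h3 = ((w3 - h0 - 16 * h4) // 2 - 2 * h2 - odd) // 3
    h1 = odd - h3
    return [h0, h1, h2, h3, h4]

# (1 + x + x^2 + 2x^3)(3 + x) = 3 + 4x + 4x^2 + 7x^3 + 2x^4
assert toom42([1, 1, 1, 2], [3, 1]) == [3, 4, 4, 7, 2]
```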
Odd/even Karatsuba (Hanrot-Zimmermann)
- David Harvey reinvented this, not knowing about the original paper.
The trick here is to write $f(x) = f_e(x^2) + x\,f_o(x^2)$,
and similarly $g(x) = g_e(x^2) + x\,g_o(x^2)$.
Then $$fg = f_e(x^2)\,g_e(x^2) + x\,\big(f_e(x^2)\,g_o(x^2) + f_o(x^2)\,g_e(x^2)\big) + x^2\,f_o(x^2)\,g_o(x^2),$$ and the cross term succumbs to the usual Karatsuba trick, so three half-size multiplications suffice.
Advantage: $f$ and $g$ do not have to be the same size; the even and odd parts of each operand have (nearly) equal lengths, so we get balanced multiplications no matter what.
So far though, nobody has figured out how to make this really practical/fast/better than anything else.
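A sketch of odd/even Karatsuba on coefficient lists, with schoolbook multiplication standing in for the recursive call; the parity split keeps the three sub-multiplications balanced whatever the lengths of `f` and `g` are.

```python
# Odd/even Karatsuba: split by parity of the exponent, three smaller
# multiplications, then interleave the results back together.

def padd(p, q):
    """Add coefficient lists of possibly different lengths."""
    n = max(len(p), len(q))
    p, q = p + [0] * (n - len(p)), q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def mul(f, g):
    """Schoolbook product (stands in for the recursive call)."""
    h = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] += a * b
    return h

def odd_even_karatsuba(f, g):
    fe, fo = f[0::2], f[1::2]              # f(x) = fe(x^2) + x*fo(x^2)
    ge, go = g[0::2], g[1::2]
    ee, oo = mul(fe, ge), mul(fo, go)
    # Karatsuba trick for the cross term fe*go + fo*ge:
    mid = padd(mul(padd(fe, fo), padd(ge, go)),
               [-c for c in padd(ee, oo)])
    h = [0] * (len(f) + len(g) - 1)
    for src, off in ((ee, 0), (mid, 1), (oo, 2)):
        for i, c in enumerate(src):
            if c:                          # mid's top entry can be a spare 0
                h[2 * i + off] += c
    return h

assert odd_even_karatsuba([1, 2, 3], [4, 5]) == mul([1, 2, 3], [4, 5])
```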