Bill Hart: Toom 7
<<TableOfContents>>
Motivation: be fast for certain ranges of integer multiplication. On core 2, will kick "their pants".
Work with 64-bit computer.
A "limb" is a "machine word".
To multiply integers
- split into polynomials
- evaluate polynomials
- multiplications
interpolate --> polynomials
- reconstruct our product
Now, write in symbols and
, for sufficiently big
, where
, and
, where the
and
are the "digits in base
".
Then . Evaluating
(reconstruct -- part 5) is easy bit shifting. So now we focus on polynomial multiplication.
Karatsuba (1962)
- 3 distinct multiplications
- 5 additions/subtractions
For larger polynomials, divide and conquer.
Knuth Multiplication
Very very hard to find in the literature (impossible?). I couldn't find a reference.
This is better, since some intermediate results are smaller. Overflowing into the next limb in Karatsuba is potentially quite nasty. Worrying about sign is less work than dealing with an extra limb. You have to use signed representations -- there is extra branching in both cases, and it really does make a difference. GMP really uses this and it does make a difference. At this level, saving one cycle per limb is a big improvement.
Another way of thinking
The above is difficult to generalize without being more systematic. Toom 3 is the next logical thing, where we break into 3 pieces instead of 2.
Consider
where have length
, where the length is the degree plus
. Then
has length
. To find the coefficients of
we only need values of
at
distinct points.
Example: Karatsuba
Evaluate at
.
.
So we end up doing the same three multiplies as with Karatsuba.
Example: Knuth
Same, but evaluate at .
Interpolation
Now generalizing we evaluate at points and need to find the resulting polynomial.
and we have for
values
. We recover
by solving a linear system with
rows:
so
.
Theorem: generically is invertible.
So solve the system explicitly and get a set of linear expressions.
Toom 3 (Toom 1963, Cook 1966)
Evaluate at points. One gets a messy identity.
New Idea: Winograd (1980)
Winograd made some suggestions:
Evaluate at and fractions. Amazingly, nobody thought of this for quite a while -- sometimes the good ideas are the simple ones. Say
, so
This greatly improves the linear algebra we have to do.
Toom 3 matrix. Evaluate at . The matrix is: $$\left(\begin{array}{rrrrr} 1 & 0 & 0 & 0 & 0 \\ 16 & 8 & 4 & 2 & 1 \\ 1 & -1 & 1 & -1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 \end{array}\right)$$ One then worries about finding a minimal list of row operations to reduce this to the identity.
Evaluate at : $$\left(\begin{array}{rrrrr} 1 & 0 & 0 & 0 & 0 \\ 1 & -1 & 1 & -1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 2 & 4 & 8 & 16 \\ 0 & 0 & 0 & 0 & 1 \end{array}\right)$$ One can solve the resulting systems more efficiently on actual hardware with this matrix.
Unbalanced operands
and
Compute simply by evaluating at
distinct points.
Toom 4 2
For example, suppose we have
and
. Evaluate at
points. Pretty much the same as Toom 3... the interpolation (the most involved part) is exactly the same as Toom 3.
Odd/even Karatsuba (Hanrot-Zimmerman)
- David Harvey reinvented this, not knowing about the original paper.
The trick here is to write
and similarly
Then
Advantage: and
do not have to be the same size, so get balanced multiplication no matter what.
So far though, nobody has figured out how to make this really practical/fast/better than anything else.