20090408: Bill Hart: Toom 7

Summary of last time

Toom 3: 5 mults Toom 4 2:

Toom 4: 7 mults Toom 5 3: Toom 6 2:

Odd/Even: $A_1(x^2) + x A_2(x^2)$ and $B_1(x^2) + x B_2(x^2)$ To date, nobody has ever beat ordinary or unbalanced Tooms using odd/even.

Question from last time: Why was one of the two matrices (one with [1,2,4,8,16]) better than the other?

eval sequence, int sequence.... something having to do with details of assembly programing.

Complexity

Schoolboy $O(n^2)$

Karatsuba: $K(2n) = 3 K(n)$ , so $K(n) = n^k$ , where $k=\log(3)/\log(2)=1.585$ .

Toom 3: $K(3n) = 5 K(n)$ , so $K(n) = O(n^{\log(5)/\log(3)}) = O(n^{1.465})$

Toom $k$ : $K(k n) = (2k-1) K(n)$ , so $K(n) = O(n^{\log(2k-1)/\log(k)})$ . As $k\infty$ , this goes to $1$ .

FFT Multiplication

Evaluate at roots of unity.

FFT (Fast Fourier Transform, Cooley-Tkey). Asymptotically fast method for evaluating a polynomial of length $n$ at $2n$ roots of unity.

Schoenhage-Strassen: work in a ring $\mathbf{Z} / p\mathbf{Z}$ , where $p=2^{2^{\ell}} + 1$ . Note that $2$ is a $2^{\ell+1}$ th root of unity. This gives an algorithm that is $M(n)=O(n \log(n)\log(\log(n)))$ to multiply two integers.

However, the cutoff is about 7500 limbs in GMP. I.e. for smaller numbers this isn't worth it.

But what about smaller than 7500?

Toom 4/5

Bodrato/Zanoni (2007): Published optimal evaluation and interpolation sequences.

There is almost no point in going from Toom 4 to Toom 5, since the cutoff is about the same (about 190 limbs on an Opteron, say; less on a core 2).

Toom 7

For Toom 7, cutoff is abut 190 limbs, and it always beats Toom 4/5.

No optimal sequence known. However,

I have one of length 103 (and can get it down to 99) steps for interpolation.
Bodrato implementation (GPL V3+), which is 89 steps (less scalar divisions than mine -- his has 9 and mine 12).

Bill then shows us a projector of code and Bodrato's steps.

Heuristic A: After row operation, row $i$ has at least one more zero.

E.g.,

r1 = 1 1 0
r2 = 1 2 3
--->
r2 = r2 - r1 = 0 1 2

*or*

Heuristic B: After row operation, there exists row $i_2\neq i_1$ such that $$

\tilde{M[i_1, j_1]}/M[i_2,j_1] = \tilde{M[i_1, j_2]}/M[i_2,j_2]

$$ occurs for more entries than before.

r1 = r1 + r3
r1 0 2 2 2 0
r2 0 4 4 6 1
r3 0 0 0 0 1

William: Why is Toom 7 hard to implement in GMP, but easy in MPIR?

mpz_t: length, sign, allocation <-- takes care of everything

whereas

mpn: array of limbs (nothing else!)
length: integers
sign

For example, to do a subtraction, $a-b$ have to do maybe $b-a$ and keep of sign separately.

Dan argues that one can keep track of barrows... etc., and Bill tends to agree. Lots of subtlety involving data structures.

Start discussing quadratic sieve

One word of caution: the quadratic and number field sieves are enormous black holes that claims a lot of academic lives. Make sure that if you start a project on this, that you have a goal in mind and reasonable time frame. This will be a multi-year effort, probably about 6 years, and will be a minimum 9000 line program. In effect, aiming for an optimal number field sieve is a complete waste of time.

My first project: 60 decimal digit numbers:

in C, quadratic sieve, 12 minutes
1 week: 1 minute
3 years: 9.3 seconds

Code looks very similar! You would barely be able to tell the difference between "12 minutes" and "9.3 seconds". The code becomes organic and takes on a life of its own. Small changes can make it way slower.

up to 90 decimal digits: always use quadratic sieve, if suitably random...