PIPELINED LINEAR ALGEBRA CIRCUIT

Willi Geiselmann, Adi Shamir, Rainer Steinwandt and Eran Tromer, 2005

This is a highly sophisticated pipelined approach to matrix-vector multiplication in the block Wiedemann method. A highly simplified description will be as follows.

Basic Pipelined Idea

This simplified device consists of several stations connected in a pipeline. The i-th station is in charge of the i-th matrix row, and contains a compressed representation of the non-zero entries in that row. It is also in charge of the i-th entry of the output vector, and contains a corresponding accumulator. In each multiplication, the input vector is fed into the top of the pipeline, and moves down as in a shift register. As the entries of the vector pass by, the i-th station looks at all vector entries passing through it, identifies the ones corresponding to the non-zero matrix entries in row i, and for those entries adds 1 to its accumulator. Once the input vector has passed all stations in the pipeline, the accumulators contain the entries of the product vector. These can now be off-loaded and fed back to the top of the pipeline in order to compute the next multiplication.

Optimizations and Improvements

  1. As the matrix is sparse, each station is given a horizontal strip of multiple rows to handle instead of a single row.

  2. The vector is fed in parallel consecutive chunks instead of a single stream of bits to add parallelism to the structure.

  3. For the feedback loop of iterated multiplications, the stations are arranged in a closed chain.

  4. To obtain further parallelization without additional chip cost, these arrangements can be run over several copies in parallel.

There are several ways to tweak the set up and optimize the performance of this device. But, by far, this is the best device proposed for the linear algebra system in NFS.