SchÃ¶nhageâ€“Strassen algorithm

The Schönhage - Strassen algorithm is an algorithm for multiplying two n- digit integers. It was developed in 1971 by Arnold Schönhage and Volker Strassen. The algorithm is based on a " super-fast " version of the discrete fast Fourier transform and a skillful change between the cosets and cyclic arithmetic in finite number of rings.

The Schönhage - Strassen algorithm was from 1971 to 2007, the most efficient known algorithm for multiplication of large numbers; Published in 2007, Martin Furer an evolution of the algorithm with an even lower asymptotic complexity.

The Schönhage -Strassen algorithm terminates in (see Landau notation ) when the bit complexity as a measure of efficiency on multi-volume Turing machines, ie, the maximum running time of the algorithm is measured as a function of bit operations required bit length of the input variables is selected. This complexity is an improvement over both the " naive " from the school known algorithm of running time as well as against the 1962 developed the Karatsuba algorithm with a running time of and its improved variant, the Toom - Cook algorithm dar. with runtime

2.3.1 recursion for odd m 2.3.1.1 Determination of residues modulo 2n 2
2.3.1.2 Determination of residues modulo ( D 1)

2.3.2.1 Determination of residues modulo 2n 1
2.3.2.2 Determination of residues modulo ( D 1)

Importance

Until 2007, no efficient algorithm has been found. As a lower bound for the general case, there is only the ( trivial ) linear running time, to which the algorithm approaches with increasing number length. However, researchers have found evidence that the barrier can be undercut never. Even with modern computers, this method of calculation is more efficient than the Karatsuba algorithm only for numbers with several thousand points. This is probably not so much on the overhead of the Schönhage - Strassen algorithm, but rather to the decades typical design optimization of computer processors, the faster floating point operations are reaching in preference to the arithmetic in finite residue class rings of integers.

For those looking for the algorithms with the best (time) complexity in the computer algebra of Schönhage - Strassen algorithm enjoys central importance.

Algorithm

The basic concept and terminology

To multiply two integers and, roughly following scheme is applied:

The step to be carried out in the middle "small" multiplications are carried out again in the recursive sense by the Schönhage - Strassen algorithm.

To understand why this result is the product of the numbers A and B, considering the polynomials

Sets a one, the result is just the binary representation of the numbers a and b. To calculate the Produktpolynom

We determine the Fourier transform of the Koeffiziententupel of A and B:

In other words evaluates one of the two polynomials at the points. If we multiply these function values , the result is the corresponding function values of the product polynomial

To win the polynomial itself, we must make the transformation reversed:

By the definition of unit roots is considered. This satisfies the following identity geometric sums of roots of unity:

Because

Thus, the following applies:

Article in discrete Fourier transform, the mathematical foundations of this transformation is further executed. Since the transformation sums, each with terms arise, we have in a classical calculation of the terms (such as the Horner scheme ) is still a quadratic term. By means of the fast Fourier transform can be calculated, these values rapidly. This calculation is based on the following divide and conquer principle:

It is part of solutions by means of simple operations (addition and simple multiplication ) together. Thus, the transformations can be computed in time. Because of rounding of the complex roots of unity to fixed number of digits long, however, result in calculation errors. To compensate for this, is to be expected for a resultant bit with at least bits. This gives a total running time of. When Schönhage - road variant, we expect instead in a residue class ring and thus avoid the miscalculations of complex numbers.

Furthermore, the multiplication not a pure convolution, but it can also lead to unearned; after performing the FT and iFT they must be treated appropriately.

The task of multiplying two integers is now concretized as follows:

Let the two given numbers to be multiplied in Binärzifferdarstellung. Next, the maximum length is (ie Binärziffernanzahl ) of the two numbers.

After appropriate treatment of the signs of the two numbers and the trivial special cases, and ( what is feasible with linear effort) we can assume that natural numbers. The Schönhage - Strassen algorithm solves this problem in.

Theoretical preparations

" Super-fast DFT "

The above- mentioned " super fast DFT ", which represents the core of the algorithm needs to be explained more in detail since it is used here very special.

It is a commutative unitary ring. In the element is a unit; continue to be a root of unity ( so ) that satisfies the equality. Then allows the calculation of the discrete Fourier transform ( DFT) in the product space (this is a shorthand notation for and the term vector space is here just in case that a body is usual) perform as follows in a fast variant (as FFT):

To calculate the transform with

By writing down the indices and in binary, but we do so in the number in reverse order, the transform is optimized calculated as follows:

There are

And

The closed form for this intermediate terms is

(For Recalculating this representation, note ).

This recursion provides the desired Fourier coefficients.

Due to the property, we can transform the recursion calculation somewhat more friendly to

And

With the same exponent.

The inverse transformation, ie the inverse FFT, succeed, since we have assumed that the ring is invertible:

As well as

Which is in turn.

In the application in Schönhage - Strassen algorithm is actually just a " halved FFT " is required; meant is thus the following: Let's start at the first step of the recursion with the calculation

Only and we restrict the further steps of recursion as a way, we charge just for all odd values . Will you reversed from those for odd (ie pieces) recover only the difference of the original, as is also sufficient in the return direction the " halved " recursion.

In Schönhage - Strassen algorithm described the fast Fourier transform for finite number of rings needed with Fermat.

Notation: Throughout the remainder class ring here we use the shorter notation, which could only result in the context of p- adic numbers to confusion.

As a root of unity, the number (or, depending on the context, a suitable power of 2 ) may be used in the ring. The multiplication to be performed in the FFT algorithm is then removed from the mold; However, they are not as pure shift operations feasible because reducing a larger intermediate result modulo must still be pushed. This is where one of the brilliant ideas of Schönhage and streets: embed the ring (equipped with the residue class arithmetic) fits into a larger, equipped with the cyclic arithmetic About ring. About This ring has a 2- power as an order, so that in him the appropriate multiplication is actually feasible as a pure shift operation. This trick can be summed up in a beautiful structure theorem on cosets and cyclic arithmetic in finite number of rings.

Structure theorem on cyclic arithmetic

The structure theorem on cyclic arithmetic can be formally summarized as follows:

For a power of two with a natural number

Here, displayable by the representative residue classes modulo designated with the residual arithmetic, i.e. addition and modulo multiplication. The designations appear in this residue class ring numbers can be represented by binary digits.

The occurring on the right side structure called the residue classes modulo the number, but these are not equipped with the residue class arithmetic, but differing with the cyclic arithmetic. This carries are at intermediate results, that are too large, lifted and pitched an additive effect on the final result. This corresponds to a shift of the Binärzifferdarstellung " on permanent " binary digits (right aligned placed on the least significant digit positions ) with subsequent addition. For example, results in the addition with not the value, but the value. From the resulting number structure with cyclic arithmetic modulo the factor ring is now formed yet. So the final results are still reduced modulo.

This implies that structure set the following: The modulo arithmetic can be replaced in just by cyclic modulo arithmetic in the larger number space with subsequent reduction.

Critical to the success of the structure presented in this font embedding is the property that the largest representable number in the cyclic number space (here, this is the number ) represents the number of the residue class ring. This condition is necessary. But that the cyclic arithmetic can be useful defined at all, must be a power of two on the other. Together, that represents the optimal choice for the size of the cyclic embedding space.

The classical quotient ring would, however, not suitable for embedding, because applicable in this ring, that is, the number is in this ring is a zero divisor.

Implementation

Do we have the numbers to be multiplied exist with binary digits, so we perform depending on whether is even or odd, different from recursion to the logarithm of the number of digits in a single step:

Recursion for odd m

This step is the return of on we carry through with the complexity.

It had to be multiplied by and Fermat. We will accomplish this step, the return on the Fermat number.

For the belonging to the two Fermat powers of two, we introduce the abbreviations

And

One. The number of points will be halved our denomination size, that is, we develop and in powers of:

Where for the " unique pieces " applies. In binary, this decomposition corresponds to a simple grouping of bit strings into pieces of length bits.

A small weakness of the algorithm ( which, however, reached the complexity barrier does not detract ) revealed now. To apply the " super fast DFT " on the piece and consequences, they must be filled to the next power of two in length with zeros; the numerical representation is therefore artificially extended to

By virtue of the above-mentioned structure theorem for cyclic arithmetic we move now from the residue class ring on the quotient space with the cyclic arithmetic. In this room is calculated for the multiplication problem

Where we in the last step uses the property in our cyclical " computing space."

In summary, the multiplication thus obtains the form

With the result coefficient

We can estimate upwards.

There now follows a description of the empirical formula, so that we in the FFT applied to a " halved FFT " to restrict.

It is true, that is

With in. By suitable addition we can shift the range of values into a positive, it is in fact, and with the definition

Applies

For the non-trivial (indices to) the estimate applies. Since the two numbers is prime, it is sufficient for the determination of the calculation of the radicals and.

Namely one has the remains and determined, so you can expect in complexity as follows: Compute first and then.

Determination of residues modulo 2n 2

Here we use a very typical for the computer algebra trick: We put the pieces consequences and long enough by inserting zero sequences with " safety margins " together so that by -product formation, the individual results are also lined up yet without overlaps in pieces. There are, then, and in. We now form

And have thereby. The product contains then in disjoint pieces of the bit length of the sums

With, because it is. For the terms of our original multiplication problem we see

For to determining residues, we obtain

The complexity required for the formation of all as well as the extraction of is; the multiplication of "costs ", overall, this is so.

Determination of residues modulo ( D 1)

Here, the DFT is used. We subject the " vectors" and with the DFT with and as the number -th root of unity. Since we only need the differences, the " halved DFT " is sufficient:

DFT for determining and only for the odd with
Multiplications for all odd
Inverse DFT to obtain all the differences from the odd

The complexity of this effort consists of steps of the single expense for the DFT (ie, total); added and the addition of the reductions modulo for the recovery of what can be covered in.

Recursion for even m

Also for this step the return of the complexity is achieved.

It had to be multiplied by and Fermat. We will also take place in this step, the return on the Fermat number.

For the belonging to the two Fermat powers of two, we perform analogous abbreviations

And

One. Again, the halved number of digits of our denomination size is, that is, we develop and in powers of:

Where for the " unique pieces " applies.

As above, we extend the representation of numbers on a power of two length

And similarly for.

Referring again to the aid of the structure theorem for cyclic arithmetic we now switch from the residue class ring on the quotient space with the cyclic arithmetic.

So that we can again

With the result coefficient

Represent. We can estimate upwards.

For we can again

Conclude, and with

Applies

With. For the non-trivial (indices to) the estimate applies. Because of coprimality of the two numbers and it is enough again to determine which residues and calculate.

Determination of residues modulo 2n 1

We use again the trick of inserting " safety margins " on: Let, then, and in. We form

And have thereby. The product contains then in disjoint pieces of the bit length of the sums

With. For this our original multiplication problem we see

For to determining residues, we obtain

Determination of residues modulo ( D 1)

With we undergo again the " vectors" and with the DFT, where this time we choose the number as a root of unity. Since we only need the differences, it is sufficient here again the " halved DFT ":

DFT for determining and only for the odd with
Multiplications for all odd
Inverse DFT to obtain all the differences from the odd

Summary

Starting with and with a complexity of digits length is achieved by the illustrated recursion.

Swell

Toom–Cook multiplication Horner's method Unit (ring theory)

716650