Quadratic sieve

Quadratic sieve is a term from the field of number theory in mathematics and designates one of the fastest known algorithms for factoring large natural numbers. It is a general factorization, ie the running time depends only on the size of the factorized number and not on specific properties of the number (or its divisors ). For numbers with up to 100 decimal places, it is the fastest ( general ) factorization. For larger numbers, the number field sieve is faster. The running time for factoring a number n with the quadratic sieve ( applicable under some probable assumptions) in the order of

Genesis

Building on the continued fraction method of John Brillhart and Michael Morrison and inspired by the linear sieve by Richard Schroeppel invented in 1981 by Carl Pomerance theoretical considerations, the Quadratic sieve, which was faster than all previously known factorization.

Shortly thereafter, James Davis and Diane Holdridge and Peter Montgomery independently found a variant of the quadratic sieve with multiple polynomials ( called MPQS ). Another improvement, the so-called special quadratic sieve comes from M. Zhang, which can be applied only to specific numbers but.

In 1994 it was possible to factorize using the quadratic sieve the 129- digit number RSA -129.

More on the history of the quadratic sieve is to be found in the article History of factorization.

Operation

Like most modern factorization uses the quadratic sieve the representation of a product as a difference of squares. Due to the third binomial formula:

So rather than investigate the divisibility of numbers, looking to a representation of the number as the difference of squares. From a display, you see the dividers as well as.

In the factorization of Fermat you calculate the value for different numbers until you get one that is a perfect square. First, select the smallest number that is greater than the root of. Then, it counts up at each step by one. Applying this method, for example for factorization of the number 1649, one obtains for the values ​​in the following table.

The factorization method of Fermat arrived in 16 steps to the target and then by the number and the square compute the factorization of:

If we multiply to each other and the function values ​​, we get the square 25 × 23 · 52 = 28 · 52 = (24 × 5 ) 2 We would like to take this square for a decomposition. However, the equations can not be multiplied without further notice. Maurice Kraitchik extended the representation of Fermat. He looked equations in which is a multiple of:

If, however, n neither x y nor divides xy, then applies gcd > 1 and gcd is a nontrivial factor of n, instead of the equation, we consider the following congruences:

These congruences can now be multiplied. We multiply the congruences x = 41

And x = 43

We have the following quadratic congruence found:

With gcd (41 · 43-80, 1649 ) = 17 and gcd (41 · 43 80, 1649 ) = 97, we have 1649 = 17 · 97 broken down into its factors. Not every quadratic congruence provides real divisors, but on average every second quadratic congruence provides a real factorization. You have to consider now much less function values ​​to obtain a square, and thus a separation. How can you efficiently determine which function values ​​aufmultiplizieren to a square? If you know the prime factors of the numbers, the multiplication of numbers to addition of the exponents of its prime factorization. One considers therefore only numbers q (x ) whose prime factorization consist of pre-determined factors. A congruence is exactly then a square, if the exponents of the prime factors are. Under these restrictions, one can determine the congruences that multiply to a square, using methods from linear algebra.

In general you are looking for in a first phase by congruences. One has found a sufficient number, a subset of congruences are searched, which, multiplied together, to provide both sides of a square.

Let us look at another example. Wanted is the prime factorization of n = 87463rd We will consider only numbers that consist of small factors. Be the largest prime factor of 29, since for p = 5,7,11 and 23 has no solution come these numbers never before as divisor of. The so-called factor base, which contain all possible factors of the prime factorization, and thus consists of the prime numbers 2,3,13,17,19,29. The matrix of the exponents of the prime factorization mod 2 is as follows:

Wanted is a linear combination of rows that yield the zero vector. A solution consists of line 3 and line 6

Gcd (xy, n) = 587, gcd ( x y, n) = 149 Thus we obtain the factorization 87463 = 587 · 149

This basic idea is used sieve in Dixons, where one uses random values ​​for x. In the quadratic sieve is considered successive values ​​of x, from which one can determine the prime factorization quickly. Determining such useful congruences is called Seven. The algorithm can be divided into two steps:

Sieving

In the screening step are congruences of the form

Found, and the prime factorization of q is known and only small prime numbers is ( in other words, q is relative to a fixed barrier be smooth ). Is chosen, the numbers x in the vicinity of the root of n, so that the values ​​are small. The probability of finding smooth numbers q (x ), is therefore high. Determining the prime factorization by trial division is time-consuming. To check efficiently whether a number is smooth, we used the following property:

So if you have found positions x, where q ( x) is divisible by p, one can determine an entire sequence of which are divisible by p. p divides exactly when. Determining the Square roots modulo a prime number is efficiently solvable ( for instance by means of the Shanks - Tonelli algorithm). The sequence of numbers is also divisible by p is similarly determined by a screening process of the Sieve of Eratosthenes. The quadratic sieve derives its name from the release of the ' quadratic ' equation and the ' Seven ' the divider.

In principle, one proceeds as follows:

Step 1: Select a factor base.

Take all primes p up to a limit S, is for the n quadratic residue, ie the equation x2 = n (mod p) can be solved. Numbers, is for the n quadratic non- residue, can be excluded, since they do not occur as divisors of x2 -n. The larger the barrier is selected, the greater the probability that X2 is N only prime factors up to this limit. The disadvantage is that we need more relations to solve the resulting system of equations. If the barrier is too small, decay very few numbers as required, and we have to consider a lot of numbers.

Therefore, it selects the threshold S in the order of

Step 2: Screens with the factors of the factor base.

Choose a Siebintervall in the order of

For numbers x with | x - √ n |

The numbers to be tested q ( x) are of the order of n divisions on these numbers are expensive ( for typical n are these stored no longer hardware-related formats). Since the Seven is running time-critical, we modified Step 2 It does not store the numbers q (x ) itself, but rather its rounded to whole numbers logarithms (or n as the upper bound for q ( x)). These small numbers can be treated with primitive data types. From the division by a divisor p is a subtraction to the logarithm of p. For speed reasons we omitted in practice on the sieves with powers of factors.

It is estimated from the calculation error ( and ignoring multiple divider ) by a barrier T in the order of the logarithm of the largest factor of the base factor. The numbers from the list that are less than T after sieving are smooth with high probability and are recorded in a list. Not all noted in the list numbers are necessarily smooth. In an additional step the prime factorization of these numbers is determined and indicated whether the number is smooth or not.

Determining the prime factorization of numbers probably smooth over the factor base can be done with a factorization, that is suitable for factors of this size. For small factors, the Pollard 's rho method offers. Another method is to provide a second time to seven. If you hit it on a smooth probably number, dividing it by the factor by which one seventh, or their powers. Since the hit rate for large factors is small, offers this for the larger factors of the factor base. The screening can be further accelerated when determining the gcd of the test number with the product of the factors, the factor base.

Selection step

Q consists only of factors of the factor base to a ( smooth ) congruence. q can be fully described as a vector of the exponents of its known factorization. To the exponent vectors of the congruences provides you to a system of linear equations over the finite field F2, in which each row is the exponent vector of a congruence modulo 2. A number is accurate then a square if all exponents of its prime factors are. So if you find a non-trivial linear combination of rows that yield the zero vector, one also has a quadratic congruence found. This provides by calculating gcd ( uv, n) by a factor of n, which is neither 1 nor n in at least half of all cases.

To the solution of this step using linear algebra method such as Gaussian elimination, the method of conjugate gradients or Lanczos methods. The block Lanczos method, an extension of the Lanczos method, such large - solve matrices in a fraction of the Siebschritts save space (linear in the number of rows) - but very sparse.

Area of ​​application

The quadratic sieve is suitable for large numbers to about 110 decimal places that are not Primpotenz. For larger numbers, the number field sieve is more appropriate.

1994, the number RSA -129 was factored with the Multiple Polynomial Quadratic Sieve taking advantage of partial relations. This number with 129 decimal places was in their two dividers (one with 64 and the other with 65 decimal places) disassembled. The screening step was performed distributed by 600 volunteers. These collected for 8 months congruences that were transmitted by e -mail (or ftp) to the central computer. The selection step on the 298 GB of data was carried out in 45 hours on a supercomputer. The factor base consisted of 524338 prime numbers, the matrix had a size of 569466 rows and 524338 columns.

All other factoring records were set with the number field sieve.

If one measures n the running time of the algorithm with respect to the length of the input, then we can write the term of the quadratic sieve as:

For results in an exponential growth. The sample Divison has such a run-time behavior for. With there would be an polynomial-time algorithm with running time. It is therefore the quadratic sieve to an algorithm with a superpolynomialen but subexponential running time. When Zahlenkörpersieb constant on 1/3 could be reduced. However, c that the runtime asymptotically less affected than, there is substantially greater. There are improvements to the basic idea of ​​the quadratic sieve, reduce the runtime continues:

Partial relations

Even relations which are not smooth, can be ( smooth ) relations are combined, which are useful in the selection step. If you have two partial relations, whose prime factorization contains a factor P ( outside the factor base )

So give this a congruence

Multiplying by P-2 even one obtains the following relation smooth

In the previous relation, all factors come with odd exponent of the factor base. This relation can thus be used for the selecting step. If one limits the factor P in size, one can determine it with a little additional effort: It increases the barrier T for the interesting numbers in the screening step to log ( P). The factor P finally remains in determining the prime factorization by dividing factors with the factor base. By partial relations can increase the useful for the selection step relations. The running time can thus be halved.

Multiple polynomials

The size of the numbers generated in the quadratic sieve increases linearly with the distance from the zero point. In the Multiple Polynomial Quadratic Sieve ( MPQS ) we define different ( disjoint as possible ) functions, each of which contains a fixed factor, and show the same growth. The search interval can be split into several polynomials. So that the divider to be examined and smaller numbers are likely to produce a smooth speed increases.

Modifying the function q ( x) = x2 - n, by using in place of X now a first degree polynomial.

The Multiple Polynomial Quadratic Sieve considered a set of polynomials

Here b is chosen so that b2 - n is divisible by 4: b2 - 4ac = n. This is true

And the value produced with this qa (x ) contains 4a as a factor. The choice of straight factors 2a generates besides the factor 2a an additional factor of 2 in the generated numbers.

The screening process works similarly to the normal quadratic sieve, but you have to each factor p of the factor base the multiplicative inverse of a mod p can be calculated. It is important that the change between two polynomials ( with different values ​​for a) does not devour too much time. The inversion of a ( for example with the erweiterteren Euclidean algorithm) can be done in linear running time, yet this change of the polynomial requires a large proportion of the total computation time.

In the Multiple Polynomial Quadratic Sieve ( MPQS ) is chosen as a square of a prime. Thus, the equation b ^ 2 = n mod a for b exactly two solutions and is efficiently determined. The procedure was developed in 1983 by JADavis and DBHoldridge.

When Self Initializing Quadratic Sieve ( SIQS ) a is chosen as a product of factors of the factor base. This exist to an a more values ​​for b as the MPQS. In order to to be examined numbers to get the same number ( in the same order ), you have less values ​​of a must now examine. This must be calculated globally less multiplicative inverse. Thus, the SIQS is faster than MPQS. This method was discovered by René Peralta and WR Alford and Carl Pomerance 1995.

Most computer architectures with many small search intervals to work more efficiently than with a big, as they may be kept rather in a cache. It therefore follows by handing the search interval ( by the multiple polynomials ), a further increase in speed. This can be supported by the fit another dividing the search interval into regions in the cache.

Prefactors

Can be the whole process instead of the number n and to a multiple of k x n apply. By changing the number to be factorized changes usually also the factor base. The clever choice of the prefactor can integrate additional factors in the factor base. Small factors in the factor base reduce the length to be examined numbers when more than seven major factors. Due to the variation of k one can increase the number of small factors in the factor -based and thus increase the number to obtain a smooth probability. The integration of the factor 2 in the factor base has certain additional advantages. To divide this factor from the candidates for smooth numbers out, you have to do is run shifts instead of complex divisions. If we have n following for k ·

Then for all generated numbers

That is, we examine only odd numbers and all numbers generated include a factor of 8 as multiple polynomial with a = 1, one would expect a factor of 4. Thus, the numbers thus produced contain an additional factor of 2 through the multiplication by k to grow the numbers generated, but this growth is often accompanied by the additional factor of 2, and the reduction of the length of the numbers to be screened for more than compensated. A k that maximizes the number to obtain a smooth, the probability can be determined efficiently by an estimate of the quality of the factors in the factor base ( Knuth Schroeppel function).

Implementations

  • Msieve, an implementation of the Multiple Polynomial Quadratic Sieve, with the support of partial relations. Written by Jason Papadopoulos.
  • YAFU, Ben Buhrow, is arguably the fastest implementation of the Self Initializing Quadratic Sieve.
  • Factoring applet by Dario Alpern. A Java implementation of the SIQS.
666740
de