Google matrix

The Google matrix is a square matrix that arises in the design of the PageRank algorithm. Since it is often very large (with many millions of rows and columns), the numerical and algebraic properties of this matrix are of great importance for computing the PageRank quickly and accurately.

Definition

The normalized Google matrix of a network or directed graph with $n$ nodes is the real $n \times n$ matrix

$$G := \alpha \left( S + \tfrac{1}{n}\, d\, e^T \right) + \frac{1-\alpha}{n}\, e\, e^T.$$

The individual components of the Google matrix are defined as follows:

  • The link matrix $S$ is the row-wise normalized adjacency matrix $A = (a_{ij})$ of the examined graph:

    $$s_{ij} = \begin{cases} a_{ij} / \sum_k a_{ik}, & \text{if } \sum_k a_{ik} > 0, \\ 0, & \text{otherwise.} \end{cases}$$

  • The vector $d$ marks the dangling nodes (nodes without outgoing links); it is defined component-wise as

    $$d_i = \begin{cases} 1, & \text{if } \sum_k a_{ik} = 0, \\ 0, & \text{otherwise.} \end{cases}$$

  • $\alpha$ is a real number between $0$ and $1$ and is called the damping factor.

  • $e$ is a vector of length $n$ whose entries are all ones. Thus the matrix $e e^T$ is exactly the all-ones matrix.
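
The following Python sketch assembles the Google matrix from the definition above; the small adjacency matrix, the size $n = 4$, and the choice $\alpha = 0.85$ are purely illustrative assumptions, not part of the definition.

    import numpy as np

    # Hypothetical 4-node adjacency matrix; node 3 (row index 2) is a dangling node.
    A = np.array([[0, 1, 1, 0],
                  [0, 0, 1, 1],
                  [0, 0, 0, 0],
                  [1, 0, 1, 0]], dtype=float)

    n = A.shape[0]
    alpha = 0.85                                 # damping factor, 0 < alpha < 1

    row_sums = A.sum(axis=1)
    d = (row_sums == 0).astype(float)            # dangling-node indicator vector

    # Row-wise normalized link matrix S; rows of dangling nodes stay zero.
    S = np.divide(A, row_sums[:, None],
                  out=np.zeros_like(A), where=row_sums[:, None] != 0)

    e = np.ones(n)
    # G = alpha * (S + d e^T / n) + (1 - alpha) / n * e e^T
    G = alpha * (S + np.outer(d, e) / n) + (1 - alpha) / n * np.outer(e, e)

    assert np.allclose(G.sum(axis=1), 1.0)       # G is row-stochastic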

Properties

PageRank

To calculate the PageRank one is particularly interested in the existence and multiplicity of left eigenvectors of the matrix $G$ for the eigenvalue $1$. These correspond exactly to the ordinary (right) eigenvectors of the transposed matrix $G^T$ for the eigenvalue $1$. If one interprets the eigenvalue problem

$$G^T x = x$$

as the calculation of the stationary state of a Markov chain, then the vector $x$ is a stochastic vector consisting of the PageRanks. The eigenvector problem thus reduces to the linear system of equations

$$\left( I - \alpha \left( S + \tfrac{1}{n}\, d\, e^T \right)^T \right) x = \frac{1-\alpha}{n}\, e.$$

In order to solve this system of linear equations efficiently, the question arises of the regularity of the matrix and of its condition number.
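
The reduction follows directly from the stochasticity of $x$: since $e^T x = 1$,

    \begin{align*}
    x = G^T x
      &= \alpha \left( S + \tfrac{1}{n}\, d\, e^T \right)^T x
         + \tfrac{1-\alpha}{n}\, e\, (e^T x) \\
      &= \alpha \left( S + \tfrac{1}{n}\, d\, e^T \right)^T x + \tfrac{1-\alpha}{n}\, e,
    \end{align*}

which rearranges to the linear system above.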

Norm

Both the matrix $S$ and the matrix $\frac{1}{n} d e^T$ are in general substochastic. Together they yield the row-stochastic matrix $S + \frac{1}{n} d e^T$, since the non-zero rows of the two matrices are complementary. Since $\frac{1}{n} e e^T$ is also row-stochastic (strictly speaking, even doubly stochastic) and the damping parameter $\alpha$ forms only convex combinations (under which the set of stochastic matrices is closed), the Google matrix is likewise a row-stochastic matrix. For the row-sum norm of the Google matrix this gives

$$\| G \|_\infty = 1,$$

and thus also for the column-sum norm of the transpose

$$\| G^T \|_1 = 1.$$
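
Written out, every row sum of $G$ is a convex combination of two row sums that each equal one:

    \sum_{j=1}^n g_{ij}
      = \alpha \sum_{j=1}^n \Bigl( S + \tfrac{1}{n}\, d\, e^T \Bigr)_{ij}
        + \frac{1-\alpha}{n} \sum_{j=1}^n 1
      = \alpha + (1-\alpha)
      = 1
      \qquad (i = 1, \dots, n).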

Eigenvectors and eigenvalues

The existence of an eigenvector for the eigenvalue $1$ follows directly from the fact that $G$ is a stochastic matrix. That $1$ is moreover the positive eigenvalue of largest modulus, and that it is simple with a strictly positive eigenvector, follows from the Perron–Frobenius theorem, since $G > 0$ holds entrywise. Importantly, only the introduction of the damping parameter guarantees the positivity of the matrix and thus the solvability of the eigenvalue problem.

Furthermore, one can show that $|\lambda| \le \alpha$ holds for all other eigenvalues $\lambda$. The separation of the eigenvalues is thus determined solely by the damping parameter. For many numerical eigenvalue methods, such as the power method, this guarantees a good convergence rate as long as the damping factor is not chosen too close to $1$. Typically $\alpha = 0.85$ is used.
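
The bound $|\lambda| \le \alpha$ can be checked numerically on a small example; the random link structure below is an illustrative assumption:

    import numpy as np

    rng = np.random.default_rng(0)
    n, alpha = 50, 0.85

    # Hypothetical random adjacency matrix with roughly 10% link density.
    A = (rng.random((n, n)) < 0.1).astype(float)
    row_sums = A.sum(axis=1)
    d = (row_sums == 0).astype(float)
    S = np.divide(A, row_sums[:, None],
                  out=np.zeros_like(A), where=row_sums[:, None] != 0)
    e = np.ones(n)
    G = alpha * (S + np.outer(d, e) / n) + (1 - alpha) / n * np.outer(e, e)

    moduli = np.sort(np.abs(np.linalg.eigvals(G)))[::-1]
    print(np.isclose(moduli[0], 1.0))        # Perron eigenvalue 1
    print(moduli[1] <= alpha + 1e-12)        # all other eigenvalues bounded by alpha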

Regularity and condition

Because

$$\left\| \alpha \left( S + \tfrac{1}{n}\, d\, e^T \right)^T \right\|_1 = \alpha < 1$$

holds, the Neumann series provides the invertibility of the matrix:

$$\left( I - \alpha \left( S + \tfrac{1}{n}\, d\, e^T \right)^T \right)^{-1} = \sum_{k=0}^{\infty} \alpha^k \left( \left( S + \tfrac{1}{n}\, d\, e^T \right)^T \right)^k.$$

Thus, the problem can be solved as a linear system of equations. The Neumann series also yields a bound on the norm of the inverse,

$$\left\| \left( I - \alpha \left( S + \tfrac{1}{n}\, d\, e^T \right)^T \right)^{-1} \right\|_1 \le \frac{1}{1 - \alpha},$$

and therefore an estimate for the condition number:

$$\kappa_1 \left( I - \alpha \left( S + \tfrac{1}{n}\, d\, e^T \right)^T \right) \le \frac{1 + \alpha}{1 - \alpha}.$$

Hence only the choice of the damping parameter is responsible for the condition, and again $\alpha$ should not be chosen too close to $1$.
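
A minimal sketch of solving this system with a dense direct solver, reusing the same illustrative random construction as above (real Google matrices are far too large for this and are handled iteratively):

    import numpy as np

    rng = np.random.default_rng(1)
    n, alpha = 50, 0.85

    # Same illustrative construction as in the previous sketch.
    A = (rng.random((n, n)) < 0.1).astype(float)
    row_sums = A.sum(axis=1)
    d = (row_sums == 0).astype(float)
    S = np.divide(A, row_sums[:, None],
                  out=np.zeros_like(A), where=row_sums[:, None] != 0)
    M = S + np.outer(d, np.ones(n)) / n          # row-stochastic completion of S

    # Solve (I - alpha M^T) x = (1 - alpha)/n e for the PageRank vector x.
    B = np.eye(n) - alpha * M.T
    x = np.linalg.solve(B, (1 - alpha) / n * np.ones(n))
    print(np.isclose(x.sum(), 1.0))              # x is a stochastic vector

    # The 1-norm condition number obeys kappa <= (1 + alpha) / (1 - alpha).
    print(np.linalg.cond(B, p=1) <= (1 + alpha) / (1 - alpha) + 1e-9)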

Numerical calculation of the eigenvector

The largest-modulus eigenvector of the Google matrix is typically approximated by the power method. Starting from an initial guess, the matrix-vector product of the Google matrix with the current approximation of the eigenvector is formed in each iteration; that is, in each iteration

$$x^{(k+1)} = G^T x^{(k)}$$

is to be calculated. If the initial guess $x^{(0)}$ is a stochastic vector, then so is each successive approximation $x^{(k)}$. Since the eigenvalues of the Google matrix are well separated, a slow convergence speed of the power method is excluded.

The calculation can exploit the special structure of the Google matrix. The link matrix $S$ is as a rule extremely sparse, that is, almost all of its entries are zero. It can therefore be stored in very little space and multiplied with a vector very efficiently. The vector $d$ is in general also sparse, so the term $d^T x$ can likewise be computed very quickly.
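
A sketch of this sparse power iteration, assuming the construction of $G$ given above and never forming the dense Google matrix; the helper name pagerank_power and the tiny test graph are illustrative:

    import numpy as np
    import scipy.sparse as sp

    def pagerank_power(A, alpha=0.85, tol=1e-10, max_iter=200):
        """Power iteration x <- G^T x using only sparse operations."""
        n = A.shape[0]
        row_sums = np.asarray(A.sum(axis=1)).ravel()
        d = (row_sums == 0).astype(float)        # dangling-node indicator
        inv = np.zeros(n)
        nz = row_sums > 0
        inv[nz] = 1.0 / row_sums[nz]
        S = sp.diags(inv) @ A                    # row-normalized link matrix, still sparse
        x = np.full(n, 1.0 / n)                  # stochastic initial guess
        for _ in range(max_iter):
            # G^T x = alpha S^T x + (alpha d^T x + (1 - alpha)) / n * e
            x_new = alpha * (S.T @ x) + (alpha * (d @ x) + (1 - alpha)) / n
            if np.abs(x_new - x).sum() < tol:    # 1-norm stopping criterion
                return x_new
            x = x_new
        return x

    # Hypothetical usage with a tiny sparse graph (node 3 is dangling):
    A = sp.csr_matrix(np.array([[0, 1, 1],
                                [0, 0, 1],
                                [0, 0, 0]], dtype=float))
    print(pagerank_power(A))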

Example

Consider as an example the directed graph with 8 nodes shown in the adjacent figure, in which nodes 5 and 6 are dangling nodes. From the graph one obtains the row-wise normalized adjacency matrix $S$ and the dangling-node vector $d$, and with the above construction and a damping parameter of $\alpha = 0.85$ the Google matrix $G$.

The eigenvector for the eigenvalue $1$ assigns nodes 7 and 8 the highest PageRank (0.2825 and 0.2654) and nodes 1 and 6 the lowest (0.0675 each). The second-largest eigenvalue modulus equals $\alpha$, so the above estimate is sharp in this example. Furthermore, the condition number attains the bound $\frac{1+\alpha}{1-\alpha}$, so this estimate is sharp as well.
