PageRank

The PageRank algorithm is a procedure that a lot of linked documents, such as the World Wide Web to assess the basis of their structure or weighted. In this case, each element a weight, the PageRank assigned due to its link structure. The algorithm was developed by Larry Page (hence the name PageRank ) developed and Sergey Brin at Stanford University and an account of this patent. He served the search engine Google was founded by Brin and Page company, Google Inc. as the basis for the assessment of sites.

The PageRank algorithm is a special method to determine the link popularity of a page or a document. The basic principle is that the more links point to a page, the higher is the weight of this page. The higher the weight of the referring pages, the greater the effect. The goal of the procedure is that links to sort the weight accordingly so as to produce a result sequence in a search query, that is, links to more important pages earlier in the results display.

The PageRank algorithm emulates a randomly surfing through the net users. The probability of this comes across a website that correlates to the PageRank.

The PageRank algorithm

Construction

The principle of the PageRank algorithm is that each page has a weight ( PageRank ) is the greater, the more pages link ( with the highest possible its own weight) on this page. The weight of a page that is calculated from the weights of the on linked pages. Linked to a total different pages, so the weight is distributed proportionally from these pages. The following recursive formula can be viewed as a definition of the PageRank algorithm:

In this case, each side is the total number of pages and an attenuation factor between 0 and 1, with a small proportion of the weight () removed and will be distributed evenly on all sides detected by the algorithm. This is necessary so that the weight is not on side " flows " that refer to any other page. Often the above formula is given without the normalization factor.

Similarly, the equation as a line by line notation of the linear system

With

Is interpreted, the Kronecker delta and denotes the matrix by

Is defined.

, This equation leads to the eigenvalue problem

Wherein the matrix is ​​known as Google and denotes the transpose of the matrix.

For the solution of the equation system unique ( set of Perron - Frobenius ). A smaller damping factor leads to a more homogeneous distribution of the PageRank.

The solution of the linear system of equations can be analytically or numerically. By using the Jacobi iteration for the numerical solution results in the above recursive equation. Other numerical methods for matrix inversion, such as the Minimal residual, the Überrelaxations or the Gauss- Seidel method, however, converge faster in general.

Random Surfer Model

The random surfer model offers an alternative interpretation of the Page Rank Algorithm, which comes from the stochastic. Normalizing the PageRank of 1, one can interpret the weight of a page as the probability that a random surfer ( random path ) is on this page. A random surfer moves through the network by randomly selects one of the outgoing links of the current page with probability d. With probability 1-d selects any new page. The model can be seen as a Markov chain.

Rational Surfer Model

The Rational surfer model is a 2010 patent filed by Google. It builds on the random surfer model dar. Here, the importance of the links is differentiated by placement according to empirical data. The aim is to weight more links that are clicked by a rational surfer with higher probability. Thus, link buying is to be counteracted.

History

The idea of ​​the PageRank algorithm originates from the sociometry and can be used in the technical literature in 1953 demonstrate Katz. In 1949 Seeley used the procedure to declare the conclusion of the status of an individual, but there is in his description no normalization to the number of outgoing edges and no damping term. The latter was introduced in 1965 by Charles H. Hubbell.

Brin and Page developed the algorithm in 1996 at Stanford University, Page announced in 1997 a patent (which was entered on the Stanford University) and published the algorithm in 1998. In their original paper they quote Massimo Marchiori (University of Padua, developer of Hyper Search) Eugene Garfield, the citation analysis developed in the 1950s, and Jon Kleinberg, who developed about the same time as Brin and Page Hubs and Authorities ( HITS).

In addition to Brin and Page, Kleinberg not only, but also to Robin Li developed in China in 1996 a similar algorithm ( RankDex ), the (patent 1999) he used in Baidu.

According to the Google founding the Stanford University received Google shares for $ 1.8 million for the patent, which was exclusive to Google. In 2005 she sold the shares for $ 336 million.

Researchers at Washington State University indicate that Google's PageRank algorithm can also be adapted to the geometrical orientation of water molecules relative to other molecules in a solution, eg to calculate those toxic chemicals approximation.

Adjustments

The algorithm used by Google today has probably no longer exactly this form, but goes back to this formula. Alternative algorithms are the method of Hubs and Authorities by Jon Kleinberg, the Hilltop and the TrustRank algorithm.

2013, the ranking was replaced by the new algorithm Hummingbird. The PageRank is only one aspect, the Hummingbird includes in its assessment.

Toolbar and directory values

Information about the PageRank can be inferred from the Google Toolbar and the Google Directory. The displayed by Google in the Toolbar PageRank is between 0 and 10 The value specified in the Google Directory in early 2008 0-7 corresponds, in the meantime but the value displayed in the toolbar. The displayed values ​​are the real PageRank off on a logarithmic scale and enter the result as rounded integer value.

The displayed in the Google Toolbar PageRank was previously updated every thirty days. Meanwhile, the interval between updates is performed very irregular, the interval length fluctuates between slightly less than thirty to over one hundred days.

Manipulation

Due to the economic importance it has now come to targeted manipulations and falsifications. Thus, the system in practice Suchmaschinenoptimierern by search engine spamming in the guest books, blogs and forums, was committed by the operation of link farms and other dubious methods. This includes, among other things, the text displayed in the Toolbar PageRank of a poorly rated by forwarding page to an existing page with a high PageRank to reflect. The forwarding causes a copy of the display of high PageRank of the destination page with the following update. If the forwarding then removed, so the visitor the actual page content is presented in conjunction with the mirrored PageRank for the duration of the then current interval. The actual PageRank value and ranking in search algorithm is unaffected, only the display is manipulated. This can for example be used fraudulently for it to achieve a higher price on the sale of the domain or of links.

In early 2005 Google implemented with rel = " nofollow" a new attribute for references in an attempt to combat spam. Links that are marked with this attribute will not be considered for the PageRank calculation. By marking of outgoing links can be, for example, to counteract the guestbook, blog and forum spamming. However, this method is controversial because it is not all search engines pay attention to the attribute and the other, the links are not taken into account for computing PageRank, the linked sites are still being crawled by most search engines.

Disadvantages

The disadvantages of PageRank at a glance:

  • The decisive factor is not the reader's interest, but only the other website operator.
  • Financially strong site operator could be earned back links and are thus positioned higher in search results. This means that instead of high-quality content often determine the financial ability of the order of search results.
  • Webmasters often look at PageRank is the only measurement attribute for the link exchange. The contents of the linked pages into the background.
  • The PageRank does not contribute to the qualitative measurement of websites.
262207
de