Elo rating system

The Elo rating is a number that describes the playing strength of Go and chess players. The concept has since been adapted for various other sports.

Arpad Elo developed the underlying objective rating system in 1960 for the U.S. Chess Federation (USCF). It was adopted in 1970 by the World Chess Federation (FIDE) at its congress in Siegen.

The World Chess Federation calls its system the "FIDE rating system". A rating is officially called a "FIDE rating", but colloquially it is usually referred to simply as "Elo". In addition to the international rating system of FIDE, national rating systems with different names exist. In Germany the national rating system is the DWZ, in Austria (national) Elo ratings are calculated, and in Switzerland there is a national list with its own rating numbers (Führungszahlen). These systems evaluate many more local tournaments, but they also calculate their ratings by Arpad Elo's methods, usually with only minor modifications and different factors.

Calculation

A player who has just joined a chess club, for example, has no Elo rating. After a series of games against different players, an initial Elo rating is estimated. After this phase, the actual results of the games count toward the Elo rating. Each calculation of a new Elo rating depends on the expected score $E_A$, the number of points that player A can be expected to score against player B. A win counts as one point, a draw as half a point, and a loss as no points. With $R_A$ and $R_B$ denoting the current ratings of A and B, the expected score is

$$E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}$$

Note: If there were no draws, the expected score would simply be the probability that A wins. Because a chess game can end in a draw, the expected score equals the probability of winning plus half the probability of drawing. The individual probabilities of a win, a draw and a loss are not needed in the Elo system, only the expected scores.

(If the rating difference exceeds 400 points, the value 400 or -400 is used instead of the actual difference.)

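Expressed as a percentage, the expected score for A is now $E_A \cdot 100\,\%$. The new Elo rating of player A is

$$R'_A = R_A + k \cdot (S_A - E_A)$$

where $k$ is the k-factor and $S_A$ is the score actually achieved (1 for a win, 0.5 for a draw, 0 for a loss).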

  • Note 1: The number 400 in the formula and the original k-factor were chosen by Arpad Elo so that Elo ratings are as compatible as possible with the ratings of the previously used system by Kenneth Harkness. Indeed, the Harkness model can be interpreted as a piecewise linear approximation of the Elo model.
  • Note 2: It is easy to show mathematically that $E_A + E_B = 1$ holds.
  • Note 3: In Elo's model, a player's expected score as a function of the rating difference to the opponent is a logistic function. To avoid a misunderstanding: this does not mean that playing strengths are modeled as logistically distributed random variables, which is in fact not the case. The property of the expected scores that is characteristic of Elo's model cannot be derived from any plausible distributional assumption (such as a normal distribution).
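The calculation described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not an official implementation: the function names `expected_score` and `updated_rating` and the default `k = 10` are chosen for the example only, and the cap at 400 points follows the parenthetical rule above.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Expected score of A against B (logistic curve, difference capped at +/-400)."""
    diff = max(-400.0, min(400.0, r_b - r_a))
    return 1.0 / (1.0 + 10.0 ** (diff / 400.0))


def updated_rating(r_a: float, r_b: float, score_a: float, k: float = 10.0) -> float:
    """New rating of A after scoring score_a (1, 0.5 or 0) against B.

    k = 10 is only an illustrative default, not a value prescribed by the text.
    """
    return r_a + k * (score_a - expected_score(r_a, r_b))


if __name__ == "__main__":
    e_a = expected_score(2000, 2100)
    e_b = expected_score(2100, 2000)
    # The two expected scores always add up to one (Note 2).
    print(round(e_a, 3), round(e_b, 3), round(e_a + e_b, 3))
    # A difference of 700 points is treated like a difference of 400.
    print(expected_score(2700, 2000) == expected_score(2400, 2000))
```

Because the same capped difference enters both expected scores with opposite sign, the identity from Note 2 also holds when the cap applies.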

A ( fictional ) example

The chess player Garry Kasparov (Elo 2806) plays against the chess player Zsuzsa Polgár (Elo 2577). According to the first formula, Kasparov (player A) is expected to score an average of $E_A = 0.789$ points per game against Polgár (player B):

$$E_A = \frac{1}{1 + 10^{(2577 - 2806)/400}} \approx 0.789$$

After a game, there are three possibilities.

Polgár wins

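So $S_A = 0$. The new Elo ratings $R'_A$ for Kasparov and $R'_B$ for Polgár are (the point changes in this example correspond to $k = 10$)

$$R'_A = 2806 + 10 \cdot (0 - 0.789) \approx 2798$$

$$R'_B = 2577 + 10 \cdot (1 - 0.211) \approx 2585$$

So Kasparov loses eight Elo points, while Polgár gains eight.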

Kasparov wins

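So $S_A = 1$. Kasparov gains two Elo points, Polgár loses two:

$$R'_A = 2806 + 10 \cdot (1 - 0.789) \approx 2808$$

$$R'_B = 2577 + 10 \cdot (0 - 0.211) \approx 2575$$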

Draw

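So $S_A = 0.5$. Kasparov loses three Elo points, Polgár gains three:

$$R'_A = 2806 + 10 \cdot (0.5 - 0.789) \approx 2803$$

$$R'_B = 2577 + 10 \cdot (0.5 - 0.211) \approx 2580$$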
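The three cases of the example can be recomputed with a short script. It is a sketch only: the factor `K = 10` is inferred from the point changes quoted in the example rather than stated in the text, and the variable names are chosen for illustration.

```python
def expected_score(r_a, r_b):
    """Expected score of A against B according to the logistic formula."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


K = 10  # inferred from the point changes in the example, not stated in the text
r_a, r_b = 2806, 2577  # A = Kasparov, B = Polgár
e_a = expected_score(r_a, r_b)

changes = {}
for s_a in (0.0, 1.0, 0.5):  # B wins, A wins, draw
    delta = K * (s_a - e_a)  # change for A; the change for B is the negative of it
    changes[s_a] = round(delta)
    print(f"S_A = {s_a}: A {r_a + delta:.0f}, B {r_b - delta:.0f}")
```

Since both players are rated with the same factor here, every point A gains or loses is lost or gained by B.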

Chess

Before the introduction of the Elo rating, chess players were divided into nine classes or categories. A difference of one class meant that the better player could expect to score 0.75 points per game. In the Elo system, this difference in playing strength corresponds to a difference of almost exactly 200 rating points.

It should be noted that the titles Grandmaster (GM) and International Master (IM) are not awarded solely on the basis of a certain Elo rating; they also require the fulfillment of further norms. To receive the title after fulfilling all norms, however, a prospective GM must have reached an Elo rating of at least 2500 at least once, and a prospective IM a rating of at least 2400. The requirements for the women's titles are each about 200 Elo points lower than for the corresponding general titles.

The width of a class is 200 Elo points. The system is calibrated so that a difference of 200 points corresponds to an expected score of 76 % for the stronger player and a difference of 400 points to an expected score of 92 %; the formula $P = 1 / (1 + 10^{-D/400})$ reproduces these values only approximately. The comparison is based on statistical methods. At a difference of 600 points, the stronger player wins almost always in practice and statistically (98 %), even though human playing strength of course depends on daily form and motivation. For computers, the distribution not only coincides at 200 points by definition but is also very similar in curve shape beyond that point; however, machines of similar strength show a different spread of playing strength across the different phases of the game.
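The quoted percentages can be compared numerically with the logistic formula. The sketch below also evaluates a normal-distribution model under the assumption that each player's performance has a standard deviation of 200 points, so that the difference of two performances has a standard deviation of 200·√2. That model and the function names are assumptions made for illustration, not something the text specifies.

```python
import math


def logistic_expectation(d: float) -> float:
    """P = 1 / (1 + 10^(-D/400)) for a rating advantage of d points."""
    return 1.0 / (1.0 + 10.0 ** (-d / 400.0))


def normal_expectation(d: float, sigma: float = 200.0) -> float:
    """Normal model (assumption): the performance difference has std dev sigma * sqrt(2)."""
    z = d / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


for d in (200, 400, 600):
    print(d, f"{logistic_expectation(d):.3f}", f"{normal_expectation(d):.3f}")
```

Under these assumptions the normal model gives roughly 76 %, 92 % and 98 %, while the logistic formula gives roughly 76 %, 91 % and 97 %, which is consistent with the remark that the formula matches the calibration only approximately.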

Tournament Category

Round-robin tournaments are also divided into categories according to the average Elo rating of the participants. One category spans 25 Elo points. A tournament whose participants average 2250 to 2274 Elo points is classified as category 1. The strongest tournaments currently reach category 22, which corresponds to an average of 2775 to 2799 Elo points. Category 23 (with an Elo average of 2801) was reached for the first time, and so far only once, at the "Zurich Chess Challenge 2014" in January 2014.

Statistics

The Elo system divides chess players by rating into nine classes, with the lower limit of the highest class at 2600 and the upper limit of the lowest class at 1200. The ratings of an individual player are interval-scaled and approximately normally distributed, varying around a mean with a standard deviation of 200. There are many players with playing strengths below 1200, but at this level the Elo system has only limited predictive reliability. Especially at the level of recreational players, it matters that a player can defend his rating against stronger opponents as well, without having to focus on peculiarities such as unconscious mental weakness or the poor time management of newcomers. Unrealistically high values are corrected quickly, accurately and reliably by defeats. The fairly stable Elo rating is determined by various methods. Some start from a few games or from tournament participants of similar strength; after many games, all of them reach very similar equilibria.

The calculation is based on the hypothesis that the distribution of playing strength across the whole population of players follows the normal distribution (bell curve). On the basis of this hypothesis, the probability that one of two opponents wins can be predicted statistically. In the special case of identical ratings, the probabilities are equal. In a tournament, a player's rating and the average rating of his opponents can be used to predict the score he is likely to achieve. After the tournament, the actual result is compared with the statistically predicted result, and the player's new rating is calculated from the deviation.
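The tournament procedure can be sketched as follows. The sketch makes several assumptions that the paragraph does not fix: it uses the logistic expectation from the Calculation section rather than a normal distribution, a factor `k = 10`, a hypothetical function name `tournament_update`, and invented ratings for the example.

```python
def expected_score(r_a, r_b):
    """Expected score per game of A against B (logistic formula, an assumption here)."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def tournament_update(rating, opponent_ratings, points, k=10.0):
    """Compare the points scored with the prediction based on the opponents' average."""
    average = sum(opponent_ratings) / len(opponent_ratings)
    predicted = len(opponent_ratings) * expected_score(rating, average)
    return predicted, rating + k * (points - predicted)


# Hypothetical tournament: a 2400 player scores 3 points from 4 games.
predicted, new_rating = tournament_update(2400, [2350, 2420, 2480, 2390], 3.0)
print(round(predicted, 2), round(new_rating, 1))
```

The player in this invented example scores more than predicted, so the new rating is higher than the old one.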

Problems of rating systems

Intransitivity of probability relations

If player A is the favorite against player B, and B is the favorite against C, then A has a higher rating than B and B a higher rating than C. It follows that A has a higher rating than C and should be the favorite against C.

This conclusion is by no means compelling, since probability or preference relations are not necessarily transitive. This problem is of course not peculiar to the Elo system but is a fundamental problem of all rating systems (see Condorcet paradox, "Chinese dice" or "intransitive dice").

However, transitivity is a necessary prerequisite for a meaningful rating system. To guarantee this property, additional specific assumptions must be made about the probability distributions of the playing strengths, which are to be interpreted as random variables. To this end, when developing his rating system, Arpad Elo posited as an additional hypothesis a quantitative statement about the relationship between the playing strengths of A and C.

If we disregard the possibility of draws, the basic idea of the Elo system is this: if, for example, player A is a 3:1 favorite against player B (i.e. A wins 75 % of games against B), and B is a 2:1 favorite against C, then Elo's model requires, or implies, that A is a 6:1 favorite against C. Without this requirement, A would not even have to be the favorite.

In general: if A is an x:1 favorite against B and B is a y:1 favorite against C, then according to Elo's model A is an xy:1 favorite against C.

This is easy to verify. The requirement of course goes far beyond the purely qualitative statement of transitivity. However, this multiplicativity is not a consequence of a normal distribution. Although one often reads that the Elo model assumes a normal distribution, that assumption satisfies the demand for multiplicativity only to a very rough approximation. The demand for multiplicativity is therefore the better starting point for developing the model, in particular for calculating the playing strengths of players from earlier eras.
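The multiplicative rule can be checked directly against the logistic formula, ignoring draws as above. In this sketch an x:1 favorite has an expected score of x/(x+1), which corresponds to a rating difference of 400·log10(x). The function names are hypothetical, and the cap at 400 points is left out because it does not come into play in the example.

```python
import math


def diff_from_odds(x: float) -> float:
    """Rating difference at which the stronger player is an x:1 favorite."""
    return 400.0 * math.log10(x)


def odds_from_diff(d: float) -> float:
    """Odds E / (1 - E) for the player who is d points ahead."""
    e = 1.0 / (1.0 + 10.0 ** (-d / 400.0))
    return e / (1.0 - e)


d_ab = diff_from_odds(3.0)  # A is a 3:1 favorite against B
d_bc = diff_from_odds(2.0)  # B is a 2:1 favorite against C
odds_ac = odds_from_diff(d_ab + d_bc)  # rating differences add up
print(round(d_ab, 1), round(d_bc, 1), round(odds_ac, 6))
```

Rating differences add while the odds are powers of ten of those differences, so adding differences multiplies the odds: 3 times 2 gives 6.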

Deflation and inflation

If one wants to use Elo ratings to compare the strengths of players from different eras (this applies to other rating systems as well, not only to Elo), then a rating of 1600 from 1970, for example, should be equivalent to a rating of 1600 from 2000. In particular, since the average playing strength at least does not deteriorate over time thanks to the development of theory, the average rating should not decrease.

In the Elo system, the winner of a game gains exactly as many rating points as the loser loses, so the average rating of the two remains the same. If the rating pool includes only top players, the following phenomenon can be observed: whenever a player is newly added to the rating list, he enters with a certain (low) number of points. Over the course of his career he improves, gains additional points, and later leaves with a (high) number of points. Points are thus withdrawn from the pool overall and the average rating decreases; that is, the system is deflationary.

If the rating pool is enlarged, the opposite effect occurs: many players leave the rating pool with a lower rating than they were assigned on entry, and the system becomes inflationary.
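The deflation mechanism can be illustrated with a deliberately contrived toy pool. Everything in this sketch is an assumption made for illustration: the player names, the starting ratings, the newcomer winning every game, and both players of a game being rated with the same factor `k`.

```python
def expected_score(r_a, r_b):
    """Expected score of A against B according to the logistic formula."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def play(pool, a, b, score_a, k=10.0):
    """Rate one game; both players use the same k (an assumption of this sketch)."""
    delta = k * (score_a - expected_score(pool[a], pool[b]))
    pool[a] += delta
    pool[b] -= delta


# Hypothetical pool: two established players and a newcomer who enters with a low rating.
pool = {"veteran_1": 2500.0, "veteran_2": 2500.0, "newcomer": 2300.0}
total_before = sum(pool.values())

# The newcomer improves and, in this contrived run, wins every game.
for _ in range(30):
    play(pool, "newcomer", "veteran_1", 1.0)
    play(pool, "newcomer", "veteran_2", 1.0)

total_after = sum(pool.values())
exit_rating = pool.pop("newcomer")  # the player retires and takes the gained points along
remaining_average = sum(pool.values()) / len(pool)
print(round(abs(total_after - total_before), 6), round(exit_rating), round(remaining_average))
```

While all three players are in the pool, the total number of points stays constant. Once the newcomer leaves with more points than he brought in, the average of the remaining players is below its starting value of 2500.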

This was particularly the case in the past, when the World Chess Federation FIDE listed only players with a rating of 2200 or higher. Since the Elo evaluation of tournaments is not free of charge and thus represents a source of income for FIDE, this threshold was gradually lowered, most recently to 1200 in July 2009. Nevertheless, it is inevitable that many players leave the rating pool with a lower rating than they received on entry. A moderate inflation is welcome, since it should reflect the development of playing ability over time; in most cases, however, the resulting problem is excessive inflation.

Thus, Elo ratings could reach ever new records without actually remaining an absolute measure of playing strength. About 20 years ago there were only two players with an Elo rating above 2700, and only about 10 to 20 players reached a value above 2600. In July 2010, more than 200 active players had an Elo rating above 2600, 37 of them at least 2700; three players even had an Elo rating of 2800 or higher, which seemed unthinkable 20 years earlier.

The average Elo rating of the top 100 players in the world ranking rose from 2644 to 2704 points between July 2000 and July 2012, an increase of 60 rating points.

The thousand-games problem

Another phenomenon is the so-called thousand-games problem. Players of the same playing strength often meet each other again and again. Suppose two players rated Elo 2000 play ten games in which one of them scores 80 % of the points. After the new Elo ratings are calculated, the result is 2080 for the winner and 1920 for the loser. If, however, the two players play 1000 games with the same ratio of points without the ratings being updated in between, the winner ends up with a new rating higher than that of the current world champion. This scenario is rather contrived, however. By the law of large numbers, two equally strong players (both had Elo 2000, after all) can be expected to approach the expected 50 % after many games. Moreover, in practice 1000 games will never be played without a rating update.
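The effect of rating many games at once can be reproduced numerically. In the sketch below the factor `K` is inferred so that ten games at 80 % yield the quoted 2080 and 1920; it is not stated in the text. The function names are hypothetical, and the game-by-game variant credits every game with the average score of 0.8 to keep the calculation deterministic.

```python
import math


def expected_score(r_a, r_b):
    """Expected score of A against B according to the logistic formula."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


K = 80.0 / 3.0  # inferred so that ten games at 80 % give 2080/1920; not stated in the text


def batch_rating(r_a, r_b, games, share, k=K):
    """All games are rated at once against the ratings held before the first game."""
    return r_a + k * games * (share - expected_score(r_a, r_b))


def game_by_game(r_a, r_b, games, share, k=K):
    """Ratings are updated after every game; each game is credited with the average score."""
    for _ in range(games):
        delta = k * (share - expected_score(r_a, r_b))
        r_a, r_b = r_a + delta, r_b - delta
    return r_a, r_b


# Rating at which the expected score against the opponent equals the 80 % actually scored.
limit = 2000 + 200 * math.log10(4)

print(round(batch_rating(2000, 2000, 10, 0.8)))
print(round(batch_rating(2000, 2000, 1000, 0.8)))
print(tuple(round(r) for r in game_by_game(2000, 2000, 1000, 0.8)), round(limit))
```

Under these assumptions the batch calculation for 1000 games arrives at about 10000 points, whereas the game-by-game calculation settles near 2120 and 1880, the point at which the expected score equals the 80 % actually scored.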

The development of the ratings is also affected by the rating period. Until 2002 ratings were calculated half-yearly, then quarterly until 2009. From July 2009 to July 2012 they were calculated every two months. Since August 2012 they have been calculated monthly. Since then the minimum rating has been 1000 points; previously it was 1200. In principle it would make sense to calculate ratings after each tournament, since fluctuations in players' form could then be accounted for better. However, this is not currently planned.

Skill levels of selected chess players

After the Elo rating was introduced as a rating system in 1970, Bobby Fischer's record of 2785 points from July 1972 initially stood for many years. In 1999 the then PCA world chess champion Garry Kasparov reached an Elo rating of 2851 points, which was not surpassed until January 2013, when Magnus Carlsen reached 2861 points. Carlsen has since raised the record to 2881 (list of March 2014).

Grandmasters usually reach an Elo rating of at least 2500; from 2600 points upward one can speak of the extended world elite. The following table shows the FIDE rating list as of April 2014 with the twenty highest-rated active players, supplemented by the best woman and the best male and female players from Germany, Austria and Switzerland (in brackets: rank in the women's list):

Historical Elo rating in chess

To compare today's top players with grandmasters from before the introduction of the Elo rating, the so-called historical Elo rating is used.

Computer Chess

The Elo ratings of chess programs cannot readily be compared with those of human chess players, since in most cases they were determined by games between computers and not by participation in official tournaments.

Go

In Go, playing strength is given in the traditional kyu (student) and dan (master) grades. Within the European Go Federation and on many Go servers on the Internet, these grades are determined on the basis of a secondary Elo system that maps kyu and dan grades as follows:

Football

The World Football Elo Ratings are an adaptation of the Elo system for men's national football teams. The FIFA women's ranking, by contrast, is officially determined with an adapted Elo system. Unofficial Elo ratings are also compiled for football clubs.

Table tennis

Since the 2010/2011 season, the Swiss Table Tennis Federation (STT) has used a slightly modified Elo formula to calculate ranking points.

Expressed as a percentage, the expected score for A is now $E_A \cdot 100\,\%$. The new point total of player A is

Scrabble

For international Scrabble, an Elo ranking is maintained by the World English-language Scrabble Players' Association (WESPA). The New Zealander Nigel Richards is ranked first (2094 Elo points).

Since 2009 an Elo ranking has also been maintained for German-language Scrabble, based on tournaments from 2005 onward. Of 178 players from 5 countries, the German Ulla Trappe is ranked first with 1801 Elo points (as of March 6, 2012).
