The Elo Rating System for Chess and Beyond

The Elo Rating system is a method to rate players in chess and other competitive games. A new player starts with a rating of 1000. This rating will go up if they win games, and go down if they lose games. Over time a player's rating becomes a true reflection of their ability - relative to the population.

Below are some of the things I wanted to talk about, but cut so the video wasn't too long!

Some explanations of the Elo rating system say it is based on the normal distribution, which is not quite true. Elo's original idea did model each player's ability as a normal distribution. The difference between the two players' strengths would then also be a normal distribution. However, the formula for a normal distribution is a bit messy, so today it is preferred to model each player using an extreme value distribution. The difference between the two players' strengths is then a logistic distribution. This has the property that if a player has a rating 400 points higher than another player, they are 10 times more likely to win than their opponent, which makes the formula nicer to use. Practically, the difference between a logistic distribution and the normal distribution is small.
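
To get a feel for how small that difference is, here is a rough sketch comparing the two curves. It assumes a standard deviation of 200 rating points per player for the normal model (a figure commonly quoted for Elo's original scheme, not something stated in the video), and the function names are made up for illustration.

```python
import math


def logistic_expected(diff: float, scale: float = 400.0) -> float:
    """Expected score of the higher-rated player under the logistic model."""
    return 1.0 / (1.0 + 10.0 ** (-diff / scale))


def normal_expected(diff: float, sd_per_player: float = 200.0) -> float:
    """Expected score under a normal model.

    Assumption: each player's performance has standard deviation
    sd_per_player, so the difference has standard deviation sd * sqrt(2).
    """
    sd_diff = sd_per_player * math.sqrt(2.0)
    z = diff / sd_diff
    # Standard normal cdf written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    for diff in range(0, 801, 100):
        gap = abs(logistic_expected(diff) - normal_expected(diff))
        print(diff, round(logistic_expected(diff), 4),
              round(normal_expected(diff), 4), round(gap, 4))
```

Under these assumptions the two curves stay within about two percentage points of each other for rating gaps up to 800.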

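To get the formula, take the cdf of the logistic distribution, F(x) = 1 / (1 + e^(-(x - mu)/s)), replace e with base 10, and set s = 400, mu = R_A - R_B and x = 0. This gives 1 / (1 + 10^((R_A - R_B)/400)), which is the expected score for player B.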
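
As a minimal sketch of that substitution, the function below computes the expected score for one player against another. The name `expected_score` and the example ratings are illustrative choices, not taken from the video.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score for player A against player B (base-10 logistic curve)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


if __name__ == "__main__":
    strong = expected_score(1400, 1000)  # 400 points higher
    weak = expected_score(1000, 1400)    # 400 points lower
    # The two expected scores add up to 1, and their ratio is 10 to 1.
    print(round(strong, 4), round(weak, 4), round(strong / weak, 6))
```

With a 400-point gap the two expected scores come out at 10/11 and 1/11, which is the "10 times more likely" property.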

For the update formula, I say in the video that your rating can increase or decrease by a maximum of 32 points, and that there was no special reason for that number. This value is called the K-factor, and the higher the K-factor, the more weight you give to the player's tournament performance (and so less weight to their pre-tournament rating). High-level chess tournaments use a K-factor of 16, as it is believed the players' pre-tournament ratings are about right, so their ratings will not fluctuate as much. Some tournaments use different K-factors.
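
The effect of the K-factor can be sketched with the standard update rule, new rating = old rating + K * (actual score - expected score). The function names and the ratings in the example are illustrative assumptions.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score for player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_rating(rating: float, opponent: float, score: float, k: float = 32.0) -> float:
    """Return the new rating after one game.

    score is 1 for a win, 0.5 for a draw and 0 for a loss.
    """
    return rating + k * (score - expected_score(rating, opponent))


if __name__ == "__main__":
    # Two equally rated players: the winner gains half the K-factor.
    print(update_rating(1500, 1500, 1, k=32))    # K = 32
    print(update_rating(1500, 1500, 1, k=16))    # K = 16, smaller swing
    print(update_rating(1500, 1500, 0.5, k=32))  # a draw changes nothing here
```

Since the bracket (score - expected score) always lies between -1 and 1, the change in rating can never exceed K in either direction.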

In the original Elo system, draws are not included, instead they are considered to be equivalent to half a win and half a loss. The paper by Mark Glickman above contains a formula that includes draws. Similarly the paper contains a formula that includes the advantage to white.

On the plus side, the Elo system was leagues ahead of what it replaced, known as the Harkness system. I originally intended to explain the Harkness system as well, so here are the paragraphs I cut:

"In the Harkness system an average was taken of everyone's rating, then at the end of the tournament if the percentage of games you won was 50% then your new rating was the average rating.
If you did better or worse than 50% then 10 points was added or subtracted to the average rating for every percentage point above or below 50.
This system was not the best and could produce some strange results. For example, it was possible for a player to lose every game and still gain points."
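
A rough sketch of the rule as described in those paragraphs shows how the strange result comes about. The function name and the example numbers are hypothetical, and the real Harkness system may have had details not covered here.

```python
def harkness_rating(average_rating: float, percent_won: float) -> float:
    """New rating under the rule described above.

    Start from the average rating and add or subtract 10 points for
    every percentage point above or below 50%.
    """
    return average_rating + 10.0 * (percent_won - 50.0)


if __name__ == "__main__":
    old_rating = 1000.0
    # Hypothetical tournament with an average rating of 1600,
    # in which this player loses every game (0% won).
    new_rating = harkness_rating(1600.0, 0.0)
    print(old_rating, new_rating)  # the new rating is higher than the old one
```

Because the result depends only on the average rating and the percentage won, a weak player in a strong enough field gains points even with a 0% score.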

This video was suggested by Outray Chess. The maths is a bit harder, but I liked the idea so I made an in-front-of-a-wall video.
Comments

The fact that Elo isn't an acronym is the biggest plot twist in the history of anime.

tobiaschaparro

"Your rating is a measure of your ability relative to the population"
That is an important factor to remember

thelastcube.

I plugged my rating against Carlsen's and it returned "LOL"

ihmcfly

Thank you for explaining through mathematics that I'm trash at chess.

codyjackson

I love how James looks incredibly normal throughout the whole video, but he picked a frame where he looks like he's biting air for the thumbnail

sb_dunk

I am 62 years old and I am a chess master and a mathematician. A great achievement of the Elo system has been to give value to every game. Previously, when a player had no more goals to achieve in a tournament, their game also got worse and there were "false" results.
But the Elo system has also been shown to have a dangerous flaw.
What happens when a player thinks he has reached or even surpassed his best possible elo?
If his goal is to have a good score, more than to fight in tournaments, then he will be tempted ... to stop playing.
And that's what happens for a lot of players!

PatoPat

The 32 that is mentioned in the video is the K-factor, which is in fact a variable. K-factors are used in order to exercise some control over ratings drift. A high K-factor is applicable to young and fast-improving players, who will quickly suck points from the rating pool, leading to deflation unless some mechanism exists to counter it. On the other hand, long-established and highly rated players are normally associated with a low K-factor, which has the effect of dampening the vicissitudes of tournament play. Such players are likely to have arrived at a plateau in their development, and their rating should contain a greater historical element. The control of the average and distribution of a rating pool is the major task of all rating officers. The aim is to make ratings consistent over time so that a good club player rated 1800 today has the same ability as somebody rated 1800 fifty years ago - not a trivial task.

fredharte

English is not my native language and I really wanted to say that the way you express yourself as well as your accent are perfect for people like me ! Thank you for being that smooth in your speaking and clear in your explanations.

Thepokeshasseur

4:53
That 32 is a very important factor called the K-factor, which sets how sensitive the rating is to each game. The calculation of the K-factor could be another topic. It might depend on Elo tier, win/loss streaks, days since the last game, etc.

timanlam

I've played multiple online games that use this system, I've always wanted to know how it works. Thanks for making this video!

cookiecan

Your 'random number drawing competition' analogy to explaining the normal distribution and probability of overlap was simply brilliant. That would've helped me a lot in my earlier statistics classes!

RetroLPGames

I need a separate Elo rating for sober playing vs high af or an equation to factor it all in

zxxkcxxz

Wish there were discussions of the weaknesses of the Elo system, such as that it's really only good for 1v1 games and that it creates an incentive at the top end for players to get to a high ranking and then stop playing so as to not risk losing.

yokokuramaful

There is one unintended side effect of this though in online scrabble for instance. High rated players ONLY want to play other high rated players because scrabble is partly about luck. Even a low rated player can win against a high rated one given the right letters at just the right moment.
Whereas low rated players want to play those with a much higher rating than themselves.


Maybe not a big problem in tournaments but in day to day play, new players will have a hard time finding anyone else to play as lower than 1000 rated players will quit playing leaving only higher than 1000 players left playing the game and they don't want to play a 1000 rated player due to the risks of randomness.


Chess isn't random so it works just fine there.

RealCadde

Thanks for the explanation! I often see Elo lumped together with Glicko and Glicko2 (the rating systems that Prof. Glickman made), and apparently Glicko is more telling of your actual skill because instead of giving you a number, it gives you a number and a confidence interval. However, how that actually gets calculated, I have no idea, the explanations of that also just throw some formulas at you. Can you make a video for that as well?

LeoWattenberg

4:03 "You win half the games and you lose half the games"

Draws - Do we look like a joke to you?

RobloxKid

I forget where I originally learned this, but I always found it far more intuitive than thinking of Elo calculation as a function of a function as presented here: a game is always worth the same number of rating points—winner taking from loser—but in order to compensate the weaker player, the higher-rated opponent gives them a percentage of the difference in their ratings in advance of the game. To use the same example numbers as at 5:23 in the video, the difference in rating is 107. The weaker player gets 4% of this, rounded up, taken from the stronger, so five points change hands; then the winner of the game gets 16 points (this is half the K-factor) from the loser. This means A will have gained a total of 21 from B if they won, or will have given a net of 11 to B if they lost. In the event of a draw, the initial five-point shift is the only one that happens. These results match what one gets from using the full formulae. This "quick-and-dirty" calculation breaks down when the difference in player ratings exceeds 400, obviously, but within that range it matches pretty much exactly—I'm fairly certain it's never off by more than a point. I used this version of the formula to maintain league rankings for several popular board games in my university gaming club. I hope this little blurb helps someone understand this whole thing better, just as it did for me.

TheZotmeister
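
The shortcut in the comment above can be checked against the full formula with a short sketch. It assumes a K-factor of 32 (the 4% and the 16 points are tied to that value), and the ratings 1500 and 1607 are hypothetical, chosen only to give the 107-point difference mentioned in the comment. The function names are illustrative.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score for player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def full_change(rating: int, opponent: int, score: float, k: float = 32.0) -> float:
    """Rating change from the full Elo formula."""
    return k * (score - expected_score(rating, opponent))


def quick_change(rating: int, opponent: int, score: float) -> int:
    """Rating change from the shortcut, assuming K = 32.

    The weaker player first receives 4% of the rating difference,
    rounded up, then the winner takes 16 points from the loser.
    """
    diff = abs(rating - opponent)
    handicap = (4 * diff + 99) // 100  # 4% of diff, rounded up, in integer maths
    if rating > opponent:
        handicap = -handicap           # the stronger player pays the handicap
    swing = round(16 * (2 * score - 1))  # +16 win, 0 draw, -16 loss
    return handicap + swing


if __name__ == "__main__":
    weaker, stronger = 1500, 1607  # hypothetical ratings, 107 points apart
    for score in (1, 0.5, 0):
        print(score,
              quick_change(weaker, stronger, score),
              round(full_change(weaker, stronger, score), 2))
```

For this example the shortcut and the rounded full formula agree for a win, a draw and a loss from the weaker player's side.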

Helpful explanation.

3 October 2021
2:31pm NZST

SuperSight

The Elo system was used for Magic: the Gathering tournaments for a while. While it was active I noticed a "geographical clumping" effect where certain areas hoarded points. This meant that players of the same skill level in different parts of the country had different Elo ratings. I wonder if the same thing happens in chess?

CannarWilm

Very nicely explained with the Gauss bell curve, I appreciate your effort. Thanks.

MamToCos