Uncertain Ratings I: Provisional Ratings

US Chess treats all ratings as if they were equally accurate measurements of a player's current playing strength. Because we use ratings to place players into sections grouped by ability, players whose ratings are biased or less accurate could have an unfair advantage in any section with a rating limit. Many situations can lead to uncertain ratings; the first one I want to discuss is players with provisional ratings.

As a reminder, unrated players receive a provisional rating after they have played four rated games, and a special ratings formula is used for their first eight games. After that, the regular ratings formula is used, but with a slightly higher "K factor" that allows the rating to change more quickly. Once a player has played 25 games, their rating is called "established." When you look up a rating in the Member Services Area (MSA) on the US Chess website, a player with a provisional rating will have the letter "P" after their rating, followed by the number of rated games they have played. For example, 1235P13 indicates the provisional rating of someone who has played 13 rated games.
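As an illustration of the rating notation just described, here is a minimal sketch that splits an MSA-style string such as 1235P13 into its parts. The function and type names are hypothetical, and the pattern assumes established ratings appear as a bare number with no suffix; real MSA pages may show other annotations that this pattern would reject.

```python
import re
from typing import NamedTuple, Optional


class MsaRating(NamedTuple):
    rating: int
    provisional: bool
    games: Optional[int]  # game count is only shown for provisional ratings


# Hypothetical pattern: a number, optionally followed by "P" and a game count.
_RATING_RE = re.compile(r"^(\d+)(?:P(\d+))?$")


def parse_msa_rating(text: str) -> MsaRating:
    """Parse a rating string such as '1235P13' (provisional) or '1850'."""
    match = _RATING_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized rating string: {text!r}")
    rating = int(match.group(1))
    games = match.group(2)
    if games is None:
        # Assumption: no "P" suffix means the rating is established.
        return MsaRating(rating, False, None)
    return MsaRating(rating, True, int(games))
```

For example, `parse_msa_rating("1235P13")` yields a provisional rating of 1235 based on 13 games.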

Note that the ratings imported into pairing software do not distinguish between provisional and established ratings, so that information is not directly available to the TD running the tournament.

To investigate various rating uncertainty problems, I scraped the US Chess MSA to obtain tournament results for about 30,000 players. In general, these were major tournaments with significant prize funds. I'll go into more detail about the data analysis in a future post; in this post, I compare the results of players with established and provisional ratings.

The statistic I used to measure how well a person did relative to their rating is what I call their performance excess: how much higher their performance rating in the tournament was than their pre-tournament rating. Here, performance rating is the rating at which the player's tournament results would have produced no net gain or loss of rating points.
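To make the definition concrete, here is a minimal sketch of how performance rating and performance excess could be computed. It assumes the standard logistic winning expectancy on a 400-point scale and an update of the form K × (score - expected score), so "no net change" means the total expected score equals the actual score. It ignores bonus points and the special provisional formula, and the function names and the 0 to 4000 search bounds are my own choices, not part of the US Chess system.

```python
from typing import Sequence


def expected_score(rating: float, opponent: float) -> float:
    """Logistic winning expectancy with a 400-point scale (assumed)."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))


def performance_rating(opponents: Sequence[float], score: float,
                       lo: float = 0.0, hi: float = 4000.0) -> float:
    """Find the rating whose total expected score equals the actual score.

    Bisection works because total expected score rises with rating.
    A perfect or zero score has no finite solution, so the result
    converges to the upper or lower search bound instead.
    """
    for _ in range(60):
        mid = (lo + hi) / 2.0
        total = sum(expected_score(mid, opp) for opp in opponents)
        if total < score:
            lo = mid  # scored more than expected at mid, so go higher
        else:
            hi = mid
    return (lo + hi) / 2.0


def performance_excess(pre_rating: float, opponents: Sequence[float],
                       score: float) -> float:
    """Performance rating minus the pre-tournament rating."""
    return performance_rating(opponents, score) - pre_rating
```

A 1450 player who scores 2/4 against four 1500 opponents has a performance rating of about 1500, so their performance excess is about +50.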

First I looked at the average performance excess for established and provisionally rated players.

It's pretty clear that players with provisional ratings are performing about 40 points higher than their rating, while players with established ratings are underperforming by about 2 rating points.
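The comparison above amounts to averaging the performance excess separately for each rating status. A minimal sketch, assuming each record is a hypothetical (is_provisional, performance_excess) pair for one player in one tournament:

```python
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def mean_excess_by_status(records: Iterable[Tuple[bool, float]]) -> Dict[str, float]:
    """Average performance excess for provisional vs. established players.

    records: hypothetical (is_provisional, performance_excess) pairs.
    Groups with no records are left out of the result.
    """
    groups: Dict[str, List[float]] = {"provisional": [], "established": []}
    for is_provisional, excess in records:
        key = "provisional" if is_provisional else "established"
        groups[key].append(excess)
    return {key: mean(values) for key, values in groups.items() if values}
```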

Another interesting thing to look at is how the number of games a provisionally rated player has played affects their performance excess.

Here again there is a big difference between the two groups. Players with provisional ratings have a much wider spread in their performance than players with established ratings. This is to be expected because a player's provisional rating is based on a much smaller sample of tournament games.

The unexpected result is that the excess doesn't simply decrease as the number of games increases. Instead, there is a big jump at 10 games, followed by a decrease. This suggests that players improve after they have played a couple of tournaments.
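The breakdown by number of games can be sketched the same way, by bucketing provisionally rated players on the number of rated games played before the event. The record layout is a hypothetical assumption, and buckets with few players will be noisy:

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def mean_excess_by_games(records: Iterable[Tuple[int, float]]) -> Dict[int, float]:
    """Average performance excess for each prior game count.

    records: hypothetical (games_played_before_event, performance_excess)
    pairs for provisionally rated players.
    """
    buckets: Dict[int, List[float]] = defaultdict(list)
    for games, excess in records:
        buckets[games].append(excess)
    # Ordered by game count so a jump (such as the one at 10 games) stands out.
    return {games: mean(buckets[games]) for games in sorted(buckets)}
```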

Next, I looked at how much variation there was in the performance ratings of the two groups of players, measured by the standard deviation of their performance ratings.

Again, the two groups show significant differences: players with provisional ratings have a 30% larger spread in their performance ratings during a tournament. This is expected, since their ratings are based on far fewer rated games than those of players with established ratings.
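The spread comparison reduces to a ratio of standard deviations, where a ratio of 1.3 corresponds to a 30% larger spread. A minimal sketch using the sample standard deviation from Python's statistics module; whether the original analysis used the sample or population standard deviation is an assumption here:

```python
from statistics import stdev
from typing import Sequence


def spread_ratio(provisional: Sequence[float], established: Sequence[float]) -> float:
    """Ratio of sample standard deviations (provisional / established).

    Each group needs at least two values; 1.3 means a 30% larger spread.
    """
    return stdev(provisional) / stdev(established)
```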

Overall, the combination of a positive bias in the performance ratings of provisionally rated players and their larger spread of performance ratings means that they have a much higher chance of winning prizes in a section with a maximum rating (Under XXXX).
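To see why a higher average and a wider spread compound, here is a rough sketch that models each group's performance excess as a normal distribution and computes the chance of performing at least some margin above one's rating. The normal model, the 150-point established spread, and the 200-point margin are hypothetical illustration values; only the 40 and -2 point averages and the 30% larger spread come from the results above.

```python
from math import erfc, sqrt


def prob_exceeds(margin: float, mean: float, sd: float) -> float:
    """P(X > margin) for X ~ Normal(mean, sd)."""
    return 0.5 * erfc((margin - mean) / (sd * sqrt(2.0)))


SD_ESTABLISHED = 150.0  # hypothetical spread, not a measured value
SD_PROVISIONAL = 1.3 * SD_ESTABLISHED  # 30% larger, as reported above
MARGIN = 200.0  # hypothetical "prize-winning" overperformance

for label, mu, sd in [("established", -2.0, SD_ESTABLISHED),
                      ("provisional", 40.0, SD_PROVISIONAL)]:
    print(label, round(prob_exceeds(MARGIN, mu, sd), 3))
```

Under these assumptions the provisional group is more than twice as likely to clear the margin, and the gap widens as the margin grows.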

Based on this analysis, I will add 100 points to the ratings of all provisionally rated players when determining section eligibility. This should provide a better match in skill between established and provisionally rated players.
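The eligibility rule amounts to a one-line adjustment. A minimal sketch, assuming "Under XXXX" means the eligibility rating must be strictly below the limit, and that the 100 points apply only to section placement while the player's published rating is unchanged; the names are hypothetical:

```python
PROVISIONAL_ADJUSTMENT = 100  # added for section eligibility only


def eligibility_rating(rating: int, provisional: bool) -> int:
    """Rating used for section placement (not the published rating)."""
    return rating + PROVISIONAL_ADJUSTMENT if provisional else rating


def eligible_for_under_section(rating: int, provisional: bool, limit: int) -> bool:
    """Assumed rule: an 'Under limit' section requires a rating below limit."""
    return eligibility_rating(rating, provisional) < limit
```

For example, a 1550P10 player would no longer fit in an Under 1600 section, while an established 1550 player still would.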

If you have any comments, please let me know.