US Chess treats all ratings as if they were equally accurate measurements of a player's current playing strength. Because we use ratings to seed players into sections grouped by ability, players whose ratings are biased or less accurate than others could have an unfair advantage in any section with a rating limit. Many situations can lead to uncertain ratings, but the first one I want to talk about is players with provisional ratings.
As a reminder, unrated players receive a provisional rating after they have played four rated games, and a special rating formula is used for their first eight games. After that, the regular rating formula is used, but with a slightly higher "K factor" that allows the rating to change more quickly. Once a player has played 25 games, their rating is called "established." When you look up a rating in the Member Services Area (MSA) on the US Chess website, a provisional rating is shown with the letter "P" after it, followed by the number of rated games the player has played. For example, 1235P13 indicates a provisional rating of 1235 for someone who has played 13 rated games.
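To make the notation concrete, here is a minimal Python sketch that splits a rating string such as 1235P13 into its parts. It assumes the string is exactly a number, optionally followed by "P" and a game count, as described above. The names parse_msa_rating and MsaRating are hypothetical, and real MSA pages may show other suffixes that this does not handle.

```python
import re
from typing import NamedTuple, Optional


class MsaRating(NamedTuple):
    rating: int
    provisional: bool
    games: Optional[int]  # only shown for provisional ratings


# Assumed format: "1235" (established) or "1235P13" (provisional, 13 games).
_PATTERN = re.compile(r"^(\d+)(?:P(\d+))?$")


def parse_msa_rating(text: str) -> MsaRating:
    """Split an MSA-style rating string into rating, status, and game count."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized rating string: {text!r}")
    rating, games = match.groups()
    if games is None:
        return MsaRating(int(rating), False, None)
    return MsaRating(int(rating), True, int(games))
```

For example, parse_msa_rating("1235P13") gives a rating of 1235, a provisional flag of True, and 13 games, while parse_msa_rating("1800") is treated as established.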
Note that the ratings imported into pairing software do not distinguish between provisional and established ratings, so that information is not directly available to the TD running the tournament.
To investigate various rating-uncertainty problems, I scraped the US Chess MSA to obtain tournament results for about 30,000 players. Most of these were major tournaments with significant prize funds. In a future post, I'll go into more detail about the data analysis; in this post, I will compare the results of players with established and provisional ratings.
The statistic I used to measure how well a player did relative to their rating is what I call their performance excess: how much higher their performance rating in the tournament was than their pre-tournament rating. Here, performance rating means the rating at which they would have neither gained nor lost rating points during the tournament.
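The definition of performance excess can be sketched in code. This is a minimal sketch, assuming the standard Elo expected score E = 1 / (1 + 10^((R_opp - R)/400)) and ignoring any bonus points: under the regular formula the rating change is proportional to (actual score - total expected score), so the performance rating is the rating at which total expected score equals actual score. The function names are hypothetical, and this is not necessarily how the post's numbers were computed.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Elo expected score of a player rated `rating` against `opponent`."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def performance_rating(opponents: list[float], score: float,
                       lo: float = 0.0, hi: float = 4000.0,
                       tol: float = 0.01) -> float:
    """Rating at which the total expected score equals `score`.

    Total expected score rises monotonically with rating, so bisection
    works. A perfect or zero score has no finite solution; the result
    then ends up at the `hi` or `lo` bound.
    """
    if not opponents:
        raise ValueError("need at least one opponent")
    if not 0 <= score <= len(opponents):
        raise ValueError("score must be between 0 and the number of games")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sum(expected_score(mid, opp) for opp in opponents) < score:
            lo = mid  # a player rated `mid` would expect less, so go higher
        else:
            hi = mid
    return (lo + hi) / 2


def performance_excess(pre_rating: float, opponents: list[float],
                       score: float) -> float:
    """Performance rating minus pre-tournament rating."""
    return performance_rating(opponents, score) - pre_rating
```

For example, scoring 3 out of 4 against four 1500-rated opponents gives a performance rating of about 1691, so a 1650-rated player with that result has a performance excess of roughly +41.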
First I looked at the average performance excess for established and provisionally rated players.
It's pretty clear that players with provisional ratings perform about 40 points above their rating. Meanwhile, players with established ratings underperform by about 2 rating points.
Another interesting thing to look at is how the number of games a provisionally rated player has played affects their performance excess.
Here again there is a big difference between the two groups. Players with provisional ratings have a much wider spread in their performance than players with established ratings. This is to be expected because a player's provisional rating is based on a much smaller sample of tournament games.
The unexpected result is that the excess doesn't simply decrease as the number of games increases. Instead, there is a big jump at 10 games, followed by a decrease. This suggests that players improve after they have played a couple of tournaments.
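The breakdown by number of games amounts to grouping provisional players by how many rated games they had before the event and averaging the performance excess within each group. A minimal sketch, assuming records of (games played, performance excess); the records below are made up for illustration and are not the scraped data, and the function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records for provisional players only:
# (rated games played before the event, performance excess).
provisional = [(5, 60.0), (5, 20.0), (10, 150.0), (10, 90.0), (20, 15.0)]


def mean_excess_by_games(rows):
    """Average performance excess keyed by games played, in ascending order."""
    by_games = defaultdict(list)
    for games, excess in rows:
        by_games[games].append(excess)
    return {games: mean(by_games[games]) for games in sorted(by_games)}
```

With real data, game counts that have only a few players might be merged into wider bins so that each average rests on enough tournaments.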
Next, I looked at how much performance varies for players with each type of rating. I computed the standard deviation of the performance ratings for the two groups.
Again, the two types of ratings show significant differences. Players with provisional ratings have a 30% larger spread in their tournament performance ratings. This is to be expected, because their ratings are based on far fewer rated games than established ratings.
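A "30% larger spread" corresponds to a ratio of about 1.3 between the two groups' standard deviations. A minimal sketch, assuming the spread is measured on each player's performance excess (performance minus pre-tournament rating) rather than on raw performance ratings; the function name and the sample values in the usage note are hypothetical.

```python
from statistics import stdev


def spread_ratio(provisional: list[float], established: list[float]) -> float:
    """Ratio of sample standard deviations of performance excess.

    Assumes both lists hold performance excess values; a result of 1.3
    would match a 30% larger spread for provisional players.
    """
    if len(provisional) < 2 or len(established) < 2:
        raise ValueError("need at least two values per group")
    return stdev(provisional) / stdev(established)
```

For instance, spread_ratio([-26.0, 0.0, 26.0], [-20.0, 0.0, 20.0]) returns 1.3, since the two sample standard deviations are 26 and 20.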
Overall, the combination of a positive bias in the performance of provisionally rated players and their larger spread in performance means that they have a much higher chance of winning prizes in a section with a maximum rating (Under XXXX).
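To see why bias and spread compound, here is a rough sketch that models performance excess as normally distributed, using the averages above (+40 for provisional players, -2 for established players) and a 30% larger standard deviation for provisional players. The normal model, the established standard deviation of 100 points, and the 200-point threshold are assumptions for illustration only; the post does not report those values.

```python
from statistics import NormalDist

# Means and the 1.3 spread ratio come from the text; the established
# standard deviation and the prize threshold below are hypothetical.
ESTABLISHED_SD = 100.0
established = NormalDist(mu=-2.0, sigma=ESTABLISHED_SD)
provisional = NormalDist(mu=40.0, sigma=1.3 * ESTABLISHED_SD)


def chance_to_outperform(dist: NormalDist, threshold: float) -> float:
    """Probability that performance excess exceeds `threshold` points."""
    return 1.0 - dist.cdf(threshold)


threshold = 200.0  # hypothetical excess needed to contend for a prize
p_est = chance_to_outperform(established, threshold)
p_prov = chance_to_outperform(provisional, threshold)
print(f"established: {p_est:.3f}  provisional: {p_prov:.3f}  "
      f"ratio: {p_prov / p_est:.1f}")
```

Under these assumptions, a provisional player is several times as likely as an established player to beat the threshold, because the higher mean and the wider spread both push more of the distribution into the upper tail.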
Based on this analysis, I will add 100 points to the ratings of all provisionally rated players when determining section eligibility. This should provide a better match in skill between established and provisionally rated players.
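The eligibility rule can be sketched as a small check. This assumes the TD has obtained each player's provisional status separately (for example from the MSA), since pairing-software imports do not carry it, and that "Under 1600" means the adjusted rating must be below 1600. The function and constant names are hypothetical.

```python
PROVISIONAL_ADJUSTMENT = 100  # points added for section eligibility only


def eligibility_rating(rating: int, provisional: bool) -> int:
    """Rating used to check section limits; provisional ratings get +100."""
    return rating + PROVISIONAL_ADJUSTMENT if provisional else rating


def eligible_for_under_section(rating: int, provisional: bool,
                               limit: int) -> bool:
    """True if the player may enter an "Under `limit`" section.

    Assumes "Under 1600" means the eligibility rating must be below 1600.
    """
    return eligibility_rating(rating, provisional) < limit
```

Under this sketch, a 1550 established player can play in Under 1600, while a 1550 provisional player (treated as 1650) cannot.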
If you have any comments, please let me know.
Michael, I sort of encountered this when I ran my first ES tournament with a colleague. All the players were unrated (UNR), so we had no way of ranking the kids except relative to one another. What I tried to do was estimate their strength from a random round of 8 games, giving them 50 points for each win, and we used that number as their seed when the tournament came. So 4 wins gave you a 200-point seed. This was fed into a USCF-based XLS sheet, and the wins and losses were used to adjust their rating according to the 0-8 game K factor. Since even 8 wins would not move the rating high enough from that low initial seed, I ran it twice more. Each rerun produced a greater and greater spread; if you did it indefinitely, they would all go to 0 or a zillion. What this exercise told me was that the provisionally rated kids at the top had ratings that were not rising fast enough to keep up with their actual ability. Even winning every game only left the top kid at 653, and that was after 2 successive iterations.
I agree with you: there is more spread, and they are underrated. But this seems to go away over time. The question is how fast. Does it go away by the time the rating is no longer provisional?