24 February 2010

Calculating Collusion

Continuing the series on Soviet collusion in the 1950s and 1960s (see the previous post, 'Solutions to Collusion'), in 2006 Charles C. Moul and John V. Nye, both university professors of economics, published a study of the subject: 'Did the Soviets Collude? A Statistical Analysis of Championship Chess 1940-64'.
Abstract: We expand the set of outcomes considered by the tournament literature to include draws and use games from post-war chess tournaments to see whether strategic behavior is important in such scenarios. In particular, we examine whether players from the former Soviet Union acted as a cartel in international tournaments - intentionally drawing against one another in order to focus effort on non-Soviet opponents - to maximize the chance of some Soviet winning. Using data from international qualifying tournaments as well as USSR national tournaments, we estimate models to test for collusion. Our results are consistent with Soviet draw-collusion and inconsistent with Soviet competition. Simulations of the period's five premier international competitions (the FIDE Candidates tournaments) suggest that the observed Soviet sweep was a 75%-probability event under collusion but only a 25%-probability event had the Soviet players not colluded.

Of the five Candidates Tournaments in the study -- Budapest 1950 through Curacao 1962 -- the Curacao event has always received the most attention, largely due to Fischer's public accusation of Soviet cheating. It was, however, the 1953 Zurich Candidates Tournament where collusion, if it occurred, caused the most damage to the chances of the non-Soviet players, Reshevsky in particular. Moul and Nye's findings for the event are summarized in the following table.

Focusing on the line for Reshevsky, the authors wrote, 'Our calculations indicate that he had a 27% chance of winning a fair tournament. With collusion, his chances fell to 8%.' As convincing as these numbers might be, the result is flawed, because it relies on the Sonas Chessmetrics historical ratings. On several occasions in the past I've taken issue with Sonas's results when they failed the test of common sense. This is another case.

The last column in the table ('No cartel: % win') is based on the 'Rating' column, as calculated by Sonas. The top three ratings are Reshevsky (2780.99), Smyslov (2764.92), and Najdorf (2753.04), which is the first red flag (the six-digit precision is also suspicious, but I won't dwell on it). The authors infer from these ratings that 'Retroactive grading has shown that Reshevsky was the favorite going into the 1953 Candidates' tournament.' While he was certainly one of the favorites, I can't imagine that many chess historians would argue that Reshevsky was stronger than Smyslov in 1953, not after examining the games between the two players (Vasily Smyslov vs. Samuel Reshevsky, Chessgames.com), where the post-WWII results give a +4-0=9 advantage to Smyslov.

The second red flag concerns four players bunched within a range of three rating points -- Bronstein (2723.87), Boleslavsky (2722.33), Stahlberg (2721.93), and Keres (2721.02) -- implying that Stahlberg was the equal of Bronstein, Boleslavsky, and Keres. A look at the historical record shows that Bronstein drew a title match with Botvinnik in 1951, that Boleslavsky drew a playoff match with Bronstein after the 1950 Candidates, losing only in tiebreak, and that Keres finished tied with Reshevsky in the 1948 title tournament. Stahlberg finished behind Bronstein and Boleslavsky in the 1948 Interzonal (Keres was exempt) and behind all three in the 1950 Candidates; had a lifetime negative score against each of the three; and finished 15th and dead last in the 1953 Candidates, 3.5 points behind second-to-last Euwe (Bronstein and Keres finished tied with Reshevsky for 2nd-4th). Stahlberg was an excellent player, one of the West's best at that time ('among the world's best ten for a few years around 1950', according to Hooper & Whyld), but he was not at the same level as the three Soviets.
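To put the quoted ratings in perspective, here is a rough sketch that converts rating gaps into expected head-to-head scores. It assumes the standard Elo logistic formula on a 400-point scale, which is my assumption and not necessarily how Chessmetrics or Moul and Nye convert ratings into results; the function names are hypothetical. It also ignores that the +4-0=9 record covers the whole post-WWII period rather than only the games played up to 1953.

```python
import math

def expected_score(rating_a, rating_b):
    """Expected score of A against B under the standard Elo logistic curve
    (an assumption here; Chessmetrics uses its own method)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

def implied_gap(score):
    """Rating gap the same formula would need to produce a given score."""
    return 400.0 * math.log10(score / (1.0 - score))

# Ratings as quoted from the table discussed in the post.
reshevsky, smyslov = 2780.99, 2764.92
bronstein, keres = 2723.87, 2721.02  # top and bottom of the four-player bunch

# What the ratings predict for Reshevsky against Smyslov.
reshevsky_vs_smyslov = expected_score(reshevsky, smyslov)

# What the ratings predict across the bunch that includes Stahlberg.
top_vs_bottom_of_bunch = expected_score(bronstein, keres)

# Smyslov's actual post-WWII record against Reshevsky: +4 -0 =9 in 13 games.
smyslov_actual = (4 + 0.5 * 9) / 13
gap_from_record = implied_gap(smyslov_actual)

print(f"Predicted Reshevsky score vs. Smyslov: {reshevsky_vs_smyslov:.3f}")
print(f"Predicted score across the bunch:      {top_vs_bottom_of_bunch:.3f}")
print(f"Smyslov's actual score vs. Reshevsky:  {smyslov_actual:.3f}")
print(f"Rating gap implied by that record:     {gap_from_record:.0f} points")
```

Under these assumptions the ratings predict a small edge for Reshevsky and a near coin flip among the four bunched players, while the head-to-head record points to a sizeable gap in Smyslov's favor.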

Why are the Sonas calculations so misleading? My theory is that they fail to account for the Soviet-era social barrier ('Iron Curtain', anyone?) between Soviet players and Western players. Soviet players rarely played in Western events and Western players were even rarer participants in Soviet events. The two groups played in different, almost separate chess universes. A comparison of their performances requires a calibration of the separate calculations, using the few events where they actually met. It is as though the Soviets were measured with a meterstick, the Westerners measured with a yardstick, and no one bothered to check that the meterstick and the yardstick were the same length. I also suspect that Sonas overlooked many Soviet events. Moul and Nye wrote,
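The calibration idea can be sketched in a few lines. This is only an illustration of one possible approach, not anything Sonas or the authors did: it assumes the Elo logistic formula and searches for the single offset that, added to every Soviet rating, makes the predicted total score in the cross-pool games match the actual total. The game list is invented toy data, and `pool_offset` is a hypothetical helper.

```python
import math

def expected_score(rating_a, rating_b):
    """Expected score of A against B under the Elo logistic curve (assumed)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

def pool_offset(games, lo=-400.0, hi=400.0, tol=1e-6):
    """Offset to add to Soviet ratings so that the predicted and actual
    total scores in cross-pool games agree.

    games: list of (soviet_rating, western_rating, soviet_score),
           where soviet_score is 1, 0.5, or 0.
    The true offset is assumed to lie between lo and hi.
    """
    actual = sum(score for _, _, score in games)

    def surplus(offset):
        predicted = sum(expected_score(rs + offset, rw) for rs, rw, _ in games)
        return predicted - actual

    # surplus() increases with the offset, so plain bisection works.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if surplus(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0

# HYPOTHETICAL toy data, not real results: both pools rate their players
# at 2700, yet the Soviet players score 3 points out of 4.
toy_games = [
    (2700.0, 2700.0, 1.0),
    (2700.0, 2700.0, 0.5),
    (2700.0, 2700.0, 0.5),
    (2700.0, 2700.0, 1.0),
]

offset = pool_offset(toy_games)
print(f"Offset needed to reconcile the two pools: {offset:.1f} points")
```

In the toy case the two measuring sticks nominally agree, but the cross-pool results say they do not, and the offset is the correction. With real data the hard part would be the small number of games in which Soviet and Western players actually met.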

[The Sonas] rating does require a minimum number of observed games to construct. Games without Sonas chess-ratings for both players are dropped from the sample. While this leads to the omission of a few Interzonal games, the vast majority of dropped games are from URS championships. These omissions will presumably drive up the average observed skill of Soviet players in URS championships, and thereby make our comparison to FIDE events even more compelling.

I have a problem with that last sentence. Because the 'dropped games' would be predominantly wins by the better players, and the kept games predominantly draws between players of equivalent ability, I would presume exactly the opposite. Near the start of their paper, the authors wrote,

For the purposes of econometric analysis, chess has numerous advantages which are not common in other sports. [...] Most important of all is that there exists a rating system which is a precise and accurate reflection of the performances of players and which is an excellent indicator of the relative strengths of players.

This prerequisite was not delivered by the Sonas system. There might have been collusion among the Soviet players, but it is not shown by this study.
