Monday, May 28, 2012

Uncovering Warp-Tainted Dice

I've mentioned before that I've had some pretty horrible luck with my dice. In the past few months I've failed four out of about seven 3+ cover saves, failed five out of six 3+ armor saves immediately after failing five out of six 3+ cover saves, rolled absurd numbers of ones and twos during two out of three games, and, in last week's game, rolled multiple snake eyes, failed to wound three out of five Orks after hitting them with a strength 10 Demolisher Cannon, and failed three out of five artificer armor saves. As a Space Marine player, I spend a lot of points for awesome armor saves and a high ballistic skill, all of which can be completely neutralized by an excessive number of ones and twos. Space Marine armies simply aren't large enough to tolerate that kind of rolling.

Being paranoid doesn't mean that they aren't out to get you

At first I was willing to believe that I simply perceived the rolls as being skewed due to the human tendency to selectively remember extraordinary events. After a while, I started testing my dice, repeatedly rolling each die in my box of 36 dusty blue and copper Chessex dice (yes, I chose the dice to match my army). I averaged the results for each die individually and found that they usually produced a result close to the expected 3.5. This seemed to confirm that I was selectively remembering the bad rolls.

After last week's game, I came home convinced that something was awry with many of my dice. After a Google search, I came across several blogs and websites claiming that mass-produced dice such as those made by GW or Chessex can be irregular enough to produce skewed results. This is particularly true of the d20s that various RPGs use. Many of these unbalanced dice will tend to produce huge numbers of opposites; a die that produces too many ones or twos can often produce too many fives and sixes as well. I soon realized that my initial tests were worthless; an unbalanced die that favors mostly ones and sixes will tend to have an average roll of 3.5. As a Space Marine player who usually needs only 3+ and 4+ to do what I want, I don't want dice that will produce an unusual number of fives and sixes (which are overkill) but also give me a ton of ones and twos (which represent failed armor saves and missed shots). Instead of making the expected 66.7% of my armor saves or to hit rolls, extremely skewed dice could give me an armor save and to hit rate of nearly 50%.
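To see why averaging hides this kind of bias, here is a minimal Python sketch. The "skewed" die is a deliberately extreme, made-up example (it never rolls a 3 or a 4), not a measurement of any real die, and the helper names are my own.

```python
from fractions import Fraction

# Hypothetical, deliberately extreme die: only 1, 2, 5 and 6 ever come up.
# These probabilities are illustrative, not measured from any real die.
skewed = {1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(0),
          4: Fraction(0), 5: Fraction(1, 4), 6: Fraction(1, 4)}
fair = {face: Fraction(1, 6) for face in range(1, 7)}

def mean_roll(probs):
    """Expected value of a single roll."""
    return sum(face * p for face, p in probs.items())

def chance_of(probs, target):
    """Probability of rolling `target` or higher."""
    return sum(p for face, p in probs.items() if face >= target)

for name, die in (("fair", fair), ("skewed", skewed)):
    print(name, float(mean_roll(die)), float(chance_of(die, 3)))
```

Both dice average exactly 3.5, but the skewed one makes a 3+ only half the time instead of two thirds of the time.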

I eventually came across this D&D blog, which described using a Pearson's chi-square test to validate whether or not a d6 or a d20 is fair. This test has the advantage of detecting when individual numbers come up too often or not often enough rather than depending on averaging results. The equation for a d6 is:

Σ[(O-E)^2/E] > 11.070

Where "O" is observed frequency and "E" is expected frequency. The value of 11.070 is the chi-square critical value at the 5% significance level for a system with five degrees of freedom; i.e., the number of possible results (six for a d6) minus 1. The equation essentially says that if the sum on the left is greater than the critical value, a fair die would produce results that lopsided less than 5% of the time, so the claim that the die is fair is probably false.

To use this equation you roll a die a certain number of times and record how many times each side shows up. For each of the six possible die results, you subtract the expected frequency that the side should appear from the observed number of times that the side actually showed up and square the difference. You divide that squared value by the expected frequency and then add up the resulting numbers from each of the six possible results. If the sum is greater than 11.070, it is statistically probable that the die is not fair. Although 30 rolls (with an expected frequency of 5 hits for each side) is considered to be a minimum, the same D&D blog noted that a 30-roll test is likely to miss slightly unbalanced dice. A better test would use more than 100 rolls per die.
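For anyone who would rather script the test than build a spreadsheet, the steps above can be sketched in a few lines of Python. The function names and the two sets of tallies are made up for illustration; only the formula and the 11.070 threshold come from the description above.

```python
# Critical value quoted above for a d6 (five degrees of freedom).
CRITICAL_D6 = 11.070

def chi_square(counts):
    """Pearson's statistic: sum of (O - E)^2 / E over all faces,
    assuming a fair die so E is the same for every face."""
    expected = sum(counts) / len(counts)
    return sum((observed - expected) ** 2 / expected for observed in counts)

def looks_unfair(counts, critical=CRITICAL_D6):
    """True when the statistic exceeds the critical value."""
    return chi_square(counts) > critical

# Hypothetical tallies for faces 1..6 over 90 rolls (made up for illustration).
even_counts = [15, 14, 16, 15, 13, 17]
lumpy_counts = [25, 20, 10, 10, 10, 15]

for label, counts in (("even", even_counts), ("lumpy", lumpy_counts)):
    print(label, round(chi_square(counts), 3), looks_unfair(counts))
```

The even tally comes out around 0.667 and passes; the lumpy one comes out around 13.333 and fails.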

I decided to apply this test to my own dice and started by placing them in a grid and assigning each die a letter and number designation from A1 to F6. I'm crazy, but not crazy enough to roll all 36 of my dice over 100 times, so I started with just 30 rolls per die. I put the data into a spreadsheet that executed the equation and also calculated how often the die gave a result of 2+ (e.g., a Terminator armor save, the usual value needed to kill infantry with anti-tank weapons), of 3+ (e.g., a power armor save, the value needed for a Marine to hit), and of 4+ (e.g., a Scout armor save, the value to wound with most sniper weapons). I used the conditional formatting feature to automatically color code results that I thought were suspicious. After crunching the numbers, I found that none of my dice failed after only 30 rolls, but that several were close. The blog's warning about a 30 roll test being insufficient became apparent when one die failed to roll a single 1 but was still able to pass the test (although just barely).
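A rough Python equivalent of those spreadsheet columns might look like the sketch below. The tally is hypothetical (a 30-roll die with no ones and the other faces perfectly even), not data from my own dice, and it shows how a die can skip a face entirely and still sit well under 11.070.

```python
def success_rate(counts, target):
    """Fraction of rolls that came up `target` or higher.
    counts[0] is the tally of ones, counts[5] the tally of sixes."""
    return sum(counts[target - 1:]) / sum(counts)

def chi_square(counts):
    """Pearson's statistic against a fair die."""
    expected = sum(counts) / len(counts)
    return sum((o - expected) ** 2 / expected for o in counts)

# Hypothetical 30-roll tally with no ones at all (not real test data).
no_ones = [0, 6, 6, 6, 6, 6]

for target in (2, 3, 4):
    print(f"{target}+: {success_rate(no_ones, target):.2%}")
print(f"chi-square: {chi_square(no_ones):.3f}")
```

Here the missing face contributes 5 to the sum and the other five faces contribute 0.2 each, for a total of about 6, so this made-up die passes a 30-roll test comfortably.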

Those dice that were close to failing or that I found suspicious (18 out of the 36) were given an additional 60 rolls (90 total rolls with an expected frequency of 15 hits per side). The additional results showed that a few of the 18 dice merely had a bad series of rolls earlier and that they were relatively well balanced. However, four dice outright failed the chi-square test. The worst offender would roll a one more than a quarter of the time (27.78%) and would make a 3+ armor save only 53.33% of the time versus the expected 66.67%. Another die would only roll a 4+ 33.33% of the time rather than the expected 50%. Of the four failures, only one rolled unusually high.
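As a sanity check on the worst offender, the percentages can be turned back into counts. The counts below are back-calculated from the quoted percentages assuming all 90 rolls, so treat them as an inference rather than the raw tallies.

```python
ROLLS = 90
EXPECTED = ROLLS / 6  # 15 per face

# Counts back-calculated from the quoted percentages (an inference,
# not the original tallies).
ones_rolled = 25        # 25 / 90 = 27.78%
saves_made_3_plus = 48  # 48 / 90 = 53.33%

print(f"{ones_rolled / ROLLS:.2%}")
print(f"{saves_made_3_plus / ROLLS:.2%}")

# Contribution of the "1" face alone to the chi-square sum.
ones_term = (ones_rolled - EXPECTED) ** 2 / EXPECTED
print(round(ones_term, 2))
```

On these assumed counts, the excess of ones by itself accounts for roughly 6.67 of the 11.070 needed to fail, before the other five faces are even considered.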

I would call this one "The Widowmaker" if Astartes had widows

Some would call this one "lucky". I retired it, too.

After more than 2,000 individual dice rolls, I decided to "retire" the four failed dice as well as five others that gave chi-square results greater than 6 (a little more than half of the 11.070 critical value). And yes, I also removed the high rolling die, which had the second highest degree of bias. In short, I removed 1/4 of all my dice from my set based on this test. Fortunately I play Space Marines, so I don't really need all 36 dice anyway. Oddly enough, I can't bring myself to throw out the biased dice. They're segregated in their own little Ziploc bag labeled with dire warnings.

I would recommend the above test for any player who thinks his dice are giving him a raw deal. I was somewhat relieved to find that, although a lot of my bad luck may be a matter of perception, a significant amount of it may be due to biased dice. And if your dice don't turn out to be biased, then at least you have mathematical proof that it's all in your head.

I guess the question is, what can a tabletop gamer do about the problem? There are a variety of dice companies that promise balanced dice, but they charge as much for five dice as Chessex does for 36. Plus, Chessex sells a greater variety of colors and styles that are great for those players who want to match their dice to their army. I've simply decided to order a second box of Chessex dice (blue and white to match Ultramarine veterans) with the intention of performing another set of chi-square tests to weed out the biased ones. The test takes me about as long to finish as two long games of 40K. If that saves me from any more dice-sabotaged games, then it was worth it.
