Rather than write another convoluted post on poker and probability, I decided to play around a bit with the statistics program R. A post by Harlan Schreiber at the much enjoyed site Hoops Analyst and a recently purchased book on R inspired the effort. I highly recommend R, and this post will demonstrate some of the things one can do with it, even someone as inexperienced in R and statistics as I am.
Mr. Schreiber examines the idea that defense wins championships and that offense matters less. He presents a table of the Offensive and Defensive Ranks of past champions, and the data lead to a simple conclusion: offense and defense have been equally important to past champions. Mr. Schreiber still manages to make the article interesting by adding depth and insight to a simple table; he can find history and anecdotes in the driest of data.
Statistical analysis supports Mr. Schreiber's conclusion. A paired t-test and a sign test are two simple ways to compare the data. I copied the data into Excel and then loaded it into R, adding two columns to the table: "Difference," which is Offensive Rank minus Defensive Rank, and "Sign," which is 1 for values of Difference that are positive and 0 for those that are not. To perform a paired t-test, the data must be normally distributed. A Shapiro-Wilk test for normality on Difference (R command > shapiro.test(Difference)) produced the following result:
Shapiro-Wilk normality test
data: Difference
W = 0.9682, p-value = 0.5107
Typically we look for a p-value less than .05 to reject the null hypothesis that the data are normally distributed; with a p-value of 0.5107, we cannot reject normality. A Q-Q plot (R commands > qqnorm(Difference) and > qqline(Difference)) and a histogram (R command > hist(Difference)) confirm this.
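For anyone who wants to follow along, here is a minimal sketch of the setup, assuming the copied table was saved to a file named champs.csv (a name I made up) with columns Offensive.Rank and Defensive.Rank:

> champs <- read.csv("champs.csv")                                      # load the copied table
> champs$Difference <- champs$Offensive.Rank - champs$Defensive.Rank    # Offensive Rank minus Defensive Rank
> champs$Sign <- ifelse(champs$Difference > 0, 1, 0)                    # 1 if Difference is positive, 0 otherwise
> attach(champs)                                                        # so columns can be referenced by name
> shapiro.test(Difference)                                              # Shapiro-Wilk normality test
> qqnorm(Difference); qqline(Difference)                                # Q-Q plot with reference line
> hist(Difference)                                                      # histogram of the differences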
The pictures show a roughly normal distribution, so a paired t-test can be used. Simply type > t.test(Offensive.Rank,Defensive.Rank,paired=T) and R does the rest of the work:
Paired t-test
data: Offensive.Rank and Defensive.Rank
t = 0.1257, df = 28, p-value = 0.9009
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval: -2.637672 2.982499
sample estimates: mean of the differences 0.1724138
With such a high p-value, we fail to reject the null hypothesis. There is no evidence that prior champions' Offensive Rank has differed significantly from their Defensive Rank.
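If you would rather pull the numbers out than read them off the printout, the test result can be stored in an object; here is a small sketch assuming the attached columns from the setup above (tt is just a name I chose):

> tt <- t.test(Offensive.Rank, Defensive.Rank, paired=T)   # the same paired t-test as above
> tt$p.value                                               # 0.9009, the p-value reported above
> tt$conf.int                                              # the 95 percent confidence interval
> tt$estimate                                              # the mean of the differences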
Another possible test is the non-parametric sign test. This test may be preferable to the t-test here since the data we are working with are ranks rather than continuous variables. Like many non-parametric tests, the sign test has fewer requirements and does not need the data to be normally distributed. A simple binomial test can be used. It works much the same way as the binomial probability function, which calculates the chance of k successes in n trials; for example, it can tell you the chance of getting 2 heads in 10 coin flips. To use a binomial test, each trial must have only two possible outcomes: heads or tails, success or failure, infected or not infected, ale or bad beer, and so on.
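As an aside, R has the binomial probability function built in as dbinom; for the coin example above, a one-line check:

> dbinom(2, 10, 0.5)   # probability of exactly 2 heads in 10 fair flips, about 0.044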
So how can we apply the principles of binomial probability to our data? That is the purpose of the Sign column explained earlier. All Differences > 0 were labelled as successes and assigned a value of 1. Running the binomial test in R is simple: > binom.test(14,27), where 14 is the number of successes and 27 is the total number of trials (29 trials minus the 2 trials where the Difference = 0). I actually typed > binom.test(sum(Sign),length(Sign)-2), which will make sense as you become familiar with R. The result:
Exact binomial test
data: sum(Sign) and length(Sign) - 2
number of successes = 14, number of trials = 27, p-value = 1
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval: 0.3194965 0.7133275
sample estimates: probability of success 0.5185185
So the estimated probability of success is roughly 51.8%, essentially a coin flip, and the p-value is 1. There is no reason to reject the null hypothesis. Once again, the data do not support the claim that defense has been more important to past champions than offense. How many successes would we need to get a p-value less than .05? Do your best, R. The command is > qbinom(.975,27,.5), which returns a value of 19. Roughly 19 successes would be needed for a statistically significant difference between Offensive and Defensive Ranks. For the heck of it, here is a box plot and summary of the data.
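For anyone reproducing those, here is a sketch of the commands, again assuming the champs data frame and attached columns from the setup above:

> qbinom(.975, 27, .5)                                                                   # 19, the cutoff discussed above
> boxplot(Offensive.Rank, Defensive.Rank, names=c("Offensive Rank", "Defensive Rank"))   # box plot of the ranks
> summary(champs)                                                                        # summary of the data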