Saturday, July 26, 2008

Hoops Analyst and R

Rather than write another convoluted post on poker and probability, I decided to play around a bit with the statistics program R. A post by Harlan Schreiber at the much enjoyed site Hoops Analyst and a recently purchased book on R inspired the effort. I highly recommend R, and this post will demonstrate some of the things one can do with it, even for someone as inexperienced in R and statistics as myself.

Mr. Schreiber explores the common thinking that defense wins championships and offense is not as important. He presents a table of the Offensive and Defensive Ranks of past champions, and the data leads to a simple analysis and conclusion: defense and offense have been equally important for past champions. Mr. Schreiber still manages to make the article interesting by adding depth and insight to a simple table. He can find history and anecdotes in the driest of data.

Statistical analysis can support Mr. Schreiber's conclusion. A paired t-test and a sign test are two simple ways to compare data. I copied the data into Excel and then loaded it into R. I added two more columns to the data table: one labelled "Difference," which is Offensive Rank minus Defensive Rank, and a second labelled "Sign," which is 1 for positive values of Difference and 0 otherwise. To perform a paired t-test the data has to be normally distributed. A Shapiro-Wilk test for normality (R command > shapiro.test(Difference)) on Difference produced the following result:

Shapiro-Wilk normality test
data: Difference
W = 0.9682, p-value = 0.5107

Typically we look for a p-value less than .05 to reject the hypothesis of normality. With a p-value of 0.51, we cannot reject the hypothesis that Difference is normally distributed. A Q-Q plot (R commands > qqnorm(Difference) and > qqline(Difference)) and a histogram (R command > hist(Difference)) confirm this.
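The data preparation and normality checks described above can be sketched in R. The ranks below are illustrative placeholders only, not the real champion ranks from Mr. Schreiber's table:

```r
# Illustrative data frame standing in for the champions table.
# The rank values here are made up for demonstration purposes.
champs <- data.frame(
  Offensive.Rank = c(1, 7, 3, 5, 2, 9, 4, 6, 1, 8),
  Defensive.Rank = c(4, 2, 6, 1, 5, 3, 8, 2, 7, 5)
)
# The two added columns from the post:
champs$Difference <- champs$Offensive.Rank - champs$Defensive.Rank
champs$Sign <- as.integer(champs$Difference > 0)

shapiro.test(champs$Difference)                       # Shapiro-Wilk normality test
qqnorm(champs$Difference); qqline(champs$Difference)  # Q-Q plot
hist(champs$Difference)                               # histogram
```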




The pictures show a roughly normal distribution. A t-test can be used. Simply type t.test(Offensive.Rank,Defensive.Rank,paired=T) and R does the rest of the work:

Paired t-test
data: Offensive.Rank and Defensive.Rank
t = 0.1257, df = 28, p-value = 0.9009
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval: -2.637672 2.982499
sample estimates: mean of the differences 0.1724138

With such a high p-value, we fail to reject the null hypothesis. There is no evidence that Offensive or Defensive Rank has mattered significantly more for prior champions.
Another possible test is the non-parametric sign test. This test may be preferable to the t-test in this instance, since the data we are working with are ranks rather than continuous variables. Like many non-parametric tests, the sign test has fewer necessary conditions and does not require the data to be normally distributed. A simple binomial test can be used. A binomial test works much the same way that the binomial probability function does: it calculates the chance of k successes in n trials. For example, the binomial probability function can tell you the chance of getting 2 heads in 10 coin flips. To use a binomial test, each trial must have exactly two outcomes: heads or tails, success or failure, infected or not infected, ale or bad beer, and so on.
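The coin-flip example above can be checked directly in R with dbinom, which gives the binomial probability of exactly k successes in n trials:

```r
# Chance of exactly 2 heads in 10 flips of a fair coin.
dbinom(2, 10, 0.5)
# The same number by hand: choose(10, 2) ways to place the heads,
# each sequence having probability 0.5^10.
choose(10, 2) * 0.5^10   # 45/1024, roughly 4.4%
```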

So how can we apply the principles of binomial probability to our data? That is the purpose of the Sign column explained earlier. All Differences > 0 were labelled as successes and assigned a value of 1. Running the binomial test in R is simple: > binom.test(14,27), where 14 is the number of successes and 27 is the total number of trials (29 trials minus the 2 trials where the Difference = 0). I typed binom.test(sum(Sign),length(Sign)-2), which will make sense as you familiarize yourself with R. The result:
Exact binomial test
data: sum(Sign) and length(Sign) - 2
number of successes = 14, number of trials = 27, p-value = 1
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval: 0.3194965 0.7133275
sample estimates: probability of success 0.5185185
So the estimated probability of success is roughly 51.9%, and the p-value of 1 gives no reason to reject the null hypothesis. Once again, the data does not support the claim that defense has been more important to past champions than offense. How many successes would we need to get a p-value less than .05? Do your best, R! The command is > qbinom(.975,27,.5), which returns a value of 19. 19 successes would be needed for there to be a statistically significant difference between Offensive and Defensive Ranks. For the heck of it, here are a box plot and summary of the data.
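The numbers above can be reproduced directly. qbinom returns the smallest number of successes whose cumulative probability reaches the given quantile, and binom.test reports the exact two-sided p-value:

```r
# Smallest k with P(X <= k) >= 0.975 for X ~ Binomial(27, 0.5).
qbinom(0.975, 27, 0.5)     # 19

# The exact binomial test from the post: 14 successes in 27 trials.
binom.test(14, 27)$p.value # 1: the data is as consistent with p = 0.5 as possible
```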



Sunday, July 20, 2008

The Basics of Poker

No post last week. The author was a bit worried that someone had given Martin another box of silver, but in fact it was something quite different. Enough personal stuff; on to the math.

Say one were to go play a game of Texas Hold'em. What should the player know? Blackjack and craps have fairly simple strategies to follow to maximize expected value, or rather to minimize expected losses; for blackjack all someone has to do is remember a simple table. Poker is different. The basic requirement is to know the rules: which hands beat which, the order in which people bet, and generally how the game unfolds. Once the basics are learned, the two most important concepts are the Fundamental Theorem of Poker and bluffing. The fundamental theorem dictates that a player play his hand as if he could see everyone's cards, always applying the correct pot odds. Every time a player does this he increases his expected gain. Every time a player fails to do this, he is losing money.
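As a quick illustration of the pot-odds half of the theorem (the dollar figures here are made up, not from any particular hand): a call is profitable whenever the hand's winning probability exceeds the amount risked divided by the total pot after the call.

```r
# Hypothetical pot-odds calculation; all figures are illustrative.
pot  <- 100   # money already in the pot
bet  <- 20    # opponent's bet
call <- 20    # amount needed to call
# Break-even winning probability = risk / (pot after the call).
breakeven <- call / (pot + bet + call)
breakeven  # 1/7, about 14.3%: calling is profitable if the hand wins more often
```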

Bluffing adds depth to poker. A player cannot follow the fundamental theorem perfectly. He can only guess what his opponents have. If a player plays his hand based on the fundamental theorem and never bluffs, his opponents will be able to accurately guess what cards he has and put him at a disadvantage. The skill in poker comes from balancing the concepts of the fundamental theorem and bluffing.

Those are the basics. Next it is important to get a sense for how often certain hands occur. Playing repeated hands of poker will give a player a good feel. A player gets two cards before he has to decide whether to play the hand or fold. What types of hands should a player ante up for and how often do those hands occur? Let us say a player likes to play the following types of hands: a pair, cards of the same suit that are adjacent (suited connectors), and two high cards (two cards that come from the set of 10, Jack, Queen, King, and Ace). Those are generally regarded as good hands to ante up on as they can lead to strong hands. Let the sets be defined as A for pairs, B for suited connectors, and C for high cards. The probability of getting one of those hands is the union of those three sets:

P(A U B U C) = P(A) + P(B) + P(C) -[P(AB) + P(AC) + P(BC)] + P(ABC)

Where 'U' indicates a union of sets and 'AB', 'ABC', etc. indicate intersections of sets. For unions of sets, the basic rule is to add the odd-order intersections and subtract the even-order ones.

P(A) is the probability of getting a pair. The first card can be anything; for a pair to occur, the second card has to be one of the 3 matching cards among the 51 remaining. For P(B), the first card can also be any card, and for any card there are two remaining cards that make suited connectors: if the first card is the Ace of Spades, the second card has to be the King or 2 of Spades. For P(C), both the first and second cards must be a 10, Jack, Queen, King, or Ace; there are 20 such cards in the deck and 19 remaining after the first one has been dealt.
P(A) = (52/52)*(3/51)
P(B) = (52/52)*(2/51)
P(C) = (20/52)*(19/51)

The intersections P(AB) and P(ABC) are empty, since two cards cannot be both a pair and suited connectors. P(AC) is the set of hands that are pairs of 10s, Jacks, Queens, Kings, or Aces. There are 13 ranks and P(AC) covers 5 of them (10-10, Jack-Jack, etc.), so P(AC) can be expressed as the portion of A that falls into C. Likewise for P(BC), with the caveat that only 4 of the 13 types of suited connectors in B occur in C (since 10-9 and Ace-2 do not occur in C).
P(AC) = P(A)*(5/13)
P(BC) = P(B)*(4/13)

Plugging those into the formula:
P(A U B U C) ≈ 20.7%
Thus a player that plays these types of hands will play roughly one of every five hands. To keep opponents from getting a clear read on what type of hands you prefer it may be advisable to play a junk hand occasionally.
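The whole inclusion-exclusion calculation can be carried out in R:

```r
# Inclusion-exclusion for P(A U B U C) from the post.
pA  <- (52/52) * (3/51)    # pair
pB  <- (52/52) * (2/51)    # suited connectors
pC  <- (20/52) * (19/51)   # two high cards (10 through Ace)
pAB <- 0; pABC <- 0        # empty: can't be both a pair and suited connectors
pAC <- pA * (5/13)         # high pairs: 10-10, J-J, Q-Q, K-K, A-A
pBC <- pB * (4/13)         # high suited connectors: J-10, Q-J, K-Q, A-K
pUnion <- pA + pB + pC - (pAB + pAC + pBC) + pABC
round(100 * pUnion, 1)     # about 20.7%, i.e. roughly one hand in five
```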

I like playing suited connectors and pairs because if the next five cards improve your hand, a clear advantage can emerge. For example, if you play a suited connector and three of the next five cards are the same suit, you get a flush. Unless there is a pair among those five cards, a flush will very likely be the best hand. But given that you played a pair or suited connector, what are the odds that the next five cards will improve your hand? I will tackle that question in the next post.

Sunday, July 6, 2008

The Gauntlet Revisited

An earlier post described how a player in the TV show The Gauntlet III could calculate his odds of winning a Gauntlet and who he should select as an opponent. This post will explore team strategy, in particular why the men of the Veteran team decided to purposely lose team challenges.

If a team loses a challenge, two of its members have a duel in the Gauntlet. The losing player has to leave the show. After an unknown number of Gauntlets, the show has one final challenge in which the two teams compete for $300,000. The winning team divides the $300,000 equally between its remaining members. Going into the team challenges, the participants know whether it is a "guys'" or "girls'" day. On guys' days two men from the losing team duel in the Gauntlet. On girls' days two women compete. The women face no punishment (i.e. the Gauntlet) for losing on guys' days, and the men face none for losing on girls' days. To entice the opposite sex to compete on their respective days, the show offers a prize to the winning team: on girls' days, the men of the winning team each get a prize of roughly $500. However, if a team loses, then one of its members will leave the show, meaning a larger portion of the $300,000 grand prize will go to those who remain.

First let's take a look at how much each remaining member stands to gain when a teammate loses in the Gauntlet. The left column is the number of people remaining on the team, the middle column is how much each member will receive if the team wins the final challenge, and the right column is how much more a player gets when a teammate leaves the show.

Teams start with 16 people and every time someone leaves, the remaining members can potentially win more prize money. The 'Difference' column shows that the more people leave, the more the rest stand to gain. When the first person leaves, everyone stands to win $1,250 more. When the 10th person leaves, the team members can win over $7,000 more. For example, in the final challenge the Veteran team stood to win roughly $30,000 each and the smaller Rookie team $60,000.
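The table's arithmetic is just $300,000 split evenly among however many members remain, and can be reproduced in R (the range of team sizes shown here is my own choice for illustration):

```r
# Share of the $300,000 grand prize as a 16-person team shrinks,
# and how much more each remaining member stands to win after each departure.
remaining  <- 16:6
share      <- 300000 / remaining
difference <- c(NA, diff(share))   # extra winnings per member when one person leaves
round(data.frame(remaining, share, difference))
# First departure: each of the 15 who remain stands to win $1,250 more;
# by the tenth departure the gain per member is over $7,000.
```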

Let's take a look at the normal form representation of a single stage of this game. Assume that it is a girls' day and we are looking at the representation from the men's view. Assume the men think they have a 50% chance of winning the final challenge; thus they stand to gain $625 in expectation if they lose the challenge and a teammate (1250 × .5 = 625). The men from both teams have the same strategy set: Try or Shirk. A team that chooses to Try will win 100% of the time against a Shirking team. Or perhaps more accurately, the Shirking team can make sure they lose with 100% certainty. If both teams Try they each have a 50% chance of winning, and likewise if both teams Shirk.



Thus Shirk dominates Try, since the expected $625 (a 50% chance at $1,250) is greater than the $500 prize for winning the challenge. In the next stage game, the team that lost a member stands to win even more than $625, since the figures in the Difference column grow as more people leave the show. Using the binomial theorem one can calculate how much present value a team gains from the chance to lose more teammates; in the first stage matrix, the $625 jumps to over $3,000 in present value. The question is not why the Veteran men decided to Shirk challenges, but why they didn't start Shirking earlier.
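The single-stage payoffs for the men can be sketched in R under the post's assumptions (a $500 prize for winning, an expected $625 gain from losing a teammate, and 50/50 odds whenever both teams play the same way):

```r
# Expected payoff to the row team's men for each strategy pair.
prize     <- 500          # prize for winning a girls' day challenge
lose_gain <- 0.5 * 1250   # expected gain from losing a teammate: $625

payoff <- function(p_win) p_win * prize + (1 - p_win) * lose_gain

m <- matrix(c(payoff(0.5), payoff(1),     # row Tries:  vs Try, vs Shirk (a Shirking foe lets you win)
              payoff(0),   payoff(0.5)),  # row Shirks: vs Try, vs Shirk
            nrow = 2, byrow = TRUE,
            dimnames = list(c("Try", "Shirk"), c("Try", "Shirk")))
m
# Shirk pays more than Try against either opponent strategy:
# $625 vs $562.50 against Try, $562.50 vs $500 against Shirk.
```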

The payoff matrix shows that the $500 prize is no deterrent. Are there other deterrents to Shirking? The final challenge determines the grand prize. If teams with more members had an advantage in that challenge, that would be a deterrent to Shirking. However, the format of the final challenge, common knowledge to the players because of previous versions of the show, does not favor large teams. It favors teams that have no weak links and that have strong athletes. Many Veteran men purposely Shirked because they thought it would improve their chances of winning the final challenge. The women realized that their probability of winning the final challenge would decrease if they lost strong athletes, and so were less likely to Shirk. A third deterrent might be emotional: perhaps pride, a competitive nature, or a connection to members of the opposite sex motivated players not to Shirk. That might explain some of the romantic relationships.

The only tool the women have to deter Shirking is threat of reciprocal punishment. "If you Shirk and send one of us to the Gauntlet, we will Shirk and send one of you to the Gauntlet tomorrow." This strategy is undermined since losing too many athletic members hurts everyone on the team and because the remaining women also benefit from losing weaker members. By Shirking and losing strong men, the women hurt themselves. By losing the remaining women, they stand to make more money, and have a better chance of winning the final challenge. The Veteran men chose to Shirk, and the women had little recourse.

If both teams' men had realized the dominance of Shirk, there would have been some interesting repercussions. Both teams' men would be trying to lose on girls' days. To deter this, both teams' women might play the 'Mad President' strategy (act irrational, not a stretch for this group) and try to Shirk, the final challenge be damned. The Nash Equilibrium would probably involve an agreement between the men and women on each team, both sides agreeing that a certain amount of Shirking, to lose the weaker male and female competitors, is ideal, and to Shirk only strategically in order to maximize the probability of winning the final challenge.