Sunday, June 29, 2008

Late Goals in Euro 2008

While I was watching Spain defeat Russia in the semi-finals of the Euro 2008 soccer tournament, a stat flashed on the screen: 23 of the 79 goals in the tournament had been scored after the 75th minute. Disregarding overtime and stoppage time, the 75th to 90th minute is 1/6 of the game. So one would presume that 1/6 (roughly 17%) of the goals would occur in that interval. But in fact 23/79, roughly 29%, of the goals were scored then. Is this statistically significant?

Let p^ = the sample proportion (23/79) and p(o) = the proportion expected under the null hypothesis (1/6). The null hypothesis is that the true proportion of late goals equals p(o). Let α = .05 be the threshold: if the p-value is < .05, we reject the null hypothesis. For this let's use a one-proportion z-test.


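z = (p^ - p(o)) / sqrt(p(o) * (1 - p(o)) / n)

Here p^ = p(hat) (I am unsure how to type hats or subscripts), p(o) = the hypothesized proportion and z = the z-score. At this point in the tournament there had been 29 games, but the proportion is measured over goals, so the sample size is the number of goals: n = 79. Plug the values into the formula to get the z-score: z = 2.97, making the one-tailed p-value roughly .001, which is < .05, so we reject the null hypothesis. (Using the 29 games as n instead gives z = 1.80 and a one-tailed p-value of about .04, which leads to the same conclusion.)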
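As a rough check, here is a minimal Python sketch of the one-proportion z-test. The function name `one_proportion_z` is made up for illustration, and the test is assumed to be one-tailed (late goals are more common than 1/6). The sample size n is left as a parameter because the choice matters: the proportion is measured over goals, so n = 79 is arguably the natural sample size, while n = 29 counts games.

```python
import math

def one_proportion_z(p_hat, p0, n):
    """Return (z, one-tailed p-value) for a one-proportion z-test."""
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_error
    # Upper-tail probability of a standard normal, via the complementary error function.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value

p_hat = 23 / 79   # observed share of goals after the 75th minute
p0 = 1 / 6        # share expected if goals were spread evenly

z_goals, p_goals = one_proportion_z(p_hat, p0, n=79)  # n = number of goals
z_games, p_games = one_proportion_z(p_hat, p0, n=29)  # n = number of games

print(f"n = 79: z = {z_goals:.2f}, one-tailed p = {p_goals:.4f}")
print(f"n = 29: z = {z_games:.2f}, one-tailed p = {p_games:.4f}")
```

Under either choice of n the one-tailed p-value comes out below .05.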

Does it make sense that such a high percentage of goals would be scored in the final 15 minutes? It brings to mind an analysis I read about the US presidential elections. The author asserts that the trailing candidate should pursue the "pull the goalie" strategy used in hockey. The trailing hockey team is so desperate to score that the coach pulls the goalie and puts a better goal scorer into the game. By pulling the goalie the team expects to have more chances to score, but it will certainly be easier to score upon. The trailing team wants a strategy that increases the overall variance at the expense of the optimal strategy. Over the long term this high variance strategy is not as effective as the optimal strategy, but the trailing team is not playing for the long term. It is playing for the short term. The trailing team will use a high variance strategy, resulting in both it and the opposition scoring more goals.

This was evident today in the Euro 2008 final. Germany pursued the high variance strategy in the final 15 minutes at the expense of the optimal strategy. The German team had more chances to score, but at the same time allowed the Spanish side some great opportunities.

Which Half is the What Now?

The New York Times posted an article on the questionable efficacy of cardiac CT scans. There is a surprising quote from the website of a doctor who supports CT scans: "Half of Americans have died of heart attacks and strokes. Which one are you?" This statement is absurd, and I am unsure why the NYTimes would include it. Rather than dissect it grammatically, it can be taken apart with logic and algebra. There are about 300 million Americans alive right now. Let x equal the number of Americans who have died of heart attacks or strokes (I assume the doctor meant 'or' (the union of two sets) not 'and' (the intersection)) and let y equal the number of Americans who have died of anything else. That makes x + y the total number of Americans who have died. From that quote we know that:

x / (x+y+3.0*10^8) = 1/2
2x = x + y + 3.0*10^8
x - 3.0*10^8 = y

This tells us that x > 3.0*10^8 and x > y, assuming that y > 0. The percentage of dead Americans who have died of heart attacks or strokes is equal to x/(x+y). Since x > y, x/(x+y) > 1/2. For example, suppose that 500 million Americans have died, so that x + y = 5.0*10^8. Plugging that number into the formula:

x/(5.0*10^8+3.0*10^8) = 1/2
x = 4.0*10^8
x / (x+y) = % of dead Americans who have died of heart attacks or strokes
4.0*10^8 / 5.0*10^8 = .8 or 80% of dead Americans have died of heart attacks or strokes
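The same arithmetic as a small Python sketch, for trying other guesses at the number of deceased Americans. The function name `share_of_dead` is made up for illustration; the 300 million living and 500 million deceased are the figures used in the text, and the 500 million is only a supposition.

```python
def share_of_dead(total_dead, living=300e6, quoted_fraction=0.5):
    """Take the quote literally: x / (x + y + living) = quoted_fraction.

    total_dead is x + y. Returns (x, x / total_dead), where x is the number
    who died of heart attacks or strokes.
    """
    x = quoted_fraction * (total_dead + living)
    return x, x / total_dead

x, share = share_of_dead(total_dead=500e6)
print(f"x = {x:.3g}, share of dead Americans = {share:.0%}")
```

Any positive guess for the number of deceased gives a share above 1/2, because the living are counted in the denominator of the quote but not in x/(x+y).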

I am unaware of what twisted game the doctor is playing. He is trying to emphasize the risk of heart attacks and strokes with the "1/2 comment", when he could quote x / (x+y), the percentage of dead Americans who have died of heart attacks or strokes. The latter is larger than 1/2. He is a mystery of a man, trying to shock the public and undersell the problem at the same time.

On a less mathematical note, there are worse things than heart attacks or strokes to have as the leading cause of death in a society. I am reading the book Chances Are... . I am surprised by how short the life expectancy for Londoners was in prior centuries and how resistant to statistical studies the medical community was. Unfortunately, as the NYTimes article details, there is still resistance to evidence-based medicine.

Friday, June 20, 2008

When to Foul the Shooter

The conventional basketball wisdom is to foul the shooter rather than give up an easy lay-up. If a shooter has an easy shot, say a shot he makes 95% of the time, it is better to foul him and let him attempt two free throws than to let him take the easy shot. As long as he does not make over 95% of his free throws (which almost nobody does), the defensive team will allow fewer expected points. If the offensive player only makes 50% of his free throws, fouling him saves .9 expected points: 2*.95 - (1*.5+1*.5) = .9. Fouling the league average shooter, who makes 75.2% of his free throws, saves about .4 points. Fouling has another detriment that is not accounted for in that math. Starting with a team's fifth foul in a quarter, the other team goes to the free throw line for every defensive or loose ball foul, regardless of whether the player was in the act of shooting.
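The expected-points comparison can be sketched in a few lines of Python. The function name `points_saved_by_fouling` is made up for illustration, and the sketch assumes a two-point shot, two free throws, and no made-basket-plus-foul plays.

```python
def points_saved_by_fouling(shot_pct, ft_pct, shot_value=2, free_throws=2):
    """Expected points allowed on the open shot minus expected points
    allowed on the free throws. Positive means fouling helps the defense."""
    expected_from_shot = shot_value * shot_pct
    expected_from_free_throws = free_throws * ft_pct
    return expected_from_shot - expected_from_free_throws

poor_shooter = points_saved_by_fouling(shot_pct=0.95, ft_pct=0.50)
average_shooter = points_saved_by_fouling(shot_pct=0.95, ft_pct=0.752)

print(f"50% free throw shooter:   {poor_shooter:.3f} points saved")
print(f"75.2% free throw shooter: {average_shooter:.3f} points saved")
```

The second figure, .396, is the "about .4" quoted for the league average shooter.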

During the 2006-2007 NBA season, teams scored about 1.1 points per possession and made 75.2% of free throws. If the offensive team is in the bonus, a non-shooting foul results in the defense allowing about .43 more expected points than it does on an average possession. With the rounded figures, 2*.752 - 1.1 = .40; the .43 used in the rest of this post comes from the unrounded points per possession figure (closer to 1.07). It is better to let the possession elapse without committing a non-shooting foul. The defensive team would rather not be in the bonus. Should you still foul the shooter on an easy shot?

Before estimating the additional penalty that committing a foul carries, here are some statistics:


Free throw % and the other stats needed to calculate Points Per Possession and Mean Fouls were found here. Points Per Possession was found by taking the league average Points Per Game and dividing by the league average 'Pace' statistic: PPG/Pace = PPP. I calculated Mean Fouls by taking the league average minutes per season, dividing by 5 to make the stat minutes per team, and dividing the league average fouls per team by that figure to get team fouls per minute. I multiplied it by 12 (minutes in a quarter) to get fouls per quarter: (Fouls per Team)/(Minutes/5) * 12 = FPQ.

I will take an extreme case to see if there is an instance where it would be better to let the offensive player have an easy shot rather than foul. Suppose that the offensive player has an easy shot at the start of the quarter. Should he be fouled? The Poisson distribution can be used to show how often a team reaches a certain number of fouls.

In this case λ = 5.51 (average fouls per quarter) and k = fouls in a quarter. On the first line of the table below is the random variable k, ranging from 0 to 14. The percentage below it is the Poisson probability of exactly that number of fouls happening in a quarter. For example, the most likely outcome is a team committing 5 fouls in a quarter, which happens 17.1% of the time. Below that number is 'Points Lost.' Points Lost is the number of points the defense loses by fouling a player and letting him shoot free throws. As shown above, by fouling the defense allows .43 more points than it would if it did not foul. Multiplying that by the Poisson probability gives the points lost. For example, when a team fouls 5 times in a quarter, the 5th foul sends the opposing team to the free throw line, where it gets .43 more points. This happens in 17.1% of quarters. The penalty increases as the team fouls more often. Fouling 6 times in a quarter sends the opposing team to the free throw line on two occasions, allowing the offensive team to get .86 more points per quarter. The formula is .43*Poisson%*(k-4) = Points Lost, with 4 being used because every time a team commits k > 4 fouls the opposing team goes to the free throw line k-4 times. Total Points Lost is the sum of expected Points Lost for each value of k.



To return to the extreme case, fouling at the beginning of the quarter has the effect of the offensive team needing to draw only 4 more fouls (rather than 5) in order to shoot free throws. Compared to the previous example, this increases the Total Points Lost by the defense. Committing 4 fouls results in Points Lost and the penalty for committing more fouls increases. For this case, Points Lost = .43*Poisson%*(k-3).

If the difference between Total Points Lost (early foul) and Total Points Lost (normal case) is more than .4 points, the number of points the defense saves by fouling the league average player on an easy shot, then the defense would be better off not fouling. The table below shows the comparison:



The difference in Total Points Lost is .343 which is < .4. Although the defense's Total Points Lost increases with the early foul it does not increase enough to justify letting a player have an easy shot. Thus the conventional wisdom is reaffirmed for the league average player; foul him rather than let him have an easy shot. If Total Points Lost had been more than .4, the next step would have been to calculate a more accurate Total Points Lost by accounting for offensive fouls (which do not result in free throws) and shooting fouls (which always result in free throws). However that is not needed and the reader will never learn that 9.8% of all fouls committed during the 2006-2007 NBA season were offensive fouls.
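The Total Points Lost comparison can be reproduced with a short Python sketch. It uses only the figures from the text (λ = 5.51, .43 extra points per bonus trip, k from 0 to 14); the function names and the `free_fouls` parameter are made up for illustration.

```python
import math

MEAN_FOULS_PER_QUARTER = 5.51   # lambda, from the text
EXTRA_POINTS_PER_TRIP = 0.43    # extra expected points per bonus foul, from the text
MAX_FOULS = 14                  # the tables cover k = 0..14

def poisson_pmf(k, lam):
    """Probability of exactly k events under a Poisson distribution with mean lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

def total_points_lost(free_fouls):
    """Sum of .43 * P(k) * (k - free_fouls) over all k above free_fouls.

    free_fouls is the number of fouls a team can commit before the
    opponent starts shooting free throws: 4 normally, 3 after an early foul.
    """
    return sum(
        EXTRA_POINTS_PER_TRIP
        * poisson_pmf(k, MEAN_FOULS_PER_QUARTER)
        * max(k - free_fouls, 0)
        for k in range(MAX_FOULS + 1)
    )

normal_case = total_points_lost(free_fouls=4)
early_foul = total_points_lost(free_fouls=3)

print(f"P(exactly 5 fouls) = {poisson_pmf(5, MEAN_FOULS_PER_QUARTER):.3f}")
print(f"Difference in Total Points Lost = {early_foul - normal_case:.3f}")
```

Because (k-3) and (k-4) differ by exactly 1 whenever k is at least 4, the difference is just .43 times the probability of committing 4 or more fouls in the quarter.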

While all I did was reaffirm the conventional wisdom, I believe the data suggests that there may be special cases when the defense should not commit the early foul. If the offensive player makes easy shots somewhat less than 95% of the time and makes free throws somewhat more than 75% of the time, it may be worth not committing an early foul on him. NBA teams with access to more exact stats would be advised, especially during 7 game playoff series, to calculate some exact figures for specific players to see who and when not to foul.

Sunday, June 15, 2008

Bringing Math to the Gauntlet

Real World/Road Rules Challenge: The Gauntlet III is a reality show that I embarrassingly happen to enjoy. Two teams compete in random challenges. The losing team has to participate in the Gauntlet, a one-on-one duel where the winner gets to stay on the show and the loser leaves the show. The winning team selects the first player to enter the duel and the losing team chooses his opponent. Often the losing team would let the first player pick who he would face in the duel. I hope to show how a player can choose the opponent that will give him the highest probability of winning the Gauntlet.

To determine which game the players will play in the Gauntlet, a wheel with six outcomes (five different games and a spin again) is spun. Since all five games have an equal chance of being selected, Player 1 can calculate his odds by estimating his subjective probability of winning each individual game, summing the probabilities and dividing by five (the number of games). For example, if Player 1 believes that he is evenly matched with his opponent in all five games, his chances of winning are 50%, shown by the formula Pv:

Pv (Probability of Victory) = (.5 + .5 + .5 +.5 + .5)/5 = .5 = 50%

The games are varied and certain games favor certain types of players. It is unlikely that a player would estimate himself as evenly matched with his opponent in all five games. Force Field (basically tug of war with pulleys) favors body strength and weight. Ankle Breakers (reverse tug of war with a rope tying a player to his opponent's ankle) and Ram it Home (a shoving match of sorts) also favor strength and body weight. Sliders is a puzzle game that does not require athletic ability. Ball Brawl (a race to grab and carry a ball across a goal line) gives the advantage to the faster player. Let's do another example. This time Player 1 can select a weaker, smaller Player 2, who is faster and smarter than Player 1 (which, one would assume, gives Player 2 an advantage in the puzzle game). Player 1 estimates a 70% chance of winning each of the games that favor him and only a 30% chance in each of the games that favor Player 2.

Pv = (.7 + .7 + .7 +.3 + .3)/5 = .54 = 54%

Another wrinkle. Ball Brawl is a repeated stage game. The winner is the first who scores 4 points. Repeated stage games, compared to single elimination games, favor the side with the higher probability of winning, much like the 7 game series of the NBA playoffs favor the better team more than the NCAA college basketball tournament does. If Player 1 believes that his chance of scoring in each stage game of Ball Brawl is 30%, his overall odds of winning the game drop to roughly 18%. The logic is that it is easier for Player 1 to convert one 30% chance than it is to convert multiple 30% chances. The actual math can be calculated using the binomial distribution:

The winning player needs to score 4 points. There are five stage games in which to score points. In the first three stages, grabbing a ball and returning it over the goal line is worth 1 point. In the last two stages, a successful score is worth 2 points. The winning player needs either to score 4 points in the final two stages, or to score 2 points in the final two stages and 2 or 3 points in the first three stages. Say the probability of scoring in a stage game is x. The function b(x) can be created to calculate the chance of winning Ball Brawl:

b(x) = x^2 + 2x(1-x) * (x^3 + 3x^2(1-x))

b(.3) = .09 + .42 * (.027 + .189) = .181 = 18%

Or in Excel:

=BINOMDIST(2,2,.3,FALSE)+BINOMDIST(1,2,.3,FALSE)*(BINOMDIST(3,3,.3,FALSE)+BINOMDIST(2,3,.3,FALSE))

BINOMDIST(s, n, p, false) where s = number of successes, n = number of trials, p = probability of success for a trial, false = not cumulative

Formally, Pv = (p1 + p2 + p3 + p4 + b(p5)) / 5

where p1 = estimated probability of winning game 1, p2 = game 2, ..., p5 = prob of winning a stage in ball brawl
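For anyone without Excel, here is a minimal Python sketch of the same calculation. The function names `binom_pmf`, `ball_brawl` and `probability_of_victory` are made up for illustration; `binom_pmf` plays the role of the non-cumulative BINOMDIST, and the 70%/30% inputs are the estimates from the example earlier in the post.

```python
from math import comb

def binom_pmf(successes, trials, p):
    """Probability of exactly `successes` wins in `trials` independent tries."""
    return comb(trials, successes) * p ** successes * (1 - p) ** (trials - successes)

def ball_brawl(p):
    """Chance of reaching 4 points when each stage is won with probability p.

    Three early stages are worth 1 point each, two final stages 2 points each.
    """
    both_finals = binom_pmf(2, 2, p)                               # 4 points from the finals alone
    one_final = binom_pmf(1, 2, p)                                 # 2 points from the finals
    two_or_three_early = binom_pmf(3, 3, p) + binom_pmf(2, 3, p)   # at least 2 early points
    return both_finals + one_final * two_or_three_early

def probability_of_victory(p1, p2, p3, p4, p5):
    """Pv = (p1 + p2 + p3 + p4 + b(p5)) / 5, with p5 the per-stage Ball Brawl chance."""
    return (p1 + p2 + p3 + p4 + ball_brawl(p5)) / 5

b = ball_brawl(0.3)
pv = probability_of_victory(0.7, 0.7, 0.7, 0.3, 0.3)
print(f"b(.3) = {b:.3f}")
print(f"Pv    = {pv:.3f}")
```

With the Ball Brawl adjustment, the 54% from the earlier example drops to about 52%.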

So what should Player 1 look for in an opponent? Since three games emphasize strength and body weight, Player 1's first criterion is to choose a weaker opponent. After that, the repeated stages of Ball Brawl, and the advantage they give to the faster player, dictate choosing a slower opponent. The last criterion to evaluate is the potential opponent's intelligence. Big players will pick on small players, and small players will choose weaker, slower, and/or less intelligent small players.

More formally, the dominant strategy is to pick a person i ∈ S (the set of all players) for Player 2 such that Pv(i) ≥ Pv(j) for all players j ∈ S.

While I would be surprised if anybody on the Gauntlet reasoned his or her opponent selection out to this degree, it does explain why a player like Eric survived to the end of the competition. Eric was not suited for the final competition, but his sizable body weight and strength advantage made him an opponent no one wanted to face in the Gauntlet.