Costaguanan: sports


Friday, June 10, 2011

NBA Amnesty Clause: Worst Contracts

If the NBA Amnesty Clause is to be re-enacted, the NBA owners and the players' union have to agree to it. I believe the players would largely be in favor of it. Eliminated contracts still get paid in full, and teams would then have more money to spend, either from having more salary cap room or from not having to pay luxury tax. However, many NBA teams would not benefit from the Amnesty Clause. The following 10 teams hardly stand to gain from it: Boston Celtics, New York Knicks, Toronto Raptors, Indiana Pacers, Chicago Bulls, Miami Heat, Houston Rockets, Memphis Grizzlies, Oklahoma City Thunder, Sacramento Kings. I doubt the owners of those 10 teams would like to see the Amnesty Clause enacted. By agreeing to it, they would be handing their opponents a large competitive advantage. Also, if the luxury tax remains, the teams that receive luxury tax money would earn less if the teams paying the tax could cut some of the contracts that put them over the line.

*Clever Segue*. Here are the top 5 uses of the Amnesty Clause:

5. Travis Outlaw, New Jersey Nets. 3 years and $21m remaining on his contract. The dollar amount is small compared to some of the other potential targets, but Outlaw has been awful these past two seasons (under 1 WS each season, 8.8 PER last year).

4. Rashard Lewis, Washington Wizards. 2 years and $46m remaining on his contract. Yes, he only has 2 years remaining on his deal but $46m for his lack of production on a rebuilding team is a ridiculous waste of cap space.

3. Ben Gordon, Detroit Pistons. 3 years and $37m remaining on his contract. I am not a huge fan of volume scorers, and neither are PER and WS. If Detroit cuts Gordon's deal and trades Richard Hamilton, then the Pistons could get a great start on rebuilding.

2. Brandon Roy, Portland Trail Blazers. 3 years and $49m remaining on his contract. This is not Roy's fault, as his play before the knee injuries deserved this contract. After multiple surgeries and the scary rumors about the condition of his knees, the Trail Blazers could use the Amnesty Clause to move on.

1. Gilbert Arenas, Orlando Magic. 3 years and $62m remaining on his contract. Arenas has been awful since his knee injuries and arrest on gun charges. If the Magic jettison him and move another deal in a trade, they could have some cap space.

Honorable Mentions: Brendan Haywood, Dallas Mavericks (4 years, $35m); Richard Jefferson, San Antonio Spurs (3 years, $30m). Cutting either deal could give these contending teams a chance to reload for another title run.

Sunday, June 5, 2011

NBA Amnesty Clause: Southeast Division

Next up for discussion, the contracts of the Southeast Division. See my first post on the Amnesty Clause for a little background.

Sunday, August 10, 2008

Some Numbers from the 2008 Olympics

During the opening ceremonies, the US broadcast displayed a graphic with the size of each delegation and the population of its country. I wondered what the correlation was between population and the number of athletes competing, and what other variables might influence delegation size. In addition to population, GDP, climate, and some metric of civil rights for women might also explain delegation size. When I went to research this I had trouble finding the country and delegation data. What I did find was a list of all the athletes online. I turned the data into an Excel file and played around with it a little. The file can be downloaded here if anyone wants it. It is a .csv, which makes it easy to import and analyze with R.

I created some pivot tables in Excel. It is possible to do similar things in R with the tapply() function, but Excel makes pivot tables so easy to create and alter that I did not bother. I uploaded them to a Google spreadsheet and embedded a few of them at the bottom of this post. The rest of the data I mainly found using R. There are 204 countries competing in this Olympics. The largest delegation belongs to the United States, with 618 athletes. 10 countries have one athlete competing, including Aruba, Belize, Burundi, Central African Republic, Dominica, Gabon, Niger, and Nauru (which was featured in a surreal This American Life episode). The mean delegation size is 49.24 athletes. Surprisingly, the median delegation size is only 9: roughly half the countries send 9 or fewer athletes. Random fact: only 27 of the 204 delegations have more women competing than men. Which two countries have the largest female-positive (i.e., more women than men) delegations?
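The pivot-table step can be sketched in a few lines of Python instead of Excel or R. The rows below are hypothetical stand-ins for the athlete list, not values from the real file; the point is only to show one count per country and why a few large delegations pull the mean above the median.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical (athlete, country) rows standing in for the real athlete list.
athletes = [
    ("A. Runner", "USA"), ("B. Swimmer", "USA"), ("C. Rower", "USA"),
    ("D. Jumper", "USA"), ("E. Boxer", "Norway"), ("F. Sailor", "Norway"),
    ("G. Lifter", "Nauru"),
]

# One count per country: the same idea as a pivot table or tapply() in R.
delegation_size = Counter(country for _, country in athletes)

sizes = list(delegation_size.values())
print(delegation_size.most_common(1))  # the largest delegation
print(mean(sizes), median(sizes))      # a big team pulls the mean above the median
```

With the real file, the rows would come from reading the .csv rather than from a hard-coded list.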

Another column of data on the official site lists the disciplines (sports) in which athletes compete. There are 38 discipline classifications. The most popular discipline is Athletics (track and field, I would guess) with 1943 competitors. Cycling BMX is the smallest with only 24 competitors. The median and mean are 182.5 (between Baseball and Table Tennis) and 264 competitors respectively. Random fact: there are five sports that are specific to only one gender. Which ones are they?

A little manipulation with R turned the Date of Birth data into Year of Birth and then into an 'age estimate': I took 2008 and subtracted Year of Birth to get current age. The oldest athlete is Hoketsu Hiroshi, a 67-year-old man representing Japan in Equestrian. The youngest competitor is 12-year-old swimmer Antoinette Joyce Guedia Mouafo from Cameroon. The median and mean age estimates are 26 and 26.37 years. The Random Fact was going to be the oldest and youngest delegations by average age, but Excel started acting up and I was too tired to do more R (date modification can be tricky). There is plenty to explore in this data. Let me know if you find anything neat in the above file and don't be afraid to add more columns of data.
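The age estimate is simple enough to sketch in Python. The dates of birth here are hypothetical, and the function name is only illustrative; like the estimate described above, it ignores whether the birthday has already passed in 2008, so it can be off by one year.

```python
from datetime import date
from statistics import mean, median

GAMES_YEAR = 2008

def age_estimate(date_of_birth: date, games_year: int = GAMES_YEAR) -> int:
    """Games year minus birth year; ignores whether the birthday has passed."""
    return games_year - date_of_birth.year

# Hypothetical dates of birth, not taken from the athlete file.
births = [date(1941, 1, 1), date(1982, 7, 1), date(1985, 11, 30), date(1996, 1, 15)]
ages = [age_estimate(d) for d in births]
print(ages, median(ages), mean(ages))
```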

Answers: Norway and Sweden; baseball, softball, boxing, synchronised swimming, rhythmic gymnastics

Saturday, July 26, 2008

Hoops Analyst and R

Rather than write another convoluted post on poker and probability, I decided to play around a bit with the statistics program R. A post by Harlan Schreiber at the much enjoyed site Hoops Analyst and a recently purchased book on R inspired the effort. I highly recommend R, and this post will demonstrate some of the things one can do in it, even someone as inexperienced in R and statistics as myself.

Mr. Schreiber explores the idea that defense wins championships and offense is not as important. He presents a table of the Offensive and Defensive Ranks of past champions, which leads to a simple analysis and conclusion: defense and offense have been equally important for past champions. Mr. Schreiber still manages to make the article interesting by adding depth and insight to a simple table. He can find history and anecdotes in the driest of data.

Statistical analysis can support Mr. Schreiber's conclusion. A paired t-test and a sign test are two simple ways to compare data. I copied the data into Excel and then loaded it into R. I added two more columns to the data table. One is labelled "Difference," which is Offensive Rank minus Defensive Rank. The second is labelled "Sign"; it assigns a 1 to values of Difference that are positive and a 0 to those that are not. To perform a paired t-test, the differences should be approximately normally distributed. A Shapiro-Wilk test for normality (R command: shapiro.test(Difference)) on Difference produced the following result:

Shapiro-Wilk normality test
data: Difference
W = 0.9682, p-value = 0.5107

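Typically we look for a p-value < .05 to reject the hypothesis that the data is normally distributed; here the p-value is well above that, so the differences are consistent with a normal distribution. A normal Q-Q plot (R commands: qqnorm(Difference) and qqline(Difference)) and a histogram (R command: hist(Difference)) confirm this.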

The pictures show a roughly normal distribution, so a t-test can be used. Simply type t.test(Offensive.Rank, Defensive.Rank, paired=TRUE) and R does the rest of the work:

Paired t-test
data: Offensive.Rank and Defensive.Rank
t = 0.1257, df = 28, p-value = 0.9009
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval: -2.637672 2.982499
sample estimates: mean of the differences 0.1724138

With such a high p-value, we fail to reject the null hypothesis. There is no evidence that either Offensive or Defensive Rank has been significantly more important for prior champions.
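For readers without R at hand, the Difference and Sign columns and the paired t statistic can be sketched in Python. The ranks below are hypothetical, not Mr. Schreiber's table, and the sketch stops at the t statistic because the standard library has no t distribution for the p-value.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical offensive/defensive ranks for five champions (not the real table).
offensive_rank = [1, 5, 3, 9, 2]
defensive_rank = [4, 2, 3, 1, 6]

# "Difference" is Offensive Rank minus Defensive Rank; "Sign" flags positive values.
difference = [o - d for o, d in zip(offensive_rank, defensive_rank)]
sign = [1 if diff > 0 else 0 for diff in difference]

# Paired t statistic: mean difference divided by its standard error.
n = len(difference)
t_stat = mean(difference) / (stdev(difference) / sqrt(n))
print(difference, sign, round(t_stat, 4), "df =", n - 1)
```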

Another possible test is the non-parametric sign test. This test may be preferable to the t-test in this instance, since the data we are working with are ranks rather than continuous variables. Like many non-parametric tests, the sign test has fewer necessary conditions and does not require the data to be normally distributed. A simple binomial test can be used. A binomial test works much the same way that the binomial probability function does. A binomial probability function calculates the chance that there will be k successes in n trials. For example, it can tell you the chance of getting 2 heads in 10 coin flips. To use the binomial, each trial must have only two possible outcomes: heads or tails, success or failure, infected or not infected, ale or bad beer, and so on.
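As a small worked example, the coin-flip case can be computed directly in Python. The function name is just illustrative; the formula is the standard binomial probability, C(n, k) * p^k * (1 - p)^(n - k).

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Chance of exactly 2 heads in 10 fair coin flips: 45/1024, about 0.0439.
print(round(binomial_pmf(2, 10, 0.5), 4))
```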

So how can we apply the principles of binomial probability to our data? That is the purpose of the Sign column explained earlier. All Differences > 0 were labelled as successes and assigned a value of 1. Running the binomial test in R is simple: binom.test(14,27), where 14 is the number of successes and 27 is the total number of trials (29 trials minus the 2 trials where Difference = 0). I typed binom.test(sum(Sign),length(Sign)-2), which will make sense as you familiarize yourself with R. The result:

Exact binomial test
data: sum(Sign) and length(Sign) - 2
number of successes = 14, number of trials = 27, p-value = 1
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval: 0.3194965 0.7133275
sample estimates: probability of success 0.5185185

The observed share of successes is roughly 51.9%, about as close to an even split as 27 trials allow, which is why the p-value is 1. There is no reason to reject the null hypothesis. Once again, the data does not support the claim that defense has been more important to past champions than offense. How many successes would we need to get a p-value less than .05? Do your best, R-san. The command is qbinom(.975,27,.5), which returns a value of 19, the 97.5th percentile of the number of successes under the null hypothesis. The count has to exceed that percentile: 19 successes give a two-sided p-value of about .052, so 20 successes would be needed for a statistically significant difference between Offensive and Defensive ranks. For the heck of it here is a box plot and summary of the data.
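The exact two-sided p-value can also be checked by hand in Python. This is a sketch of the usual definition (add up every outcome that is no more likely than the one observed), with illustrative function names; it is not a call into R. It shows that 14 of 27 gives a p-value of 1, that 19 successes land just above .05 (about .052), and that 20 is the first count that falls below it.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float = 0.5) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def two_sided_p(successes: int, n: int, p: float = 0.5) -> float:
    """Sum the probabilities of every outcome no more likely than the observed one."""
    observed = binomial_pmf(successes, n, p)
    tolerance = 1 + 1e-7  # guards against floating-point ties
    total = sum(
        binomial_pmf(k, n, p)
        for k in range(n + 1)
        if binomial_pmf(k, n, p) <= observed * tolerance
    )
    return min(1.0, total)

n = 27
print(two_sided_p(14, n))  # the observed 14 successes
# Smallest number of successes above the midpoint with a p-value under .05.
smallest = min(k for k in range(14, n + 1) if two_sided_p(k, n) < 0.05)
print(smallest, round(two_sided_p(19, n), 4), round(two_sided_p(20, n), 4))
```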

Sunday, June 29, 2008

Late Goals in Euro 2008

While watching Spain defeat Russia in the semi-finals of the Euro 2008 soccer tournament, a stat flashed on the screen: 23 out of the 79 goals in the tournament were scored after the 75th minute. Disregarding overtime and stoppage time, the 75th to 90th minute is 1/6 of the game. So one would presume that 1/6 (roughly 17%) of the goals would occur in that interval. But in fact 23/79, roughly 29%, of the goals were scored then. Is this statistically significant?

Let p^ = the sample proportion (23/79) and p(o) = the expected population proportion (1/6). Let α = .05 be the threshold: if the p-value is < .05, we reject the null hypothesis. The null hypothesis is that the true share of late goals equals p(o). For this let's use a one-proportion z-test.

At this point in the tournament there had been 29 games, but the proportion is a share of goals, so the sample size is the number of goals: let n = 79. (I am unsure how to type hats or subscripts, so p^ stands for p-hat and p(o) for p-nought.) Plugging the values into the formula z = (p^ - p(o)) / sqrt(p(o)(1 - p(o))/n) gives a z-score of about 2.97, making the one-sided p-value about .0015, well below .05, so we reject the null hypothesis.
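The z-test can be sketched in Python with the standard library. This sketch takes n to be the 79 goals, since the proportion is a share of goals; that choice is an assumption of the sketch, and it gives a larger z-score than using the number of games would. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z(successes: int, n: int, p0: float) -> tuple[float, float]:
    """One-proportion z-test; returns the z-score and the one-sided p-value."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 1 - NormalDist().cdf(z)  # one-sided: more late goals than expected
    return z, p_value

# 23 late goals out of 79, against an expected share of 1/6.
z_goals, p_goals = one_proportion_z(23, 79, 1 / 6)
print(round(z_goals, 2), round(p_goals, 4))
```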

Does it make sense that such a high percentage of goals would be scored in the final 15 minutes? It brings to mind an analysis I read about the US presidential elections. The author asserts that the trailing candidate should pursue the "pull the goalie" strategy used in hockey. The trailing hockey team is so desperate to score that the coach pulls the goalie and puts a better goal scorer into the game. By pulling the goalie the team expects to have more chances to score, but it will certainly be easier to score upon. The trailing team wants a strategy that increases the overall variance at the expense of the optimal strategy. Over the long term this high-variance strategy is not as effective as the optimal strategy, but the trailing team is not playing for the long term. It is playing for the short term. The trailing team will use a high-variance strategy, resulting in both it and the opposition scoring more goals.

This was evident today in the Euro 2008 final. Germany pursued the high-variance strategy in the final 15 minutes at the expense of the optimal strategy. The German team had more chances to score, but at the same time allowed the Spanish side some great opportunities.

Friday, June 20, 2008

When to Foul the Shooter

The conventional basketball wisdom is to foul the shooter rather than give up an easy lay-up. If a shooter has an easy shot, say a shot he makes 95% of the time, it is better to foul him and let him attempt two free throws than to let him take the easy shot. As long as he does not make over 95% of his free throws (which almost nobody does), the defensive team will allow fewer expected points. If the offensive player only makes 50% of his free throws, fouling him saves .9 expected points: 2*.95 - (1*.5+1*.5) = .9. Fouling the league average shooter, who makes 75.2% of his free throws, saves about .4 points. Fouling has another cost that is not accounted for in that math. Once a team has committed five fouls, the other team goes to the free throw line for every subsequent defensive or loose ball foul, regardless of whether the player was in the act of shooting.
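The expected-points arithmetic can be written as a one-line function. This is only a sketch of the calculation in the paragraph above, with an illustrative function name, and it ignores the bonus effect entirely.

```python
def points_saved_by_fouling(easy_shot_pct: float, free_throw_pct: float) -> float:
    """Expected points of the easy two-point shot minus those of two free throws."""
    return 2 * easy_shot_pct - 2 * free_throw_pct

print(round(points_saved_by_fouling(0.95, 0.50), 3))   # 50% free throw shooter: 0.9
print(round(points_saved_by_fouling(0.95, 0.752), 3))  # league average shooter: 0.396
```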

During the 2006-2007 NBA season, teams scored about 1.1 points per possession and made 75.2% of free throws. If the offensive team is in the bonus, a non-shooting foul results in the defense allowing .43 more expected points than it does on an average possession: 2*.752 - 1.07 ≈ .43, using the less rounded points-per-possession figure of roughly 1.07. It is better to let the possession elapse without committing a non-shooting foul. The defensive team would rather keep the offense out of the bonus. Should you still foul the shooter on an easy shot?

Before estimating the additional penalty that committing a foul carries, here are some statistics:

Free throw % and the other stats needed to calculate Points Per Possession and Mean Fouls were found here. Points Per Possession was found by taking the league average Points Per Game and dividing by the league average 'Pace' statistic: PPG/Pace = PPP. I calculated Mean Fouls by taking the league average minutes per season and dividing by 5 to get minutes per team, then dividing the league average fouls per team by that figure to get team fouls per minute. I multiplied the result by 12 (minutes in a quarter) to get fouls per quarter: (Fouls per Team)/(Minutes/5) * 12 = FPQ.

I will take an extreme case to see if there is an instance where it would be better to let the offensive player have an easy shot rather than foul. Suppose that the offensive player has an easy shot at the start of the quarter. Should he be fouled? The Poisson distribution can be used to show how often a team reaches a certain number of fouls.

In this case λ = 5.51 (average fouls per quarter) and k = fouls in a quarter. On the first line of the table below is the random variable k, ranging from 0 to 14. The percentage below it is the Poisson probability of exactly that number of fouls happening in a quarter. For example, the most likely outcome is a team committing 5 fouls in a quarter, which happens 17.1% of the time. Below that number is 'Points Lost.' Points Lost is the number of points the defense loses by fouling a player and letting him shoot free throws. As shown above, by fouling, the defense allows .43 more points than it would if it did not foul. Multiplying that by the Poisson probability gives the points lost. For example, when a team fouls 5 times in a quarter, on the 5th foul the opposing team goes to the free throw line and gets .43 more points. This happens in 17.1% of quarters. The penalty increases as the team fouls more often. Fouling 6 times in a quarter sends the opposing team to the free throw line on two occasions, allowing the offensive team to get .86 more points per quarter. The formula is .43*Poisson%*(k-4) = Points Lost, with 4 being used because every time a team commits k > 4 fouls the opposing team goes to the free throw line k-4 times. Total Points Lost is the sum of expected Points Lost for each value of k.
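The Points Lost table can be rebuilt with a short Python sketch. It uses the figures given in the text (λ = 5.51, .43 extra points per trip to the line, k up to 14); the function and parameter names are illustrative, and `free_fouls` is the number of fouls a team can commit before the next one costs free throws.

```python
from math import exp, factorial

LAMBDA = 5.51        # average team fouls per quarter, from the text
EXTRA_POINTS = 0.43  # extra expected points per bonus trip, from the text

def poisson_pmf(k: int, lam: float = LAMBDA) -> float:
    """Poisson probability of exactly k fouls in a quarter."""
    return exp(-lam) * lam**k / factorial(k)

def total_points_lost(free_fouls: int = 4, max_fouls: int = 14) -> float:
    """Sum of .43 * Poisson% * (k - free_fouls) over every k above free_fouls."""
    return sum(
        EXTRA_POINTS * poisson_pmf(k) * (k - free_fouls)
        for k in range(free_fouls + 1, max_fouls + 1)
    )

print(round(poisson_pmf(5), 3))       # the 17.1% quoted for exactly 5 fouls
print(round(total_points_lost(), 3))  # Total Points Lost in the normal case
```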

To return to the extreme case, fouling at the beginning of the quarter means the offensive team needs to draw only 4 more fouls (rather than 5) in order to shoot free throws. Compared to the previous example, this increases the Total Points Lost by the defense. Committing 4 fouls now results in Points Lost, and the penalty for committing more fouls increases. For this case, Points Lost = .43*Poisson%*(k-3).

If Total Points Lost (Early Foul) exceeds Total Points Lost (normal case) by more than .4 points, the amount the defense saves by fouling the league average player on an easy shot, then the defense would be better off not fouling. The table below shows the comparison:

The difference in Total Points Lost is .343 which is < .4. Although the defense's Total Points Lost increases with the early foul it does not increase enough to justify letting a player have an easy shot. Thus the conventional wisdom is reaffirmed for the league average player; foul him rather than let him have an easy shot. If Total Points Lost had been more than .4, the next step would have been to calculate a more accurate Total Points Lost by accounting for offensive fouls (which do not result in free throws) and shooting fouls (which always result in free throws). However that is not needed and the reader will never learn that 9.8% of all fouls committed during the 2006-2007 NBA season were offensive fouls.
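The comparison can be reproduced with a self-contained Python sketch under the same assumptions as the text: Poisson fouls with λ = 5.51, .43 extra points per bonus trip, k capped at 14, and an easy shot made 95% of the time by a 75.2% free throw shooter. The names are illustrative.

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float = 5.51) -> float:
    return exp(-lam) * lam**k / factorial(k)

def total_points_lost(free_fouls: int, extra_points: float = 0.43,
                      max_fouls: int = 14) -> float:
    """Expected extra points allowed once fouls exceed free_fouls in the quarter."""
    return sum(
        extra_points * poisson_pmf(k) * (k - free_fouls)
        for k in range(free_fouls + 1, max_fouls + 1)
    )

normal = total_points_lost(free_fouls=4)      # no early foul: (k - 4)
early_foul = total_points_lost(free_fouls=3)  # early foul already used: (k - 3)
extra_cost = early_foul - normal
saved_by_fouling = 2 * 0.95 - 2 * 0.752       # league average shooter

print(round(extra_cost, 3), round(saved_by_fouling, 3))
print("foul" if saved_by_fouling > extra_cost else "do not foul")
```

Raising the free throw percentage or lowering the easy-shot percentage in the last lines is a quick way to look for the special cases where the early foul stops paying off.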

While all I did was reaffirm the conventional wisdom, I believe the data suggests that there may be special cases when the defense should not commit the early foul. If the offensive player makes easy shots somewhat less than 95% of the time and makes free throws somewhat more than 75% of the time, it may be worth not committing an early foul on him. NBA teams with access to more exact stats would be advised, especially during seven-game playoff series, to calculate some exact figures for specific players to see whom not to foul and when.

Costaguanan

Friday, June 10, 2011

NBA Amnesty Clause: Worst Contracts

Sunday, June 5, 2011

NBA Amnesty Clause: Southeast Division

Sunday, August 10, 2008

Some Numbers from the 2008 Olympics

Saturday, July 26, 2008

Hoops Analyst and R

Sunday, June 29, 2008

Late Goals in Euro 2008

Friday, June 20, 2008

When to Foul the Shooter


martin.decoud@gmail.com