Sunday, April 26, 2009

Yet Another Way to Look at Free Throw Percentage

John Branch wrote an article in the New York Times noting that in the past 50 years, Free Throw Percentage (FT%) has not improved in the NBA. The Freakonomics blog posted a follow-up to this article suggesting that free throw percentage among the best Free Throw (FT) shooters has improved during that period. Unfortunately, I read this post and wasted the best weather of the year fooling around in R rather than going outside and getting sunburnt.

Ashley Smart, who supplied the data and analysis the blog post is based on, seems intelligent and probably knows more about stats than I do. Even so, the data behind her analysis is deeply flawed. Her biggest mistake was getting the data from the N.B.A. Encyclopedia. The place for NBA data is Basketball Reference, where you can download several spreadsheets of NBA data stretching back to 1946. As several commenters noted, Ms. Smart's other mistake is not controlling for the number of players.

In more detail, the two major flaws in Ms. Smart's analysis are:
  1. The NBA Encyclopedia is arbitrary and inconsistent. The data collectors changed their standards on who qualified as a top 20 FT shooter 14 times between 1950 and 2007. The biggest supposed gain in FT% in Ms. Smart's study occurs between the 1972 and 1973 seasons, when the qualification standard drops from 350 Free Throw Attempts (FTA) to 160 FTA.
  2. The study does not account for the number of players in the league. Of course the top 20 players in the league now are going to have a better FT% than those in 1950. In 1950 there were 135 players; in 2007 there were 595. If I took a group of 100 random people and made them shoot a bunch of Free Throws, and then took a group of 500 people and made them do the same, which group do you think would have the better average among its top 20 Free Throw shooters?
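
The thought experiment in point 2 is easy to simulate. The sketch below is only an illustration: the talent distribution (true FT% centered on 75% with a spread of 7 points), the number of attempts per shooter, and all function names are assumptions made up for the example, not anything taken from the NBA data.

```python
import random
import statistics


def top20_mean_ft_pct(n_players, attempts=100, seed=0):
    """Mean observed FT% of the 20 best shooters in one simulated group."""
    rng = random.Random(seed)
    observed = []
    for _ in range(n_players):
        # Hypothetical talent distribution, clamped to a valid probability.
        true_pct = min(max(rng.gauss(0.75, 0.07), 0.0), 1.0)
        made = sum(rng.random() < true_pct for _ in range(attempts))
        observed.append(made / attempts)
    top20 = sorted(observed, reverse=True)[:20]
    return statistics.mean(top20)


def average_over_trials(n_players, trials=30):
    """Average the top-20 mean over several independently seeded groups."""
    return statistics.mean(
        top20_mean_ft_pct(n_players, seed=s) for s in range(trials)
    )


print(f"top 20 out of 100 shooters: {average_over_trials(100):.3f}")
print(f"top 20 out of 500 shooters: {average_over_trials(500):.3f}")
```

Nobody in the larger group is a better shooter on average, yet its top 20 come out ahead simply because there are more people to pick from.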
I corrected the first problem by getting my data from Basketball Reference. Anyone who had 100 FTA qualified for my analysis. Let us take a look at the Top Twenty players in FT% versus the number of players in the league.
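
For readers who want to reproduce the per-season numbers, here is a minimal sketch of the qualification rule and the top-twenty average. The analysis in this post was done in R; this is Python for illustration only. The tuple layout, the function name, and the reading of "had 100 FTA" as "at least 100" are assumptions, and the sample rows at the bottom are made up.

```python
from collections import defaultdict
from statistics import mean

MIN_FTA = 100  # qualification threshold stated in the post


def season_summary(rows, top_n=20, min_fta=MIN_FTA):
    """Summarize each season from (season, player, ft_made, ft_attempted) rows.

    Returns {season: (players_in_league, mean FT% of the top_n qualifiers)}.
    Seasons with no qualifying player get None for the FT% part.
    """
    players = defaultdict(int)
    qualified = defaultdict(list)
    for season, _player, ft_made, ft_attempted in rows:
        players[season] += 1
        if ft_attempted >= min_fta:  # assumption: "100 FTA" means >= 100
            qualified[season].append(ft_made / ft_attempted)

    summary = {}
    for season, n_players in players.items():
        best = sorted(qualified[season], reverse=True)[:top_n]
        summary[season] = (n_players, mean(best) if best else None)
    return summary


# Made-up rows, just to show the call shape (not real NBA data).
sample = [
    (1950, "Player A", 90, 100),
    (1950, "Player B", 10, 50),   # below the threshold, so not a qualifier
    (1950, "Player C", 80, 100),
]
print(season_summary(sample, top_n=2))
```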

And a look at the number of players by year:

For what it's worth, fitted linear models give an R^2 of .82 for Number of Players vs. Top Twenty FT% and an R^2 of .76 for Year vs. Top Twenty FT%. Both models have too much serial correlation to be taken seriously, and Year and Number of Players are themselves correlated at .93.

A hopefully correct method for examining the best FT shooters is to look at the Top Tenth Percentile of FT% for each year:
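
One way to compute that per-season cutoff is sketched below. It assumes "Top Tenth Percentile" means the 90th-percentile FT% among qualifying players, and it uses linear interpolation between the two nearest ranked values; both choices are assumptions for illustration, and the function name is hypothetical.

```python
def percentile(values, q):
    """q-th percentile (0-100), interpolating linearly between ranked values."""
    xs = sorted(values)
    if not xs:
        raise ValueError("need at least one value")
    pos = (len(xs) - 1) * q / 100   # fractional index into the sorted list
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


# Made-up FT% values for one season's qualifiers (not real data).
season_pcts = [0.60, 0.70, 0.80, 0.90, 1.00]
print(percentile(season_pcts, 90))
```

Unlike a fixed top 20, this cutoff scales with the size of the league.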
There appears to be a slight increase in the FT% of the top tenth percentile, but once again there is serial correlation. A Durbin-Watson test confirms this: the D-W statistic is .978 (anything below 1.38 for this sample size is suspicious), with a reported p-value of 0 and an estimated rho of .51. I did a basic transformation of the Year (X) and Top Tenth Percentile FT% (Y) variables:
Ytransformed_i = Y_{i+1} - rho * Y_i
Xtransformed_i = X_{i+1} - rho * X_i
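
The Durbin-Watson statistic, the rho estimate, and the transformation above can be sketched in a few lines. This is not the R code behind the numbers in this post: the function names are hypothetical, rho is estimated here as the lag-1 regression of the residuals on themselves (an assumption about the method), and the series at the bottom is synthetic stand-in data with a known trend and autocorrelated noise.

```python
import random


def ols(x, y):
    """Simple least squares of y on x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def residuals(x, y):
    b0, b1 = ols(x, y)
    return [b - (b0 + b1 * a) for a, b in zip(x, y)]


def durbin_watson(e):
    """Sum of squared successive differences over sum of squared residuals."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return num / sum(v * v for v in e)


def estimate_rho(e):
    """Lag-1 autocorrelation: regress e_t on e_{t-1} through the origin."""
    num = sum(e[t] * e[t - 1] for t in range(1, len(e)))
    den = sum(e[t - 1] ** 2 for t in range(1, len(e)))
    return num / den


def lag_transform(series, rho):
    """Apply z_i = s_{i+1} - rho * s_i; the result is one element shorter."""
    return [series[i + 1] - rho * series[i] for i in range(len(series) - 1)]


# Synthetic stand-in data (NOT the NBA series): trend plus AR(1) noise.
rng = random.Random(1)
years = list(range(1950, 2007))
noise, prev = [], 0.0
for _ in years:
    prev = 0.5 * prev + rng.gauss(0.0, 0.005)
    noise.append(prev)
ft = [0.85 + 0.0004 * (yr - 1950) + n for yr, n in zip(years, noise)]

e = residuals(years, ft)
rho = estimate_rho(e)
x_t, y_t = lag_transform(years, rho), lag_transform(ft, rho)
print("D-W before transform:", round(durbin_watson(e), 3))
print("estimated rho:       ", round(rho, 3))
print("D-W after transform: ", round(durbin_watson(residuals(x_t, y_t)), 3))
```

A D-W statistic near 2 indicates little lag-1 autocorrelation, and D-W is roughly 2 * (1 - rho), which is why a statistic of .978 goes with a rho of about .51.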

I then fit a new model of Xtransformed versus Ytransformed. The D-W statistic was 1.89, with a p-value of .58 and an estimated rho of .05, so the transformation removed the serial correlation. A summary of this model:

With such a poorly fitting model (R^2 of .1056), the data does not really explain anything. If I were going to make an inference, I would point to the slope estimate of .0004034 (b1). The regression coefficient for X in the transformed model remains the same when the model is transformed back to Y = b1*X + b0. So over the 57 years of data, 57*b1 = .023, or about a 2.3 percentage point increase in the Top Tenth Percentile FT%. This is really stretching the model, but there may be a slight increase (about 2.3 percentage points over 57 years) among the best FT shooters.
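
The arithmetic, and the reason the slope carries over, can be checked with a short sketch. Substituting Y = b0 + b1*X into the transformation gives Y_{i+1} - rho*Y_i = b0*(1 - rho) + b1*(X_{i+1} - rho*X_i), so only the intercept is rescaled. The function names below are hypothetical, and the calculation assumes FT% is stored on a 0 to 1 scale. No intercept value is plugged in, since it is not quoted in the text above.

```python
B1 = 0.0004034   # slope quoted in the post, in FT% (0-1 scale) per year
N_YEARS = 57     # span of the data quoted in the post


def total_change(slope, n_years):
    """Fitted change in Y over n_years at a constant yearly slope."""
    return slope * n_years


def original_intercept(b0_transformed, rho):
    """Undo the intercept rescaling: b0_transformed = b0 * (1 - rho)."""
    return b0_transformed / (1.0 - rho)


change = total_change(B1, N_YEARS)
print(f"{change:.4f} on a 0-1 scale, about {100 * change:.1f} percentage points")
```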