Small update
I re-ran the numbers with additional data from 2012-13 and 2013-14.
The coefficients for Offense make sense, for the most part, until you reach age 41.
I'm happy that there are very few conflicting data points. On the up-slope (18-23), each year has a more positive coefficient than the preceding year, and on the down-slope (31 and after), most years have a more negative coefficient than the preceding year, the exceptions being coeff(39) > coeff(38) > coeff(37).
After removing the coefficients for 41 and over, polynomial fitting (thanks to http://www.arachnoid.com/polysolve/) leads to the following cubic:
Code:
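def age_infl_off(age):
    # Cubic fit to the offensive age coefficients (ages 41 and over excluded).
    # The parentheses keep all four terms inside the return expression.
    return (-5.1855886560913811e-001
            + 4.9112390028866172e-002 * age
            - 1.4598588208904030e-003 * age**2
            + 1.3428060693723941e-005 * age**3)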
Unfortunately the coefficients for Defense are not as 'pretty'.
They go up almost steadily until age 29, then drop steadily until age 33. After that, the coefficients are all over the place.
I've decided not to use the coefficients for ages 37 and over in the polynomial fit.
Code:
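def age_infl_def(age):
    # Cubic fit to the defensive age coefficients (ages 37 and over excluded).
    # The parentheses keep all four terms inside the return expression.
    return (-1.3905924679440346e-001
            + 1.2958074760491843e-002 * age
            - 3.5330169150904782e-004 * age**2
            + 2.9414942568037581e-006 * age**3)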
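As a quick sanity check on the two fitted curves, here is a small sketch that finds where each cubic peaks by setting its derivative to zero. The helper name peak_age and the constants OFF and DEF are mine, not part of the original code; the coefficients are copied from the two functions above. If I've done the arithmetic right, the offensive curve peaks at about 26.5 and the defensive curve at about 28.4, which fits the raw defensive coefficients rising until roughly age 29.

```python
import math

# Coefficients copied from age_infl_off / age_infl_def, lowest power first.
OFF = (-5.1855886560913811e-001, 4.9112390028866172e-002,
       -1.4598588208904030e-003, 1.3428060693723941e-005)
DEF = (-1.3905924679440346e-001, 1.2958074760491843e-002,
       -3.5330169150904782e-004, 2.9414942568037581e-006)


def peak_age(coeffs):
    """Age at which c0 + c1*x + c2*x**2 + c3*x**3 has its local maximum."""
    _, c1, c2, c3 = coeffs
    # The derivative c1 + 2*c2*x + 3*c3*x**2 is a quadratic a*x**2 + b*x + c.
    a, b, c = 3 * c3, 2 * c2, c1
    disc = b * b - 4 * a * c
    if disc < 0:
        raise ValueError("cubic has no turning point")
    root = math.sqrt(disc)
    for x in ((-b - root) / (2 * a), (-b + root) / (2 * a)):
        # A turning point is a maximum where the second derivative is negative.
        if 2 * c2 + 6 * c3 * x < 0:
            return x
    raise ValueError("cubic has no local maximum")


if __name__ == "__main__":
    print("offense peaks at age %.1f" % peak_age(OFF))
    print("defense peaks at age %.1f" % peak_age(DEF))
```

Both cubics have a positive leading coefficient, so they turn upward again at some point beyond the fitted age range; the curves should only be trusted for the ages that went into the fit.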
There are two reasons for the inconsistent coefficients at age 37 and above:
- Sample size: There simply aren't many players who play after age 37, let alone age 40. The fewer players of a given age in our sample, the harder it becomes for the regression to estimate a reasonable coefficient.
Example: Suppose only a single player had played at ages 42 and 43. For a single player it is not at all unlikely that, for random reasons, he has better +/- numbers (after adjusting for teammates) at age 43 than at 42. Since the regression has only his performance to go by for 42/43-year-olds, the coefficient for 'Age_43' would be higher than the coefficient for 'Age_42'. If more players had played at 42 and 43, chances are that most of them would have played a little worse at age 43, and the coefficients would look more reasonable.
- Survivor bias: Only players who play exceptionally well up to a very high age still get playing time at that age, and are thus in our sample. Players who were more heavily (negatively) affected by age don't remain in the league as long, so they appear in neither the matchupdata nor the sample. This skews the results.
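Both effects can be illustrated with a toy simulation. This is only a sketch on made-up numbers: the decline per year, the noise level, the starting level of 5.0 and the cut-off at 0.0 are assumptions picked for illustration, not values from the regression, and the function names are hypothetical. The first function counts how often the age-43 average comes out above the age-42 average even though everyone truly declines; the second compares the average true decline rate of all players with that of the players still good enough to be around at 38.

```python
import random


def share_of_reversed_coefficients(n_players, trials=1000, decline=0.5,
                                   noise=3.0, seed=1):
    """Fraction of trials in which the age-43 average beats the age-42
    average, although every simulated player truly gets worse by `decline`."""
    rng = random.Random(seed)
    reversed_count = 0
    for _ in range(trials):
        avg_42 = sum(rng.gauss(0.0, noise) for _ in range(n_players)) / n_players
        avg_43 = sum(rng.gauss(-decline, noise) for _ in range(n_players)) / n_players
        if avg_43 > avg_42:
            reversed_count += 1
    return reversed_count / trials


def mean_decline_of_survivors(n_players=5000, cutoff_age=38, seed=1):
    """Return (mean decline rate of all players,
               mean decline rate of players still above 0.0 at cutoff_age)."""
    rng = random.Random(seed)
    all_rates, survivor_rates = [], []
    for _ in range(n_players):
        rate = rng.uniform(0.2, 1.0)  # assumed yearly decline after age 30
        level = 5.0 - rate * (cutoff_age - 30) + rng.gauss(0.0, 1.0)
        all_rates.append(rate)
        if level > 0.0:  # assumed bar for still getting minutes
            survivor_rates.append(rate)
    return (sum(all_rates) / len(all_rates),
            sum(survivor_rates) / len(survivor_rates))


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, "players:", share_of_reversed_coefficients(n))
    print("all vs. survivors:", mean_decline_of_survivors())
```

With a single player the ordering flips close to half the time, and it flips much less often as the number of players grows. In the second part, the players who survive to 38 show a smaller average decline than the full population, which is the direction in which survivor bias would distort the high-age coefficients.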