
### RAPM aging curve

Posted: **Mon Jul 15, 2013 11:15 pm**

by **J.E.**

Hey

Here's the analysis of the influence of age on player performance that I originally submitted to SSAC '13 but did not present. It uses matchup data from '00 to Dec. 2012.

What I did was treat player age as an additional player on the floor, then compute a coefficient for each age with RAPM.

So, essentially, instead of the matchup files looking like

LeBron Wade Bosh .. .. | Anthony Smith Chandler .. .. | 1 0

they'd look like

LeBron AGE_28 Wade AGE_31 Bosh AGE_28 .. .. | Anthony AGE_28 Smith AGE_27 Chandler AGE_30 .. .. | 1 0
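As a rough illustration of the row transformation above, here is a minimal sketch. The function names and the `AGES` lookup are hypothetical, and the idea that two players of the same age simply count twice in the same age column is an assumption, not something stated in the post.

```python
from collections import Counter

# Ages copied from the example row above; the lookup table itself is hypothetical.
AGES = {"LeBron": 28, "Wade": 31, "Bosh": 28}

def add_age_tokens(lineup, ages):
    """Follow every player with an AGE_<n> pseudo-player token."""
    tokens = []
    for player in lineup:
        tokens.append(player)
        tokens.append(f"AGE_{ages[player]}")
    return tokens

def to_sparse_row(tokens):
    """Count how often each token appears on one side of the matchup.

    Assumption: two 28-year-olds give the AGE_28 column a value of 2.
    """
    return dict(Counter(tokens))

tokens = add_age_tokens(["LeBron", "Wade", "Bosh"], AGES)
row = to_sparse_row(tokens)
```

With this layout the regression sees each age as one more "player" whose coefficient is shared by everyone of that age.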

Survivor bias is a problem here. These numbers don't represent "expected performance", but instead "expected performance IF the coach actually decides to give that player minutes"

The individual lines for offense and defense are jerkier.

It also has to be noted that the y-axis actually represents influence per 200 possessions, not 100.

That is if one, like me, assumes ~192 possessions total in a game.

### Re: RAPM aging curve

Posted: **Tue Jul 16, 2013 7:49 am**

by **AcrossTheCourt**

I'll assume the low value for age 19 is because of the lack of priors, right?

The polynomial fit says peak NBA age is 28, which is close to the consensus (at least that I've seen.)

I assume you used integer ages instead of decimal (incorporating days.) Ages 19 and older would then all be dummy variables. Did you think about making age a continuous variable? If you want to model a polynomial of that shape, you can use something easy like beta1*age-beta2*age^2+beta3.

Do offense and defense peak at different ages?

### Re: RAPM aging curve

Posted: **Tue Jul 16, 2013 2:29 pm**

by **bbstats**

Jerry,

This is a pretty cool concept. Is everything still done with "Bayesian" updating in this iteration of RAPM? If so, I would guess that this graph could be used to represent year-to-year deltas.

If not, the deltas would be something more like Y-value(year1) minus Y-value(year2).

### Re: RAPM aging curve

Posted: **Tue Jul 16, 2013 8:32 pm**

by **J.E.**

In this specific analysis there are no priors involved except for the standard zero prior everyone gets in standard RAPM.

It's one large regression over multiple years of data, with coefficients for age added in

### Re: RAPM aging curve

Posted: **Thu Aug 15, 2013 2:10 pm**

by **schtevie**

Just wanted to add a few observations (mostly questions) about this very interesting result.

(1) Does it make sense that the average (fitted) performance of players is positive between about ages of 20 and 35? Isn't there some adding-up constraint that makes overall average performance equal zero? The vast majority of possessions are played by that cohort, and shouldn't that be balanced out by the negative performances of the young and old? What am I missing?

(2) Do age-adjustments really matter "much"? I get that including them improves the fit of +/- regressions, but how much of an adjustment does it impart to an average player rating?

Consider the case of Kevin Durant. He is a young player, one whose ratings are diminished by not taking into account his age-related improvement. As I read Jeremias' plot, the average 24 year old was expected to be about 0.3 points per 100 possessions better than the year before. How much might his age-adjusted (x)RAPM improve by incorporating this fact?

Here's my stab at an answer. KD played 42% of his 2012-13 minutes as part of the following line-up: Durant (age 24), Ibaka (23), Perkins (28), Sefolosha (29), and Westbrook (24). And this is a pretty good representation of the age structure of the remaining 58%. And what you get when you take the average of the expected age-related changes of all these players and subtract these from those of the individuals is, well, not much.

KD's expected 24 year old improvement of about 0.3 is shared in part with Perkins and Sefolosha, but he in turn is subsidized to smaller degree by Ibaka (23 year olds having a slightly larger expected improvement than 24 year olds). In the instance, given how I eyeballed the plot, I get that KD's rating, per this line-up, is about 0.06 lower than a non-age-adjusted would be. And this would seem to be a pretty good representative figure for the rest of his minutes.

Now, such cross-subsidies will vary across lineups depending upon the age structure. Pity the young player who only plays with teammates significantly over 29 years of age (and whose average competition is also over the hill). He would be more screwed. But such results should be very rare, and the upper bound is still not "large".

Is my intuition incorrect on this point? Is it possible for age-adjusted +/- measures to differ more from their unadjusted counterparts, and if so, why?

(3) Coaches are the real beneficiaries of the failure to age-adjust +/- regressions.

The average age in the NBA is about 27, and according to Jeremias' results (assuming I am interpreting them correctly) such players are expected to improve by about 0.1 per 100 possessions. Perhaps I am not thinking about this correctly, but when you then throw coaches into an age-unadjusted +/- regression, all expected age-related player improvements will be assigned to coaches, implying that on average coaches' ratings are bumped up by about 0.5 (five players on the floor at about 0.1 each).

Now, J.E.'s coaches regression is no longer available, but as I recall the results, the average rating of a coach was already slightly negative (about -0.5 or so). This argument implies that the "true" average coaching contribution is lower still. (This average negative result, by the way, is consistent with other research I have seen.)

And I suppose in this context I should make particular note of Gregg Popovich, because I do recall having previously commented on his slightly negative rating in the aforementioned regression. Gregg Popovich, of course, is well known (or should be) for having consistently coached above average age rosters. I am too lazy to check at the moment, but I think they have averaged slightly above 29 years. As such, in line with the argument above, GP might not in fact be a "net negative" coach. Rather, adjusting for his slightly past-prime players, he could be expected to have an approximately zero rating, which would make him a bit above average.

### Re: RAPM aging curve

Posted: **Thu Aug 15, 2013 7:31 pm**

by **J.E.**

schtevie, for the aging curve I artificially adjusted the chart so the data points (somewhat) run from -2 to +2. It's supposed to be interpreted so that a player who is a +X (in xRAPM or whatever) now is projected to perform at

+X+(y_coordinate of player age next season - y_coordinate of player age last season)

(and take the whole thing *0.85 because of regression to the mean)
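The projection rule above can be written out as a small sketch. `project_next_season` is a hypothetical name, and `curve` stands for any function that returns the chart's y-coordinate for a given age; the 0.85 factor is the regression-to-the-mean multiplier mentioned above.

```python
def project_next_season(current_rating, curve, age_last, age_next, regression=0.85):
    """Projected rating = 0.85 * (X + (curve(age_next) - curve(age_last)))."""
    age_delta = curve(age_next) - curve(age_last)
    return regression * (current_rating + age_delta)

# Hypothetical straight-line curve, used only to show the arithmetic:
# a +2.0 player moving from age 24 to 25 gains 0.1 before regression.
def toy_curve(age):
    return 0.1 * age

projected = project_next_season(2.0, toy_curve, age_last=24, age_next=25)
```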

schtevie wrote:Now, J.E.'s coaches regression is no longer available, but as I recall the results, the average rating of a coach was already slightly negative (about -0.5 or so). This argument implies that the "true" average coaching contribution is lower still. (This average negative result, by the way, is consistent with other research I have seen.)

I think the coach ratings back then were just not centered correctly, so the average coach rating should probably not have been negative

### Re: RAPM aging curve

Posted: **Sun Feb 09, 2014 1:48 pm**

by **J.E.**

Small update

Re-ran the numbers with more data from 2012-13 and 2013-14

The coefficients for **Offense** make sense, for the most part, until you reach age 41

I'm happy with the fact there are very few conflicting data points. On the up-slope (18-23) each year has a more positive coefficient than the preceding year, and on the down-slope (31 and after) most years have a more negative coefficient than their preceding year, with the exceptions of coeff(39)>coeff(38)>coeff(37)

After removing the coefficients for 41 and over, polynomial fitting (thanks to http://www.arachnoid.com/polysolve/) leads to this fit:

```
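def age_infl_off(age):
    # Cubic fit to the offensive age coefficients (ages 41 and over excluded).
    # The parentheses keep all four terms inside the return statement.
    return (-5.1855886560913811e-001 * pow(age,0)
            + 4.9112390028866172e-002 * pow(age,1)
            + -1.4598588208904030e-003 * pow(age,2)
            + 1.3428060693723941e-005 * pow(age,3))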
```

Unfortunately the coefficients for **Defense** are not as 'pretty'

Goes up almost steadily until age 29, then steadily drops until age 33. After that, the coefficients are all over the place.

I've decided to not use coefficients for ages 37 and over for the polynomial fit


```
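def age_infl_def(age):
    # Cubic fit to the defensive age coefficients (ages 37 and over excluded).
    # The parentheses keep all four terms inside the return statement.
    return (-1.3905924679440346e-001 * pow(age,0)
            + 1.2958074760491843e-002 * pow(age,1)
            + -3.5330169150904782e-004 * pow(age,2)
            + 2.9414942568037581e-006 * pow(age,3))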
```
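To see where each fitted curve tops out, one option is a plain grid search over age. This is only a sketch: the coefficients are copied from the two cubic fits quoted above, while `peak_age`, the 18 to 36 search range and the 0.1-year step are assumptions made for illustration.

```python
def age_infl_off(age):
    # Offensive cubic, coefficients as quoted in this post.
    return (-5.1855886560913811e-001
            + 4.9112390028866172e-002 * age
            - 1.4598588208904030e-003 * age ** 2
            + 1.3428060693723941e-005 * age ** 3)

def age_infl_def(age):
    # Defensive cubic, coefficients as quoted in this post.
    return (-1.3905924679440346e-001
            + 1.2958074760491843e-002 * age
            - 3.5330169150904782e-004 * age ** 2
            + 2.9414942568037581e-006 * age ** 3)

def peak_age(curve, lo=18.0, hi=36.0, step=0.1):
    """Return the grid age at which the curve is largest (hypothetical helper)."""
    steps = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(steps + 1)]
    return max(grid, key=curve)

peak_off = peak_age(age_infl_off)
peak_def = peak_age(age_infl_def)
```

On these coefficients both maxima fall in the mid-to-late twenties, with the defensive one somewhat later than the offensive one. The search stops at 36 because the defensive fit was not made on older ages.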

There are two reasons for the inconsistent coefficients at age 37 and above:

- Sample size: There simply aren't many players that play after age 37, let alone age 40. The smaller the number of players of that age group in our sample, the harder it becomes for the regression to estimate a reasonable coefficient.

Example: Suppose we had only one single player that played at age 42 and 43. For one single player it is not entirely unlikely that he, for random reasons, has better +/- numbers (after adjusting for teammates) at age 43 compared to 42. Since the regression has only his performance to go by for 42/43-year-olds, the coefficient for 'Age_43' would be higher than the coefficient for 'Age_42'. If we had more players that had played at 42 and 43, chances are that most of them played a little worse at age 43 and the coefficients would look more reasonable

- Survivor bias: Only those players that play exceptionally well up to a very high age do get some playing time at high age, and are thus in our sample. Players which were more heavily (negatively) affected by age don't remain in the league as long, are thus not in the matchupdata and not in the sample. This skews results

### Re: RAPM aging curve

Posted: **Mon Feb 10, 2014 1:21 am**

by **Mike G**

Hey, this is great. Regarding aging and Offense:

J.E. wrote:On the up-slope (18-23) each year has a more positive coefficient than the preceding year, and on the down-slope (31 and after) most years have a more negative coefficient than their preceding year, ..

The curve is intuitively appealing, but it looks like a straight-line increase up to age 23, and a straight dropoff by year after 30. Basically a plateau from 23-30.

The equivalent defensive plateau age range looks like 25-32 -- just about 2 years later/older than for offense. Intuitively about right.

A single integer for age in a given season is rather arbitrary. A lot of smoothing could be had by assigning, for example to age 25: the sum of 1/4 of the value of the age 24 group, 1/2 of age 25, and 1/4 of age 26.

Oftentimes a player goes 1/4 or more of a season after a birthday.
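A minimal sketch of that 1/4, 1/2, 1/4 smoothing, assuming the per-age coefficients sit in a dict keyed by integer age. The function name is hypothetical, and leaving the youngest and oldest ages unsmoothed is an assumption about how to treat the endpoints.

```python
def smooth_quarter_half_quarter(coef_by_age):
    """Blend each age's coefficient with its two neighbours (weights 1/4, 1/2, 1/4)."""
    smoothed = {}
    for age, value in coef_by_age.items():
        younger = coef_by_age.get(age - 1)
        older = coef_by_age.get(age + 1)
        if younger is None or older is None:
            # Assumption: ages without both neighbours are left as they are.
            smoothed[age] = value
        else:
            smoothed[age] = 0.25 * younger + 0.5 * value + 0.25 * older
    return smoothed

# Made-up coefficients, only to show the arithmetic for age 25.
example = smooth_quarter_half_quarter({24: 1.0, 25: 2.0, 26: 5.0})
```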

### Re: RAPM aging curve

Posted: **Mon Feb 10, 2014 6:08 pm**

by **DSMok1**

J.E. wrote:
- Sample size: There simply aren't many players that play after age 37, let alone age 40. The smaller the number of players of that age group in our sample, the harder it becomes for the regression to estimate a reasonable coefficient.

Example: Suppose we had only one single player that played at age 42 and 43. For one single player it is not entirely unlikely that he, for random reasons, has better +/- numbers (after adjusting for teammates) at age 43 compared to 42. Since the regression has only his performance to go by for 42/43-year-olds, the coefficient for 'Age_43' would be higher than the coefficient for 'Age_42'. If we had more players that had played at 42 and 43, chances are that most of them played a little worse at age 43 and the coefficients would look more reasonable

- Survivor bias: Only those players that play exceptionally well up to a very high age do get some playing time at high age, and are thus in our sample. Players which were more heavily (negatively) affected by age don't remain in the league as long, are thus not in the matchupdata and not in the sample. This skews results

The two issues are related. A player on the down side of his career will only get to play in year Y+1 if he was good (luckily good) in Y. So the observed deltas will be biased large.

There are several possible solutions discussed at length in baseball research, where the randomness effect is far more pronounced.

Another question: are your polynomial fit curves weighted by the number of observations of each delta, or just based on the points shown with no weighting?

Great work once again, J.E.!

### Re: RAPM aging curve

Posted: **Mon Feb 10, 2014 10:05 pm**

by **J.E.**

Mike G wrote:A lot of smoothing could be had by assigning, for example to age 25: the sum of 1/4 of the value of the age 24 group, 1/2 of age 25, and 1/4 of age 26.

Maybe I'm misunderstanding, but doesn't the polynomial fit provide the smoothing we need? If one of the coefficients seems "out of line", like in this case the one for 'age 39' on offense, the fitted curve provides a more reasonable number

DSMok1 wrote:Another question: are your polynomial fit curves weighted by the number of observations of each delta, or just based on the points shown with no weighting?

Not weighted. Good point though. On my to-do list

DSMok1 wrote:There are several possible solutions discussed at length in baseball research, where the randomness effect is far more pronounced.

Do you have links to papers or blog-posts that deal with this issue?

### Re: RAPM aging curve

Posted: **Tue Feb 11, 2014 5:13 am**

by **AcrossTheCourt**

I was going to say weighting by number of possessions would help the model fit the points for ages 39 and higher. I've done that a lot to deal with the extreme ranges of datasets.

So you're using an integer age and not decimal? Maybe you can use "exact" decimal age by putting in the polynomial factors (age, age^2, age^3, etc.) in the RAPM model instead of separate coefficients for each year.

The problem with survivor bias is this: say there are three players, A, B, and C. Player A is a weird Stockton-type who plays well until he's 40. Player B falls off sharply at age 35 and is out of the league by 36. Player C declines every year from age 32 to 35 before retiring. He tries to make a team at age 36 but no one signs him.

We know that age 36 isn't a good age given our set of players, but say Player A has a better season at age 36 than 35 (or hardly declines.) That means the model doesn't see age 36 as a problem at all. But we know from the other two players there's a significant decline at age 36 or else they'd be able to play. Or imagine a new player, D, who plays a lot during age 35 but barely at all at age 36 because he's much worse. Yet because he has a limited amount of possessions, he won't change the model results (I think, given what you're doing.)

So there has to be a way to penalize an age that causes players to retire or drop off in playing time. I'm not sure how to do that, however.... (Helpful, I know.) But this is an area of stats I do want to learn more about.

### Re: RAPM aging curve

Posted: **Tue Feb 11, 2014 6:33 am**

by **nbo2**

Obviously not the same level of research, as I am a sophomore in college and had very limited time and slightly incomplete data, but I did a project on the aging of offensive and defensive xRAPM and got some (obviously flawed but) potentially interesting results (link below).

**Obviously sample bias was the overriding issue** because of the qualifier, and the discussion about getting around it is more important than the results themselves, but I still think there are some useful takeaways, at least for further research.

First off: more evidence that age is a very significant predictor projecting NBA performance from college, as younger players are generally better talents, hence they are drafted earlier.

Interesting but may or may not be real: it looks like the discrepancy in defense between players drafted at younger ages vs. older ages is, on average, much bigger than the discrepancy in offense (without accounting for variance, which will be larger for offense). It also looks like, on average, defense declines much more gradually than offense.

Of course part of this will be explained by the fact that players drafted at younger ages are higher picks and more athletic. And the usual caveats with defensive metrics apply.

Potential improvements include: adding an Injury variable, adding a Cumulative Minutes Played or Cumulative Possessions variable (getting rid of the Experience term), making Possessions a lag variable, estimating the rate of change rather than the actual value, and using "total value over replacement" rather than the rate stat. The real value is going to be found in adjusting for position/player type, as most empirical and anecdotal evidence says offense peaks earlier and defense (particularly big man defense) peaks later and declines more gradually.

Link includes summary and paper. Paper is long and technical as the intended audience was the professor; summary is much more to the point.

https://www.dropbox.com/s/qelbcxgga6wm4 ... BPaper.pdf
Would love to hear what everyone thinks. Thanks.

### Re: RAPM aging curve

Posted: **Tue Feb 11, 2014 1:43 pm**

by **J.E.**

Did the weighting by # of observations, and figured out the 'optimal' polynomial degree empirically through out-of-sample testing, instead of choosing it arbitrarily.

For defense, polynomial degree of 2 had the lowest out-of-sample-error. For offense it was 3
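One way such a procedure could look is sketched below. The exact out-of-sample scheme is not described in the post, so leave-one-age-out is swapped in here purely as an assumption, and `loo_error`, `best_degree` and the synthetic numbers are hypothetical. `np.polyfit` applies its `w` weights to the residuals before squaring, so the square root of the observation count is passed in.

```python
import numpy as np

def loo_error(ages, coefs, counts, degree):
    """Count-weighted leave-one-age-out squared error of a polynomial fit."""
    ages = np.asarray(ages, dtype=float)
    coefs = np.asarray(coefs, dtype=float)
    counts = np.asarray(counts, dtype=float)
    x = ages - ages.mean()  # centre ages to keep the fit well conditioned
    total = 0.0
    for i in range(len(x)):
        keep = np.arange(len(x)) != i
        fit = np.polyfit(x[keep], coefs[keep], degree, w=np.sqrt(counts[keep]))
        pred = np.polyval(fit, x[i])
        total += counts[i] * (pred - coefs[i]) ** 2
    return total / counts.sum()

def best_degree(ages, coefs, counts, degrees=(1, 2, 3)):
    """Pick the degree with the lowest held-out error."""
    return min(degrees, key=lambda d: loo_error(ages, coefs, counts, d))

# Synthetic stand-in data (not the real coefficients): a parabola peaking at 28,
# with fewer observations at the older ages.
demo_ages = np.arange(19, 39)
demo_coefs = 1.0 - 0.02 * (demo_ages - 28) ** 2
demo_counts = np.linspace(1000, 100, len(demo_ages))
chosen = best_degree(demo_ages, demo_coefs, demo_counts)
```

Weighting the held-out error by count as well means a badly predicted age-39 point costs less than a badly predicted age-25 point, which is one possible design choice rather than the only one.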

### Re: RAPM aging curve

Posted: **Tue Feb 11, 2014 5:01 pm**

by **Bobbofitos**

Jeremias, can you post a graph w. O + D?

### Re: RAPM aging curve

Posted: **Tue Feb 11, 2014 6:22 pm**

by **DSMok1**

Excellent work on the new curves!

J.E. wrote:There are several possible solutions discussed at length in baseball research, where the randomness effect is far more pronounced.

Do you have links to papers or blog-posts that deal with this issue?

Well, I'd consider Tangotiger the best public authority on this, but his blog can be hard to search. Here are some recent articles on this issue, known as survivor or survivorship bias:

http://nhlnumbers.com/2012/12/6/goalie- ... rship-bias
(A real quick look at it)

http://tangotiger.com/index.php/site/co ... witterfeed
http://www.insidethebook.com/ee/index.p ... ias_issue/
http://www.insidethebook.com/ee/index.p ... ing_study/
http://www.insidethebook.com/ee/index.p ... ing_curve/
(Read comments, follow links on all of these)

There are many more threads at Tango's blog covering this issue, but I can't uncover them all.