A Review of Adjusted Plus/Minus and Stabilization
I have posted a huge article reviewing the current state of the art in Adjusted Plus/Minus on my blog.
I have endeavored to discuss, in plain English, the origin and method of Adjusted Plus/Minus, and all of the currently used stabilization techniques (such as RAPM). I have also discussed some areas for future research.
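For anyone who would rather see the core computation as code than as prose, here is a minimal sketch of RAPM as a ridge regression on synthetic stint data; the design-matrix convention, the alpha value, and all the numbers are illustrative only, not taken from any particular implementation discussed in the article:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Toy data: 2000 stints, 60 players. Each row of X has +1 for the five
# offensive players on the floor and -1 for the five defenders; y is the
# stint's offensive efficiency in points per 100 possessions.
n_stints, n_players = 2000, 60
true_skill = rng.normal(0.0, 2.0, n_players)     # hidden "true" ratings
X = np.zeros((n_stints, n_players))
y = np.empty(n_stints)
for i in range(n_stints):
    offense = rng.choice(n_players, 5, replace=False)
    defense = rng.choice(np.setdiff1d(np.arange(n_players), offense), 5, replace=False)
    X[i, offense], X[i, defense] = 1.0, -1.0
    noise = rng.normal(0.0, 12.0)                # stint-to-stint noise swamps the signal
    y[i] = 100.0 + true_skill[offense].sum() - true_skill[defense].sum() + noise

# Plain APM is ordinary least squares (alpha=0); RAPM adds the ridge penalty,
# which shrinks every coefficient toward the prior of 0 and stabilizes the fit.
rapm = Ridge(alpha=200.0, fit_intercept=True).fit(X, y)
ratings = rapm.coef_                             # one rating per player, pts/100 possessions
```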
Re: A Review of Adjusted Plus/Minus and Stabilization
I think there are at least two reasonable ways to go for choosing better priors (than 0):
1. SPM. I think incorporating offensive SPM as an offensive prior will help some. Defensive SPM will probably not help as much, if at all.
2. RAPM of prior years. One thing I will try at some point is compute RAPM for 05/06 with either zero as prior or some MPG-dependent prior, use those ratings as priors to compute ratings for 06/07 and so on. Rookies are obviously a special case, they probably need their own kind of prior depending on age and draft position.
Older data can be helpful. Hopefully I can find the time to convert the 2002-2005 play-by-play into matchup data soon.
I don't really see how player pairs could work with ridge regression in a reasonable way, but I'll look at a slightly different variant in the near future.
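For concreteness, here is a rough sketch of how a non-zero prior could be plugged into the same ridge machinery; the variable names, default alpha, and the prior itself are placeholders, not what I actually run:

```python
import numpy as np
from sklearn.linear_model import Ridge

def rapm_with_prior(X, y, prior, alpha=2000.0):
    """Ridge regression that shrinks player coefficients toward `prior`
    instead of toward zero: fit an ordinary ridge model to the residual
    y - X @ prior, then add the prior back to the fitted coefficients.
    `prior` could be last season's RAPM, an (offensive) SPM estimate, or an
    MPG- / draft-position-based guess for rookies."""
    prior = np.asarray(prior, dtype=float)
    residual = np.asarray(y, dtype=float) - X @ prior
    model = Ridge(alpha=alpha, fit_intercept=True).fit(X, residual)
    return prior + model.coef_
```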
Re: A Review of Adjusted Plus/Minus and Stabilization
I think a simple prior just based on minutes played and team quality may be the best, since it would be unbiased.
I think we could make a Bayesian APM with player pairs work. Just list all pairs currently on the court as the variables. Sure, it'd be a bunch of unknowns, but there are ways to deal with that.
Can you weight prior years less? Like, weight all possessions from this year 5, last year 3, before that 2, and before that 1? That would be really useful, if you could adjust the weights to maximize accuracy.
Further: have you looked into pre-processing the matchups with aging curves? That could be difficult, but would certainly yield better OOS results in the current year when using previous years data.
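Something like the following is what I have in mind for the weighting. A sketch only, with the 5/3/2/1 weights and the alpha as placeholders to be tuned against out-of-sample accuracy:

```python
import numpy as np
from sklearn.linear_model import Ridge

def recency_weighted_rapm(X, y, possessions, stint_season, current_season,
                          season_weights=(5.0, 3.0, 2.0, 1.0), alpha=2000.0):
    """RAPM fit in which each stint's possessions are multiplied by a recency
    weight: 5 for the current season, 3 for the season before, then 2, then 1
    (anything older also gets 1). The weights and alpha are the knobs to tune
    against out-of-sample accuracy."""
    age = current_season - np.asarray(stint_season)            # 0 = this season
    idx = np.minimum(age, len(season_weights) - 1)
    weights = np.asarray(possessions, dtype=float) * np.take(season_weights, idx)
    model = Ridge(alpha=alpha, fit_intercept=True).fit(X, y, sample_weight=weights)
    return model.coef_
```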
Re: A Review of Adjusted Plus/Minus and Stabilization
DSMok1 wrote: I think we could make a Bayesian APM with player pairs work. Just list all pairs currently on the court as the variables. Sure, it'd be a bunch of unknowns, but there are ways to deal with that.

My original idea was to just build one single pair, treat the pair as one player, and re-run the regression, then look at whether the pair rating significantly differs from the sum of the individual ratings. You want to list all pairs as variables? Am I correct that this way we would have 10 pairs per observation for offense alone? I'll have to think about that.
DSMok1 wrote: Can you weight prior years less? Like, weight all possessions from this year 5, last year 3, before that 2, and before that 1? That would be really useful, if you could adjust the weights to maximize accuracy.

This is certainly something I would have already done if I had had more time. I think Joe Sill found that the optimal weighting of the prior season was 0.5. I'm not sure if he tried using more than two years, but I think he said that the inclusion of the prior season didn't help that much over using just one season, so the benefit we get from more seasons and from figuring out better weighting schemes might be minimal.
DSMok1 wrote: Further: have you looked into pre-processing the matchups with aging curves? That could be difficult, but would certainly yield better OOS results in the current year when using previous years data.

I tried to put player age into the regression the same way I put coaches in, as a sixth man if you want. Initial results weren't that great, but I would be lying if I said I spent much time on it. Time is always an issue.
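Quick sanity check on the pair count (toy code only, nothing I've actually run on matchup data):

```python
from itertools import combinations

# A five-man unit contains C(5, 2) = 10 two-man pairs, so pair columns would
# indeed add 10 non-zero offensive entries (and 10 defensive ones) to every
# observation, on top of the usual 5 + 5 individual-player entries.
lineup = ["p1", "p2", "p3", "p4", "p5"]          # placeholder player ids
pairs = list(combinations(lineup, 2))
print(pairs)
print(len(pairs))                                # -> 10
```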
Re: A Review of Adjusted Plus/Minus and Stabilization
A model could have just the 5 players on each side, or all pairs, or perhaps a combination of the two. At least I've put it out there as a suggestion for consideration.
Most pairs would have small or very small sample sizes and might not get used much because of it. (In many cases there probably would be no meaningful interest in pair data of any kind at all.) A quick check for a few teams in the playoffs showed about 20 pairs per team that played 10+ minutes per game. One possibility might be to focus mostly on those pairs and basically ignore the others, which have fewer minutes and less ability to be estimated accurately. Consider them neutral or "unknowable". The highest-used pairs seem more manageable to understand to a degree, and more realistic to incorporate into a summary and explain to others, than dealing with all of them.
With player ratings and the big-minute pair ratings, one might get a better sense than with player-average roll-up ratings alone of who is good or bad in general, who is notably good or bad with particular teammates, and who is most and least pair-context dependent. That could inform considerations of player movement between teams, or even usage within a rotation. If a guy seems dependent on being with a certain guy to get a good rating, does the new team have a similar kind of guy to foster that again or not?
Re: A Review of Adjusted Plus/Minus and Stabilization
DSMok1 wrote: I think a simple prior just based on minutes played and team quality may be the best, since it would be unbiased.

Unbiased with respect to what, specifically? In other words, what type of bias are you attempting to avoid, and how does using MP and SOT accomplish that?
Re: A Review of Adjusted Plus/Minus and Stabilization
Unbiased by box-score statistics. Minutes should, theoretically, be allotted by quality of player.
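To make that concrete, here is one toy way a prior could be built from nothing but minutes and team quality; the formula and the numbers are purely illustrative, not a worked-out proposal:

```python
import numpy as np

def minutes_team_prior(mpg, team_net_rating):
    """A toy prior built only from minutes played and team quality.

    The team's net rating (per 100 possessions) is split across its five
    on-court slots in proportion to minutes, so the roster's priors sum to
    the team's net rating: heavy-minute players on good teams start positive,
    low-minute players on bad teams start slightly negative.
    """
    share = np.asarray(mpg, dtype=float) / 48.0          # fraction of the game on the floor
    return np.asarray(team_net_rating, dtype=float) * share / 5.0

# A 36-mpg player on a +6 team gets about +0.9; an 8-mpg player on a -4 team
# gets about -0.13.
print(minutes_team_prior([36, 8], [6.0, -4.0]))
```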
Re: A Review of Adjusted Plus/Minus and Stabilization
Bayesian models of efficiency are so 2009 (or earlier?): A Basic Hierarchical Model of Efficiency
I am a basketball geek.
Re: A Review of Adjusted Plus/Minus and Stabilization
Ah! I didn't remember that model.
How would you explain it in layman's terms? If the Bayesian prior is uninformed, what effect does the prior have on the distribution? I'll add that in, after you clarify it simply.
Re: A Review of Adjusted Plus/Minus and Stabilization
So when I say uninformed I mean that I used a "flat" prior on those variance parameters of the rating distributions.
In sort of layman's terms, my use of an uninformed prior here is so that those variances are not confined to some pre-determined interval. They should theoretically be any possible value.
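If it helps to see the structure written out, here is a rough sketch of that kind of model on toy data; PyMC is just a convenient notation here, not necessarily what I actually used:

```python
import numpy as np
import pymc as pm   # a convenient way to write the model down, not necessarily the original tool

rng = np.random.default_rng(0)
n_stints, n_players = 500, 40
X = rng.choice([-1.0, 0.0, 1.0], size=(n_stints, n_players), p=[0.1, 0.8, 0.1])   # toy design matrix
y = X @ rng.normal(0.0, 2.0, n_players) + rng.normal(0.0, 10.0, n_stints)         # toy efficiencies

with pm.Model():
    # "Flat" (uninformed) priors on the variance parameters: no pre-determined
    # interval -- the data alone decide how spread out the ratings are.
    sigma_rating = pm.HalfFlat("sigma_rating")
    sigma_noise = pm.HalfFlat("sigma_noise")
    rating = pm.Normal("rating", mu=0.0, sigma=sigma_rating, shape=n_players)
    pm.Normal("obs", mu=pm.math.dot(X, rating), sigma=sigma_noise, observed=y)
    trace = pm.sample()   # MCMC draws for every rating and for both variances
```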
I am a basketball geek.
Re: A Review of Adjusted Plus/Minus and Stabilization
Ryan wrote: So when I say uninformed I mean that I used a "flat" prior on those variance parameters of the rating distributions. In sort of layman's terms, my use of an uninformed prior here is so that those variances are not confined to some pre-determined interval. They should theoretically be any possible value.

Trying to grasp this... So you basically use a different prior distribution for each of the ratings--ORating, DRating, etc.? Or are the priors different per player? And is the distribution determined by the cross-validation? Would that be a normal distribution?
I'm afraid I haven't studied hierarchical models.
Re: A Review of Adjusted Plus/Minus and Stabilization
The ratings are assumed to come from a normal distribution with some unknown variance that is estimated from the data. These are estimated using MCMC techniques that give estimates for these variances, likely values, etc.
I am a basketball geek.
Re: A Review of Adjusted Plus/Minus and Stabilization
Ryan wrote: The ratings are assumed to come from a normal distribution with some unknown variance that is estimated from the data. These are estimated using MCMC techniques that give estimates for these variances, likely values, etc.

This sounds, to me, like it would give results very similar to RAPM--is that correct? The result is a regression of all ORatings, for example, towards the same mean (for each player?) What happens to players that have very few minutes--do they end up league average?
You said in the comments on the post, "I’ve constructed this spreadsheet that lists all offensive and defensive ratings from this model, with associated standard deviations and 95% credible intervals (which are constructed with the 2.5% and 97.5% quantiles)." Did you actually post the spreadsheet anywhere?
BTW--did you do cross validation within the year for that hierarchical model?
Re: A Review of Adjusted Plus/Minus and Stabilization
It would be similar to RAPM in that way, yes. As for the spreadsheet, the links look hidden in the comments. "this spreadsheet" actually links to:
http://spreadsheets.google.com/ccc?key= ... tZVE&hl=en
I didn't do any cross-validation with this model.
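To give a concrete feel for the "similar to RAPM" point: in the simplest normal-normal version of the model the posterior mean has a closed form, and a player with very few possessions does get pulled most of the way back to the league mean. A toy sketch with made-up numbers:

```python
def shrunk_rating(raw_rating, n_poss, noise_var, league_mean=0.0, league_var=4.0):
    """Posterior mean of a player's rating in a simple normal-normal model.

    The weight on the raw on-court estimate grows with the number of
    possessions it is based on, so a low-minute player lands close to the
    league mean while a heavy-minute player keeps most of his raw value --
    the same qualitative shrinkage RAPM's ridge penalty produces.
    """
    precision_data = n_poss / noise_var
    precision_prior = 1.0 / league_var
    w = precision_data / (precision_data + precision_prior)
    return w * raw_rating + (1.0 - w) * league_mean

print(shrunk_rating(+5.0, n_poss=300, noise_var=1200.0))    # ~ +2.5: pulled halfway back
print(shrunk_rating(+5.0, n_poss=6000, noise_var=1200.0))   # ~ +4.8: mostly keeps his value
```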
I am a basketball geek.
Re: A Review of Adjusted Plus/Minus and Stabilization
Okay, I'll try to add your system to my rundown.
The open research question of finding an informed Bayesian prior still stands, since that will be huge for getting good estimates for low-minute players.