A Review of Adjusted Plus/Minus and Stabilization
I have posted a huge article reviewing the current state of the art in Adjusted Plus/Minus on my blog.
I have endeavored to discuss, in plain English, the origin and method of Adjusted Plus/Minus, and all of the currently used stabilization techniques (such as RAPM). I have also discussed some areas for future research.
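For anyone who would rather see the core computation as code than as prose, here is a minimal sketch of RAPM as a ridge regression on synthetic stint data; the design-matrix convention, the alpha value, and all the numbers are illustrative only, not taken from any particular implementation discussed in the article:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Toy data: 2000 stints, 60 players. Each row of X has +1 for the five
# offensive players on the floor and -1 for the five defenders; y is the
# stint's offensive efficiency in points per 100 possessions.
n_stints, n_players = 2000, 60
true_skill = rng.normal(0.0, 2.0, n_players)     # hidden "true" ratings
X = np.zeros((n_stints, n_players))
y = np.empty(n_stints)
for i in range(n_stints):
    offense = rng.choice(n_players, 5, replace=False)
    defense = rng.choice(np.setdiff1d(np.arange(n_players), offense), 5, replace=False)
    X[i, offense], X[i, defense] = 1.0, -1.0
    noise = rng.normal(0.0, 12.0)                # stint-to-stint noise swamps the signal
    y[i] = 100.0 + true_skill[offense].sum() - true_skill[defense].sum() + noise

# Plain APM is ordinary least squares (alpha=0); RAPM adds the ridge penalty,
# which shrinks every coefficient toward the prior of 0 and stabilizes the fit.
rapm = Ridge(alpha=200.0, fit_intercept=True).fit(X, y)
ratings = rapm.coef_                             # one rating per player, pts/100 possessions
```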
Re: A Review of Adjusted Plus/Minus and Stabilization
I think there are at least two reasonable ways to go for choosing better priors (than 0):
1. SPM. I think incorporating offensive SPM as an offensive prior will help some. Defensive SPM will probably not help as much, if at all.
2. RAPM of prior years. One thing I will try at some point is compute RAPM for 05/06 with either zero as prior or some MPG-dependent prior, use those ratings as priors to compute ratings for 06/07 and so on. Rookies are obviously a special case, they probably need their own kind of prior depending on age and draft position.
Older data can be helpful. Hopefully I can find the time to convert the 2002-2005 play-by-play into matchup data soon.
I don't really see how player pairs could work with ridge regression in a reasonable way, but I'll look at a slightly different variant in the near future.
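For concreteness, here is a rough sketch of how a non-zero prior could be plugged into the same ridge machinery; the variable names, default alpha, and the prior itself are placeholders, not what I actually run:

```python
import numpy as np
from sklearn.linear_model import Ridge

def rapm_with_prior(X, y, prior, alpha=2000.0):
    """Ridge regression that shrinks player coefficients toward `prior`
    instead of toward zero: fit an ordinary ridge model to the residual
    y - X @ prior, then add the prior back to the fitted coefficients.
    `prior` could be last season's RAPM, an (offensive) SPM estimate, or an
    MPG- / draft-position-based guess for rookies."""
    prior = np.asarray(prior, dtype=float)
    residual = np.asarray(y, dtype=float) - X @ prior
    model = Ridge(alpha=alpha, fit_intercept=True).fit(X, residual)
    return prior + model.coef_
```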
Re: A Review of Adjusted Plus/Minus and Stabilization
I think a simple prior just based on minutes played and team quality may be the best, since it would be unbiased.
I think we could make a Bayesian APM with player pairs work. Just list all pairs currently on the court as the variables. Sure, it'd be a bunch of unknowns, but there are ways to deal with that.
Can you weight prior years less? Like, weight all possessions from this year 5, last year 3, before that 2, and before that 1? That would be really useful, if you could adjust the weights to maximize accuracy.
Further: have you looked into pre-processing the matchups with aging curves? That could be difficult, but would certainly yield better OOS results in the current year when using previous years data.
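Something like the following is what I have in mind for the weighting. A sketch only, with the 5/3/2/1 weights and the alpha as placeholders to be tuned against out-of-sample accuracy:

```python
import numpy as np
from sklearn.linear_model import Ridge

def recency_weighted_rapm(X, y, possessions, stint_season, current_season,
                          season_weights=(5.0, 3.0, 2.0, 1.0), alpha=2000.0):
    """RAPM fit in which each stint's possessions are multiplied by a recency
    weight: 5 for the current season, 3 for the season before, then 2, then 1
    (anything older also gets 1). The weights and alpha are the knobs to tune
    against out-of-sample accuracy."""
    age = current_season - np.asarray(stint_season)            # 0 = this season
    idx = np.minimum(age, len(season_weights) - 1)
    weights = np.asarray(possessions, dtype=float) * np.take(season_weights, idx)
    model = Ridge(alpha=alpha, fit_intercept=True).fit(X, y, sample_weight=weights)
    return model.coef_
```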
Re: A Review of Adjusted Plus/Minus and Stabilization
DSMok1 wrote: I think we could make a Bayesian APM with player pairs work. Just list all pairs currently on the court as the variables. Sure, it'd be a bunch of unknowns, but there are ways to deal with that.

My original idea was to just build one single pair, treat the pair as one player, and re-run the regression, then look at whether the pair rating significantly differs from the sum of the individual ratings. You want to list all pairs as variables? Am I correct that this way we would have 10 pairs per observation for offense alone? I'll have to think about that.
DSMok1 wrote: Can you weight prior years less? Like, weight all possessions from this year 5, last year 3, before that 2, and before that 1? That would be really useful, if you could adjust the weights to maximize accuracy.

This is certainly something I would have already done if I had had more time. I think Joe Sill found that the optimal weighting of the prior season was 0.5. I'm not sure if he tried using more than two years, but I think he said that the inclusion of the prior season didn't help that much over using just one season, so the benefit we get from more seasons and from figuring out better weighting schemes might be minimal.
DSMok1 wrote: Further: have you looked into pre-processing the matchups with aging curves? That could be difficult, but would certainly yield better OOS results in the current year when using previous years data.

I tried to put player age into the regression the same way I put coaches in, as a sixth man if you want. Initial results weren't that great, but I would be lying if I said I spent much time on it. Time is always an issue.
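Quick sanity check on the pair count (toy code only, nothing I've actually run on matchup data):

```python
from itertools import combinations

# A five-man unit contains C(5, 2) = 10 two-man pairs, so pair columns would
# indeed add 10 non-zero offensive entries (and 10 defensive ones) to every
# observation, on top of the usual 5 + 5 individual-player entries.
lineup = ["p1", "p2", "p3", "p4", "p5"]          # placeholder player ids
pairs = list(combinations(lineup, 2))
print(pairs)
print(len(pairs))                                # -> 10
```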
Re: A Review of Adjusted Plus/Minus and Stabilization
A model could have just the 5 players on each side, or all pairs, or perhaps a combination of the two. At least I've put it out there as a suggestion for consideration.
Most pairs would have small or very small sample sizes and might not get used much because of it. (In many cases there probably would be no meaningful interest in pair data of any kind at all.) A quick check for a few teams in the playoffs showed about 20 pairs per team that played 10+ minutes per game. One possibility might be to focus mostly on those pairs and basically ignore the others, which have fewer minutes and less ability to be estimated accurately. Consider them neutral or "unknowable". The highest-used pairs seem more manageable to understand to a degree, and more realistic to incorporate into a summary and explain to others, than dealing with all of them.
With player ratings and the big-minute pair ratings, one might get a better sense than with player-average roll-up ratings alone of who is good or bad in general, who is notably good or bad with particular teammates, and who is most and least pair-context dependent. That could inform considerations of player movement between teams, or even usage within a rotation. If a guy seems dependent on being with a certain guy to get a good rating, does the new team have a similar kind of guy to foster that again or not?
Re: A Review of Adjusted Plus/Minus and Stabilization
DSMok1 wrote: I think a simple prior just based on minutes played and team quality may be the best, since it would be unbiased.

Unbiased with respect to what, specifically? In other words, what type of bias are you attempting to avoid, and how does using MP and SOT accomplish that?
Re: A Review of Adjusted Plus/Minus and Stabilization
Unbiased by box-score statistics. Minutes should, theoretically, be allotted by quality of player.
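To make that concrete, here is one toy way a prior could be built from nothing but minutes and team quality; the formula and the numbers are purely illustrative, not a worked-out proposal:

```python
import numpy as np

def minutes_team_prior(mpg, team_net_rating):
    """A toy prior built only from minutes played and team quality.

    The team's net rating (per 100 possessions) is split across its five
    on-court slots in proportion to minutes, so the roster's priors sum to
    the team's net rating: heavy-minute players on good teams start positive,
    low-minute players on bad teams start slightly negative.
    """
    share = np.asarray(mpg, dtype=float) / 48.0          # fraction of the game on the floor
    return np.asarray(team_net_rating, dtype=float) * share / 5.0

# A 36-mpg player on a +6 team gets about +0.9; an 8-mpg player on a -4 team
# gets about -0.13.
print(minutes_team_prior([36, 8], [6.0, -4.0]))
```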
Re: A Review of Adjusted Plus/Minus and Stabilization
Bayesian models of efficiency are so 2009 (or earlier?): A Basic Hierarchical Model of Efficiency
I am a basketball geek.
Re: A Review of Adjusted Plus/Minus and Stabilization
Ah! I didn't remember that model.
How would you explain it in layman's terms? If the Bayesian prior is uninformed, what effect does the prior have on the distribution? I'll add that in, after you clarify it simply.
Re: A Review of Adjusted Plus/Minus and Stabilization
So when I say uninformed I mean that I used a "flat" prior on those variance parameters of the rating distributions.
In sort of layman's terms, my use of an uninformed prior here is so that those variances are not confined to some pre-determined interval. They should theoretically be any possible value.
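If it helps to see the structure written out, here is a rough sketch of that kind of model on toy data; PyMC is just a convenient notation here, not necessarily what I actually used:

```python
import numpy as np
import pymc as pm   # a convenient way to write the model down, not necessarily the original tool

rng = np.random.default_rng(0)
n_stints, n_players = 500, 40
X = rng.choice([-1.0, 0.0, 1.0], size=(n_stints, n_players), p=[0.1, 0.8, 0.1])   # toy design matrix
y = X @ rng.normal(0.0, 2.0, n_players) + rng.normal(0.0, 10.0, n_stints)         # toy efficiencies

with pm.Model():
    # "Flat" (uninformed) priors on the variance parameters: no pre-determined
    # interval -- the data alone decide how spread out the ratings are.
    sigma_rating = pm.HalfFlat("sigma_rating")
    sigma_noise = pm.HalfFlat("sigma_noise")
    rating = pm.Normal("rating", mu=0.0, sigma=sigma_rating, shape=n_players)
    pm.Normal("obs", mu=pm.math.dot(X, rating), sigma=sigma_noise, observed=y)
    trace = pm.sample()   # MCMC draws for every rating and for both variances
```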
I am a basketball geek.
Re: A Review of Adjusted Plus/Minus and Stabilization
Ryan wrote: So when I say uninformed I mean that I used a "flat" prior on those variance parameters of the rating distributions. In sort of layman's terms, my use of an uninformed prior here is so that those variances are not confined to some pre-determined interval. They should theoretically be any possible value.

Trying to grasp this... So you basically use a different prior distribution for each of the ratings--ORating, DRating, etc.? Or are the priors different per player? And is the distribution determined by the cross-validation? Would that be a normal distribution?
I'm afraid I haven't studied hierarchical models.
Re: A Review of Adjusted Plus/Minus and Stabilization
The ratings are assumed to come from a normal distribution with some unknown variance that is estimated from the data. These are estimated using MCMC techniques that give estimates for these variances, likely values, etc.
I am a basketball geek.
Re: A Review of Adjusted Plus/Minus and Stabilization
Ryan wrote: The ratings are assumed to come from a normal distribution with some unknown variance that is estimated from the data. These are estimated using MCMC techniques that give estimates for these variances, likely values, etc.

This sounds, to me, like it would give results very similar to RAPM--is that correct? The result is a regression of all ORatings, for example, towards the same mean (for each player?) What happens to players that have very few minutes--do they end up league average?
You said in the comments on the post, "I’ve constructed this spreadsheet that lists all offensive and defensive ratings from this model, with associated standard deviations and 95% credible intervals (which are constructed with the 2.5% and 97.5% quantiles)." Did you actually post the spreadsheet anywhere?
BTW--did you do cross validation within the year for that hierarchical model?
Re: A Review of Adjusted Plus/Minus and Stabilization
It would be similar to RAPM in that way, yes. As for the spreadsheet, the links look hidden in the comments. "this spreadsheet" actually links to:
http://spreadsheets.google.com/ccc?key= ... tZVE&hl=en
I didn't do any cross-validation with this model.
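To give a concrete feel for the "similar to RAPM" point: in the simplest normal-normal version of the model the posterior mean has a closed form, and a player with very few possessions does get pulled most of the way back to the league mean. A toy sketch with made-up numbers:

```python
def shrunk_rating(raw_rating, n_poss, noise_var, league_mean=0.0, league_var=4.0):
    """Posterior mean of a player's rating in a simple normal-normal model.

    The weight on the raw on-court estimate grows with the number of
    possessions it is based on, so a low-minute player lands close to the
    league mean while a heavy-minute player keeps most of his raw value --
    the same qualitative shrinkage RAPM's ridge penalty produces.
    """
    precision_data = n_poss / noise_var
    precision_prior = 1.0 / league_var
    w = precision_data / (precision_data + precision_prior)
    return w * raw_rating + (1.0 - w) * league_mean

print(shrunk_rating(+5.0, n_poss=300, noise_var=1200.0))    # ~ +2.5: pulled halfway back
print(shrunk_rating(+5.0, n_poss=6000, noise_var=1200.0))   # ~ +4.8: mostly keeps his value
```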
I am a basketball geek.
Re: A Review of Adjusted Plus/Minus and Stabilization
Okay, I'll try to add your system to my rundown.
The open research question of finding an informed Bayesian prior still stands, since that will be huge for getting good estimates for low-minute players.