Prediction with BoxScore totals
Re: Prediction with BoxScore totals
This new metric consisting of recursively informed RAPM+BoxScoreRating is now on my website as "xRAPM" and has replaced RAPM
There are full lists of players for each year, like here http://stats-for-the-nba.appspot.com/ratings/2007.html
and team pages with player ratings of '13, like this http://stats-for-the-nba.appspot.com/teams/OKC.html
Re: Prediction with BoxScore totals
Great!
Have you yet applied an aging curve to the priors for each player between seasons? That's critical in my opinion.
Re: Prediction with BoxScore totals
J.E. wrote: This new metric consisting of recursively informed RAPM+BoxScoreRating is now on my website as "xRAPM" and has replaced RAPM
There are full lists of players for each year, like here http://stats-for-the-nba.appspot.com/ratings/2007.html
and team pages with player ratings of '13, like this http://stats-for-the-nba.appspot.com/teams/OKC.html
JE, seems like I'm the only downer here, which made me decide to re-register.
I guess first and foremost: Can you please keep RAPM data available on your site in addition to the xRAPM data?
I understand that you are focused on using RAPM-based techniques to build a single stat that gets closer to the holy grail, and I'm not against that. However, for those of us who are focused primarily on using various tools together to analyze a situation, the beauty of +/- data is in its unbiased orthogonality relative to the box score. So from my perspective right now, the site with the single best source of data not biased by the box score on the internet has gone and tossed that out and replaced it with something that really isn't the same thing.
If people here believe I'm not understanding the situation properly, please set me straight. Clearly though, box scores are being used, and now all of a sudden I see guys like McGrady & Paul who were loved by PER take a leap forward. Hard to imagine this is a coincidence.
Re: Prediction with BoxScore totals
I feel like pointing out that RAPM is biased by design, so whilst it may not suffer from the same lack of completeness as the box score, it is biased in another way. I think the aim should be to minimize our errors in explaining future outcomes, and not worry so much about the 'purity' of our results. A good hack is a very valuable thing in stats; it can reveal a lot. However, I can understand you wanting RAPM as well as xRAPM.
And obviously, whilst +/- is informationally complete it is far from accurate.
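To make the "biased by design" point concrete: RAPM is usually described as a ridge (penalized) regression, which accepts some bias in exchange for lower variance by pulling each estimate toward a prior. Below is a minimal one-coefficient sketch of that shrinkage. The function name, the penalty value and the stint data are made up for illustration and are not J.E.'s actual setup.

```python
def ridge_1d(x, y, lam, prior=0.0):
    """Closed-form ridge estimate for one coefficient, shrunk toward `prior`.

    Minimizes sum((y_i - b * x_i)^2) + lam * (b - prior)^2.
    """
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return (sxy + lam * prior) / (sxx + lam)

# Hypothetical stints: x = 1 when the player is on the floor,
# y = point margin per 100 possessions in that stint.
x = [1, 1, 1, 1]
y = [12.0, -3.0, 8.0, 7.0]

ols = ridge_1d(x, y, lam=0.0)                   # plain least squares: the sample mean
shrunk = ridge_1d(x, y, lam=4.0)                # pulled halfway toward a prior of 0
informed = ridge_1d(x, y, lam=4.0, prior=2.0)   # pulled toward a non-zero (e.g. box-score) prior
print(ols, shrunk, informed)  # 6.0 3.0 4.0
```

With `lam=0` the estimate is unbiased but noisy; any positive `lam` moves it toward the prior, so the quality of the prior matters as much as the penalty.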
Re: Prediction with BoxScore totals
J.E. wrote: Here are the weights I found for offense and defense. Everything's scaled to "influence of height on offense"
Looking through your coefficients makes me wonder how "useful" they are. I have a big issue with results, which are implying that a FGA is more harmful than a turnover, because that is per se not true. The rules of the game force teams to take a shot within 24 seconds; if they do not take that shot, they are penalized with a turnover. According to your results, it is better to throw the ball away or let the shot clock expire than to take a difficult shot with a success rate below 29%.
Also, converting your FT with a higher FT% is something to be considered negative, especially on defense? I have the impression that some of the variables you picked are statistically not significant.
I also agree with johannesdesilentio that having pure RAPM would be useful.
Re: Prediction with BoxScore totals
Quote: Have you yet applied an aging curve to the priors for each player between seasons? That's critical in my opinion.
I will definitely add that to the site at some point, together with coach rating. Although I do believe that by including BoxScore stats the problem of not having an aging curve built in got a little smaller (but I'm not saying an aging curve isn't necessary anymore).
Quote: I guess first and foremost: Can you please keep RAPM data available on your site in addition to the xRAPM data?
I understand that you are focused on using RAPM-based techniques to build a single stat that gets closer to the holy grail, and I'm not against that. However, for those of us who are focused primarily on using various tools together to analyze a situation, the beauty of +/- data is in its unbiased orthogonality relative to the box score. So from my perspective right now, the site with the single best source of data not biased by the box score on the internet has gone and tossed that out and replaced it with something that really isn't the same thing.
If people here believe I'm not understanding the situation properly, please set me straight. Clearly though, box scores are being used, and now all of a sudden I see guys like McGrady & Paul who were loved by PER take a leap forward. Hard to imagine this is a coincidence.
My main goal is to build a metric that is best at forecasting future offensive efficiency. It is my belief that this is equivalent to building a metric that gives the most accurate player ratings. My goal is not to build a metric that further needs to be combined with other metrics, especially because I don't really believe in most other metrics except ASPM, ORtg/DRtg and LambdaPm (which is very similar to xRAPM).
As was already mentioned, RAPM was already biased. And it was biased in the direction of a worse, less accurate prior. The ratings were less accurate, and thus unfair to certain players. If some players took a leap forward through xRAPM it most likely means they were unfairly underrated in RiRAPM, and in turn, their teammates were overrated. Further, all ratings are estimates and never represent actual truth. I'm just trying to improve those estimates. If you want hard-fact +/- stats you should probably look at simple +/- and On/Off.
xRAPM just continues with the thought of improving out-of-sample prediction, which (probably) was the reason for RAPM being built as a replacement for APM.
Also, please realize that a BoxScore metric that is at least top 3 is helping with building the priors, and that the estimates given by RiRAPM were most often further away from the truth than the xRAPM estimates.
Quote: Also, converting your FT with a higher FT% is something to be considered negative, especially on defense?
I'd have no problem with saying that shooting FTs at a bad % is certainly an indicator for being a good defender.
Quote: I have the impression that some of the variables you picked are statistically not significant.
I've taken measures to avoid overfitting. Those coefficients that might not be significant should at least be close to 0
Re: Prediction with BoxScore totals
J.E. wrote: I'd have no problem with saying that shooting FTs at a bad % is certainly an indicator for being a good defender.
Well, your coefficients are actually saying the opposite overall. That wasn't really the point, just that I think FT% is redundant when you have FT and FTA in it as well.
J.E. wrote: I've taken measures to avoid overfitting. Those coefficients that might not be significant should at least be close to 0
I don't think that this is true at all. I can probably present you with multiple examples of regressions in which a non-significant variable actually had a value substantially higher than 0. No idea what kind of "measures to avoid overfitting" were used, but looking through your picked variables and the coefficients, they are rather close to the coefficients you would get if you ran a regression on overall team boxscore totals. And when I do that, I have multiple variables that are not significant for offense, and others for defense, while having a bigger coefficient than other significant variables.
Including height is a good thing, but I would guess (at least that is my result) that height over average height for position gives a better prediction. 6'9 is bigger than league average, but 6'9 as a center is actually below average for that position. But that would mean you would need to assign each player a certain position, which can be a tough job, especially for those who are essentially playing different positions on offense and defense.
Re: Prediction with BoxScore totals
mystic wrote: ... implying that a FGA is more harmful than a turnover, because that is per se not true. ..
You'll always get counter-intuitive correlations when you double-count FG and FGA; or triple-count, with FG%.
Using discrete events should always be preferred: FG made and FG missed, for example.
Don't worry about the fact that FGX aren't explicitly in the box score. It's just FGA - FG, y'know.
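The double-counting point can be checked with a little algebra: because FGA = FG + FGX, any pair of weights on (FG, FGA) is the same model as a pair of weights on (FG made, FG missed), and the FGA weight is really the weight of a miss. A small sketch, with hypothetical weights chosen only to show the conversion:

```python
def to_discrete_weights(w_fg, w_fga):
    """Re-express weights on (FG, FGA) as weights on (FG made, FG missed).

    Since FGA = FG + FGX:
        w_fg * FG + w_fga * FGA == (w_fg + w_fga) * FG + w_fga * FGX
    """
    return {"FG": w_fg + w_fga, "FGX": w_fga}

# Hypothetical weights, not taken from any published rating.
w_fg, w_fga = 1.5, -0.5
discrete = to_discrete_weights(w_fg, w_fga)

# Both parametrizations score any stat line identically.
fg, fga = 7, 15
fgx = fga - fg  # missed field goals are not in the box score, but follow from it
old_score = w_fg * fg + w_fga * fga
new_score = discrete["FG"] * fg + discrete["FGX"] * fgx
print(discrete, old_score, new_score)
```

Read this way, a negative FGA weight should be compared with a turnover weight as "cost of a miss", not as "cost of taking a shot".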
Re: Prediction with BoxScore totals
Mike, I agree that using exclusive events like 2pt field goals made, 2pt field goals missed, 3pt field goals made, etc. is the way to go. But not using them is not what causes the "counter-intuitive" results. The result is also not just "counter-intuitive": that a turnover is more harmful than even a missed field goal is simply a fact. The results are not showing that, because players, coaches, etc. are aware of the shot clock, so we no longer have an unbiased sample. If nobody knew about the shot clock while it still existed, we would have more turnover events and fewer "bad" shots. In that case the results of the regression would shift. It is just the rules of the game that force players to take late, bad shots on purpose, while nobody actually lets the shot clock expire on purpose. The latter is what the results of the regression are implying to be the better strategy in some cases.
Re: Prediction with BoxScore totals
WOWSA. This is a game-changer.
Re: Prediction with BoxScore totals
Re: Mystic.
It is simply impossible to pull out two variables from a multivariable regression and compare them without taking the other variables into account.
In fact, even looking at one variable and saying it "costs x" is a fallacy. Statistics correlate with other statistics.
The easiest proof of this is Turnovers. If your only input in a regression against +/- was Turnovers and Rebounds, for example, Turnovers would likely be positive! This is because players with a higher turnover rate tend to score more points and assist more. It doesn't mean that a player who turns the ball over 10 times per 100 possessions is necessarily better than one who does it 5 times per 100 possessions. It just means that, with scoring and assists left out of the regression, players with a tendency to turn it over also tend to be more active in helping put points on the board.
This is also precisely why Wins Produced doesn't make any sense.
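The mechanism described here is omitted-variable bias, and it can be sketched on synthetic data. Everything below is simulated under an explicit assumption (a latent "usage" trait drives both scoring and turnovers), so it shows that the sign flip is possible, not that real NBA data behave this way. All names and numbers are illustrative.

```python
import random

def center(col):
    """Subtract the mean so the regression needs no intercept column."""
    m = sum(col) / len(col)
    return [v - m for v in col]

def ols(columns, y):
    """Least-squares slopes via the normal equations (Gauss-Jordan, no pivoting)."""
    xs = [center(c) for c in columns]
    yc = center(y)
    k = len(xs)
    # Augmented matrix [X'X | X'y].
    a = [[sum(p * q for p, q in zip(xs[i], xs[j])) for j in range(k)]
         + [sum(p * q for p, q in zip(xs[i], yc))] for i in range(k)]
    for i in range(k):
        pivot = a[i][i]
        a[i] = [v / pivot for v in a[i]]
        for r in range(k):
            if r != i:
                factor = a[r][i]
                a[r] = [vr - factor * vi for vr, vi in zip(a[r], a[i])]
    return [row[-1] for row in a]

random.seed(0)
n = 2000
usage = [random.gauss(0, 1) for _ in range(n)]           # assumed latent trait
pts = [2.0 * u + random.gauss(0, 1) for u in usage]      # high usage -> more points
tov = [0.5 * u + random.gauss(0, 0.3) for u in usage]    # high usage -> more turnovers
reb = [random.gauss(0, 1) for _ in range(n)]
# By construction a turnover hurts: its true weight is -1.5.
pm = [1.0 * p - 1.5 * t + 0.3 * r + random.gauss(0, 1)
      for p, t, r in zip(pts, tov, reb)]

short = ols([tov, reb], pm)       # scoring omitted
full = ols([tov, reb, pts], pm)   # scoring included
print("TOV weight, scoring omitted: ", round(short[0], 2))
print("TOV weight, scoring included:", round(full[0], 2))
```

In this toy setup the turnover weight comes out positive when scoring is left out and negative once it is included. Whether turnovers and scoring are actually positively related in a given data set is an empirical question.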

Re: Prediction with BoxScore totals
FWIW, Nick Collison is not super-elite on this measure as he was on other versions of APM and RAPM. In the +2 range in the 2010 and 2011 seasons, in the +1 range in 2012 and 2013.
Re: Prediction with BoxScore totals
bbstats wrote: The easiest proof of this is Turnovers. If your only input in a regression against +/- was Turnovers and Rebounds, for example, Turnovers would likely be positive!
Have you tested that? Well, likely not, because otherwise you wouldn't have brought up that example.
Correlation between PTS, AST and TO-R:
Correlations
                            TO-R    PTS     AST
Pearson Correlation  TO-R   1,000   -,262   -,052
                     PTS    -,262   1,000   ,237
                     AST    -,052   ,237    1,000
Sig. (1-tailed)      TO-R   .       ,000    ,136
                     PTS    ,000    .       ,000
                     AST    ,136    ,000    .
N                    TO-R   452     452     452
                     PTS    452     452     452
                     AST    452     452     452
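For readers who want to sanity-check the significance row, here is a rough sketch that recomputes one-tailed p-values from the reported correlations and N = 452, reading the commas as decimal separators. It uses a normal approximation to the t distribution, which is an assumption of this sketch and not necessarily what the original software did, so small differences from the table are expected.

```python
import math

def one_tailed_p(r, n):
    """Approximate one-tailed p-value for a Pearson correlation r from n pairs.

    Uses t = r * sqrt((n - 2) / (1 - r^2)) and treats t as standard normal,
    which is reasonable for n in the hundreds.
    """
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return 0.5 * math.erfc(abs(t) / math.sqrt(2.0))

n = 452
reported = [("TO-R vs PTS", -0.262), ("TO-R vs AST", -0.052), ("PTS vs AST", 0.237)]
for label, r in reported:
    print(label, round(one_tailed_p(r, n), 3))
```

The TO-R vs AST value lands close to the ,136 in the table, and the other two are effectively zero, in line with the ,000 entries.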
Also, you are wrong about the rest as well, because I compare the coefficients within the context.

bbstats wrote: This is also precisely why Wins Produced doesn't make any sense.
That is also wrong. WP doesn't make sense because it uses team formulas to calculate the marginal values while at the same time treating the game as if it were five distinct 1-on-1 games rather than one 5-on-5 game.
In that sense Jerry is falling into a similar trap here. That's why an FGA comes out as being more harmful than a turnover, even though a team's chance of scoring in a possession after a missed field goal is higher than 0, while it is exactly 0 if a turnover occurs.
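The argument that a shot attempt always leaves some chance of scoring, while a turnover leaves none, can be put in numbers. This is a back-of-the-envelope sketch: the 29% figure comes from earlier in the thread, while the offensive rebound rate and the value of a second chance are assumed purely for illustration.

```python
def expected_points_after_shot(make_prob, points=2.0, orb_rate=0.0, second_chance_value=0.0):
    """Expected team points from a possession that ends in a shot attempt.

    A miss is not necessarily the end: with probability `orb_rate` the offense
    rebounds and gets a further chance worth `second_chance_value` points.
    """
    return make_prob * points + (1.0 - make_prob) * orb_rate * second_chance_value

TURNOVER_VALUE = 0.0  # a turnover ends the possession with no points

# Assumed inputs: a 29% two-point shot, a 25% offensive rebound rate,
# and 1.0 expected points after an offensive rebound.
bad_shot = expected_points_after_shot(0.29, points=2.0, orb_rate=0.25, second_chance_value=1.0)
print(bad_shot, TURNOVER_VALUE)
```

Under these assumptions even a poor shot is worth well over half a point per possession at the team level, which is the gap between the event's value and its regression weight that this post is objecting to.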
Re: Prediction with BoxScore totals
Mystic,
It seems to me that you are drawing too many conclusions from the BOX weights. These weights, from what I can tell, do not say anything about the VALUE of a turnover or a missed field goal, and I do not believe you can extrapolate anything about coaching strategy either. It certainly would not make any sense to say that a minute played has a negative offensive value or that missing free throws will somehow turn you into Ben Wallace. Obviously a turnover is worse than a missed field goal; but, that is beside the point. The value of an event and its predictive power for the value of a player in a small range on the margins are two completely different concepts.
Re: Prediction with BoxScore totals
It's actually very easy to call into question whether a turnover should be considered as bad as a missed FGA for an individual player, even if not for a team. Turnovers may often occur when there is little to no chance of a decent shot coming off, and the guy turning it over is far from the only reason it's being turned over; he is just the guy who touched it last. Now contrast that with the idea that players often take shots when passes for possibly superior shots are available: suddenly a missed FGA is worse than merely the loss of an average opportunity.
I'm not saying this is the case, only that trying to theoretically derive box-score weights is, as far as I'm concerned, a fool's errand that I gave up some time ago.