Prediction with BoxScore totals

J.E.
Posts: 852
Joined: Fri Apr 15, 2011 8:28 am

Re: Prediction with BoxScore totals

Post by J.E. »

This new metric consisting of recursively informed RAPM+BoxScoreRating is now on my website as "xRAPM" and has replaced RAPM

There are full lists of players for each year, like here http://stats-for-the-nba.appspot.com/ratings/2007.html
and team pages with player ratings of '13, like this http://stats-for-the-nba.appspot.com/teams/OKC.html
DSMok1
Posts: 1119
Joined: Thu Apr 14, 2011 11:18 pm
Location: Maine

Re: Prediction with BoxScore totals

Post by DSMok1 »

Great!

Have you yet applied an aging curve to the priors for each player between seasons? That's critical in my opinion.
Developer of Box Plus/Minus
APBRmetrics Forum Administrator
Twitter.com/DSMok1
johannesdesilentio
Posts: 5
Joined: Fri Nov 16, 2012 12:57 am

Re: Prediction with BoxScore totals

Post by johannesdesilentio »

J.E. wrote:This new metric consisting of recursively informed RAPM+BoxScoreRating is now on my website as "xRAPM" and has replaced RAPM

There are full lists of players for each year, like here http://stats-for-the-nba.appspot.com/ratings/2007.html
and team pages with player ratings of '13, like this http://stats-for-the-nba.appspot.com/teams/OKC.html
JE, it seems like I'm the only downer here, which made me decide to re-register.

I guess first and foremost: Can you please keep RAPM data available on your site in addition to the xRAPM data?

I understand that you are focused on using RAPM-based techniques to build a single stat that gets closer to the holy grail, and I'm not against that. However, for those of us who are focused primarily on using various tools together to analyze a situation, the beauty of +/- data is in its unbiased orthogonality relative to the box score. So from my perspective right now, the site with the single best source on the internet of data not biased by the box score has gone and tossed that out and replaced it with something that really isn't the same thing.

If people here believe I'm not understanding the situation properly, please set me straight. Clearly though, box scores are being used, and now all of a sudden I see guys like McGrady & Paul who were loved by PER take a leap forward. Hard to imagine this is a coincidence.
v-zero
Posts: 520
Joined: Sat Oct 27, 2012 12:30 pm

Re: Prediction with BoxScore totals

Post by v-zero »

I feel like pointing out that RAPM is biased by design, so whilst it may not suffer the same lack of completeness as the box-score, it is biased in another way. I think the aim should be to minimize our errors in explaining future outcomes, and not worry so much about the 'purity' of our results. A good hack is a very valuable thing in stats; it can reveal a lot. However, I can understand you wanting RAPM as well as xRAPM.

And obviously, whilst +/- is informationally complete it is far from accurate.
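
To illustrate the "biased by design" point: RAPM is a ridge regression, and the ridge penalty pulls every estimate toward zero. Below is a minimal one-player sketch with made-up numbers; the helper `ridge_slope`, the data and the penalty value are assumptions for illustration, not J.E.'s actual code.

```python
def ridge_slope(x, y, lam):
    """One-variable ridge estimate without intercept: sum(x*y) / (sum(x*x) + lam)."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)

# Hypothetical stints: x = 1 when the player is on court,
# y = point margin per 100 possessions in that stint.
on_court = [1, 1, 1, 1]
margin = [6.0, 2.0, 5.0, 3.0]

unpenalized = ridge_slope(on_court, margin, lam=0.0)  # plain least squares
penalized = ridge_slope(on_court, margin, lam=4.0)    # ridge, shrunk toward 0
print(unpenalized, penalized)
```

The penalized estimate is deliberately biased toward zero in exchange for lower variance, which is the trade-off being discussed here.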
mystic
Posts: 470
Joined: Mon Apr 18, 2011 10:09 am

Re: Prediction with BoxScore totals

Post by mystic »

J.E. wrote:Here are the weights I found for offense and defense. Everything's scaled to "influence of height on offense"
Looking through your coefficients makes me wonder how "useful" they are. I have a big issue with results that imply an FGA is more harmful than a turnover, because that is per se not true. The rules of the game force teams to take a shot within 24 seconds; if they do not take that shot, they are penalized with a turnover. According to your results, it is better to throw the ball away or let the shot clock expire than to take a difficult shot with a success rate below 29%.

Also, converting your FT with a higher FT% is something to be considered negative, especially on defense? I have the impression that some of the variables you picked are statistically not significant.

I also agree with johannesdesilentio, that having pure RAPM would be useful.
J.E.
Posts: 852
Joined: Fri Apr 15, 2011 8:28 am

Re: Prediction with BoxScore totals

Post by J.E. »

DSMok1 wrote:Have you yet applied an aging curve to the priors for each player between seasons? That's critical in my opinion.
I will definitely add that to the site at some point, together with coach rating. Although I do believe that by including BoxScore stats the problem of not having an aging curve built in got a little smaller (but I'm not saying an aging curve isn't necessary anymore).
johannesdesilentio wrote:I guess first and foremost: Can you please keep RAPM data available on your site in addition to the xRAPM data?

I understand that you are focused on using RAPM-based techniques to build a single stat that gets closer to the holy grail, and I'm not against that. However, for those of us who are focused primarily on using various tools together to analyze a situation, the beauty of +/- data is in its unbiased orthogonality relative to the box score. So from my perspective right now, the site with the single best source on the internet of data not biased by the box score has gone and tossed that out and replaced it with something that really isn't the same thing.

If people here believe I'm not understanding the situation properly, please set me straight. Clearly though, box scores are being used, and now all of a sudden I see guys like McGrady & Paul who were loved by PER take a leap forward. Hard to imagine this is a coincidence.
My main goal is to build a metric that is best at forecasting future offensive efficiency; it is my belief that this is equivalent to building a metric that gives the most accurate player ratings. My goal is not to build a metric that further needs to be combined with other metrics, especially because I don't really believe in most other metrics except ASPM, ORtg/DRtg and LambdaPm (which is very similar to xRAPM).
As was already mentioned, RAPM was already biased, and it was biased in the direction of a worse, less accurate prior. The ratings were less accurate, and thus unfair to certain players. If some players took a leap forward through xRAPM, it most likely means they were unfairly underrated in RiRAPM and, in turn, their teammates were overrated. Further, all ratings are estimates and never represent actual truth. I'm just trying to improve those estimates. If you want hard-fact +/- stats you should probably look at simple +/- and On/Off.

xRAPM just continues with the thought of improving out-of-sample prediction, which (probably) was the reason RAPM was built as a replacement for APM.

Also, please realize that a BoxScore metric that is at least top 3 is helping to build the priors, and that the estimates given by RiRAPM were most often further away from the truth than the xRAPM estimates.
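
The general idea of shrinking toward an informed prior instead of toward zero can be sketched as a ridge regression on what is left after subtracting the prior's prediction. This is an illustration under assumptions, not the actual xRAPM code: the stint data, the prior values, the penalty `lam` and the helper names `solve` and `ridge_with_prior` are all made up.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ridge_with_prior(X, y, prior, lam):
    """Minimize ||y - X b||^2 + lam * ||b - prior||^2 and return b."""
    p = len(prior)
    # What the prior fails to explain in each stint
    resid = [yi - sum(xij * pj for xij, pj in zip(row, prior))
             for row, yi in zip(X, y)]
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xtr = [sum(row[i] * ri for row, ri in zip(X, resid)) for i in range(p)]
    delta = solve(xtx, xtr)  # ridge-shrunk correction to the prior
    return [pj + dj for pj, dj in zip(prior, delta)]

# Two hypothetical teammates who are always on the floor together
X = [[1, 1], [1, 1], [1, 1]]
y = [4.0, 6.0, 5.0]

no_prior = ridge_with_prior(X, y, prior=[0.0, 0.0], lam=1.0)
with_prior = ridge_with_prior(X, y, prior=[3.0, 0.0], lam=1.0)
print(no_prior, with_prior)
```

With a zero prior the two teammates cannot be told apart and get identical ratings; with the prior, the split between them follows the outside information, so one goes up and the other comes down.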
mystic wrote:Also, converting your FT with a higher FT% is something to be considered negative, especially on defense?
I'd have no problem with saying that shooting FTs at a bad % is certainly an indicator for being a good defender.
mystic wrote:I have the impression that some of the variables you picked are statistically not significant.
I've taken measures to avoid overfitting. Those coefficients that might not be significant should at least be close to 0.
mystic
Posts: 470
Joined: Mon Apr 18, 2011 10:09 am

Re: Prediction with BoxScore totals

Post by mystic »

J.E. wrote:I'd have no problem with saying that shooting FTs at a bad % is certainly an indicator for being a good defender.
Well, your coefficients are actually saying the opposite overall. That wasn't really the point, just that I think FT% is redundant, when you have FT and FTA in it as well.
J.E. wrote:I've taken measures to avoid overfitting. Those coefficients that might not be significant should at least be close to 0
I don't think that this is true at all. I can probably present you multiple examples of regressions in which a non-significant variable had actually a value substantially higher than 0. No idea what kind of "measures to avoid overfitting" were used, but looking through your picked variables and the coefficients they are rather close to the coefficients you would get, if you ran a regression on team overall boxscore totals. And when I do that, I have multiple variables being not significant for offense and others for defense while having a bigger coefficient than other significant variables.

Including height is a good thing, but I would guess (at least that is my result) that height over average height for the position gives a better prediction. 6'9 is taller than league average, but 6'9 for a center is actually below average for that position. But that would mean you need to assign each player a position, which can be a tough job, especially for those who essentially play different positions on offense and defense.
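
A small sketch of the "height over average height for position" variable, assuming each player has already been assigned a single position. The player list and the helper `height_over_position` are hypothetical.

```python
from collections import defaultdict

def height_over_position(players):
    """Return each player's height minus the mean height at his listed position (inches)."""
    by_pos = defaultdict(list)
    for _name, pos, height in players:
        by_pos[pos].append(height)
    mean = {pos: sum(hs) / len(hs) for pos, hs in by_pos.items()}
    return {name: height - mean[pos] for name, pos, height in players}

# Hypothetical players: (name, position, height in inches)
players = [
    ("Center A", "C", 81),  # 6'9"
    ("Center B", "C", 85),
    ("Guard A", "G", 75),
    ("Guard B", "G", 77),
]
rel = height_over_position(players)
print(rel)
```

In this toy sample the 6'9" center comes out below his positional average even though he is the second-tallest player overall.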
Mike G
Posts: 6145
Joined: Fri Apr 15, 2011 12:02 am
Location: Asheville, NC

Re: Prediction with BoxScore totals

Post by Mike G »

mystic wrote:... implying that a FGA is more harmful than a turnover, because that is per se not true. ..
You'll always get counter-intuitive correlations when you double-count FG and FGA; or triple-count, with FG%.

Using discrete events should always be preferred: FG made and FG missed, for example.
Don't worry about the fact that FGX aren't explicitly in the box score. It's just FGA - FG, y'know.
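
Since FGA = FG + FGX, a linear rating with weights on FG and FGA can be rewritten exactly in terms of makes and misses: a*FG + b*FGA = (a + b)*FG + b*FGX. A quick sketch with made-up weights (the numbers and the helper `to_discrete_weights` are only for illustration):

```python
def to_discrete_weights(w_fg, w_fga):
    """Convert weights on (FG, FGA) into equivalent weights on (FG made, FG missed)."""
    w_made = w_fg + w_fga  # a make counts as one FG and one FGA
    w_missed = w_fga       # a miss counts only as one FGA
    return w_made, w_missed

# Hypothetical weights, chosen only for illustration
w_fg, w_fga = 1.5, -0.7
w_made, w_missed = to_discrete_weights(w_fg, w_fga)

# Both parameterizations give the same rating for any stat line
fg, fga = 8, 18
fgx = fga - fg
old_form = w_fg * fg + w_fga * fga
new_form = w_made * fg + w_missed * fgx
print(w_made, w_missed, old_form, new_form)
```

If FG% is also in the model, this simple identity no longer captures the whole effect of a shot, which is where the triple-counting comes in.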
mystic
Posts: 470
Joined: Mon Apr 18, 2011 10:09 am

Re: Prediction with BoxScore totals

Post by mystic »

Mike, I agree that using exclusive events like 2pt field goals made, 2pt field goals missed, 3pt field goals made, etc. is the way to go. But not using them is not what causes the "counter-intuitive" results. It is also not just a matter of intuition that a turnover is more harmful than even a missed field goal; it is simply a fact. The results are not showing that because players, coaches, etc. are aware of the shot clock, so we no longer have an unbiased sample. If nobody knew about the shot clock while it still existed, we would have more turnovers and fewer "bad" shots, and the results of the regression would shift. The rules of the game force players to take late, bad shots on purpose, while nobody lets the shot clock expire on purpose. Yet the latter is what the regression results imply to be the better strategy in some cases.
bbstats
Posts: 227
Joined: Thu Apr 21, 2011 8:25 pm
Location: Boone, NC

Re: Prediction with BoxScore totals

Post by bbstats »

WOWSA. This is a game-changer.
bbstats
Posts: 227
Joined: Thu Apr 21, 2011 8:25 pm
Location: Boone, NC

Re: Prediction with BoxScore totals

Post by bbstats »

Re: Mystic.

It is simply impossible to pull out two variables from a multivariable regression and compare them without taking into context the other variables.

In fact, even looking at one variable and saying it "costs x" is a fallacy. Statistics correlate with other statistics.

The easiest proof of this is Turnovers. If your only input in a regression against +/- was Turnovers and Rebounds, for example, Turnovers would likely be positive! This is because players with higher turnover rate tend to score more points and assist the ball more. It doesn't mean that a player who turns the ball over 10 times per 100 possessions is necessarily better than one who does it at 5 times per 100 possessions. It just means that while we are holding other things equal, players with a tendency to turn it over also tend to be more active in helping put points on the board.
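
The mechanism described here is omitted-variable bias, and it can be reproduced on synthetic data. The sketch below is not fit to NBA data: the data-generating process, in which turnovers rise with a player's offensive load, is an assumption built into the simulation, and the helpers `solve` and `ols` are hypothetical.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ols(columns, y):
    """Least-squares coefficients of y on the columns, with the intercept returned first."""
    X = [[1.0] + list(row) for row in zip(*columns)]
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

random.seed(0)
n = 2000
load = [random.gauss(0, 1) for _ in range(n)]         # latent offensive load (assumed)
pts = [u + random.gauss(0, 0.3) for u in load]        # scoring rises with load
tov = [0.8 * u + random.gauss(0, 0.6) for u in load]  # so do turnovers
reb = [random.gauss(0, 1) for _ in range(n)]
# True effects in this toy world: points help, turnovers hurt
pm = [1.0 * p - 0.5 * t + 0.3 * r + random.gauss(0, 1)
      for p, t, r in zip(pts, tov, reb)]

short = ols([tov, reb], pm)      # points omitted
full = ols([pts, tov, reb], pm)  # points included
print("TOV coefficient, points omitted: ", round(short[1], 2))
print("TOV coefficient, points included:", round(full[2], 2))
```

In this setup the turnover coefficient is positive when points are left out (about +0.3 in expectation) and negative once they are included (about -0.5). Whether real NBA data behave like the simulated data is an empirical question.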

This is also precisely why Wins Produced doesn't make any sense. :)
Crow
Posts: 10536
Joined: Thu Apr 14, 2011 11:10 pm

Re: Prediction with BoxScore totals

Post by Crow »

FWIW, Nick Collison is not super-elite on this measure as he was on other versions of APM and RAPM. In the +2 range in the 2010 and 2011 seasons, in the +1 range in 2012 and 2013.
mystic
Posts: 470
Joined: Mon Apr 18, 2011 10:09 am

Re: Prediction with BoxScore totals

Post by mystic »

bbstats wrote: The easiest proof of this is Turnovers. If your only input in a regression against +/- was Turnovers and Rebounds, for example, Turnovers would likely be positive!
Have you tested that? Well, likely not, because otherwise you wouldn't have brought up that example.

Correlation between PTS, AST and TO-R:


Correlations
                             TO-R      PTS      AST
Pearson Correlation  TO-R    1.000    -.262    -.052
                     PTS     -.262    1.000     .237
                     AST     -.052     .237    1.000
Sig. (1-tailed)      TO-R     .        .000     .136
                     PTS      .000     .        .000
                     AST      .136     .000     .
N                    TO-R    452      452      452
                     PTS     452      452      452
                     AST     452      452      452

There is even a negative correlation between points and turnover rate, as well as a negative (though not significant) one between assists and turnover rate. (Data from the 2010/11 season.)
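
The one-tailed significance values in the table can be approximately recomputed from r and N alone. This is a sketch that assumes a normal approximation to the t distribution (reasonable at N = 452), so it will not match the package output to the last digit; the helper `one_tailed_p` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_tailed_p(r, n):
    """Approximate one-tailed p-value for a Pearson correlation r from n observations.

    Uses t = r * sqrt((n - 2) / (1 - r^2)) and treats t as standard normal,
    which is a reasonable approximation for n in the hundreds.
    """
    t = r * sqrt((n - 2) / (1 - r * r))
    return NormalDist().cdf(-abs(t))

p_pts = one_tailed_p(-0.262, 452)  # TO-R vs PTS
p_ast = one_tailed_p(-0.052, 452)  # TO-R vs AST
print(round(p_pts, 3), round(p_ast, 3))
```

The results land close to the 0.000 and 0.136 shown for TO-R in the table: the relationship with points is clearly negative, while the one with assists cannot be distinguished from zero at conventional levels.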

Also, you are wrong about the rest as well, because I compare the coefficients within the context. ;)
bbstats wrote: This is also precisely why Wins Produced doesn't make any sense. :)
That is also wrong. WP doesn't make sense because it uses team formulas to calculate the marginal values while at the same time treating the game as if it were five distinct 1-on-1 games rather than one 5-on-5 game.

In that sense, Jerry is falling into a similar trap here. That's why an FGA comes out as more harmful than a turnover, even though a team's chance of scoring on a possession is higher than 0 after a missed field goal and exactly 0 after a turnover.
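
The possession arithmetic behind that last point can be sketched with made-up inputs. The offensive-rebound rate and the points expected after an offensive rebound are assumptions for illustration, not measured league values, and `expected_points_from_shot` is a hypothetical helper.

```python
def expected_points_from_shot(fg_pct, points=2, oreb_rate=0.25, pts_after_oreb=1.0):
    """Expected team points when a possession ends in a shot attempt.

    A make scores `points`; a miss is rebounded by the offense with
    probability `oreb_rate`, after which the team expects `pts_after_oreb`.
    """
    return fg_pct * points + (1 - fg_pct) * oreb_rate * pts_after_oreb

EXPECTED_POINTS_FROM_TURNOVER = 0.0  # the possession ends without a shot

bad_shot = expected_points_from_shot(0.29)  # a difficult 29% two-point attempt
print(bad_shot, EXPECTED_POINTS_FROM_TURNOVER)
```

Under these assumptions even a 29% shot is worth well above zero points to the team, so at the team level it cannot be worse than giving the ball away.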
KAN
Posts: 10
Joined: Thu Oct 18, 2012 2:44 pm

Re: Prediction with BoxScore totals

Post by KAN »

Mystic,

It seems to me that you are drawing too many conclusions from the BOX weights. These weights, from what I can tell, do not say anything about the VALUE of a turnover or a missed field goal, and I do not believe you can extrapolate anything about coaching strategy either. It certainly would not make any sense to say that a minute played has a negative offensive value or that missing free throws will somehow turn you into Ben Wallace. Obviously a turnover is worse than a missed field goal, but that is beside the point. The value of an event and its predictive power for the value of a player in a small range on the margins are two completely different concepts.
v-zero
Posts: 520
Joined: Sat Oct 27, 2012 12:30 pm

Re: Prediction with BoxScore totals

Post by v-zero »

It's actually very easy to call into question whether a turnover should be considered as bad as a missed FGA for an individual player, whilst not questioning it for a team. Maybe turnovers often occur when there is little to no chance of a decent shot coming off, and the guy turning it over is far from the only reason it's being turned over, just the guy who touched it last. Now contrast that with the idea that players often take shots when passes for possibly superior shots are available: suddenly a missed FGA is worse than merely the loss of an average opportunity.

I'm not saying this is the case, only that trying to theoretically derive box-score weights is, as far as I'm concerned, a fool's errand that I gave up some time ago.