APBRmetrics

The discussion of the analysis of basketball through objective evidence, especially basketball statistics.
It is currently Sat Sep 23, 2017 12:21 am

All times are UTC




[ 88 posts ]
PostPosted: Sun Sep 09, 2012 1:29 pm 

Joined: Fri Apr 15, 2011 8:28 am
Posts: 802
I wanted to see which BoxScore stats predict 5on5 matchups in the following season the best, so I did this:

Using BoxScore totals, find the weighting scheme that gives the lowest MSE between expected and actual points per possession for every 5on5 matchup in the following season (ignoring matchups that contain players we've never seen before).

So, '07 BoxScore totals were used to create player values which then were used to predict '08.
'08 BoxScore totals were used to predict '09, and '09 was used to predict '10 (I stopped here)

The goal was to find the weighting scheme that minimizes prediction error for all 3 forecasted seasons ('08+'09+'10) combined.
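The post doesn't spell out how a matchup's expected points per possession are built from the ten player ratings. A minimal sketch of the objective, assuming (hypothetically) an additive model: a baseline plus the five offensive ratings minus the five defensive ratings, in points per 100 possessions, with each matchup's squared error weighted by its possessions. `Matchup`, `baseline` and the toy ratings are illustrative names, not J.E.'s code, and the search over box score weights that would minimize this number is not shown.

```python
from dataclasses import dataclass


@dataclass
class Matchup:
    offense: list[str]   # five offensive player ids
    defense: list[str]   # five defensive player ids
    possessions: int
    points: int


def expected_per_100(m, off_rating, def_rating, baseline):
    # Assumed additive model: offensive ratings add points, defensive ratings remove them.
    return (baseline
            + sum(off_rating[p] for p in m.offense)
            - sum(def_rating[p] for p in m.defense))


def weighted_mse(matchups, off_rating, def_rating, baseline):
    """Possession-weighted MSE of expected vs actual points per 100 possessions."""
    total_err, total_poss = 0.0, 0
    for m in matchups:
        # Skip matchups containing a player with no prior-season rating.
        if any(p not in off_rating for p in m.offense) or any(p not in def_rating for p in m.defense):
            continue
        actual = 100.0 * m.points / m.possessions
        err = expected_per_100(m, off_rating, def_rating, baseline) - actual
        total_err += m.possessions * err ** 2
        total_poss += m.possessions
    return total_err / total_poss if total_poss else float("nan")


off = {p: 0.0 for p in "ABCDE"}
dfn = {p: 0.0 for p in "VWXYZ"}
games = [Matchup(list("ABCDE"), list("VWXYZ"), 10, 11),
         Matchup(list("ABCDQ"), list("VWXYZ"), 10, 0)]   # "Q" is unseen, so this one is skipped
print(weighted_mse(games, off, dfn, baseline=100.0))  # 100.0
```

Pooling the '08, '09 and '10 matchups (each rated with the prior season's box score values) into one call gives the combined error the post describes minimizing.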

For offense, I came up with
(0.000007minutes+0.27FGM-0.06FGA+0.25 3PM+0.14 3PA+0.34 FTM-0.07FTA+0.38OReb-0.03DReb+0.02TReb+0.25ASS+0.06Steal-0.02Block-0.3Turnover+0.1Foul-0.02Points-0.08)*100
and defense
(0.000009minutes+0.1FGM-0.02FGA+0.3 3PM-0.06 3PA+0.04 FTM+0.02FTA+0.0 OReb+0.22DReb+0.0TReb+0.02ASS+0.8Steal+0.37Block-0.2Turnover-0.04Foul-0.03Points-0.06)*100

(everything but minutes is per minute)

Players with <200 minutes in a season had their rating influenced by minutes only.
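A sketch of how the first offensive equation above turns season totals into a rating. Assumptions: `ASS` and `Foul` are renamed `AST` and `PF`; every stat except minutes is divided by minutes, as stated; and "influenced by minutes only" is read as keeping just the minutes term and the intercept for players under 200 minutes. The dictionary layout and the `rating` function are hypothetical. The -0.08 intercept is why, as J.E. notes later in the thread, everyone starts out around -8 on offense.

```python
# First offensive equation as posted; all stats except MIN are per minute.
OFF_COEF = {
    "MIN": 0.000007, "FGM": 0.27, "FGA": -0.06, "3PM": 0.25, "3PA": 0.14,
    "FTM": 0.34, "FTA": -0.07, "OREB": 0.38, "DREB": -0.03, "TREB": 0.02,
    "AST": 0.25, "STL": 0.06, "BLK": -0.02, "TOV": -0.3, "PF": 0.1, "PTS": -0.02,
}
OFF_INTERCEPT = -0.08
MIN_MINUTES = 200   # below this, only minutes count (assumed reading of the post)


def rating(totals, coef=OFF_COEF, intercept=OFF_INTERCEPT):
    """Rating per 100 possessions from season totals (hypothetical helper)."""
    minutes = totals["MIN"]
    value = intercept + coef["MIN"] * minutes
    if minutes >= MIN_MINUTES:
        for stat, w in coef.items():
            if stat != "MIN":
                value += w * totals.get(stat, 0) / minutes   # per-minute rate
    return 100 * value


print(rating({"MIN": 2000, "FGM": 400, "FGA": 850, "AST": 300, "TOV": 150}))
```

The defensive equation plugs in the same way with its own coefficient dictionary and its -0.06 intercept.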

Player ratings are here
http://stats-for-the-nba.appspot.com/PBP/ranking09.html
http://stats-for-the-nba.appspot.com/PBP/ranking08.html
http://stats-for-the-nba.appspot.com/PBP/ranking07.html

In the end, I'm probably going to use this as priors for RAPM, but first I'll have to check whether "BoxScoreTotals informed RAPM" outperforms "RAPM informed RAPM" in forecasting.

The player ratings look mostly OK to me, so I guess it passes the smell test. Obviously it's not without some weird names at the top.
'07 has Arenas #1 (back then he had a 24 PER)
'08 has Camby at #3, Stoudemire #6, Biedrins ~#15
'09 has Troy Murphy #7, Nate Robinson #12

_________________
http://stats-for-the-nba.appspot.com/


PostPosted: Sun Sep 09, 2012 7:26 pm 

Joined: Fri Apr 15, 2011 12:37 am
Posts: 280
I assume you ran this in a regression framework? How did you get it to converge when including offensive, defensive, and total rebounds?


PostPosted: Sun Sep 09, 2012 7:26 pm 

Joined: Thu Apr 14, 2011 11:10 pm
Posts: 4604
There are some interesting weights in there. Missed FGs and missed FTs hardly seem to matter at all for the offensive player rating. The "break-even point" for shooting efficiency (purely for the offensive player, not the net break-even point with the defender) appears to be far below even Hollinger's. Assists get a 0.25 weight here, in contrast to many systems that give them far more. Defensive rebounds are worth about 60% as much to the defense score as an offensive rebound is worth to the offense score. If one were comparing these results to PER and WP, PER is closer to them on shot efficiency and WP is closer on defensive rebounds. Then there is the impact of double-entry accounting of all these boxscore stats, compared with systems that don't apply double-entry accounting completely.
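The per-attempt break-even make rate can be computed from the first offensive equation: a shot type breaks even when make_weight * p + attempt_weight = 0, i.e. p = -attempt_weight / make_weight. This sketch also charges the PTS coefficient (-0.02 per point) to each make and the TReb term to each offensive rebound, and ignores the intercept; because every stat is divided by the same minutes, the per-minute scaling cancels out of the ratios. The helper name is hypothetical.

```python
def break_even(make_w, attempt_w):
    """Make rate p where make_w * p + attempt_w = 0 for one shot type.
    Returns None when attempts are not penalized at all (attempt_w >= 0)."""
    if attempt_w >= 0:
        return None
    return -attempt_w / make_w


PTS = -0.02                          # first offensive equation, per point scored
two_make = 0.27 + 2 * PTS            # FGM term plus two points
three_make = 0.27 + 0.25 + 3 * PTS   # FGM and 3PM terms plus three points
ft_make = 0.34 + PTS                 # FTM term plus one point
two_att = -0.06                      # FGA term
three_att = -0.06 + 0.14             # FGA and 3PA terms
ft_att = -0.07                       # FTA term

print(f"2PT break-even: {break_even(two_make, two_att):.1%}")   # 26.1%
print(f"3PT break-even: {break_even(three_make, three_att)}")    # None: attempts add value
print(f"FT break-even:  {break_even(ft_make, ft_att):.1%}")
# DReb (0.22 + 0.0 TReb) on defense vs OReb (0.38 + 0.02 TReb) on offense
print(f"DReb/OReb ratio: {0.22 / (0.38 + 0.02):.2f}")           # 0.55
```

Counting the TReb term puts the rebound ratio a little below the 60% quoted, and counting PTS raises the two-point break-even somewhat; both are still low.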


PostPosted: Sun Sep 09, 2012 10:23 pm 

Joined: Fri Apr 15, 2011 8:28 am
Posts: 802
xkonk wrote:
I assume you ran this in a regression framework? How did you get it to converge when including offensive, defensive, and total rebounds?

I've removed total rebounds and total points now; it didn't really make much sense to include them.
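Why including all of them causes trouble: TReb = OReb + DReb exactly (and PTS = 2*FGM + 3PM + FTM when FGM includes threes), so a design matrix containing all of those columns is rank-deficient and ordinary least squares has no unique solution. A small sketch with made-up totals, using exact elimination so the rank isn't blurred by rounding; dividing every column by minutes doesn't change the identity. The `rank` helper is illustrative.

```python
from fractions import Fraction


def rank(rows):
    """Matrix rank via exact Gaussian elimination (Fractions avoid rounding noise)."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    n_cols = len(m[0]) if m else 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue                      # no pivot: this column depends on earlier ones
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


# Made-up season totals for four players: (OReb, DReb).
players = [(50, 200), (120, 310), (30, 90), (80, 150)]
without_treb = [[o, d] for o, d in players]
with_treb = [[o, d, o + d] for o, d in players]   # TReb = OReb + DReb

print(rank(without_treb))  # 2: one independent column per coefficient
print(rank(with_treb))     # 2: three columns but only two independent ones
```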

I've run it again, this time predicting '08-'12. That should help reduce overfitting. (I just need to keep in mind that I can't use these coefficients to test 'BoxScore informed RAPM' on predictions of anything before '13, since those seasons were used to fit them.)

Off:
(0.000015minutes+0.18FGM-0.03FGA+0.32 3PM+0.03 3PA+0.24 FTM-0.01FTA+0.26OReb-0.03DReb+0.2ASS-0.06Steal-0.18Block-0.42Turnover+0.22Foul-0.08)*100
Just foul a lot and you're gonna be a good offensive player next season!

Def:
(0.00001minutes+0.1FGM-0.03FGA+0.12 3PM-0.03 3PA+0.04 FTM+0.05FTA-0.07 OReb+0.19DReb+0.04ASS+0.67Steal+0.42Block-0.28Turnover-0.05)*100

Guards/small players tend to dominate the offensive ranking, while on defense things are reversed. Together, things seem to even out (roughly equal numbers of each position ranked near the top).

http://stats-for-the-nba.appspot.com/PBP/ranking12.html
http://stats-for-the-nba.appspot.com/PBP/ranking11.html
http://stats-for-the-nba.appspot.com/PBP/ranking10.html
http://stats-for-the-nba.appspot.com/PBP/ranking09.html
http://stats-for-the-nba.appspot.com/PBP/ranking08.html
http://stats-for-the-nba.appspot.com/PBP/ranking07.html

It sure likes Chris Paul.

Some odd names remain at the top:
-Likes Kevin Martin too much for my taste.
-Barbosa top 20 in '07
-Amare #2 in '08
-Kidd #6 in '09, Murphy top 10, Nate Robinson top 20
-Kidd again top 10 in '10, Murphy top 15
-Love #3 in '11, Stoudemire top 10
-'12 looks the weirdest. Love #3, DeMarcus Cousins #4, Greg Monroe top 15, Wall top 20, Humphries top 30



PostPosted: Sun Sep 09, 2012 10:28 pm 

Joined: Thu Apr 14, 2011 11:10 pm
Posts: 4604
Shot attempts, especially missed shots, matter even less than in the previous equation, as do assists. Defensive rebounds are now worth a slightly higher share to the defense of what an offensive rebound is worth to the offense.


PostPosted: Sun Sep 09, 2012 10:58 pm 

Joined: Fri Apr 15, 2011 12:37 am
Posts: 280
Free throws are extremely predictive of positive play next year. On offense you can break even as long as you go 1 for 25 and on defense you don't even need to make them; attempts alone are positive.
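A quick check of the free-throw arithmetic with the second set of coefficients (offense FTM 0.24, FTA -0.01; defense FTM 0.04, FTA 0.05). Raw counts are used instead of per-minute rates, which is fine for comparing signs because both terms are divided by the same minutes; the function name is hypothetical.

```python
def ft_contribution(made, attempts, ftm_w, fta_w):
    """Free-throw part of a rating (before the x100 scaling), from raw counts."""
    return ftm_w * made + fta_w * attempts


OFF = (0.24, -0.01)   # second offensive equation: FTM, FTA
DEF = (0.04, 0.05)    # second defensive equation: FTM, FTA

print(ft_contribution(1, 25, *OFF))   # about -0.01: 1-for-25 roughly breaks even
print(ft_contribution(1, 6, *OFF))    # about +0.18: well above break-even
print(ft_contribution(0, 10, *DEF))   # about +0.5: attempts alone help the defense score
print(f"offensive break-even FT%: {-OFF[1] / OFF[0]:.1%}")   # 4.2%, about 1 in 24
```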

Are FGM and FGA all shots, from which you would adjust the 3PM and 3PA, or are they two-pointers only?


PostPosted: Sun Sep 09, 2012 11:16 pm 

Joined: Fri Apr 15, 2011 8:28 am
Posts: 802
xkonk wrote:
On offense you can break even as long as you go 1 for 25

Now now, you'll have to go 1 for 6!
(I don't think the term "break even" is correct here, because everyone starts as a -8 offensive player and has to work their way up)

Quote:
Are FGM and FGA all shots, from which you would adjust the 3PM and 3PA, or are they two-pointers only?
FGM and FGA are all shots; I should probably separate those.



PostPosted: Sun Sep 09, 2012 11:25 pm 

Joined: Thu Apr 14, 2011 11:10 pm
Posts: 4604
J.E. wrote:
(I don't think the term "break even" is correct here, because everyone starts as a -8 offensive player and has to work their way up)


That is a major detail, necessary for understanding the results.



Can you readily compute the average boxscore stats of players with overall ratings between +0.5 and -0.5 (and between +2.5 and +1.5, or -1.5 and -2.5)? If so, that would be interesting to see.
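A sketch of the table being asked for, assuming "overall rating" means offense plus defense and that each player record carries its box score stats. The players and numbers below are invented purely to show the mechanics; `band_averages` is a hypothetical helper.

```python
from statistics import mean


def band_averages(players, low, high, stats):
    """Average box score stats for players whose overall rating lies in [low, high]."""
    in_band = [p for p in players if low <= p["rating"] <= high]
    if not in_band:
        return {}
    return {s: mean(p[s] for p in in_band) for s in stats}


# Invented players: overall rating (offense + defense) and per-36-minute stats.
players = [
    {"rating": 0.3, "PTS": 14.0, "REB": 5.0},
    {"rating": -0.4, "PTS": 10.0, "REB": 7.0},
    {"rating": 2.0, "PTS": 20.0, "REB": 6.0},
    {"rating": -2.2, "PTS": 8.0, "REB": 3.0},
]
for low, high in [(-0.5, 0.5), (1.5, 2.5), (-2.5, -1.5)]:
    print((low, high), band_averages(players, low, high, ["PTS", "REB"]))
```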


PostPosted: Mon Sep 10, 2012 9:54 am 

Joined: Fri Apr 15, 2011 12:02 am
Posts: 3828
Location: Asheville, NC
Quote:
FGM and FGA are all shots, I should probably separate those

Quote:
I removed total rebounds and total points now, didn't really make too much sense to include them

And then why not separate all discrete events? -- A missed shot (FGX, FTX) is more specific than an attempted shot (FGA, FTA), which includes made shots.
A 3pt miss may have different impact than a 2pt miss, and certainly shouldn't be conflated with 3pt makes (via 'attempts').

3FG, 2FG, FT, 3FGX, 2FGX, FTX are the 6 discrete shooting 'events'.

Boxscore events are distinguished from boxscore entries: 2FGA = FGA - 3FGA. 2FGX = 2FGA - 2FG.
These raw numbers are just as valid as those which arbitrarily appear in the box scores.
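The derivation described above, written out as a small helper that turns the standard box score entries into the six mutually exclusive shooting events. It assumes FGM/FGA include three-pointers, as J.E. confirms above; the function name is hypothetical.

```python
def shooting_events(fgm, fga, tpm, tpa, ftm, fta):
    """Split box score entries into the six mutually exclusive shooting events.
    Assumes FGM/FGA include three-pointers, as in a standard NBA box score."""
    two_fg, two_fga = fgm - tpm, fga - tpa      # 2FGA = FGA - 3FGA
    return {
        "2FG": two_fg, "3FG": tpm, "FT": ftm,
        "2FGX": two_fga - two_fg,               # 2FGX = 2FGA - 2FG
        "3FGX": tpa - tpm, "FTX": fta - ftm,
    }


print(shooting_events(fgm=8, fga=17, tpm=2, tpa=6, ftm=5, fta=6))
# {'2FG': 6, '3FG': 2, 'FT': 5, '2FGX': 5, '3FGX': 4, 'FTX': 1}
```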


PostPosted: Mon Sep 10, 2012 2:58 pm 

Joined: Thu Apr 14, 2011 11:18 pm
Posts: 813
Location: Maine
I'm interested to see how your numbers will fare when compared with ASPM, which was also built expressly to model APM/RAPM.

I would recommend using the 8-year equally-weighted RAPM that you compiled for me to compile the box-score weights...

Are you including regression to the mean here, as in RAPM?

_________________
Developer of Box Plus/Minus
APBRmetrics Forum Administrator
GodismyJudgeOK.com/DStats/
Twitter.com/DSMok1


PostPosted: Mon Sep 10, 2012 3:45 pm 

Joined: Fri Apr 15, 2011 12:37 am
Posts: 280
J.E. wrote:
xkonk wrote:
On offense you can break even as long as you go 1 for 25

Now now, you'll have to go 1 for 6!
(I don't think the term "break even" is correct here, because everyone starts as a -8 offensive player and has to work their way up)


Did I misread your post? I think it says .24 for FTM and -.01 for FTA. So if I have 25 tries and make only one, I lose .25 points on the attempts but gain .24 for the one make, which is roughly break-even. No? And on defense both FTM and FTA are positive, so you literally don't even have to make free throws; simply getting to the line is a good thing.


PostPosted: Mon Sep 10, 2012 3:47 pm 

Joined: Thu Apr 14, 2011 11:24 pm
Posts: 331
J.E. wrote:
In the end, I'm probably going to use this as priors for RAPM, but I'll have to check first if "BoxScoreTotals informed RAPM" outperforms "RAPM informed RAPM" in forecasting...
Jeremias, I am interested in learning what your priors are about priors.

First of all, are you of the belief that this box score approach will outperform the year n-1 RAPM prior, and if so, why? I don't know if this intuition is correct, but my sense is that it won't, in that it is effectively an attempt to reestablish a baseline that is at variance with what APM/RAPM is "designed" to generate: estimates inclusive of all the little things not accounted for in box scores. And then there is the special, relative weakness of box score methods in measuring defensive performance.

And then a concern about this:
J.E. wrote:
Guards/small players tend to dominate the offensive ranking, while on defense things are reversed. Together, things seem to even out (equal amounts of each position ranked near the top)...
Isn't one of the insights from APM/RAPM that not all positions are equal? That players labeled PG, on average, have negative APM/RAPM, whereas "bigger players" don't. By building an offensive bias into the prior (which is what a box score approach yields, I think), you discount this "true" effect.

Finally, another observation is that the distribution of +/- results from the box score regressions appears to be flatter than that generated by the RAPM priors.

All this aside, I am hoping that an improved prior can be formulated, at least for the initial year of the chain. From your posted results, it seems that using an RAPM prior for year one (i.e. using 2001-02 for 2002-03) imposes a three year timetable for "reasonable" results to be obtained: reasonable being defined as those yielding a consistent year on year distribution of results in the upper tail. Perhaps instead a box score result normalized by position averages?
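One simple reading of "a box score result normalized by position averages" is to subtract each position's mean rating from every player at that position. A sketch with made-up ratings; the unweighted mean is an assumption (a minutes-weighted mean would be just as plausible), and the helper name is hypothetical.

```python
from collections import defaultdict


def normalize_by_position(ratings):
    """Subtract each position's average rating, so every position averages zero.
    `ratings` maps player -> (position, box score rating)."""
    by_pos = defaultdict(list)
    for pos, r in ratings.values():
        by_pos[pos].append(r)
    pos_mean = {pos: sum(rs) / len(rs) for pos, rs in by_pos.items()}
    return {p: r - pos_mean[pos] for p, (pos, r) in ratings.items()}


ratings = {"A": ("PG", 3.0), "B": ("PG", 1.0), "C": ("C", -1.0), "D": ("C", 0.0)}
print(normalize_by_position(ratings))  # {'A': 1.0, 'B': -1.0, 'C': -0.5, 'D': 0.5}
```

Full normalization also removes any genuine positional difference of the kind raised above, so it trades one bias for another.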

And then, of course, there is the long-awaited and much-anticipated incorporation of aging curves.


PostPosted: Mon Sep 10, 2012 4:07 pm 

Joined: Thu Apr 14, 2011 11:18 pm
Posts: 813
Location: Maine
schtevie, I would anticipate an informed prior significantly outperforming a prior of a single number for each player, for any test you choose.

It would, however, also bias the results toward box-score performers. I would recommend J.E. do a prior based purely on minutes played and other non-box-score data for an alternative, non-biased, RAPM.

Another thing I think would improve performance significantly for multi-year RAPM (like the current version based on last year's RAPM) would be the addition of aging curves (properly generated to minimize error).



PostPosted: Mon Sep 10, 2012 8:06 pm 

Joined: Thu Apr 14, 2011 11:35 pm
Posts: 153
Mike G wrote:
And then why not separate all discrete events? -- A missed shot (FGX, FTX) is more specific than an attempted shot (FGA, FTA), which includes made shots.
A 3pt miss may have different impact than a 2pt miss, and certainly shouldn't be conflated with 3pt makes (via 'attempts').

3FG, 2FG, FT, 3FGX, 2FGX, FTX are the 6 discrete shooting 'events'.

Boxscore events are distinguished from boxscore entries: 2FGA = FGA - 3FGA. 2FGX = 2FGA - 2FG.
These raw numbers are just as valid as those which arbitrarily appear in the box scores.


Right, although the model is fundamentally the same regardless of whether one measures say FGA and FGM, or FGX and FGM, as long as the variables that one chooses include all of the relevant events, and provide enough information to distinguish between them. For ease of interpreting the results, it's usually (but not always) advisable to have categories which are mutually exclusive, e.g. 2FGX rather than 2FGA. But the underlying model is the same either way.
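The point above, checked numerically: because FGA = FGM + FGX, b_FGM*FGM + b_FGA*FGA equals (b_FGM + b_FGA)*FGM + b_FGA*FGX, so the two parameterizations are the same model with relabeled coefficients. Using the second offensive equation (threes ignored for simplicity), each make is worth a net 0.15 and each miss -0.03. The function names are illustrative.

```python
def score_attempts(fgm, fga, b_fgm, b_fga):
    """Field-goal part of a rating written with makes and attempts."""
    return b_fgm * fgm + b_fga * fga


def score_misses(fgm, fgx, b_make, b_miss):
    """The same quantity written with makes and misses (FGX = FGA - FGM)."""
    return b_make * fgm + b_miss * fgx


b_fgm, b_fga = 0.18, -0.03               # second offensive equation (threes ignored here)
b_make, b_miss = b_fgm + b_fga, b_fga    # equivalent make/miss weights: 0.15 and -0.03

for fgm, fga in [(5, 12), (9, 20), (0, 4)]:
    a = score_attempts(fgm, fga, b_fgm, b_fga)
    b = score_misses(fgm, fga - fgm, b_make, b_miss)
    assert abs(a - b) < 1e-12, (fgm, fga)
print("identical predictions; make weight", round(b_make, 2), "miss weight", b_miss)
```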

The big thing that seems to be missing is derived boxscore stats, most prominently FG% (or 2FG% and 3FG%, or Eff FG%, or TS%, whatever you prefer). In theory, by including FGM and either FGA or FGX you've got it covered. In reality, FG% is likely to have additional important predictive value. (If in fact you did include FG% but failed to report it, much of what I write below is irrelevant, instead the question is what is the coefficient on FG%.)

This is exactly the issue that Tony Minkoff dealt with (or really, failed to deal with) some 15-20 years ago when he did his Minkoff Player Ratings. We didn't have play-by-play data in those days, so he used minutes played (or maybe it was mins/game) as his dependent variable, and all of the raw boxscore stats as his independent variables. The results weren't bad, but I suggested that he was leaving out some crucial NBA variables such as FG%.

It's not enough to include just FGM and FGA. FGM/FGA (in whatever version you wish: 2pt, 3pt, TS%, whatever) is also of critical importance. If you're trying to figure out who's fastest, it's not enough to include just miles travelled and hours of travel; you also need to calculate distance/time, i.e. miles per hour. And just as miles/hour is not a linear combination of miles and hours, FGM/FGA is not a linear combination of FGM and FGA. Thus a linear model which merely includes FGM and FGA, but which omits FGM/FGA, is going to be missing an important variable.
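A small demonstration that no linear combination a*FGM + b*FGA + c reproduces FG%: three (FGM, FGA) samples that are not collinear pin down the only such plane, and a fourth sample then misses. Exact fractions (Cramer's rule) are used so the result isn't a rounding artifact; the sample values and helper names are arbitrary.

```python
from fractions import Fraction as F


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(samples):
    """Exact a, b, c with a*FGM + b*FGA + c = FG% through three (FGM, FGA) samples."""
    rows = [[F(m), F(a), F(1)] for m, a in samples]
    ys = [F(m, a) for m, a in samples]          # true FG% = FGM / FGA
    d = det3(rows)                               # nonzero when the samples are not collinear
    coefs = []
    for col in range(3):                         # Cramer's rule, one column at a time
        swapped = [r[:col] + [y] + r[col + 1:] for r, y in zip(rows, ys)]
        coefs.append(det3(swapped) / d)
    return coefs


a, b, c = fit_plane([(1, 2), (2, 4), (1, 4)])
print(a, b, c)                 # 1/4 -1/8 1/2
print(a * 2 + b * 2 + c)       # 3/4, but a 2-for-2 shooter is 100%
```

Adding FGM/FGA as its own column, as suggested above, is what lets a linear model use the ratio.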

Sure enough, Minkoff found that FG% when added to the regression was a significant predictor. But he continued to leave it out of his model because he wanted to include only counting stats: FGA, FGM, ORebds, etc. That gave his model a nice simple linearity -- but worse fit than it could have and should have had.

There are similar issues with fouls or more precisely fouls per minute. A player with an inordinately small number of fouls is probably not doing very much on the court (unless he's maybe Wilt Chamberlain), so more fouls per minute is often a positive in these sorts of regressions, as we see in the offense regression here. But that relationship cannot be linear, because as fouls per minute get higher and higher the player eventually will find himself getting benched, to say nothing of the harm he does to his team by getting it into the bonus earlier (and getting the opponents to the FT line more often).
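One common way to let a linear regression capture the rise-then-fall shape described for fouls is to add a squared foul-rate column. The coefficients below are hypothetical, picked only to show the shape; with b1 > 0 and b2 < 0 the fitted effect peaks at -b1 / (2*b2).

```python
def foul_features(pf_per_min):
    """Linear and squared foul-rate columns, so a linear regression can fit a curve."""
    return [pf_per_min, pf_per_min ** 2]


def foul_effect(pf_per_min, b1, b2):
    x, x2 = foul_features(pf_per_min)
    return b1 * x + b2 * x2


def peak_foul_rate(b1, b2):
    """Foul rate where the fitted effect stops rising (needs b1 > 0 and b2 < 0)."""
    return -b1 / (2 * b2)


# Hypothetical coefficients, chosen only to show the shape: positive at low rates, then declining.
b1, b2 = 2.0, -10.0
print(peak_foul_rate(b1, b2))   # 0.1 fouls per minute (3.6 per 36 minutes)
print(foul_effect(0.05, b1, b2), foul_effect(0.2, b1, b2))
```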


PostPosted: Tue Sep 11, 2012 11:36 pm 

Joined: Fri Apr 15, 2011 8:28 am
Posts: 802
Crow, I'll get to your request in the next couple of days, probably after I've made some minor changes that people here suggested.

Mike G wrote:
And then why not separate all discrete events?
Yeah. I'll do that from now on

Quote:
I would recommend using the 8-year equally-weighted RAPM that you compiled for me to compile the box-score weights...
I know that this and your ASPM should probably give the same results, but I'll try this approach. Maybe it adds some insight, maybe it doesn't. We'll certainly have to test it.

DSMok1 wrote:
Are you including regression to the mean, here, like in RAPM?

schtevie wrote:
Finally, another observation is that the distribution of +/- results from the box score regressions appears to be flatter than that generated by the RAPM priors.

I assumed everyone would regress to 75% of what they did the season before. (I think it's a reasonable figure, based on the fact that it used to give the best prediction results with 'RAPM informed RAPM'. That 75% can obviously be tuned to achieve slightly better results.) I actually multiplied the BoxScore-derived ratings by that 0.75 factor *before* putting them online, which is something I haven't done with RAPM figures. Also, 'RAPM informed RAPM' uses multiple years of information, if you will. Single-year 'uninformed RAPM' looks a lot flatter, too. All else being equal, the BoxScore-derived ratings are probably less flat.
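The 0.75 factor is a shrink toward zero. A sketch of the shrink and of one way to tune it: if actual ~= k * prior with no intercept, least squares gives k = sum(prior*actual) / sum(prior^2). This is a simplified stand-in, since in the thread the factor is judged by matchup prediction error rather than rating error; the ratings and helper names are made up.

```python
def shrink(ratings, k=0.75):
    """Regress prior-season ratings toward zero (the mean) by the factor k."""
    return {p: k * r for p, r in ratings.items()}


def best_shrink(prior, actual):
    """Least-squares k in actual ~= k * prior (no intercept): k = sum(xy) / sum(x^2)."""
    common = [p for p in prior if p in actual]
    num = sum(prior[p] * actual[p] for p in common)
    den = sum(prior[p] ** 2 for p in common)
    return num / den


# Made-up ratings: last season vs the season being predicted.
prior = {"A": 4.0, "B": -2.0, "C": 1.0}
actual = {"A": 2.5, "B": -2.0, "C": 1.5}
print(shrink(prior))                 # {'A': 3.0, 'B': -1.5, 'C': 0.75}
print(best_shrink(prior, actual))
```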

xkonk wrote:
Did I misread your post?
No, I misread yours. Sorry

schtevie wrote:
First of all, are you of the belief that this box score approach will outperform the year n-1 RAPM prior, and if so, why?
I'm not sure, but it might. If it's not better as a prior, maybe a mix of n-1 RAPM and the BoxScore prior is best. One area where it probably has an edge on the n-1 RAPM prior is low-minute players, who are very likely worse than 0, the value n-1 RAPM used to give them. Rookies are another case where the n-1 RAPM prior is probably weak.
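A sketch of the "mix of n-1 RAPM and BoxScore prior" idea, assuming a fixed weight `w` and a shrink factor `k`, and falling back to whichever source exists when a player is missing from the other. The weight, the fallback rule and the function name are hypothetical choices; `w` would itself need tuning. If the box score ratings were already multiplied by 0.75, as the posted ones were, `k` should not be applied to them a second time.

```python
def blended_prior(player, rapm_prev, box_prev, w=0.5, k=0.75):
    """Prior for next season: w * last season's RAPM + (1 - w) * box score rating, shrunk by k.
    Falls back to whichever source exists if the player is missing from the other."""
    rapm, box = rapm_prev.get(player), box_prev.get(player)
    if rapm is None and box is None:
        return None                        # never seen before: no prior available
    if rapm is None:
        mix = box
    elif box is None:
        mix = rapm
    else:
        mix = w * rapm + (1 - w) * box
    return k * mix


rapm_prev = {"A": 2.0}
box_prev = {"A": 4.0, "B": -4.0}
print(blended_prior("A", rapm_prev, box_prev))   # 2.25
print(blended_prior("B", rapm_prev, box_prev))   # -3.0
```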

Quote:
And then there is the special, relative weakness of box score methods in measuring defensive performance
It's weak, but if there were no information in the BoxScore regarding defense, everything should just show up as 0. As it appears, even offensive statistics can give indications of future defensive performance.

mtamda wrote:
FG%
I'll put 2FG%, 3FG% and FT% in. I'll just need to find a good threshold for minimum attempts
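A sketch of computing the three percentages with minimum-attempt thresholds. The threshold values and the fallback to a league-average percentage below them (so the regression still has a value for every player) are assumptions, as are the placeholder league figures; finding good thresholds is exactly the open question.

```python
def pct(made, attempts, min_attempts, fallback):
    """made / attempts, or `fallback` when there are too few attempts to trust it."""
    return made / attempts if attempts >= min_attempts else fallback


def shooting_pcts(s, league, min_2pa=50, min_3pa=25, min_fta=25):
    # Placeholder thresholds; `league` holds league-average percentages used as fallbacks.
    return {
        "2FG%": pct(s["FGM"] - s["3PM"], s["FGA"] - s["3PA"], min_2pa, league["2FG%"]),
        "3FG%": pct(s["3PM"], s["3PA"], min_3pa, league["3FG%"]),
        "FT%": pct(s["FTM"], s["FTA"], min_fta, league["FT%"]),
    }


league = {"2FG%": 0.48, "3FG%": 0.35, "FT%": 0.75}   # placeholder values, not real averages
player = {"FGM": 300, "FGA": 650, "3PM": 10, "3PA": 20, "FTM": 100, "FTA": 140}
print(shooting_pcts(player, league))  # 3FG% falls back to 0.35: only 20 attempts
```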


The next steps, for me, should be these:
-Grab the now-available PBP from bbr, so that I have more (reliable) matchup data to fit the model on.
-Run this algorithm on the standard BoxScore. This should be translatable to college and Euroleague if one wishes.
-Grab all other available player-specific data for single seasons, like salary, and actions we can read from the PBP, like goaltends, shots from different distances (or clock situations), different types of turnovers, etc. Then run the algorithm with the more detailed player-specific data.
-Find out whether 'BoxScore(+PBP info) informed RAPM' beats 'RAPM informed RAPM' in retrodiction


