vanilla RAPM

Home for all your discussion of basketball statistical analysis.
J.E.
Posts: 852
Joined: Fri Apr 15, 2011 8:28 am

vanilla RAPM

Post by J.E. »

Back by popular demand. You can find it here

http://stats-for-the-nba.appspot.com/va ... -2007.html
http://stats-for-the-nba.appspot.com/va ... -2008.html
http://stats-for-the-nba.appspot.com/va ... -2009.html
http://stats-for-the-nba.appspot.com/va ... -2010.html
http://stats-for-the-nba.appspot.com/va ... -2011.html
http://stats-for-the-nba.appspot.com/va ... -2012.html
http://stats-for-the-nba.appspot.com/va ... -2013.html

"2004-2007" is multiyear RAPM, where 2004 gets weighted with 1/8, 2005 with 1/4, 2006 with 1/2, and 2007 with 1. All other files follow accordingly. It doesn't use priors anywhere, so it's probably a little kinder to rookies, and it's free of any box-score data.
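As a rough illustration of how decaying season weights like these can enter a ridge regression, here is a minimal sketch (not J.E.'s actual code; the function name, matrix layout, and default lambda are assumptions):

```python
import numpy as np

def weighted_rapm(X, y, season, lam=2000.0):
    """Weighted ridge: beta = (X'WX + lam*I)^-1 X'Wy.

    X      : (stints x players) design matrix, +1 if the player is
             on court for the home team, -1 for the away team.
    y      : point margin per 100 possessions for each stint.
    season : season label per stint; the most recent season gets
             weight 1, each older season half the previous one
             (1, 1/2, 1/4, 1/8, ...).
    """
    seasons = np.sort(np.unique(season))
    w_by_season = {s: 0.5 ** (len(seasons) - 1 - i)
                   for i, s in enumerate(seasons)}
    w = np.array([w_by_season[s] for s in season])
    XtW = X.T * w                                  # row-weight X
    A = XtW @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, XtW @ y)             # the beta vector
```

With ridge, the weights simply scale each stint's contribution to both X'WX and X'Wy, so older seasons still inform the estimate but count progressively less.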
permaximum
Posts: 416
Joined: Tue Nov 27, 2012 7:04 pm

Re: vanilla RAPM

Post by permaximum »

Great. Just when I found out how to do it properly :)

Could you share your decision on minutes cutoff and cross validation method?
jbrocato23
Posts: 105
Joined: Thu Jul 26, 2012 8:49 pm
Location: Dallas, TX

Re: vanilla RAPM

Post by jbrocato23 »

Great, thanks J.E.
J.E.
Posts: 852
Joined: Fri Apr 15, 2011 8:28 am

Re: vanilla RAPM

Post by J.E. »

permaximum wrote:Great. Just when I found out how to do it properly :)

Could you share your decision on minutes cutoff and cross validation method?
You don't need any minute cutoff with ridge (provided you have a reasonable lambda).
As for cross-validation, I'd randomly remove observations from the data and then "forecast" the left-out observations later with the computed ß's. You could do 10-fold CV, but 4-fold already seems enough if you want to save a bit of computing time.
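The procedure described here (hold out random observations, refit, "forecast" the held-out margins) might look like this with a plain closed-form ridge solver; all names, the lambda grid, and the squared-error scoring are illustrative assumptions, not J.E.'s code:

```python
import numpy as np

def ridge_beta(X, y, lam):
    # plain ridge solution: beta = (X'X + lam*I)^-1 X'y
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def cv_lambda(X, y, lambdas, k=4, seed=0):
    """Pick lambda by k-fold CV: fit on k-1 folds, then 'forecast'
    the held-out observations and sum the squared errors."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))          # random fold assignment
    folds = np.array_split(idx, k)
    best_lam, best_err = None, np.inf
    for lam in lambdas:
        err = 0.0
        for fold in folds:
            train = np.setdiff1d(idx, fold)
            beta = ridge_beta(X[train], y[train], lam)
            err += np.sum((y[fold] - X[fold] @ beta) ** 2)
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam
```

A larger k averages over more splits, so the chosen lambda varies less from run to run at proportionally higher cost, which is the 10-fold vs. 4-fold trade-off being discussed.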
permaximum
Posts: 416
Joined: Tue Nov 27, 2012 7:04 pm

Re: vanilla RAPM

Post by permaximum »

J.E. wrote:
permaximum wrote:Great. Just when I found out how to do it properly :)

Could you share your decision on minutes cutoff and cross validation method?
You don't need any minute cutoff with ridge (provided you have a reasonable lambda).
As for cross-validation, I'd randomly remove observations from the data and then "forecast" the left-out observations later with the computed ß's. You could do 10-fold CV, but 4-fold already seems enough if you want to save a bit of computing time.
Thanks for the information. I found that lambda pretty much stabilizes with 100-fold CV. It takes more time, of course, but it's not that bad. I don't like getting different (close, but different anyway) lambda values each time I run 10-fold CV.

As for the cutoff, I decided to go with it simply because of this paragraph here. (http://www.nbastuffer.com/component/opt ... /catid,42/)
RAPM is about twice as accurate as an APM using standard regression and using 3 years of data, where the weighting of past years of data and the reference player minutes cutoff has also been carefully optimized.
I think I misunderstood that sentence because of my English :) OK, last question: do you think a cutoff isn't needed for 1-year RAPM either? I fear there will be weird names at the top.
J.E.
Posts: 852
Joined: Fri Apr 15, 2011 8:28 am

Re: vanilla RAPM

Post by J.E. »

permaximum wrote:do you think a cutoff isn't needed for 1-year RAPM either?
For the actual computation of the ß's a minute cutoff isn't necessary. When displaying results you might want to use a cutoff, so people aren't tempted to call _unknown_player_X the best player in the league ("because RAPM said so"). Or just include each player's minutes or # of possessions with the RAPM results and add a disclaimer that there's higher uncertainty with lower minutes.

Kosta Koufos is #1 right now in 1y RAPM, I think.

I don't exactly know what you're using these for, but I'd just use more years of data if you want to avoid "weird names at the top" and whatnot.
permaximum
Posts: 416
Joined: Tue Nov 27, 2012 7:04 pm

Re: vanilla RAPM

Post by permaximum »

Thanks for all the info again. Then I won't use a minute cutoff for the ridge, but I'll qualify players by minutes in the final displayed results.

I have a simple player box-score metric which I believe is near the best you can get from raw box scores. I want to check which players' negative or positive effects don't translate to box scores, to get a rough idea of the things the box score misses. It will also be useful to see what types of players they are. Then I'll decide what to do with the results.
DSMok1
Posts: 1119
Joined: Thu Apr 14, 2011 11:18 pm
Location: Maine
Contact:

Re: vanilla RAPM

Post by DSMok1 »

permaximum wrote:Thanks for all the info again. Then I won't use a minute cutoff for the ridge, but I'll qualify players by minutes in the final displayed results.

I have a simple player box-score metric which I believe is near the best you can get from raw box scores. I want to check which players' negative or positive effects don't translate to box scores, to get a rough idea of the things the box score misses. It will also be useful to see what types of players they are. Then I'll decide what to do with the results.
You might be interested in a couple of articles I wrote: http://godismyjudgeok.com/DStats/2012/n ... -via-rapm/ and http://godismyjudgeok.com/DStats/2012/n ... pm-part-2/
Developer of Box Plus/Minus
APBRmetrics Forum Administrator
Twitter.com/DSMok1
permaximum
Posts: 416
Joined: Tue Nov 27, 2012 7:04 pm

Re: vanilla RAPM

Post by permaximum »

DSMok1 wrote:You might be interested in a couple of articles I wrote: http://godismyjudgeok.com/DStats/2012/n ... -via-rapm/ and http://godismyjudgeok.com/DStats/2012/n ... pm-part-2/
I read those articles before, and I think they support the general consensus. They are also on par with my findings.

I recently came up with RAPM ratings for the last full regular season (2010-11). Kevin Love is extremely overrated in my PTR and in PER. As you'd guess, pure point guards are underrated. There are also very good non-stats defenders that PTR, PER, and other box-score ratings miss. Here are the top 10 in RAPM, DRAPM, and ORAPM for the 2010-11 regular season. (minimum 1238 minutes)

Code:

Player                RAPM
1. Nowitzki, Dirk     4.06 (Finals MVP)
2. Garnett, Kevin     3.87
3. Ginobili, Manu     3.73
4. Collison, Nick     3.33
5. Pierce, Paul       3.04
6. Bosh, Chris        2.93
7. Duncan, Tim        2.69
8. James, LeBron      2.63
9. Howard, Dwight     2.57
10.Chandler, Tyson    2.55
...
21.Rose, Derrick      1.91 (MVP)

Code:

Player               DRAPM
1. Garnett, Kevin     2.78
2. Brewer, Ronnie     2.52
3. Arthur, Darrell    2.16
4. Duncan, Tim        1.98
5. Allen, Tony        1.96
6. Howard, Dwight     1.77 (DPOY)
7. Bass, Brandon      1.75
8. Pierce, Paul       1.75
9. Livingston, Shaun  1.73
10.Dooling, Keyon     1.69

Code:

Player               ORAPM
1. Nowitzki, Dirk     2.49
2. Nash, Steve        2.40
3. Ginobili, Manu     2.40
4. Wade, Dwyane       2.33
5. Smith, J.R.        1.95
6. Collison, Nick     1.90
7. Bonner, Matt       1.85
8. Lawson, Ty         1.77
9. Davis, Baron       1.68
10.Bosh, Chris        1.65
I can say I'm very satisfied with these results. At the end of each season, I will never look at any player metric but RAPM to evaluate player performance, vote for MVP, DPOY, etc. However, at midseason, RAPM won't give very accurate results because there is less data. That's where prediction comes in: prior-informed RAPM, i.e. J.E.'s RAPM informed by multiyear RAPM. He says xRAPM is better at that, so I take his word. Towards the end of the season, one-year uninformed RAPM should give more accurate results than xRAPM as far as seasonal player evaluation goes.

Also, I have come to the conclusion that no box-score or PBP model (advanced or not) can give better results than RAPM. At midseason I would use prior-informed RAPM (xRAPM). For multiple years, RAPM and age-weighted RAPM (are there any?). Towards the end of the season and after it, plain RAPM. Box-score metrics should only be used at the beginning of the season, imo.
mystic
Posts: 470
Joined: Mon Apr 18, 2011 10:09 am
Contact:

Re: vanilla RAPM

Post by mystic »

First question: What dataset did you use?
permaximum wrote: Also, I have come to the conclusion that no box-score or PBP model (advanced or not) can give better results than RAPM.
How did you come to that conclusion?

In my experience, a blended version of a boxscore-based model and a RAPM model gives the best results in terms of both prediction and explanation. When I blended my SPM with Jerry's previously published prior-informed RAPM results, I got the best result.

The blend of xRAPM and my SPM turned out to be worse and ended up with a clear bias towards bigger players, which was not seen to such a degree in a 10-year dataset of blended SPM + prior-informed RAPM.
permaximum wrote:Box-score metrics should only be used at the beginning of the season, imo.
Boxscore-based models can be as good a predictor as prior-informed RAPM in terms of point differential. But the boxscore's bias towards offense, and its lack of information about team, help, and weakside defense, make it difficult to separate offense from defense for individual players. There isn't much of a chance of crediting the correct player with defensive impact via the boxscore in more than 50% of cases; mostly the rebounder, the blocker, and the stealer will get the defensive credit. That makes a prediction of team defensive strength based on individual boxscore-based models basically a coin flip.

But overall you need the boxscore information, because that is the only way to determine production and efficiency for individual players. And you really need that information, because otherwise you will have a systematic error in the model regarding low- and high-usage players: you can't just increase a player's offensive load and expect his efficiency to stay the same. Without the boxscore information you don't know who is doing what on the court. RAPM gives you an impact number for players in the situations in which they were actually used. Even with some sort of position information added, picking the highest-RAPM players for each position may very well not produce the best overall team performance, because you don't know anything about balance and fit; RAPM does not help you differentiate players here and does not enable you to find a balanced, well-fitting lineup.
DSMok1
Posts: 1119
Joined: Thu Apr 14, 2011 11:18 pm
Location: Maine
Contact:

Re: vanilla RAPM

Post by DSMok1 »

permaximum wrote: I can say I'm very satisfied with these results. At the end of each season, I will never look at any player metric but RAPM to evaluate player performance, vote for MVP, DPOY, etc. However, at midseason, RAPM won't give very accurate results because there is less data. That's where prediction comes in: prior-informed RAPM, i.e. J.E.'s RAPM informed by multiyear RAPM. He says xRAPM is better at that, so I take his word. Towards the end of the season, one-year uninformed RAPM should give more accurate results than xRAPM as far as seasonal player evaluation goes.

Also, I have come to the conclusion that no box-score or PBP model (advanced or not) can give better results than RAPM. At midseason I would use prior-informed RAPM (xRAPM). For multiple years, RAPM and age-weighted RAPM (are there any?). Towards the end of the season and after it, plain RAPM. Box-score metrics should only be used at the beginning of the season, imo.
If you run a retrodiction contest with any flavor of RAPM, ASPM, SPM, or any version of plus/minus stats, it would be straightforward to calculate the RMSE at the lineup level. That would help identify how good each of these is.
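The lineup-level RMSE mentioned here could be computed along these lines (a sketch; weighting each lineup matchup by its possession count is an assumption):

```python
import numpy as np

def lineup_rmse(pred_margin, actual_margin, possessions):
    """Possession-weighted RMSE between predicted and observed
    per-100-possession margins, one entry per lineup matchup."""
    pred = np.asarray(pred_margin, dtype=float)
    act = np.asarray(actual_margin, dtype=float)
    w = np.asarray(possessions, dtype=float)
    return float(np.sqrt(np.sum(w * (pred - act) ** 2) / np.sum(w)))
```

Here pred_margin for a matchup would be, e.g., the sum of the home lineup's ratings minus the sum of the away lineup's ratings under whichever metric is being tested.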

I think you might be surprised how long it takes RAPM to stabilize. Check out Alex's Retrodiction contest for one way of looking at the issue: http://sportskeptic.wordpress.com/tag/aspm/
Developer of Box Plus/Minus
APBRmetrics Forum Administrator
Twitter.com/DSMok1
permaximum
Posts: 416
Joined: Tue Nov 27, 2012 7:04 pm

Re: vanilla RAPM

Post by permaximum »

mystic wrote:First question: What dataset did you use?
permaximum wrote: Also, I have come to the conclusion that no box-score or PBP model (advanced or not) can give better results than RAPM.
How did you come to that conclusion?

In my experience, a blended version of a boxscore-based model and a RAPM model gives the best results in terms of both prediction and explanation. When I blended my SPM with Jerry's previously published prior-informed RAPM results, I got the best result.

The blend of xRAPM and my SPM turned out to be worse and ended up with a clear bias towards bigger players, which was not seen to such a degree in a 10-year dataset of blended SPM + prior-informed RAPM.
permaximum wrote:Box-score metrics should only be used at the beginning of the season, imo.
Boxscore-based models can be as good a predictor as prior-informed RAPM in terms of point differential. But the boxscore's bias towards offense, and its lack of information about team, help, and weakside defense, make it difficult to separate offense from defense for individual players. There isn't much of a chance of crediting the correct player with defensive impact via the boxscore in more than 50% of cases; mostly the rebounder, the blocker, and the stealer will get the defensive credit. That makes a prediction of team defensive strength based on individual boxscore-based models basically a coin flip.

But overall you need the boxscore information, because that is the only way to determine production and efficiency for individual players. And you really need that information, because otherwise you will have a systematic error in the model regarding low- and high-usage players: you can't just increase a player's offensive load and expect his efficiency to stay the same. Without the boxscore information you don't know who is doing what on the court. RAPM gives you an impact number for players in the situations in which they were actually used. Even with some sort of position information added, picking the highest-RAPM players for each position may very well not produce the best overall team performance, because you don't know anything about balance and fit; RAPM does not help you differentiate players here and does not enable you to find a balanced, well-fitting lineup.
Dataset for what? If you mean the RAPM vs. box-score comparison, I can say I compared 1996/97-2012/13 seasonal xRAPM to PER and PTR, and 2007/08-2011/12 seasonal uninformed 1-year RAPM to PTR and PER. But I accept that I didn't compare RAPM to any type of SPM blend. However, in theory an SPM-RAPM blend should have the potential to be better at prediction (supported by xRAPM too, which involves the box score, and I already pointed out that it's better at prediction). When it comes to explanation, I can't see any chance it can surpass RAPM over the long term. I don't have any proof for that; it's just theory. Still, if you can come up with better ratings (better to whom, anyway?) than RAPM for explaining previous seasons' player performances, I'll gladly accept it.

I also agree with you that RAPM says nothing about efficiency and that only box scores can give us a rough idea about that. (In fact, I generally agree with your points except on explaining player performance.) With RAPM we assume every player has been used at his maximum efficiency and performance/usage by his coaches. This advantage of the box score won't translate into better explanation of player performance, because we judge the value of players in a year regardless of how they were used by their coaches.

In the end, except for the explanation of player performances within a season, I agree with everything you say.

@DSMoK1

Nice suggestion, but I think J.E. has probably done it, and I assume nothing beats xRAPM at prediction at the moment, even though xRAPM's box-score weights are out of place because of the inclusion of very similar stats, unnecessary stats, and especially the separation of the box-score weights into defense and offense (I talked a bit about why those weights are wrong in my thread, and mystic mentioned the defense-offense issue in his previous post too). Which means I agree with you that box-score involvement makes things better for prediction. BTW, I really like your site. I'm a webmaster in fact (though my sites aren't related to any sports), and I think you should get involved with SEO to draw more traffic and reach a casual audience. Your site would be very helpful for informing people.

Stabilization of RAPM! Just what I needed. I think I'm gonna thank you for the 10th time or more :)

Edit: The author of that article has put great effort into this subject. Unfortunately, although his technical knowledge is beyond mine (I only recently got involved with these things), I see a lot of errors in the article. In short, you can't test the explanatory and predictive power of those metrics that way.
jbrocato23
Posts: 105
Joined: Thu Jul 26, 2012 8:49 pm
Location: Dallas, TX

Re: vanilla RAPM

Post by jbrocato23 »

J.E.,

I noticed you've had these down for some time now. Do you have plans to put them back up at some point?

James