Sanity checking my RAPM methodology for 2019-2022 seasons
Posted: Sat May 20, 2023 7:09 pm
Hi,
Thanks for taking the time to read this! I decided to compute RAPM myself for fun, and I'd like some feedback and sanity checking. My methodology is below:
1. Download NBA play-by-play data, split it into possessions at defensive rebounds, turnovers, end of clock, and made shots, then convert the possessions to stints. I tried to be careful here, but there are definitely some cases I didn't handle.
2. Filter out 0 possession stints and non-NBA teams.
3. Adjust for home-court advantage by adding roughly 0.02 points per possession (PPP) to away offensive possessions. (Future: FT%, 3PT%, rubber band adjustments)
4. Replace players who played fewer than 1% of their team's possessions with a single id, and filter out the stints that contain only these players.
5. Double and flip each stint so that offensive and defensive RAPM are calculated separately. Remove the replacement-level players (the single id) from the regression and use the value -0.02 for them. Recompute the PPP of each stint by subtracting the league average PPP and clipping the result to [-3, 3].
6. Sparse L2 (ridge) regression with sample weight = number of possessions in the stint * season weight. (Season weights: 2021-2022 = 3/6, 2020-2021 = 2/6, 2019-2020 = 1/6, to weight recent seasons more.) I tried alpha = 500, 1000, 1500.
7. Print out and sanity check (alpha=1000): https://docs.google.com/spreadsheets/u/ ... MP/pubhtml
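For anyone who wants to check the setup, steps 5 and 6 can be sketched roughly as below. This is a toy, dense NumPy version and not my actual code: the names `build_design`, `prepare_target` and `weighted_ridge` are made up for illustration, the player ids and numbers are invented, and it assumes the convention that defensive coefficients are negated so that a positive DRAPM is good. Real data would need a sparse matrix, and an off-the-shelf ridge solver that accepts sample weights (e.g. scikit-learn's `Ridge.fit(..., sample_weight=...)`) should do the same job.

```python
import numpy as np

def build_design(stints, n_players):
    """One row per (offense, defense) stint half.

    Columns [0, n_players) are offensive slots and columns
    [n_players, 2 * n_players) are defensive slots.
    """
    X = np.zeros((len(stints), 2 * n_players))
    for row, (off_ids, def_ids) in enumerate(stints):
        X[row, list(off_ids)] = 1.0
        X[row, [n_players + d for d in def_ids]] = 1.0
    return X

def prepare_target(ppp, possessions, season_weight, league_avg):
    """Center PPP on the league average, clip to [-3, 3], build sample weights."""
    y = np.clip(np.asarray(ppp, float) - league_avg, -3.0, 3.0)
    w = np.asarray(possessions, float) * np.asarray(season_weight, float)
    return y, w

def weighted_ridge(X, y, w, alpha):
    """Closed form of min_b sum_i w_i * (y_i - x_i . b)^2 + alpha * ||b||^2."""
    Xw = X * w[:, None]                         # rows scaled by sample weight
    A = X.T @ Xw + alpha * np.eye(X.shape[1])   # X^T W X + alpha * I
    return np.linalg.solve(A, Xw.T @ y)         # solve against X^T W y

# Toy usage: 4 players, one stint "doubled and flipped" into two rows.
stints = [([0, 1], [2, 3]), ([2, 3], [0, 1])]
X = build_design(stints, n_players=4)
y, w = prepare_target(
    ppp=[1.30, 0.95],
    possessions=[12, 9],
    season_weight=[3 / 6, 1 / 6],
    league_avg=1.13,
)
coef = weighted_ridge(X, y, w, alpha=1.0)
orapm, drapm = coef[:4], -coef[4:]  # assumed sign flip for defense
```

Note that there is no intercept term in this sketch, so whatever is left over after subtracting the league average has to be absorbed by the player coefficients.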
The biggest issue seems to be that the scale of defensive RAPM relative to offensive RAPM is very sensitive to the league average. I used the average of the stints' PPP (~= 1.13), but the ratio of defensive to offensive RAPM for the top players is roughly 0.04/0.02. The average ORAPM is -0.025, while the average DRAPM is 0.025. If I switch to a league average of 1.1, the offensive numbers start to look better. Should I be enforcing a mean of 0 for both ORAPM and DRAPM (i.e. add 0.025 to ORAPM and subtract 0.025 from DRAPM)?
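To make the centering question concrete, here is a minimal sketch of the two options I can think of. It is only an illustration with invented numbers, and the function names `league_average_ppp` and `center` are hypothetical. The first function computes a possession-weighted league average (total points over total possessions, optionally with season weights), which can differ from a plain mean over stints when short stints have extreme PPP. The second shifts a set of ratings so that their possession-weighted mean is zero, which is the "add 0.025 to ORAPM" idea done with weights.

```python
import numpy as np

def league_average_ppp(points, possessions, season_weight=None):
    """Possession-weighted PPP: weighted total points / weighted total possessions."""
    points = np.asarray(points, float)
    possessions = np.asarray(possessions, float)
    if season_weight is None:
        sw = np.ones_like(possessions)
    else:
        sw = np.asarray(season_weight, float)
    return float((sw * points).sum() / (sw * possessions).sum())

def center(ratings, possessions_played):
    """Shift ratings so that their possession-weighted mean is zero."""
    ratings = np.asarray(ratings, float)
    weights = np.asarray(possessions_played, float)
    return ratings - np.average(ratings, weights=weights)

# Toy stints: two 1-possession stints and one 100-possession stint.
points = [2, 0, 110]
possessions = [1, 1, 100]

# Plain mean over stints treats every stint equally, regardless of length.
stint_mean = float(np.mean(np.asarray(points) / np.asarray(possessions)))
# Possession-weighted average matches the weighting used in the regression.
weighted_mean = league_average_ppp(points, possessions)

# Toy ORAPM values for two players and the possessions each played.
orapm_centered = center([0.01, -0.06], possessions_played=[100, 300])
```

If the regression is weighted by possessions and has no intercept, my guess is that the league average should be computed with the same weights, otherwise the leftover offset ends up spread across the player coefficients.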
With respect to the rankings, they seem OK-ish, but Nikola Jokic, LeBron James and James Harden are probably too low, and Gobert, Quickley and Muscala are too high.