RAPM metric advice!

RowRowFan · Post by **RowRowFan** » Fri Feb 09, 2024 1:12 am

Hi!

I'm interested in playing around with RAPM and building one of those fancy all-in-one metrics with it, but I have a few questions about RAPM first.

I have a play-by-play data set, and right now I define possessions as follows:
- An offensive rebound off a missed free throw continues the current possession as long as the same 10 players are on the floor; if the lineup has changed, it counts as a new possession.
- If players on the floor change mid-possession, the possession is credited to whoever finished it.
- Offensive rebounds in general are continuations of the current possession.

I got the play-by-play data from BigDataBall, and I'm pretty happy with my raw RAPM results, as they roughly match the ones that used to be over at shotcharts and are available for download.

I have some questions about how to approach the next steps.
DSMok told me I should mention which GitHub repository I used. BigDataBall already had the possession data, and converting that into a possession matrix wasn't too much work, but I used this for the actual RAPM calculation:

https://github.com/rd11490/NBA_Tutorial ... aster/rapm

So I want to look into doing something similar to LEBRON (not some huge undertaking; this is a for-fun type of thing): luck adjustment, box score priors based on position (perhaps taking inspiration from DSMok's 2D position post), and perhaps an age curve, since I'm doing this for college basketball and I could see how that would help.

My only problem is that I'm not sure how to approach this.

With luck adjustments, free throws are simple enough, but 3-point shooting on offense and defense is where it gets confusing to me.

My current plan is to take each player's average 3-point percentage and apply it to their individual possessions, which is simple enough on defense. On offense, though, I'm not sure how to make every player's 3-point shooting the average of their on-court 3P% with player X on the floor and their normal 3P%, for every player in the league.
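
A minimal sketch of the simpler half of that idea, assuming each three-point attempt is stored as a (shooter, made) record: the actual outcome is replaced by the shooter's expected points, 3 × season 3P%. The function names and record layout are hypothetical, and the offensive-side blending the post asks about is not attempted here.

```python
from collections import defaultdict

def season_three_pct(attempts):
    """Return each shooter's season 3P% from (shooter, made) attempt records."""
    made = defaultdict(int)
    taken = defaultdict(int)
    for shooter, was_made in attempts:
        taken[shooter] += 1
        made[shooter] += int(was_made)
    return {s: made[s] / taken[s] for s in taken}

def luck_adjust_threes(attempts):
    """Replace each attempt's actual points with 3 * the shooter's season 3P%."""
    pct = season_three_pct(attempts)
    return [3 * pct[shooter] for shooter, _ in attempts]

# Toy data: shooter A is 1-for-4, shooter B is 1-for-2
attempts = [("A", True), ("A", False), ("A", False), ("A", False),
            ("B", False), ("B", True)]
adjusted = luck_adjust_threes(attempts)
actual_points = sum(3 for _, was_made in attempts if was_made)
```

Total points are preserved across the season by construction; only their distribution over possessions changes. In practice, low-volume shooters would probably need to be shrunk toward a league average, which this sketch leaves out.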

Beyond that, I'm not sure how box score priors work or how to approach them, as I haven't found much online yet that explains them.

rjb2 · Post by **rjb2** » Fri Feb 09, 2024 2:08 am

The prior is usually box stats regressed against RAPM. The way that I've seen prior informed RAPM calculated here, as well as other places, is to create an expected score margin from the linear combination of player priors on the floor. You then subtract the expectation from the actual results and that is your response vector. After running the ridge regression you then add back the player prior values to their estimates from the regression. When I do it I just follow the code from here (not my repository). https://github.com/colintj/RAPM/blob/ma ... pm_prior.R
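
The three steps described above (expected margin from priors, regress the residual, add the priors back) can be sketched as follows. This is not the code from the linked repository; it is a small NumPy illustration with a closed-form ridge solve, and the toy matrix, the `prior` values, and the function names are all made up for the example.

```python
import numpy as np

def ridge_no_intercept(X, y, lam):
    """Closed-form ridge: solve (X'X + lam*I) b = X'y."""
    n_cols = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_cols), X.T @ y)

def prior_informed_rapm(X, y, prior, lam):
    """Ridge on the residual y - X @ prior, then add the prior back."""
    expected = X @ prior        # expected margin from the players on the floor
    residual = y - expected     # response vector for the regression
    adjustment = ridge_no_intercept(X, residual, lam)
    return prior + adjustment

rng = np.random.default_rng(0)
n_rows, n_players = 200, 6
# Toy encoding: +1 / -1 for players on either side, 0 for players off the floor
X = rng.choice([-1.0, 0.0, 1.0], size=(n_rows, n_players))
true_value = np.array([2.0, 1.0, 0.0, -1.0, -2.0, 0.5])
y = X @ true_value + rng.normal(scale=5.0, size=n_rows)
prior = np.array([1.5, 0.5, 0.0, -0.5, -1.5, 0.0])  # made-up box-score prior

estimates = prior_informed_rapm(X, y, prior, lam=50.0)
```

With a very large `lam` the estimates collapse to the prior, and with an all-zero prior the function reduces to ordinary RAPM, which is a quick way to sanity-check an implementation.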

RowRowFan · Post by **RowRowFan** » Fri Feb 09, 2024 2:18 am

Thank you for that!

Did what I said about luck adjustments seem like it would make sense, or would it have pitfalls? For now I think I'm just going to leave offense un-luck-adjusted, luck-adjust the defensive values, and put them together (not including free throws, of course).

rjb2 · Post by **rjb2** » Fri Feb 09, 2024 2:30 am

I personally have not experimented with luck adjustments. If I were to include them I would definitely include free throws but not sure about 3 pointers. Some teams like Houston for example do seem to have a measurable effect in reducing 3 point percentage. Also JE has said that it doesn't lead to improved performance.

RowRowFan · Post by **RowRowFan** » Sun Feb 11, 2024 1:32 am

Also, when it comes to making box score priors, is the reason something like TS% isn't used that 2P% and 3P% are indirect components of it (they're effectively built in already, since 2P, 2PA, 3P and 3PA are all included)?

One other thing: I'm more interested in projecting individual impact forward than in describing it, so I'm also thinking of making it a multiyear RAPM with more weight on recent years, along with box score priors.

Would PI RAPM use the previous year's RAPM as the prior to regress to, while multiyear RAPM gives more weight to recent years by, say, duplicating the recent year in the data set? Or am I thinking about it wrong?

rjb2 · Post by **rjb2** » Sun Feb 11, 2024 4:25 am

Most public box score priors/SPMs that I've seen just use 2p, 2pa, etc. without terms like TS% because of redundancy. For the multiyear "nowcast" RAPM, if the ridge regression software you are using accepts a weight vector (which glmnet and sklearn both do), you could just use that to weight data from previous years less. For the PI RAPM, I haven't really explored a version of it with predictive box priors.
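
A small sketch of the weight-vector approach, using the `sample_weight` argument of scikit-learn's `Ridge.fit`. The season labels, the decay factor of 0.5, and the alpha value are assumptions chosen only for illustration and would need tuning on real data.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
n_rows, n_players = 300, 5
X = rng.choice([-1.0, 0.0, 1.0], size=(n_rows, n_players))
y = X @ np.array([1.0, 0.5, 0.0, -0.5, -1.0]) + rng.normal(scale=3.0, size=n_rows)

# Hypothetical label per row: 0 = current season, 1 = one year ago, 2 = two years ago
seasons_ago = rng.integers(0, 3, size=n_rows)
decay = 0.5                       # assumed decay factor, to be tuned
weights = decay ** seasons_ago    # 1.0, 0.5, 0.25

model = Ridge(alpha=10.0, fit_intercept=False)
model.fit(X, y, sample_weight=weights)
coefs = model.coef_
```

For a fixed alpha, giving a row an integer weight of 2 has the same effect on the fit as duplicating that row, so the weight vector covers the "duplicate the recent year" idea without growing the data set.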

RowRowFan · Post by **RowRowFan** » Wed Feb 14, 2024 2:36 am

So I've got the actual possessions all sorted now, and I have RAPM code I've used in the past. My only issue is that with how much college data there is, I can't even get RAPM to run.

I switched over to Python to run RAPM since I've seen people on here have issues running it in R, but even after upgrading to Colab Pro I don't have nearly enough space to run a multiyear RAPM on college data. Is this to be expected?

v-zero · Post by **v-zero** » Wed Feb 14, 2024 9:36 am

I'm assuming your system isn't able to allocate enough memory during matrix pseudoinversion. That's not at all unusual. I would suggest giving the SGDRegressor in Sklearn a go.
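
A rough sketch of what that could look like: `SGDRegressor` with an L2 penalty fitted on a SciPy sparse matrix, so no dense matrix inversion is ever formed. The data is synthetic and every hyperparameter is a placeholder. As far as I understand, SGDRegressor's `alpha` applies to a per-sample-averaged loss, so it is not on the same scale as the lambda used with `Ridge` and has to be tuned separately.

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(2)
n_rows, n_players = 2000, 50

# Toy sparse design matrix: 5 players at +1 and 5 at -1 in every row
rows, cols, vals = [], [], []
for i in range(n_rows):
    on_floor = rng.choice(n_players, size=10, replace=False)
    rows.extend([i] * 10)
    cols.extend(on_floor.tolist())
    vals.extend([1.0] * 5 + [-1.0] * 5)
X = sparse.csr_matrix((vals, (rows, cols)), shape=(n_rows, n_players))

true_value = rng.normal(scale=2.0, size=n_players)
y = X @ true_value + rng.normal(scale=10.0, size=n_rows)

# Placeholder hyperparameters; alpha is NOT comparable to Ridge's alpha
sgd = SGDRegressor(penalty="l2", alpha=1e-3, fit_intercept=False,
                   max_iter=200, tol=1e-4, random_state=0)
sgd.fit(X, y)
coefs = sgd.coef_
```

SGD results depend on the learning-rate schedule and the number of passes, so it is worth comparing against an exact ridge fit on a small subset before trusting the full run.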

J.E. · Post by **J.E.** » Thu Feb 15, 2024 5:56 am

Also
- of course, use sparse matrices
- Macs are better at memory handling with sparse matrices than Windows/Unix, in case you have a choice
- given it's the number of columns that's creating problems, you could - like in the early APM days - handle all players below a certain minute threshold as the same guy, grouping them all into a single column. Choose a cutoff, run and watch your memory. If it blows up, increase the threshold
- if you're currently running it on a single-possession basis, you can group the stints without subs into 1 row. But that requires weighting, which a lot of people seem to get confused about
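
The first and third points can be combined in a short sketch: build the design matrix directly in SciPy sparse form and send every player under a minutes cutoff to one shared column. The function names, the +1/-1 encoding, and the two-player "lineups" are assumptions made to keep the example small.

```python
import numpy as np
from scipy import sparse

def build_column_index(minutes_by_player, min_minutes):
    """Map players to columns; everyone under the cutoff shares the last column."""
    regulars = sorted(p for p, m in minutes_by_player.items() if m >= min_minutes)
    index = {p: i for i, p in enumerate(regulars)}
    pooled_col = len(regulars)
    for p in minutes_by_player:
        index.setdefault(p, pooled_col)
    return index, pooled_col + 1

def build_sparse_matrix(stints, index, n_cols):
    """stints: list of (team_a_players, team_b_players); +1 for team A, -1 for team B."""
    rows, cols, vals = [], [], []
    for r, (team_a, team_b) in enumerate(stints):
        for p in team_a:
            rows.append(r)
            cols.append(index[p])
            vals.append(1.0)
        for p in team_b:
            rows.append(r)
            cols.append(index[p])
            vals.append(-1.0)
    # Duplicate (row, col) entries are summed, so pooled players add up in one cell
    return sparse.csr_matrix((vals, (rows, cols)), shape=(len(stints), n_cols))

minutes = {"a": 900, "b": 800, "c": 30, "d": 10, "e": 700}
index, n_cols = build_column_index(minutes, min_minutes=100)
stints = [(["a", "c"], ["b", "d"]), (["a", "b"], ["e", "c"])]  # toy 2-on-2 lineups
X = build_sparse_matrix(stints, index, n_cols)
```

Raising `min_minutes` shrinks the column count, which is the knob to turn when memory runs out.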

RowRowFan · Post by **RowRowFan** » Thu Feb 15, 2024 5:08 pm

Thank you so much! What did you mean by your last point? I'm a bit confused by it.

DSMok1 · Post by **DSMok1** » Thu Feb 15, 2024 6:46 pm

RowRowFan wrote: ↑Thu Feb 15, 2024 5:08 pm
J.E. wrote: ↑Thu Feb 15, 2024 5:56 am .....
- if you're currently running it on single possession basis, you can group the stints without subs into 1 row. But that requires weighing, which a lot of people seem to get confused about
Thank you so much! What did you mean by your last point, I’m a bit confused by what you mean by that

You can either use a row per possession or group rows that have the same lineups and then weight the regression by number of possessions in each row.
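
A small NumPy check of that equivalence, using a closed-form weighted ridge written for the example. The part that tends to cause confusion, as I understand it, is that the grouped row's response must be points per possession (not the stint total) when the weight is the possession count. The toy lineups and point values are made up.

```python
import numpy as np

def weighted_ridge(X, y, w, lam):
    """Solve (X' W X + lam*I) b = X' W y for a diagonal weight matrix W."""
    Xw = X * w[:, None]
    return np.linalg.solve(X.T @ Xw + lam * np.eye(X.shape[1]), Xw.T @ y)

# Three toy lineups (rows of +1/-1/0) and points scored on each individual possession
lineups = np.array([[1.0, -1.0, 0.0],
                    [1.0, 0.0, -1.0],
                    [0.0, 1.0, -1.0]])
points_by_lineup = [[2, 0, 3, 0], [0, 2], [1, 0, 0, 2, 3, 0]]

# Option 1: one row per possession, every row with weight 1
X_poss = np.vstack([np.tile(lineups[i], (len(p), 1))
                    for i, p in enumerate(points_by_lineup)])
y_poss = np.concatenate([np.array(p, dtype=float) for p in points_by_lineup])
beta_poss = weighted_ridge(X_poss, y_poss, np.ones(len(y_poss)), lam=5.0)

# Option 2: one row per lineup, response = points per possession, weight = possessions
y_stint = np.array([np.mean(p) for p in points_by_lineup])
w_stint = np.array([len(p) for p in points_by_lineup], dtype=float)
beta_stint = weighted_ridge(lineups, y_stint, w_stint, lam=5.0)
```

Both options give the same coefficients for the same lambda, while the grouped version needs 3 rows instead of 12.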

RowRowFan · Post by **RowRowFan** » Sun Feb 18, 2024 3:53 pm

Just want to thank everyone for helping out; I was able to get it to work. I'm just working through picking the best lambdas now, but I really appreciate the advice everyone gave me here. It helped me out a ton.

J.E. · Post by **J.E.** » Mon Feb 19, 2024 4:45 am

RowRowFan wrote: ↑Sun Feb 18, 2024 3:53 pm Just wanna thank everyone for helping out, I was able to get it to work. Just getting through picking the best Lambdas now, but I really apprecaite the advice everyone gave me here it helped me out a ton

Arriving at the optimal lambda - while of course somewhat dependent on your data size - should be around a 2 minute thing

from sklearn import linear_model
import numpy as np

# X (design matrix) and y (response) are assumed to be defined already
y = y - np.average(y)  # center the response, since no intercept is fitted
# A higher cv can be used if you have time; not specifying cv will lead to it running forever
clf = linear_model.RidgeCV(alphas=[1000, 2000, 4000, 8000], fit_intercept=False, cv=5)
clf.fit(X, y)
print(clf.alpha_)  # the selected lambda

should be all you need

Mike G · Post by **Mike G** » Mon Feb 19, 2024 3:54 pm

J.E., the link below your post leads me to:

Error: Page not found
The requested URL was not found on this server.

DSMok1 · Post by **DSMok1** » Mon Feb 19, 2024 5:20 pm

Mike G wrote: ↑Mon Feb 19, 2024 3:54 pm J.E., the link below your post leads me to:

Error: Page not found
The requested URL was not found on this server.

That link hasn't worked for many years, since Jerry got hired by a team.

https://web.archive.org/web/20150317061 ... pspot.com/

APBRmetrics
