Calculating separate offensive / defensive RAPM?
I understand how to calculate RAPM as a single value but I was wondering how to compute separate OFF/DEF RAPM?
Detailed explanation would be appreciated
Re: Calculating separate offensive / defensive RAPM?
This thread may be a good starting point. I believe there are multiple approaches: http://www.apbr.org/metrics/viewtopic.p ... 935#p29935
Re: Calculating separate offensive / defensive RAPM?
Thanks for that link
Okay now I understand how the data needs to be prepared for this. But how do you actually get the O-RAPM and D-RAPM results?
permaximum wrote: ↑Mon Mar 20, 2017 7:11 am
nbacouchside wrote: perma, how did you get the splits for ORAPM and DRAPM? I was able to run your code with some modifications on basketballvalue's 2011 data (no playoffs), but it produced all-in RAPM, not O and D split.
While converting the data to regression-ready format, you have to double each matchup, completely reverse the lineups, and use only the offensive rating of the attacking lineup as the response. You're simply doubling the variables.
E.g.:
O stands for offense
D stands for defense
The numbers are player IDs
O1, O2, O3, O4, O5, D6, D7, D8, D9, D10 - Offensive Rating (e.g. 500)
O6, O7, O8, O9, O10, D1, D2, D3, D4, D5 - Offensive Rating (e.g. 400)
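The doubling step described in the quote can be sketched in base R. This is only an illustration under assumptions: the `stints` table, its column names (`h1`..`h5`, `a1`..`a5`, `home_pts`, `home_poss`, `away_pts`, `away_poss`) and the numbers in it are made up, and possessions are tracked per team here, whereas some data sets only give one possession count per stint.

```r
# Hypothetical stint table: one row per stint, five player IDs per side,
# plus points and possessions for each team during that stint.
stints <- data.frame(
  h1 = 1, h2 = 2, h3 = 3, h4 = 4, h5 = 5,
  a1 = 6, a2 = 7, a3 = 8, a4 = 9, a5 = 10,
  home_pts = 10, home_poss = 8,
  away_pts = 6,  away_poss = 8
)

home_cols <- paste0("h", 1:5)
away_cols <- paste0("a", 1:5)
new_names <- c(paste0("off", 1:5), paste0("def", 1:5), "ortg", "poss")

# Observation 1: home lineup attacks, away lineup defends.
home_off <- data.frame(
  stints[home_cols], stints[away_cols],
  ortg = 100 * stints$home_pts / stints$home_poss,  # points per 100 possessions
  poss = stints$home_poss
)
names(home_off) <- new_names

# Observation 2: lineups reversed, away lineup attacks.
away_off <- data.frame(
  stints[away_cols], stints[home_cols],
  ortg = 100 * stints$away_pts / stints$away_poss,
  poss = stints$away_poss
)
names(away_off) <- new_names

# Two rows per stint: each row is "five attackers vs five defenders".
obs <- rbind(home_off, away_off)
print(obs)
```

Each stint contributes two rows, and the response in each row is the offensive rating of whichever lineup is attacking.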
Do you need to run two separate regressions? I don't see how the code in that post would yield both offensive and defensive coefficients, but is that the case?
Also just to be sure, should the defensive player number be -1 and not 1?
Re: Calculating separate offensive / defensive RAPM?
viewtopic.php?p=15739#p15739
found this old explanation
so you have a data set with, say 500 players, you'd need 1000 explanatory columns (each player having two columns, one for offense and one for defense)..? and then you'd get 1000 coefficients... i think i'm understanding this now.
You have it right, you break down the 5-on-5 into offence vs defence situations (so your dependent variable is how many points the team on offence scored, and your explanatory variables are your five guys on offence against your five guys on defence - so each player in the regression must have an offence and defence variable). However, all we care about is the marginal value (value above average) of these players, so in order to only have player variables as marginal you must include an intercept term in your regression (as well as an HCA term) in order to swallow up the average efficiency. There are actually very sound numerical reasons for including an intercept in this particular method, but I won't get into that. Suffice to say that introducing an intercept is the way to go.
does it matter whether you put -1 or 1 for the defensive players?
Re: Calculating separate offensive / defensive RAPM?
It seems to me that you just have to be consistent with the sign. The resulting values will just flip signs depending on your choice.
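A small check of that claim, as a sketch on made-up random data rather than real lineup data. It uses a closed-form ridge solve without an intercept (the helper `ridge` is written here for illustration and is not glmnet), and fits the same data twice, once with defenders coded -1 and once with +1.

```r
# Closed-form ridge without intercept: solve (X'X + lambda * I) b = X'y.
ridge <- function(X, y, lambda) {
  drop(solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y)))
}

set.seed(1)
n <- 200
off <- matrix(rbinom(n * 3, 1, 0.5), n, 3)  # offensive indicators (0/1)
def <- matrix(rbinom(n * 3, 1, 0.5), n, 3)  # defensive indicators (0/1)
y <- rnorm(n)                               # placeholder response

b_minus <- ridge(cbind(off, -def), y, lambda = 5)  # defenders coded -1
b_plus  <- ridge(cbind(off,  def), y, lambda = 5)  # defenders coded +1

# Offensive coefficients agree; defensive coefficients differ only in sign.
print(all.equal(b_minus[1:3],  b_plus[1:3]))
print(all.equal(b_minus[4:6], -b_plus[4:6]))
```

With defenders coded -1 and the offence's rating as the response, a positive defensive coefficient means the player lowers the opponent's scoring; with +1 coding the same player would show up as negative.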
Re: Calculating separate offensive / defensive RAPM?
I followed the steps but was not successful. Overall RAPM analysis is no problem and I get those results. But I cannot get separate ORAPM and DRAPM values. Can somebody tell me why?
My analysis for overall RAPM with R looks like this:
HomeID1, HomeID2, HomeID3, HomeID4, HomeID5, AwayID6, AwayID7, AwayID8, AwayID9, AwayID10, Margin, Poss
1, 1, 1, 1, 1, -1, -1, -1, -1, -1, 13, 9
....
Margin is just Home ORtg - Away ORtg.
Then I tried to set up the data as described in the threads here. Compared with the setup above, I doubled the variables so that every player has an offensive and a defensive variable. The number of observations also doubled, because for every lineup the ORtg is needed for both Home and Away. Since it was mentioned that an intercept is needed, I also brought in the league-average ORtg like this: ORtg = MyMeasuredORtg - LeagueAverageORtg
O_ID1, O_ID2, O_ID3, O_ID4, O_ID5, D_ID1, D_ID2, D_ID3, D_ID4, D_ID5, ORtg, Poss
1, 1, 1, 1, 1, -1, -1, -1, -1, -1, -5.7, 13
-1, -1, -1, -1, -1, 1, 1, 1, 1, 1, +15.3, 13
....
Again, I just do not get good results whatsoever. Can somebody explain what I am doing wrong?
Re: Calculating separate offensive / defensive RAPM?
I suspect it may not be set up quite right.
So, for every stint, you have 2 observations--one for each team being on offense.
O_P1 + O_P2 + O_P3 + O_P4 + O_P5 - D_P6 - D_P7 - D_P8 - D_P9 - D_P10 = O_Tm1Rtg (Wt = Poss)
O_P6 + O_P7 + O_P8 + O_P9 + O_P10 - D_P1 - D_P2 - D_P3 - D_P4 - D_P5 = O_Tm2Rtg (Wt = Poss)
Where P1 through P5 are on Tm1 and P6 through P10 are on Tm2.
Does that look like what you are doing? Then, from that point, it should be exactly like the basic RAPM.
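The two equations above translate into a design matrix with one O_ column and one D_ column per player. A minimal base R sketch, assuming a hypothetical `obs` table with made-up values and one row per (stint, team on offence):

```r
# Hypothetical observations: one row per (stint, team on offence).
obs <- data.frame(
  off1 = c(1, 6), off2 = c(2, 7), off3 = c(3, 8), off4 = c(4, 9), off5 = c(5, 10),
  def1 = c(6, 1), def2 = c(7, 2), def3 = c(8, 3), def4 = c(9, 4), def5 = c(10, 5),
  ortg = c(125, 75), poss = c(8, 8)
)

players <- sort(unique(unlist(obs[1:10])))  # every player ID that appears
n_p <- length(players)

# One offensive and one defensive column per player, all zeros to start.
X <- matrix(0, nrow(obs), 2 * n_p,
            dimnames = list(NULL, c(paste0("O_", players), paste0("D_", players))))

for (i in seq_len(nrow(obs))) {
  off_idx <- match(unlist(obs[i, paste0("off", 1:5)]), players)
  def_idx <- match(unlist(obs[i, paste0("def", 1:5)]), players)
  X[i, off_idx] <- 1          # +1 in the offensive column of each attacker
  X[i, n_p + def_idx] <- -1   # -1 in the defensive column of each defender
}

print(X)
```

Every row ends up with exactly five +1 entries and five -1 entries. For a full season a dense matrix like this gets large, so a sparse representation is probably the better choice in practice.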
Re: Calculating separate offensive / defensive RAPM?
Thanks for your reply. Unfortunately, my setup was just as you described. I just did a bad job of getting the syntax right in my last post.
And I still struggle to solve this problem. While my overall vanilla RAPM values are pretty similar to those on other pages (e.g. 5.3 as the top value, for Rudy Gobert), my ORAPM/DRAPM values estimated by this formula are much lower. O.Gobert comes out at -0.13 and D.Gobert at 0.45. That's not an outlier: basically no O or D player has a value above 1.8.
What could I have done wrong?
Re: Calculating separate offensive / defensive RAPM?
apophain wrote: ↑Fri Oct 22, 2021 2:59 pm
I followed the steps but was not successful. Overall RAPM analysis is no problem and I get those results. But I cannot get separate ORAPM and DRAPM values. Can somebody tell me why?
It looks as though you are including the number of possessions as an explanatory variable? If so, don't do that. Scale all stints to 100 possessions, and use the number of possessions actually in that stint as the weight for that stint in the regression. I also suggest you don't adjust anything by LeagueAverageORtg, and instead include an intercept in the regression to capture that.
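To illustrate the advice about weights and the intercept, here is a sketch of a possession-weighted ridge fit with an unpenalised intercept, written as a closed-form solve in base R. It is a stand-in for illustration, not glmnet: the helper `weighted_ridge`, the synthetic data, and the constant 110 used as a "league average" are all assumptions, and `lambda` here is not on the same scale as glmnet's.

```r
# Weighted ridge with an unpenalised intercept:
# minimise sum_i w_i * (y_i - a - x_i'b)^2 + lambda * sum(b^2).
weighted_ridge <- function(X, y, w, lambda) {
  Xi <- cbind(1, X)                      # prepend intercept column
  P <- diag(c(0, rep(lambda, ncol(X))))  # zero penalty on the intercept
  est <- solve(t(Xi) %*% (w * Xi) + P, t(Xi) %*% (w * y))
  list(intercept = est[1], coef = est[-1])
}

set.seed(42)
n <- 300; p <- 8
X <- matrix(sample(c(-1, 0, 1), n * p, replace = TRUE), n, p)  # toy design
poss <- sample(5:20, n, replace = TRUE)                        # stint weights
ortg <- 110 + drop(X %*% rnorm(p, sd = 2)) + rnorm(n, sd = 15) # per-100 response

fit_raw      <- weighted_ridge(X, ortg,       poss, lambda = 500)
fit_centered <- weighted_ridge(X, ortg - 110, poss, lambda = 500)

# Subtracting a constant from the response only moves the intercept.
print(all.equal(fit_raw$coef, fit_centered$coef))
print(fit_raw$intercept - fit_centered$intercept)  # 110, up to rounding
```

Because the intercept is not penalised, subtracting a league average by hand changes nothing about the player coefficients. Possessions enter only as weights and never as a column of the design matrix.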
Re: Calculating separate offensive / defensive RAPM?
Actually, I am using the possessions as the weight. The dependent variable is already per100, it is simply the ORtg.
I tried to take your advice regarding the intercept but am struggling so far. I get the idea behind using the league average as the intercept, but I really don't know how to include an intercept in a Ridge regression with R.
Edit: okay, intercept=TRUE has done the trick, but my results are exactly the same with or without it. I just checked, and it seems the intercept is fitted by default. Still, I get rather low numbers (-2<X<2) which do not add up to the overall RAPM values.
Re: Calculating separate offensive / defensive RAPM?
I have just noticed this... this formulation as stated doesn't look right.
For this regression your value for the offensive dummy variables should always be 1, and your values for the defensive dummy variables should always be -1. This might just be an artefact of how you wrote this out, but I'd like to help so I'm pursuing all avenues.
If it helps, paste your code. I don't use R, but it's syntactically very simple, so I should be able to follow it.
Re: Calculating separate offensive / defensive RAPM?
Unbelievable. I really messed it up when I wrote here how I set everything up. That's not how I did it. The way I did it would be like:
O_ID1, O_ID2, O_ID3, O_ID4, O_ID5, O_ID6, O_ID7, O_ID8, O_ID9, O_ID10, ||| D_ID1, D_ID2, D_ID3, D_ID4, D_ID5, D_ID6, D_ID7, D_ID8, D_ID9, D_ID10,||| ORtg, Poss
1, 1, 1, 1, 1, 0, 0, 0, 0, 0, ||| 0, 0, 0, 0, 0, -1, -1, -1, -1, -1, |||-5.7, 13
0, 0, 0, 0, 0, 1, 1, 1, 1, 1, |||-1, -1, -1, -1, -1, 0, 0, 0, 0, 0, |||+15.3, 13
Based on the explanations in this thread this should be correct, right? Well, in this case the ORtg is still relative to the league average, but that can be fixed quickly.
From the table above I remove the ORtg and Possession columns such that we have only the O_IDx and D_IDx columns. This matrix with the 0, -1 and 1s is called "AllFinalORAPM2021", while the vector for ORtg is called "OffPer100" and the vector for the possessions is called "Possessions".
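library(glmnet)
x <- data.matrix(AllFinalORAPM2021)
# 5-fold cross-validation to choose the ridge penalty (alpha = 0), weighting stints by possessions
cv_fit <- cv.glmnet(x, OffPer100, weights = Possessions, nfolds = 5, standardize = FALSE, alpha = 0, intercept = TRUE)
lambda.min <- cv_fit$lambda.min
# Refit at the selected penalty; weights are passed by name rather than by position
ridge <- glmnet(x, OffPer100, family = "gaussian", weights = Possessions, alpha = 0, lambda = lambda.min, standardize = FALSE, intercept = TRUE)
coef(ridge, s = lambda.min)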
Again, this is exactly the same code I used for the overall RAPM, which works very well. Thanks for your help, I appreciate it.
Re: Calculating separate offensive / defensive RAPM?
Yeah that looks as I would set it up, data wise.
I can't immediately spot the issue. My current guess is that cv.glmnet is doing a poor job of optimising lambda because the sequence it chooses doesn't get close to the optimal value. I haven't ever used it, so I can't really say how good it is at optimising when it also has to choose the sequence. I would personally be tempted to play around with smaller values of lambda and see whether the results 'look' more like what you expect. At that point I would feed cv.glmnet a sequence spanning the range from the lambda that 'looks' right (results wise) to the lambda cv.glmnet selects on its own without being fed a sequence. If cv.glmnet then still selects a value giving the same sort of results as you originally got, that would probably point to a data setup issue. If instead it selects a lambda closer to the one that 'looks' right, that suggests you will need to feed the lambda sequence to cv.glmnet, as it is doing a poor job on its own.
But, I don't use any of that, so all of this is just how I would go about troubleshooting when playing with a new toy.
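The troubleshooting idea above, cross-validating over an explicit lambda grid and checking whether the minimum sits at the edge of that grid, can be sketched without glmnet. Everything below is an assumption made for illustration: the helpers `weighted_ridge` and `cv_error`, the synthetic data, and the grid itself. It uses a closed-form weighted ridge, so the lambda values are not comparable to glmnet's. In glmnet itself, a user-supplied sequence goes in through the `lambda` argument of `cv.glmnet`.

```r
# Weighted ridge with unpenalised intercept; returns c(intercept, coefficients).
weighted_ridge <- function(X, y, w, lambda) {
  Xi <- cbind(1, X)
  P <- diag(c(0, rep(lambda, ncol(X))))
  drop(solve(t(Xi) %*% (w * Xi) + P, t(Xi) %*% (w * y)))
}

# Weighted mean squared prediction error, averaged over the folds.
cv_error <- function(X, y, w, lambda, folds) {
  errs <- sapply(sort(unique(folds)), function(k) {
    test <- folds == k
    est <- weighted_ridge(X[!test, , drop = FALSE], y[!test], w[!test], lambda)
    pred <- drop(cbind(1, X[test, , drop = FALSE]) %*% est)
    sum(w[test] * (y[test] - pred)^2) / sum(w[test])
  })
  mean(errs)
}

set.seed(7)
n <- 400; p <- 20
X <- matrix(sample(c(-1, 0, 1), n * p, replace = TRUE), n, p)  # toy design
w <- sample(5:20, n, replace = TRUE)                           # possessions
y <- 110 + drop(X %*% rnorm(p, sd = 2)) + rnorm(n, sd = 15)    # per-100 response
folds <- sample(rep(1:5, length.out = n))                      # 5-fold assignment

grid <- 10^seq(-2, 5, length.out = 30)  # explicit, deliberately wide grid
cv <- sapply(grid, function(l) cv_error(X, y, w, l, folds))
best <- grid[which.min(cv)]

# A minimum at either end of the grid means the grid should be widened.
at_edge <- best %in% range(grid)
print(c(best = best, at_edge = at_edge))
```

If the selected value lands on the boundary of the grid, the search range is the likely problem rather than the data setup; if it lands well inside a wide grid and the coefficients still look wrong, the data setup is the more likely suspect.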
Re: Calculating separate offensive / defensive RAPM?
Well, lowering the lambda definitely leads to better-looking results. Instead of -2 to +2, I can get a range of, e.g., -5 to +5. But most values still seem to be off. I mentioned the Gobert example before: in an overall RAPM analysis he has the highest RAPM, but when I analyze offensive and defensive RAPM separately, he is barely positive for ORAPM and even slightly negative for DRAPM. Those results are contradictory and make no sense. To me this suggests that either my data setup is somehow wrong or the code I sent yesterday has some flaws.