Calculating separate offensive / defensive RAPM?

JJ33 · Post by **JJ33** » Wed Sep 08, 2021 9:00 pm

I understand how to calculate RAPM as a single value but I was wondering how to compute separate OFF/DEF RAPM?

A detailed explanation would be appreciated.

DSMok1 · Post by **DSMok1** » Thu Sep 09, 2021 12:02 pm

This thread may be a good starting point. I believe there are multiple approaches: http://www.apbr.org/metrics/viewtopic.p ... 935#p29935

JJ33 · Post by **JJ33** » Thu Sep 09, 2021 12:39 pm

Thanks for that link

permaximum wrote: ↑Mon Mar 20, 2017 7:11 am
nbacouchside wrote: perma, how did you get the splits for ORAPM and DRAPM? I was able to run your code with some modifications on basketballvalue's 2011 data (no playoffs), but it produced all-in RAPM, not an O and D split.
When converting the data to a regression-ready format, you have to double each matchup, completely reverse the lineups, and use only the offensive rating of the attacking lineup as the response. You're simply doubling the variables.
E.g.:
O stands for Offense
D stands for Defense
The numbers are player IDs
O1, O2, O3, O4, O5, D6, D7, D8, D9, D10 - Offensive Rating (e.g. 500)
O6, O7, O8, O9, O10, D1, D2, D3, D4, D5 - Offensive Rating (e.g. 400)

Okay now I understand how the data needs to be prepared for this. But how do you actually get the O-RAPM and D-RAPM results?

Do you need to run two separate regressions? I don't see how the code in that post would yield both offensive and defensive coefficients, but is that the case?

Also just to be sure, should the defensive player number be -1 and not 1?

JJ33 · Post by **JJ33** » Thu Sep 09, 2021 9:13 pm

viewtopic.php?p=15739#p15739

found this old explanation

You have it right, you break down the 5-on-5 into offence vs defence situations (so your dependent variable is how many points the team on offence scored, and your explanatory variables are your five guys on offence against your five guys on defence - so each player in the regression must have an offence and defence variable). However, all we care about is the marginal value (value above average) of these players, so in order to only have player variables as marginal you must include an intercept term in your regression (as well as an HCA term) in order to swallow up the average efficiency. There are actually very sound numerical reasons for including an intercept in this particular method, but I won't get into that. Suffice to say that introducing an intercept is the way to go.

so you have a data set with, say 500 players, you'd need 1000 explanatory columns (each player having two columns, one for offense and one for defense)..? and then you'd get 1000 coefficients... i think i'm understanding this now.

does it matter whether you put -1 or 1 for the defensive players?

DSMok1 · Post by **DSMok1** » Fri Sep 10, 2021 3:30 pm

It seems to me that you just have to be consistent with the sign. The resulting values will just flip signs depending on your choice.
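The sign point can be checked numerically. Below is a minimal sketch on made-up toy data (not real lineups), using a hand-written closed-form weighted ridge in base R rather than glmnet; `ridge_fit` and all the data here are illustrative assumptions. Because the ridge penalty is symmetric in each coefficient, coding defenders as -1 or +1 should only mirror the defensive coefficients.

```r
# Closed-form weighted ridge with an unpenalized intercept (base R only).
ridge_fit <- function(x, y, w, lambda) {
  xa <- cbind(1, x)                              # prepend intercept column
  pen <- diag(c(0, rep(lambda, ncol(x))))        # no penalty on the intercept
  drop(solve(t(xa) %*% (w * xa) + pen, t(xa) %*% (w * y)))
}

set.seed(1)
n <- 200; p <- 6
off <- matrix(rbinom(n * p, 1, 0.5), n, p)       # toy offensive indicators (0/1)
def <- matrix(rbinom(n * p, 1, 0.5), n, p)       # toy defensive indicators (0/1)
y <- 110 + rnorm(n, sd = 10)                     # toy per-100 offensive ratings
w <- sample(5:20, n, replace = TRUE)             # toy possession counts

fit_neg <- ridge_fit(cbind(off, -def), y, w, lambda = 50)  # defenders coded -1
fit_pos <- ridge_fit(cbind(off,  def), y, w, lambda = 50)  # defenders coded +1

def_idx <- (p + 2):(2 * p + 1)                   # positions of defensive coefficients
max(abs(fit_neg[def_idx] + fit_pos[def_idx]))    # defensive values are mirror images
max(abs(fit_neg[-def_idx] - fit_pos[-def_idx]))  # intercept and offense unchanged
```

With the -1 coding, a positive defensive coefficient means the player lowers the opponent's offensive rating; with +1 the same player shows up as negative.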

apophain · Post by **apophain** » Fri Oct 22, 2021 2:59 pm

I followed the steps but was not successful. Overall RAPM analysis is no problem and I get those results. But I cannot get separate ORAPM and DRAPM values. Can somebody tell me why?
My analysis for overall RAPM with R looks like this:
HomeID1, HomeID2, HomeID3, HomeID4, HomeID5, AwayID6, AwayID7, AwayID8, AwayID9, AwayID10, Margin, Poss
1, 1, 1, 1, 1, -1, -1, -1, -1, -1, 13, 9
....

Margin is just Home ORtg - Away ORtg.

Then I tried to set up the data as described in the threads here. Compared with the setup above, this time I doubled the variables so every player has an offensive and a defensive variable. The number of observations also doubled, because for every lineup the ORtg is needed for both Home and Away. Since it was mentioned that an intercept is needed, I also accounted for the average ORtg like this: ORtg = MyMeasuredORtg - LeagueAverageORtg
O_ID1, O_ID2, O_ID3, O_ID4, O_ID5, D_ID1, D_ID2, D_ID3, D_ID4, D_ID5, ORtg, Poss
1, 1, 1, 1, 1, -1, -1, -1, -1, -1, -5.7, 13
-1, -1, -1, -1, -1, 1, 1, 1, 1, 1, +15.3, 13
....

Again, I just do not get good results at all. Can somebody explain what I am doing wrong?

DSMok1 · Post by **DSMok1** » Fri Oct 22, 2021 4:41 pm

I suspect it may not be set up quite right.

So, for every stint, you have 2 observations--one for each team being on offense.

O_P1 + O_P2 + O_P3 + O_P4 + O_P5 - D_P6 - D_P7 - D_P8 - D_P9 - D_P10 = O_Tm1Rtg (Wt = Poss)
O_P6 + O_P7 + O_P8 + O_P9 + O_P10 - D_P1 - D_P2 - D_P3 - D_P4 - D_P5 = O_Tm2Rtg (Wt = Poss)

Where P1 through P5 are on Tm1 and P6 through P10 are on Tm2.

Does that look like what you are doing? From that point, it should be exactly like the basic RAPM.
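The two-observations-per-stint layout can be sketched in base R as follows. The input format (`stints` with columns `h1..h5`, `a1..a5`, points and possessions per side) and the helper `build_design` are hypothetical, invented only for illustration; real play-by-play exports will look different.

```r
# Hypothetical toy input: one row per stint, five player IDs per side,
# plus each side's points and possessions while on offense.
stints <- data.frame(
  h1 = c("A", "A"), h2 = c("B", "B"), h3 = c("C", "C"), h4 = c("D", "D"), h5 = c("E", "F"),
  a1 = c("G", "G"), a2 = c("H", "H"), a3 = c("I", "I"), a4 = c("J", "J"), a5 = c("K", "L"),
  home_pts = c(12, 7), home_poss = c(10, 8),
  away_pts = c(9, 10), away_poss = c(10, 8),
  stringsAsFactors = FALSE
)

build_design <- function(stints) {
  home_cols <- paste0("h", 1:5)
  away_cols <- paste0("a", 1:5)
  players <- sort(unique(unlist(stints[c(home_cols, away_cols)])))
  n <- nrow(stints)
  # One O_ column and one D_ column per player, two rows per stint.
  x <- matrix(0, nrow = 2 * n, ncol = 2 * length(players),
              dimnames = list(NULL, c(paste0("O_", players), paste0("D_", players))))
  y <- numeric(2 * n)
  w <- numeric(2 * n)
  for (i in seq_len(n)) {
    home <- unlist(stints[i, home_cols])
    away <- unlist(stints[i, away_cols])
    # Row 1: home team on offense (+1), away team on defense (-1).
    x[2 * i - 1, paste0("O_", home)] <- 1
    x[2 * i - 1, paste0("D_", away)] <- -1
    y[2 * i - 1] <- 100 * stints$home_pts[i] / stints$home_poss[i]
    w[2 * i - 1] <- stints$home_poss[i]
    # Row 2: away team on offense (+1), home team on defense (-1).
    x[2 * i, paste0("O_", away)] <- 1
    x[2 * i, paste0("D_", home)] <- -1
    y[2 * i] <- 100 * stints$away_pts[i] / stints$away_poss[i]
    w[2 * i] <- stints$away_poss[i]
  }
  list(x = x, y = y, w = w)   # design matrix, per-100 response, possession weights
}

d <- build_design(stints)
dim(d$x)   # 2 stints -> 4 rows; 12 players -> 24 columns
```

Every row should contain exactly five +1 entries and five -1 entries, which is a cheap sanity check to run on a real matrix before fitting.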

apophain · Post by **apophain** » Tue Oct 26, 2021 7:26 am

Thanks for your reply. Unfortunately, my setup was just as you described. I just did a bad job of getting the syntax right in my last post.

And I still struggle to solve this problem. While my overall vanilla RAPM values are pretty similar to those on other pages (e.g. 5.3 as the top value, for Rudy Gobert), the ORAPM/DRAPM values estimated by this formula are much lower. O.Gobert comes out at -0.13 and D.Gobert at 0.45. That's not an outlier: basically no O or D player has a value above 1.8.

What could I have done wrong?

v-zero · Post by **v-zero** » Tue Oct 26, 2021 9:00 am

apophain wrote: ↑Fri Oct 22, 2021 2:59 pm I followed the steps but was not successful. Overall RAPM analysis is no problem and I get those results. But I cannot get separate ORAPM and DRAPM values. Can somebody tell me why?
My analysis for overall RAPM with R looks like this:
HomeID1, HomeID2, HomeID3, HomeID4, HomeID5, AwayID6, AwayID7, AwayID8, AwayID9, AwayID10, Margin, Poss
1, 1, 1, 1, 1, -1, -1, -1, -1, -1, 13, 9
....

Margin is just Home ORtg - Away ORtg.

Then I tried to set up the data as described in the threads here. Compared with the setup above, this time I doubled the variables so every player has an offensive and a defensive variable. The number of observations also doubled, because for every lineup the ORtg is needed for both Home and Away. Since it was mentioned that an intercept is needed, I also accounted for the average ORtg like this: ORtg = MyMeasuredORtg - LeagueAverageORtg
O_ID1, O_ID2, O_ID3, O_ID4, O_ID5, D_ID1, D_ID2, D_ID3, D_ID4, D_ID5, ORtg, Poss
1, 1, 1, 1, 1, -1, -1, -1, -1, -1, -5.7, 13
-1, -1, -1, -1, -1, 1, 1, 1, 1, 1, +15.3, 13
....

Again, I just do not get good results at all. Can somebody explain what I am doing wrong?

It looks as though you are including the number of possessions as an explanatory variable? If so, don't do that. Scale all stints to 100 possessions, and use the number of possessions actually in that stint as the weight for that stint in the regression. I also suggest you don't adjust anything by LeagueAverageORtg, and instead include an intercept in the regression to capture that.
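As a rough illustration of that advice, here is a sketch on simulated toy data, with a hand-rolled closed-form ridge standing in for glmnet (`ridge_fit`, the simulated stints, and the lambda value are all assumptions). It scales each stint to points per 100 possessions, weights by possessions, and leaves the intercept unpenalized. Under those assumptions, subtracting the league average beforehand only moves the intercept.

```r
# Closed-form weighted ridge with an unpenalized intercept (base R only).
ridge_fit <- function(x, y, w, lambda) {
  xa <- cbind(1, x)
  pen <- diag(c(0, rep(lambda, ncol(x))))          # intercept is not shrunk
  drop(solve(t(xa) %*% (w * xa) + pen, t(xa) %*% (w * y)))
}

set.seed(7)
n <- 120
x <- cbind(matrix(rbinom(n * 4, 1, 0.5), n, 4),    # toy offensive columns (0/1)
           -matrix(rbinom(n * 4, 1, 0.5), n, 4))   # toy defensive columns (0/-1)
poss <- sample(3:25, n, replace = TRUE)            # possessions in each stint
pts  <- rpois(n, lambda = 1.1 * poss)              # points scored in each stint

ortg <- 100 * pts / poss                           # scale every stint to per 100
league_avg <- 100 * sum(pts) / sum(poss)

# Possessions go in as weights, never as a column of x.
fit_raw      <- ridge_fit(x, ortg,              w = poss, lambda = 100)
fit_centered <- ridge_fit(x, ortg - league_avg, w = poss, lambda = 100)

fit_raw[1] - fit_centered[1]                # equals league_avg: the intercept absorbs it
max(abs(fit_raw[-1] - fit_centered[-1]))    # player coefficients do not change
```

So with an intercept in the model, the manual LeagueAverageORtg adjustment is redundant rather than harmful.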

apophain · Post by **apophain** » Tue Oct 26, 2021 10:05 am

Actually, I am using the possessions as the weight. The dependent variable is already per100, it is simply the ORtg.

I tried to take your advice regarding the intercept but am struggling so far. I get the idea behind using the league average as the intercept, but I really don't know how to include an intercept in a Ridge regression with R.

Edit: okay, intercept=TRUE has done the trick, but my results are exactly the same with or without that argument. I just checked, and it seems glmnet fits an intercept by default. I still get rather low numbers (-2 < X < 2), which do not add up to the overall RAPM values.
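On the "add up" expectation, a short derivation may help. It is only a sketch: it assumes the offence/defence setup from this thread with defenders coded -1, no ridge penalty, and no HCA term. The symbols \mu (intercept), H (home five) and A (away five) are labels introduced here for illustration.

```latex
\begin{align*}
\text{ORtg}_{\text{home}} &= \mu + \sum_{i \in H} O_i - \sum_{j \in A} D_j \\
\text{ORtg}_{\text{away}} &= \mu + \sum_{j \in A} O_j - \sum_{i \in H} D_i \\
\text{Margin} = \text{ORtg}_{\text{home}} - \text{ORtg}_{\text{away}}
  &= \sum_{i \in H} (O_i + D_i) - \sum_{j \in A} (O_j + D_j)
\end{align*}
```

In that unpenalized model each player's overall coefficient would be O_i + D_i. Ridge shrinks the O and D coefficients separately, so an exact match is not guaranteed once a penalty is added.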

v-zero · Post by **v-zero** » Tue Oct 26, 2021 11:04 am

apophain wrote: ↑Fri Oct 22, 2021 2:59 pm O_ID1, O_ID2, O_ID3, O_ID4, O_ID5, D_ID1, D_ID2, D_ID3, D_ID4, D_ID5, ORtg, Poss
1, 1, 1, 1, 1, -1, -1, -1, -1, -1, -5.7, 13
-1, -1, -1, -1, -1, 1, 1, 1, 1, 1, +15.3, 13

I have just noticed this... this formulation as stated doesn't look right.

For this regression your value for the offensive dummy variables should always be 1, and your values for the defensive dummy variables should always be -1. This might just be an artefact of how you wrote this out, but I'd like to help so I'm pursuing all avenues.

If it helps, paste your code. I don't use R, but it's syntactically very simple, so I should be able to grok it.

apophain · Post by **apophain** » Tue Oct 26, 2021 12:39 pm

Unbelievable. I really messed it up when I wrote here how I set everything up. That's not how I did it. The way I did it would be like:
O_ID1, O_ID2, O_ID3, O_ID4, O_ID5, O_ID6, O_ID7, O_ID8, O_ID9, O_ID10 ||| D_ID1, D_ID2, D_ID3, D_ID4, D_ID5, D_ID6, D_ID7, D_ID8, D_ID9, D_ID10 ||| ORtg, Poss
1, 1, 1, 1, 1, 0, 0, 0, 0, 0 ||| 0, 0, 0, 0, 0, -1, -1, -1, -1, -1 ||| -5.7, 13
0, 0, 0, 0, 0, 1, 1, 1, 1, 1 ||| -1, -1, -1, -1, -1, 0, 0, 0, 0, 0 ||| +15.3, 13

Based on the explanations in this thread this should be correct, right? Well, in this case the ORtg is still relative to the league average, but that can be fixed quickly.
From the table above I remove the ORtg and Possession columns such that we have only the O_IDx and D_IDx columns. This matrix with the 0, -1 and 1s is called "AllFinalORAPM2021", while the vector for ORtg is called "OffPer100" and the vector for the possessions is called "Possessions".

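library(glmnet)
x <- data.matrix(AllFinalORAPM2021)  # one row per offensive stint; O columns are 0/1, D columns are 0/-1
# Cross-validate the ridge penalty (alpha = 0), weighting each stint by its possessions
cv <- cv.glmnet(x, OffPer100, weights = Possessions, nfolds = 5, standardize = FALSE, alpha = 0, intercept = TRUE)
lambda.min <- cv$lambda.min
# Refit at the selected penalty; the weights are now passed by name rather than by position
ridge <- glmnet(x, OffPer100, family = "gaussian", weights = Possessions, alpha = 0, lambda = lambda.min, standardize = FALSE, intercept = TRUE)
coef(ridge, s = lambda.min)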

Again, this is exactly the same code I used for the overall RAPM, which works very well. Thanks for your help, I appreciate it.

v-zero · Post by **v-zero** » Tue Oct 26, 2021 1:58 pm

Yeah that looks as I would set it up, data wise.

I can't immediately spot the issue. My current guess is that cv.glmnet may be doing a poor job of optimising lambda by choosing a sequence that doesn't get close to the optimal value. I haven't ever used it, so I can't really say how good it is at optimising when it also has to choose the sequence.

I would personally be tempted to play around with smaller values of lambda and see whether the results 'look' more like what you expect. At that point I would feed cv.glmnet a sequence spanning from the lambda that 'looks' right (results-wise) to the lambda cv.glmnet selects on its own without being fed a sequence. If cv.glmnet then still selects a value giving the same sort of results as you originally got, that would probably point to a data setup issue. If instead it selects a lambda closer to the one that 'looks' right, that suggests you will need to feed the lambda sequence to cv.glmnet, as it is doing a poor job on its own.

But, I don't use any of that, so all of this is just how I would go about troubleshooting when playing with a new toy.
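The troubleshooting idea (evaluate an explicit, wide lambda grid instead of trusting an automatic sequence) can be sketched without glmnet. Everything below is an assumption for illustration: the toy data, the hand-written `ridge_fit` and `cv_error` helpers, and the grid itself. The lambda values of this closed-form solver are not necessarily on the same scale as glmnet's, so the numbers should not be copied across.

```r
# Closed-form weighted ridge with an unpenalized intercept (base R only).
ridge_fit <- function(x, y, w, lambda) {
  xa <- cbind(1, x)
  pen <- diag(c(0, rep(lambda, ncol(x))))
  drop(solve(t(xa) %*% (w * xa) + pen, t(xa) %*% (w * y)))
}

# Possession-weighted k-fold cross-validation error for one lambda.
cv_error <- function(lambda, x, y, w, folds) {
  sq_err <- 0
  for (k in unique(folds)) {
    train <- folds != k
    b <- ridge_fit(x[train, , drop = FALSE], y[train], w[train], lambda)
    pred <- b[1] + drop(x[!train, , drop = FALSE] %*% b[-1])
    sq_err <- sq_err + sum(w[!train] * (y[!train] - pred)^2)
  }
  sq_err / sum(w)
}

set.seed(42)
n <- 300; p <- 8
x <- matrix(sample(c(-1, 0, 1), n * p, replace = TRUE), n, p)  # toy design matrix
w <- sample(4:20, n, replace = TRUE)                           # toy possessions
y <- 110 + drop(x %*% rnorm(p, sd = 2)) + rnorm(n, sd = 15)    # toy per-100 ratings

folds <- sample(rep(1:5, length.out = n))        # fixed fold assignment for all lambdas
lambda_grid <- 10^seq(4, -2, length.out = 25)    # wide log-spaced grid, large to small
errs <- vapply(lambda_grid, cv_error, numeric(1), x = x, y = y, w = w, folds = folds)

best <- lambda_grid[which.min(errs)]
best                                             # grid value with the lowest CV error
range(lambda_grid)                               # check the optimum is not on the edge
```

If the selected value sits at either end of the grid, the grid should be widened before drawing conclusions. In glmnet, the analogous step would be passing a grid through the `lambda` argument of cv.glmnet.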

apophain · Post by **apophain** » Wed Oct 27, 2021 9:19 am

Well, lowering the lambda definitely leads to better-looking results. Instead of -2 to +2, I can get up to e.g. -5 to +5. But most values still seem to be off. I mentioned the Gobert example before: in an overall RAPM analysis he has the highest RAPM, but when analyzing offensive and defensive RAPM, he is barely positive for ORAPM and even slightly negative for DRAPM. Those are contradictory results that do not make sense. To me this suggests that either my data setup is somehow wrong or the code I sent yesterday has some flaws.

APBRmetrics
