## Reconstructing Box Plus/Minus

### Re: Reconstructing Box Plus/Minus

Crow wrote: ↑Fri May 17, 2019 1:19 am

These are not the classic 4 Factors, but it would be good to regularly offer these 4 components (or some other variation) for GmBPM2 and season-level BPM2. Maybe BRef could also offer BPM2 on its splits page. More detail, less of a case that it is a black box that is hard to interpret. If BPM2 components were divided like the 4 Factors, it would ease comparison to RPM. DRE and SPR could be useful with splits as well, preferably standardized.

What is the difference between contribution and points generated?

Contribution is the per 100 possessions GmBPM rating times the percentage of possessions actually played. Points created then translates that to the actual number of possessions played in this game, based on the pace of the game.

In rough terms and on average, what % of total credit is funneling to players via the team adjustment? Any consideration of doing the team adjustment as something more discriminating than the same adjustment for everybody (on court or off when things happen)?

I have not come up with any better way to split up the remaining team adjustment, which includes the intercept of the regression and any unassigned credit as well. It seems that some way could be developed where the post players could get more of the credit or debit for defense and perimeter players less, but I haven't figured out a good way mathematically.

Taking it another direction, it would be very informative and a better approach to actually do this on every stint during the game where the lineups remained the same. That way defense in particular could be better assigned to the players that actually were on the court. Then you could just sum it up at the end of the game to come up with a cumulative rating for that game.
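As a rough illustration of the definitions of contribution and points created given earlier in this post, here is a minimal Python sketch. It assumes that "percentage of possessions actually played" means the player's on-court possessions divided by the team's possessions in the game, and that points created is the contribution rescaled from 100 possessions to the game's actual possession count. The function names and the example numbers are hypothetical and are not taken from the GmBPM spreadsheet.

```python
def contribution(gm_bpm_per100: float, player_poss: float, team_poss: float) -> float:
    """Per-100 rating scaled by the share of team possessions the player was on court for."""
    if team_poss <= 0:
        raise ValueError("team_poss must be positive")
    return gm_bpm_per100 * (player_poss / team_poss)


def points_created(contrib: float, team_poss: float) -> float:
    """Rescale a per-100-possession contribution to the game's actual possession count."""
    return contrib * team_poss / 100.0


# Hypothetical game: a +6.0 GmBPM player on court for 72 of the team's 96 possessions.
c = contribution(6.0, player_poss=72, team_poss=96)
p = points_created(c, team_poss=96)
print(f"contribution = {c:.2f} per 100 team possessions, points created = {p:.2f}")
```

Under this reading, a faster-paced game increases points created for the same contribution, because the same per-100 rate is applied to more possessions.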

Developer of Box Plus/Minus

APBRmetrics Forum Administrator

GodismyJudgeOK.com/DStats/

Twitter.com/DSMok1

### Re: Reconstructing Box Plus/Minus

Adam H wrote: ↑Fri May 17, 2019 1:30 am

Did you use per 100 possessions or per game data to determine the coefficients? I have not thought about this, nor put in anywhere near the amount of work you have, but it appears to me that per game stats more accurately reflect single-year RAPM than per 100 possessions stats or advanced stats (other than TS%). I do not know if that remains true for multi-year RAPM.

RAPM is denominated in terms of 100 possessions, and I have used per 100 possessions data for all of this GmBPM regression. In theory, that should match up as well as anything, I would think.

### Re: Reconstructing Box Plus/Minus

DSMok1 wrote: ↑Thu May 16, 2019 3:20 pm

Interesting GmBPM (that's the linear BPM) evaluation:

What if we let the points from 3 pointers, 2 pointers, and free throws each have a different value? Remember, this regression attributes value to the player AND ALL LEFTOVER VALUE TO THE REST OF THE TEAM. So--maybe some scoring shows more for the individual player, while other scoring is more generic (anybody could do it). Also, some scoring may be more valuable from a spacing perspective.

Does that make sense?

The results from the same GmBPM linear model:

https://docs.google.com/spreadsheets/d/ ... k841J7nXi8

Interestingly, points from free throws are worth the same as before, as are free throw attempts.

2 pointers have less value and 2 point attempts are less penalized. In other words, this regression is indicating 2 pointers don't matter as much to the player.

Conversely, 3 pointers matter considerably more! Made 3 pointers are worth more and missed 3 pointers are penalized more heavily!

This is very interesting to me. What would it look like if we had the data to split out at-rim from 2 point jumpers?

P.S. The rest of the coefficients in the GmBPM regression did not change significantly at all.

P.P.S. This helps elite shooters the most, and depresses the value of 2pt scorers and bad 3 point shooters. In other words, it really helps Stephen Curry.

Have you thought about adding a height variable to the 3 point part of the regression? I would imagine that 3 pointers would be more valuable offensively for big men while not as valuable for smaller players. It could add accuracy so that the proper players are credited for being great 3 point shooters.

The same could be done for rebounds. Big men who don't rebound are almost always bad defenders. Defensive rebounding could mean more for big men than it does for PGs. A Russell Westbrook defensive rebound is not as valuable as a defensive rebound from a center.

### Re: Reconstructing Box Plus/Minus

Well, for this linear GmBPM model, I won't be adding height or any other interaction terms.

However, for the full new box plus minus model, these things certainly need to be explored further. I am leaning towards using a position indicator rather than height, since I think that is more applicable to what is happening on the court. I also want this regression to be portable to non-NBA contexts that may not have height available as an input.

### Re: Reconstructing Box Plus/Minus

...maybe some scoring shows more for the individual player, while other scoring is more generic (anybody could do it). Also, some scoring may be more valuable from a spacing perspective...

In an earlier table, you gave these values:

3fg 3.11

3fga -0.82

2fg 1.25

2fga -0.48

ft 0.74

fta -0.21

Adding the value of each make to the cost of the attempt, we get these net values for a made shot:

Code: Select all

```
pts value %
3 2.29 0.76
2 0.77 0.39
1 0.53 0.53
```
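The net values in the table above can be reproduced directly from the six coefficients. This is a minimal Python sketch, assuming the final column is the net value divided by the nominal points of the shot (3, 2 or 1), rounded half up to two decimals. `COEFS` and `net_value_table` are names made up for this illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

# Coefficients quoted in the post: (value of a make, cost of an attempt).
COEFS = {
    3: (Decimal("3.11"), Decimal("-0.82")),  # 3fg, 3fga
    2: (Decimal("1.25"), Decimal("-0.48")),  # 2fg, 2fga
    1: (Decimal("0.74"), Decimal("-0.21")),  # ft, fta
}


def net_value_table(coefs):
    """Return (points, net value of a made shot, net value / points) per shot type."""
    cent = Decimal("0.01")
    rows = []
    for pts, (make, attempt) in coefs.items():
        net = make + attempt  # a make is also an attempt, so both terms apply
        share = (net / pts).quantize(cent, rounding=ROUND_HALF_UP)
        rows.append((pts, net, share))
    return rows


table = net_value_table(COEFS)
print("pts value %")
for pts, net, share in table:
    print(f"{pts} {net} {share}")
```

Decimal arithmetic is used only so that 0.385 rounds to 0.39 as in the table; with binary floats the last digit could go either way.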

Should more credit be given to the assist man than to those who have perhaps set a screen or spaced the floor? Or perhaps done nothing useful?

Is a missed shot just as bad when teams are shooting .640 or .460? Or if they are rebounding especially well?

On a team with few scorers and good rebounding, it seems a low shooting% is not as detrimental. I'm thinking Westbrook/Thunder, where you have a few guys that shoot high% but are not good shot creators; and they get a lot of OReb. Then a 46% shot is not so bad as with the Warriors who shoot 60%.

Different playoff series may require different parameters to successfully assign individual credit. A 60% shooter who doesn't rebound may be a hero in one environment and a pariah in another.

### Re: Reconstructing Box Plus/Minus

Mike G wrote: ↑Fri May 17, 2019 12:45 pm

In an earlier table, you gave these values: ...

Adding the value of each make to the cost of the attempt, we get these net values for a made shot: ...

The final column is the fraction of the points attributed to the individual scorer, if I have interpreted this correctly.

Should more credit be given to the assist man than to those who have perhaps set a screen or spaced the floor? Or perhaps done nothing useful?

Is a missed shot just as bad when teams are shooting .640 or .460? Or if they are rebounding especially well?

On a team with few scorers and good rebounding, it seems a low shooting% is not as detrimental. I'm thinking Westbrook/Thunder, where you have a few guys that shoot high% but are not good shot creators; and they get a lot of OReb. Then a 46% shot is not so bad as with the Warriors who shoot 60%.

Different playoff series may require different parameters to successfully assign individual credit. A 60% shooter who doesn't rebound may be a hero in one environment and a pariah in another.

Nicely done, Mike!

The logical thing is absolutely to extend this to assisted vs. unassisted, and also to split 2 pointers into at-rim vs. midrange.

That said, that data is outside of what I want to use for an historical BPM model, since it is only available since around 2000. Lots of directions for research!

Regarding team effects--at a seasonal level, BPM always judges shooting vs. the rest of the shooters on the team. That can't be done within a single game. It wouldn't be stable enough. Stability is an issue here.

### Re: Reconstructing Box Plus/Minus

... at a seasonal level, BPM always judges shooting vs. the rest of the shooters on the team. That can't be done within a single game. It wouldn't be stable enough.

I do it for playoff series. Since you have Team A vs Team B, league averages aren't even relevant as parameters. No additional "team adjustment" is required; team and player credits stabilize in unison.

Yeah, one game is sometimes bizarre, 2 is much better, and then the operation is pretty smooth.

The big adjustment is with rebounds; apparently there are team rebounds not assigned to any player, but just as important to the teams, as they result in a possession.

Team turnovers are an issue, as are assists granted liberally to one or both teams.

You could try estimating assisted % of points. Once you make these corrections, correlations improve.

### Re: Reconstructing Box Plus/Minus

An update on this "GmBPM2" Linear model effort:

I bootstrapped the standard errors of the regression coefficients shown above (the one with shooting from various locations split out) and found that the shooting terms were all highly correlated and had very large standard errors.

I decided to combine fg2a and fg3a, using fga instead. This removes a lot of the issue, without a penalty on the R^2.

The updated GmBPM2 linear model is below. The R^2 is 0.646.

Also included lower in that sheet is an interesting output from the bootstrapping procedure: a table of correlations between the coefficient estimates. It shows how the coefficients vary vs. one another across the bootstrap iterations of the regression. For example, if my coefficient for FGA goes up, my coefficients for FG2 and FG3 go down (which makes sense). More interesting is how some of the other coefficients relate to one another.
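For readers who want to try the same kind of diagnostic, here is a minimal NumPy sketch of bootstrapping regression coefficients and tabulating how the estimates move together. It is not the actual GmBPM2 procedure or data: the rows are synthetic, the "true" coefficients are arbitrary, and the function name `bootstrap_coefs` is made up for this illustration. It only shows the general technique of resampling rows with replacement and refitting ordinary least squares.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic rows (NOT real box-score data): makes, misses, and a noisy target.
n = 2000
fg3 = rng.poisson(2.0, n).astype(float)
fg2 = rng.poisson(4.0, n).astype(float)
misses = rng.poisson(6.0, n).astype(float)
fga = fg2 + fg3 + misses  # attempts are correlated with makes by construction
names = ["const", "fg3", "fg2", "fga"]
X = np.column_stack([np.ones(n), fg3, fg2, fga])
y = 3.0 * fg3 + 1.2 * fg2 - 0.7 * fga + rng.normal(0.0, 3.0, n)  # arbitrary coefficients


def bootstrap_coefs(X, y, n_boot=300, rng=rng):
    """Refit OLS on rows resampled with replacement; one coefficient vector per iteration."""
    out = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, len(y), len(y))
        out[b] = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
    return out


coefs = bootstrap_coefs(X, y)
std_err = coefs.std(axis=0, ddof=1)      # bootstrap standard error per coefficient
corr = np.corrcoef(coefs, rowvar=False)  # correlations between coefficient estimates

for name, se in zip(names, std_err):
    print(f"{name:>5s} bootstrap SE = {se:.3f}")
print(np.round(corr, 2))
```

In this toy setup the FGA estimate is negatively correlated with the FG2 and FG3 estimates, which is the same qualitative pattern described in the post.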

Here is a look at this updated GmBPM2 Linear model Top 50:

### Re: Reconstructing Box Plus/Minus

Still no sign of Westbrook '17 : BPM = 15.6

And Harden peaked in 2015 ?

### Re: Reconstructing Box Plus/Minus

I'm sorry, I should have clarified the dates. This is only 1997 through 2016, which is the 20 year sample I am working with.

### Re: Reconstructing Box Plus/Minus

Without going into too many details, I think the shooting changes are a pretty good idea, as I have usually found 3pt spacers to be the most underrated archetype (though BPM handled them better than most box-score stats).

### Re: Reconstructing Box Plus/Minus

DSMok1 wrote: ↑Fri Apr 12, 2019 4:55 pm

- Box score stats only (i.e. anything that can be calculated from the stats we have from the 80s.)
- No PbP stats, not even things like "assisted by" ratios.
- Nothing super complex that can't be done by someone with Excel and a good knowledge of math.
- Focus on Explanation, not Prediction. What happens should be credited to the team. No luck adjustment. (A good explanatory stat can be converted to a predictive stat with appropriate regression to the mean.)

How are you getting on?

Not sure what your 4th goal actually means, but if I understand correctly, you are trying to make a new "feature" (or call it a metric) that has similar performance to the old Box +/- but without its shortcomings (as you mention them), right?

### Re: Reconstructing Box Plus/Minus

vzografos wrote: ↑Thu Jun 13, 2019 3:21 pm

How are you getting on?

Not sure what your 4th goal actually means, but if I understand correctly, you are trying to make a new "feature" (or call it a metric) that has similar performance to the old Box +/- but without its shortcomings (as you mention them), right?

Gradually working on this project!

Yes, the idea is to maintain the existing general concept of BPM (i.e. historic applicability, general structure) and significantly improve the handling of outliers.

Thus far I have focused on the linear version of BPM, currently called GmBPM. It should be very stable and should handle outlier numbers very well.

Then I intend to build upon that framework and add nonlinear terms as appropriate to help handle nuances while hopefully not destroying applicability to outlier values (like the existing BPM did).

### Re: Reconstructing Box Plus/Minus

Recently, I evaluated another angle to the linear GmBPM regression.

I allowed a different intercept/constant for each position, just to see what would happen to the coefficients.

Interestingly, only one coefficient changed much. The AST coefficient jumped from 0.43 to 0.60. The coefficients for all of the other terms stayed almost exactly the same.

The constants for the positions were PG = 0 (baseline), SG = 1.06, SF = 1.62, PF = 1.70, C = 1.56.

What are thoughts as to the reasons behind this behavior? I've got a few ideas I am in the process of exploring.
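One mechanism that can produce this kind of shift is sketched below in Python. It is an assumption for illustration, not a finding about the real data. If point guards collect more assists and also carry the lowest position constant, a single shared intercept forces the AST slope to absorb part of the position gap. The data are synthetic and noise-free, built from the constants quoted above. The overall intercept of -2.0, the assist ranges, and the helper name `design_matrix` are all made up.

```python
import numpy as np

POSITIONS = ["PG", "SG", "SF", "PF", "C"]  # PG is the baseline (constant 0)
OFFSETS = {"PG": 0.0, "SG": 1.06, "SF": 1.62, "PF": 1.70, "C": 1.56}


def design_matrix(ast, pos):
    """Columns: intercept, AST, then one 0/1 indicator per non-baseline position."""
    pos = np.asarray(pos)
    dummies = [(pos == p).astype(float) for p in POSITIONS[1:]]
    return np.column_stack([np.ones(len(pos)), np.asarray(ast, dtype=float), *dummies])


rng = np.random.default_rng(1)
n = 2000
pos = rng.choice(POSITIONS, size=n)
# Assumption: point guards have higher assist rates than every other position.
ast = np.where(pos == "PG", rng.uniform(4.0, 12.0, n), rng.uniform(0.0, 6.0, n))
y = -2.0 + 0.60 * ast + np.array([OFFSETS[p] for p in pos])

X_full = design_matrix(ast, pos)
beta_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)           # per-position constants
beta_pooled, *_ = np.linalg.lstsq(X_full[:, :2], y, rcond=None)  # one shared constant

print(f"AST coefficient with position constants: {beta_full[1]:.2f}")
print(f"AST coefficient with a single constant:  {beta_pooled[1]:.2f}")
```

With position constants the fit recovers the 0.60 slope used to generate the data, while the pooled fit returns a smaller AST coefficient. Whether this is what drives the change in the real regression would need to be checked against the actual sample.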
