
Re: Reconstructing Box Plus/Minus

Posted: Wed May 06, 2020 4:24 pm
by DSMok1
As a note on that: extraordinary value in a given game almost always comes from great shooting performances. Someone shooting 11/11 from the field is a huge value add in a given game. Things like rebounds cannot drive a +25 game.

Re: Reconstructing Box Plus/Minus

Posted: Wed May 06, 2020 10:16 pm
by Mike G
Funny you should say that.
I downloaded the box score numbers for all 179 of Michael Jordan's playoff games and came up with a formula to estimate Game Score (GmSc) from BPM and minutes. Call it faux game score (fgs).
fgs = Min/48 * (BPM+12.5) * 1.224

This yields an average absolute error |GmSc - fgs| of 3.2. The formula is specific to this player.
Sample predicted GmSc by BPM, assuming 40 minutes played:


BPM    fgs
-5     7.7
+0    12.8
+5    17.9
10    23.0
15    28.1
20    33.2
25    38.3
30    43.4
This pretty much covers the range of Jordan's BPM and GmSc.
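The formula is easy to check directly. This minimal sketch (the function name is ours) reproduces the 40-minute table; like the formula itself, it only applies to Jordan's playoff games. Printed values may differ from the table in the last digit where a result lands exactly on a rounding boundary.

```python
def faux_game_score(minutes: float, bpm: float) -> float:
    """Mike G's Jordan-specific estimate: fgs = Min/48 * (BPM + 12.5) * 1.224."""
    return minutes / 48 * (bpm + 12.5) * 1.224


# Reproduce the 40-minute table above.
for bpm in (-5, 0, 5, 10, 15, 20, 25, 30):
    print(f"{bpm:+3d}  {faux_game_score(40, bpm):5.1f}")
```

At 40 minutes the formula reduces to 1.02 * (BPM + 12.5), which is why each +5 BPM adds 5.1 to fgs.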

Here are the games in which his BPM implies a much lower GmSc than the one he actually shows at b-r.com:


yr  Rd  G#   Opp  Rb  As  St  Bk  TO  Pts   TS%   GmSc    BPM   fgs
89   2   6   NYK   5  10   0   4   6   40  .733   34.3    8.0   23.5
93   4   4   Phx   8   4   0   0   1   55  .612   38.9   11.5   28.2
97   1   2   Was   7   2   2   0   2   55  .698   42.8   16.1   32.1
89   2   4   NYK  11   6   1   2   6   47  .775   39.9   15.2   29.7
92   1   2   Mia  13   6   2   0   4   33  .602   27.8    8.2   18.5

92   1   3   Mia   5   5   4   2   2   56  .738   49.8   25.0   41.1
92   4   5   Por   5   4   0   1   4   46  .733   33.3   10.8   25.0
92   2   7   NYK   6   4   2   3   5   42  .605   31.0    8.7   22.7
90   1   3   Mil   9   5   1   2   5   48  .616   33.5    9.1   25.3
90   2   3   Phl   5   5   4   1   5   49  .645   35.9   14.2   28.6
So the 'funny' thing is that all these games featured stellar shooting by MJ, even by His playoff standards. And they are the games with the largest disparity between GmSc and the BPM-based estimate (GmSc >> fgs).

At the other end, where BPM implies a much better GmSc, we have the opposite effect.


averages:
Reb   Ast   Stl   Blk   TO    Pts    TS%
7.4   5.1   1.6   1.5   4.0   47.1  .676  GmSc >> fgs
6.4   5.7   2.1   0.9   3.1   33.4  .568 <-- all playoffs
7.0   4.8   2.3   0.4   2.4   29.2  .510  GmSc << fgs
The bottom line is the average of the 10 games where fgs exceeds GmSc by at least 6.7.
Avg BPM for those 10 bad-shooting games is 12.9; for the 10 games in the top line it's 12.7.
Was the competition that much different?
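The top line of the averages table can be checked against the ten games listed earlier (the bottom line's individual games aren't listed, so it can't be checked the same way). A short sketch using only the numbers from the table:

```python
from statistics import mean

COLUMNS = ("Reb", "Ast", "Stl", "Blk", "TO", "Pts", "TS%", "GmSc", "BPM", "fgs")
# The ten GmSc >> fgs games from the table above.
GAMES = [
    (5, 10, 0, 4, 6, 40, .733, 34.3, 8.0, 23.5),   # 89 R2 G6 NYK
    (8, 4, 0, 0, 1, 55, .612, 38.9, 11.5, 28.2),   # 93 R4 G4 Phx
    (7, 2, 2, 0, 2, 55, .698, 42.8, 16.1, 32.1),   # 97 R1 G2 Was
    (11, 6, 1, 2, 6, 47, .775, 39.9, 15.2, 29.7),  # 89 R2 G4 NYK
    (13, 6, 2, 0, 4, 33, .602, 27.8, 8.2, 18.5),   # 92 R1 G2 Mia
    (5, 5, 4, 2, 2, 56, .738, 49.8, 25.0, 41.1),   # 92 R1 G3 Mia
    (5, 4, 0, 1, 4, 46, .733, 33.3, 10.8, 25.0),   # 92 R4 G5 Por
    (6, 4, 2, 3, 5, 42, .605, 31.0, 8.7, 22.7),    # 92 R2 G7 NYK
    (9, 5, 1, 2, 5, 48, .616, 33.5, 9.1, 25.3),    # 90 R1 G3 Mil
    (5, 5, 4, 1, 5, 49, .645, 35.9, 14.2, 28.6),   # 90 R2 G3 Phl
]

# Transpose rows into columns and average each one.
averages = {name: mean(col) for name, col in zip(COLUMNS, zip(*GAMES))}
gaps = [g[7] - g[9] for g in GAMES]  # GmSc - fgs for each game

print({name: round(value, 3) for name, value in averages.items()})
print("smallest GmSc - fgs gap:", round(min(gaps), 1))
```

The per-stat means match the top line (7.4 Reb, 47.1 Pts, .676 TS%), the mean BPM rounds to 12.7, and every listed game has a GmSc - fgs gap of at least 7.3.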

Re: Reconstructing Box Plus/Minus

Posted: Tue May 12, 2020 5:54 am
by nbacouchside
I created a way to estimate BPM 2.0 for pre-1974 seasons using three centered inputs: height in inches - 79 (average height), PER - 15, and WS/48 - .100.

https://twitter.com/NBAcouchside/status ... 51682?s=20

r^2 = .85
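The fitted coefficients aren't given in the post, so the sketch below only shows the structure of such a model: the three centered inputs plus an intercept, fit by ordinary least squares. OLS and all function names are assumptions for illustration, not necessarily how the original fit was done.

```python
import numpy as np

# Centering constants from the post: average height 79 in, PER 15, WS/48 .100.
HEIGHT_MEAN, PER_MEAN, WS48_MEAN = 79.0, 15.0, 0.100


def centered_features(height_in, per, ws48):
    """Intercept column plus the three centered inputs, one row per player-season."""
    height_in, per, ws48 = (np.atleast_1d(np.asarray(a, dtype=float)) for a in (height_in, per, ws48))
    return np.column_stack([np.ones_like(per), height_in - HEIGHT_MEAN,
                            per - PER_MEAN, ws48 - WS48_MEAN])


def fit_bpm_model(height_in, per, ws48, bpm):
    """Ordinary least squares fit; returns (coefficients, r_squared)."""
    X = centered_features(height_in, per, ws48)
    y = np.asarray(bpm, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return coef, r2


def predict_bpm(coef, height_in, per, ws48):
    """Apply fitted coefficients to new (e.g. pre-1974) player-seasons."""
    return centered_features(height_in, per, ws48) @ coef
```

With centered inputs, the intercept is simply the predicted BPM of a 79-inch player with a 15 PER and .100 WS/48.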

Re: Reconstructing Box Plus/Minus

Posted: Fri May 29, 2020 2:00 pm
by dtkavana
Is there a timeline for when BPM 2.0 will be added to basketball-reference for pre-1974 players, and will there be BPM 2.0 for these players in the playoffs?


Thanks

Re: Reconstructing Box Plus/Minus

Posted: Fri Dec 11, 2020 8:24 pm
by DSMok1
I realized the following information about BPM 2.0, aging curves, and projections was never published.

First, the aging curve. It was developed to account for survivorship bias: players who get to play another season are usually luckier than players who do not. I found that effect amounted to a gap of around 0.38 pts/100.

Here is the aging curve, using a fitted cubic equation for the deltas. The fit was quite good overall.


╔══════╦══════╦═════════════════╗
║ Age1 ║ Age2 ║ Estimated Delta ║
╠══════╬══════╬═════════════════╣
║   18 ║   19 ║ 2.07            ║
║   19 ║   20 ║ 1.69            ║
║   20 ║   21 ║ 1.35            ║
║   21 ║   22 ║ 1.06            ║
║   22 ║   23 ║ 0.80            ║
║   23 ║   24 ║ 0.57            ║
║   24 ║   25 ║ 0.38            ║
║   25 ║   26 ║ 0.22            ║
║   26 ║   27 ║ 0.08            ║
║   27 ║   28 ║ -0.04           ║
║   28 ║   29 ║ -0.13           ║
║   29 ║   30 ║ -0.20           ║
║   30 ║   31 ║ -0.27           ║
║   31 ║   32 ║ -0.31           ║
║   32 ║   33 ║ -0.35           ║
║   33 ║   34 ║ -0.38           ║
║   34 ║   35 ║ -0.41           ║
║   35 ║   36 ║ -0.44           ║
║   36 ║   37 ║ -0.46           ║
║   37 ║   38 ║ -0.50           ║
║   38 ║   39 ║ -0.53           ║
║   39 ║   40 ║ -0.58           ║
║   40 ║   41 ║ -0.64           ║
║   41 ║   42 ║ -0.72           ║
║   42 ║   43 ║ -0.81           ║
║   43 ║   44 ║ -0.93           ║
║   44 ║   45 ║ -1.07           ║
╚══════╩══════╩═════════════════╝
Obviously, take the deltas for ages younger than 20 and older than 35 with a larger grain of salt because of small sample sizes.
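The cubic's coefficients aren't published in the post, but refitting a cubic in Age1 to the tabulated deltas should recover it to within rounding. The sketch below does that with NumPy and also chains the one-year deltas into a multi-year adjustment; the chaining helper and its name are our own illustration.

```python
import numpy as np

AGE1 = np.arange(18, 45)  # Age1 column of the table: 18 ... 44
DELTA = np.array([
    2.07, 1.69, 1.35, 1.06, 0.80, 0.57, 0.38, 0.22, 0.08,
    -0.04, -0.13, -0.20, -0.27, -0.31, -0.35, -0.38, -0.41, -0.44,
    -0.46, -0.50, -0.53, -0.58, -0.64, -0.72, -0.81, -0.93, -1.07,
])

# Least-squares cubic in Age1 fitted to the tabulated one-year deltas.
cubic = np.poly1d(np.polyfit(AGE1, DELTA, 3))


def aging_adjustment(from_age: int, to_age: int) -> float:
    """Expected cumulative BPM change from from_age to to_age, summing table deltas."""
    if not 18 <= from_age <= to_age <= 45:
        raise ValueError("expected 18 <= from_age <= to_age <= 45")
    return float(DELTA[from_age - 18 : to_age - 18].sum())


print(f"max |table - cubic|: {np.max(np.abs(DELTA - cubic(AGE1))):.3f}")
print(f"expected change from 22 to 27: {aging_adjustment(22, 27):+.2f}")
```

For example, a 22-year-old would be expected to gain about 2.05 points of BPM by age 27 (0.80 + 0.57 + 0.38 + 0.22 + 0.08).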

Then, for projections.

I found that an exponential decay of around 0.5 for weighting past performance (after adjusting for aging) worked best. In other words, current year Y = 1.0, last year Y-1 = 0.5, Y-2 = 0.25, and so on.

For priors, to add some regression to the mean, I used two, neither of which was very heavily weighted.

First, all players had a -2.8 prior weighted at 165 minutes, except rookies, whose prior was -3.6.

Second, I used a minutes/team-rating prior to add a bit more information. This prior was not included if the player did not have recent results. It looked at either the current year or the previous year, whichever was more recent and had data.

The equation for this second prior looked like: MPG*0.22 + TmRtg*0.10 + MPG*TmRtg*0.005 - 5.6. This prior, if it existed, was weighted at 130 minutes.

Combining all of these components should yield a decent BPM projection, and the same approach should work reasonably well for other similar stats.
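Putting the pieces together, here is one plausible sketch of the projection. The post gives the decay, the aging deltas, and both priors with their minute weights, but not how seasons are combined with the priors; the code ASSUMES each past season is weighted by minutes * 0.5^k so it can be blended with the minute-denominated priors, and that the rookie prior carries the same 165 minutes. All names are hypothetical.

```python
# One-year BPM aging deltas from the table above: DELTAS[0] is 18->19, DELTAS[26] is 44->45.
DELTAS = [
    2.07, 1.69, 1.35, 1.06, 0.80, 0.57, 0.38, 0.22, 0.08,
    -0.04, -0.13, -0.20, -0.27, -0.31, -0.35, -0.38, -0.41, -0.44,
    -0.46, -0.50, -0.53, -0.58, -0.64, -0.72, -0.81, -0.93, -1.07,
]
DECAY = 0.5                      # Y = 1.0, Y-1 = 0.5, Y-2 = 0.25, ...
BASE_PRIOR, ROOKIE_PRIOR = -2.8, -3.6
BASE_PRIOR_MIN = 165.0           # first prior is worth 165 minutes
TEAM_PRIOR_MIN = 130.0           # second prior is worth 130 minutes


def age_adjust(bpm: float, season_age: int, target_age: int) -> float:
    """Carry a season's BPM forward to target_age by adding the one-year deltas."""
    if not 18 <= season_age <= target_age <= 45:
        raise ValueError("expected 18 <= season_age <= target_age <= 45")
    return bpm + sum(DELTAS[a - 18] for a in range(season_age, target_age))


def minutes_team_prior(mpg: float, team_rating: float) -> float:
    """Second prior: MPG*0.22 + TmRtg*0.10 + MPG*TmRtg*0.005 - 5.6."""
    return mpg * 0.22 + team_rating * 0.10 + mpg * team_rating * 0.005 - 5.6


def project_bpm(seasons, target_age, rookie=False, mpg=None, team_rating=None):
    """
    seasons: (bpm, minutes, age) tuples, most recent first (Y, Y-1, Y-2, ...).
    mpg, team_rating: from the most recent season with data; None omits the second prior.
    ASSUMPTION: season k back is weighted by minutes * DECAY**k.
    """
    num = den = 0.0
    for k, (bpm, minutes, age) in enumerate(seasons):
        w = minutes * DECAY ** k
        num += w * age_adjust(bpm, age, target_age)
        den += w
    prior = ROOKIE_PRIOR if rookie else BASE_PRIOR
    num += BASE_PRIOR_MIN * prior
    den += BASE_PRIOR_MIN
    if mpg is not None and team_rating is not None:
        num += TEAM_PRIOR_MIN * minutes_team_prior(mpg, team_rating)
        den += TEAM_PRIOR_MIN
    return num / den
```

For example, a 27-year-old coming off a +3.0 season in 2000 minutes, at 30 MPG on an average (0.0) team, projects to (2000 * 2.96 - 165 * 2.8 + 130 * 1.0) / 2295, or about +2.43 at age 28. A rookie with no data falls back to the -3.6 prior.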

Re: Reconstructing Box Plus/Minus

Posted: Wed Dec 16, 2020 1:11 am
by colts18
Awesome work, DSMok. I've spent quite a bit of time exploring the BPM leaderboards to get a better understanding of the stat. I can say for certain that BPM is the best box score stat. Almost all of the players that BPM loves are good players. More importantly, BPM seems to identify bad players that other stats aren't able to pick up (ahem, Enes Kanter). This version of BPM is also better than BPM 1.0, which had a weakness with PGs.

There are a few things I would like to suggest incorporating into BPM 3.0 to improve its accuracy.

1. Breaking down BPM on a game level has to be the biggest innovation of BPM 2.0. Players who missed a ton of games in which their teams sucked are no longer punished for their teammates sucking. I would like for this to go even deeper. Why not adjust for opponents' FT shooting as a luck adjustment? In the 2001 season, Shaq had a 13-13 FT shooting game against the Nuggets. :shock: That same season he had an 0-11 FT shooting game against the Sonics. The Sonics defenders are getting extra credit for playing awesome defense because Shaq sucked that night, while the Nuggets get their defensive stats destroyed because Shaq was lucky. Both teams played the same defense, but one of them got luckier than the other. The same can be done for Shaq's Lakers teammates, who shouldn't get their offensive stats destroyed in the team adjustment portion because Shaq sucked.

2. How is the average lead calculated in BPM? I know we have quarter-by-quarter box scores going back to the '80s. You can get a quality estimate of the average lead if you know a team was up by 15 after the 1st quarter and then cruised to a 20-point lead.

3. Have you thought about incorporating something similar to J.E.'s Fake RAPM in the BPM stats that have quarter-by-quarter box scores? http://apbr.org/metrics/viewtopic.php?f ... 7&start=15 The team adjustment section could be improved if you know that a team, say the Lakers in the Finals, went up by 35 points against the Heat but ended up winning by only 12 because of meaningless garbage-time baskets in the 4th quarter. You could give the starters more credit than the backups if you knew what the average lead was throughout the game.

4. I have a radical idea for team adjustments. Instead of splitting the credit 100% evenly among all players in the team adjustment, why not use the Four Factors and pace calculations and experiment with the proper adjustment? For example, if you know the team won because its defensive rebounding was high and its pace was low, maybe the big men get more credit. If it won because of low turnovers and high eFG%, the little guys get more credit.

5. I'm not a fan of the defensive BPM. The results are too compressed. The highest Defensive RAPM for the 2012-2016 seasons was KG's 5.4. The highest DBPM is Bogut's 3.1. 10 different players have higher Defensive RAPM scores than the highest Defensive BPM score. This is how the top 10 in Defensive RAPM from 2012-2016 did in DBPM:

Player             D RAPM   DBPM
Kevin Garnett         5.4    2.0
Andre Iguodala        4.6    1.8
Draymond Green        4.4    2.8
Paul George           4.0    2.0
Eric Bledsoe          3.9    1.0
Tony Allen            3.7    2.2
Thabo Sefolosha       3.7    1.8
LeBron James          3.3    1.8
Chris Paul            3.2    2.2
Danny Green           3.1    1.8

They all had lower DBPMs; on average, their DBPM was 2 points/100 lower than their Defensive RAPM. Defenders should be getting more credit. As it stands, the split between top offensive players and top defensive players is too large.

Top 10 in OBPM / DBPM, 2012-2016:
OBPM: 5.81
DBPM: 2.43
Ratio 2.39, i.e. the top defenders rate at 42% of the top offensive players

Top 10 in ORAPM / DRAPM, 2012-2016:
ORAPM: 6.41
DRAPM: 3.93
Ratio 1.63, i.e. the top defenders rate at 61% of the top offensive players

Those BPM splits are massive. Offensive players seem to get rated properly by the stat, but defenders are getting punished.

6. To go along with point #5, defensive big men are getting short shrift in DBPM. I see too many small guards high up on the leaderboard. Nate McMillan's career DBPM is HIGHER than Patrick Ewing's CAREER HIGH. This is the same Ewing who played on some of the best defenses in history. Chris Paul has 5 seasons better than Alonzo Mourning's best season, and Mourning won 2 DPOYs. 4-time DPOY Dikembe Mutombo is behind Jon Koncak in the ratings. :lol: Michael Jordan has as many top-10 defensive seasons as Ben Wallace and Kevin Garnett. The stat struggles to handle historical players from before the 2000s. The best defensive players are big men, and the stat should reflect that.
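The arithmetic in point 5 checks out. A quick sketch using the table and the quoted top-10 averages (the helper name is ours):

```python
from statistics import mean

# Top 10 in Defensive RAPM, 2012-2016, and their DBPM (table in point 5).
D_RAPM = [5.4, 4.6, 4.4, 4.0, 3.9, 3.7, 3.7, 3.3, 3.2, 3.1]
DBPM = [2.0, 1.8, 2.8, 2.0, 1.0, 2.2, 1.8, 1.8, 2.2, 1.8]
gap = mean(D_RAPM) - mean(DBPM)
print(f"mean D RAPM {mean(D_RAPM):.2f}, mean DBPM {mean(DBPM):.2f}, gap {gap:.2f}")


def offense_defense_split(top_offense: float, top_defense: float) -> tuple[float, float]:
    """Return (offense/defense ratio, defense as a fraction of offense)."""
    return top_offense / top_defense, top_defense / top_offense


for label, off, dfn in (("BPM", 5.81, 2.43), ("RAPM", 6.41, 3.93)):
    ratio, share = offense_defense_split(off, dfn)
    print(f"{label}: ratio {ratio:.2f}, defense at {share:.0%} of offense")
```

The table's mean D RAPM (3.93) is the same figure quoted as the DRAPM top-10 average, and the mean DBPM (1.94) sits 1.99 points/100 below it.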

Re: Reconstructing Box Plus/Minus

Posted: Wed Dec 16, 2020 6:51 pm
by DSMok1
colts18 wrote: Wed Dec 16, 2020 1:11 am There are a few things I would like to suggest incorporating into BPM 3.0 to improve its accuracy.

1. Breaking down BPM on a game level has to be the biggest innovation of BPM 2.0. Players who missed a ton of games in which their teams sucked are no longer punished for their teammates sucking. I would like for this to go even deeper. Why not adjust for opponents' FT shooting as a luck adjustment? In the 2001 season, Shaq had a 13-13 FT shooting game against the Nuggets. :shock: That same season he had an 0-11 FT shooting game against the Sonics. The Sonics defenders are getting extra credit for playing awesome defense because Shaq sucked that night, while the Nuggets get their defensive stats destroyed because Shaq was lucky. Both teams played the same defense, but one of them got luckier than the other. The same can be done for Shaq's Lakers teammates, who shouldn't get their offensive stats destroyed in the team adjustment portion because Shaq sucked.
That certainly could be an interesting and effective upgrade. I have generally avoided using any opponent statistics... would this be worthwhile? What about 3Pt% luck? Paging @bbstats...
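As a sketch of colts18's free-throw idea (point 1), one could replace each opposing shooter's free throws made with the number his season FT% predicts, and credit or debit the defense with the difference. The function names, and the use of season FT% as the expected rate, are assumptions for illustration; the check below uses a hypothetical 50% shooter, not Shaq's actual 2000-01 percentage.

```python
def ft_luck(ftm: int, fta: int, season_ft_pct: float) -> float:
    """Free throws made above (+) or below (-) what the shooter's season FT% predicts."""
    return ftm - fta * season_ft_pct


def luck_adjusted_points_allowed(points_allowed: float, opponent_ft_lines) -> float:
    """
    Remove opponents' free-throw luck from a team's points allowed in one game.
    opponent_ft_lines: iterable of (ftm, fta, season_ft_pct), one per opposing shooter.
    """
    return points_allowed - sum(ft_luck(m, a, pct) for m, a, pct in opponent_ft_lines)
```

For a 50% shooter, a 13-13 night is +6.5 free throws of luck and an 0-11 night is -5.5, so the first defense would have 6.5 points taken off its points allowed and the second would have 5.5 added back.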
colts18 wrote: Wed Dec 16, 2020 1:11 am 2. How is the average lead calculated in BPM? I know we have quarter-by-quarter box scores going back to the '80s. You can get a quality estimate of the average lead if you know a team was up by 15 after the 1st quarter and then cruised to a 20-point lead.
In general, we have just used final margin/2. Using quarter by quarter box scores would be a fairly simple modification. I suspect that the impact would be so small over the course of a season that it's probably not worth chasing.
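Final margin/2 is exactly what you get if the margin grows linearly from 0 to the final margin. With quarter-end margins, a trapezoid estimate is a natural generalization. This minimal sketch assumes a tied start, equal-length quarters, and linear movement within each quarter, and it ignores overtime; the function name is hypothetical.

```python
def average_lead_from_quarters(quarter_end_margins):
    """
    Estimate a team's average lead over regulation from its margin at the end of
    each quarter, using the trapezoid rule on equal-length quarters.
    """
    if not quarter_end_margins:
        raise ValueError("need at least one quarter-end margin")
    points = [0.0, *quarter_end_margins]  # the game starts tied
    segment_means = [(a + b) / 2 for a, b in zip(points, points[1:])]
    return sum(segment_means) / len(segment_means)
```

Extending colts18's example with made-up margins (up 15 after one, 18 at the half, 20 after three, wins by 20), the estimate is 15.75, versus 10 from final margin/2. For a steady 5-10-15-20 progression, it returns exactly 10.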
colts18 wrote: Wed Dec 16, 2020 1:11 am 3. Have you thought about incorporating something similar to J.E.'s Fake RAPM in the BPM stats that have quarter-by-quarter box scores? http://apbr.org/metrics/viewtopic.php?f ... 7&start=15 The team adjustment section could be improved if you know that a team, say the Lakers in the Finals, went up by 35 points against the Heat but ended up winning by only 12 because of meaningless garbage-time baskets in the 4th quarter. You could give the starters more credit than the backups if you knew what the average lead was throughout the game.
I agree that could be done, but it would be outside of the scope of BPM. By design, BPM should be able to be calculated without any PbP data. That's not going to change. I want this to be feasible to calculate for the NBA, for the NCAA, for high schools...any level with a box score.
colts18 wrote: Wed Dec 16, 2020 1:11 am 4. I have a radical idea for team adjustments. Instead of splitting the credit 100% evenly among all players in the team adjustment, why not use the Four Factors and pace calculations and experiment with the proper adjustment? For example, if you know the team won because its defensive rebounding was high and its pace was low, maybe the big men get more credit. If it won because of low turnovers and high eFG%, the little guys get more credit.
OK, here we get into a complicated area.

If we want the calculation to be feasible for any level of competition, we have to be very, very careful about how the team adjustments are done. It's certainly an avenue to explore. Looking at the statistics you're talking about is interesting; I had not thought of that.
colts18 wrote: Wed Dec 16, 2020 1:11 am 5. I'm not a fan of the defensive BPM. The results are too compressed. The highest Defensive RAPM for the 2012-2016 seasons was KG's 5.4. The highest DBPM is Bogut's 3.1. 10 different players have higher Defensive RAPM scores than the highest Defensive BPM score. This is how the top 10 in Defensive RAPM from 2012-2016 did in DBPM:

Player             D RAPM   DBPM
Kevin Garnett         5.4    2.0
Andre Iguodala        4.6    1.8
Draymond Green        4.4    2.8
Paul George           4.0    2.0
Eric Bledsoe          3.9    1.0
Tony Allen            3.7    2.2
Thabo Sefolosha       3.7    1.8
LeBron James          3.3    1.8
Chris Paul            3.2    2.2
Danny Green           3.1    1.8

They all had lower DBPMs; on average, their DBPM was 2 points/100 lower than their Defensive RAPM. Defenders should be getting more credit. As it stands, the split between top offensive players and top defensive players is too large.

Top 10 in OBPM / DBPM, 2012-2016:
OBPM: 5.81
DBPM: 2.43
Ratio 2.39, i.e. the top defenders rate at 42% of the top offensive players

Top 10 in ORAPM / DRAPM, 2012-2016:
ORAPM: 6.41
DRAPM: 3.93
Ratio 1.63, i.e. the top defenders rate at 61% of the top offensive players

Those BPM splits are massive. Offensive players seem to get rated properly by the stat, but defenders are getting punished.
I 100% agree with you. Defensive BPM isn't very good.

I have not come up with any good way to fix it. The box score is just so limited. The obvious approach would be to somehow "focus credit" on the big men, particularly big men who played more time. But that gets very, very sticky when trying to generalize to the NCAA or other contexts. How do you even say what a good defensive performance was? Compared to what?

I would love to hear any ideas that could also generalize to other contexts.
colts18 wrote: Wed Dec 16, 2020 1:11 am 6. To go along with point #5, defensive big men are getting short shrift in DBPM. I see too many small guards high up on the leaderboard. Nate McMillan's career DBPM is HIGHER than Patrick Ewing's CAREER HIGH. This is the same Ewing who played on some of the best defenses in history. Chris Paul has 5 seasons better than Alonzo Mourning's best season, and Mourning won 2 DPOYs. 4-time DPOY Dikembe Mutombo is behind Jon Koncak in the ratings. :lol: Michael Jordan has as many top-10 defensive seasons as Ben Wallace and Kevin Garnett. The stat struggles to handle historical players from before the 2000s. The best defensive players are big men, and the stat should reflect that.
Agreed. Here's the deal--big men drive the defense, to a large degree. Big men should have a wider spread than guards. A bad defensive big man hurts the team more than a bad defensive guard, and a good defensive big man helps the team more than a good defensive guard.

But I cannot, for the life of me, figure out how to construct BPM to handle that in a way that is generalizable to non-NBA situations. That is the one big thing I would like to solve for a BPM 3.0.

Re: Reconstructing Box Plus/Minus

Posted: Fri Dec 18, 2020 5:12 pm
by colts18
DSMok1 wrote: Wed Dec 16, 2020 6:51 pm That certainly could be an interesting and effective upgrade. I have generally avoided using any opponent statistics... would this be worthwhile? What about 3Pt% luck? Paging @bbstats...
Three-point luck is trickier territory. Adjusting for free throws is easy and intuitive. Maybe you could regress 3-point shooting 25% toward the mean to account for 3-point luck.
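The 25% regression colts18 suggests can be sketched as pulling an observed 3P% a quarter of the way toward a league mean and treating the remaining makes above or below that as luck. The league mean is a caller-supplied assumption, and a flat 25% ignores sample size; a weight that shrinks with attempts would be a natural refinement.

```python
def regressed_3p_pct(made: int, attempts: int, league_pct: float, weight: float = 0.25) -> float:
    """Pull an observed 3P% `weight` of the way toward the league mean."""
    if attempts == 0:
        return league_pct
    observed = made / attempts
    return observed + weight * (league_pct - observed)


def three_point_luck(made: int, attempts: int, league_pct: float, weight: float = 0.25) -> float:
    """Threes made above (+) or below (-) the regressed expectation."""
    return made - attempts * regressed_3p_pct(made, attempts, league_pct, weight)
```

For example, a team that goes 10-for-20 against a hypothetical 35% league mean regresses to 46.25%, so about 0.75 of its 10 threes would be treated as luck.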




DSMok1 wrote: Wed Dec 16, 2020 6:51 pm I agree that could be done, but it would be outside of the scope of BPM. By design, BPM should be able to be calculated without any PbP data. That's not going to change. I want this to be feasible to calculate for the NBA, for the NCAA, for high schools...any level with a box score.
Why does BPM have to be exactly the same for the NBA and college? The games are different, so the stat can be slightly different too. I mean, BPM for 1974-1984 is different from 1985-present BPM. Since the 1985-present BPM is already different, I don't see why it can't be changed slightly to reflect the extra box score data we have for those years.

DSMok1 wrote: Wed Dec 16, 2020 6:51 pm OK, here we get into a complicated area.

If we want the calculation to be feasible for any level of competition, we have to be very, very careful about how the team adjustments are done. It's certainly an avenue to explore. Looking at the statistics you're talking about is interesting; I had not thought of that.
Once again, BPM can be slightly different for the NBA than for other leagues. And I'm sure you can get a good estimate of the Four Factors if you have box score data for NCAA games.
DSMok1 wrote: Wed Dec 16, 2020 6:51 pm I 100% agree with you. Defensive BPM isn't very good.

I have not come up with any good way to fix it. The box score is just so limited. The obvious approach would be to somehow "focus credit" on the big men, particularly big men who played more time. But that gets very, very sticky when trying to generalize to the NCAA or other contexts. How do you even say what a good defensive performance was? Compared to what?

I would love to hear any ideas that could also generalize to other contexts.

DSMok1 wrote: Wed Dec 16, 2020 6:51 pm Agreed. Here's the deal--big men drive the defense, to a large degree. Big men should have a wider spread than guards. A bad defensive big man hurts the team more than a bad defensive guard, and a good defensive big man helps the team more than a good defensive guard.

But I cannot, for the life of me, figure out how to construct BPM to handle that in a way that is generalizable to non-NBA situations. That is the one big thing I would like to solve for a BPM 3.0.
First off, you could scale DBPM up by 10% to 20% to better reflect the D RAPM spread. That's a simple adjustment that would bring the top defenders closer to the top offensive players. OR you could calculate the standard deviations of offense and defense in your RAPM sets and adjust BPM values to match that spread. For past seasons, you could also adjust based on the standard deviations of team offensive and defensive ratings.
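Both of colts18's options amount to multiplying DBPM by a constant, which leaves zero (league average) in place and widens the spread. In this sketch the target spread would come from the RAPM data; using an unweighted `pstdev` is a simplification, since BPM's league average is minutes-weighted and a real version would likely weight by minutes. Function names are hypothetical.

```python
from statistics import pstdev


def scale_dbpm(dbpm_values, factor):
    """Option 1: multiply every DBPM by a fixed factor (e.g. 1.1 to 1.2)."""
    return [factor * v for v in dbpm_values]


def match_spread(dbpm_values, target_sd):
    """Option 2: rescale DBPM so its standard deviation equals target_sd (e.g. D RAPM's)."""
    current_sd = pstdev(dbpm_values)
    if current_sd == 0:
        raise ValueError("DBPM values have no spread to rescale")
    # Multiplying by a constant scales the standard deviation by the same constant.
    return scale_dbpm(dbpm_values, target_sd / current_sd)
```

Note that the top-10 gap in point 5 (2.43 vs 3.93) would need a factor closer to 1.6 than 1.2 if the whole gap were attributed to compression.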

Big Men: I noticed that the D RAPM for 2012-2016 has a solid mix of big men and perimeter players, but the 1997-2001 RAPM is dominated by big men. My theory is that BPM overcompensates for perimeter players because they have dominated recently due to rule changes. If you use the same BPM for 1990 that you use for 2015, the 1990 perimeter players will get overrated. My suggestion is some kind of era adjustment that gives big men higher values in past eras, especially on the defensive end. One idea would be to adjust each year based on the number of 3-pointers taken: the more 3-pointers attempted in an era, the more credit perimeter defenders get, and vice versa. So in 2020, teams might average 30 3-point attempts a game, which tells us that big men don't have the same defensive value. But in 1990, there were 5 3-point attempts a game, so the BPM credit should be split more in favor of big men.
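A hypothetical sketch of that era adjustment: interpolate the share of defensive credit routed to big men between a low-3PA era and a high-3PA era. The 5 and 30 attempts per game come from the post; the 0.6 and 0.4 shares are placeholders, not estimates, and would need to be fit against RAPM.

```python
def big_man_defensive_share(three_pa_per_game: float,
                            low_3pa: float = 5.0, high_3pa: float = 30.0,
                            share_at_low: float = 0.6, share_at_high: float = 0.4) -> float:
    """
    Hypothetical era weight: share of team defensive credit given to big men,
    linear in league 3PA per game between low_3pa and high_3pa, clamped outside.
    """
    t = (three_pa_per_game - low_3pa) / (high_3pa - low_3pa)
    t = min(max(t, 0.0), 1.0)
    return share_at_low + t * (share_at_high - share_at_low)
```

The clamp keeps eras outside the anchor range (e.g. pre-3-point seasons) from extrapolating to implausible shares.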

Re: Reconstructing Box Plus/Minus

Posted: Mon Dec 28, 2020 7:26 pm
by colts18
DSMok,

I've been thinking about BPM a bit recently and how it could more accurately reflect the impact of big men. Have you thought about incorporating counterpart stats into your defensive stats? It won't tell us a lot, but every slight improvement is big. You already have a positional adjustment in your stat, so you could find the offensive stats of opponents at the same position (e.g. find the opposing PGs in games played by Chris Paul) and use them in the defensive team adjustment. Then you could split up 2-point shooting and 3-point shooting. If your team has a strong 2-point defense, chances are the big man is a big part of that (example: Shaq's teams always had strong 2-point defenses in the years he tried on defense).

I know you don't have data for assisted vs. unassisted baskets, but couldn't you use the box score data to create a proxy for that? For example, if you know the PG had 20 assists in the game and his teammates made a bunch of 3s, you know that they got assisted a lot. But if you see an Allen Iverson game where he went 12-35, chances are he wasn't being assisted by his teammates much, so he should get a boost for that. You can use 2-point shooting, 3-point shooting, offensive rebounds, and assists/turnovers to build a proxy for assisted vs. unassisted baskets.

Re: Reconstructing Box Plus/Minus

Posted: Thu Dec 31, 2020 5:48 am
by rainmantrail
DSMok1 wrote: Wed Dec 16, 2020 6:51 pm Agreed. Here's the deal--big men drive the defense, to a large degree. Big men should have a wider spread than guards. A bad defensive big man hurts the team more than a bad defensive guard, and a good defensive big man helps the team more than a good defensive guard.

But I cannot, for the life of me, figure out how to construct BPM to handle that in a way that is generalizable to non-NBA situations. That is the one big thing I would like to solve for a BPM 3.0.
Does your model account for a player's metadata at all? Stuff like height, weight, wingspan, vertical leap, draft pick, etc.? While there is obviously much more to the equation than this, each of these components should be a surrogate for at least some of what we'd otherwise want DBPM to capture, and it would be relatively easy to incorporate into the model.

Re: Reconstructing Box Plus/Minus

Posted: Thu Dec 31, 2020 5:54 am
by rainmantrail
colts18 wrote: Mon Dec 28, 2020 7:26 pm I know you don't have data for assisted vs. unassisted baskets, but couldn't you use the box score data to create a proxy for that? For example, if you know the PG had 20 assists in the game and his teammates made a bunch of 3s, you know that they got assisted a lot. But if you see an Allen Iverson game where he went 12-35, chances are he wasn't being assisted by his teammates much, so he should get a boost for that. You can use 2-point shooting, 3-point shooting, offensive rebounds, and assists/turnovers to build a proxy for assisted vs. unassisted baskets.
I'm confused. Why wouldn't we just take FGs made - Assists as the unassisted baskets?

Re: Reconstructing Box Plus/Minus

Posted: Thu Dec 31, 2020 12:58 pm
by DSMok1
rainmantrail wrote: Thu Dec 31, 2020 5:48 am Does your model account for a player's metadata at all? Stuff like height, weight, wingspan, vertical leap, draft pick, etc.? While there is obviously much more to the equation than this, each of these components should be a surrogate for at least some of what we'd otherwise want DBPM to capture, and it would be relatively easy to incorporate into the model.
I have never included metadata in BPM, because I wanted the design to be as close to 100% portable to different leagues or eras as possible. The general philosophy is to restrict to the box score since that is so universal.

I agree it could provide additional information along these lines.

Re: Reconstructing Box Plus/Minus

Posted: Fri Jan 01, 2021 5:13 pm
by Mike G
rainmantrail wrote: Thu Dec 31, 2020 5:54 am I'm confused. Why wouldn't we just take FGs made - Assists as the unassisted baskets?
Assists in the boxscore are not the number of times you were assisted, but the number of times you passed to someone else, who then scored on a FG.

I have used a formula like this to estimate the "fraction of points that were unassisted":

uAst% = (.47*FTA + 2.06*Ast + .77*Stl + 2.9*TO - .103*3fga)/Min * (TmAPG - APG)/TmAPG * 1.2

This is kinda crude, and for low-minutes players can be <0 or >1. But for most players it seems pretty close and invites analysis into relative value of "shot creation."
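Mike G's formula translates directly into code. This sketch assumes FTA, Ast, Stl, TO, 3FGA, and Min are totals over the same span, while TmAPG and APG are per-game averages; following the note above, the result is deliberately not clamped to [0, 1]. Parameter names are ours.

```python
def unassisted_fraction(fta, ast, stl, tov, fg3a, minutes, team_apg, apg):
    """
    Rough estimate of the fraction of a player's points that were unassisted:
    (.47*FTA + 2.06*Ast + .77*Stl + 2.9*TO - .103*3FGA)/Min * (TmAPG - APG)/TmAPG * 1.2
    Not clamped: low-minutes players can come out below 0 or above 1.
    """
    rate = (0.47 * fta + 2.06 * ast + 0.77 * stl + 2.9 * tov - 0.103 * fg3a) / minutes
    return rate * (team_apg - apg) / team_apg * 1.2
```

For a hypothetical 36-minute line of 6 FTA, 5 Ast, 1 Stl, 3 TO, and 4 3FGA, from a player averaging 5 APG on a 24 APG team, the estimate is about 0.585.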

An easier boxscore adjustment is for home assist bias, as some scorekeepers still give more assists than others, and most give more to the home team.

This last season, the Spurs were said to assist on 56% of their FG on the road but 61% at home. I'd say their home assists were inflated by a factor of 61/56, or about 1.09. It varies by individual player, but dividing all SAS players' assists by roughly 1.046 (assuming half of their games were at home) is still better than giving them full credit.
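The home-assist correction works out as follows in a small sketch: the home inflation factor is the ratio of home to road assist rates, and the season divisor blends that factor with 1.0 according to the share of games played at home. The 50% home share is the post's assumption; the function names are ours.

```python
def home_assist_factor(home_ast_rate: float, road_ast_rate: float) -> float:
    """How much home scorekeeping inflates assists, e.g. 0.61 / 0.56."""
    return home_ast_rate / road_ast_rate


def season_assist_deflator(home_ast_rate: float, road_ast_rate: float, home_share: float = 0.5) -> float:
    """Divisor for season assist totals when home_share of games were at home."""
    return 1.0 + home_share * (home_assist_factor(home_ast_rate, road_ast_rate) - 1.0)


def deflate_assists(assists: float, home_ast_rate: float, road_ast_rate: float, home_share: float = 0.5) -> float:
    return assists / season_assist_deflator(home_ast_rate, road_ast_rate, home_share)
```

With the rounded 61% and 56%, this gives a home factor of about 1.089 and a season divisor of about 1.045, close to the figures in the post.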

You might also consider that an assist on a 3FG has more value than one on a 2FG, but that's debatable.

Re: Reconstructing Box Plus/Minus

Posted: Sat Jan 02, 2021 4:18 am
by rainmantrail
Mike G wrote: Fri Jan 01, 2021 5:13 pm
rainmantrail wrote: Thu Dec 31, 2020 5:54 am I'm confused. Why wouldn't we just take FGs made - Assists as the unassisted baskets?
Assists in the boxscore are not the number of times you were assisted, but the number of times you passed to someone else, who then scored on a FG.
Oh, sorry. I misread that before. I thought we were talking about team-level unassisted baskets.