Constructive discussion re: RAPM
Posted: Sat Mar 01, 2025 2:49 pm
				
Thought I'd create an account here for more meaningful discussions.
I've laid out all my thoughts on RAPM here (https://x.com/sportsandmath1/status/189 ... 52605?s=46) and
@Crow/bballstrategy suggested to "take your case directly to one or more authors of RAPM or hybrids, if they are willing to spend the time (maybe not)"
I know I might come off as arrogant, but that's just the result of frustration with the current state of RAPM (and related measures). I'm not claiming to be an oracle, just airing my grievances and suggesting a better path that clearly hasn't been explored enough, since I appear to be the only one working in this area.
I'd prefer responses from people who have worked extensively on RAPM (Jeremias Engelmann) or an SPM (e.g. RPM - Steve Ilardi, BPM - Daniel Myers, DARKO - Kostya Medvedovsky, LEBRON - Krishna Narsu, EPM - Taylor Snarr, etc.), so that we have a set of rankings using your methodology to reference. Ideally, if you have points of agreement with me, you can incorporate them in the next iteration of your player rankings.
I'm not going to rehash the entire Twitter discussion (you can check for yourself).
But here's a quick summary of the key points of contention I'd like to discuss on here:
Start of discussion:
1. https://x.com/sportsandmath1/status/189 ... 64322?s=46
In response to a recent article on ESPN by Zach Kram, with a headline suggesting LeBron is "unlucky" this season,
I noted that I don't put much stock in On/Off: while I value On-Court +/- (the 3rd most impactful stat in my player rankings, after 1. PTS (positive) and 2. FGA (negative)), Off-Court +/- is largely noise that should be considered circumstantial evidence at best.
I quoted a tweet with my player rankings for this season (https://x.com/sportsandmath1/status/189 ... 90177?s=46) using the formula which measures % of production per game to rank players (quite well imho).
[/begin tangent]
Source for stat hierarchy (initially made as a response to the opposite critique that +/- is useless) : https://x.com/sportsandmath1/status/182 ... 01455?s=46
Here's the thread where I discuss the importance of +/- and how it is the key box score stat missing from a stat like GameScore. I reference how Bruno Caboclo was 1st in GameScore vs Team USA in a blowout loss, and how the inclusion of +/- drops him to the 8th best player in the game: https://x.com/sportsandmath1/status/182 ... 89833?s=46
[/end of tangent]
As an example, I noted that A'ja Wilson won the 2024 WNBA MVP unanimously with an On-Off of -2.7. Her On-Court +/- was +6.5.
This is evidence in support of my claim that On/Off is quite useless, but +/- is definitely relevant.
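A quick sketch of the arithmetic behind those two numbers, assuming On-Off is simply On-Court +/- minus Off-Court +/- (the usual definition). The function name is just something I made up for illustration:

```python
def implied_off_court(on_court: float, on_off: float) -> float:
    """On-Off = on_court - off_court, so off_court = on_court - on_off."""
    return on_court - on_off

# Figures quoted above for A'ja Wilson's 2024 season.
off_court = implied_off_court(on_court=6.5, on_off=-2.7)
print(round(off_court, 1))  # 9.2
```

In other words, the negative On-Off comes entirely from the team's +/- in the minutes she sat, not from anything that happened while she was on the court.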
[/discussion about a reply]
Crow replied with an irrelevant comment using highly filtered lineup data (low sample size) and implied that one lineup with LeBron being +13 is somehow bad because that lineup without LeBron was +42. This comment reinforced my sentiment that Off-Court +/- is useless.
https://x.com/bballstrategy/status/1895 ... 73909?s=46
I replied by repeating that Off-Court +/- is largely irrelevant, and that combining On-Court +/- (useful) with Off-Court +/- (useless) creates circumstantial evidence (LeBron's RAPM is negative) that is relatively meaningless when judging the claim that LeBron is an All-NBA caliber player at age 40 (which should be quite obvious to NBA fans who watch the games).
[/end of discussion about a reply]
Beginning of my 7 tweet thread (please read it in its entirety before responding):
https://x.com/sportsandmath1/status/189 ... 52605?s=46
"""
Quick rant on why RAPM is inherently plagued with noise:
It weighs Off-Court +/- (doesn't correlate to impact aka noise) quite heavily alongside On-Court +/- (does correlate to impact)
while ignoring PTS, FGA, and other stats that actually are related to impact.
--
When the "gold standard" of RAPM with box prior (EPM) says DFS > LeBron,
the problem isn't luck, it's that RAPM has nothing to do with impact.
It's just the answer to a linear algebra problem requiring 500³ operations, which intrigues mathematicians but isn't useful in practice
--
So to all my fellow analytics nerds,
Please stop viewing RAPM as some elegant approach to solve basketball.
It's just mixing quality data (On-Court +/-) with unrelated garbage (Off-Court +/-)
with a veneer of sophistication.
There's no needle in the haystack (of +/- data).
--
ChatGPT explanation:

- Basically, if you're still a believer in RAPM, the regularization needs to be turned up way higher to tune out the noise, which brings the values much closer to zero.
Then it may be marginally more useful than +/-.
--
The high regularization version of RAPM would essentially be a scaled down version of +/-
that still needs to be combined with stats related to impact (not RAPM) like PTS, FGA, etc.
--
RAPM is not a valid substitute for Impact.
That's why trying to predict RAPM (less related to Impact) using box score stats (more related to Impact) isn't a smart idea.
--
So if you want to train an RAPM model, the optimal regularization parameter is not the one that best predicts RAPM on unseen data
but rather something that's more related to Player Impact like unseen game results when combined with the box score.
----
End of thread
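To make the regularization point in the thread concrete, here's a minimal sketch of ridge regression on toy data. Everything in it is an assumption for illustration: the stint matrix is random, there are only 20 players, and `ridge_rapm` is my own name for the textbook closed-form ridge solution, not anyone's published RAPM code.

```python
import numpy as np

def ridge_rapm(X, y, lam):
    """Closed-form ridge solution: (X^T X + lam * I)^-1 X^T y."""
    n_players = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_players), X.T @ y)

rng = np.random.default_rng(0)
n_stints, n_players = 400, 20

# Toy stint matrix: +1 for the five "home" players on court, -1 for the
# five "away" players, 0 for everyone sitting.
X = np.zeros((n_stints, n_players))
for row in X:
    on_court = rng.choice(n_players, size=10, replace=False)
    row[on_court[:5]] = 1.0
    row[on_court[5:]] = -1.0

true_impact = rng.normal(0.0, 2.0, size=n_players)
y = X @ true_impact + rng.normal(0.0, 12.0, size=n_stints)  # noisy stint margins

# The size of the coefficient vector shrinks toward zero as lambda grows.
norms = {lam: float(np.linalg.norm(ridge_rapm(X, y, lam)))
         for lam in (1.0, 100.0, 10000.0)}
for lam, norm in norms.items():
    print(f"lambda={lam:>8}: ||beta|| = {norm:.3f}")
```

The solve is on an N x N matrix, which is where the roughly N³ cost comes from (500³ = 125 million for ~500 players, as mentioned in the thread).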
Follow-up reply chain
Final tweet, then scroll up:
https://x.com/sportsandmath1/status/189 ... 57309?s=46
Key points:
1. My player rankings explain over 90% of the variance in consensus player rankings. This approach can be iterated to hopefully approach 95%.
To clarify, I'm referencing the NBA Math Crystal Basketball rankings from 2018 (train), 2019 (train), 2020 (test), and 2021 (test), which ranked all ~500 players on a 1-12 scale by aggregating the opinions of 10-15 people (such as Ben Taylor).
2. No one can predict RAPM reliably since it's plagued with noise.
3. RAPM isn't meaningful on its own (see thread for why).
It needs to be combined alongside the box score (most notably INDIVIDUAL PTS and FGA)
to best predict impact (again RAPM ≠ Impact)
4. Training the box score to predict RAPM is foolish: it's like going into a room blindfolded and wearing ear plugs, then trying to imagine what the room looks like based solely on distance data.
That's why you get nonsense like DFS > LeBron.
5. It's all in the thread. Your comments come across as if you're ignoring the points in the thread and repeating your talking points.
I've suggested how to improve RAPM in the thread:
Namely, to use it in an ensemble with the box score to predict OOS games (rather than OOS RAPM).
I assume an empirical approach like this would yield a much higher regularization (lambda) parameter, which tunes out most of the Off-Court noise that's currently added to RAPM and makes it much more similar to +/-.
6. The main reasons I prefer +/- to RAPM
a. It's a box score statistic capturing everything in and outside the individual stats.
b. It's additive and not biased (the regularization in RAPM introduces bias).
c. It doesn't claim to be an all-in-one metric the way RAPM does despite the inherent limitation of solely relying on +/- data (just one of 10+ factors).
d. It's plainly obvious that ranking N=500 players shouldn't require O(N³) = 125 million computations.
That's what makes it obvious that RAPM is made by/for linear algebra nerds not NBA nerds.
It got repackaged into NBA All-in-one metrics that are quite meaningless.
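Point 5 above (pick the regularization by how well it predicts unseen results, not unseen RAPM) can be sketched roughly like this. It's a simplified sketch under stated assumptions: the data is synthetic, the target is the held-out stint margin, and the box score part of the ensemble is left out entirely. `ridge_fit` and `pick_lambda` are hypothetical helper names.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge coefficients for design matrix X and target y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def pick_lambda(X_train, y_train, X_test, y_test, lambdas):
    """Return the lambda with the lowest held-out mean squared error, plus all errors."""
    errors = {}
    for lam in lambdas:
        beta = ridge_fit(X_train, y_train, lam)
        errors[lam] = float(np.mean((X_test @ beta - y_test) ** 2))
    best = min(errors, key=errors.get)
    return best, errors

# Synthetic stand-in data (assumption): rows are stints, columns are players.
rng = np.random.default_rng(1)
X = rng.choice([-1.0, 0.0, 1.0], size=(600, 30))
y = X @ rng.normal(0.0, 2.0, size=30) + rng.normal(0.0, 12.0, size=600)

# Fit on the first 400 stints, score on the remaining 200 unseen ones.
best, errors = pick_lambda(X[:400], y[:400], X[400:], y[400:],
                           [1.0, 10.0, 100.0, 1000.0])
print(best, errors)
```

Which lambda wins depends entirely on the real data, so this only shows the mechanics. Box score features could be appended as extra columns to the design matrix to get closer to the ensemble I'm describing.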
If you have the time, you should be able to use my ranking engine (https://sportsandmath1.github.io/RankingEngine/) to rank a random subset of 50 NBA players in 5-10 minutes.
Then try comparing the correlation of the resulting Elo ratings with my player rankings vs. some form of RAPM. It's almost certain they'll correlate more highly with mine for 99% of NBA fans/analysts.
The 1% whose rankings correlate better with RAPM are fully on the RAPM Kool-Aid, and in that case I'd respect that we simply have a difference of opinion.
But for the other 99% there's some inherent dissonance between how they rank players and how RAPM does.
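For anyone who wants to run that comparison, here's a minimal sketch of a rank correlation (Spearman, computed as the Pearson correlation of the ranks, with no tie handling). The numbers are made-up placeholders for five players, not real ratings, and `metric_a` / `metric_b` stand in for whichever two metrics you want to compare against your own Elo ratings.

```python
import numpy as np

def spearman(a, b):
    """Spearman rank correlation (no tie handling): Pearson correlation of the ranks."""
    rank_a = np.argsort(np.argsort(a))
    rank_b = np.argsort(np.argsort(b))
    return float(np.corrcoef(rank_a, rank_b)[0, 1])

# Placeholder values only (assumption): higher means better in every list.
elo = [1620, 1580, 1510, 1490, 1400]
metric_a = [9.1, 8.7, 7.9, 8.0, 6.2]
metric_b = [2.1, 4.0, -0.5, 3.2, 0.4]

print(spearman(elo, metric_a), spearman(elo, metric_b))
```

A rank correlation fits here because the Elo ratings and the metrics are on different scales, so only the ordering of players is comparable.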
Quoted tweet re the ranking engine:
https://x.com/sportsandmath1/status/189 ... 19467?s=46
Hopefully that's enough to convey my perspective on RAPM and start a discussion on how we can improve the future of NBA player rankings. The goal is to shift away from viewing RAPM as a magical oracle and back to what it is: a tool for adjusting raw +/- data that should be added to our toolbox but not worshipped as a measure of the ground truth.