APBRmetrics

Posted: **Sun Apr 03, 2016 9:20 pm**

Nate wrote: Wow. That's not something that I would have expected.

I didn't expect that either, and it wasn't even my intention to use that as a predictor, but rather wanted to confirm a calculation I made regarding the mentioned championship odds. I guess, that at least a part of the better prediction is based on the "leading/trailing by" effect J.E. found.
If I remember correctly, Wayne Winston found a "clutch" effect in his APM results, but as J.E. pointed out, different weights for possessions at the end of the game or completely removing "meaningless" possessions (garbage time) did not improve out-of-sample prediction for RAPM. Tbh, I don't have a good explanation for the better prediction besides the "leading/trailing by" effect getting picked up over the course of the whole game instead of just getting partly accounted for at the end of the game.

Posted: **Sun Apr 03, 2016 10:34 pm**

A few more cases of guys very high on win impact estimate but not so high on RAPM estimate:

Name            Rank on Win Impact   Rank on RAPM
Dellavedova              7                 98
Steven Adams            16                 66
T Ross                  20                230
P Patterson              9                165
C Joseph                26                199
P Mills                 10                 62

These are different measures and these are pretty big differences in rank. It would be good to know the correlation for the entire database. And it could be interesting to try to cluster players by these 2 metrics (high-high, high-medium, high-low etc. for 27 clusters that probably should be gathered into a handful or two superclusters for easier handling), then look within cluster at position, skills, age, team quality, etc. to try to identify trends.

To get out of the "black box", presumably you could look at play-by-play box score data and the win impacts of single plays, and summarize the additions and reductions in win impact at the four factor level. That might help with seeing where players are having the biggest impacts, especially in clutch situations. If you really wanted to push it you could match up Synergy characterizations of "plays" with win impacts for those plays. That might help with optimizing lineup choices with play calling and defending.

Posted: **Sun Apr 03, 2016 10:57 pm**

Wall 90th on 1 year win impact, Lillard 92, Irving 94, P. Gasol 73. Howard, A Davis, Towns, Wade, Parsons, Kanter, DeRozan, J. Butler, Drummond, C Anthony, Hayward, M Gasol, B Lopez and of course D. Rose and K. Bryant not in top 100 on my quick look. Jokic 9th on RAPM (and highest ranked center) but 79th on Win Impact estimate, behind Nurkic (who RAPM estimates a median center with almost neutral impact).

There is a lot of serious work to be done comparing these metrics and trying to understand their outputs. Or the other option is to say it is part or all too shaky and put it aside.

Posted: **Mon Apr 04, 2016 3:40 am**

permaximum wrote:
Statman wrote:
Nate wrote:Do people still believe that 'clutch factor' is a big thing in pro sports?
I don't.
Then you guys don't know a thing about human nature and sports. Don't tell this outside of this community or everyone will laugh at you

Let me clarify. The issue I had with the phrase was clutch factor as a "big thing". I'm not saying it doesn't exist whatsoever.

Posted: **Mon Apr 04, 2016 5:47 am**

mystic wrote:
You simply replace the scoring margin with changes of the win probability in the response vector (usually named y). And then you run the regression normally. In that way you get win probabilities instead of APM values for each player. The interpretation would actually be that a player would raise/lower the win probability over average by x percentage points.

And how do you compute the win probabilities from the scoring margin?

Posted: **Mon Apr 04, 2016 6:53 am**

hoopthinker wrote: And how do you compute the win probabilities from the scoring margin?

I think you want to read this paper just published by Deshpande and Jensen:
http://www.degruyter.com/dg/journalprin ... j$002fjqas

Neil Paine had a blog post back in 2011, which gave me the idea: http://www.basketball-reference.com/blog/?p=9546

Posted: **Mon Apr 04, 2016 9:45 am**

mystic wrote: You simply replace the scoring margin with changes of the win probability in the response vector (usually named y). And then you run the regression normally. In that way you get win probabilities instead of APM values for each player. The interpretation would actually be that a player would raise/lower the win probability over average by x percentage points.

Is it that straightforward? J.E. said

J.E. wrote: A team's probability to win a game is dependent on a) time left in the game, b) current lead (can be negative), c) whether they have the ball and d) the players on the court

To give an example: If your team is down 2 with 1s to go (a scenario where you're unlikely to win), and you hit a 3 as time expires you changed your team's probability of winning the game from <50% to 100%
On the other hand, if you're up 2 with 1s to go, and hit a 3, you only moved your team's probability of winning from ~>95% to 100%
Somewhat obviously, it is more important for your team to go on a run in a tight game, vs when being up by 20+

One can use the standard APM framework to carry out the analysis, but the y-vector - filled with points-per-possession in APM - gets replaced with changes of win probability.
Thanks to the APM framework we can then estimate each player's influence on win probability, controlling for factors a)-d) from above - most importantly controlling for who you're on the court with.

As you say, you run the regression normally. Which makes me think that the interpretation is Draymond increases a team's probability of winning by 13.2% holding all the other variables constant.... but what does that mean if you've included time left in the game and the team's lead at the time? I might believe, for example, that Draymond and 4 average players are favored to win 63.2% of the time against 5 average players at tip off. But to use the situation J.E. described in his original post, if Draymond's team is up 2 with 1 second left, are they actually at 108% to win instead of 95%? If his team were down 5 with 2 seconds left, would they actually have a 13% chance of winning instead of 0? Does the interpretation make sense given that team win probabilities usually stay in a tight range early in the game but can move around wildly later in a game? Maybe instead of asking how to interpret the number, I should have asked if we really believe we should interpret the values as intended.

Posted: **Mon Apr 04, 2016 9:54 am**

xkonk wrote:Which makes me think that the interpretation is Draymond increases a team's probability of winning by 13.2% holding all the other variables constant....

Yes, for the whole game, not just for a specific point of the game. It is a value per 100 possessions (or, as J.E. keeps correctly pointing out, per 200 possessions, because it is actually based on 100 offensive and 100 defensive possessions). So, if a game has those 100 offensive and 100 defensive possessions, 4 average players on a neutral court with Green would have a win probability of 0.632 instead of 0.5. If we interpreted it as a percentage, we would get 0.566 instead (a 13.2% increase of 0.5), and it would mean that every above-average player would have a bigger effect on a better team than he would have on a weaker team.
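
The difference between the two readings can be checked with a couple of lines of arithmetic. This is only a sketch using the 13.2 figure quoted above; the variable names are made up for illustration.

```python
baseline = 0.5   # win probability of five average players on a neutral court
rating = 13.2    # Green's value as quoted above, per 100 offensive + 100 defensive possessions

# Reading 1: percentage points, i.e. the rating is added to the baseline.
additive = baseline + rating / 100

# Reading 2: percent of the baseline, i.e. the baseline is scaled by the rating.
multiplicative = baseline * (1 + rating / 100)

print(round(additive, 3), round(multiplicative, 3))  # 0.632 0.566
```

Under the second reading the same rating would be worth more on a team that already has a higher baseline, which is the odd consequence described above.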

Posted: **Mon Apr 04, 2016 10:20 am**

Crow wrote:A few more cases of guys very high on win impact estimate but not so high on RAPM estimate:
Rank: Win Impact. RAPM
Dellavedova    7    98
Steven Adams  16    66
T Ross        20   230
P Patterson    9   165
C Joseph      26   199
P Mills       10    62

Clearly there's a bias in favor of guys from Australia or playing in Canada.

Posted: **Tue Apr 05, 2016 11:50 am**

mystic wrote:
Nate wrote: Wow. That's not something that I would have expected.
I didn't expect that either, and it wasn't even my intention to use that as a predictor, but rather wanted to confirm a calculation I made regarding the mentioned championship odds. I guess, that at least a part of the better prediction is based on the "leading/trailing by" effect J.E. found.
If I remember correctly, Wayne Winston found a "clutch" effect in his APM results, but as J.E. pointed out, different weights for possessions at the end of the game or completely removing "meaningless" possessions (garbage time) did not improve out-of-sample prediction for RAPM. Tbh, I don't have a good explanation for the better prediction besides the "leading/trailing by" effect getting picked up over the course of the whole game instead of just getting partly accounted for at the end of the game.

Yeah, if that better predictive power result holds up, then we have to sit up and pay more attention to this measure. It is true enough that not all plus-minus results are created equal, as shown by the examples you cite: effort levels that might vary depending on being ahead or behind, or in garbage time, or etc. And this measure might better account for that, e.g. doing real well or real poorly in garbage time isn't going to affect the win probabilities much, and the WPA regressions will automatically build that in whereas plus-minus calculations will mistakenly think those points were just as important as others.

A possible tweak to the model, which normally would improve it but might not in this case. If the regression is OLS, which is what it sounds like, then to model win probabilities linearly is problematic. Some people have already mentioned examples where the math could lead to weird results such as 108% win probabilities.

There are several standard ways of dealing with this, with perhaps the most common and easiest one being logit regressions (also called logistic regressions). Instead of using probability as the dependent variable, use the natural logarithm of the odds. I.e. instead of using p as the dependent variable, use ln( p/(1-p) ). The result is a graph with a sigmoid shape instead of a straight line; a non-linear model instead of a linear one, and no mathematically impossible results such as 108% probabilities or negative probabilities.

But that might not be a good idea in this case. Being non-linear, logit regressions can be more finicky than OLS regressions. And when a team has a 20 point lead or deficit, although a few baskets won't change the win probability much, they will affect the logit of that win probability more substantially. Which is probably a bad thing, because common sense tells us those baskets should not matter much.
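
A quick numeric sketch of that last point, using the ln( p/(1-p) ) transform from above. The win probabilities here are made up purely for illustration; they are not taken from any real win probability model.

```python
import math

def logit(p):
    """Natural log of the odds, ln(p / (1 - p))."""
    return math.log(p / (1 - p))

def inv_logit(z):
    """Inverse of logit; always returns a value strictly between 0 and 1."""
    return 1 / (1 + math.exp(-z))

# Hypothetical one-point moves in win probability (0.01 each).
delta_close = logit(0.51) - logit(0.50)     # a basket in a tied game
delta_blowout = logit(0.99) - logit(0.98)   # a basket when already far ahead

print(delta_close, delta_blowout)
print(inv_logit(logit(0.632)))  # round trip back to the probability scale
```

Both moves are the same size on the probability scale, but the blowout move comes out at roughly 0.70 on the logit scale versus roughly 0.04 for the close game, so a logit response would give the blowout basket far more weight.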

Posted: **Tue Apr 05, 2016 12:14 pm**

DSMok1 wrote:So, given the crazy WPA in certain specific situations, are the results more highly regressed than normal RAPM? In other words, is the optimum lambda larger?

The optimal lambda is indeed larger, but I'm not sure yet whether that's a function of the "crazy swings" or of how I define current win probability, which seems to be a (small) problem on its own. For example, for determining win probability for a situation with, say, 39:15 left and up 10, I can't just look up all occurrences with the exact same time and score difference to get an expected win%: the sample would simply be too small and the noise would be huge. So I started to create buckets. But the bucket-creation process has an influence on the magnitude of lambda. A better approach than the one I'm currently using, or simply choosing the bucket size more carefully, could lead to a different lambda. More research is needed.

Kevin Pelton wrote:Jerry, do you have a good way to compare these results to standard RAPM? I'm sort of trying to do that from memory, but having them side by side to see where players differ would help me.

Here's a link with both single-year RAPM and WPA https://docs.google.com/spreadsheets/d/ ... sp=sharing
R^2 is 0.76

mystic wrote:
hoopthinker wrote: And how do you compute the win probabilities from the scoring margin?
I think you want to read this paper just published by Deshpande and Jensen:
http://www.degruyter.com/dg/journalprin ... j$002fjqas

Thanks for posting the link to the paper. I didn't realize it was public yet

mystic wrote:So, if a game has those 100 offensive and 100 defensive possessions, 4 average players on a neutral court with Green would have a win probability of 0.632 instead of 0.5. If we would interpret it as percentage, we would get 0.566 instead (13.2% increase of 0.5), and it would mean that every above average player would have a bigger effect on a better team than he would have on a weaker team.

Right now it's the former: 0.632

There's a ton of things to be done here, some of them have been mentioned:
- Mess with the dependent variable
- Compare results from this to standard RAPM. Check for extreme differences in rank
- Run aging curve for this analysis. I have a hunch it might be slightly shifted towards higher age than the RAPM-aging curve
- Run SPM with this instead of RAPM. Compare differences in coefficients

Some of it will have to wait until the offseason when I'm less busy

Posted: **Tue Apr 05, 2016 1:01 pm**

mystic wrote:4 average players on a neutral court with Green would have a win probability of 0.632 instead of 0.5. If we would interpret it as percentage, we would get 0.566 instead (13.2% increase of 0.5), and it would mean that every above average player would have a bigger effect on a better team than he would have on a weaker team.

No. It doesn't work like that if I read the method in the OP right. I know that guy's posts are very rough and don't explain much but that's what I got from it.

Those numbers are additive. So a team with 2 LeBron Jameses, one below-average player with a win probability of -0.8, and 2 average players is going to win 100% of the time against average teams.

Is it realistic? OFC not. You can't be sure with any APM-based method where Iverson's defense is better than Kobe but ultimately he's a Channing Frye caliber player. Making the hall of fame in the first ballot didn't help

1:29 PM - Edit Reason: Miscalculation.

Posted: **Tue Apr 05, 2016 1:52 pm**

J.E. wrote:
DSMok1 wrote:So, given the crazy WPA in certain specific situations, are the results more highly regressed than normal RAPM? In other words, is the optimum lambda larger?
The optimal lambda is indeed larger, but I'm not sure yet whether that's a function of the "crazy swings" or how I define current win probability, which seems to be a (small) problem on its own. ...

There are a lot of 'moving parts' in an analysis like this. Translating the RAPM/RWPA into predictions of "overall team success" (ostensibly this is some kind of win prediction) is also non-trivial.

Something else that comes to mind is that the WPA oriented regression might be leveraging the coaches' player assessment: Win probability gives larger numerical weight to catching up than to getting further behind. That means that players who spend more time on the floor when the team is behind will tend to have more positive WPA totals. I'm not sure how significant that non-linearity works out to be, but if coaches tend to field particular players more when the team is behind, those players will tend to have more positive WPA.

Posted: **Tue Apr 05, 2016 3:24 pm**

J.E. wrote:For example, for determining win probability for a situation with, say, 39:15 left and up 10, I can't just look up all occurrences with the exact same time and score difference to get an expected win% - the sample would simply be too small and the noise would be huge.

I used 15-second intervals, which would roughly mean that at a pace of 96 each possession would be covered separately. That means I had 192 "buckets" with an actual scoring margin aligned to W/L (1/0) for the home team; OT games were ignored. As raw data I used the bbv dataset from 2006 to 2011 (playoffs included).

An obvious issue is the missing information about which team has possession of the ball. And I did not test whether including the OT games would have an effect.
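
One possible reading of the bucket approach, sketched as a plain lookup table of empirical home win rates keyed by 15-second bucket and home scoring margin. The function name, the row layout and the sample rows are all made up for illustration; this is not the actual bbv data or necessarily the exact procedure used, and like the description above it ignores possession and OT.

```python
from collections import defaultdict

BUCKET_SECONDS = 15                          # interval length from the post
GAME_SECONDS = 48 * 60                       # regulation only, OT ignored
N_BUCKETS = GAME_SECONDS // BUCKET_SECONDS   # 192 buckets

def build_wp_table(rows):
    """rows: iterable of (seconds_elapsed, home_margin, home_won), home_won in {0, 1}.

    Returns {(bucket, home_margin): empirical home win rate}.
    """
    wins = defaultdict(int)
    counts = defaultdict(int)
    for seconds_elapsed, home_margin, home_won in rows:
        key = (seconds_elapsed // BUCKET_SECONDS, home_margin)
        wins[key] += home_won
        counts[key] += 1
    return {key: wins[key] / counts[key] for key in counts}

# Tiny hypothetical sample. 39:15 left is 525 seconds elapsed, i.e. bucket 35.
rows = [
    (525, 10, 1), (530, 10, 1), (535, 10, 0), (539, 10, 1),
    (2870, -2, 0), (2875, -2, 0),
]
table = build_wp_table(rows)
print(table[(35, 10)])  # 0.75, home team won 3 of the 4 sample games
```

With real data most (bucket, margin) cells would still be thinly populated, which is the sample-size problem J.E. describes; some smoothing across neighbouring cells would presumably be needed.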

J.E. wrote:Right now it's the former: 0.632

Obviously, as I stated before, the result x has to be interpreted as raise/lower the win probability by x percentage points.

permaximum wrote:I know that guy's posts are very rough and don't explain much but that's what I got from it.

Actually, J.E. explained it very well in the first post. But I guess you got a little bit confused here, because I answered a question about the interpretation, while giving an example of interpreting the results differently at the end (as percentage instead of percentage points), which wouldn't make much sense.

Here is my first post which xkonk replied to: viewtopic.php?f=2&t=9114#p26838

mystic wrote:The interpretation would actually be that a player would raise/lower the win probability over average by x percentage points.

https://en.wikipedia.org/wiki/Percentage_point (just in case, you don't know what percentage points means)

Posted: **Tue Apr 05, 2016 3:59 pm**

mtamada wrote:If the regression is OLS ...

No, I use Ridge Regression not OLS.

mtamada wrote: Some people have already mentioned examples where the math could lead to weird results such as 108% win probabilities.

The ridge regression doesn't know that the response vector contains probability differences (probability goes from 0 to 1). And the best approximation for the win probability at any point during the game is indeed non-linear, but that gets calculated before running the ridge regression anyway. What I need is the win probability before and after the possession. Then I take the difference of those two probabilities and write that difference into the response vector instead of the result of the possession. The design matrix doesn't get changed.
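
A minimal sketch of that swap, assuming a closed-form ridge solve with NumPy. The design matrix, the win probabilities, the point values and the lambda are all made-up toy numbers, and win probability is taken from the offensive team's perspective as an assumption; none of this is J.E.'s actual data or code.

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge solution: solve (X'X + lam * I) beta = X'y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# Toy design matrix: 4 possessions, 3 players.
# +1 = on court for the offense, -1 = on court for the defense, 0 = off court.
X = np.array([
    [ 1, -1,  0],
    [ 1,  0, -1],
    [-1,  1,  0],
    [ 0,  1, -1],
], dtype=float)

# Hypothetical win probability of the offensive team before and after each possession.
wp_before = np.array([0.50, 0.55, 0.60, 0.52])
wp_after = np.array([0.55, 0.60, 0.52, 0.50])

y_points = np.array([2.0, 3.0, 0.0, 0.0])   # usual APM response: points on the possession
y_wp = wp_after - wp_before                 # replacement response: change in win probability

beta_apm = ridge(X, y_points, lam=1.0)   # same X, points response
beta_wp = ridge(X, y_wp, lam=1.0)        # same X, win probability response
print(beta_apm, beta_wp)
```

Only the response vector differs between the two calls; the design matrix and the solver are untouched, which is the point being made above.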

Now, at what point could the math lead to weird results? 108% doesn't exist at all; it comes from a misinterpretation of the results. Taking J.E.'s result for Green as an example, it simply means that if Green played all 200 possessions (100 offensive and 100 defensive) of a theoretical game of that length, then on a neutral court Green plus average players would win 63.2% of the time against a team full of average players. That's it.

(Adjusted) Impact on Win Probability
