mystic wrote:
Now, at what point could the math lead to weird results? 108% doesn't exist at all; it comes from a misinterpretation of the results. Taking J.E.'s results for Green, for example, simply means that with Green playing all 200 possessions (100 offensive and 100 defensive) of a theoretical game of that length, Green plus average players would, on a neutral court, win 63.2% of the time against a team full of average players. That's it.
Since I provided the misinterpretation, let me be more explicit and someone can tell me where I went wrong.
J.E. wrote: A team's probability to win a game is dependent on a) time left in the game, b) current lead (can be negative), c) whether they have the ball and d) the players on the court.
To give an example: If your team is down 2 with 1s to go (a scenario where you're unlikely to win), and you hit a 3 as time expires, you changed your team's probability of winning the game from <50% to 100%.
On the other hand, if you're up 2 with 1s to go, and hit a 3, you only moved your team's probability of winning from >95% to 100%.
Somewhat obviously, it is more important for your team to go on a run in a tight game than when it is up by 20+.
One can use the standard APM framework to carry out the analysis, but the y-vector - filled with points-per-possession in APM - gets replaced with changes of win probability.
Thanks to the APM framework we can then estimate each player's influence on win probability, controlling for factors a)-d) from above - most importantly controlling for who you're on the court with.
J.E.'s running a ridge regression as opposed to a standard one, but we can set that aside. He's trying to get the Bs that best fit an equation that looks like (change in win probability per possession) = B0 + B1*(time in game) + B2*(current lead) + B3*(possession of ball) + (lots of Bs, one per player, times whether that player is on or off the court). Once the regression provides the Bs, if you want the regression's estimate for some situation you just enter all the appropriate factors: players on court, time in game, etc.
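To make that concrete, here is the same equation written out as I understand it. The symbols are my own shorthand, not J.E.'s: $\Delta WP_i$ is the change in win probability on possession $i$, $t_i$ the time in game, $\ell_i$ the current lead, $b_i$ the possession indicator, and $x_{ij}$ says whether player $j$ is on the court for possession $i$ (exactly how that is coded is an assumption on my part). The ridge penalty is left out, as above.

```latex
\Delta WP_i = B_0 + B_1\, t_i + B_2\, \ell_i + B_3\, b_i + \sum_{j} B_{P_j}\, x_{ij} + \varepsilon_i
```

The point that matters for what follows is that every term simply adds, so nothing in this form keeps the fitted value inside any particular range.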
When J.E.'s table says that Draymond is 13.2 per 200 possessions, I assume that at the least, that means that the B for Draymond is actually something like 0.066 percentage points per possession, and J.E. has multiplied the output by 200 to give a more readable number. That isn't especially important. My understanding of how to interpret a regression coefficient is that changing the value of an X by 1 is expected on average to change the value of y by the appropriate B, holding all other Xs constant.
In J.E.'s example, if the time in game is 1 second, the lead is 2, possession is "your team", and a group of average players is going against another group of average players, the y is something like 5 percentage points if your team hits a 3 (the ~100% - ~95%). But say that before hitting the 3, they call time-out and substitute Draymond in. Now they hit the 3. Instead of adding 5%, they should add 5 + 0.066 = 5.066%. So that was a mistake on my part; the example in my earlier post shouldn't have added 13%, because it wasn't over 200 possessions.
But I'm still not sure that you can interpret it as '13.2% over an entire game'. It's over 200 possessions at some particular time left and lead, right? If the team in my example above played 200 games where they substituted Draymond in before hitting a 3 with 1 second left and a 2-point lead, they would gain 5.066% instead of 5%, 200 times. And you could still have some odd results, like if the team were to substitute Draymond in for defense after the 3 instead of for offense before the 3. They would go from 100% to 100.066%. If you had one of the Warriors' best lineups against an average group in the same circumstance, you could get up to about 100.2%. Right?
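Here is the arithmetic from my example as a small Python sketch, in case it helps someone spot where I go wrong. It assumes, as I did above, that the table value is just the per-possession B times 200 and that the player term is added on top of the situation's change in win probability. The function names are made up for illustration; the 13.2, 5 and 95 figures are the ones from this thread.

```python
# Draymond Green's rating from J.E.'s table, in percentage points per 200 possessions.
GREEN_PER_200 = 13.2


def per_possession(rating_per_200):
    """Convert a per-200-possession rating into a per-possession coefficient."""
    return rating_per_200 / 200.0


def predicted_change(base_change, player_ratings_per_200):
    """Additive reading of the regression: the situation's change in win
    probability plus one per-possession coefficient per listed player.
    (This is my interpretation, not necessarily how J.E. uses the numbers.)"""
    return base_change + sum(per_possession(r) for r in player_ratings_per_200)


green_b = per_possession(GREEN_PER_200)

# Up 2 with 1s left (~95%), Draymond subbed in, then the team hits a 3 (+5).
after_three = 95.0 + predicted_change(5.0, [GREEN_PER_200])

# Already at ~100% after the 3, Draymond subbed in for the final defensive possession.
after_defense = 100.0 + predicted_change(0.0, [GREEN_PER_200])

print(round(green_b, 3))
print(round(after_three, 3))
print(round(after_defense, 3))
```

Under that reading, both scenarios end above 100%, which is exactly the odd result I'm asking about.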