LRAPM

Home for all your discussion of basketball statistical analysis.
J.E.
Posts: 852
Joined: Fri Apr 15, 2011 8:28 am

LRAPM

Post by J.E. »

I recently found out that the "logistic regression" method of my favorite Python library can do (L2-)regularization, which is pretty much necessary when dealing with NBA lineup data because of the inherent multicollinearity.

To quote Wikipedia:
In statistics, logistic regression [..] is a type of probabilistic statistical classification model. It is also used to predict a binary response from a binary predictor

I only recently started playing around with it, so I don't know much about it yet. Early tests indicate that it's decent but not ground-breaking, though this may be due to my (as of now) unfamiliarity with it.

How it works:
In standard RAPM, the response vector "y" consists of 0s, 1s, 2s, 3s (and maybe sometimes 4s and 5s), depending on how many points were scored that particular possession (~observation). In (standard) logistic regression the response vector consists of only 0s and 1s. What I did, for now, is run 3 separate regressions using 3 different "y" vectors:
- "1" if exactly 1 point was scored that possession, else "0"
- "1" if exactly 2 points were scored that possession, else "0"
- "1" if exactly 3 points were scored that possession, else "0"

(we thus get 6 coefficients per player, 3 for offense and 3 for defense)
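For concreteness, here is a minimal sketch of that setup in Python with sklearn. The names X (the usual RAPM design matrix, one row per possession with +1/-1 player indicators) and points (points scored on each possession) are placeholders I'm assuming already exist; C and intercept_scaling are explained further down:

Code:

import numpy as np
from sklearn.linear_model import LogisticRegression

# X: (n_possessions, 2 * n_players) design matrix with +1/-1 player
# indicators, as in standard RAPM; points: points scored per possession
models = {}
for k in (1, 2, 3):
    y = (np.asarray(points) == k).astype(int)  # "1" iff exactly k points scored
    clf = LogisticRegression(C=0.005, solver='liblinear', intercept_scaling=50)
    clf.fit(X, y)
    models[k] = clf  # clf.coef_ = player coefficients, clf.intercept_ = intercept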

The coefficients (~player ratings) tell us whether a player raises or lowers his team's chances of scoring 1/2/3 points in a given possession (and the opposing team's chances of scoring). Knowing how much a player raises or lowers those chances is trickier. Standard RAPM assumes a linear relationship:
Add a +3 player to an average squad and the team is expected to score 3 points over average. Add another +3 player and the team is expected to score 6(=3+3) points over average.
Logistic regression is different: you sum up the relevant coefficients (i.e. those of the players on the court) plus the intercept, and then plug the resulting number into the logistic function, a special case of the sigmoid function, which is defined as

σ(t) = 1 / (1 + exp(-t))
which gives you a probability (~ "the probability that 1/2/3 points will be scored this possession")

The sigmoid function is, obviously, not a linear function - it has an "S"-shape. As such, adding a positive player to an average lineup will give you more of a probability increase than if you added him to an already good lineup. With logistic regression you have to know the rest of the lineup (their coefficients) to know how much a player will change the outcome of a possession.
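A quick toy illustration of that non-linearity (the lineup totals here are made up):

Code:

from math import exp

def sigmoid(t):
    return 1.0 / (1.0 + exp(-t))

# The same +0.1 coefficient moves the probability less the further the
# lineup total already sits on the flat upper part of the "S"
print(sigmoid(0.1) - sigmoid(0.0))  # ~0.025 gain on an average lineup total
print(sigmoid(2.1) - sigmoid(2.0))  # ~0.010 gain on an already-strong total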

A few words on this particular implementation of logistic regression:
- the penalization parameter C is the inverse of the regularization strength (0 < C < inf), with smaller numbers corresponding to stronger regularization. (Note that in ridge the penalization parameter is 0 <= alpha < inf, with higher numbers corresponding to stronger regularization)
- Optimal C, found through cross-validation, was 0.005 (see the sketch after this list)
- intercept scaling is important, as the intercept is also subject to regularization. Values of 10-100 seem to do the job here. Without it, the (absolute) value of the intercept comes out "too low"
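For illustration, the cross-validation for C could be done with sklearn's grid-search helper along these lines (a sketch only; the grid values are made up, and X and y are the objects from the earlier sketch):

Code:

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Coarse grid over C; smaller C = stronger regularization
search = GridSearchCV(
    LogisticRegression(solver='liblinear', intercept_scaling=50),
    param_grid={'C': [0.001, 0.005, 0.01, 0.05, 0.1, 1.0]},
    scoring='neg_log_loss',
    cv=5,
)
search.fit(X, y)
print(search.best_params_)  # e.g. {'C': 0.005}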

Some results:
Chances of scoring 1 point:

Code:

╔══════════════════╦════════╗
║      Player      ║ Coeff  ║
╠══════════════════╬════════╣
║ Ty Lawson        ║ 0.118  ║
║ Dwight Howard    ║ 0.116  ║
║ Jamal Crawford   ║ 0.112  ║
║ Ramon Sessions   ║ 0.109  ║
║ Derrick Williams ║ 0.109  ║
║ Evan Fournier    ║ 0.105  ║
║ Mason Plumlee    ║ 0.100  ║
║ Elliot Williams  ║ 0.096  ║
║ Alan Anderson    ║ 0.091  ║
║ Andrei Kirilenko ║ 0.087  ║
║ ..               ║        ║
║ Intercept        ║ -3.250 ║
╚══════════════════╩════════╝
An intercept of -3.25 means that a lineup whose players all have a coefficient of 0 (or whose coefficients sum to 0) has a ~4% chance of scoring exactly 1 point ( 1 / ( 1 + exp(3.25) ) ≈ 0.037 )

Chances of scoring 2 points:

Code:

╔═══════════════════╦════════╗
║      Player       ║ Coeff  ║
╠═══════════════════╬════════╣
║ LaMarcus Aldridge ║ 0.126  ║
║ Anthony Davis     ║ 0.120  ║
║ Al Jefferson      ║ 0.112  ║
║ David Lee         ║ 0.107  ║
║ Zach Randolph     ║ 0.104  ║
║ Tony Allen        ║ 0.103  ║
║ Andre Drummond    ║ 0.100  ║
║ Rodney Stuckey    ║ 0.099  ║
║ Al Horford        ║ 0.096  ║
║ Ramon Sessions    ║ 0.089  ║
║ ..                ║        ║
║ Intercept         ║ -0.600 ║
╚═══════════════════╩════════╝
.. and 3 points:

Code:

╔══════════════════╦═══════╗
║      Player      ║ Coeff ║
╠══════════════════╬═══════╣
║ Deron Williams   ║ 0.165 ║
║ Patty Mills      ║ 0.159 ║
║ Anthony Tolliver ║ 0.157 ║
║ Damian Lillard   ║ 0.142 ║
║ Pero Antic       ║ 0.141 ║
║ Jordan Farmar    ║ 0.141 ║
║ Steph Curry      ║ 0.139 ║
║ Manu Ginobili    ║ 0.131 ║
║ Ray Allen        ║ 0.128 ║
║ Chris Copeland   ║ 0.127 ║
║ ..               ║       ║
║ Intercept        ║ -2.20 ║
╚══════════════════╩═══════╝
If nothing else, this could be useful for simulation, as it gives us more detailed information on the potential outcome of a possession.
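For illustration, such a simulator could be as simple as this (p1/p2/p3 would come from the three fitted models, e.g. via predict_proba; since the three one-vs-all models are fit separately, p1+p2+p3 isn't guaranteed to stay below 1, so renormalizing may be needed):

Code:

import random

def simulate_possession(p1, p2, p3):
    # p1/p2/p3: probabilities of scoring exactly 1/2/3 points this possession
    r = random.random()
    if r < p1:
        return 1
    if r < p1 + p2:
        return 2
    if r < p1 + p2 + p3:
        return 3
    return 0  # remaining probability mass: no points scored

# e.g. with the rough intercept-only probabilities from the tables above
outcome = simulate_possession(0.04, 0.35, 0.10)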
Crow
Posts: 10624
Joined: Thu Apr 14, 2011 11:10 pm

Re: LRAPM

Post by Crow »

If LRAPM were implemented, would there be a stronger basis for believing the lineup LRAPM as calculated, or would the 120% factor still be employed at the set level, or would it have to vary now?

In general, Jerry, what do you know or think about RAPM blends and how to optimize them properly?
v-zero
Posts: 520
Joined: Sat Oct 27, 2012 12:30 pm

Re: LRAPM

Post by v-zero »

J.E., I actually have a question about the playing-from-ahead factor... do you think it is valid to include it as part of the model when intending to represent player value? Surely it is an attribute of each player (the extent to which they cause their team to "coast" by being good, or to play catch-up by being bad), and not something that should be factored out?

As for LRAPM, it's a very cool idea. I've done the same before with box score numbers, but never bothered to also put it in the RAPM style framework - initial results look pretty solid.
J.E.
Posts: 852
Joined: Fri Apr 15, 2011 8:28 am

Re: LRAPM

Post by J.E. »

Crow wrote:the 120% factor
the what?
sndesai1
Posts: 141
Joined: Fri Mar 08, 2013 10:00 pm

Re: LRAPM

Post by sndesai1 »

Crow wrote:If LRAPM were implemented, would there be a stronger basis for believing the lineup LRAPM as calculated, or would the 120% factor still be employed at the set level, or would it have to vary now?

In general, Jerry, what do you know or think about RAPM blends and how to optimize them properly?
Isn't the 120% used with BPM?
Crow
Posts: 10624
Joined: Thu Apr 14, 2011 11:10 pm

Re: LRAPM

Post by Crow »

Daniel said BPM's need to adjust the lineup sum of individual BPMs to be accurate at the lineup level was similar to what RPM "did" to account for behavior when way ahead or behind.

v-zero is asking about the same thing in different words.

My impression is that the play-from-ahead/behind adjustment could be the crudest, least individual, least accurate element in BPM or RPM. Perhaps LRAPM, applied to and studied in very specific game situations and by player, could create an understanding that could replace this factor in RPM.
xkonk
Posts: 307
Joined: Fri Apr 15, 2011 12:37 am

Re: LRAPM

Post by xkonk »

Are there libraries that will fit regularized multinomial regressions? You could predict the probability of the small set of possible outcomes (0,1,2,3,4) in one model using the entire data set, instead of fitting three regressions and then trying to figure out how to cobble them together. I haven't looked into it, but I imagine R must have such a function.
J.E.
Posts: 852
Joined: Fri Apr 15, 2011 8:28 am

Re: LRAPM

Post by J.E. »

xkonk wrote:Are there libraries that will fit regularized multinomial regressions? You could predict the probability of the small set of possible outcomes (0,1,2,3,4) in one model using the entire data set, instead of fitting three regressions and then trying to figure out how to cobble them together. I haven't looked into it, but I imagine R must have such a function.
The Python function I've used so far (from the sklearn library) can be fed values other than just {0, 1} for the y-vector, but its description says this:
In the multiclass case, the training algorithm uses a one-vs.-all (OvA) scheme, rather than the “true” multinomial LR.
I'm not entirely sure how that affects the outcome (i.e. the coefficients).

If I do feed it a y-vector that contains 0s, 1s, 2s, and 3s (rather than just 0s and 1s), it simply returns 4 intercepts and 4 sets of coefficients. Neither the intercepts nor the coefficients differ from what I'd get by running 4 separate logistic regressions on the different possession outcomes.
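For what it's worth, sklearn does expose a "true" multinomial (softmax) fit through the multi_class option, depending on the version; a sketch, again assuming the X and points objects from before:

Code:

from sklearn.linear_model import LogisticRegression

# Fit all outcomes jointly with a softmax rather than one-vs-all;
# requires a solver that supports it, e.g. 'lbfgs'
clf = LogisticRegression(C=0.005, multi_class='multinomial', solver='lbfgs')
clf.fit(X, points)                # points in {0, 1, 2, 3}
probs = clf.predict_proba(X[:1])  # rows now sum to 1 across the 4 outcomes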
Crow
Posts: 10624
Joined: Thu Apr 14, 2011 11:10 pm

Re: LRAPM

Post by Crow »

Putting the play-ahead/behind factor aside since there was no response yet, what about RAPM blends? Is it fair to say that a lot of the error RAPM has is overfitting and, if so, would a RAPM blend, to the extent its components are overfit in different ways, tend to bring the average error down? That would be the hope.