To quote Wikipedia:
In statistics, logistic regression [..] is a type of probabilistic statistical classification model. It is also used to predict a binary response from a binary predictor
I only recently started playing around with it and don't know much about it yet. Early tests indicate that it's decent but not ground-breaking, though this may be due to my (as of now) unfamiliarity with it.
How it works:
In standard RAPM, the response vector "y" consists of 0s, 1s, 2s, 3s (and maybe sometimes 4s and 5s), depending on how many points were scored that particular possession (~observation). In (standard) logistic regression the response vector consists of only 0s and 1s. What I did, for now, is run 3 separate regressions using 3 different "y" vectors:
- "1" if exactly 1 point was scored that possession, else "0"
- "1" if exactly 2 points were scored that possession, else "0"
- "1" if exactly 3 points were scored that possession, else "0"
(we thus get 6 coefficients per player, 3 for offense and 3 for defense)
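As a rough illustration of how the three "y" vectors can be built from one points-per-possession vector, here is a minimal sketch. The possession data and the names `points`, `indicator_vector` and `y_by_points` are made up for the example, not taken from the actual dataset or code.

```python
# Hypothetical points scored on each possession (one entry per observation).
points = [0, 2, 3, 1, 0, 2, 4, 3]


def indicator_vector(points, target):
    """Return 1 where exactly `target` points were scored, else 0."""
    return [1 if p == target else 0 for p in points]


# One binary response vector per regression: exactly 1, 2 or 3 points.
y_by_points = {k: indicator_vector(points, k) for k in (1, 2, 3)}

for k, y in y_by_points.items():
    print(k, y)
```

Note that a 0-point possession and a 4-point possession both end up as "0" in all three vectors under this scheme.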
The coefficients (~player ratings) tell us whether a player raises or lowers his team's chances of scoring 1/2/3 points in a possession (and the opposing team's chances of scoring). Knowing "how much" a player raises or lowers those chances is trickier. Standard RAPM assumes a linear relationship:
Add a +3 player to an average squad and the team is expected to score 3 points over average. Add another +3 player and the team is expected to score 6(=3+3) points over average.
Logistic regression is different: you sum up the relevant coefficients (i.e. for those players that are on the court) plus the intercept, and then plug the resulting number t into the logistic function, a special case of a sigmoid function, which is defined as
$$f(t) = \frac{1}{1 + e^{-t}}$$
which gives you a probability (~ "the probability that 1/2/3 points will be scored this possession")
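To make the "sum, then plug in" step concrete, here is a small sketch using the 2-point intercept (-0.600) and the top 2-point coefficient (0.126) from the results below. Setting every other player on the court to a coefficient of exactly 0 is an assumption for the example, and the function names are hypothetical.

```python
import math


def logistic(t):
    """Standard logistic function: maps any real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))


def possession_probability(intercept, coefficients):
    """Sum the on-court coefficients plus intercept, then apply the logistic."""
    return logistic(intercept + sum(coefficients))


INTERCEPT_2PT = -0.600  # intercept of the "exactly 2 points" regression

# Assumed lineup: ten players (5 offense, 5 defense), all exactly average.
average_lineup = [0.0] * 10
p_average = possession_probability(INTERCEPT_2PT, average_lineup)

# Same lineup, but one offensive player has a coefficient of 0.126.
lineup_with_top_player = [0.126] + [0.0] * 9
p_with_top_player = possession_probability(INTERCEPT_2PT, lineup_with_top_player)

print(round(p_average, 3), round(p_with_top_player, 3))
```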
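The sigmoid function is, obviously, not a linear function - it has an "S"-shape, and its slope at probability p is p(1-p), which is largest at p = 0.5. As such, the same coefficient produces a different probability change depending on the rest of the lineup. Where the baseline probability is above 0.5, adding a positive player to an average lineup gives more of a probability increase than adding him to an already good lineup; where it is below 0.5 (as the negative intercepts here imply for an average lineup), the opposite holds. With logistic regression you have to know the rest of the lineup (their coefficients) to know how much a player will change the outcome of a possession.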
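A quick way to see how much the rest of the lineup matters is to add the same coefficient at different starting points on the curve. This is only a sketch: the coefficient of 0.1 and the baseline values are arbitrary numbers picked for illustration, and `probability_gain` is a hypothetical helper.

```python
import math


def logistic(t):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-t))


def probability_gain(baseline_t, coefficient):
    """Change in probability from adding `coefficient` to a lineup whose
    summed coefficients plus intercept equal `baseline_t`."""
    return logistic(baseline_t + coefficient) - logistic(baseline_t)


COEFF = 0.1  # arbitrary positive player coefficient

# Baselines below, at and above t = 0 (i.e. probability 0.5).
gains = {t: probability_gain(t, COEFF) for t in (-3.0, -1.0, 0.0, 1.0, 3.0)}

for t, gain in gains.items():
    print(f"baseline t={t:+.1f}  p={logistic(t):.3f}  gain={gain:.4f}")
```

The gain is largest near a baseline probability of 0.5 and shrinks towards both ends of the "S", so the same player is worth a different number of percentage points in different lineups.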
A few words on this particular implementation of logistic regression:
- the penalization parameter is 0<C<=1 with smaller numbers corresponding to stronger regularization. (Note that in ridge the penalization parameter can be 0<=alpha<inf with higher numbers corresponding to stronger regularization)
- The optimal C, found through cross-validation, was 0.005
- Intercept scaling is important because the intercept is also subject to regularization. Values of 10-100 seem to do the job here. Without scaling, the (absolute) value of the intercept ends up "too low"
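Why intercept scaling helps can be sketched with a bit of arithmetic. The post does not say which library is used, so this assumes a liblinear-style setup: a constant feature with value `intercept_scaling` is appended to every observation, its weight is L2-penalized like any player coefficient, and the intercept equals `intercept_scaling` times that weight. The helper `intercept_penalty` is hypothetical.

```python
def intercept_penalty(intercept, intercept_scaling):
    """L2 penalty paid for a given intercept, assuming the intercept is
    modelled as intercept_scaling * w0 and only w0 is penalized."""
    w0 = intercept / intercept_scaling
    return 0.5 * w0 ** 2


TARGET_INTERCEPT = -3.25  # intercept of the "exactly 1 point" regression

# Penalty for reaching the same intercept under different scaling values.
penalties = {s: intercept_penalty(TARGET_INTERCEPT, s) for s in (1, 10, 100)}

for scaling, penalty in penalties.items():
    print(scaling, penalty)
```

Under that assumption, a larger scaling value makes the same intercept much cheaper in penalty terms, so the regularization no longer pulls it towards zero as strongly.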
Some results:
Chances of scoring 1 point:
╔══════════════════╦════════╗
║ Player ║ Coeff ║
╠══════════════════╬════════╣
║ Ty Lawson ║ 0.118 ║
║ Dwight Howard ║ 0.116 ║
║ Jamal Crawford ║ 0.112 ║
║ Ramon Sessions ║ 0.109 ║
║ Derrick Williams ║ 0.109 ║
║ Evan Fournier ║ 0.105 ║
║ Mason Plumlee ║ 0.100 ║
║ Elliot Williams ║ 0.096 ║
║ Alan Anderson ║ 0.091 ║
║ Andrei Kirilenko ║ 0.087 ║
║ .. ║ ║
║ Intercept ║ -3.250 ║
╚══════════════════╩════════╝
Chances of scoring 2 points:
╔═══════════════════╦════════╗
║ Player ║ Coeff ║
╠═══════════════════╬════════╣
║ LaMarcus Aldridge ║ 0.126 ║
║ Anthony Davis ║ 0.120 ║
║ Al Jefferson ║ 0.112 ║
║ David Lee ║ 0.107 ║
║ Zach Randolph ║ 0.104 ║
║ Tony Allen ║ 0.103 ║
║ Andre Drummond ║ 0.100 ║
║ Rodney Stuckey ║ 0.099 ║
║ Al Horford ║ 0.096 ║
║ Ramon Sessions ║ 0.089 ║
║ .. ║ ║
║ Intercept ║ -0.600 ║
╚═══════════════════╩════════╝
Chances of scoring 3 points:
╔══════════════════╦═══════╗
║ Player ║ Coeff ║
╠══════════════════╬═══════╣
║ Deron Williams ║ 0.165 ║
║ Patty Mills ║ 0.159 ║
║ Anthony Tolliver ║ 0.157 ║
║ Damian Lillard ║ 0.142 ║
║ Pero Antic ║ 0.141 ║
║ Jordan Farmar ║ 0.141 ║
║ Steph Curry ║ 0.139 ║
║ Manu Ginobili ║ 0.131 ║
║ Ray Allen ║ 0.128 ║
║ Chris Copeland ║ 0.127 ║
║ .. ║ ║
║ Intercept ║ -2.20 ║
╚══════════════════╩═══════╝