Problems with Linearity Assumptions

DSMok1
Posts: 1119
Joined: Thu Apr 14, 2011 11:18 pm
Location: Maine

Problems with Linearity Assumptions

Post by DSMok1 »

Nick Neuteufel has been raising some issues on Twitter with the linearity assumption of the generalized linear models behind stats such as Box Plus/Minus, SPM, and others.

In particular, DRB% does not show a linear relationship with RAPM, and no transformation of DRB% appears to produce a linear relationship either.

An example of the issue:
Image
(p-val of GVLMA is 6.068e-13.)
DRB% is a non-linear predictor that violates the linearity assumption of linear regression. More on that: http://t.co/PAhWovJFUl
Links to some of the tweets so the conversation can be followed:
https://twitter.com/Neuteufel/status/548135893354442754
https://twitter.com/Neuteufel/status/549630822232653826

Nick promised an upcoming post on the subject, so when that appears I'll link to it here.

EDIT: Paper on Global Validation of Linear Model Assumptions (a good read): http://www.google.com/url?sa=t&rct=j&q= ... 1339,d.aWw
Developer of Box Plus/Minus
APBRmetrics Forum Administrator
Twitter.com/DSMok1
nrestifo
Posts: 52
Joined: Tue Oct 07, 2014 1:23 pm

Re: Problems with Linearity Assumptions

Post by nrestifo »

I talked to him about this. I've been looking forward to reading it.
mtamada
Posts: 163
Joined: Thu Apr 14, 2011 11:35 pm

Re: Problems with Linearity Assumptions

Post by mtamada »

Well if transformations don't work, then an easy and standard patch is to use what econometricians call slope dummies, i.e. allow the regression line to have one or more kinks in it. That results in a piecewise linear regression line; if we really need to have a curved regression line then we can resort to cubic splines.
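To make the slope-dummy idea concrete, here is a minimal sketch in Python using only the standard library. The knot at DRB% = 15, the synthetic data, and the helper names `hinge_design` and `solve_least_squares` are all assumptions made for illustration; nothing here comes from the actual BPM or RAPM regressions. Each knot adds one column, max(0, x - knot), so its coefficient is the change in slope past that knot.

```python
def hinge_design(x, knots):
    """Build rows [1, x, max(0, x - k1), max(0, x - k2), ...] (hypothetical helper)."""
    return [[1.0, xi] + [max(0.0, xi - k) for k in knots] for xi in x]


def solve_least_squares(X, y):
    """Ordinary least squares via the normal equations and Gauss-Jordan elimination."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(p)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(p)
    ]
    for col in range(p):
        # Partial pivoting: move the largest remaining entry onto the diagonal
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Synthetic, noise-free data (an assumption): slope 0.10 below DRB% = 15,
# slope 0.02 above it.
x = [float(v) for v in range(5, 31)]
y = [-1.0 + 0.10 * v - 0.08 * max(0.0, v - 15.0) for v in x]

intercept, slope, slope_change = solve_least_squares(hinge_design(x, [15.0]), y)
print(f"slope below the knot: {slope:.3f}")
print(f"slope above the knot: {slope + slope_change:.3f}")
```

In a real regression the knot location would have to be chosen or searched for, and the hinge column would sit alongside the other box-score predictors. A library least-squares routine would normally replace the hand-rolled solver.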

If the non-linearity is truly serious then we'll get much better estimates of the effects of defensive rebounding.

But eyeballing that graph, the non-linearity doesn't look that large to me. Plus or minus half a point throughout almost all of the range. Yes half a point is large in some contexts, but plus-minus regressions have large standard errors to begin with. Will an improved functional form lead to a large revision in our estimate of the effect of defensive rebounds?

Maybe; and given the statistical significance that he's found it could well be worthwhile to use a better-fitting functional form.

The other question is whether an improved functional form will lead to different estimates of the other coefficients. That's one of the hidden pitfalls of mis-specifying the functional form for defensive rebounds: the other estimates become inaccurate. Maybe we find out that the old regressions have been mis-evaluating assists, shooting, etc. Again we won't know until we re-run the regressions, but my guess is that this won't cause major revision of the estimates.
v-zero
Posts: 520
Joined: Sat Oct 27, 2012 12:30 pm

Re: Problems with Linearity Assumptions

Post by v-zero »

Methods such as the use of boosted decision stumps (with linear regressions at their terminal nodes if required) can avoid the linearity question for the most part.
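As a rough illustration of what boosted decision stumps look like, here is a minimal sketch in Python using only the standard library. It uses constant-valued leaves rather than linear regressions at the terminal nodes, works on a single predictor, and runs on synthetic data. The function names, the number of rounds, and the learning rate are all assumptions chosen for the example.

```python
def fit_stump(x, residuals):
    """Find the single split on x that minimizes squared error of the residuals.

    Returns (threshold, left_mean, right_mean).
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    xs = [x[i] for i in order]
    rs = [residuals[i] for i in order]
    n, total = len(rs), sum(rs)
    best = None
    left_sum = 0.0
    for i in range(1, n):
        left_sum += rs[i - 1]
        if xs[i] == xs[i - 1]:
            continue  # cannot split between identical x values
        right_sum = total - left_sum
        # Maximizing this score is equivalent to minimizing squared error
        score = left_sum ** 2 / i + right_sum ** 2 / (n - i)
        if best is None or score > best[0]:
            best = (score, (xs[i - 1] + xs[i]) / 2, left_sum / i, right_sum / (n - i))
    if best is None:  # all x values identical: no split possible
        mean = total / n
        return xs[0], mean, mean
    return best[1], best[2], best[3]


def boost_stumps(x, y, n_rounds=200, learning_rate=0.1):
    """Gradient boosting for squared loss: each stump is fit to current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        t, left, right = fit_stump(x, resid)
        left, right = learning_rate * left, learning_rate * right
        stumps.append((t, left, right))
        pred = [p + (left if xi <= t else right) for p, xi in zip(pred, x)]
    return base, stumps


def predict(model, xi):
    base, stumps = model
    return base + sum(left if xi <= t else right for t, left, right in stumps)


# Synthetic kinked relationship (an assumption), fit without specifying any form
x = [float(v) for v in range(5, 31)]
y = [-1.0 + 0.10 * v - 0.08 * max(0.0, v - 15.0) for v in x]
model = boost_stumps(x, y)
print(predict(model, 10.0), predict(model, 25.0))
```

The sum of many small step functions can trace out a curve of any shape, which is why the linearity question mostly goes away. The trade-off is that the fit has to be checked out of sample, since nothing here guards against overfitting.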
Crow
Posts: 10624
Joined: Thu Apr 14, 2011 11:10 pm

Re: Problems with Linearity Assumptions

Post by Crow »

What would be the objections to modeling defensive rebounds by a different form than most or all of the rest?
xkonk
Posts: 307
Joined: Fri Apr 15, 2011 12:37 am

Re: Problems with Linearity Assumptions

Post by xkonk »

If memory serves, some potentially helpful predictors for SPM were discarded simply because the results were undesirable (e.g., Dennis Rodman was rated really, really highly). Why not make the executive decision that this non-linearity won't be adjusted for?
DSMok1
Posts: 1119
Joined: Thu Apr 14, 2011 11:18 pm
Location: Maine

Re: Problems with Linearity Assumptions

Post by DSMok1 »

xkonk wrote: If memory serves, some potentially helpful predictors for SPM were discarded simply because the results were undesirable (e.g., Dennis Rodman was rated really, really highly). Why not make the executive decision that this non-linearity won't be adjusted for?
That is not the case; the predictors in BPM were not chosen based on how the outputs looked.

Early on, I experimented with a lot of non-linearity, but it was not stable out of sample at all.
xkonk
Posts: 307
Joined: Fri Apr 15, 2011 12:37 am

Re: Problems with Linearity Assumptions

Post by xkonk »

I wasn't referring to BPM specifically; I had your ASPM more in mind. I think this thread might be what I was thinking of: viewtopic.php?f=2&t=21&p=34&hilit=Rodman+DRB#p34 . It raises a fair question though: if you were willing to change the form of the regression for one metric, why not another?
mtamada
Posts: 163
Joined: Thu Apr 14, 2011 11:35 pm

Re: Problems with Linearity Assumptions

Post by mtamada »

DSMok1 wrote: Early on, I experimented with a lot of non linearity, but it was not stable out of sample at all.
Yeah, there are a ton of examples where non-linear models have more weaknesses than strengths.

I suspect that this is one of them, or that some mildly alternative functional form will cover most of the non-linearity, and won't lead to radically different overall results.

But I don't know that for sure. When we detect non-linearity, then we need to investigate ways of dealing with it. So by all means NickN should continue his research, it's potentially important. But I don't expect it to be, especially hearing that you looked into it extensively already.


Another way to describe it: to assume a linear functional form is absurd and leads to inherently weak models. But when we use a non-linear functional form, it often (not always, but often) ends up being even worse than the linear one. Typically I'll try to find a simple patch of some sort that deals with the worst of the non-linearities, such as slope dummies (this is assuming that transformations failed to deal with the problem, which is evidently the case here).
DSMok1
Posts: 1119
Joined: Thu Apr 14, 2011 11:18 pm
Location: Maine

Re: Problems with Linearity Assumptions

Post by DSMok1 »

xkonk wrote: I wasn't referring to BPM specifically; I had your ASPM more in mind. I think this thread might be what I was thinking of: viewtopic.php?f=2&t=21&p=34&hilit=Rodman+DRB#p34 . It raises a fair question though: if you were willing to change the form of the regression for one metric, why not another?
Right, that was really early on, when I was just starting out. I was going by smell test back then... :)
colts18
Posts: 313
Joined: Fri Aug 31, 2012 1:52 am

Re: Problems with Linearity Assumptions

Post by colts18 »

I don't think you should take Defensive rebounds away from the regression. It helps identify the good defenders. It even does a good job with perimeter defenders. MJ, Kobe, LeBron, Wade, Carter were all guys with really good DReb for guards and they were generally good defenders. Dreb is a good proxy for height which is correlated with defense for perimeter players.

Here are some suggestions to add to your model.
1. Usage%. I assume usage% is not linear just like Dreb%. Look to see if you can model usage% better. I'm not sure a 40% usage% is 2x more valuable than a 20% usage%.

2. Games played and games started. Games played is a good proxy for injuries which makes players less effective. You should look into it. Games started is a proxy for being a good player plus it does a good job of adjusting for competition faced.

3. Adjust 3 point rate for height. A big man with a high 3 point rate provides more spacing than a guard. Of course you can use that same regression to downgrade those big men on defense because they are generally not good on defense (ex: Ryan Anderson, Mullens, etc.)

4. Adjust free throw rate to usage. I think you didn't get any correlation for free throw rate because big men with low usage rates generally have high ft rates. Give more credit to high usage guys (usage that subtracts FTA) with high FTr. Instead of FTr, try using FTA per 100 possessions. That will reward high FTA guys instead of guys with 2 FTA/g on 3 FGA/g.

5. Look into high-minutes, low-offensive-stat guys without the defensive counting stats (blks, stls). Joe Dumars never had a positive DBPM during his career. Adjust the stat so that high-MPG players like him can get credit for defense if they aren't producing much on offense, because those kinds of players wouldn't be playing 35-40 MPG if they couldn't play offense or defense.
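
To put numbers on suggestion 4, here is a small sketch in Python comparing FTr (FTA/FGA) with FTA per 100 possessions. The two player lines and the 60 possessions per game are made-up values for illustration only, and the function names are hypothetical.

```python
def free_throw_rate(fta, fga):
    """FTr: free throw attempts per field goal attempt."""
    return fta / fga


def fta_per_100(fta, possessions):
    """Free throw attempts per 100 possessions played."""
    return 100.0 * fta / possessions


# Hypothetical per-game lines: (FTA, FGA, possessions on the floor)
players = {
    "low-usage big": (2, 3, 60),
    "high-usage scorer": (8, 18, 60),
}

for name, (fta, fga, poss) in players.items():
    print(f"{name}: FTr = {free_throw_rate(fta, fga):.2f}, "
          f"FTA/100 = {fta_per_100(fta, poss):.1f}")
```

Under these assumed numbers, the low-usage big has the higher FTr, while the high-usage scorer has four times as many FTA per 100 possessions, which is the reordering the suggestion is after.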
Crow
Posts: 10624
Joined: Thu Apr 14, 2011 11:10 pm

Re: Problems with Linearity Assumptions

Post by Crow »

I asked Nick about it last week. He said he planned to do something this past weekend... but he didn't seem that committed to it. So at this point, I wouldn't count on it.