Shot Difficulty metric

Posted: Wed May 13, 2015 8:58 pm
by knarsu3
http://nyloncalculus.com/2015/05/13/int ... -playoffs/

The Shot Difficulty metric uses multivariate logistic regression to adjust for shot distance, defender distance (capped at 10), shot clock, number of dribbles, where the game was played (home/road), when in the game the shot occurred (to adjust for the fact that defense is tighter in the clutch), and height difference, which is the height of the shooter minus the height of the defender.
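For readers who want to see the shape of such a model, here is a minimal sketch of a logistic make-probability function over the variables listed above. The coefficient values are invented for illustration and are not the article's fitted values, and encoding "when in the game" as a single `clutch` flag is an assumption.

```python
import math

# Hypothetical coefficients, made up for illustration; not the article's fitted values.
COEFS = {
    "intercept": 0.2,
    "shot_dist": -0.04,    # feet from the basket
    "def_dist": 0.08,      # feet to the closest defender, capped at 10
    "shot_clock": 0.01,    # seconds remaining on the shot clock
    "dribbles": -0.02,     # dribbles before the shot
    "home": 0.03,          # 1 = home, 0 = road
    "clutch": -0.10,       # 1 = late-game shot (assumed encoding of "when in the game")
    "height_diff": 0.01,   # shooter height minus defender height, in inches
}


def shot_difficulty(shot_dist, def_dist, shot_clock, dribbles,
                    home, clutch, shooter_height, defender_height):
    """Return the modeled make probability for one shot."""
    features = {
        "shot_dist": shot_dist,
        "def_dist": min(def_dist, 10.0),  # cap defender distance at 10 feet
        "shot_clock": shot_clock,
        "dribbles": dribbles,
        "home": home,
        "clutch": clutch,
        "height_diff": shooter_height - defender_height,
    }
    z = COEFS["intercept"] + sum(COEFS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))  # logistic link
```

Because of the cap, a defender 15 feet away is scored the same as one 10 feet away.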

Eventually, I'd like to add in point differential at the time of the shot or some type of win probability/leverage type statistic to better account for when in the game the shot was taken. Additionally, having location data for multiple defenders would really improve the model as a few of the shots in the article had two defenders guarding the shot.

Because the Shot Difficulty metric approximates FG%, you can multiply it by the point value of each shot (2 or 3) to get an Expected Value on the Shot Difficulty, which would appropriately adjust for the three point shot being worth 1 more point. The players with the lowest cumulative EV on Shot Difficulty would be the guys who have the worst shot selection in the league.
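The expected-value arithmetic can be sketched as follows. The function names and the (player, probability, is_three) tuple layout are hypothetical, chosen only for this example.

```python
def expected_value(make_prob, is_three):
    """Expected points for one shot: modeled make probability times point value."""
    points = 3 if is_three else 2
    return make_prob * points


def cumulative_ev(shots):
    """Sum expected points per player.

    `shots` is an iterable of (player, make_prob, is_three) tuples
    (a hypothetical layout for this sketch).
    """
    totals = {}
    for player, make_prob, is_three in shots:
        totals[player] = totals.get(player, 0.0) + expected_value(make_prob, is_three)
    return totals
```

For example, a two-pointer with Shot Difficulty 0.50 and a three-pointer with Shot Difficulty 0.40 are worth about 1.0 and 1.2 expected points respectively, so the harder shot is still the better one by EV.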

Any other thoughts or suggestions?

Re: Shot Difficulty metric

Posted: Wed May 13, 2015 9:31 pm
by Mike G
...Curry’s shot is fairly similar to LeBron’s. Both shots were on the road with the closest defender distance about the same. Although, one difference is that Curry probably did get fouled. But in terms of difficulty, they were fairly similar. Curry’s shot was slightly closer but he also attempted his shot over a taller defender—Tyreke Evans—who is three inches taller than Curry...
Evans was one defender, Anthony Davis was the other. Maybe both guys fouled. The 'defender distance' is shown as 4.6 -- shouldn't it be closer to zero?

A defender might be right on your hip, but if he isn't facing you, he isn't really defending.
Maybe the relevant 'defender distance' would be how close he came to blocking the shot? That incorporates the skill of the defender, his size, hops, and everything else.

Re: Shot Difficulty metric

Posted: Wed May 13, 2015 10:12 pm
by knarsu3
Mike G wrote:
...Curry’s shot is fairly similar to LeBron’s. Both shots were on the road with the closest defender distance about the same. Although, one difference is that Curry probably did get fouled. But in terms of difficulty, they were fairly similar. Curry’s shot was slightly closer but he also attempted his shot over a taller defender—Tyreke Evans—who is three inches taller than Curry...
Evans was one defender, Anthony Davis was the other. Maybe both guys fouled. The 'defender distance' is shown as 4.6 -- shouldn't it be closer to zero?

A defender might be right on your hip, but if he isn't facing you, he isn't really defending.
Maybe the relevant 'defender distance' would be how close he came to blocking the shot? That incorporates the skill of the defender, his size, hops, and everything else.
I missed pointing this out in the article but I noticed something similar for Pierce's shot. The defender distance comes from the SportVU shot logs found here (as does everything else): http://stats.nba.com/player/#!/201939/t ... shotslogs/

They listed Evans as the closest defender at 4.6 feet away but I agree with you that it seems closer. This is obviously one of the issues with the data: even though SportVU measures defender distance, it can't measure the vertical (up/down) dimension, so I think some shots are probably more difficult than their defender distance might indicate. Unfortunately, I don't really have an easy answer for improving that aspect of the model. I tried including height differential to help, and I think it does, but even so it's not perfect.

This could be interesting to look at with the Vantage data because they track hand up/hand down, but an issue with their data is that there are no exact distances; it's binned instead (contested, pressured, etc.). As I mentioned in the article, if we had the location data of multiple defenders, that would definitely help the model. But unfortunately, for now, those will be some of the limitations.

Re: Shot Difficulty metric

Posted: Thu May 14, 2015 12:19 am
by italia13calcio
Did you check for any interaction terms? It seems like they would be likely to pop up.

Re: Shot Difficulty metric

Posted: Thu May 14, 2015 4:16 pm
by NateTG
Whenever I run logistic regressions, I want to try ln(x) as an independent variable for anything strictly positive like shot distance.

Re: Shot Difficulty metric

Posted: Fri May 15, 2015 3:45 pm
by knarsu3
italia13calcio wrote:Did you check for any interaction terms? It seems like they would be likely to pop up.
Like what specifically? Hadn't really thought of any that would make sense.

Re: Shot Difficulty metric

Posted: Fri May 15, 2015 3:48 pm
by knarsu3
NateTG wrote:Whenever I run logistic regressions, I want to try ln(x) as an independent variable for anything strictly positive like shot distance.
Thanks, yeah, I tried ln(defender distance) plus shot distance squared and cubed. These actually do help. But I'm noticing a big problem with shots near the basket, so I may treat those as a separate model.
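A rough sketch of those transformed predictors, for anyone following along. The function name and the small `floor` on defender distance are assumptions added here so that a recorded distance of 0 does not break the logarithm; the thread does not say how zeros were handled.

```python
import math


def transform_features(shot_dist, def_dist, floor=0.1):
    """Build log and polynomial terms for one shot.

    `floor` is an assumed guard against ln(0); it is not from the thread.
    """
    return {
        "shot_dist": shot_dist,
        "shot_dist_sq": shot_dist ** 2,   # squared term
        "shot_dist_cu": shot_dist ** 3,   # cubed term
        "ln_def_dist": math.log(max(def_dist, floor)),
    }
```

These columns would then go into the regression in place of (or alongside) the raw distances.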

Re: Shot Difficulty metric

Posted: Fri May 15, 2015 5:13 pm
by DSMok1
Seems like motion of the shooter should be very significant for longer shots. Do you have any way to account for this? Is a player fading away or standing still? This would change somewhat near the basket.

Re: Shot Difficulty metric

Posted: Fri May 15, 2015 10:55 pm
by italia13calcio
knarsu3 wrote:
italia13calcio wrote:Did you check for any interaction terms? It seems like they would be likely to pop up.
Like what specifically? Hadn't really thought of any that would make sense.
Shot Distance and Distance from defender is what initially popped into mind. I've played around with the data in the past and noticed that distance from defender matters a lot more near the rim. Might help with your problem with shots near the rim. I've done a bit of work with the same dataset in the past, would be happy to talk over stuff.
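To make the interaction idea concrete, here is a small sketch of a linear predictor with a shot distance x defender distance term. The coefficients in `B` are invented for illustration; the negative interaction coefficient is simply one way to encode "defender distance matters more near the rim".

```python
def logit_with_interaction(shot_dist, def_dist, b):
    """Linear predictor (log-odds) including a shot_dist * def_dist interaction."""
    return (b["intercept"]
            + b["shot_dist"] * shot_dist
            + b["def_dist"] * def_dist
            + b["interaction"] * shot_dist * def_dist)


def def_dist_effect(shot_dist, b):
    """Change in log-odds per extra foot of defender distance at a given shot distance."""
    return b["def_dist"] + b["interaction"] * shot_dist


# Hypothetical coefficients, for illustration only.
B = {"intercept": 0.5, "shot_dist": -0.05, "def_dist": 0.15, "interaction": -0.004}
```

With these made-up values, an extra foot of space is worth more log-odds at 2 feet from the basket than at 20 feet, which is the pattern described above.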

Re: Shot Difficulty metric

Posted: Sat May 16, 2015 5:42 pm
by AcrossTheCourt
I really don't think you should pool layups/dunks together with long jump shots. It seems that they're fundamentally different. Think about this: what does a high number of dribbles suggest for a shot inside? And outside?

I've done work on this before, and I think there's a better form:
http://analyticsgame.com/nba/stat-explo ... ntage.html

I use it for individual players too, calculating their "expected" open FG% based on distance.

I'm a bit wary of using shot clock time and dribbles and other factors. The shot clock doesn't physically guard him. It's really only forcing you into a tougher shot, which you should discern from other variables. We don't have every variable, so it's picking up other things like who's taking the shot and where the second closest defender is, etc. And are you really using a linear term for dribbles in the equation? (I know the equation itself isn't linear.) What's the effect? Can you graph the FG% change based on every relevant variable? Like the elasticity. Just show it over the possible range -- 0 dribbles to 10 or more.
italia13calcio wrote:
knarsu3 wrote:
italia13calcio wrote:Did you check for any interaction terms? It seems like they would be likely to pop up.
Like what specifically? Hadn't really thought of any that would make sense.
Shot Distance and Distance from defender is what initially popped into mind. I've played around with the data in the past and noticed that distance from defender matters a lot more near the rim. Might help with your problem with shots near the rim. I've done a bit of work with the same dataset in the past, would be happy to talk over stuff.
In that case, it makes more sense to do a model for shots near the rim and shots away from the basket, not a messy interaction term.

Re: Shot Difficulty metric

Posted: Sat May 16, 2015 5:46 pm
by AcrossTheCourt
Touch time might be important for catch-and-shoot shots (no dribbles.) Taking a shot right away in rhythm is different than holding for five seconds.

Re: Shot Difficulty metric

Posted: Sat May 16, 2015 6:28 pm
by italia13calcio
AcrossTheCourt wrote:I've done work on this before, and I think there's a better form:
http://analyticsgame.com/nba/stat-explo ... ntage.html

....

In that case, it makes more sense to do a model for shots near the rim and shots away from the basket, not a messy interaction term.
Pretty cool stuff, hadn't seen that before. Makes sense too. And I agree, separate models are probably better, but if you are using just one model, like he initially was, I would think an interaction term would probably be good.

Re: Shot Difficulty metric

Posted: Sat May 16, 2015 10:38 pm
by knarsu3
DSMok1 wrote:Seems like motion of the shooter should be very significant for longer shots. Do you have any way to account for this? Is a player fading away or standing still? This would change some near the basket.
I didn't include this in there but that would basically be shot type, right? I.e., fadeaway jump shot as opposed to jump shot. I'm sure this would be significant, but part of the reason I chose not to include it is that it's not listed in the SportVU shot logs, making it harder to reproduce immediately after a game. The other issue is that I do wonder how accurate the shot type data is in the play-by-play (pbp), but I'm going to try that.

Re: Shot Difficulty metric

Posted: Tue May 19, 2015 4:03 am
by knarsu3
AcrossTheCourt wrote:I really don't think you should pool layups/dunks together with long jump shots. It seems that they're fundamentally different. Think about this: what does a high number of dribbles suggest for a shot inside? And outside?
I'm actually separating them now. I've split it with >5 feet and <5 feet.
AcrossTheCourt wrote:I've done work on this before, and I think there's a better form:
http://analyticsgame.com/nba/stat-explo ... ntage.html

I use it for individual players too, calculating their "expected" open FG% based on distance.
This is pretty great. I'll have more to comment on this in a bit.
AcrossTheCourt wrote: I'm a bit wary of using shot clock time and dribbles and other factors. The shot clock doesn't physically guard him. It's really only forcing you into a tougher shot, which you should discern from other variables. We don't have every variable, so it's picking up other things like who's taking the shot and where the second closest defender is, etc. And are you really using a linear term for dribbles in the equation? (I know the equation itself isn't linear.) What's the effect? Can you graph the FG% change based on every relevant variable? Like the elasticity. Just show it over the possible range -- 0 dribbles to 10 or more.
But shots later in the shot clock are more difficult, regardless of shooter (found that in another study I did earlier). I realize the shot clock and the number of dribbles aren't guarding the shooter but at the same time, it does make the shot more difficult. This is what I was going for with the metric.

I was using a linear term for dribbles, but I've scrapped that model. It doesn't make sense. Here's the new model:
[Image: new model]

Will now be looking/graphing the different covariates with FG% over the range.
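A minimal sketch of the two-model split at 5 feet. The two predictor functions and their coefficients are placeholders invented for this example, not the fitted models, and sending a shot of exactly 5 feet to the long-shot model is an assumption, since the post only says ">5 feet and <5 feet".

```python
import math

RIM_CUTOFF_FT = 5.0  # split point mentioned in the post


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_near_rim(def_dist):
    """Placeholder model for shots inside the cutoff (made-up coefficients)."""
    return _sigmoid(0.1 + 0.20 * min(def_dist, 10.0))


def predict_long(shot_dist, def_dist):
    """Placeholder model for shots outside the cutoff (made-up coefficients)."""
    return _sigmoid(-0.2 - 0.03 * shot_dist + 0.06 * min(def_dist, 10.0))


def shot_difficulty_split(shot_dist, def_dist):
    """Route each shot to the model fitted on its distance range."""
    if shot_dist < RIM_CUTOFF_FT:
        return predict_near_rim(def_dist)
    # Assumption: a shot at exactly 5 feet goes to the long-shot model.
    return predict_long(shot_dist, def_dist)
```

Each model can then have its own functional form, so a variable like dribbles is free to mean something different inside and outside.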

Re: Shot Difficulty metric

Posted: Tue May 19, 2015 4:53 am
by knarsu3
Here's what I have for the dribbles covariate vs. Avg Shot Difficulty when holding all other variables at their mean (in my current model, where I'm using ln(dribbles+1)):

[Image: Avg Shot Difficulty vs. number of dribbles]

Here it is compared to actual FG% at each # of dribbles (again, only for long shots, defined as >5 feet):
[Image: Shot Difficulty vs. actual FG% by number of dribbles]

It does actually look like it could be a linear fit but a logarithmic fit seems to work too (and makes more intuitive sense given the difference between 0 and 1 dribbles).
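The curve described above (predicted Shot Difficulty across dribble counts with everything else held at its mean) could be generated along these lines. `base_logit` stands in for the combined contribution of all other variables at their means, and both numbers passed in the example are invented, not taken from the fitted model.

```python
import math


def difficulty_by_dribbles(base_logit, beta_ln_dribbles, max_dribbles=10):
    """Predicted make probability for 0..max_dribbles dribbles.

    base_logit: log-odds from all other variables held at their means (assumed input).
    beta_ln_dribbles: coefficient on ln(dribbles + 1).
    """
    curve = []
    for dribbles in range(max_dribbles + 1):
        z = base_logit + beta_ln_dribbles * math.log(dribbles + 1)
        curve.append((dribbles, 1.0 / (1.0 + math.exp(-z))))
    return curve


# Hypothetical values, for illustration only.
CURVE = difficulty_by_dribbles(base_logit=-0.3, beta_ln_dribbles=-0.15)
```

With a log term, the drop from 0 to 1 dribbles is larger than the drop from 9 to 10, which is the intuition for preferring it over a linear term.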