putting some math to the problem of shot selection

Crow · Post by **Crow** » Sun Mar 04, 2012 4:56 pm

Over the course of an entire season, if you shifted 20-40% of the shots that came between 15-20 seconds so that half came between 5-10 seconds instead and half between 10-15 seconds, what would be your expected gain based on average efficiencies? How many wins, or how large a fraction of an expected win, would that represent? How do the marginal efficiencies compare to the average ones? How useful would team-level data be for these questions?

EvanZ · Post by **EvanZ** » Sun Mar 04, 2012 5:14 pm

Crow, shifting to earlier jump shots would mean giving up later inside shots which are more valuable. There's clearly some optimum to be found between waiting for a good inside shot and taking an earlier jumpshot. As I suspected, these data should greatly aid finding that.

mcoughlin · Post by **mcoughlin** » Sun Mar 04, 2012 6:22 pm

Hi,

Sadly, the data I have do not specify the shot location where a shooting foul occurs, so I am unable to account for that differential.

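Efficiency for shots by distance (rows, in feet: 0-4, 4-15, 15-22, 22-50) and elapsed shot-clock interval (columns, in seconds: 0-5, 5-10, 10-15, 15-20, 20-24):

| Distance (ft) | 0-5 s | 5-10 s | 10-15 s | 15-20 s | 20-24 s |
|---|---|---|---|---|---|
| 0-4 | 0.6518 | 0.6166 | 0.5892 | 0.5632 | 0.5266 |
| 4-15 | 0.4460 | 0.4153 | 0.4091 | 0.3999 | 0.3785 |
| 15-22 | 0.4283 | 0.4294 | 0.4124 | 0.4072 | 0.3793 |
| 22-50 | 0.3398 | 0.3764 | 0.3818 | 0.3774 | 0.3498 |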

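Number of shots per game for the same intervals:

| Distance (ft) | 0-5 s | 5-10 s | 10-15 s | 15-20 s | 20-24 s |
|---|---|---|---|---|---|
| 0-4 | 20.1455 | 12.7363 | 12.1705 | 9.2510 | 4.5185 |
| 4-15 | 3.3302 | 6.0803 | 10.1781 | 10.0724 | 6.1288 |
| 15-22 | 3.4463 | 8.5774 | 13.2599 | 11.8291 | 7.1106 |
| 22-50 | 5.5456 | 10.4075 | 11.4550 | 10.4033 | 7.1519 |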

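Long 2s (15-22 ft) taken at 5-10 s and at 10-15 s elapsed show an increase in efficiency over those taken at 15-20 s of 0.0223 and 0.0052, respectively.

Moving ALL of the 15-20 s long 2s into one of these intervals yields a net increase of:
0.5276 points per game if moved to 5-10 s (2 × 0.0223 × 11.8291)
0.1230 points per game if moved to 10-15 s (2 × 0.0052 × 11.8291)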
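Crow's original question (shifting only 20-40% of the 15-20 s long 2s, half to 5-10 s and half to 10-15 s) can be answered with the same arithmetic. A minimal sketch, assuming, as the calculation above does, that every moved shot converts at the *average* FG% of its new interval (so marginal efficiency equals average efficiency). The differences are taken from the rounded table values, so they come out slightly different from the 0.0223 quoted above.

```python
# Long-2 (15-22 ft) FG% by elapsed shot-clock interval, from the efficiency table above.
LONG2_FG = {"0-5": 0.4283, "5-10": 0.4294, "10-15": 0.4124, "15-20": 0.4072, "20-24": 0.3793}
# Long-2 attempts per game taken at 15-20 s elapsed, from the shots-per-game table.
ATTEMPTS_15_20 = 11.8291

def shift_gain(fraction, split=0.5):
    """Points per game gained by moving `fraction` of the 15-20 s long 2s earlier:
    a share `split` of them to 5-10 s and the rest to 10-15 s.
    Assumes each moved shot converts at the AVERAGE FG% of its new interval,
    i.e. that marginal efficiency equals average efficiency."""
    moved = fraction * ATTEMPTS_15_20
    gain_per_shot = (split * (LONG2_FG["5-10"] - LONG2_FG["15-20"])
                     + (1 - split) * (LONG2_FG["10-15"] - LONG2_FG["15-20"]))
    return 2 * moved * gain_per_shot  # each made long 2 is worth 2 points

for f in (0.2, 0.3, 0.4):
    print(f"shift {f:.0%} of them: {shift_gain(f):.3f} points per game")
```

Under that assumption, shifting 20-40% of those shots comes to roughly 0.06 to 0.13 points per game. Any gap between marginal and average efficiency (for example, shots that are only available late in the clock) would change these figures.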

Is this what you meant? Note, I did this quickly and there may be mistakes in my code.

Thank you,
Michael

Crow · Post by **Crow** » Sun Mar 04, 2012 7:06 pm

EvanZ wrote: Crow, shifting to earlier jump shots would mean giving up later inside shots which are more valuable. There's clearly some optimum to be found between waiting for a good inside shot and taking an earlier jumpshot. As I suspected, these data should greatly aid finding that.

Good point, if you were trading the chance at an inside shot for a choice to take an earlier outside jumper. But to what extent can you look for, or decide to take, an inside shot, even a somewhat forced one? Degree of contest needs to be a part of the analysis. I happen to think guards forgo inside passes / inside "looks" to their big men more than they should (in favor of taking lots of time looking for their own shot or taking it, or giving it to another perimeter player to do the same). But there is heightened turnover risk on interior passes, so I don't know how far from optimal their decision-making is. I don't know whether they are too risk-averse about turnovers in addition to being shot-happy themselves, but they might be.

EvanZ · Post by **EvanZ** » Sun Mar 04, 2012 7:10 pm

I'm just making the point that once you decide to take a shot, you clearly give up the opportunity to have another one. If you told me that 2-pt shots taken early on were more efficient than inside shots taken later, or even much more efficient than 2-pt shots taken later, I'd agree that teams might want to take those shots earlier.

I think these data suggest why they don't do it.

Crow · Post by **Crow** » Sun Mar 04, 2012 7:13 pm

Thanks for the data Michael.

Based on what you report, shifting all 15-20 second jumpers yields only a small fraction of a point. Shifting 20-40% of them would yield even less. And paying any price in lost opportunities for inside shots or foul shots on those possessions would make this shift really unwise.

Crow · Post by **Crow** » Sun Mar 04, 2012 7:36 pm

The 82games study makes clear there is a big difference between low-paint and high-paint shots, and between baseline 2s and wing 2s. You have to be really close to the rim or you have not gained on immediate offensive efficiency, and all 3-point zones are better choices than shots from fairly close but not at the rim.

Based on what was reported at Sloan about offensive rebounding, you don't gain anything there with 5-10 foot jumpers either; in fact, you get fewer offensive rebounds / 2nd chances with mid-range shots than with point-blank shots or 3s.

A general preference for uncontested or lightly contested 3-pt attempts over any 2-pt jumper except a wide-open mid-ranger seems wise.

Michael's data show 3-pt shots per game pretty evenly distributed across the 5-10, 10-15, and 15-20 second intervals. It would be a leap to say that the level of opportunity is the same, but it may be roughly true. The FG% results are also pretty close.

The top-level data suggest you should do everything you can to get the shot off before about 17 seconds have elapsed, and it might be better to be done a bit sooner. Accounting for the prospects of an inside shot or foul, and the better chances for offensive rebounds, however, might push things back, so that you might be willing to go down to 7 or even 5 seconds left before you just throw up anything.
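The zone comparison above can be made explicit by converting the FG% table into points per attempt (FG% times 2 or 3). A sketch that assumes, as a simplification, that every 22-50 ft shot is a 3-pointer and every shorter shot a 2 (not exactly true: above the break, shots from 22 to about 23.75 ft are still 2s). Free throws and offensive rebounds are ignored.

```python
# FG% by shot distance (rows, ft) and elapsed shot-clock interval (columns), from Michael's table.
INTERVALS = ["0-5", "5-10", "10-15", "15-20", "20-24"]
FG_PCT = {
    "0-4":   [0.6518, 0.6166, 0.5892, 0.5632, 0.5266],
    "4-15":  [0.4460, 0.4153, 0.4091, 0.3999, 0.3785],
    "15-22": [0.4283, 0.4294, 0.4124, 0.4072, 0.3793],
    "22-50": [0.3398, 0.3764, 0.3818, 0.3774, 0.3498],
}
# Simplifying assumption: every 22-50 ft shot is a 3, every shorter shot is a 2.
SHOT_VALUE = {"0-4": 2, "4-15": 2, "15-22": 2, "22-50": 3}

def points_per_attempt(fg_pct, shot_value):
    """Points per field-goal attempt, ignoring free throws and offensive rebounds."""
    return {band: [shot_value[band] * p for p in row] for band, row in fg_pct.items()}

ppa = points_per_attempt(FG_PCT, SHOT_VALUE)
print("band   " + "  ".join(f"{i:>6}" for i in INTERVALS))
for band, row in ppa.items():
    print(f"{band:<6} " + "  ".join(f"{v:6.3f}" for v in row))
```

With the table values, 3-point attempts at 5-20 s come out around 1.13 to 1.15 points per attempt, versus roughly 0.81 to 0.86 for long 2s over the same span.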

schtevie · Post by **schtevie** » Thu Mar 08, 2012 3:53 pm

Michael, thank you for the work and the lovely plots. I am curious how this influences people's opinions on the emperor's new clothes (same as the old clothes).

mcoughlin wrote: [quoted plots by shot distance over the shot clock; images not reproduced]

First a couple points about the data, then its interpretation, regarding optimal shot selection.

To directly address the question of whether a shot should be taken or the offense continued, the total points realized (in expectation) needs to be defined. The plot above shows points per shot attempt for various distances. It does not include points from free throws or those expected after an offensive rebound. For present purposes, the effect of these factors on the plots can be described with reasonable accuracy.

Taking data from the 82games shot location study Crow referenced, I come up with a basic estimate of the additional FT/FGA for "In the Paint", "2 pt-Jumpers", and "Threes" of approximately 0.34, 0.10, and 0.05, respectively. Adding such factors (and multiplying the efficiency of 3-pointers by 1.5), what the plots would show (assuming the foul "premium" is constant over the shot clock) is an incremental efficiency gain equivalent to 0.58 points per shot for "0 to 4 footers" over other 2s and 0.28 points per shot for 3-pointers. And this is just the premium from including foul shooting.
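A sketch of how a free-throw "premium" can be folded into points per FGA: made-basket points plus extra FT attempts per FGA times FT%. The FT% of 0.75 is an assumption (it is not given in the thread), and mapping the 0-4 ft band to "In the Paint" and 15-22 ft to "2 pt-Jumpers" is a simplification, since the 82games categories are not these distance bands. This does not attempt to reproduce the 0.58 and 0.28 figures above, whose exact inputs are not stated.

```python
# Assumption: league-average FT% of 0.75 (not stated in the thread).
FT_PCT = 0.75

# Approximate *additional* FT attempts per FGA from the 82games shot-location study, as quoted above.
EXTRA_FT_PER_FGA = {"paint": 0.34, "two_pt_jumper": 0.10, "three": 0.05}

def points_per_fga_with_fouls(fg_pct, shot_value, extra_ft_per_fga, ft_pct=FT_PCT):
    """Expected points per FGA: made-basket points plus the free-throw premium."""
    return shot_value * fg_pct + extra_ft_per_fga * ft_pct

# Illustration using the 10-15 s column of Michael's FG% table.
# Simplification: 0-4 ft -> "paint", 15-22 ft -> "two_pt_jumper", 22-50 ft -> "three".
examples = {
    "0-4 ft":   points_per_fga_with_fouls(0.5892, 2, EXTRA_FT_PER_FGA["paint"]),
    "15-22 ft": points_per_fga_with_fouls(0.4124, 2, EXTRA_FT_PER_FGA["two_pt_jumper"]),
    "3-pt":     points_per_fga_with_fouls(0.3818, 3, EXTRA_FT_PER_FGA["three"]),
}
for label, pts in examples.items():
    print(f"{label:<9} {pts:.3f}")
```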

We also know that taking into account the points from expected continuing possessions would create a larger wedge still. Offensive rebounds are more likely to follow missed 3 pointers than missed 2 point jumpers. And we know that, though the offensive rebounding difference is a bit of a wash for missed close shots vs. 2 point jumpers, the additional points for put back opportunities make the continuing possession more valuable in the case of a missed shot from close in.

The bottom line is that the picture we should have in mind is a blue line sitting relatively higher above the "braided" red and green lines than it does now, and a turquoise line splitting the difference about midway.

The question then is whether such a reality can be said to represent (approximate) optimal shot selection. The answer, of course, depends critically on the likelihood of turnovers. I think what would speak decisively to the point would be data on the probability of a future turnover for each second of shot clock time.

Pending this, there is a specific, incontrovertible statement that can be made given what is shown (again, after taking into account the expected points from foul shots and continuing possessions). The red/green "braid" defines the current shot threshold for the average NBA offense. NBA offenses take these "worst" shots early and often. And if NBA offenses approximate optimal shot selection, this implies that a better-quality shot (and its consequences) cannot be expected by continuing the offense.

My conjecture is that this ain't so and that the counterfactual gains are large from a competitive standpoint. And hopefully someone will be inspired to present the turnover data to speak directly to the point.

EvanZ · Post by **EvanZ** » Thu Mar 08, 2012 5:49 pm

schtevie wrote: We also know that taking into account the points from expected continuing possessions would create a larger wedge still. Offensive rebounds are more likely to follow missed 3 pointers than missed 2 point jumpers. And we know that, though the offensive rebounding difference is a bit of a wash for missed close shots vs. 2 point jumpers, the additional points for put back opportunities make the continuing possession more valuable in the case of a missed shot from close in.

On the offensive rebounding point, I tried to roughly replicate some of the findings from that Sloan paper using PBP data, by regressing OREB% on indicators for different shot types.

```
Call:
lm(formula = ORB ~ HOME + DUNK + LAYUP + HOOK + MID + LONG + 
    BLOCK:DUNK + BLOCK:LAYUP + BLOCK:HOOK + BLOCK:MID, data = reb_locR, 
    weights = OPP)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.2381 -0.3629 -0.1728  0.3224  1.8020 

Coefficients: (1 not defined because of singularities)
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.272021   0.002251 120.835  < 2e-16 ***
HOME         0.007217   0.002167   3.330 0.000870 ***
DUNK         0.088506   0.013321   6.644 3.07e-11 ***
LAYUP        0.080680   0.003305  24.411  < 2e-16 ***
HOOK        -0.022137   0.006641  -3.333 0.000859 ***
MID         -0.016206   0.001226 -13.216  < 2e-16 ***
LONG               NA         NA      NA       NA    
DUNK:BLOCK  -0.004805   0.019967  -0.241 0.809819    
LAYUP:BLOCK  0.030273   0.004930   6.141 8.26e-10 ***
HOOK:BLOCK   0.093669   0.016619   5.636 1.75e-08 ***
MID:BLOCK    0.005074   0.001520   3.339 0.000842 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 0.4582 on 63237 degrees of freedom
Multiple R-squared: 0.02309,  Adjusted R-squared: 0.02295 
F-statistic: 166.1 on 9 and 63237 DF,  p-value: < 2.2e-16
```

You can read these as percentages. LONG (3-pt) says NA, but that's really the intercept (27%). A missed dunk has about a 9-percentage-point higher chance of being rebounded by the offense (so 36%). Layup is about 8 points higher (35%). A hook shot is about 2 points lower (25%), which is somewhat surprising; maybe it's because the defender is already basically blocking out the shooter and need only turn around to grab the rebound. A mid-range shot is 1.6 points lower (26%). And then you can see the results after a block, which actually increase the chance on layups (38%) and hooks (34%).
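Because this is a linear probability model, each implied rate is just the intercept plus the relevant coefficient(s). A sketch that does that arithmetic from the output above, for an away-team miss (HOME = 0). LONG is the baseline absorbed into the intercept, and the model has no blocked-3 term.

```python
# Coefficients from the weighted linear regression of ORB on shot-type indicators (output above).
COEF = {
    "(Intercept)": 0.272021, "HOME": 0.007217,
    "DUNK": 0.088506, "LAYUP": 0.080680, "HOOK": -0.022137, "MID": -0.016206,
    "DUNK:BLOCK": -0.004805, "LAYUP:BLOCK": 0.030273,
    "HOOK:BLOCK": 0.093669, "MID:BLOCK": 0.005074,
}

def implied_oreb_rate(shot, blocked=False, home=False):
    """Predicted offensive-rebound probability after a miss of the given shot type.
    LONG (3-pt) is the baseline folded into the intercept; there is no LONG:BLOCK term."""
    rate = COEF["(Intercept)"] + (COEF["HOME"] if home else 0.0)
    if shot != "LONG":
        rate += COEF[shot]
    if blocked:
        if shot == "LONG":
            raise ValueError("the model has no blocked-3 term")
        rate += COEF[f"{shot}:BLOCK"]
    return rate

for shot in ("DUNK", "LAYUP", "HOOK", "MID", "LONG"):
    line = f"{shot:<6} {implied_oreb_rate(shot):.1%}"
    if shot != "LONG":
        line += f"  blocked: {implied_oreb_rate(shot, blocked=True):.1%}"
    print(line)
```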

These are consistent with the old data that 82games published from their game charting studies. The Sloan rebounding paper using optical tracking gave the following %'s:

| Shot distance | OREB% |
|---|---|
| < 6 ft | 36% |
| 6-10 ft | 28% |
| 10-22 ft | 21.5% |
| > 23 ft | 25.5% |

Again, roughly consistent with the regression. In PBP, "mid-range" is any 2-pt jumper, which is why it falls between those two ranges shown there. Just thought this might be interesting. I should point out that a more direct way of doing this is to keep track of the state of every single play and count the rebounds/shot types directly, but my code is not exactly set up to do that right now, so the regression is the best approximation I can give currently.
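The direct tally mentioned above can be sketched as follows. The input format, one `(shot_type, offense_rebounded)` pair per missed FGA, is hypothetical; producing it from PBP would require pairing each miss with the rebound that follows (and deciding how to treat team rebounds and misses that end in fouls), which is not shown here.

```python
from collections import Counter

def oreb_rate_by_type(missed_shots):
    """Share of missed shots of each type that the offense rebounded.
    missed_shots: iterable of (shot_type, offense_rebounded) pairs, one per missed FGA
    (a hypothetical, already-paired input format)."""
    misses = Counter()
    orebs = Counter()
    for shot_type, offense_rebounded in missed_shots:
        misses[shot_type] += 1
        if offense_rebounded:
            orebs[shot_type] += 1
    return {t: orebs[t] / misses[t] for t in misses}

# Toy data for illustration only.
sample = [("LAYUP", True), ("LAYUP", False), ("MID", False),
          ("MID", False), ("MID", True), ("LONG", False)]
print(oreb_rate_by_type(sample))
```

Unlike the regression, this counts every miss equally and needs no weighting, at the cost of having to track game state play by play.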

schtevie · Post by **schtevie** » Thu Mar 08, 2012 8:13 pm

Evan, thanks for the results. My guess is that no one familiar with these general issues is prepared to disagree with the notion that foul shooting and continuing possessions on net augment the returns for close-in shots and three pointers relative to shots in between.

That said, it occurs to me that there is a particular observation about Michael's presentation that I didn't make and that should be made. What is novel about it - as regards the question of shot selection - is the establishment of the relatively shallow slope one sees for medium/long 2s throughout the shot clock (not just for half court offenses). So, basically, the quality of the medium/long 2 you get at 10 seconds, on average, is approximately what you get at 20 seconds (a 0.03 to 0.04 difference, not nothing, but not that big of a deal given the alternatives available).

Why this matters for the larger question at hand: it perhaps eliminates (pending data on turnover probabilities) the strongest objection to my argument that the aggregate data on the efficiency of long 2s very strongly suggest that NBA offenses don't optimize their shot selection (again, on average). The steeper the slope (for a given average), the more likely it is that medium/long 2s taken early in the shot clock are optimal (given turnover probabilities).

And it could well be the case that the average data presented do pass this "weak" optimality test, this test being the lowest bar to clear. That is, taking a medium/long two of average efficiency 0.43 at 5 seconds might yield roughly the same number of points as waiting for a 3 or a close-in shot, given the possibility of a turnover and the slight decline in the expected quality of a medium/long 2. Hopefully, we'll see.
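The "weak" test can be written as a break-even condition. A sketch using only the 0.43 figure above and ignoring free throws and offensive rebounds on the long 2; $p_{TO}$ (probability of a turnover before the next look) and $V_{next}$ (expected points from that next look) are symbols introduced here for illustration. Shooting the long 2 now is the better choice when

$$2 \times 0.43 = 0.86 \;\ge\; (1 - p_{TO})\,V_{next} \quad\Longleftrightarrow\quad p_{TO} \;\ge\; 1 - \frac{0.86}{V_{next}}$$

For example, if the next look were worth a hypothetical $V_{next} = 1.0$ point, the long 2 would be the right call only when the turnover probability before that look is at least $1 - 0.86 = 0.14$.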

But to establish optimality, there are other hurdles that must be cleared. I already cited the "bad shooter" data, where about two thirds of medium/long 2s are taken by players whose yearly averages are worse than a last-second shot. Given such a high proportion, and the flatness of the distribution in question, these shots must be taken throughout the shot clock, and they will surely fall below the optimal threshold (in the sense that turnover probabilities cannot be nearly high enough for continuing the offense to fail to yield a higher return).

And if there is an objection to this point, one I can't quite formulate, on the grounds of sharing the ball, that everyone must get their shots because that's how basketball is played (or some such), there is a final point to be made: the point isn't to make a fetish of "bad shooters" but of "bad shots", with "bad shooters" just being a serviceable proxy.

But what about evidence for "bad shots"? It turns out that Ed Peterson provided some in his 82games article on Open/Contested Shots: http://www.82games.com/saccon.htm. Surprise, surprise: the article shows that taking an open shot generally yields a higher return than taking a contested one, that this is true across players, and one can infer from shot volumes that it is true across shot types.

So what's the bottom line?

Michael's data show an approximately flat line for medium/long 2s, representing a "stable" mix of bad and good shots, taken by bad shooters and good. So even if the current medium/long 2 average, interpreted as a threshold, implies approximate optimality (in the context of average turnover rates), it is surely the case that once these shots are divided into "bad" and "good", the former imply the opposite.
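The mixture argument can be illustrated numerically: if the average long 2 sits exactly at the threshold, the below-average "bad" component must sit below it. A sketch with hypothetical inputs: an average of 0.41, a "bad" share of two thirds (borrowing the bad-shooter proportion as a proxy), and a bad-shot FG% of 0.38; none of these is a measured value for this split.

```python
def implied_good_fg(avg_fg, bad_share, bad_fg):
    """Solve avg_fg = bad_share * bad_fg + (1 - bad_share) * good_fg for good_fg."""
    if not 0 <= bad_share < 1:
        raise ValueError("bad_share must be in [0, 1)")
    return (avg_fg - bad_share * bad_fg) / (1 - bad_share)

# Hypothetical inputs: the average long 2 is assumed to sit exactly at the threshold.
avg_fg = 0.41
bad_share = 2 / 3
bad_fg = 0.38
good_fg = implied_good_fg(avg_fg, bad_share, bad_fg)
print(f"good: {good_fg:.3f}  average/threshold: {avg_fg:.3f}  bad: {bad_fg:.3f}")
```

The average can pass the threshold test only because the good shots pull it up; the bad shots, taken on their own, fail it.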

EvanZ · Post by **EvanZ** » Thu Mar 08, 2012 9:06 pm

schtevie wrote:

That said, it occurs to me that there is a particular observation about Michael's presentation that I didn't make and that should be made. What is novel about it - as regards the question of shot selection - is the establishment of the relatively shallow slope one sees for medium/long 2s throughout the shot clock (not just for half court offenses). So, basically, the quality of the medium/long 2 you get at 10 seconds, on average, is approximately what you get at 20 seconds (a 0.03 to 0.04 difference, not nothing, but not that big of a deal given the alternatives available).

Recall my earlier post:

Right, and this is what I assume explains the PPS vs. time distribution. Worse shots (those nasty long 2-pters) are saved for "later", after other better options have been eliminated (due to defensive pressure, offensive ineptness, whatever). The shot clock time is not causative. It's an effect. Your mid-range jumpshot won't have significantly greater efficiency simply because you take it earlier.

In fact, that last assumption/hypothesis of mine would presumably be simple enough to test and relatively straightforward to interpret. Really, we just need to categorize each shot the way I do for the PSAMS metric (inside shots, mid-range, 3-pt, and foul shots) and plot the time distribution of each type. I think that would go a long way toward informing this discussion.
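The proposed test, the time distribution of each shot category, can be sketched as a simple binning. The `(category, elapsed_seconds)` input format is hypothetical, and the bins are the 5-second intervals used earlier in the thread; each category's row sums to 1, so categories with different volumes can be compared directly.

```python
from collections import defaultdict

# Elapsed shot-clock bins (seconds), matching the intervals used in the thread.
BINS = [(0, 5), (5, 10), (10, 15), (15, 20), (20, 24)]

def bin_index(t):
    """Index of the bin containing elapsed time t; the last bin also includes 24."""
    for i, (lo, hi) in enumerate(BINS):
        if lo <= t < hi or (hi == BINS[-1][1] and t == hi):
            return i
    raise ValueError(f"elapsed time out of range: {t}")

def time_distribution(shots):
    """shots: iterable of (category, elapsed_seconds) pairs (hypothetical format).
    Returns {category: [share of that category's attempts in each bin]}."""
    counts = defaultdict(lambda: [0] * len(BINS))
    for category, t in shots:
        counts[category][bin_index(t)] += 1
    return {cat: [n / sum(row) for n in row] for cat, row in counts.items()}

# Toy data for illustration only.
toy = [("inside", 2), ("inside", 3), ("inside", 18),
       ("mid", 12), ("mid", 17), ("three", 24)]
print(time_distribution(toy))
```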

That was a prediction I made. I assumed Michael was picking up on my suggestion, but maybe it was his own inspiration. And when he posted it, I also commented that it more or less matched my prediction, including the part about the slopes not being very significant.

schtevie · Post by **schtevie** » Fri Mar 09, 2012 1:39 am

Evan, I don't think it is right to say that "those nasty long 2-pters" are saved for "later". Their nastiness, as Michael's data show, shows up early and often and not after other better options have been eliminated. That is the basis of the argument that significant counterfactual gains exist.

That there is some slope to the results implies that shot clock time is, as one should expect, causative in a loose sense - players/coaches are conscious of worse vs. better and discipline themselves somewhat. The point is that optimality isn't close to being achieved. First and foremost, I would suggest, because the proper analytical frame has not been adopted (never mind the culture shock in implementation).

Finally, that mid-range (more to the point, long) jump shots won't have significantly greater efficiency simply because you take them earlier is a result that quite clearly drops out of the data from hoopdata. A suggestive fact: last season, among players who played over 41 games and averaged at least two attempts from that distance (an admittedly arbitrary criterion), the 10th-best long-2 shooter was KG, with a success rate of 47%. If 47% is elite performance, and so close to the mean, you know that efficiency can't be much related to shot clock time.

mcoughlin · Post by **mcoughlin** » Fri Mar 09, 2012 5:02 pm

Hi all,

I have spent some time revamping / editing my code, and I feel a little more confident in the output, but again, it's probably not perfect.

Please let me know if there are any questions / concerns about how the plots are produced. Please note that the points plot now shows points per attempt instead of simply number made / number attempted. I am more than happy to re-run something and / or try to generate something if I have the data for it.

Thank you,
Michael

schtevie · Post by **schtevie** » Fri Mar 09, 2012 8:39 pm

Michael, great stuff! And if you're taking requests, here are the data manipulations I would like to see. The goal is to define a shot threshold for each second of shot clock time and thereby address the issue of the possible optimality of NBA shot selection.

To define a threshold, one must begin by specifying one key value: the lag between a realized shot opportunity and the next one expected to become available. That is to say, the decision to shoot or pass (more generally, to continue with the offense) depends upon the expected productivity of the offense at the point of the next scoring opportunity, taking into account the probability of a turnover occurring between now and then.

I would specify two values, three seconds and four seconds, and repeat the entire exercise for each. This would correspond to a half-court offense having a maximum of either four or five "looks" (these taking place within the last 16 seconds of the shot clock).

With this specified, these are the calculations required:

(1) Realized Points Per Possession as a function of shot clock time, $RPPP_t$. This means, for each second of shot clock time, tracking the average number of points scored from that point until possession reverts to the opposing team.

(2) As suggested above, for each second of shot clock time, the probability of a turnover occurring before the next "look", $TO_{t+x}$, where $x = 3$ or $4$.

(3) Calculating the threshold as a function of time, $Th_t$. For each second of shot clock time (using a lag of either three or four seconds):

$$Th_t = (1 - TO_{t+x}) \cdot RPPP_{t+x}$$
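Steps (1)-(3) can be sketched as follows. Everything about the data is an assumption: the `Possession` record is a hypothetical format, each possession is treated as a single shot clock (offensive-rebound resets ignored) with all of its points credited at the second it ended, and the turnover term is read as the probability of a turnover between t and t + x. The code implements the threshold formula as written in (3); it does not check whether that decision rule is complete.

```python
import math
from dataclasses import dataclass

@dataclass
class Possession:
    """Hypothetical record: one possession reduced to how and when it ended.
    Simplification: one shot clock per possession, points credited at end_t."""
    end_t: int       # elapsed shot-clock second at which the possession ended
    points: int      # points scored on the possession
    turnover: bool   # True if it ended in a turnover

def alive_at(possessions, t):
    """Possessions still in progress at elapsed second t."""
    return [p for p in possessions if p.end_t >= t]

def rppp(possessions, t):
    """(1) Realized points per possession, over possessions alive at second t."""
    live = alive_at(possessions, t)
    return sum(p.points for p in live) / len(live) if live else math.nan

def turnover_prob(possessions, t, x):
    """(2) Share of possessions alive at t that end in a turnover before t + x."""
    live = alive_at(possessions, t)
    if not live:
        return math.nan
    return sum(p.turnover and p.end_t < t + x for p in live) / len(live)

def threshold(possessions, t, x):
    """(3) (1 - turnover probability) times RPPP at the next look, t + x."""
    return (1 - turnover_prob(possessions, t, x)) * rppp(possessions, t + x)

# Toy possessions for illustration only.
toy = [Possession(3, 2, False), Possession(8, 0, True), Possession(12, 3, False),
       Possession(20, 0, False), Possession(22, 2, False)]
print(threshold(toy, 6, 4))
```

With real data, a shot taken at second t whose expected points fall below `threshold(possessions, t, 3)` (or with x = 4) would be flagged as below the inferred threshold.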

I think that I have specified the decision rule correctly.

Anyway, if you can deliver this, Christmas comes early.

And should there be a problem calculating RPPP, in particular if you cannot track possessions to their end with your data, the next best, of course, would be an approximation: take total points within the shot clock, then separately tabulate offensive rebounds and multiply them by the appropriate average PPP that you have available. (This could be a weighted average by shot type and time, if you happen to have the relevant data. Not that I think it will matter that much.)
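The fallback approximation can be sketched for a single shot-clock second. The function name and all numbers in the example are hypothetical; `avg_ppp_after_oreb` stands for whatever average (possibly weighted by shot type and time) is available.

```python
def approx_rppp(points_within_clock, offensive_rebounds, possessions_alive, avg_ppp_after_oreb):
    """Approximate RPPP at one shot-clock second when possessions can't be followed
    past the current shot clock: points scored within the clock, plus each
    offensive rebound valued at an average points-per-possession figure."""
    if possessions_alive <= 0:
        raise ValueError("need at least one possession alive at this second")
    return (points_within_clock + offensive_rebounds * avg_ppp_after_oreb) / possessions_alive

# Hypothetical example: 1000 possessions alive, 850 points within the clock,
# 120 offensive rebounds each valued at 1.05 points.
print(approx_rppp(850, 120, 1000, 1.05))
```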

To reiterate what these data can demonstrate: they would provide a weak test of the existence of optimal shot selection in the NBA. By this I mean that there is no presumption that an approximation of optimal shot selection currently exists in the NBA; there is only the actual shot selection. If it turns out that a "significant" number of shots are taken per game whose expected scoring outcomes are "significantly" below the actual, inferred thresholds, then we can say, at a minimum, that NBA shot selection is suboptimal (subject to caveats, e.g. we would need to account for garbage-time factors; but this would be a good first cut).

mcoughlin · Post by **mcoughlin** » Fri Mar 09, 2012 9:15 pm

I can do so, but I don't quite have a handle on (1) yet. Can you write down an equation like you have for the other 2? Math speak helps me!

Thank you,
Michael

APBRmetrics
