putting some math to the problem of shot selection

Trepidation · Post by **Trepidation** » Fri Feb 17, 2012 10:18 pm

OK, Evan. I just reread your post, and I don't think I ever intended to contradict it. I think we just had a breakdown in communication.

gravityandlevity · Post by **gravityandlevity** » Sat Feb 18, 2012 2:33 am

Hi Trepidation,

Thanks for your comments. I'll do my best to reply to some of them, but I have to admit that I'm not entirely sure which part of the analysis you object to. If I don't answer your questions, please try to clarify for me.

Trepidation wrote:... I have a question at (22). Why did you choose to take a ratio of the integration of shot quality over the period of time chosen rather than solving for t when F is maximal? It would seem to me that the best basketball strategy would be to cherry pick the best point to take the shot rather than count up all the probabilities of success over the duration and average them out.

I should maybe clarify what F is in equation (22). F is the average points per possession scored by a team that follows the strategy I derived. This strategy is defined as: shoot the ball whenever you get a shot opportunity whose expected number of points exceeds the function f(t). Deriving the function f(t) was the point of the paper. As best I can tell, the function f(t) that best corresponds to NBA teams is shown on page 6 of this discussion.

In this sense, the question "which time t is optimal?" has no meaning. Shot opportunities are assumed to arise randomly throughout the shot clock, and any shot opportunity that arises at time t should be taken when its quality exceeds f(t).
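To make the threshold rule concrete, here is a minimal discrete-time sketch. It assumes, purely hypothetically, that in each remaining second a shot opportunity arises with probability `p_opportunity`, and that its expected value is uniform on [`lo`, `hi`] points regardless of the clock. Working backward from the buzzer, the threshold f(t) is simply the expected value of passing and continuing, and F is the expected value at the start of the possession. This is a simplified stand-in, not the paper's continuous-time derivation.

```python
def expected_max_uniform(c: float, lo: float, hi: float) -> float:
    """E[max(q, c)] for q ~ Uniform(lo, hi)."""
    if c <= lo:
        return (lo + hi) / 2
    if c >= hi:
        return c
    # With probability (c - lo)/(hi - lo) the opportunity is passed up and we keep c;
    # otherwise we shoot, and the mean of q given q > c is (c + hi)/2.
    p_below = (c - lo) / (hi - lo)
    return p_below * c + (1 - p_below) * (c + hi) / 2


def thresholds(seconds: int, p_opportunity: float, lo: float, hi: float):
    """Return (f, V): f[t] is the shooting threshold with t seconds left,
    V[t] is the expected points per possession with t seconds left."""
    V = [0.0] * (seconds + 1)  # V[0] = 0: a shot-clock violation scores nothing
    f = [0.0] * (seconds + 1)
    for t in range(1, seconds + 1):
        f[t] = V[t - 1]  # shoot now only if the shot beats the value of waiting
        V[t] = (p_opportunity * expected_max_uniform(f[t], lo, hi)
                + (1 - p_opportunity) * V[t - 1])
    return f, V


if __name__ == "__main__":
    # Hypothetical parameters chosen only for illustration.
    f, V = thresholds(seconds=24, p_opportunity=0.3, lo=0.0, hi=2.0)
    for t in (24, 16, 8, 4, 2, 1):
        print(f"{t:2d} s left: shoot if expected points > {f[t]:.3f}")
    print(f"F (points per possession at the start) = {V[24]:.3f}")
```

The threshold falls as the clock runs down because there is less time left for a better opportunity to appear; with one second left, any shot is better than a violation.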

Trepidation wrote:GravAndLev, this graph could be a result of the diminishing returns garnered by teams trying to improve an already below-average possession, meaning the data used in the later period of the shot clock are biased towards these already poor possessions.

I think what you're saying here is that the quality of shot opportunities can depend on shot clock time, with later shot clock times tending to correspond to worse shot opportunities. This may very well be true, and is outside my model, as discussed above. (My comment on page 5 addresses this in particular).

Trepidation wrote:It seems to me that you have the concept that shot quality is a function of time left on the clock, and decreases from left to right. This may be apparent and true statistically, but is not very useful analytically because this function is in fact a complex result. I can think of two major contributors to this function: possession quality, and the team's ability to improve on this quality over time.

I am not claiming that the quality of shot opportunities decreases with time. This is indeed a very hard thing to discern from data, and I am not willing to speculate about its relationship at the moment. I am only claiming that the quality of shots taken decreases with time. This is, as you say, a complex result, since it is a convolution of what players are choosing to do and what opportunities are arising. Coming up with a model for explaining it was the whole purpose of the paper. But the general decrease of points per shot with time is as it should be, as it implies that NBA teams are more selective with their shots earlier in the shot clock. My (tentative) claim in the paper is that players are too selective at early shot clock times.
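The distinction between the quality of shot *opportunities* and the quality of shots *taken* can be checked with a small simulation. This sketch assumes (hypothetically) the same toy model as a discrete-time stand-in for the paper: opportunities arrive with a fixed probability each second and are drawn from the same uniform distribution at every clock time. Only the optimal threshold changes with time, yet the average quality of the shots actually taken still declines as the clock runs down.

```python
import random


def thresholds(seconds, p_opportunity, lo, hi):
    """Backward induction: f[t] is the value of continuing with t seconds left."""
    V = [0.0] * (seconds + 1)
    f = [0.0] * (seconds + 1)
    for t in range(1, seconds + 1):
        c = f[t] = V[t - 1]
        # E[max(q, c)] for q ~ Uniform(lo, hi)
        if c <= lo:
            e_max = (lo + hi) / 2
        elif c >= hi:
            e_max = c
        else:
            e_max = ((c - lo) * c + (hi * hi - c * c) / 2) / (hi - lo)
        V[t] = p_opportunity * e_max + (1 - p_opportunity) * V[t - 1]
    return f


def simulate(n_possessions=50_000, seconds=24, p_opportunity=0.3,
             lo=0.0, hi=2.0, seed=1):
    """Average quality of shots taken, keyed by seconds left on the clock."""
    rng = random.Random(seed)
    f = thresholds(seconds, p_opportunity, lo, hi)
    totals = [0.0] * (seconds + 1)
    counts = [0] * (seconds + 1)
    for _ in range(n_possessions):
        for t in range(seconds, 0, -1):
            if rng.random() < p_opportunity:
                q = rng.uniform(lo, hi)  # same distribution at every t
                if q > f[t]:
                    totals[t] += q
                    counts[t] += 1
                    break
        # possessions that never shoot are violations and record nothing
    return {t: totals[t] / counts[t] for t in range(1, seconds + 1) if counts[t]}


if __name__ == "__main__":
    avg = simulate()
    for t in sorted(avg, reverse=True):
        print(f"{t:2d} s left: mean quality of shots taken = {avg[t]:.3f}")
```

Because the opportunity distribution is identical at every t, any decline in this output comes entirely from players being more selective early, which is the "convolution" described above.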

And again, you may be right about the quality of shot opportunities decreasing with time, but I can't see a reasonable way to take that into account without invoking (even more) arbitrary assumptions.

Trepidation · Post by **Trepidation** » Sat Feb 18, 2012 4:36 am

OK. I'll break down my logic part by part so you can better understand why I chose the words I did.

But first of all, I will try to illustrate where exactly I think we disagree on principles. When I say the data for shots taken in the later part of the shot clock are biased, I mean you are not using a simple random sample, which is essential for statistical analysis. The shots that occurred during this period of time were not selected from a fair polling of all possessions; they are biased towards those possessions that were never very good to begin with.

This, by the way, also creates a continuity in the data plot going from fast breaks to regular possessions. Just as the data are biased towards naturally poorer-quality shots in the later periods of the shot clock, the shots occurring early in the shot clock are selected from a pool of very good quality possessions.

If the shots taken in the early part of the shot clock are selected from a rich pool, and the shots taken later in the shot clock are taken from a poor pool, your graph will show that these early shots are more likely to succeed.

To state that this implies that shot quality diminishes as a function of time left on the shot clock is misleading, because the real dynamic is that shot quality partially depends on possession quality. Since good possessions tend to succeed and end early in the shot clock, and poor possessions tend to fail and end late in the shot clock, the data look the way they do in your graph.
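The selection effect described in the last few paragraphs can be illustrated with a toy model. Everything here is a hypothetical assumption, not something measured: each possession has a latent quality `base`; every second the offense gets a look of quality `base + growth * elapsed + noise`, so looks improve within a possession; the team shoots once a look beats a fixed bar `tau`, and is forced to shoot at the buzzer. Even though quality rises within every possession, late shots come disproportionately from poor possessions.

```python
import random
from statistics import mean


def simulate(n=50_000, seconds=24, growth=0.01, sigma=0.3, tau=1.0, seed=7):
    """Return (seconds_left, latent_quality, shot_quality) for every shot."""
    rng = random.Random(seed)
    shots = []
    for _ in range(n):
        base = rng.uniform(0.0, 1.0)  # latent possession quality (hypothetical)
        for elapsed, t_left in enumerate(range(seconds, 0, -1)):
            # within a possession, looks improve by `growth` per elapsed second
            q = base + growth * elapsed + rng.gauss(0.0, sigma)
            if q > tau or t_left == 1:  # good enough, or forced at the buzzer
                shots.append((t_left, base, q))
                break
    return shots


if __name__ == "__main__":
    shots = simulate()
    early = [s for s in shots if s[0] > 12]
    late = [s for s in shots if s[0] <= 12]
    print("early shots: latent %.3f, shot quality %.3f"
          % (mean(s[1] for s in early), mean(s[2] for s in early)))
    print("late shots:  latent %.3f, shot quality %.3f"
          % (mean(s[1] for s in late), mean(s[2] for s in late)))
```

In this setup the observed quality of shots declines with time only because of which possessions survive to late in the clock, which is the sampling bias being argued here.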

The silver lining is that the effect is partly offset. Shot quality as a function of time left on the shot clock is also affected in the opposite direction, because the offense can improve its shot quality by breaking down the defense.

If we can find some manner of predicting through statistics how well particular teams are able to raise their shot quality over the duration of the possession as a function of how much time is remaining on the shot clock, we could then use this data as a meter to gauge whether a player ought to take a shot opportunity now, or wait for a likely better shot later.

One quick question. Did you follow the example of the coin flip predictor? If you did, you will see how it also shows that in a system like this, wins generally occur earlier rather than later, peaking at the very first opportunity. However, it is also clear that if you interpret this to mean that one should call the coin earlier, you will lose more games than necessary.

The optimal strategy for the coin flip prediction game is beyond my ability to calculate, but I do know that one should not just take the first chance above 50% (if there are more than 2 flips), because there is a "chance of a chance" greater than this later. This "chance of a chance" is for illustrative purposes only, as I'm sure you are aware that statistically it reduces to a single compound chance. Graphing this compound chance as a function of how many flips remain is essential for judging when to accept an early opportunity, and it produces a lower acceptance threshold the fewer flips remain.
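The "chance of a chance" can be computed exactly for a simple stand-in game, which may differ from the coin flip predictor described earlier in the thread. Assume (hypothetically) that you are offered up to n chances in sequence, each with a win probability p drawn uniformly from [0, 1] and revealed before you decide; you may accept (and the game is settled at probability p) or pass, and you must accept the last offer. The value of passing is the expected win probability of playing on optimally, which gives the recursion v_n = (1 + v_{n-1}^2) / 2 with v_0 = 0.

```python
from fractions import Fraction


def acceptance_thresholds(max_offers: int):
    """threshold[n]: minimum win probability worth accepting with n offers left.
    value[n]: expected win probability with n offers left under optimal play."""
    value = [Fraction(0)]  # value[0]: no offers left (the last offer is forced)
    threshold = [None]     # threshold[0] is unused
    for n in range(1, max_offers + 1):
        c = value[n - 1]   # what passing on the current offer is worth
        threshold.append(c)
        # E[max(p, c)] for p ~ Uniform(0, 1) and 0 <= c <= 1
        value.append((1 + c * c) / 2)
    return threshold, value


if __name__ == "__main__":
    threshold, value = acceptance_thresholds(6)
    for n in range(1, 7):
        print(f"{n} offers left: accept if p > {float(threshold[n]):.4f}")
```

With two offers left the threshold is exactly 50%, but with three left it is already 62.5%, so taking the first chance above 50% is too hasty whenever more than two flips remain; the threshold drops as the remaining flips run out.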

This is an interesting simulation of shot-taking strategy in basketball, and it would be a great initial exploration. But a model in which subsequent flips depend on previous flips, and tend to improve in success chance with each iteration, would be almost identical to the principle I am trying to describe in a typical basketball possession.

Phew, that's enough. I know I wrote a lot here, but I think it was all necessary. Thanks to all readers!

Trepidation · Post by **Trepidation** » Sat Feb 18, 2012 5:18 am

Ahhh, thanks for the description of (22). That is a good strategy. But we need to work on which graph we are integrating over. I have some ideas, but we need to reach a consensus on the principles currently under discussion before introducing new ones.

schtevie · Post by **schtevie** » Sat Feb 18, 2012 2:25 pm

EvanZ wrote:
schtevie wrote:This example, of course, is contrived, but it illustrates an important point. What it says is that you want to organize an offense so as to give early looks to those players that have greater probability mass in the upper tail of the shot distribution (as long as they aren't sinks, of course, who won't move the ball along if the good shot opportunity doesn't obtain, or are more likely to turn the ball over, counteracting the potential scoring gains). All else equal, picking off these "upper tails", in rank order, is the key to optimizing on (3).
Right, and this is what I assume explains the PPS vs. time distribution. Worse shots (those nasty long 2-pters) are saved for "later", after other better options have been eliminated (due to defensive pressure, offensive ineptness, whatever). The shot clock time is not causative. It's an effect. Your mid-range jumpshot won't have significantly greater efficiency simply because you take it earlier.

(1) I don't think the word "saved" is precise. When time runs out you take what you can get. Perhaps there are shot types that disproportionately occur in shot clock "garbage time". It depends, I suppose, on whether the offense runs out of ideas toward the end of the shot clock. I think that was Brian's explanation as to why the "post-defensive rebound originating" half-court offense tailed off in the last few seconds (compared to "dead ball originating" offenses). But such a distinction is completely bizarre (and a whole different issue, really).
(2) As for shot clock time being an effect and not a cause of outcomes, I don't understand this framing. It certainly is a cause for shot selection.
(3) And as for the idea that mid-range jump shots won't show greater efficiency when taken earlier in the shot clock, I suspect that this is not true, and would be clearly established if the correct data were there to speak to the issue. There is a difference between a Ray Allen taking a 16 footer at a spot he wants on a planned catch and shoot with 15 seconds on the shot clock vs. a lesser light taking a 22 footer with a beast in his face, trying to beat the shot clock. And then there is the dog that didn't bark: the non-inclusion of relevant foul shooting. The data are biased in the sense that the better quality shot opportunities are disproportionately fouled.

EvanZ wrote:In fact, that last assumption/hypothesis of mine would presumably be simple enough to test and relatively straightforward to interpret. Really, we just need to categorize each shot the way I do for the PSAMS metric (inside shots, mid-range, 3-pt, and foul shots) and plot the time-distribution of each type. I think that would go a long way to informing this discussion.

This would surely better inform the discussion, in terms of what is taken when, but until the foul issue referred to is dealt with, inferences about optimality will be constrained.

The presentation of the data I would like to see would be points per possession as a function of time, inclusive of the effects of turnovers, foul shots, and subsequent offensive-rebound-originating possessions. This would give a more informed view of the opportunity costs of offensive decision-making with respect to shot clock time.
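A minimal sketch of the tabulation being asked for, assuming a hypothetical possession record (not the layout of any particular play-by-play source): each possession is keyed by the shot-clock time at which it first ended in a shot, shooting foul, or turnover, and is credited with every point scored before the defense gains control, including free throws and putbacks after offensive rebounds. Turnovers stay in the denominator as 0-point possessions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Possession:
    # Hypothetical record layout for illustration only.
    clock_at_end: float  # shot-clock seconds left at the first shot, foul, or turnover
    points: int          # FG + FT + putback points until the defense gets the ball


def ppp_by_clock(possessions, bin_width=4):
    """Points per possession, keyed by the lower edge of each shot-clock bin."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for p in possessions:
        b = int(p.clock_at_end // bin_width) * bin_width
        totals[b] += p.points
        counts[b] += 1  # turnovers count here with 0 points
    return {b: totals[b] / counts[b] for b in sorted(counts)}


if __name__ == "__main__":
    sample = [Possession(20.0, 2), Possession(21.5, 0),
              Possession(3.0, 3), Possession(2.0, 1)]
    print(ppp_by_clock(sample))  # {0: 2.0, 20: 1.0}
```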

EvanZ · Post by **EvanZ** » Sat Feb 18, 2012 5:16 pm

schtevie wrote: (2) As for shot clock time being an effect and not a cause of outcomes, I don't understand this framing. It certainly is a cause for shot selection.

Right, I agree. My statement implied that I don't think it affects shot efficiency. Those may or may not be related. We don't have the data right now to be sure which one of us is correct about this. My hypothesis is that a 22-ft jumper early in the shot clock is going to be roughly equivalent to one late in the shot clock. Of course, you may be right, and I may be wrong. I could imagine that early in the shot clock players have less defensive pressure (and maybe the shot clock is also acting as a psychological pressure). Anyway, until we have the data, we can't know either way. Certainly we can't know the effect size.

Trepidation · Post by **Trepidation** » Sat Feb 18, 2012 7:11 pm

EvanZ wrote:My hypothesis is that a 22-ft jumper early in the shot clock is going to be roughly equivalent to one late in the shot clock.

That is closer to the truth than anything discussed so far. In fact, it is fairly accurate; I wouldn't hesitate to say so. But in the spirit of why you said it, it doesn't go far enough. Shots taken later in the shot clock are *more* likely to go in within a simple random sample, but since the data we are using are not a simple random sample, the graph leans the wrong direction. I fully described how this is plausible.

schtevie · Post by **schtevie** » Thu Feb 23, 2012 12:07 am

A final bit of "non-empirical" long 2 mischief, to assist those interested in better contextualizing their beliefs about the degree to which NBA offenses employ optimal shot selection with respect to the shot clock.

If we indeed accept the null that the current state of play is optimal (or approximately so) the direct implication is that the play of every player is optimal as well (or approximately so). Taking last season's long 2 performance as the referent we find (again, courtesy of hoopdata.com) that 136 players shot a higher percentage (43% and above) than what was realized for the "last second" shooting average in Brian's data (what was 42%, tabulated over four seasons).

Subtracting the contributions of the 131 "overachievers" from the long 2 pool, the remaining players are then found to have combined for 7.7 attempts per game on average in 2010-11 and to have completed these at the rate of 36%, or 0.72 points per attempt.

So, the question to ask is whether it makes sense that these realized 0.72 points per attempt on long 2s were optimal given the exhibited potential of the average offense (against the shot clock).

If you subtract out all these "underachiever" long 2s from the "half-court" offensive numbers provided by Brian (what I take to be shots taken from 16 seconds on down) what remains is a shooting efficiency of 1.01 points per attempt. And then if you assume that long 2s not taken by below-average shooters would reap this as an average return, at 7.7 shots per game, you are looking at 2.3 extra points per game.

Too high of a baseline? (Probably so, but my guess is not by much, for reasons previously discussed.)

If instead you assume the actually observed half-court offensive return of 0.96 (what is inclusive of the "contribution" of "optimal" long 2s) you are looking at 1.8 extra points per game by "redeploying" these 7.7 shots per game.

Still too high of a baseline? (Not sure about this at all.)

If the final assumption is that all the below-average long 2s taken throughout the half-court offense are instead held to the last second (figuratively), yielding 0.84 points per attempt (what is almost surely too low) you are looking at an extra 0.95 points per game.

And these are the hypothetical gains from just one particular type of shot that happens to be very conspicuous in the data.

Food for thought. Or not.

schtevie · Post by **schtevie** » Sat Feb 25, 2012 10:28 pm

That was not so final. Let me revise the comments above (due to some math errors I found upon revisiting the calculations), and then extend my remarks so as to reinforce the general point I was trying to make.

The issue is to what extent one should believe that NBA shot selection is optimized with respect to the shot clock. Data used in Brian's article show attempts and points per shot taken at each second of the shot clock (what might include "and 1s" but, as I understand them, they don't incorporate other foul shooting). The resulting efficiency measure shows an average of 0.843 points scored per shot in the final second of the shot clock.

The initial, "crude" argument I introduced is that we can infer significant deviations from optimal shot selection with respect to the shot clock, a priori, because there is a very popular NBA shot, long 2s (defined via hoopdata as 16-23 ft) that on average is less efficient still: 0.788 in 2010-11. This argument did not go over without opposition.

So to refine the point, I thought it would be useful to get a sense of the number of "long 2" attempts that were strictly worse than the points per shot available in the last second of the shot clock. An estimate of this can be provided by looking at the totals for players whose season average was below the 0.843 threshold. In the post above, I stated that last season's numbers showed 7.7 such attempts per game. This is incorrect. That is the approximate number of attempts per game for those shooting above the threshold (then there was another math error, but never mind). The actual number of below-threshold shot attempts per game (so defined and assuming no further math errors...erg) is 13.9 (what is 68% of total "long 2s" attempted). And then the average points per shot implied (no "and 1s" included) is 0.730.

Taking the next inferential step, trying to estimate counterfactual points, were all such below average attempts redeployed, let's avoid the "pie in the sky" stories and just assume that by more prudent shot selection, the actually observed, "desperation heave", yielding 0.843 points per shot would have been available. If this seems plausible, then the counterfactual gain is 1.572 points per game. Not nothing.
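The counterfactual gain in this paragraph is simply the below-threshold attempts per game times the difference in points per shot. With the rounded figures quoted above, the arithmetic is:

```latex
\Delta_{\text{long 2}}
  = N_{\text{below}} \times \left(\text{PPS}_{\text{last second}} - \text{PPS}_{\text{below}}\right)
  = 13.9 \times (0.843 - 0.730)
  = 13.9 \times 0.113
  \approx 1.571 \text{ points per game}
```

The small gap from the 1.572 stated above presumably comes from rounding of the inputs.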

But as the Cat in the Hat might say, "that is not all!"

There is nothing special about below-threshold "long 2s". This shot was featured only because it is so conspicuous: not only is its overall average low, but "long 2s" are also fouled at a below-average rate, and they are a bad shot to take if the hope is to continue the offense with an offensive rebound. The general argument is that there are many, many suboptimal shots taken (with respect to the shot clock), and attempts should be made to estimate their overall number and effect.

We can perform the exact same exercise with 3 pointers (and here I adjust the relevant threshold to an eFG% of 0.375, owing to the aforementioned higher OR% effect). Doing so, the result is that only 0.6 3 point attempts are taken by below-threshold 3 point shooters per game (what is only 3.6% of all 3 attempts - an interesting comparative fact in its own right). And the average points per shot on these "bad 3s" are 0.519. Accordingly, the counterfactual gain of redeploying the "bad 3s" (with the same 0.843 baseline) is only 0.208 points per game.

But, again, that is not all, that is not all!

What about medium-range jumpers? These too have a below-average success rate (FG% of 39.3 in 2010-11, sez hoopdata). However, these are also fouled more often than long 2s, so one is a bit less confident on that account in knowing what the relevant threshold should be. Still, for present purposes (we are just trying to get a sense of the size of the phenomenon), given that the counterfactual estimates offered do not include free throws (and hence provide a distinctly lower bound), and because medium-range 2s are on average (I would guess) fouled less than the overall average NBA shot, I think we are fine keeping the 0.843 threshold.

Looking at below-threshold, medium-range jump shooters, we see that in 2010-11 they took 4.5 shots per game (what were 61.4% of all medium range jumpers) yielding only 0.681 points per shot. And in the now familiar exercise, compared to a last-second baseline of 0.843 points per shot, redeploying these could have yielded an extra 0.726 points per game.

And that is all.

Well, we could continue for all shot distances, however the assumptions regarding free throws would become more tenuous, and a lot of the shots at the rim come on fast breaks, and this whole argument revolves around counterfactual opportunities in the half court.

But it is worth summarizing these results (again, what are for 2010-11). If you sum the shot attempts of all players whose season average on medium 2s and long 2s was below the "last second" threshold of 0.843 points per shot and 3s below a threshold of 0.75 points per shot, you see 19 such shot attempts per game - what is 23.4% of total attempts (and higher still of those within the half court).

Then, if you assume that all such attempts could instead have yielded only the 0.843 points per shot, the total gain would be 2.51 points per game.
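The three counterfactual gains above can be recomputed from the rounded per-game figures quoted in this post. This sketch only restates that arithmetic (attempts times the difference from the 0.843 last-second baseline); with these rounded inputs, the 3-point gain comes out near 0.19 rather than 0.208 and the total near 2.49 rather than 2.51, presumably because the posted attempt counts are rounded.

```python
# Per-game figures quoted above for 2010-11, as posted (rounded).
CATEGORIES = {
    # name: (below-threshold attempts per game, their points per shot)
    "long 2s": (13.9, 0.730),
    "3s": (0.6, 0.519),
    "medium 2s": (4.5, 0.681),
}
BASELINE = 0.843  # last-second points per shot from Brian's data


def counterfactual_gain(attempts: float, pps: float, baseline: float = BASELINE) -> float:
    """Extra points per game if these attempts had yielded `baseline` instead."""
    return attempts * (baseline - pps)


gains = {name: counterfactual_gain(a, p) for name, (a, p) in CATEGORIES.items()}
total_attempts = sum(a for a, _ in CATEGORIES.values())
total_gain = sum(gains.values())

if __name__ == "__main__":
    for name, g in gains.items():
        print(f"{name:>9}: {g:.3f} points per game")
    print(f"total: {total_attempts:.1f} attempts, {total_gain:.3f} points per game")
```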

This is not a small number.

But is something like this feasible? What is the argument that at least a big chunk of it is not? These 19 shot attempts per game are distributed throughout the shot clock (there being way too many of them to have been jammed in the last few seconds). How long does it take to reset an offense, or at a minimum to get the ball in the hands of a better shooter, or the same shooters in better circumstances? How hard is it to do better than a last second shot?

Crow · Post by **Crow** » Wed Feb 29, 2012 2:26 am

I kinda drifted away from this conversation. Maybe I'll re-read it and have something to add in the future but I appreciate the effort others have made.

mcoughlin · Post by **mcoughlin** » Sun Mar 04, 2012 3:08 pm

Hi,

To answer xkonk's question about long-2 efficiency, I made a quick plot from the data at http://www.basketballgeek.com/data/, calculating the efficiency in four different shot-distance intervals as a function of shot-clock time.

From these data, I calculate a mean efficiency for the 15-22 ft interval of 0.41 or so. Note, the code did NOT account for putbacks by demanding that a shot was made on the other end first, as has been pointed out in this thread as being important. I hope to get to that later.
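A minimal sketch of the kind of tabulation described here, assuming a hypothetical input of `(distance_ft, shot_clock_left, made)` tuples rather than the actual basketballgeek.com file layout. Only the 15-22 ft interval is named in the post; the other distance bins below are placeholders. Like the code described above, it does not filter out putbacks.

```python
from collections import defaultdict

# Hypothetical distance intervals in feet; only 15-22 ft comes from the post.
DISTANCE_BINS = [(0, 5), (5, 15), (15, 22), (22, 40)]


def distance_bin(ft):
    for lo, hi in DISTANCE_BINS:
        if lo <= ft < hi:
            return (lo, hi)
    return None  # outside every bin: ignored


def fg_pct_by_clock(shots, clock_width=4):
    """FG% keyed by (distance bin, lower edge of shot-clock bin).
    shots: iterable of (distance_ft, shot_clock_left, made) tuples."""
    made = defaultdict(int)
    att = defaultdict(int)
    for dist, clock, is_made in shots:
        d = distance_bin(dist)
        if d is None:
            continue
        key = (d, int(clock // clock_width) * clock_width)
        att[key] += 1
        made[key] += int(is_made)
    return {k: made[k] / att[k] for k in sorted(att)}


if __name__ == "__main__":
    sample = [(18, 20.0, True), (17, 21.0, False), (18, 3.0, False), (2, 10.0, True)]
    for key, pct in fg_pct_by_clock(sample).items():
        print(key, f"{pct:.3f}")
```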

Thank you,
Michael

EvanZ · Post by **EvanZ** » Sun Mar 04, 2012 3:34 pm

From before:

Right, and this is what I assume explains the PPS vs. time distribution. Worse shots (those nasty long 2-pters) are saved for "later", after other better options have been eliminated (due to defensive pressure, offensive ineptness, whatever). The shot clock time is not causative. It's an effect. Your mid-range jumpshot won't have significantly greater efficiency simply because you take it earlier.

In fact, that last assumption/hypothesis of mine would presumably be simple enough to test and relatively straightforward to interpret. Really, we just need to categorize each shot the way I do for the PSAMS metric (inside shots, mid-range, 3-pt, and foul shots) and plot the time-distribution of each type. I think that would go a long way to informing this discussion.

Interesting! It looks to me like my hypothesis of mid-range shots having similar efficiency throughout the shot clock turns out to be just about right. (Mid-range shots are always bad shots.) All shots show a decrease in efficiency, but inside shots much more than others. Very cool. Thanks!

Mike G · Post by **Mike G** » Sun Mar 04, 2012 3:36 pm

Do these efficiencies include FT?

Crow · Post by **Crow** » Sun Mar 04, 2012 4:13 pm

I am pretty sure they don't, and including that consideration could change the big picture. Is there a difference in the rate at which short 2 attempts, long 2s, and 3-pt attempts draw fouls, and how big are the differences between them? This was accounted for, along with turnovers, in a small-scale 82games study in the past http://82games.com/locations.htm. Foul and turnover rates do vary significantly. Thus there is a more complicated answer about which shots have higher overall efficiency. In this cut (which did not address the degree of "contest" or the shot clock), any 3 attempt was better than any 2 except from the low paint. Wing 2s are clearly the worst shot.

When a player with the ball starts in a three-point zone, the most efficient move he can make is none at all and moving in for a two point jumper is the worst shot possible for the team on average.

2-pt jumpers draw 30% as many fouls as in-the-paint shots but are subject to 80% as many turnovers. Compared to 2-pt jumpers, 3-pt jumpers draw a bit less than half as many fouls but are subject to a bit more than half as many turnovers. There is more detail available on shooting fouls and FTAs, and it shows larger differences between these shot types.

EvanZ · Post by **EvanZ** » Sun Mar 04, 2012 4:42 pm

Looking at it again, I think the very large spike on inside shots at the beginning must be from transition layups/dunks.

If you take that out, the slopes of all the curves look similar. This suggests to me that all shots become less efficient at about the same rate, which I assume is caused not by the shot clock per se, but by opposing defenses not allowing good shots as early, and by teams waiting to take better shots (even if they are better 2-pt shots).

All in all, though, I think these data are consistent with the hypothesis that teams wait to take better types of shots (inside ones), and that simply taking a shot earlier, if that shot is not an inside shot, is not going to improve efficiency significantly.

APBRmetrics
