APBRmetrics

Posted: **Fri Sep 23, 2011 4:50 am**

The Marginal Revolution blog has a little discussion. The authors, Gabel and Redner, build into their random walk model the rate at which scoring plays occur, team strength, "anti-persistence" (a score by team A leads to a sharply increased probability that team B will score next, because team B automatically gains possession), and a "restoring force" (teams with larger leads have slightly smaller probabilities of scoring than teams with smaller leads or deficits).
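For anyone who wants to play with the idea, here is a minimal sketch of a random walk with those ingredients. It is not the authors' code: it simplifies their scoring rate to a fixed number of scoring plays, makes every play worth 2 points, and all names and parameter values (`n_plays`, `p_repeat`, `strength_a`, `restoring`) are made-up assumptions for illustration.

```python
import random


def simulate_game(rng, n_plays=95, p_repeat=0.35, strength_a=0.5, restoring=0.002):
    """Return the path of the score difference (team A minus team B).

    All default values are illustrative guesses, not numbers from the paper.
    """
    diff = 0
    last = None  # +1 if team A scored last, -1 if team B did
    path = [0]
    for _ in range(n_plays):
        if last is None:
            p_a = strength_a
        else:
            # Anti-persistence: the team that just scored is less likely
            # to score again, because the other team gets the ball.
            p_a = p_repeat if last == 1 else 1.0 - p_repeat
            # Team strength nudges the probability toward the better team.
            p_a += strength_a - 0.5
        # Restoring force: the larger A's lead, the lower A's chance to score.
        p_a -= restoring * diff
        p_a = min(max(p_a, 0.0), 1.0)
        last = 1 if rng.random() < p_a else -1
        diff += 2 * last  # assumption: every scoring play is worth 2 points
        path.append(diff)
    return path


if __name__ == "__main__":
    rng = random.Random(2011)
    finals = [simulate_game(rng)[-1] for _ in range(1000)]
    print("mean final margin:", sum(finals) / len(finals))
```

Setting `strength_a` above 0.5 adds the drift, and setting `restoring=0` switches off the restoring force, so the effect of each ingredient can be looked at separately.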

Their conclusions: their model closely matches actual NBA scoring patterns. The apparent hot streaks and cold streaks that we see in games can be explained as occurring due to natural random chance (similar to the lack of a hot hand effect). Game outcomes are decided more by chance than by team strength differential. The restoring force exists but is small.

My reactions: it's nice that their model matches the data well, but so do a large number of models (Pythag, Bell Curve, etc., although those don't model individual scoring plays). The model doesn't appear to even have the ability to measure hot streaks or cold streaks; instead they derive the distribution of scoring streaks implied by their model and find that their theoretical "values match the empirical data very closely". Again nice, but I would like to see actual testing of the hypothesis, with a parameter to measure streakiness and a test of whether that parameter is different from 0. Such a model might "match the empirical data" even more closely, and show that streaks do occur more often than a random walk model predicts.
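To make the "test a streakiness parameter" idea concrete, here is a rough sketch using the Wald-Wolfowitz runs test on the sequence of which team scored. This is a simpler, model-free cousin of what I'm asking for, not anything from the paper: it tests against an independent sequence, so the anti-persistence from possession changes would itself show up as a positive z-score. The function name and the example sequences are made up.

```python
import math


def runs_test_z(sequence):
    """Wald-Wolfowitz runs test z-score for a two-valued sequence.

    Negative z: fewer runs than independence predicts (streaky).
    Positive z: more runs than independence predicts (alternating).
    """
    n1 = sum(1 for s in sequence if s == sequence[0])
    n2 = len(sequence) - n1
    n = n1 + n2
    if n1 == 0 or n2 == 0:
        raise ValueError("need both outcomes in the sequence")
    # A new run starts at every change of value.
    runs = 1 + sum(1 for a, b in zip(sequence, sequence[1:]) if a != b)
    expected = 2.0 * n1 * n2 / n + 1.0
    variance = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n * n * (n - 1.0))
    if variance <= 0:
        raise ValueError("sequence too short for the test")
    return (runs - expected) / math.sqrt(variance)


if __name__ == "__main__":
    print(runs_test_z("ABABABABAB"))  # alternating scorers
    print(runs_test_z("AAAAABBBBB"))  # one long streak each
```

A proper version would compare the observed streaks against the fitted random walk model rather than against independence, but the structure (one statistic, one null value) would be the same.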

I don't know how to interpret their claim that "the bias that stems from intrinsic differences in team strengths is not the dominating factor in determining the outcome of a typical NBA basketball games." They measure the "bias velocity" as 0.0037 points per second and use that number to calculate the value of the "Peclet number" as 0.55. They say small values indicate that the random walk is "dominated by diffusion and the effects of the drift are minuscule", and judge 0.55 to be "a small, but not negligible, Peclet number". Might be an okay conclusion, but they're averaging over all NBA games. Clearly a game between two evenly matched teams will have pretty much a random outcome, whereas a Lakers vs Timberwolves outcome is likely to be dominated by team quality differential. Averaging all games, maybe "small but not negligible" is an okay conclusion. It'd be interesting to compare that number for other sports.
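A back-of-the-envelope check on those two numbers, as a sketch. The only inputs taken from the paper are the quoted 0.0037 and 0.55; the 2880 seconds is just a 48-minute regulation game. The formula Pe = v^2 t / (2D) is my assumption about how a drift-versus-diffusion ratio of this kind is defined, so the diffusion constant and random spread below are implied values under that assumption, not figures from the paper.

```python
import math

GAME_SECONDS = 48 * 60   # regulation NBA game, 2880 s
bias_velocity = 0.0037   # points per second, as quoted above
peclet = 0.55            # as quoted above

# Systematic part: how far the drift alone moves the margin in one game.
drift_points = bias_velocity * GAME_SECONDS

# ASSUMPTION: Pe = v^2 * t / (2 * D). Solve for the diffusion constant D.
implied_diffusion = bias_velocity ** 2 * GAME_SECONDS / (2 * peclet)

# Random part: typical spread sqrt(2 * D * t) of an unbiased walk over a game.
random_spread = math.sqrt(2 * implied_diffusion * GAME_SECONDS)

print(f"drift over a game:  {drift_points:.1f} points")
print(f"implied D:          {implied_diffusion:.4f} points^2/s")
print(f"random spread:      {random_spread:.1f} points")
```

The drift works out to about 10.7 points per game. Under the assumed definition, the Peclet number is just the squared ratio of that drift to the random spread, which is why a value below 1 reads as "randomness is the bigger piece".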

So, an interesting model but I think too limited and inflexible (at least in the authors' version, maybe there are variations of the model which permit more flexibility). I find Markov models such as the one that Kenny Shirley presented at NESSIS a few years ago more appealing -- one can get as complicated as one wishes by adding more states to the model. Standard Markov models do not permit (non-random) streaks either, but there are Markov models with memory.

P.S. I hope somebody stops the Mad Spammer.

Posted: **Fri Sep 23, 2011 3:13 pm**

Also via Tyler Cowen, a nutritious piece from Brian Skinner, which is a good starting guide to intuition about shot clock issues: http://arxiv.org/abs/1107.5793

Posted: **Fri Sep 23, 2011 7:52 pm**

Nice, but still lacking a bit in detail. Although there is indeed an important team question about when to shoot, I think the crucial decision is the one faced by an individual player: should I shoot now, or pass, or dribble -- or hold the ball while waiting for a better opportunity to shoot, pass, or dribble. Often that last option is the worst one, but not always, especially if the guy with the ball is a good passer; think of Sabonis holding the ball in the high post waiting for a cutter to get open. So rather than the binary shoot/don't shoot choice, there are actually four options, and five different players who might have the ball. (And of course there are sub-options; maybe dribbling is the best choice, but only if you dribble to the right spot because dribbling to the wrong spot might get you trapped in the corner. OTOH if you're Gary Payton, you might have the ability to dribble yourself out of trouble.)

Posted: **Fri Sep 23, 2011 10:54 pm**

You know what they always say, "basketball is a game of biased random walks".

As "Sandy" brought up, I'm interested in the time-out question. Do coaches actually stop runs by calling time-outs? I'm skeptical (especially after seeing this paper). It's something I wanted to look into, but haven't really found the time.

Posted: **Sun Sep 25, 2011 12:26 am**

this was part of their conclusion:

"...basketball is a complex sport that requires considerable analysis to understand and respond to its many nuances. as a result, a considerable history has built up to quantify every aspect of basketball and thereby attempt to improve a team's competitive standing. however, this competitive and evolutionary rat race largely eliminates gross systematic advantages between teams, so that all that remains from a competitive standpoint are small surges and ebbs in performance that arise from the underlying stochasticity of the game..."

can anyone rephrase this in colloquial english?

Posted: **Sun Sep 25, 2011 5:54 am**

Brian Skinner patiently responded to my e-mails and graciously created and shared this file related to his paper. If you want to manipulate some of the underlying assumptions and generate automatic graphs displaying the results use this download link: http://www.easy-share.com/54CF5028E73A1 ... rdrate.xls

With regard to the statements in the first paper that Mike and Bob quoted, I don't claim to have digested and understood all they said & did, but I'll take a stab at it. I may get it wrong or stir up more uncertainty in the process though.

I think part of what they are saying is that more of the point differential in individual games is random than comes from the underlying long-term strength difference between the teams involved. If, as they say, the average point differential is 10.7 points, I can see that the statement might be true on average, but they seem to be making a bit too much of it.

From their general tone, it sorta sounds like they are suggesting that there is or will be movement toward greater team parity in team knowledge (like Morey?) and team play on the court, and the latter isn't happening according to team records. Maybe they don't mean that, but it sounds to me like they are suggesting it in the latter quote. They might also be suggesting that perceived team strength is to a good measure influenced by random walks and may not be as real as it seems or is treated, or that real team strength is only laboriously and only partially revealed by season's end after the walk is over. Are point differential results becoming more dominated by randomness? That could be checked but I don't think they presented anything on that and I'd be somewhat surprised if it were the case.

Posted: **Sun Sep 25, 2011 7:13 am**

Hi everyone.

The excel file that crow posted was made by me. Feel free to let me know if you have any questions about it. My goal was to evaluate (theoretically) the optimal shooting rate for a team as a function of shot clock time. This is explained more fully in the paper that schtevie linked to above, but I can answer questions about it if you want.

As for the Gabel and Redner paper, I'm a little surprised that they thought this was worth publishing. I sort of thought it was well-known that basketball games (and most sports games) look a lot like biased random walks. Their conclusion seems to be that "on the whole, the random part of a basketball score is larger than the biased part". This is pretty much how I understand the quote bchaikin posted: "At the NBA level, success is more a result of luck than the fairly small differences in skill between teams."

It's a pretty crude description of basketball that is probably mostly correct when averaged over the whole league. But that doesn't mean there is no such thing as a truly good team (as opposed to just a lucky team).

Crude and over-generalized descriptions is the way physicists (myself included) like to do things! If you don't make rash assumptions you can never solve anything. And mtamada, for a physicist a model that "can get as complicated as one wishes" is generally a bad thing! It gives us flashbacks of Copernicus's "epicycles".

Posted: **Sun Sep 25, 2011 8:49 am**

bchaikin wrote:this was part of their conclusion:

"...basketball is a complex sport that requires considerable analysis to understand and respond to its many nuances. as a result, a considerable history has built up to quantify every aspect of basketball and thereby attempt to improve a team's competitive standing. however, this competitive and evolutionary rat race largely eliminates gross systematic advantages between teams, so that all that remains from a competitive standpoint are small surges and ebbs in performance that arise from the underlying stochasticity of the game..."

can anyone rephrase this in colloquial english?

"Basketball has been analyzed to such a degree that all the analysis nullifies the actual value of analysis. Therefore [it is the authors' contention] what you are really seeing in games are random blips of variance". (Had to double check stochasticity; it's basically any system that contains predictable actions + unpredictable actions... How genius!)

I think that's wrong, but then again, I could be completely mistaken in understanding what they are trying to say. (I don't think so though.)

Crow wrote: Are point differential results becoming more dominated by randomness? That could be checked but I don't think they presented anything on that and I'd be somewhat surprised if it were the case.

No they aren't, which is why I don't understand the conclusions.

Posted: **Sun Sep 25, 2011 3:52 pm**

This is pretty much how I understand the quote... "At the NBA level, success is more a result of luck than the fairly small differences in skill between teams."

at first this is what i thought they meant too. yet to anyone watching nba basketball (over a few decades) you would think that this of course does not make sense, otherwise you would not have dominant teams for multiple years...

but then again could it be that maybe their conclusion is meant to mean that the variation in score differential is quite low compared to the total points scored, and thus a random walk is why you don't see a lot of 145-33 games? i'm not sure...

Basketball has been analyzed to such a degree that all the analysis nullifies the actual value of analysis.

it certainly sounds to me like this is what they are saying - but again they are not real specific as to what they mean by eliminating "gross systematic advantages between teams"...

i think i understand the concept of a random walk - it's often used to describe radiation escaping from the center of the sun, or when describing brownian motion. but these examples typically treat the factors involved as point particles, with each assumed to be identical (like photons or molecules). but are the authors here trying to conceptualize nba players as point particles with little difference between them?...

Crude and over-generalized descriptions is the way physicists (myself included) like to do things!

what kind of physics do you do?...

Posted: **Sun Sep 25, 2011 4:56 pm**

Hmm... My last statement about how "crude and over-generalized ... is the way we like to do things" was meant as a flippant and self-deprecating remark, but I guess it sounds a lot like an indictment of the quality of my own work and that of the entire physics community.

My field, personally, is condensed matter physics. In this field people generally have the mindset that you should solve a problem exactly when you can, but when you can't you should keep simplifying the problem until it becomes simple enough for you to solve.

Gabel and Redner don't know how to "solve basketball", but they can solve a simplified model in which basketball is described as a random walk. (The "point particle" here isn't the players; it's the differential score, which "walks" randomly up and down). The purpose of their paper is to show that this simple description captures most of the statistical trends at the league-wide level.

I think it's perfectly true that their model can't explain why one team would be consistently better than another over decades (e.g. Lakers over Clippers). And in fact, you'll notice that all their results are for the league as a whole, and aren't meant to explain how one particular team performs over time.

If I had written this paper, my conclusion might have said something like "we're not claiming that there is no such thing as a consistently good team in the NBA, only that league-wide statistics would look pretty much the same even if there weren't."

In that sense, their conclusion is very similar to that of "hot hand" studies.

Posted: **Mon Sep 26, 2011 1:32 am**

mtamada wrote:Nice, but still lacking a bit in detail. Although there is indeed an important team question about when to shoot, I think the crucial decision is the one faced by an individual player: should I shoot now, or pass, or dribble -- or hold the ball while waiting for a better opportunity to shoot, pass, or dribble. Often that last option is the worst one, but not always, especially if the guy with the ball is a good passer; think of Sabonis holding the ball in the high post waiting for a cutter to get open. So rather than the binary shoot/don't shoot choice, there are actually four options, and five different players who might have the ball. (And of course there are sub-options; maybe dribbling is the best choice, but only if you dribble to the right spot because dribbling to the wrong spot might get you trapped in the corner. OTOH if you're Gary Payton, you might have the ability to dribble yourself out of trouble.)

Mike, it is unproductive to anticipate detail in this model of the kind you describe. Whether the notional player in possession of the theoretical ball chooses to pass, dribble, or hold the ball is beside the point. Put a black box over all this. What matters is the nature of the (assumed) shot distribution(s) and its (their) frequency. The (more) interesting results flow from these.

Brian steps on his lede by invoking a representative play. The general approach is far more robust.

Posted: **Mon Sep 26, 2011 7:59 pm**

gravityandlevity wrote:And mtamada, for a physicist a model that "can get as complicated as one wishes" is generally a bad thing! It gives us flashbacks of Copernicus's "epicycles".

This brings up an interesting potential difference in viewpoints. There's a sense in which the modelling approaches are different, but also a sense in which they are the same. The similarity is in trying to avoid models with too much complexity and flexibility: In the case of Markov models, when I mentioned making the models more "complicated" I merely meant adding more states to the model, not using a more complicated version of Markov models. These states can become highly specific and numerous (which is what Kenny Shirley did in his model): to not just distinguish between 3-pt baskets and 2-pt baskets (Gabel and Redner in contrast in their main model assumed that all scoring plays yield the same number of points, something like 2.07 IIRC), but also 4-pt plays, technical fouls, jump balls, etc. Shirley's model was "complicated" (in the sense that I was using the word) because he had something like 23 different states that he tracked. But it was still a plain vanilla Markov Model (no memory in it e.g.).

Those are very solid, empirically observable states in a basketball game. I.e. a "complicated" model (in the sense that I was using the word, and in this example) doesn't have to mean adding epicycles. It can mean adding Uranus and Neptune to the model (or maybe a better analogy: Ganymede, Europa, etc. and maybe a few asteroids).
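Here is a toy sketch of what I mean by "adding states" to a plain vanilla Markov model. The six states and every transition probability below are made up for illustration; they are not Shirley's states or estimates. Making the model more "complicated" in my sense just means adding more rows to the table (free throws, jump balls, etc.), not changing the machinery.

```python
import random

# Hypothetical states and transition probabilities, for illustration only.
TRANSITIONS = {
    "A_ball": {"A_2pt": 0.35, "A_3pt": 0.10, "B_ball": 0.55},
    "A_2pt": {"B_ball": 1.0},  # after a basket the other team gets the ball
    "A_3pt": {"B_ball": 1.0},
    "B_ball": {"B_2pt": 0.35, "B_3pt": 0.10, "A_ball": 0.55},
    "B_2pt": {"A_ball": 1.0},
    "B_3pt": {"A_ball": 1.0},
}

# Points awarded to (team A, team B) on entering a scoring state.
POINTS = {"A_2pt": (2, 0), "A_3pt": (3, 0), "B_2pt": (0, 2), "B_3pt": (0, 3)}


def simulate(rng, n_steps, start="A_ball"):
    """Run the chain for n_steps transitions and return (score_a, score_b)."""
    state = start
    score_a = score_b = 0
    for _ in range(n_steps):
        next_states = list(TRANSITIONS[state])
        weights = [TRANSITIONS[state][s] for s in next_states]
        # No memory: the next state depends only on the current state.
        state = rng.choices(next_states, weights=weights)[0]
        pts_a, pts_b = POINTS.get(state, (0, 0))
        score_a += pts_a
        score_b += pts_b
    return score_a, score_b


if __name__ == "__main__":
    print(simulate(random.Random(23), 400))
```

Note that the change of possession after a basket is built into the states themselves, so the "anti-persistence" in the random walk model comes for free here rather than as a separate parameter.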

But there's also a sense in which the physicists' approach might differ from the approach of say economists. Economists will typically use some version of regression in their models, and will typically make abundant use of "dummy variables": binary variables to indicate that an observation was or was not in a certain classification (could be gender, could be rookie status, could be college education, could be free agent status, could be anything). These dummy variables can be valuable measures of some real category with real effects. But they can also be used as ad hoc patches to holes in the model, even a way of adding epicycles to make the model fit better.

From what you're saying, physicists might be less inclined to use dummy variables than economists are. An analogy might be Einstein's cosmological constant, which he put into his model but only as a last resort; economists would be less hesitant. (However, it must also be recognized that physicists are often dealing only with experimental error whereas social scientists are dealing with models with sources of error which are more varied and larger. The velocity of money in macroeconomics is almost as important as the velocity of light is in physics, but harder to successfully model because it's not a constant.)

Physicists model NBA games as a random walk
