Running Regressions On Pace
Hi, I'm confused as all heck and thought I'd try asking for help here. I'm reading a book called "Conquering Risk" in which the author details a regression formula for calculating expected pace. I feel as though he's skipping over some perhaps valuable information, likely some basic math skills he's assuming I have? If I may, I'm going to quote the book exactly, so there's nothing lost in translation:
"When evaluating a matchup between two teams, the first factor to consider is: What is your estimated pace? A pace rating of 90 would mean that you expect each team to have about 90 possessions in the game.
"How should you estimate the pace rating for a matchup between teams X (the visitor) and Y (the home team)? One possibility is to take some sort of average—an arithmetic or geometric mean of all games played by that team in the current season. A somewhat better method is to use a regression. Each past data point for this regression will include three variables:
"X's average pace, Y's average pace, actual pace between X&Y
[Here's the first place in which I get confused. What's the third variable here? Does 'actual pace between X&Y' mean the average between both teams' pace? Does it mean their pace in previous head-to-head meetings? Idk.]
"With data in that format, I ran three years of WNBA pace data through Zunzun's regression function finder. One of the best fits was a second order polynomial produced by Zunzun.com below:
"Pace proj = a + bx + cy + dx^2 + ey^2 + fxy
"Where x is the visitor's average pace, y is the home team's average pace, and:
"a = 232
b = -4.35
c = -1.71
d = 0.0276
e = 0.00675
f = 0.0192"
OK, end of quote. There's no further explanation preceding this formula or after it, nothing in the footnotes. He explains what pace is, but I already knew what pace is. My question is simply this: Where in the heck did he get a, b, c, d, e and f from?!?!?! The only thing he accounts for are x and y and then some third variable I don't yet understand. I can plug a hypothetical team's pace into the slots for x and y, but I don't get the rest of the formula. Is this just me? If so, can I get an explanation in layman's terms? You'll have the undying gratitude of a faceless random stranger over the Internet. Thank you.
"When evaluating a matchup between two teams, the first factor to consider is: What is your estimated pace? A pace rating of 90 would mean that you expect each team to have about 90 possessions in the game.
"How should you estimate the pace rating for a matchup between teams X (the visitor) and Y (the home team)? One possibility is to take some sort of average—an arithmetic or geometric mean of all games played by that team in the current season. A somewhat better method is to use a regression. Each past data point for this regression will include three variables:
"X's average pace, Y's average pace, actual pace between X&Y
[Here's the first place in which I get confused. What's the third variable here? Does 'actual pace between X&Y' mean the average between both teams' pace? Does it mean their pace in previous head-to-head meetings? Idk.]
"With data in that format, I ran three years of WNBA pace data through Zunzun's regression function finder. One of the best fits was a second order polynomial produced by Zunzun.com below:
"Pace proj = a + bx + cy + dx^2 + ey^2 + fxy
"Where x is the visitor's average pace, y is the home team's average pace, and:
"a = 232
b = -4.35
c = -1.71
d = 0.0276
e = 0.00675
f = 0.0192"
OK, end of quote. There's no further explanation preceding this formula or after it, nothing in the footnotes. He explains what pace is, but I already knew what pace is. My question is simply this: Where in the heck did he get a, b, c, d, e and f from?!?!?! The only thing he accounts for are x and y and then some third variable I don't yet understand. I can plug a hypothetical team's pace into the slots for x and y, but I don't get the rest of the formula. Is this just me? If so, can I get an explanation in layman's terms? You'll have the undying gratitude of a faceless random stranger over the Internet. Thank you.
-
- Posts: 262
- Joined: Sun Nov 23, 2014 6:18 pm
Re: Running Regressions On Pace
Those are just the coefficients in the quadratic function. Instead of running his own regression, he used a program that looks at all of the data points and finds the best type of function to explain/fit them.
Just plug all of the numbers he gave you into the formula, along with the home pace and the away pace.
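As a concrete version of "just plug the numbers in", here is a minimal Python sketch of the book's formula with the coefficients exactly as printed. The function name `project_pace` is hypothetical, and the 75/75 matchup is made up for illustration. As in the book, x is the visitor's average pace and y is the home team's; since b differs from c and d differs from e, swapping home and visitor changes the answer.

```python
# Coefficients as printed in "Conquering Risk" (fit to three seasons of WNBA data).
A, B, C, D, E, F = 232, -4.35, -1.71, 0.0276, 0.00675, 0.0192


def project_pace(visitor_pace: float, home_pace: float) -> float:
    """Projected game pace: a + b*x + c*y + d*x^2 + e*y^2 + f*x*y,
    where x is the visitor's average pace and y is the home team's average pace."""
    x, y = visitor_pace, home_pace
    return A + B * x + C * y + D * x**2 + E * y**2 + F * x * y


if __name__ == "__main__":
    # Hypothetical matchup: both teams average 75 possessions per game.
    print(round(project_pace(75, 75), 2))  # prints 78.72
```

The printed coefficients are rounded, and a quadratic fit like this should not be trusted far outside the range of paces it was fit to.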
Re: Running Regressions On Pace
Thanks. Makes more sense now. But I'm guessing those numbers only apply to those three specific years of WNBA data? Whereas I happen to be looking for NBA data. Ty.
Re: Running Regressions On Pace
OnKPDuty wrote: Thanks. Makes more sense now. But I'm guessing those numbers only apply to those three specific years of WNBA data? Whereas I happen to be looking for NBA data. Ty.
It might help a little if the variables weren't overloaded.
Suppose I have team A hosting team B tonight, and I want to guess what the pace of the game will be. I don't know it, but I do know the average pace for team A and the average pace for team B so far, and I expect that the game pace is related to those. In fact, I'm guessing that the relationship between average team paces and game paces is described by a formula like:
Pace = a + b (home average pace) + c (away average pace) + d (home average pace)^2 + e (away average pace)^2 + f (home average pace) (away average pace)
But I don't know what the values for a,b,c,d,e and f are.
So what I do is look at the history of games I have data for and try to pick values for a-f so that the expression on the right-hand side of the equation ends up as close to the actual observed pace as possible over all the games that I have data for. (There's some math you can do so that you don't just guess and check.)
Now, you're asking where the formula came from and the answer is that he tried a couple of different formulas, and that one seemed to work well.
If you have NBA pace data, there are free tools that will allow you to run your own regression to find the values.
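The "math you can do" is ordinary least squares: pick a-f to minimize the sum of squared differences between the formula and the actual game paces. Below is a from-scratch sketch, not Zunzun's method or the book's code; `fit_pace_model` and `solve` are hypothetical helper names. It assumes each past game is a tuple (visitor average pace, home average pace, actual pace of that game), which is the three-variable format the book describes. The paces are centered and scaled before solving so the 6x6 system stays numerically well behaved, and the result is converted back to the book's a + bx + cy + dx^2 + ey^2 + fxy form.

```python
def solve(matrix, rhs):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in the row with the largest entry in this column to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    # Back substitution.
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][k] * sol[k] for k in range(r + 1, n))
        sol[r] = (aug[r][n] - tail) / aug[r][r]
    return sol


def fit_pace_model(games, scale=10.0):
    """Least-squares fit of pace = a + b*x + c*y + d*x^2 + e*y^2 + f*x*y.

    games: list of (visitor_avg_pace, home_avg_pace, actual_game_pace) tuples.
    Returns (a, b, c, d, e, f) in the same form as the book's formula.
    """
    # Center and scale the inputs so the normal equations stay well conditioned.
    m = sum(x + y for x, y, _ in games) / (2 * len(games))
    s = scale
    rows, targets = [], []
    for x, y, pace in games:
        u, v = (x - m) / s, (y - m) / s
        rows.append([1.0, u, v, u * u, v * v, u * v])
        targets.append(pace)

    # Normal equations: (X^T X) beta = X^T t.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(6)] for i in range(6)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(6)]
    a0, b0, c0, d0, e0, f0 = solve(xtx, xty)

    # Expand the polynomial in u = (x - m)/s, v = (y - m)/s back into x and y.
    d = d0 / s**2
    e = e0 / s**2
    f = f0 / s**2
    b = b0 / s - (2 * d0 + f0) * m / s**2
    c = c0 / s - (2 * e0 + f0) * m / s**2
    a = a0 - (b0 + c0) * m / s + (d0 + e0 + f0) * m**2 / s**2
    return a, b, c, d, e, f
```

Feed it NBA games and you get NBA coefficients; the book's numbers are specific to the WNBA seasons he used. You need at least six games (and in practice far more) with a spread of paces, or the system has no unique solution.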
Re: Running Regressions On Pace
The "pace" that a team records in a season is actually the average of league avg pace and that team's innate pace.
The Clippers are an avg pace team at 95.8 poss/48. The Jazz are slowest at 91.0.
When they meet, the Clipps play their possessions at a pace of 95.8/48 min.
The Jazz play just as many possessions, and apparently they are trying to play at a pace = 91.0 - (95.8-91.0) = 86.2
With half the possessions played at 95.8 and half at 86.2, we expect the game to be played at (95.8 + 86.2)/2 = 91.0
If the Jazz played another "91.0" team, we could expect the game to have 86.2 poss/tm.
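Mike's arithmetic can be written as a small model, assuming (as he does) that a team's season pace is the average of the league-average pace and its own "innate" pace. Solving for the innate pace gives innate = 2 * season - league, and averaging the two teams' innate paces simplifies to season_A + season_B - league. A minimal sketch; the function names are hypothetical.

```python
def innate_pace(season_pace: float, league_pace: float) -> float:
    """Innate pace, assuming season_pace = (league_pace + innate) / 2."""
    return 2 * season_pace - league_pace


def matchup_pace(pace_a: float, pace_b: float, league_pace: float) -> float:
    """Expected game pace with half the possessions at each team's innate pace.
    Algebraically this equals pace_a + pace_b - league_pace."""
    return (innate_pace(pace_a, league_pace) + innate_pace(pace_b, league_pace)) / 2


if __name__ == "__main__":
    league = 95.8  # the Clippers were the league-average-pace team
    print(round(innate_pace(91.0, league), 1))         # Jazz innate pace: 86.2
    print(round(matchup_pace(95.8, 91.0, league), 1))  # Clippers vs. Jazz: 91.0
    print(round(matchup_pace(91.0, 91.0, league), 1))  # Jazz vs. another 91.0 team: 86.2
```

Because the league average enters the formula, a team facing a league-average opponent is projected at exactly its own season pace.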
Re: Running Regressions On Pace
From a sort of theoretical perspective the best way to average the pace of two teams is to invert them, then average them, then invert that average. This is because the inverse of pace is time, and each team plays the same number of possessions (more or less) so what you really want to know is how long the average possession takes.
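The invert, average, invert recipe is the harmonic mean. A minimal sketch comparing an explicit version (hypothetical name `harmonic_pace`) with Python's standard `statistics.harmonic_mean` and with the plain arithmetic mean, using the season paces of 95.8 (Clippers) and 91.0 (Jazz) from Mike G's post. It assumes, as described above, that both teams get about the same number of possessions.

```python
from statistics import harmonic_mean


def harmonic_pace(pace_a: float, pace_b: float) -> float:
    """Invert each pace (time per possession), average the times, then invert back."""
    avg_time = (1 / pace_a + 1 / pace_b) / 2
    return 1 / avg_time


if __name__ == "__main__":
    a, b = 95.8, 91.0
    print(round(harmonic_pace(a, b), 2))   # 93.34
    print(round(harmonic_mean([a, b]), 2))  # same value from the standard library
    print(round((a + b) / 2, 2))            # arithmetic mean, for comparison: 93.4
```

For paces this close together the two averages differ by less than a tenth of a possession.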
Re: Running Regressions On Pace
Mike G wrote: and apparently they are trying to play at a pace = 91.0 - (95.8-91.0) = 86.2
So the Jazz are trying to play slower, but basically the Clippers won't let them? So if the Jazz were to play a team with an identical average pace of 91, you can instead expect a pace of 86.2 (assuming 95.8 is the average pace of all their past opponents)? I didn't know this. Is this generally considered to be true and/or even common knowledge? Ty.
v-zero wrote: From a sort of theoretical perspective the best way to average the pace of two teams is to invert them, then average them, then invert that average. This is because the inverse of pace is time, and each team plays the same number of possessions (more or less) so what you really want to know is how long the average possession takes.
Hmmm … may I ask if this is the same thing Mike G is saying in the post above yours? Otherwise I honestly am not enough of a math whiz to figure out whether I'm getting conflicting information. In the meantime I'm going to brush up on inverses and get back to you. Ty.
Re: Running Regressions On Pace
v-zero wrote: From a sort of theoretical perspective the best way to average the pace of two teams is to invert them, then average them, then invert that average. This is because the inverse of pace is time, and each team plays the same number of possessions (more or less) so what you really want to know is how long the average possession takes.
Yeah, using the harmonic mean seems logical to me.
Re: Running Regressions On Pace
v-zero wrote: From a sort of theoretical perspective the best way to average the pace of two teams is to invert them, then average them, then invert that average. This is because the inverse of pace is time, and each team plays the same number of possessions (more or less) so what you really want to know is how long the average possession takes.
At the very least you'd still need a league average pace involved in the computation - otherwise one would always predict the fastest paced team in the league to be slower than their average in every game situation. Or vice versa.
Re: Running Regressions On Pace
OnKPDuty wrote:
Mike G wrote: and apparently they are trying to play at a pace = 91.0 - (95.8-91.0) = 86.2
So the Jazz are trying to play slower, but basically the Clippers won't let them? So if the Jazz were to play a team with an identical average pace of 91, you can instead expect a pace of 86.2 (assuming 95.8 is the average pace of all their past opponents)? I didn't know this. Is this generally considered to be true and/or even common knowledge? Ty.
The Jazz may encourage the Clipps to slow it down, but more likely the Clipps prefer to play faster when they have the ball. The Clipps may somehow give the Jazz an incentive to speed it up. Every matchup is gonna be different, and lineups may be juggled to accommodate.
But in general, we guess the Clipps average 15 seconds per offensive possession, and the Jazz average 16.7.
Together in a game, they average 31.7 seconds per dual possession (one possession for each team), which works out to about 91 poss/48 min.
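The seconds figures come from converting pace to time: a 48-minute game is 2880 seconds, and a pace of P means each team has about P possessions, so 2P possessions in total, or 2880 / (2P) seconds per possession. A minimal sketch using the innate paces of 95.8 (Clippers) and 86.2 (Jazz) from Mike G's earlier post; the function names are hypothetical.

```python
GAME_SECONDS = 48 * 60  # regulation game length in seconds


def seconds_per_possession(pace: float) -> float:
    """Average length of one possession when each team gets `pace` possessions per 48 min."""
    return GAME_SECONDS / (2 * pace)


def pace_from_dual_possession(seconds: float) -> float:
    """Pace implied by the time for one possession by each team (a 'dual possession')."""
    return GAME_SECONDS / seconds


if __name__ == "__main__":
    clipps = seconds_per_possession(95.8)  # about 15.0 s
    jazz = seconds_per_possession(86.2)    # about 16.7 s
    dual = clipps + jazz                   # about 31.7 s per dual possession
    print(round(clipps, 1), round(jazz, 1), round(dual, 1))  # 15.0 16.7 31.7
    print(round(pace_from_dual_possession(dual), 1))        # 90.7
```

Unrounded, this comes out near 90.7 rather than exactly 91.0: going through time per possession is the harmonic mean of the two innate paces, which lands slightly below the simple average (95.8 + 86.2) / 2 = 91.0.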
(The Clipps were the NBA's average-pace team, chosen for that reason.)
v-zero wrote: From a sort of theoretical perspective the best way to average the pace of two teams is to invert them, then average them, then invert that average. This is because the inverse of pace is time, and each team plays the same number of possessions (more or less) so what you really want to know is how long the average possession takes.
OnKPDuty wrote: Hmmm … may I ask if this is the same thing Mike G is saying in the post above yours?..
No, that gives basically the same result as just averaging their "paces", which as normally given are already the avg of team + opponent innate pace.
Statman wrote: ... you'd still need a league average pace involved in the computation - otherwise one would always predict the fastest paced team in the league to be slower than their average in every game situation...
Correct. Team "paces" are traditionally shown as they've been averaged in games against all opponents during the season. There may be small variations from league avg, but I don't know of a source for opponent pace.
And I don't know how common this knowledge is. Apparently it's not universal.