NBA Retrodiction Contest Part 3: The Perfect Blend

Home for all your discussion of basketball statistical analysis.
Guy
Posts: 75
Joined: Wed Jan 18, 2012 6:15 pm

Re: nba-retrodiction-contest-part-3-the-perfect-blend

Post by Guy »

Agreed, it's huge: out-of-sample testing is way more valid
I don't see it. There are only two possibilities here:
1) The relationship between boxscore stats and real value is quite constant over time. In this case, using the "future" data to estimate coefficients won't matter.
OR
2) The relationship between the stats and value changes over time. And in that case, using a 10-year average for the ASPM coefficients isn't much of an advantage. What would be much better then is to predict a given year using a model based on the prior 3-4 years.

So either way, I don't see how ASPM has any big unfair advantage here. Certainly nothing approaching RAPM's. (If we were talking about customized annual coefficients, based on the year being predicted, then I could see a problem. )
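Guy's "predict a given year from the prior 3-4 years" idea can be sketched with a small simulation. Everything below is synthetic and hypothetical — the drifting coefficient, player counts, and noise level are made-up assumptions, not any real metric's numbers — but it shows the rolling-window mechanics: estimate the stat-to-value coefficient from the previous three seasons only, then score the prediction on the held-out season.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical yearly data: value = coef * stat + noise, with the true
# coefficient drifting slowly over time (Guy's case 2).
n_years, n_players = 12, 200
true_coef = 1.0 + 0.05 * np.arange(n_years)  # slow drift, invented for illustration
x = rng.normal(size=(n_years, n_players))    # a box-score stat, standardized
y = true_coef[:, None] * x + rng.normal(scale=0.5, size=(n_years, n_players))

def ols_slope(xs, ys):
    """No-intercept OLS slope estimate."""
    return float((xs * ys).sum() / (xs * xs).sum())

# Predict each year using a model fit only on the prior 3 years.
errors = []
for t in range(3, n_years):
    coef_hat = ols_slope(x[t - 3:t].ravel(), y[t - 3:t].ravel())
    pred = coef_hat * x[t]
    errors.append(float(np.sqrt(np.mean((pred - y[t]) ** 2))))

print(round(sum(errors) / len(errors), 3))  # average out-of-sample RMSE
```

With a slow drift the rolling window tracks the coefficient closely, so the out-of-sample error stays near the noise floor — consistent with Guy's point that a prior-years fit gives up little.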
Crow
Posts: 10623
Joined: Thu Apr 14, 2011 11:10 pm

Re: nba-retrodiction-contest-part-3-the-perfect-blend

Post by Crow »

Alex provides some replies on his site.

http://sportskeptic.wordpress.com/2012/ ... pbr-board/

I think the dialog is high knowledge and productive even if it might be characterized in other ways also by some.
J.E.
Posts: 852
Joined: Fri Apr 15, 2011 8:28 am

Re: nba-retrodiction-contest-part-3-the-perfect-blend

Post by J.E. »

Crow wrote:http://sportskeptic.wordpress.com/2012/ ... pbr-board/
I think the dialog is high knowledge and productive even if it might be characterized in other ways also by some.
About 16h ago I wrote a well-meant reply where I even offered some help in getting the '06 RAPM names fixed. Other comments have been posted since then, but not mine. So basically, there is no dialog. Why he would not post my comment is absolutely beyond me. The fact that he didn't means that I'm done with this thing
Guy
Posts: 75
Joined: Wed Jan 18, 2012 6:15 pm

Re: nba-retrodiction-contest-part-3-the-perfect-blend

Post by Guy »

Why he would not post my comment is absolutely beyond me.
J.E., your comment was caught in a spam filter. It's posted now, with a reply from Alex.
mtamada
Posts: 163
Joined: Thu Apr 14, 2011 11:35 pm

Re: nba-retrodiction-contest-part-3-the-perfect-blend

Post by mtamada »

Guy wrote:
Agreed, it's huge: out-of-sample testing is way more valid
I don't see it. There are only two possibilities here:
1) The relationship between boxscore stats and real value is quite constant over time. In this case, using the "future" data to estimate coefficients won't matter.
OR
2) The relationship between the stats and value changes over time. And in that case, using a 10-year average for the ASPM coefficients isn't much of an advantage. What would be much better then is to predict a given year using a model based on the prior 3-4 years.

So either way, I don't see how ASPM has any big unfair advantage here. Certainly nothing approaching RAPM's. (If we were talking about customized annual coefficients, based on the year being predicted, then I could see a problem. )

You've left out the third possibility: white noise, i.e. random error. Suppose your situation (1) occurs. The true value can be constant, but the data are ALWAYS subject to randomness, and our estimates of the true values WILL fluctuate from season to season. Guaranteed.

And even if we did have 100% correct estimates, they would not perfectly predict next season's stats, because there'll be white noise next season too.

By improperly using next season's stats in our prediction for next season, our predictions get an unfair advantage: next season's white noise gets incorporated into the estimates, resulting in estimates which do a better-than-truly-possible job of predicting next season's stats.

Situation (2) is somewhat different, but similar principles apply. This is universal in statistics: forecast errors are greater than estimation errors.
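The argument above — that in-sample error understates honest forecast error — can be demonstrated with a toy simulation. The data are purely synthetic and the single-coefficient model is a stand-in, not any real metric. Fitting the coefficient on the target season itself absorbs that season's noise and therefore looks better than a genuine forecast, even though the true relationship is constant:

```python
import numpy as np

rng = np.random.default_rng(1)

# Constant true relationship (Guy's case 1), but noisy seasons (mtamada's
# point): including the target season in the fit absorbs its noise.
n_trials, n_players = 2000, 30
in_sample_rmse, forecast_rmse = [], []

for _ in range(n_trials):
    x_train = rng.normal(size=n_players)
    x_next = rng.normal(size=n_players)
    y_train = 2.0 * x_train + rng.normal(size=n_players)
    y_next = 2.0 * x_next + rng.normal(size=n_players)

    # Honest: fit on this season, forecast next season.
    b = (x_train @ y_train) / (x_train @ x_train)
    forecast_rmse.append(float(np.sqrt(np.mean((b * x_next - y_next) ** 2))))

    # Improper: fit on next season itself, then "predict" next season.
    b_cheat = (x_next @ y_next) / (x_next @ x_next)
    in_sample_rmse.append(float(np.sqrt(np.mean((b_cheat * x_next - y_next) ** 2))))

print(np.mean(in_sample_rmse), np.mean(forecast_rmse))
```

The in-sample average comes out below the forecast average, illustrating "forecast errors are greater than estimation errors" even when the true coefficient never changes.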

(On average, obviously. Sometimes by blind luck somebody's forecast turns out to be 100% correct. The part where Berri is correct is in saying that we cannot therefore jump to the conclusion that that is the best model. Blind squirrels, acorns, etc. Every time there's a stock market crash, there'll be dozens of people who'll whip out their investment newsletters showing that they'd predicted that crash months or even years before. But get enough blind squirrels, and the acorn will be found -- to bring in another animal analogy, it's like those monkeys typing Shakespeare. Those investment advisors are unlikely to be able to predict the next crash, and the lucky squirrel is unlikely to lead you to the next acorn.)

OTOH although Berri is correct that we should not blindly accept the model with the best predictions, needless to say model accuracy is of very high importance.
EvanZ
Posts: 912
Joined: Thu Apr 14, 2011 10:41 pm
Location: The City
Contact:

Re: nba-retrodiction-contest-part-3-the-perfect-blend

Post by EvanZ »

mtamada wrote:
(On average, obviously. Sometimes by blind luck somebody's forecast turns out to be 100% correct. The part where Berri is correct is in saying that we cannot therefore jump to the conclusion that that is the best model. Blind squirrels, acorns, etc. Every time there's a stock market crash, there'll be dozens of people who'll whip out their investment newsletters showing that they'd predicted that crash months or even years before. But get enough blind squirrels, and the acorn will be found -- to bring in another animal analogy, it's like those monkeys typing Shakespeare. Those investment advisors are unlikely to be able to predict the next crash, and the lucky squirrel is unlikely to lead you to the next acorn.)
Just as players have skill and luck, so do statistical models/predictions, especially when we're talking about relatively small differences (and sample sizes). For a relevant example, everyone should look at the changes in Mike's prediction tracking thread. Maybe the order has settled down now, but if we run the same models again next year, that order will almost certainly change.
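EvanZ's point about ranking instability is easy to simulate. Assuming three hypothetical models whose true accuracies differ only slightly (all numbers here are invented), the single-season "winner" varies substantially across replications:

```python
import numpy as np

rng = np.random.default_rng(2)

# Three hypothetical models with nearly identical true RMSEs.
true_rmse = np.array([10.0, 10.1, 10.2])
n_teams, n_seasons = 30, 1000  # one season = 30 team-level errors

wins = np.zeros(3, dtype=int)
for _ in range(n_seasons):
    # Each model's per-team errors for this season.
    errs = rng.normal(scale=true_rmse[:, None], size=(3, n_teams))
    observed = np.sqrt((errs ** 2).mean(axis=1))  # observed single-season RMSE
    wins[np.argmin(observed)] += 1

print(wins)  # each model "wins" a substantial share of seasons
```

With true differences this small relative to one season's sampling noise, no model wins anywhere near every time — the observed order across a handful of seasons is largely luck.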
bbstats
Posts: 227
Joined: Thu Apr 21, 2011 8:25 pm
Location: Boone, NC
Contact:

Re: nba-retrodiction-contest-part-3-the-perfect-blend

Post by bbstats »

Thread summary: Needs more out-of-sample.

EDIT: Although if anyone is in the mood, using advanced stats to predict possessions (rather than to predict RAPM) would seem ideally suited to a box-score metric.
J.E.
Posts: 852
Joined: Fri Apr 15, 2011 8:28 am

Re: nba-retrodiction-contest-part-3-the-perfect-blend

Post by J.E. »

@DSmok1: if you want, I can supply you with 1 year ('02), 2 year ('02+'03), 3 year ('02+'03+'04) (and so on) RAPM. You could estimate coefficients for each X-year RAPM and those can then be used to make predictions in left out years.
DSMok1
Posts: 1119
Joined: Thu Apr 14, 2011 11:18 pm
Location: Maine
Contact:

Re: nba-retrodiction-contest-part-3-the-perfect-blend

Post by DSMok1 »

J.E. wrote:@DSmok1: if you want, I can supply you with 1 year ('02), 2 year ('02+'03), 3 year ('02+'03+'04) (and so on) RAPM. You could estimate coefficients for each X-year RAPM and those can then be used to make predictions in left out years.
Sounds great!

I'd like to see 02-06 and compare with 07-11, and also have even/odd splits (02-04-06-08 vs. 03-05-07-09). Smaller groupings (like 3 year samples) would also be useful. I could also approximate RAPM's std errors, by looking at the R^2 on my regression--it should by definition go up or at least stay the same with fewer-year samples; the only reason it wouldn't would be from higher error on the RAPM side.

Remember, these have to be equally weighted.

Let me know when/if you can get to putting some of these up online. Thanks!
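The even/odd split described above can be sketched as leave-years-out validation. The data here are random stand-ins for box-score stats and a RAPM-like target (coefficients and sizes are arbitrary assumptions), but the mechanics match the proposal: fit coefficients on even years, test on odd years, and vice versa.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic stand-in: 10 years, 100 players, 4 box-score stats,
# with a RAPM-like target driven by a fixed (invented) coefficient vector.
n_years, n_players, n_stats = 10, 100, 4
X = rng.normal(size=(n_years, n_players, n_stats))
beta = np.array([1.5, -0.8, 0.6, 0.3])
rapm = X @ beta + rng.normal(scale=1.0, size=(n_years, n_players))

def fit_and_test(train_years, test_years):
    """OLS fit on the training years, RMSE on the held-out years."""
    X_tr = X[train_years].reshape(-1, n_stats)
    y_tr = rapm[train_years].reshape(-1)
    coef, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)
    X_te = X[test_years].reshape(-1, n_stats)
    y_te = rapm[test_years].reshape(-1)
    return float(np.sqrt(np.mean((X_te @ coef - y_te) ** 2)))

even = list(range(0, n_years, 2))  # years 0, 2, 4, 6, 8
odd = list(range(1, n_years, 2))   # years 1, 3, 5, 7, 9
print(round(fit_and_test(even, odd), 3), round(fit_and_test(odd, even), 3))
```

Comparing the two held-out RMSEs (and the drop against smaller training samples) is one way to get at the "higher error on the RAPM side" question raised above.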
Developer of Box Plus/Minus
APBRmetrics Forum Administrator
Twitter.com/DSMok1
Crow
Posts: 10623
Joined: Thu Apr 14, 2011 11:10 pm

Re: nba-retrodiction-contest-part-3-the-perfect-blend

Post by Crow »

An actual summary of Alex's articles and the commentary there and here would be handy, if anyone is willing to attempt it: what the experts now agree on, and what is still subject to contention or differences of perspective. It is a lot to digest. Which metrics come out best on explanatory and predictive power, and overall on the best and fairest tests? Do the best blends for each test still trump any single metric? Amid all the discussion of detail, it might help to plainly state these things (again).