Guy wrote: Agreed, it's huge: out-of-sample testing is way more valid
I don't see it. There are only two possibilities here:
1) The relationship between boxscore stats and real value is quite constant over time. In this case, using the "future" data to estimate coefficients won't matter.
OR
2) The relationship between the stats and value changes over time. And in that case, using a 10-year average for the ASPM coefficients isn't much of an advantage. What would be much better then is to predict a given year using a model based on the prior 3-4 years.
So either way, I don't see how ASPM has any big unfair advantage here. Certainly nothing approaching RAPM's. (If we were talking about customized annual coefficients, based on the year being predicted, then I could see a problem.)
You've left out a third possibility: white noise, i.e. random error. Suppose your situation (1) holds. The true relationship can be constant, but the data are ALWAYS subject to randomness, so our estimates of the true values WILL fluctuate from season to season. Guaranteed.
And even if we did have 100% correct estimates, they would not perfectly predict next season's stats, because there'll be white noise next season too.
By improperly using next season's stats in our prediction for next season, our predictions get an unfair advantage: next season's white noise gets incorporated into the estimates, resulting in estimates which do a better-than-truly-possible job of predicting next season's stats.
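To make that concrete, here is a rough simulation sketch of situation (1). Everything in it is an illustrative assumption and has nothing to do with the actual ASPM or RAPM data: a single made-up "boxscore stat" x, a constant true coefficient `true_b`, Gaussian noise, and hypothetical function names (`fit_slope`, `mse`, `simulate`). It fits the coefficient once on past seasons only and once on past seasons plus the season being predicted, then scores both fits on that target season.

```python
import random

def fit_slope(xs, ys):
    """Least-squares slope for y = b * x (no intercept)."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / sxx

def mse(b, xs, ys):
    """Mean squared prediction error of slope b on the given data."""
    return sum((y - b * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

def simulate(n_trials=2000, n_seasons=10, n_players=30,
             true_b=1.0, noise_sd=1.0, seed=0):
    """Average target-season MSE for a clean fit vs. a fit that peeks.

    All parameter values are made-up assumptions for illustration.
    """
    rng = random.Random(seed)
    clean_total = 0.0
    leaky_total = 0.0
    for _ in range(n_trials):
        # The true relationship is constant; only the noise differs by season.
        seasons = []
        for _ in range(n_seasons):
            xs = [rng.gauss(0, 1) for _ in range(n_players)]
            ys = [true_b * x + rng.gauss(0, noise_sd) for x in xs]
            seasons.append((xs, ys))
        target_xs, target_ys = seasons[-1]
        past_xs = [x for xs, _ in seasons[:-1] for x in xs]
        past_ys = [y for _, ys in seasons[:-1] for y in ys]
        # Clean: coefficient estimated from earlier seasons only.
        b_clean = fit_slope(past_xs, past_ys)
        # Leaky: the target season's data is included in the fit.
        b_leaky = fit_slope(past_xs + target_xs, past_ys + target_ys)
        clean_total += mse(b_clean, target_xs, target_ys)
        leaky_total += mse(b_leaky, target_xs, target_ys)
    return clean_total / n_trials, leaky_total / n_trials

clean_mse, leaky_mse = simulate()
print(f"fit on past seasons only:      {clean_mse:.4f}")
print(f"fit includes target season:    {leaky_mse:.4f}")
```

In this one-coefficient setup the pooled slope is a weighted average of the past-only slope and the target season's own best-fit slope, so the fit that peeks can never score worse on the target season. How big the gap is depends entirely on the assumptions: it shrinks as more seasons are pooled, and I would generally expect it to grow when there are more coefficients relative to the amount of data.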
Situation (2) is somewhat different, but similar principles apply. This is universal in statistics: forecast (out-of-sample) errors are greater than estimation (in-sample) errors.
(On average, obviously. Sometimes by blind luck somebody's forecast turns out to be 100% correct. The part where Berri is correct is in saying that we cannot therefore jump to the conclusion that that is the best model. Blind squirrels, acorns, etc. Every time there's a stock market crash, there'll be dozens of people who'll whip out their investment newsletters showing that they'd predicted that crash months or even years before. But get enough blind squirrels, and the acorn will be found -- to bring in another animal analogy, it's like those monkeys typing Shakespeare. Those investment advisors are unlikely to be able to predict the next crash, and the lucky squirrel is unlikely to lead you to the next acorn.)
OTOH, although Berri is correct that we should not blindly accept the model with the best predictions, model accuracy is, needless to say, of very high importance.