nba-retrodiction-contest-part-3-the-perfect-blend
Posted: Sat Jan 28, 2012 3:39 am
by Crow
http://sportskeptic.wordpress.com/2012/ ... ect-blend/
"As you can see, the explanatory blend does a better job of explaining what happened than any individual metric. Similarly, the predictive blend does a better job at predicting than any individual metric. But they rely on different combinations; explaining what happened is mostly Wins Produced and ASPM whereas predicting what will happen is a fairly equal blend of four metrics contrasted against a fifth; only PER gets left out."
I suggested Alex pursue this line of thought in the past. I am glad he did, whether I influenced him much or not.
"ASPM appears to do a really good job overall. It describes what happened well, it predicts the next season well, and it contributes a good amount to both of the best blends."
DSMok1's performance with ASPM may be related to his self-diagnosed issue with player-minute assumptions, and/or it might be an aberration.
"Comparing new WP to old, we see that it explains what happened slightly worse but makes predictions slightly better." I guess defensive rebounding is not that pivotal...
at team level. Could still be at player level.
Re: nba-retrodiction-contest-part-3-the-perfect-blend
Posted: Sat Jan 28, 2012 3:54 am
by J.E.
There are so many things wrong with that analysis, though. I wouldn't trust anything that comes out of it.
Re: nba-retrodiction-contest-part-3-the-perfect-blend
Posted: Sat Jan 28, 2012 4:04 am
by Crow
Could you share a few of those concerns? I'd appreciate your review.
Re: nba-retrodiction-contest-part-3-the-perfect-blend
Posted: Sat Jan 28, 2012 2:02 pm
by J.E.
here's what I don't like:
-where did he get "old" RAPM for 2000 and 2001 from? I'm 99.9% sure it's never been published. There's no PBP data for those years, so where do they come from? A table that lists things that never existed is not a good start for a blog post that is supposed to give us meaningful numbers.
-in the same vein, how come he doesn't have new RAPM for 2002-2006? It's been there for months. If he got RAPM before 2002-2006 was added, that probably means he's using an older version of prior informed RAPM
-He compares metrics according to their average error, but doesn't care if the average was built over a different set of seasons (what is this..)
-Comparing metrics to predict end-of-season point differential is like asking the weather man how many times it will rain in the next week. What we care about is what day it will rain on. If two metrics agree on predicted point differential for a specific team, it might be for entirely different reasons. I guess it's OK to not go down to possession level for this, but game level would be nice.
-I would have tried to see what regression to the mean can do for each metric (a quick sketch of what I mean follows this list).
-last but not least: The point of retrodiction is to not use data from the "future". Once you do, it's not retrodiction anymore. As far as I see, ASPM uses 8 year RAPM (from 2002 to 2010 I guess) to build a player metric, then it gets used to "predict" years from 2002 to 2010. How is that not using future information?
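For clarity, by "regression to the mean" I mean something as simple as shrinking each metric's player values toward the league average before predicting, then picking the shrinkage that minimizes next-season error. A minimal sketch in Python (generic, not tied to any particular metric; the numbers are made up):

import numpy as np

def shrink(values, weight):
    """Regress player values toward the league mean."""
    values = np.asarray(values, float)
    return weight * values + (1 - weight) * values.mean()

# Hypothetical: last season's ratings vs. what actually happened next season.
last = np.array([4.0, 1.0, -2.0])
actual = np.array([2.5, 0.5, -1.0])

# Pick the shrinkage weight that minimizes the mean absolute error.
best = min(np.linspace(0, 1, 21),
           key=lambda w: np.mean(np.abs(shrink(last, w) - actual)))
print(best)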
Further, aside from the actual results and how he got there, he makes lots of (not so important) statements that are flat-out wrong. Sometimes he contradicts his own wrong statement in the very next sentence. That makes it hard to read.
Re: nba-retrodiction-contest-part-3-the-perfect-blend
Posted: Sat Jan 28, 2012 2:38 pm
by DSMok1
J.E. wrote:here's what I don't like:
-where did he get "old" RAPM for 2000 and 2001 from? I'm 99.9% sure it's never been published. There's no PBP data for those years, so where do they come from? A table that lists things that never existed is not a good start for a blog post that is supposed to give us meaningful numbers.
Valid--ask him.
J.E. wrote:-in the same vein, how come he doesn't have new RAPM for 2002-2006? It's been there for months. If he got RAPM before 2002-2006 was added, that probably means he's using an older version of prior informed RAPM
Valid question
J.E. wrote:-He compares metrics according to their average error, but doesn't care if the average was built over a different set of seasons (what is this..)
That is a definite issue.
J.E. wrote:-Comparing metrics to predict end-of-season point differential is like asking the weather man how many times it will rain in the next week. What we care about is what day it will rain on. If two metrics agree on predicted point differential for a specific team, it might be for entirely different reasons. I guess it's OK to not go down to possession level for this, but game level would be nice.
This was meant as a first pass, a very basic comparison. Certainly, the measure of "explaining what happened" was not a particularly worthwhile thing to evaluate.
J.E. wrote:-I would have tried to see what regression to the mean can do for each metric.
Again, this was a first pass. I know for sure ASPM does much better with regression to the mean.
J.E. wrote:-last but not least: The point of retrodiction is to not use data from the "future". Once you do, it's not retrodiction anymore. As far as I see, ASPM uses 8 year RAPM (from 2002 to 2010 I guess) to build a player metric, then it gets used to "predict" years from 2002 to 2010. How is that not using future information?
I don't believe that is much of an issue--I don't think that would make much difference in whether the metric is accurate or not.
J.E. wrote:Further, aside from the actual results and how he got there, he makes lots of (not so important) statements that are flat-out wrong. Sometimes he contradicts his own wrong statement in the very next sentence. That makes it hard to read.
That's not particularly kind, J.E. I think it's a good starting point; we can build on and improve this baseline evaluation.
Re: nba-retrodiction-contest-part-3-the-perfect-blend
Posted: Sat Jan 28, 2012 2:44 pm
by mystic
I agree, J.E., there are a lot of things that aren't correct. I made a comment about some of them in part 2.
If we use, for example, the last 4 years only:
ASPM: 2.52
newRAPM: 2.62
WS: 2.73
newWP: 2.84
oldWP: 2.98
PER: 3.21
oldRAPM: 3.21
As we can see, newRAPM (whatever that really is) is second overall. When I make that test with a new version of my SPM, I get 2.60 as the average error for the last 4 years, but 2.32 for the last 10. An ensemble of randomly picked numbers gives me around 5 as the average error; the prior-season team performance gives 3.48 for the last 4 years and 3.05 for the 10 years.
Taking all that into account, we can very well expect that RAPM would perform better for the other 6 years as well, which makes Alex's conclusions just wrong.
Let us take the 4-year baseline of 3.48 and the 10-year baseline of 3.05 as a way to adjust for the different timespans. Meaning: ASPM*3.05/3.48 = 2.21; WS would be 2.39, newWP would be 2.49, my SPM would be 2.28, and RAPM would be 2.30. In reality ASPM has 2.24, WS has 2.37, my SPM has 2.32, while newWP has 2.63.
Well, we are pretty close for WS, ASPM and SPM, but farther away for newWP (which can be used as an argument that WP is actually not really stable). Now we can pretty much say that RAPM is at least as useful for predictions as ASPM, WS, or my SPM. It makes much more sense to arrive at my conclusion than to say that RAPM is as good as WP.
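In code, the adjustment is just a rescaling by the baseline ratio. A quick sketch with the average errors quoted above (newRAPM has no 10-year actual here):

# Prior-season team performance as the baseline: 3.48 over the last
# 4 years, 3.05 over the last 10.
scale = 3.05 / 3.48

err_4yr = {"ASPM": 2.52, "WS": 2.73, "newWP": 2.84, "SPM": 2.60, "newRAPM": 2.62}
err_10yr_actual = {"ASPM": 2.24, "WS": 2.37, "newWP": 2.63, "SPM": 2.32}

for metric, err in err_4yr.items():
    est = err * scale  # estimated 10-year error from the 4-year error
    actual = err_10yr_actual.get(metric)
    line = f"{metric}: estimated {est:.2f}"
    if actual is not None:
        line += f", actual {actual:.2f}"
    print(line)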
In addition to that, he should have used an estimation of the rookie performances, based on a model via draft position or age or whatever, instead of using the real values. The thing is, when a team uses a lot of minutes on rookies, the boxscore-based metrics will look better than they really are, because a lot of the used minutes are already from the tested season. That is not exactly an out-of-sample test. It would make such a test more complicated, but I think everyone can at least quickly build a model to predict rookie performance from draft position.
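As a minimal sketch of such a rookie model (the training numbers here are invented; a real version would be fit on actual past rookies):

import numpy as np

# Invented training data: draft pick and rookie-season metric value.
picks = np.array([1, 3, 7, 15, 24, 31, 45, 58])
values = np.array([2.0, 1.1, 0.2, -0.9, -1.6, -2.1, -2.6, -3.0])

# Fit value as a linear function of log(pick); late picks flatten out.
slope, intercept = np.polyfit(np.log(picks), values, 1)

def predict_rookie(pick):
    """Predicted metric value for a rookie drafted at `pick`."""
    return slope * np.log(pick) + intercept

print(predict_rookie(10))  # projection for a 10th pick, instead of his real value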
Another issue is that RAPM is adjusted for the strength of the opponents. The teams do not play the same SOS, and that will influence the results. My metric, for example, goes down to 2.14 in average error against SRS instead of MOV. I suspect a similar thing for RAPM, while I suspect the other metrics would keep a similar error to what they have now or even get slightly worse.
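(For reference: SRS is MOV adjusted for strength of schedule, solved iteratively. A toy sketch with a made-up three-team league:)

import numpy as np

# Hypothetical league: mov[i] is team i's average margin of victory,
# opponents[i] lists the teams on team i's schedule.
mov = np.array([4.0, -1.0, -3.0])
opponents = [[1, 2], [0, 2], [0, 1]]

# SRS_i = MOV_i + average SRS of opponents; iterate to a fixed point.
srs = mov.copy()
for _ in range(100):
    srs = mov + np.array([srs[opp].mean() for opp in opponents])
    srs -= srs.mean()  # keep ratings zero-sum for numerical stability
print(srs)  # each team's schedule-adjusted rating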
Maybe we can set up a bigger test, a test in which multiple people participate. Well, RAPM is imho only really useful from 2003/04 on, because the 2002 dataset is smaller. So everyone would be asked to prepare a dataset from 2003/04 on, using just prior seasons' information and predicted values for rookies. We could even divide such a test into a version with multiple seasons used and one with only a single season used. We might as well learn something about the influence of injuries, etc. on the predictive power. It might also give us a better clue about how to use the metrics as predictive tools.
Re: nba-retrodiction-contest-part-3-the-perfect-blend
Posted: Sat Jan 28, 2012 2:57 pm
by J.E.
J.E. wrote:-Comparing metrics to predict end-of-season point differential is like asking the weather man how many times it will rain in the next week. What we care about is what day it will rain on. If two metrics agree on predicted point differential for a specific team, it might be for entirely different reasons. I guess it's OK to not go down to possession level for this, but game level would be nice.
This was meant as a first pass, a very basic comparison. Certainly, the measure of "explaining what happened" was not a particularly worthwhile thing to evaluate.
If it's meant as a first pass, then why does he already make definite statements concerning metric accuracy?
J.E. wrote:-I would have tried to see what regression to the mean can do for each metric.
Again, this was a first pass. I know for sure ASPM does much better with regression to the mean.
Everyone will do better, I'm sure. I'd guess WP would benefit the most from this.
J.E. wrote:-last but not least: The point of retrodiction is to not use data from the "future". Once you do, it's not retrodiction anymore. As far as I see, ASPM uses 8 year RAPM (from 2002 to 2010 I guess) to build a player metric, then it gets used to "predict" years from 2002 to 2010. How is that not using future information?
I don't believe that is much of an issue--I don't think that would make much difference in whether the metric is accurate or not.
That's what you say, but how do we know for sure? I'm not knocking ASPM as a metric. I know it performed well in last year's retrodiction contest without using future data. But here, it violates the one principle of retrodiction by using future information, and that should instantly disqualify the metric. I don't see how that's a point of discussion.
Re: nba-retrodiction-contest-part-3-the-perfect-blend
Posted: Sat Jan 28, 2012 3:01 pm
by mystic
J.E. wrote:
That's what you say, but how do we know for sure? I'm not knocking ASPM as a metric. I know it performed well in last year's retrodiction contest without using future data. But here, it violates the one principle of retrodiction by using future information, and that should instantly disqualify the metric. I don't see how that's a point of discussion.
I agree. ASPM should use a dataset from probably 2002 to 2006 to make predictions for 2007 to 2011 instead. We can very well assume that a different dataset will change the values.
Re: nba-retrodiction-contest-part-3-the-perfect-blend
Posted: Sat Jan 28, 2012 3:04 pm
by J.E.
mystic wrote:Additional to that he should have used an estimation of the rookie performances based on either a model via draft position or age or whatever instead of using the real values. The thing is, when a team is using a lot of minutes for rookies, the boxscore-based metrics will look better than they really are, because a lot of the used minutes are already from the tested season. That is not exactly an out-of-sample test. It would make such a test more complicated, but I think everyone can at least quickly build model to make a predicion of the rookie performance based on draft positions.
I don't think he did that. I believe I read that he gave all rookies a -1.92. If he did what you said, that would be horrible (I'm somewhat sure he didn't, though).
Another issue is that RAPM is adjusted for the strength of the opponents. The teams do not play the same SOS, and that will influence the results. My metric, for example, goes down to 2.14 in average error against SRS instead of MOV. I suspect a similar thing for RAPM, while I suspect the other metrics would keep a similar error to what they have now or even get slightly worse.
Valid point. That would be a non-issue if he did predictions at the game level.
Re: nba-retrodiction-contest-part-3-the-perfect-blend
Posted: Sat Jan 28, 2012 3:13 pm
by mystic
J.E. wrote:I don't think he did that. I believe I read that he gave all rookies a -1.92. If he did what you said, that would be horrible (I'm somewhat sure he didn't, though).
The method here is the same, except I’m only going to look at team point differential (not wins) and any player with fewer than 100 minutes played the previous season are granted their production for that season. This avoids any issues with rookies. It also makes the predictions more accurate overall, but that shouldn’t give any particular metric an advantage over the others.
http://sportskeptic.wordpress.com/2012/ ... -happened/
Maybe I misunderstood that part, but to me that means that every player with fewer than 100 minutes is assigned the value of the tested season. And unlike Alex, I think that this is a distinct advantage for a metric like WP, which would likely end up similar to PER if that were taken out.
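In other words, as I read it, the rule amounts to something like this (my interpretation, not Alex's actual code):

def projected_value(minutes_prev, value_prev, value_tested):
    # Use last season's metric value, unless the player played fewer than
    # 100 minutes last season -- then he is granted his value from the
    # tested season itself, which leaks in-sample information.
    if minutes_prev < 100:
        return value_tested
    return value_prev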
Where did you get the -1.92 from?
Re: nba-retrodiction-contest-part-3-the-perfect-blend
Posted: Sat Jan 28, 2012 3:45 pm
by J.E.
mystic wrote:J.E. wrote:I don't think he did that. I believe I read that he gave all rookies a -1.92. If he did what you said, that would be horrible (I'm somewhat sure he didn't, though).
The method here is the same, except I’m only going to look at team point differential (not wins) and any player with fewer than 100 minutes played the previous season are granted their production for that season. This avoids any issues with rookies. It also makes the predictions more accurate overall, but that shouldn’t give any particular metric an advantage over the others.
http://sportskeptic.wordpress.com/2012/ ... -happened/
Maybe I misunderstood that part, but to me that means that every player with fewer than 100 minutes is assigned the value of the tested season. And unlike Alex, I think that this is a distinct advantage for a metric like WP, which would likely end up similar to PER if that were taken out.
Where did you get the -1.92 from?
I first read http://sportskeptic.wordpress.com/2011/ ... he-method/ and assumed the fix for low-minute players was just for the "explaining what happened" post. Seems I was wrong. Add that to the list of problems.
Re: nba-retrodiction-contest-part-3-the-perfect-blend
Posted: Sat Jan 28, 2012 4:41 pm
by DSMok1
mystic wrote:J.E. wrote:
That's what you say, but how do we know for sure? I'm not knocking ASPM as a metric. I know it performed well in last year's retrodiction contest without using future data. But here, it violates the one principle of retrodiction by using future information, and that should instantly disqualify the metric. I don't see how that's a point of discussion.
I agree. ASPM should use a dataset from probably 2002 to 2006 to make predictions for 2007 to 2011 instead. We can very well assume that a different dataset will change the values.
We can do it on all years from 1978 to 2002--Alex has that full data set; it's available online here:
https://docs.google.com/leaf?id=0Bx1NfC ... x&hl=en_US
Re: nba-retrodiction-contest-part-3-the-perfect-blend
Posted: Sat Jan 28, 2012 6:09 pm
by Guy
I'm not knocking ASPM as a metric. I know it performed well in last year's retrodiction contest without using future data. But here, it violates the one principle of retrodiction by using future information, and that should instantly disqualify the metric. I don't see how that's a point of discussion.
While it would be better to test ASPM developed on earlier years, this seems like a rather minor complaint. Unless I'm misunderstanding Daniel's method, the "future" data is not player-specific, but is only used to refine universally-applied values for the boxscore stats. I would expect that to produce only a tiny advantage vs. a totally pure out-of-sample test. And if we are concerned about unlevel playing fields, it seems to me that RAPM's use of multiple years of performance data is a bigger problem. When being compared to RAPM, shouldn't other metrics be evaluated based on a weighted multi-year average?
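To make that concrete, a weighted multi-year average could be as simple as the following sketch (the decay factor and minutes weighting are illustrative assumptions, not any metric's published method):

import numpy as np

def blended_rating(ratings, minutes, decay=0.7):
    """Multi-year average of a metric, most recent season first, weighted
    by minutes played and a geometric decay on older seasons."""
    ratings = np.asarray(ratings, float)
    minutes = np.asarray(minutes, float)
    weights = minutes * decay ** np.arange(len(ratings))
    return np.average(ratings, weights=weights)

# e.g. three seasons of a hypothetical player's metric, most recent first
print(blended_rating([3.1, 1.8, 0.4], [2400, 2100, 900]))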
J.E. has made a number of valid criticisms (and a couple not so valid). But I would hope for a more generous tone. Alex has made his process totally transparent, he is responsive to suggestions for adding metrics, and as best I can tell he is simply trying to figure out the right answer without bias toward/against any particular metric. That all seems deserving of praise, to me.
Re: nba-retrodiction-contest-part-3-the-perfect-blend
Posted: Sat Jan 28, 2012 7:24 pm
by J.E.
Guy wrote:While it would be better to test ASPM developed on earlier years, this seems like a rather minor complaint.
There is a zero percent chance that ASPM performs just as well or better with no future data. I'd rather see it tested without future information instead of assuming that it's "minor" or "not much of an issue". Nobody knows for sure, and unless you do know, you cannot exactly use it to make definite statements about its prediction performance, at least not in that timeframe.
And if we are concerned about unlevel playing fields, it seems to me that RAPM's use of multiple years of performance data is a bigger problem. When being compared to RAPM, shouldn't other metrics be evaluated based on a weighted multi-year average?
I'm not exactly forcing people to use just one year. Everyone is free to use as many years as they want.
J.E. has made a number of valid criticisms (and a couple not so valid). But I would hope for a more generous tone. Alex has made his process totally transparent, he is responsive to suggestions for adding metrics, and as best I can tell he is simply trying to figure out the right answer without bias toward/against any particular metric. That all seems deserving of praise, to me.
His tone in answering many of the comments on his site isn't exactly kind, either. I also don't get the feeling that he's making a sincere effort to find the best metric, as evidenced by his design decisions and his sketchy interpretation of the results in favor of WP, coupled with statements like "PER and old RAPM aren’t even within a point, which is a pretty poor showing" and "It also serves as a strike against old RAPM and PER, and to a lesser extent new RAPM and APM" (and so on).
Re: nba-retrodiction-contest-part-3-the-perfect-blend
Posted: Sun Jan 29, 2012 7:08 am
by mtamada
J.E. wrote:Guy wrote:While it would be better to test ASPM developed on earlier years, this seems like a rather minor complaint.
There is a zero percent chance that ASPM performs just as well or better with no future data. I'd rather see it tested without future information instead of assuming that it's "minor" or "not much of an issue". Nobody knows for sure, and unless you do know, you cannot exactly use it to make definite statements about its prediction performance, at least not in that timeframe.
Agreed, it's huge: out-of-sample testing is way more valid -- and almost always has substantially larger error terms -- than using the same sample to estimate the parameters and measure the errors. Especially if the model makes use of various fudge factors, normalization, seasonal adjustments, etc. Those can be smart things to use to better describe what happened during a season, but when it comes to true prediction or even true retrodiction, you don't get to use next year's results when coming up with predictions of next year's results. (The fact that that last sentence is hard to parse illustrates the inherent nonsensical circularity of the procedure).
Don't get me wrong, in statistics we use in-sample measures of errors and goodness-of-fit all the time. But (if we're wise) we recognize the inherent limitations of such results: you can come up with a model which "explains" what happened in the data really well, but it might not work for figuring out what to do next year. (Just ask the economists who were using macroeconomic models in 2006 and failed to see the size of the upcoming Great Recession. Or Wayne Winston after he disparaged Kevin Durant in 2008, his rookie season.)
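A toy illustration of that gap, with no connection to any of the metrics discussed here: fit a regression on one half of some noisy data and score it on the other half; the out-of-sample error comes out larger than the in-sample error.

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(60, 10))                      # toy predictors
y = x @ rng.normal(size=10) + rng.normal(size=60)  # toy outcome plus noise

train, test = slice(0, 30), slice(30, 60)
beta, *_ = np.linalg.lstsq(x[train], y[train], rcond=None)

in_err = np.mean(np.abs(x[train] @ beta - y[train]))
out_err = np.mean(np.abs(x[test] @ beta - y[test]))
print(f"in-sample error: {in_err:.2f}  out-of-sample error: {out_err:.2f}")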