OK, prompted by a question someone asked on another site, I went ahead and compared LambdaPM's ability to predict the home team's margin of victory against Vegas. Basically, you can treat the Vegas line as an estimator and see how well the two compare.
Of course, LambdaPM "cheats" a bit in the following two ways:
- (A) When training the LambdaPM algorithm, it uses the full end-of-season box score rather than the league box score as it stood at that point in the season (just because it's a hassle getting the correct league-wide box score as of, say, the Nth game of the season. It isn't impossible to do, just a hassle I didn't want to deal with.) I don't think this is too big a deal, though; you don't expect league-wide per-36-minute stats to change that much.
- (B) When coming up with a prediction for a game, LambdaPM is given access to how many possessions each player plays in that game. But if you think about it, this isn't a dealbreaker either, because it probably isn't hard to predict how many minutes important players will play in each game.
Anyway, I wrote a bit of code to grab all the Vegas game predictions for the 2010-2011 season (the website covers.com has them available).
Then I used each of the techniques I considered in the paper (the home-court-advantage predictor, APM, LambdaPM(R,1), LambdaPM(R,2)) as a rule for placing bets.
Basically, if the technique's prediction differs from Vegas's by at least the cutoff (I tried cutoff=1, 2, 3, 4, 5), then I place a "bet." Since we also know the final margin, we can see whether the bet was a winning one or a losing one.
As an example, the HCA estimate predicts that the home team will win by roughly 3 points. If the Vegas line is -6, this means Vegas is predicting the home team to win by 6 points. If my cutoff is 1, then the 3-point difference between the HCA prediction and Vegas's prediction is big enough for me to place a bet.
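In code, the rule is something like the following sketch (the function name and the sign convention for the line are just for illustration):

```python
def should_bet(pred_margin, vegas_line, cutoff):
    """Bet only if our predicted home margin differs from Vegas's
    implied home margin by at least `cutoff` points. A line of -6
    means the home team is favored by 6, so Vegas's implied home
    margin is -vegas_line."""
    return abs(pred_margin - (-vegas_line)) >= cutoff

# The example above: HCA predicts home by ~3, the line is -6,
# and with cutoff=1 we have |3 - 6| = 3 >= 1, so we bet.
print(should_bet(3.0, -6.0, cutoff=1))  # True
```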
So in short: we train each of these algorithms on the first 820 games of the regular season (and, separately, on the first 410 and the first 205), then see how well they would have done if we'd used them to gamble on the last 410 (subject to caveats A and B above, of course).
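Settling the bets and tallying the winning percentage then looks roughly like this sketch (illustrative names again; I'm also assuming an exact push just refunds the bet, a detail I glossed over above):

```python
def settle_bet(pred_margin, vegas_line, actual_margin):
    """Settle one bet against the spread: bet the home side if the
    model expects the home team to beat Vegas's implied margin, the
    away side otherwise. Returns True/False for win/loss, or None
    on an exact push (assumed to refund the bet)."""
    vegas_margin = -vegas_line
    if actual_margin == vegas_margin:
        return None                             # push: no win, no loss
    bet_home = pred_margin > vegas_margin
    home_covers = actual_margin > vegas_margin
    return bet_home == home_covers

def winning_pct(games, predict, cutoff):
    """Winning percentage over held-out games, where `games` is a
    list of (features, vegas_line, actual_margin) tuples and
    `predict` maps features to a predicted home margin."""
    outcomes = []
    for feats, line, margin in games:
        pred = predict(feats)
        if abs(pred - (-line)) >= cutoff:       # the bet-placement rule above
            result = settle_bet(pred, line, margin)
            if result is not None:
                outcomes.append(result)
    return sum(outcomes) / len(outcomes) if outcomes else float("nan")
```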
Here are the results for training on the first 820 games and evaluating on the last 410:
The green and the blue rows are the most interesting ones. Let's focus on cutoff=5. Setting the cutoff that high means we want a disagreement of at least 5 points between the technique in question and Vegas's estimate before we decide to place a bet. As one sort of expects, the HCA estimate is pretty bad even with this cutoff; your winning percentage is no better than a coin-flipper's. APM, LambdaPM(R,1), and LambdaPM(R,2) all do pretty well, getting winning percentages of 54.1%, 57.1%, and 55.1%, respectively.
Of course, we can't draw too much from this since I'm only evaluating on 410 games...the sample size is too small to say anything conclusive. But it is kind of interesting still, I think.
You can also take a look at training_size=410 and training_size=205 here:
https://spreadsheets.google.com/spreads ... y=CJn-xegG
Take a look at training_size=205 especially (the third sheet). APM, LambdaPM(R,1), and LambdaPM(R,2) do much worse than before, posting winning percentages of 48.5%, 51.7%, and 52.6%, respectively (probably not much better than coin-flipping).
I have two guesses regarding this poor performance:
- 1) Perhaps 205 games is simply too few to build up a good model of the NBA. You could probably improve the performance a lot by incorporating data from the 2009-2010 season. I'm actually a bit curious to see how much this would help.
- 2) It is also possible that the regularization parameters I chose are poor; I just reused the regularization parameters obtained from cross-validation on 820 games. So there might be some improvement from re-estimating them on the smaller training set, as in the sketch below.
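Re-estimating the penalty could look something like this, using scikit-learn's RidgeCV as a stand-in for whatever LambdaPM actually fits, and synthetic filler data in place of the real design matrix:

```python
import numpy as np
from sklearn.linear_model import RidgeCV

# Synthetic stand-in: 205 "games" with 50 player columns; the real
# design matrix would come from the LambdaPM features.
rng = np.random.default_rng(0)
X = rng.normal(size=(205, 50))
y = X @ rng.normal(size=50) + rng.normal(scale=5.0, size=205)

# Pick the ridge penalty by cross-validation on the small training
# set itself, instead of reusing the value tuned on 820 games.
alphas = np.logspace(-2, 4, 25)
model = RidgeCV(alphas=alphas, cv=5).fit(X, y)
print("chosen penalty:", model.alpha_)
```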
I also took a look at some of the statistical properties of Vegas's estimate of the final margin of victory versus LambdaPM's.
The same story seems to be going on...LambdaPM (arguably) outperforms the Vegas line quite handily when it has 820 games to train on, but gets its teeth kicked in with only 205 (see Table #2 and Table #7 of the paper linked on the first page:
https://docs.google.com/viewer?a=v&pid= ... y=CJ6UzpUB).
I think the next step is for me to go back and take a look at this 2009-2010 NBA dataset and figure out a good way of pooling old and new data; one simple option is sketched below. I guess doing this also means I can have something for this retrodiction challenge.
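One simple pooling scheme is to stack both seasons and down-weight the older games via sample weights. Sketched below with placeholder data; the 0.5 weight and alpha value are arbitrary illustrations, not tuned choices:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Placeholder data standing in for two seasons of design matrices.
rng = np.random.default_rng(0)
X_old, y_old = rng.normal(size=(1230, 50)), rng.normal(size=1230)  # full 2009-2010
X_new, y_new = rng.normal(size=(205, 50)), rng.normal(size=205)    # 2010-2011 so far

# Stack the seasons and discount the older one, so stale rosters
# and aging effects count for less than current data.
X = np.vstack([X_old, X_new])
y = np.concatenate([y_old, y_new])
w = np.concatenate([np.full(len(y_old), 0.5), np.ones(len(y_new))])

model = Ridge(alpha=100.0).fit(X, y, sample_weight=w)
```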