Effect of Roster Turnover on Team Win Predictions
Posted: Tue Aug 23, 2016 3:23 pm
Here in the APBR prediction contests, the goal is to predict team wins, and everyone is encouraged to use everything at their disposal to win this friendly contest. Generally, each entry combines many prediction components: player evaluation metrics, rookie evaluation models, SOS, minutes projections, HCA and B2B effects, regression to the mean, and sometimes plain subjectivity.
But we sometimes make the mistake of thinking the winning entries are the ones that value and predict player performance better than the rest, regardless of team context. Some years that's true; some years it isn't. If one thing is certain, though, it's that the winning entries are better than the others at team-level prediction.
For example, in the last two years, AJbasket's PT-PM and RPM blend won the contest. That has given us hope that metrics based on PT data have the potential to be better than other box-score and RAPM variants. Is that really true, or is there more to this story?
I have always said that the best metric for evaluating player value should be the one whose out-of-sample predictions at a 100% roster turnover rate beat everyone else's. When predicting team wins, though, given that average roster continuity in the league is roughly 65%, the entries that capture team-level performance better tend to do better in the APBR prediction contests.
So, how can we tell which entries are better at predicting individual player performance regardless of team context? Since we'd have only a very small sample if we considered just the teams getting 100% of their minutes from new players, I'm simply going to weight each team by its roster turnover rate when calculating RMSE and MAE.
The tricky part is deciding how much weight to give roster turnover. In theory, we should give as much weight as needed to spread out the entries and separate those that are better at capturing individual player value. To decide, I computed the following:
Roster Turnover Rate = (Minutes From New Players / Total Team Minutes) × 100 (you can find a nice chart of roster continuity at http://www.basketball-reference.com/friv/continuity.cgi)
W-RMSE: roster-turnover-weighted RMSE
W^2-RMSE: squared-roster-turnover-weighted RMSE
W^4-RMSE: 4th-power roster-turnover-weighted RMSE
W^8-RMSE: 8th-power roster-turnover-weighted RMSE
W-MAE: roster-turnover-weighted MAE
W^2-MAE: squared-roster-turnover-weighted MAE
W^4-MAE: 4th-power roster-turnover-weighted MAE
W^8-MAE: 8th-power roster-turnover-weighted MAE
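For anyone who wants to reproduce this, here's a minimal sketch of the calculation. The function name and data layout are mine, not from any contest script; the idea is just that each team's squared/absolute error is weighted by its turnover rate raised to the chosen power.

```python
import math

def weighted_errors(predicted, actual, turnover, power=1):
    """Turnover-weighted RMSE and MAE for one entry.

    predicted, actual: per-team win totals.
    turnover: per-team roster turnover rate (0-100), used as the weight
    after being raised to `power` (1, 2, 4 or 8 for W, W^2, W^4, W^8).
    """
    weights = [t ** power for t in turnover]
    total = sum(weights)
    errors = [p - a for p, a in zip(predicted, actual)]
    w_mae = sum(w * abs(e) for w, e in zip(weights, errors)) / total
    w_rmse = math.sqrt(sum(w * e * e for w, e in zip(weights, errors)) / total)
    return w_rmse, w_mae
```

With power=0 every team gets equal weight and you recover plain RMSE and MAE, which is a handy sanity check.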
I calculated the above for each entry in the 2014-15 and 2015-16 prediction contests. To get a better sense of how roster turnover affected the entries, I made RMSE and MAE charts for each season. Since there are lots of entries, the images are fairly big; you'll need to click on them (and perhaps sharp eyes) to see what's going on.

2014-15 MAE

2014-15 RMSE

2015-16 MAE

2015-16 RMSE

AVERAGE ENTRY'S MAE

AVERAGE ENTRY'S RMSE

If you look at the charts, you can quickly see that the gap between the better and worse entries grows as the weight increases. On average, the entries that are better at predicting wins for high-turnover teams either start to catch the best entries or surpass them and pull well ahead.
However, this hits a wall at W^8, where most entries improve (the average-entry charts are especially handy here). The reason is simple enough: at that power, the predictions for the few teams with the highest turnover rates, such as PHI, are weighted immensely (effectively a tiny sample), and those teams are generally rebuilding, acquiring rookies by trading for draft picks and signing, at best, average players in free agency. Such teams are known to win only a handful of games, and APBR contest participants are well aware of how rebuilding teams fare, along with the value of rookies and replacement-level players. So at W^8, entries get pseudo-improvements, making it harder to attribute the improvement to better team-independent player evaluation.
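To see why W^8 hits a wall, here's a toy illustration of how quickly the weight concentrates on a single high-turnover team as the power grows. The turnover rates below are hypothetical (one PHI-like outlier at 80%), not the actual league figures:

```python
# Hypothetical turnover rates for five teams, with one extreme outlier.
rates = [80, 50, 40, 30, 20]

for power in (1, 2, 4, 8):
    weights = [r ** power for r in rates]
    share = max(weights) / sum(weights)
    print(f"W^{power}: top team carries {share:.0%} of the total weight")
```

With these numbers, the outlier team goes from about a third of the total weight at W^1 to nearly all of it at W^8, so the "sample" effectively shrinks to one or two teams.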
In theory, if I charted single metrics such as PER, BPM, WS, RAPM, etc. over the last 20 years instead of APBR entries, and left the rookies out, this wouldn't be a problem: the metrics would differentiate themselves more and more as the turnover weight grows, and the one best at capturing individual player value would eventually pull ahead. Whether that would require W^8 or W^16 is beside the point (at least until the sample becomes very small).
Coming back to the actual topic: after analyzing the charts and considering the problem with W^8, it's wiser to use W^4 and rank the entries by W^4-RMSE and W^4-MAE. I should mention that I prefer RMSE because it's a more reliable and precise indicator of a model when we're predicting team wins; if we were predicting score margin, MAE might be better. As a side note, I definitely don't believe in Pythagorean wins for NBA games, for a variety of reasons, so I used actual wins.
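The RMSE-vs-MAE preference is easy to illustrate with made-up numbers: two entries can have identical MAE while RMSE separates them, because RMSE punishes one big miss more than several small ones.

```python
import math

def mae(errors):
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Two hypothetical entries: A misses every team by 4 wins;
# B nails three teams but misses one badly.
errors_a = [4, 4, 4, 4]
errors_b = [1, 1, 1, 13]

print(mae(errors_a), mae(errors_b))    # both 4.0
print(rmse(errors_a), rmse(errors_b))  # 4.0 vs ~6.56
```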
Anyway, here are the 2014-15 and 2015-16 APBR contest entries sorted by W^4-RMSE and W^4-MAE.
2014-15 MAE
Org.Position Participant MAE W-MAE W^2-MAE W^4-MAE W^8-MAE Change
-------------- ------------------ ------ ------- --------- --------- --------- --------
2 AJ-PTPM 6.03 6.29 6.41 6.23 4.95 1
7 HoopDon 6.60 6.85 6.90 6.56 5.18 5
11 italia13calcio 6.90 6.88 6.82 6.68 6.22 8
3 Average 6.14 6.46 6.66 6.69 5.69 -1
14 AcrossTheCourt 7.09 7.21 7.17 6.77 5.29 9
5 Crow 6.27 6.64 6.83 6.85 6.05 -1
9 DrPositivity 6.73 6.89 6.99 6.90 6.19 2
10 ESPNFallForecast 6.80 7.00 7.09 6.93 5.70 2
4 FiveThirtyEight 6.20 6.66 6.94 7.04 6.00 -5
6 mystic 6.34 6.80 7.07 7.10 5.78 -4
1 bbstats 6.00 6.56 6.94 7.27 6.85 -10
16 v-zero 7.30 7.48 7.49 7.27 5.94 4
12 Sportsbook 6.95 7.37 7.57 7.59 6.72 -1
8 nbacouchside 6.67 7.17 7.49 7.64 6.88 -6
13 fpliii 6.99 7.45 7.78 7.92 7.21 -2
15 sndesai1 7.17 7.70 7.99 7.96 6.65 -1
17 Bobbofitos 7.93 8.12 8.22 8.27 7.46 0
18 MikeG 8.15 8.69 9.17 9.92 10.03 0
2014-15 RMSE
Org.Position Participant RMSE W-RMSE W^2-RMSE W^4-RMSE W^8-RMSE Change
-------------- ------------------ ------- -------- ---------- ---------- ---------- --------
3 Crow 8.09 8.37 8.58 8.69 8.03 2
1 AJ-PTPM 7.82 8.23 8.54 8.76 7.95 -1
8 HoopDon 8.63 8.91 8.99 8.79 7.61 5
6 italia13calcio 8.40 8.56 8.69 8.85 8.52 2
2 bbstats 8.01 8.33 8.62 8.90 8.39 -3
4 Average 8.16 8.47 8.72 8.93 8.30 -2
11 AcrossTheCourt 8.89 8.93 8.99 8.97 8.20 4
5 FiveThirtyEight 8.16 8.67 9.07 9.37 8.54 -3
9 nbacouchside 8.68 9.06 9.31 9.41 8.61 0
13 ESPNFallForecast 9.09 9.26 9.41 9.52 8.84 3
15 v-zero 9.14 9.32 9.46 9.58 8.87 4
7 mystic 8.40 8.89 9.29 9.64 8.91 -5
14 fpliii 9.11 9.48 9.67 9.65 8.87 1
10 DrPositivity 8.80 9.26 9.58 9.76 9.07 -4
12 Sportsbook 9.06 9.41 9.69 9.97 9.49 -3
16 Bobbofitos 9.64 9.79 9.97 10.20 9.76 0
17 sndesai1 9.70 10.50 10.99 11.27 10.31 0
18 MikeG 10.05 10.52 11.00 11.70 11.64 0
2015-16 MAE
Org Position Participant MAE W-MAE W^2-MAE W^4-MAE W^8-MAE Change
-------------- --------------------------- ------ ------- --------- --------- --------- --------
3 bbstats 5.83 5.79 5.65 5.30 4.74 2
17 AccuScore 6.70 6.35 6.14 5.76 5.29 15
2 kmedved 5.73 5.91 6.02 5.97 5.41 -1
6 tarrazu 5.87 5.87 5.95 6.06 6.13 2
5 nbacouchside 5.87 6.08 6.22 6.27 6.04 0
4 Nylon Calculus Aggregated 5.83 6.18 6.36 6.38 6.04 -2
29 statman 7.33 6.70 6.53 6.45 6.41 22
9 DSmok1 6.00 6.21 6.38 6.60 6.89 1
11 CBS Sportsline 6.40 6.61 6.72 6.67 6.13 2
1 AJbaskets 5.73 6.11 6.35 6.68 7.09 -9
7 caliban 5.93 6.35 6.63 6.81 6.57 -4
20 Sports Illustrated 6.93 6.99 7.01 6.92 6.62 8
8 ampersand5 6.00 6.44 6.73 7.00 7.09 -5
16 Crow 6.67 6.87 6.99 7.07 6.84 2
10 Average 6.27 6.59 6.84 7.08 7.11 -5
12 Five Thirty Eight 6.40 6.76 6.99 7.10 6.89 -4
15 AcrossTheCourt 6.62 6.85 7.04 7.16 7.01 -2
18 fpliii 6.73 6.94 7.10 7.25 7.17 0
19 sndesai1 6.87 6.94 7.08 7.32 7.64 0
13 Jim Jividen 6.47 6.85 7.13 7.43 7.58 -7
25 Yooper 7.20 7.45 7.56 7.49 6.97 4
14 rsmth 6.47 6.85 7.13 7.49 7.76 -8
24 italia13calcio 7.17 7.36 7.54 7.63 7.01 1
30 15 Pyth + Regression 7.66 7.65 7.77 7.71 7.09 6
27 ESPN Fall Forecast 7.23 7.48 7.72 8.07 8.25 2
28 Dr Positivity 7.27 7.62 7.80 8.11 9.09 2
23 LV Westgate 10/26 7.13 7.42 7.75 8.18 8.16 -4
22 Mike G 7.07 7.30 7.70 8.33 8.85 -6
21 BasketDork 7.00 7.22 7.58 8.35 9.37 -8
33 SportsFormulator 8.03 8.17 8.26 8.46 8.90 3
31 nrestifo 7.67 8.20 8.55 8.63 7.66 0
26 numberFire 7.20 7.59 8.04 8.68 9.40 -6
32 EvanZ 7.97 8.15 8.39 8.84 9.33 -1
34 Bleacher Report 8.40 8.68 8.92 9.17 9.01 0
35 tacoman206 8.60 9.11 9.49 9.87 9.69 0
2015-16 RMSE
Org Position Participant RMSE W-RMSE W^2-RMSE W^4-RMSE W^8-RMSE Change
-------------- --------------------------- ------- -------- ---------- ---------- ---------- --------
4 bbstats 7.34 7.31 7.15 6.71 5.85 3
3 kmedved 7.13 7.21 7.26 7.18 6.67 1
5 Nylon Calculus Aggregated 7.37 7.51 7.52 7.35 6.77 2
9 nbacouchside 7.66 7.59 7.55 7.37 6.84 5
27 statman 8.89 7.89 7.61 7.48 7.37 22
1 AJbaskets 7.03 7.27 7.39 7.52 7.64 -5
7 DSmok1 7.58 7.56 7.55 7.53 7.50 0
11 tarrazu 7.84 7.70 7.66 7.59 7.51 3
6 ampersand5 7.38 7.57 7.67 7.74 7.61 -3
2 caliban 7.12 7.42 7.62 7.75 7.48 -8
8 Average 7.62 7.70 7.81 7.88 7.70 -3
23 AccuScore 8.58 8.31 8.22 7.93 7.14 11
15 Crow 8.10 8.10 8.08 7.99 7.60 2
17 AcrossTheCourt 8.31 8.25 8.25 8.11 7.66 3
14 Five Thirty Eight 8.06 8.22 8.30 8.19 7.70 -1
12 Jim Jividen 7.92 8.02 8.14 8.25 8.18 -4
16 fpliii 8.21 8.26 8.32 8.40 8.22 -1
18 Sports Illustrated 8.34 8.48 8.54 8.47 8.08 0
10 rsmth 7.73 8.08 8.35 8.67 8.75 -9
13 CBS Sportsline 8.02 8.41 8.62 8.68 8.18 -7
19 sndesai1 8.54 8.59 8.66 8.71 8.60 -2
21 Yooper 8.58 8.76 8.83 8.74 8.32 -1
20 ESPN Fall Forecast 8.56 8.63 8.79 9.06 9.14 -3
22 BasketDork 8.58 8.61 8.78 9.24 9.85 -2
28 SportsFormulator 9.12 9.20 9.28 9.50 9.95 3
24 Mike G 8.63 8.59 8.93 9.54 10.01 -2
26 LV Westgate 10/26 8.74 8.91 9.18 9.62 9.81 -1
34 15 Pyth + Regression 9.73 9.68 9.77 9.67 9.02 6
25 numberFire 8.73 9.00 9.37 9.88 10.40 -4
30 EvanZ 9.28 9.39 9.55 9.89 10.13 0
29 italia13calcio 9.27 9.62 9.94 10.22 9.87 -2
32 nrestifo 9.47 9.94 10.27 10.47 9.83 0
33 Bleacher Report 9.58 9.92 10.24 10.65 10.69 0
31 Dr Positivity 9.44 10.05 10.36 10.77 11.73 -3
35 tacoman206 10.19 10.78 11.20 11.58 11.38 0
Now, what to make of these results?
1. Looking at the entries' W^4 placements and positional changes, (roster turnover weight)^4 is not enough to fully separate the better player evaluators from the worse ones.
For example, HoopDon in 2014-15, and especially Statman (and to a degree AccuScore) in 2015-16, would most likely have surpassed the other entries with more weight on roster turnover.
2. However, for the reasons above, we can't use more weight. At W^8 things become problematic and noisy, and the sample gets too small. For some entries it might be informative to check how the values change at W^8 or beyond, but it wouldn't be reliable enough to build a list from.
3. Given the first two points, the only way to make a reliable roster-turnover-weighted list for APBR entries is to go with W^4 and then evaluate the positional and value changes to decide which entries are best at team-independent player performance prediction and evaluation.
Also, we should give all participants the benefit of the doubt, since some significant changes may come down to simple randomness and pure luck.
From now on, I'll try to calculate roster-turnover-weighted results for each year's APBR prediction contest.
In the future, I'll run the same kind of test on RAPM, WS, PER, MPG, USG and BPM between 2001-2016, for the 2015 and 2016 seasons. Fortunately, randomness, luck, small samples, and the problems with W^8 won't be an issue for those metrics, and we'll see how roster turnover changes things for them.