Draft projection models

VJL · Post by **VJL** » Mon May 20, 2013 2:28 pm

I built a few draft projection tools that I wanted to share here. They all use the NCAA data available on Basketball-Reference and Draft Express. The dataset starts around 1983, and then starts to approach complete coverage of draft candidates in the mid-90s.

You can view the 1983 to 2012 retrodictions for models 1 and 2 here (https://docs.google.com/spreadsheet/ccc ... SM0E#gid=0)
1983 to 2013 model 3 comparisons here (https://docs.google.com/spreadsheet/ccc ... NTVE#gid=0)
And 2013 predictions for models 1 and 2 here (https://docs.google.com/spreadsheet/ccc ... sp=sharing)

Model 1 (Peak Win Shares):
The goal of the first model is to predict how many win shares a player is expected to produce in his peak season before his 26th birthday. I found each NBA player’s peak pre-26 season in terms of win shares, or, for more recent players, estimated how many win shares they would collect in their age-25 season. I then used mixed-effects models with all of the basic box-score stats, age (in days), position, SOS, SRS, and a couple of interactions as fixed effects and "era" (college seasons in 5-year blocks from 1980-85 to 2010+) as a random effect to predict “peak win shares.” To keep the retrodictions out of sample, I fit a separate model for each player in each year, using all players except that player (‘ego’).
After computing this value for each player in each year I ran an additional regression using “most recent college season prediction”, “2nd most recent college season prediction”, “n’th most recent college season prediction”… to explain observed win share peak. This sets the weights for each college year and allows me to compute one value for the player in the year he was drafted.
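A minimal sketch of the leave-one-out idea described for Model 1, assuming a one-predictor least-squares line in place of the mixed-effects model (the fixed and random effects are not reproduced here). The function names and the toy numbers are hypothetical and only illustrate how each player is held out of the model that predicts him.

```python
# Leave-one-out retrodiction: each player's prediction comes from a model
# fit on every other player, so he never informs his own estimate.
# ASSUMPTION: a one-predictor least-squares line stands in for the
# mixed-effects model from the post; names and numbers are made up.

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def leave_one_out_predictions(players):
    """players: list of (name, college_stat, peak_win_shares) tuples."""
    predictions = {}
    for i, (name, stat, _) in enumerate(players):
        others = players[:i] + players[i + 1:]  # everyone except 'ego'
        intercept, slope = fit_line([p[1] for p in others],
                                    [p[2] for p in others])
        predictions[name] = intercept + slope * stat
    return predictions


# Hypothetical toy data: (player, college composite stat, peak win shares).
toy_players = [
    ("A", 1.0, 2.0),
    ("B", 2.0, 4.1),
    ("C", 3.0, 5.9),
    ("D", 4.0, 8.2),
    ("E", 5.0, 9.8),
]
retrodictions = leave_one_out_predictions(toy_players)
```

The second-stage regression across college seasons would then take these per-season predictions as its inputs; that step is not sketched here.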

Model 2 (Outcome likelihood):
This model attempts to capture the high-variance gambling nature of the draft. Rather than pegging a specific expected production to each player, it gives them percent likelihoods of being a "bust" (0 WS), "bench-warmer" (> 0, < 5 WS), "starter" (> 5, < 10 WS), or "star" (> 10 WS) at their peak performance. This model uses multinomial regression with most of the same predictors as model 1 (though it includes height and weight).
For now this model does not try to account for the information in each college season; it looks only at a player’s final season. I may add something similar to Model 1's season weighting in the near future, however.
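A minimal sketch of the two pieces Model 2 relies on: bucketing peak win shares into the four outcome classes, and the softmax step a multinomial regression uses to turn per-class scores into probabilities. The handling of exactly 5 or 10 win shares is an assumption (the post leaves those boundaries open), and the scores below are hypothetical rather than fitted values.

```python
import math


def outcome_label(peak_ws):
    """Map peak win shares to the four outcome classes from the post.

    ASSUMPTION: values of exactly 5 or 10 go to the higher class, and
    anything at or below zero counts as a bust.
    """
    if peak_ws <= 0:
        return "bust"
    if peak_ws < 5:
        return "bench-warmer"
    if peak_ws < 10:
        return "starter"
    return "star"


def class_probabilities(scores):
    """Softmax: convert one linear score per class into probabilities
    that sum to 1."""
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(value - top) for label, value in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Hypothetical linear scores for one prospect (not fitted values).
example_scores = {"bust": -1.0, "bench-warmer": 0.5, "starter": 1.0, "star": 0.0}
example_probs = class_probabilities(example_scores)
```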

Model 3 (Comparison finder):
This is just a fun little model that helps find past player seasons that are similar to ego’s. All it does is look for the players who minimize the average absolute difference, measured in standard deviations, across a set of statistics. The actual math chosen was a bit arbitrary and someone may conjure a better version, but here is what I am using for now:

Code:

((abs(2P.X - 2P.Y) + abs(2PA.X - 2PA.Y) + abs(3P.X - 3P.Y) + abs(3PA.X - 3PA.Y) + abs(FTA.X - FTA.Y) + abs(FT.X - FT.Y))/3 +
(abs(AST.X - AST.Y) + abs(TOV.X - TOV.Y))/2 +
(abs(STL.X - STL.Y) + abs(BLK.X - BLK.Y))/2 +
abs(TRB.X - TRB.Y) +
abs(PF.X - PF.Y)/4 +
(abs(Height.X - Height.Y) + abs(Weight.X - Weight.Y))/2 +
abs(Age.X - Age.Y) +
(abs(SOS.X - SOS.Y) + abs(SRS.X - SRS.Y))/2
) / 7.25
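A runnable sketch of the comparison formula posted above, assuming every statistic has already been converted to a z-score so that differences are in standard deviations. The grouping and divisors are copied as posted; the function names and the `closest_comps` helper are hypothetical.

```python
# Each entry is (stat names, divisor), grouped exactly as in the posted formula.
# ASSUMPTION: all stats are z-scores (standard deviations from the mean).
# Note: as posted, the group weights sum to 8.25 while the final divisor
# is 7.25; both are kept as written.
GROUPS = [
    (("2P", "2PA", "3P", "3PA", "FTA", "FT"), 3),
    (("AST", "TOV"), 2),
    (("STL", "BLK"), 2),
    (("TRB",), 1),
    (("PF",), 4),
    (("Height", "Weight"), 2),
    (("Age",), 1),
    (("SOS", "SRS"), 2),
]


def comp_distance(x, y):
    """Weighted absolute difference between two player seasons
    (dicts mapping stat name to z-score). Smaller means more similar."""
    total = 0.0
    for stats, divisor in GROUPS:
        total += sum(abs(x[s] - y[s]) for s in stats) / divisor
    return total / 7.25


def closest_comps(ego, candidates, k=5):
    """candidates: dict of name -> stat dict. Returns the k nearest names."""
    ranked = sorted(candidates, key=lambda name: comp_distance(ego, candidates[name]))
    return ranked[:k]
```

Because the distance is a sum of absolute differences, a candidate from a very different SOS/SRS setting can still rank first if his box-score profile is close enough to offset that gap.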

Crow · Post by **Crow** » Mon May 20, 2013 8:17 pm

Hi, thanks for sharing the models / datasets. I will look at them more closely later. I will say right now though that I think offensive winshares are more likely to be reasonably good at predicting future NBA performance, while I think defensive winshares will hardly be useful at all, as defensive winshares is a very crude metric that heavily uses team data (college and pro) and is not very able to capture the full effects of individual play (it captures steals & blocks but not any accompanying potentially helpful or harmful effects of players who try for and get a high rate of steals & blocks or an average or low rate). Might you be interested and willing to re-model player similars for just offense? I would think the similars found would be more accurate comparisons / predictions for that side of the court.

VJL · Post by **VJL** » Mon May 20, 2013 8:28 pm

Crow wrote:
I will say right now though that I think offensive winshares are more likely to be reasonably good at predicting future NBA performance, while I think defensive winshares will hardly be useful at all...

I am not using Win Shares, offensive or defensive, as a predictor. Win Shares (at the NBA level) are only used as the dependent variable. It seems to be the best readily available measure of "goodness" that goes back into the 90s. I have also played with a RAPM "wins above replacement" but the results aren't terribly different.

Crow wrote:
Might you be interested and willing to re-model player similars for just offense? I would think the similars found would be more accurate comparisons / predictions for that side of the court.

I could try it... but I am skeptical that removing steals, blocks, and rebounds will improve the comparison.

Crow · Post by **Crow** » Mon May 20, 2013 8:40 pm

Thanks for the clarification.

Don't feel obliged to try anything you don't think useful. It was just a suggestion based on what I interpreted / thought might be worth saying.

VJL · Post by **VJL** » Mon May 20, 2013 8:52 pm

Crow wrote:
Don't feel obliged to try anything you don't think useful. It was just a suggestion based on what I interpreted / thought might be worth saying.

I appreciate the comments. I apologize if I came off as dismissive; I guess I assumed there was a misunderstanding about how both models worked (due to a poor explanation on my part).

Edit:

Expanding on this... It seemed like you thought defensive winshares is somehow involved in the comparison metric which isn't the case. If you are actually suggesting removing defensive statistics like blocks and steals I would not expect it to improve things. Since the correlation between college and NBA statistics is much stronger for defensive statistics (blocks and rebounds in particular) than for offensive statistics (weak for pretty much all of them except assists, and especially weak for scoring efficiency), I think removing defensive stats would muddy things.

Crow · Post by **Crow** » Mon May 20, 2013 11:11 pm

No apology needed.

"It seemed like you thought defensive winshares is somehow involved in the comparison metric which isn't the case."

In my quick read I wondered; but you have clarified.

"If you are actually suggesting removing defensive statistics like blocks and steals I would not expect it to improve things."

I wouldn't suggest that. I was just noting, again, some of the big weaknesses of "Defensive Rating".

Crow · Post by **Crow** » Tue May 21, 2013 11:03 pm

In re-reading what you initially wrote I see that I was too hasty in trying to give you an opening comment. My apologies for causing the first step of discussion to be cloudy with respect to the role of win shares here.

I still think though that if one uses “win shares”, generically as you did (I believe) or as B-R WinShares following Oliver’s formula, then one is estimating total player impact, and my main and basic point is that I don’t think boxscore stats alone do a very good job of estimating defensive impact because they miss individual impact on shot defense, the largest / most important part of defense. Thus, my suggestion that the model as specified does a better job with offensive predictions than defensive impacts. That was the point I perhaps rushed to make and not so cleanly.

As for the data, here are a few observations:

In 2012 Anthony Davis is estimated to have almost as much chance of being a star as all the other members of that entry class combined. Only 6 other players given more than a 5% chance to be a star.

John Wall tied for the highest chance of being a star in the 2010 class, but with less than 40% of the chance given to Davis.

Among first rounders of the last 3 drafts:

Eric Bledsoe with the highest chance given to be a bust (22%), then Avery Bradley, Brackins and Rivers. Austin Rivers given 0% chance of being a star. Harrison Barnes and Drummond did not get strong projections. Kawhi Leonard with the highest chance to be precisely a starter in this 3 year period but only a 5% chance to be a star, though that is apparently what Popovich expects. Derrick Williams was listed 4th highest. Faried was the draft pick taken latest who had an estimated 40% or better chance to be a starter. Udoh with the second lowest chance to be a star among top 10 picks.

From the 2013 projections:

Nate Wolters with the 5th best chance given to be a star. Shabazz Muhammad given 0% to be a star and only a 5% chance to be a starter.

VJL · Post by **VJL** » Wed May 22, 2013 2:05 am

Crow wrote:
I still think though that if one uses “win shares”, generically as you did (I believe) or as B-R WinShares following Oliver’s formula, then one is estimating total player impact, and my main and basic point is that I don’t think boxscore stats alone do a very good job of estimating defensive impact because they miss individual impact on shot defense, the largest / most important part of defense. Thus, my suggestion that the model as specified does a better job with offensive predictions than defensive impacts. That was the point I perhaps rushed to make and not so cleanly.

Yes. Whatever errors there are in the "valuation" metric will naturally hurt the validity of the model. Winshares is a flawed measure of individual defense in particular, so the model may not predict defensive impact as well. Basically... it is predicting win shares, not "true" value. Separating offense and defense would be difficult, however. Since it is looking at total win shares across a season, not a rate (like WS48), one implicit factor is "minutes played", which is a function of both offense and defense and affects total defensive and offensive winshares simultaneously. I have also made a version of this model using RAPM-WARP. The results were largely the same, but obviously with some differences. Of the 2013 class the most notable difference was that it didn't like Olynyk nearly as much... that would be consistent with the idea that WS doesn't appreciate defense as well. I'm using the WS version because I have WS data back to the 80s, but not RAPM data. I decided the benefits of a bigger dataset outweigh whatever accuracy RAPM adds over WS.
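A small worked example of the "minutes played" point, using the standard relationship between season win shares and the per-48-minute rate. The two players and their numbers are hypothetical; the sketch only shows that the same rate yields very different season totals once minutes differ.

```python
def total_win_shares(ws_per_48, minutes_played):
    """Season win shares implied by a per-48-minute rate and total minutes."""
    return ws_per_48 * minutes_played / 48.0


# Hypothetical players with an identical rate but different roles.
starter_total = total_win_shares(0.150, 2800)  # about 8.75 win shares
reserve_total = total_win_shares(0.150, 1200)  # about 3.75 win shares
```

A model that targets season totals is therefore also predicting, implicitly, how many minutes a player will earn.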

Crow wrote:
In 2012 Anthony Davis is estimated to have almost as much chance of being a star as all the other members of that entry class combined. Only 6 other players given more than a 5% chance to be a star.

John Wall tied for the highest chance of being a star in the 2010 class, but with less than 40% of the chance given to Davis.

Not sure if you are pointing these out because you find them dubious or not... but they seem pretty appropriate to me. Davis is the second-highest-scoring player in the past 30 years on the "expected win peak" measure. Only Shaq beat him, with an 18.9. Shaq also had an even higher 75% chance of being a star... Interestingly, if he had left after his sophomore season it would have been over a 90% chance (he was that good at 19).

I also like that Kawhi and Faried have really high starter likelihoods but low star likelihoods. It is consistent with the fact that they performed well collegiately in statistics that are highly conserved between leagues. They are "we know who they are" kind of guys. Kawhi has a solid chance of eclipsing 10 WS in a season before 26, but only because he unexpectedly developed a shot, and I wouldn't bet on it anyway.

It didn't like Barnes, but Barnes also had a pretty pedestrian season by WS.

Missed on Drummond to an extent, but the model liked him a lot more than I personally did before the last draft.

Definitely missed on Bledsoe and Bradley... going back we could add Westbrook and Deron Williams to that list. It also hated Nash (1.9 win peak expected.) The model completely whiffs on future star point guards occasionally... however the point guards it likes have an excellent track record:

Only 3 misses in the top 30 (Will Avery, Tony Delk, and Lee Mayberry... and then Jay Williams). It even hits several future stars who hung around until late in the draft.

The most consistent pattern of misses I have found is shot-happy tweener forwards. Beasley, Derrick Williams, Glenn Robinson, Donyell Marshall, Antoine Walker and probably some others all scored extremely well on the model and then failed to different extents. It is true that NBA teams overrated all of these guys as well, but not to the same extent. This would be a good reason to take Anthony Bennett's rating with a grain of salt or two.

Crow · Post by **Crow** » Wed May 22, 2013 8:05 pm

"Not sure if you are pointing these out because you find them dubious or not... but they seem pretty appropriate to me."

And to me too in those instances. I was just pointing out a few things to give examples of what people can see and consider in your data. I wouldn't use dubious to describe my reaction to some of the other data. Something less judgmental probably. No method is going to predict every player right or be consistent with other measures.

Thanks for sharing the data again and the additional analysis. I intend to spend more time looking at your models as time permits.

VJL · Post by **VJL** » Wed May 22, 2013 9:56 pm

Crow wrote:
Thanks for sharing the data again and the additional analysis. I intend to spend more time looking at your models as time permits.

Much appreciated.

Crow wrote:
No method is going to predict every player right or be consistent with other measures.

Definitely. I am especially interested in finding patterns in the error. The Tweener shooting-forward is pretty apparent... It also seems to underestimate super-athletic guards, especially in the modern age. That isn't too surprising given the way the current rules favor them.

Jacob Frankel · Post by **Jacob Frankel** » Fri May 24, 2013 5:13 am

If anybody is interested I built a regression using college advanced statistics from KenPom.com, combine measurements, and RAPM. Top 10:

Seems a little off at first glance (Ray McCallum?). I'm going to continue to refine it and full results will be published fairly soon.

bchaikin · Post by **bchaikin** » Fri May 24, 2013 6:21 am

VJL wrote:
I have also made a version of this model using RAPM-WARP. The results were largely the same, but obviously with some differences. Of the 2013 class the most notable difference was that it didn't like Olynyk nearly as much

does your model take into account or adjust for level of competition? if so how?...

i ask as your comparables for 2013 kelly olynyk (gonzaga) are 2001 michael bradley (villanova), 1987 horace grant (clemson), 1990 alaa abdelnaby (duke), 1991 rich king (nebraska), and 2001 troy murphy (notre dame). olynyk played in the west coast conference, and gonzaga played just 7 of its 32 games against top level competition (like the ACC, SEC, Big 10, Big 12, Pac 12, Big East), whereas these other players likely played most of their games against teams from those top conferences (i'm guessing)...

VJL · Post by **VJL** » Fri May 24, 2013 12:23 pm

bchaikin wrote:
does your model take into account or adjust for level of competition? if so how?...

For the expected wins and 'star likelihood' models SOS and SRS are simply included as predictors. I don't think this is ultimately the best way to do things, since SOS does not impact all skillsets equally (it seems to really affect scoring, but have little or no effect on some things like stealing and rebounding). However... that is how I managed it, and looking historically I don't think there is a pattern of smaller school guys popping up too high or getting unfairly tamped down.

For the comparison model, I posted the formula above. SOS and SRS combined are given the same weight as 'scoring', 'rebounding', 'assists' + 'turnovers'... So you can get comparables from very different settings, but they need to be particularly similar across the box scores to beat out guys from the same SOS. In the case of Kelly Olynyk, I am guessing Gonzaga's high SRS pulled him away from other guys who played against weak competition, making it easy to comp with big school players.

VJL · Post by **VJL** » Fri May 24, 2013 12:28 pm

Jacob Frankel wrote:
If anybody is interested I built a regression using college advanced statistics from KenPom.com, combine measurements, and RAPM.

How far back does the data go?

I am interested in what you have found with the combine data. I was using that as well for an earlier model, but had to discard it when I wanted to go back to the 80s and 90s. I tried plugging it back in post-hoc for the post-2000 set, but at that point it didn't improve prediction at all. When it was working... it looked like no-step vert and standing reach were by far the strongest predictors.

Jacob Frankel · Post by **Jacob Frankel** » Fri May 24, 2013 5:29 pm

VJL wrote:
How far back does the data go?

I am interested in what you have found with the combine data. I was using that as well for an earlier model, but had to discard it when I wanted to go back to the 80s and 90s. I tried plugging it back in post-hoc for the post-2000 set, but at that point it didn't improve prediction at all. When it was working... it looked like no-step vert and standing reach were by far the strongest predictors.

The data goes back to 2004, which isn't as far as I'd like, but is as far back as KenPom's numbers go. Is that too small of a dataset? I've been using DraftExpress for combine measurements. In our regression, the combine numbers did make a significant impact. Another thing that made a large impact was separating the players into two groups by height and running separate regressions. The bigs performed much better than the smalls.
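A minimal sketch of running separate regressions by height group, assuming a single predictor and a simple least-squares line per group. The 80-inch cutoff, the function names, and the toy rows are all hypothetical; the actual split and predictors are not stated in the thread.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_by_height_group(rows, cutoff_inches=80):
    """rows: (height_inches, predictor, outcome) tuples.

    Returns {group name: (intercept, slope)} with one fit per group.
    ASSUMPTION: the cutoff value is illustrative only.
    """
    groups = defaultdict(list)
    for height, x, y in rows:
        group = "big" if height >= cutoff_inches else "small"
        groups[group].append((x, y))
    return {
        name: fit_line([x for x, _ in points], [y for _, y in points])
        for name, points in groups.items()
    }


# Hypothetical toy rows: (height in inches, college stat, NBA outcome).
toy_rows = [
    (82, 1.0, 3.0), (83, 2.0, 5.0), (84, 3.0, 7.0),
    (74, 1.0, 0.5), (75, 2.0, 1.0), (76, 3.0, 1.5),
]
fits = fit_by_height_group(toy_rows)
```

Fitting each group on its own lets the same college statistic carry a different slope for bigs and smalls, which a single pooled regression would average away.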
