My model is predictive based solely on actual on-court performance, on purpose. I'll run every college guy (that way I KNOW it is working like it should - I'm not trimming my data set to make the results "look" better by keeping guys who won't be drafted from ranking high), & for many of those guys I can't trust the height numbers anyway. I feel that as soon as I try to add non-performance factors (height, standing reach, consensus draft position, rank coming out of high school, etc.), I'm getting away from my point: that it is possible to project future pro performance based solely on college production. Before I really started delving into this, I was told that college production was pretty much worthless (one guy who told me that was an actual NBA owner - simple to guess who). I want to show it may well be just the opposite - it might be the most important factor to consider, relative to age of course.

colts18 wrote:
"I don't get why you aren't including height in your model. Your model is supposed to be predictive, so adding height should add prediction."
I expect I can eventually make the model better with added complexity beyond performance (something I would obviously do if I had 40 hours a week to work solely on this project) - but there is a TON of testing I'd have to do before I ever got comfortable delving into non-performance factors & how much they'd affect results. But, yes, if I were in a position & had the time to test every factor I could come up with - I'd do whatever I could to smooth out the outliers (the largest differences between projection & actual) that pop up, without creating new ones.
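For what it's worth, that outlier check is easy to mechanize. Here's a minimal sketch in Python - the column names & numbers are made up for illustration, not my actual model's output:

```python
# Sketch: flag the model's largest misses (projection vs. actual).
# Column names and sample rows are hypothetical, for illustration only.
import pandas as pd

results = pd.DataFrame({
    "player": ["A", "B", "C", "D"],
    "projected_value": [18.0, 12.5, 9.0, 4.0],  # model's projected pro value
    "actual_value":    [10.0, 13.0, 2.0, 6.5],  # realized pro value
})

results["residual"] = results["projected_value"] - results["actual_value"]
# Sort by absolute miss so the biggest outliers surface first.
worst = results.reindex(results["residual"].abs().sort_values(ascending=False).index)
print(worst[["player", "residual"]])
```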
Heck - I haven't even come close to finishing the past 19 seasons of college ratings to FULLY test the past results of what I do now - let alone trying to work non-performance factors in. I feel I need to try to perfect (the best I can) one step at a time. Performance-based projection from historical precedent (across all statistical rating subsets) is step 1. I believe similarity scores built on those statistical rating subsets, with the massive data set I have, might be step 2 to improve step 1 (not certain) - but that'd be performance-based also. Step 3 would then be introducing other non-performance factors that seem to address where the model misses.
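To make step 2 concrete: a similarity score could be as simple as z-scoring each college stat against the historical pool & taking the nearest neighbors in that space. A minimal sketch, with hypothetical stats & plain Euclidean distance standing in for whatever stat subsets & weighting testing actually supports:

```python
# Sketch: similarity scores via z-scored stats + nearest historical comps.
# Feature choice, distance metric, and all data here are hypothetical.
import numpy as np

# rows = historical college players, cols = per-possession stats (illustrative)
hist_stats = np.array([
    [22.1, 9.5, 2.1],
    [15.0, 4.2, 6.8],
    [18.3, 7.1, 3.0],
])
hist_pro_value = np.array([14.0, 9.5, 6.0])  # each comp's realized pro value

prospect = np.array([19.0, 8.0, 2.5])

# z-score everything against the historical pool so no single stat dominates
mu, sigma = hist_stats.mean(axis=0), hist_stats.std(axis=0)
hz = (hist_stats - mu) / sigma
pz = (prospect - mu) / sigma

dist = np.linalg.norm(hz - pz, axis=1)      # Euclidean distance to each comp
k = 2
nearest = np.argsort(dist)[:k]              # k most similar historical players
weights = 1.0 / (dist[nearest] + 1e-9)      # closer comps count for more
projection = np.average(hist_pro_value[nearest], weights=weights)
print(nearest, projection)
```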
If step 1 is "better" than actual draft history - steps 2 & possibly 3 would be icing. I THINK step 1 will outperform actual draft history on its own when I get to test it.
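And "better than actual draft history" is testable: rank the prospects by the model, rank them by actual draft slot, & see which ordering tracks realized pro value more closely. A minimal sketch with invented numbers, using Spearman rank correlation (any career-value metric & any correlation measure you trust would work the same way):

```python
# Sketch: does the model's ordering beat the actual draft order?
# All numbers are invented; pro_value is whatever career metric you prefer.
from scipy.stats import spearmanr

model_rank = [1, 2, 3, 4, 5]               # model's projected order (1 = best)
draft_pick = [3, 1, 5, 2, 4]               # actual draft slot (1 = first pick)
pro_value  = [16.0, 12.0, 9.0, 11.0, 3.0]  # realized pro production

# Lower rank/pick should pair with higher value, so negate rho
# to read both numbers as "higher = better ordering".
model_rho, _ = spearmanr(model_rank, pro_value)
draft_rho, _ = spearmanr(draft_pick, pro_value)
print("model:", -model_rho, "draft:", -draft_rho)  # model wins if its number is larger
```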
But, all that being said - mine & other draft models aren't ever going to be close to perfect - BUT in combination with good scouting & due diligence in learning about possible draftees, a model can really help steer a team away from who is projected too high, & toward who will be the bargains. I think it could actually help a team get PRODUCTIVE players later in a draft, & especially help stock a D-League affiliate. Scouts can't look at (let alone properly evaluate) 1000s of prospects every year, but properly put-together models can, & can narrow that list greatly to help the scouts.
All the non-performance data in the world was never going to get my model to spit out that Andrew Wiggins was a worthy #1 pick, let alone a future star. MAYBE it would have moved him up from #17 to, say, #10 - which means that if a team were ever listening to me, they wouldn't have drafted him anyway.