Fixing the Cavs with A.I
Hi All,
In this post, I put the Cavs through the machine learning wringer and find out:
- What the Cavs' best overall lineup is
- Their best lineup without LeBron, and what it does well
- How to get the most out of Isaiah Thomas lineups
- How best to play without Kevin Love
I also find an interesting reason for their poor defensive play and show video evidence of the Cavs' poor defense in action.
Check it out at:
http://www.zigzaganalytics.com/home/fix ... vs-with-ai
Re: Fixing the Cavs with A.I
Highlighted lineups with 3-5 of Wade, Korver, Green, James and Frye are doing great. Highlighted lineups with fewer are doing poorly. Pretty simple, but adding the stat WHYs helps.
Have you approached teams with your technique / data? Team analysts should either be doing something similar or reaching out to you.
Would you go on podcasts? That would increase your visibility. Which podcaster is paying attention and going to make an offer? Zach Lowe would give the biggest visibility. Dunc'd On, The Ringer NBA pod, etc.
Re: Fixing the Cavs with A.I
Thanks crow. I have had some good conversations with 2 teams over the past few months about it. One was mostly trying to get a better understanding of the inner workings of the algorithm and how it can be applied and understood by coaches, but it never went any further than a few weeks of inquisitive emails back and forth. The other team asked me to run it specifically for them and give them findings. I ran it a couple of times and gave them a specific lineup that I felt was under-utilised at the time, as well as a bench player that needed more clock. It had really good results when on the court. But again, nothing further than the initial bit of help.
I send posts directly to contacts at probably half the teams, but it is pretty hard going. I try to innovate as much as I can and show basketball acumen to go with the technical detail. It gets harder and harder to innovate with public data, as I have to create a lot of it myself, which is the most time-consuming part.
Yeah sure, I’d love to speak on a podcast, but again it’s pretty hard to get that visibility.
Re: Fixing the Cavs with A.I
Hope somebody steps up. It's a waiting game for a wise buyer.
Have you considered creating / using adjusted plus minus data to deepen the research & findings?
What else could machine learning do to enhance understanding of RPM estimates? What about identifying "contexts" and / or RPM player types and player type lineup sequences?
Re: Fixing the Cavs with A.I
I haven’t looked at RPM at this stage; it’s purely plus-minus. I’ll have a think about how I can incorporate it. It makes sense from the perspective of comparing the quality of the lineups that play against each other at any given time, which could then cut some noise out of the results.
Getting down to an individual player level is something I’d also like to get to, but that is going to be a lot of data to collect.
I’ll do 1-2 more of these types of posts this season (and maybe 1-2 come playoff time) before moving on to some new ideas I’d like to get to before the season finishes. I might do some teams on the playoff bubble next for the team scouts.
For reference, the Cavs post took approximately 9-11 hours in total from start to posting on the website, done over 2 days.
independent_variable (Posts: 2, Joined: Fri Feb 02, 2018 12:34 am)
Re: Fixing the Cavs with A.I
An interesting approach. I read your original post as well and have a few questions, though:
- Isn't +/- inherently a noisy metric, especially when using such small minute samples with lineups? For example, the 10 best 5-man lineups belong to the following teams: MIL, NOP, WAS, IND, BOS, GSW, NOP, PHI, HOU, HOU. While these are decent playoff teams, I think you need to understand how NOISY 5-man lineup data can be. All of these examples have between 40 and 80 MP together; that's less than 2 NBA games!
- In your modeling, it looks like you're throwing every metric available against the wall and seeing what sticks with linear regression (which is how you get the weights). Wouldn't this cause a couple of issues?
  - There is almost certainly collinearity in NBA metrics (eFG% and ORtg, for example). GLMs work best when the features are not strongly correlated with each other; otherwise the weights become unstable.
    - Are you doing any sort of feature selection or PCA?
  - Did you scale your features? I didn't read any mention of this. This is key in GLM modeling.
- I didn't see any mention of error metrics in your modeling. How well did your model fit the data? How do I know that I can trust these relationships?
- Do you know that the performance of lineups is linear? Would this kind of problem be better addressed with a tree-based model? Without error metrics we can't be sure of this.
While I commend your approach so far, my experience with lineup data is simply that there isn't enough sample (e.g. if a certain 5-man lineup happens to be used heavily against GSW or HOU, its numbers will suffer), and so this might not be the best problem for machine learning. Additionally, I don't see a lot of rigor in your modeling work. Perhaps it's there and you chose not to write about it, but if you want teams / media to get interested in your work, you'll need to be able to say WHY your model is good. And from what you've written to date, I'm not convinced of that.
Not trying to be unnecessarily critical, but I've interviewed with teams in the past and the code tests, sample works, etc. that they put me through are much more intense than this. Only trying to help.
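The small-sample concern above can be illustrated with a quick simulation. This is only a sketch under made-up assumptions: every simulated lineup has a true net rating of exactly zero, each minute's point differential is an independent normal draw, and `SD_PER_MINUTE` is an illustrative value rather than something estimated from NBA data.

```python
import random
import statistics

SD_PER_MINUTE = 2.0  # assumed spread of one minute's point differential (illustrative)


def simulate_per48(minutes, rng):
    """Observed +/- per 48 minutes for a lineup whose true value is 0."""
    total = sum(rng.gauss(0.0, SD_PER_MINUTE) for _ in range(minutes))
    return total * 48.0 / minutes


def spread_of_estimates(minutes, n_lineups=500, seed=0):
    """Standard deviation of the observed per-48 +/- across simulated lineups."""
    rng = random.Random(seed)
    observed = [simulate_per48(minutes, rng) for _ in range(n_lineups)]
    return statistics.pstdev(observed)


small = spread_of_estimates(60)    # in the 40-80 minute range quoted above
large = spread_of_estimates(1000)  # a heavily used group
print(f"spread at 60 min: {small:.1f}, at 1000 min: {large:.1f} points per 48")
```

Under these assumptions the spread shrinks roughly with the square root of the minutes played, so identical lineups can look very different from each other after only 60 minutes together.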
Re: Fixing the Cavs with A.I
Most teams have trios with over 1000 minutes, headed toward 1500-plus for the full season. On average, teams have 5 trios over 600 minutes, headed toward 1000-plus. Pairs and trios should probably get more attention. If you focused on perimeter trios and interior pairs you could mix 'n' match. And I think perimeters and interiors have more identity than random trios. RAPM for sub-lineup groups is always appreciated.
With one exception, the big-minute trios for the Cavs were neutral to negative on raw plus-minus. They needed to change via trades and / or a change in coaching. Were they aware of and motivated by top trio data? I dunno. Detroit, the Lakers and Utah have pretty bad top trio data. They aren't going anywhere until they fix it. Maybe the trades and a fresh look will have a chance. They may need more trades and / or new coaching. For Utah and the Lakers, the trades did not directly affect any of these top trios. For Detroit, none of them are possible any longer. Pretty big difference. The Lakers' top trios involve Ball, KCP, Ingram and Lopez. Good chance 2 will be gone by fall. Maybe all of them eventually.
Charlotte has pretty good top trios. Every trio over 600 minutes comes from the same 5 players. That 5-man lineup does fine and gets played a lot, injuries permitting. Somehow, in finishing other lineups with 3 or fewer of these guys, they are getting weaker results. Everyone outside that 5 should be scrutinized, again.
http://bkref.com/tiny/2BsiT
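For anyone who wants to build pair and trio totals from 5-man lineup data, a minimal sketch follows. The `lineups` rows and the `MIN_MINUTES` cut-off are hypothetical and only there to show the mechanics; the aggregation is raw plus-minus, not RAPM.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical 5-man lineup totals: (players, minutes, raw plus-minus). Not real data.
lineups = [
    (("A", "B", "C", "D", "E"), 300.0, 45),
    (("A", "B", "C", "D", "F"), 120.0, -10),
    (("A", "B", "G", "H", "F"), 80.0, -12),
]


def aggregate_groups(lineups, size):
    """Sum minutes and raw +/- for every player group of the given size."""
    totals = defaultdict(lambda: [0.0, 0])
    for players, minutes, plus_minus in lineups:
        for group in combinations(sorted(players), size):
            totals[group][0] += minutes
            totals[group][1] += plus_minus
    return {group: (mins, pm) for group, (mins, pm) in totals.items()}


trios = aggregate_groups(lineups, 3)
pairs = aggregate_groups(lineups, 2)

MIN_MINUTES = 400.0  # illustrative cut-off, in the spirit of the 600-minute bar above
for group, (mins, pm) in sorted(trios.items(), key=lambda kv: -kv[1][0]):
    if mins >= MIN_MINUTES:
        print(group, mins, pm)
```

Because every 5-man lineup contributes to ten trios and ten pairs, the smaller groups accumulate minutes much faster than any single 5-man unit.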
Re: Fixing the Cavs with A.I
Thanks for the comments, independent_variable, and don't worry, I don't take any feedback to heart or the wrong way. With any of my posts/projects I try to solve questions and problems in a way that I haven't seen publicly before, and there are certainly always ways to improve, so thanks for the questions. It helps me to learn and potentially apply some improvements. I'll try to answer all of your questions/queries.
I 100% agree that +/- is noisy, especially with sample sizes such as a 5-man unit being on court for just 30 seconds. I did some work on all 5-man lineups this season (per 4.5 minutes) and ran them through my algorithm to give a prediction of the +/- per 4.5 minutes. The sweet spot I found, where the algorithm was cutting out noise pretty well, was anything over 1 minute of court time together. So that is the blanket rule I apply to any 5-man lineup: I cut anything where they are on court for less than 1 minute in a game. Now, there are obviously always exceptions to this rule, so when I'm focusing on a certain 5-man lineup, I again get the predictions from the algorithm, and if any are way out of whack, I exclude them.
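As a rough sketch of the one-minute rule and the per-4.5-minute scaling described above (the `Stint` type, the field names and the sample rows are all hypothetical, and the scaling formula is an assumption about how the normalisation might be done):

```python
from dataclasses import dataclass

WINDOW_MINUTES = 4.5     # normalisation window mentioned in the post
MIN_STINT_MINUTES = 1.0  # stints shorter than this are dropped


@dataclass
class Stint:
    lineup: tuple    # five player names
    minutes: float   # time the unit spent on court together in one game
    plus_minus: int  # raw point differential over that time


def per_window(stint):
    """Scale a stint's raw +/- to a 4.5-minute window."""
    return stint.plus_minus * WINDOW_MINUTES / stint.minutes


def usable_stints(stints):
    """Keep only stints with at least the minimum court time."""
    return [s for s in stints if s.minutes >= MIN_STINT_MINUTES]


# Hypothetical stints, not real game data.
unit = ("P1", "P2", "P3", "P4", "P5")
stints = [Stint(unit, 0.5, 3), Stint(unit, 3.0, 4), Stint(unit, 9.0, -2)]

kept = usable_stints(stints)
rates = [per_window(s) for s in kept]
```

The 30-second stint in this toy data would scale to +27 per 4.5 minutes, which is the kind of extreme value the minimum-time rule is meant to keep out of the training set.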
I also agree that the quality of the opposition a lineup plays against is something I should start factoring in, as Crow suggested in an earlier reply, using something like RPM for example. However, regardless of whether I use straight +/- or something more accurate like RPM, trying to predict the outcome per 4.5 minutes isn't the sole purpose of what I'm trying to work out. It's the patterns of what they do well in good performances vs what they do poorly in poor performances that I'm trying to discover.
Yes, I'm using a lot of statistics as input to the algorithm. They are a mix of basic box score stats, advanced metrics and specific play-by-play stats. I don't want to assume I know what's important in winning or losing for a certain lineup, hence I throw a fair bit at the algorithm.
In regards to independence between variables: yes, when coming up with the final list of statistics to feed the algorithm, I used Pearson's correlation coefficient to weed out any stats that were heavily correlated, e.g. assisted field goals vs unassisted field goals, where there is a direct correlation. I don't use stats such as ORtg.
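A correlation filter of this kind could look roughly like the sketch below. The 0.9 threshold, the greedy keep-first rule and the toy feature values are assumptions for illustration, not the settings actually used.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(features, threshold=0.9):
    """Greedily keep features whose |r| with every kept feature is below threshold."""
    kept = []
    for name in features:
        if all(abs(pearson(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical per-stint stats, made up for illustration.
features = {
    "assisted_fg": [5, 3, 6, 2, 4],
    "unassisted_fg": [3, 5, 2, 6, 4],  # mirrors assisted_fg in this toy data
    "turnovers": [2, 2, 1, 3, 0],
}
selected = drop_correlated(features)
```

Which member of a correlated pair survives depends on the order of the features, so the order is itself a modelling choice.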
When preparing the data for training I use StandardScaler from scikit-learn.
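For readers unfamiliar with it, standardisation subtracts each column's mean and divides by its standard deviation, which is what scikit-learn's `StandardScaler` does by default. The plain-Python sketch below shows the same calculation; the `pace` and `turnover_rate` values are hypothetical.

```python
import statistics


def standardize(column):
    """Return z-scores: subtract the mean, divide by the population std."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    if std == 0:
        return [0.0 for _ in column]  # a constant column carries no information
    return [(value - mean) / std for value in column]


# Hypothetical inputs on very different scales.
pace = [96.0, 100.0, 104.0]
turnover_rate = [0.10, 0.14, 0.18]

scaled_pace = standardize(pace)
scaled_tov = standardize(turnover_rate)
```

After scaling, both columns have mean 0 and unit variance, so the fitted weights can be compared with each other.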
For feature selection I use RFE (recursive feature elimination).
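Recursive feature elimination repeatedly fits a model and drops the weakest feature until the desired number remain. scikit-learn provides this as the `RFE` class; the sketch below is a hand-rolled version that assumes ordinary least squares on standardised features as the underlying model (the post does not say which estimator was used), with made-up feature names and data.

```python
import statistics


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(columns, y):
    """Least-squares weights of centred y on standardised columns."""
    n = len(y)
    z = [[(v - statistics.fmean(col)) / statistics.pstdev(col) for v in col]
         for col in columns]
    y_mean = statistics.fmean(y)
    yc = [v - y_mean for v in y]
    gram = [[sum(zi[k] * zj[k] for k in range(n)) for zj in z] for zi in z]
    rhs = [sum(zi[k] * yc[k] for k in range(n)) for zi in z]
    return solve(gram, rhs)


def rfe(features, y, n_keep):
    """Refit repeatedly, dropping the feature with the smallest |weight| each time."""
    names = list(features)
    while len(names) > n_keep:
        coefs = fit_ols([features[name] for name in names], y)
        weakest = min(range(len(names)), key=lambda i: abs(coefs[i]))
        names.pop(weakest)
    return names


# Toy data (assumed): the target depends on x0 and x2 only.
features = {
    "x0": [1, 2, 3, 4, 5, 6, 7, 8],
    "x1": [2, 1, 2, 1, 2, 1, 2, 1],
    "x2": [1, -1, -1, 1, 1, -1, -1, 1],
}
y = [3 * a + 2 * c for a, c in zip(features["x0"], features["x2"])]
top_two = rfe(features, y, n_keep=2)
```

Standardising inside `fit_ols` is what makes the weight magnitudes comparable across features; without it the ranking would depend on each stat's units.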
For checking the accuracy of the model, I used mean squared error. I also run a learning curve function, especially for the high-level team scout, as it gives me a good indication of the point in the season at which the algorithm gets a handle on teams. Typically I have found it's around the 15-20 game mark. Past that point, teams are pretty predictable and "are who they are" unless they make a trade or suffer a significant injury.
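A learning curve of this sort can be sketched as follows. scikit-learn has a `learning_curve` helper in `sklearn.model_selection`; the version here is a simplified chronological variant that trains on the first n games and scores on a fixed set of later games. The synthetic season, the single input stat and the one-variable linear model are all assumptions for illustration.

```python
import random
import statistics


def fit_line(xs, ys):
    """Closed-form simple linear regression; returns (slope, intercept)."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(model, xs, ys):
    """Mean squared error of a (slope, intercept) model on the given data."""
    slope, intercept = model
    return statistics.fmean([(y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)])


def learning_curve(xs, ys, train_sizes, n_validation):
    """Train on the first n games, score on the last n_validation games."""
    val_x, val_y = xs[-n_validation:], ys[-n_validation:]
    curve = []
    for n in train_sizes:
        model = fit_line(xs[:n], ys[:n])
        curve.append((n, mse(model, xs[:n], ys[:n]), mse(model, val_x, val_y)))
    return curve


# Synthetic season (assumed): 60 games, one input stat, noisy linear relationship.
rng = random.Random(42)
xs = [rng.uniform(-1.0, 1.0) for _ in range(60)]
ys = [4.0 * x + rng.gauss(0.0, 3.0) for x in xs]

curve = learning_curve(xs, ys, train_sizes=[5, 10, 15, 20, 30, 40], n_validation=20)
for n, train_err, val_err in curve:
    print(f"{n:>2} games: train MSE {train_err:.2f}, validation MSE {val_err:.2f}")
```

The point at which the validation error stops improving as more games are added is the "gets a handle on teams" mark described above.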
For the team-level scouts, the algorithm typically had a mean squared error of between 3 and 8 for any given team. Each team is different, some being more predictable than others.
I also tackle overfitting using train/test splits and k-fold cross-validation, so the algorithm isn't getting accurate results just because of overfitting.
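For reference, k-fold cross-validation holds out each fold in turn and trains on the rest, so every score comes from data the model has not seen. The sketch below uses contiguous folds and a one-variable linear model purely for illustration (scikit-learn's `KFold` is the usual tool); the toy data is made up.

```python
import statistics


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_line(xs, ys):
    """Closed-form simple linear regression; returns (slope, intercept)."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cross_val_mse(xs, ys, k):
    """Held-out MSE for each of k folds, training on the remaining folds."""
    scores = []
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors = [(ys[i] - (slope * xs[i] + intercept)) ** 2 for i in fold]
        scores.append(statistics.fmean(errors))
    return scores


# Toy data (assumed): one input stat and a +/- that is linear plus a fixed wobble.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [1.2, 2.9, 5.1, 7.0, 8.8, 11.1, 13.0, 14.9, 17.2]
fold_scores = cross_val_mse(xs, ys, k=3)
print("per-fold MSE:", [round(s, 3) for s in fold_scores])
```

With real game data the folds would normally be shuffled or split by game rather than taken in order.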
For the Cavs scout which I recently posted, which looks at specific 5-man lineups over 4.5 minutes, I looked at 5 lineups. The most accurate lineup had an MSE of 1.3, the worst 4.4. So at best, over 4.5 minutes, it's typically predicting a Cavs lineup to within about 1.1 points of the actual +/- (the square root of the MSE).
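Since MSE is measured in squared points, taking the square root (RMSE) gives a typical error in points. The arithmetic below takes the figures quoted in this post at face value as true mean squared errors, which is an assumption; the labels are only for display.

```python
import math


def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the target."""
    return math.sqrt(mse(actual, predicted))


# Figures quoted in the post, treated as MSE in squared points.
quoted_mse = {
    "best Cavs lineup": 1.3,
    "worst Cavs lineup": 4.4,
    "team scout (low)": 3.0,
    "team scout (high)": 8.0,
}
typical_error = {label: math.sqrt(value) for label, value in quoted_mse.items()}
for label, points in typical_error.items():
    print(f"{label}: MSE {quoted_mse[label]} -> RMSE {points:.2f} points")
```

RMSE is an average-sized miss, not a guaranteed bound, so individual 4.5-minute windows can still be off by more.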
When choosing which algorithm I would ultimately use, I ran a spot-check on 10 different algorithms, some of them non-linear, such as K-Nearest Neighbours, CART (Classification and Regression Trees) and SVM.
I could go back and re-tune some of these non-linear algorithms, run them on the data set and see what kind of results I get.
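A spot-check boils down to fitting each candidate on the same training data and comparing held-out error. The sketch below compares a mean baseline, a one-variable linear fit and a small k-nearest-neighbours regressor on deliberately non-linear toy data; the three candidates and the data are stand-ins, not the 10 algorithms actually tested.

```python
import statistics


def fit_mean(xs, ys):
    """Baseline: always predict the training mean."""
    mean_y = statistics.fmean(ys)
    return lambda x: mean_y


def fit_linear(xs, ys):
    """Simple linear regression on one input."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return lambda x: mean_y + slope * (x - mean_x)


def fit_knn(xs, ys, k=2):
    """k-nearest-neighbours regression: average the k closest training targets."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return statistics.fmean([y for _, y in nearest])

    return predict


def spot_check(candidates, train, test):
    """Fit each candidate on train and report its MSE on test."""
    (train_x, train_y), (test_x, test_y) = train, test
    results = {}
    for name, fit in candidates.items():
        predict = fit(train_x, train_y)
        errors = [(y - predict(x)) ** 2 for x, y in zip(test_x, test_y)]
        results[name] = statistics.fmean(errors)
    return results


# Toy non-linear data (assumed): y = x^2, with alternate points held out for testing.
xs = [i * 0.5 for i in range(-6, 7)]
ys = [x * x for x in xs]
train = (xs[0::2], ys[0::2])
test = (xs[1::2], ys[1::2])

results = spot_check({"mean": fit_mean, "linear": fit_linear, "knn": fit_knn}, train, test)
```

On this toy data the linear model does no better than the mean baseline while the neighbour-based model fits well, which is the kind of gap that would suggest lineup performance is not linear in the inputs.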
It's very hard in a blog to explain all of the inner workings of applying a machine learning method. Running the algorithm is the easy part; all the pre-work is where 95% of the effort is. I don't want my posts to be long-winded and too nerdy, so I try to find the balance between nerdy and basketball purist. I want the "average joe" to be able to understand this, because ultimately it's players, coaches and front offices who will choose whether to apply "analytics" or not. I want to not only make a point about what the data is telling us but also have time to put it into basketball language so that it can then be applied.
I also don't want readers' eyes to glaze over with a 5000-word post that they will probably lose attention on, click off the page and not get the ultimate point I'm trying to make. So finding that balance is hard, and is the reason I have left out a lot of the inner workings of the pre-work that is done in the machine learning process.
As I said, I'm certainly not saying what I am doing with these scouts is 100% right, but I think it's certainly pointing us in the right direction of what we should be looking at with teams and lineups, and also what we can ignore. Teams can get so overwhelmed with what to look at. What I hope these scouts can do is point them in the right direction, so they can go and take a closer look. I add what I think they should do, or what film is showing me, but this may not fit with the team's or coaches' strategy.
Re: Fixing the Cavs with A.I
Would you consider doing the ML test for the Bucks? There are a handful or two of folks I have chatted with recently on Twitter who are interested in understanding / fixing them. This would be food for further thought.
Re: Fixing the Cavs with A.I
Yeah sure, all things going well, I’ll aim to have it done by the end of this week.
Re: Fixing the Cavs with A.I
Thanks. Look forward to it.
Re: Fixing the Cavs with A.I
Hey Crow, sorry for the delay on the Bucks scout. I finally got a free day and got it done.
I just need to put it into a post and will try to look for some video to match some of the findings. The post should be up in the next 24 hours.
Re: Fixing the Cavs with A.I
It is still relevant for a few more days. You get to it when you can. Will check for it.