
Garbage Time

Posted: Sun Dec 27, 2020 3:04 am
by rainmantrail
I'm working on some of my NBA models and decided to revisit the garbage time weighting component for each possession. I had previously run a logistic regression model to estimate a team's chances of winning the game based on time remaining, point differential, and the number of starters on the floor (and some interaction terms) as my way of assigning numeric values to each play. As you can see from the ROC curve below, the model performs well, but there's a key problem with this approach. The iid assumptions that the logistic regression model makes are violated in practice. When teams are ahead, they don't continue to press their advantage by running at full strength, thus violating the iid assumption. This renders the model's win percentage predictions invalid. It also just doesn't align with how coaches make decisions about who is on the floor at any given time. I decided to rework this analysis and use something that aligns more with actual coaching decisions. I took my PBP data from the past 7 seasons and plotted the number of starters on the floor against the point differential for each quarter. The results were pretty interesting, so I thought I would share them in case anyone here finds this interesting or useful. Here are the results below, as well as the ROC plot of my previous logistic regression model's performance.

[Image: box plots of combined starters on the floor vs. point differential, by quarter]

[Image: ROC curve for the previous logistic regression win-probability model]

Re: Garbage Time

Posted: Mon Dec 28, 2020 2:34 pm
by Mike G
Would you mind telling us what ROC and (lowercase) iid are?
What do the bars, stripes, and dots represent?

Re: Garbage Time

Posted: Mon Dec 28, 2020 7:22 pm
by vzografos
Receiver operating characteristic (ROC) curve

independently and identically distributed.

The boxes span percentiles (the 25th to 75th, I would imagine). Dots are outliers. The horizontal line in the middle of the box is the median.


I am OK with the statistics lingo. I have a problem with the shop talk: "garbage time weighting component".

I have no idea what that means. :mrgreen:

Re: Garbage Time

Posted: Mon Dec 28, 2020 7:28 pm
by vzografos
rainmantrail wrote: Sun Dec 27, 2020 3:04 am I'm working on some of my NBA models and decided to revisit the garbage time weighting component for each possession. I had previously run a logistic regression model to estimate a team's chances of winning the game based on time remaining, point differential, and the number of starters on the floor (and some interaction terms) as my way of assigning numeric values to each play. As you can see from the ROC curve below, the model performs well, but there's a key problem with this approach. The iid assumptions that the logistic regression model makes are violated in practice. When teams are ahead, they don't continue to press their advantage by running at full strength, thus violating the iid assumption. This renders the model's win percentage predictions invalid. It also just doesn't align with how coaches make decisions about who is on the floor at any given time. I decided to rework this analysis and use something that aligns more with actual coaching decisions. I took my PBP data from the past 7 seasons and plotted the number of starters on the floor against the point differential for each quarter. The results were pretty interesting, so I thought I would share them in case anyone here finds this interesting or useful. Here are the results below, as well as the ROC plot of my previous logistic regression model's performance.


I didn't understand some parts of your description (as is often the case when trying to explain something on a discussion board).

Questions: How did you evaluate the ROC exactly?
My problem with live prediction models (i.e. time-remaining-based models) has always been how to evaluate performance between two models that give different predictions. Say at time t, one model gives a 78% win chance and the other a 67% win chance. I would really like to know about this.

In your box plots, what is on the y-axis? The number of starters on the floor? What does this mean exactly: how many players who started the game are still on the floor at any given time? I don't think I understand this, because you are quoting more than 5 players. Can you explain?

Re: Garbage Time

Posted: Wed Dec 30, 2020 11:48 am
by rainmantrail
vzografos wrote: Mon Dec 28, 2020 7:28 pm
rainmantrail wrote: Sun Dec 27, 2020 3:04 am I'm working on some of my NBA models and decided to revisit the garbage time weighting component for each possession. I had previously run a logistic regression model to estimate a team's chances of winning the game based on time remaining, point differential, and the number of starters on the floor (and some interaction terms) as my way of assigning numeric values to each play. As you can see from the ROC curve below, the model performs well, but there's a key problem with this approach. The iid assumptions that the logistic regression model makes are violated in practice. When teams are ahead, they don't continue to press their advantage by running at full strength, thus violating the iid assumption. This renders the model's win percentage predictions invalid. It also just doesn't align with how coaches make decisions about who is on the floor at any given time. I decided to rework this analysis and use something that aligns more with actual coaching decisions. I took my PBP data from the past 7 seasons and plotted the number of starters on the floor against the point differential for each quarter. The results were pretty interesting, so I thought I would share them in case anyone here finds this interesting or useful. Here are the results below, as well as the ROC plot of my previous logistic regression model's performance.


I didn't understand some parts of your description (as is often the case when trying to explain something on a discussion board).

Questions: How did you evaluate the ROC exactly?
My problem with live prediction models (i.e. time-remaining-based models) has always been how to evaluate performance between two models that give different predictions. Say at time t, one model gives a 78% win chance and the other a 67% win chance. I would really like to know about this.

In your box plots, what is on the y-axis? The number of starters on the floor? What does this mean exactly: how many players who started the game are still on the floor at any given time? I don't think I understand this, because you are quoting more than 5 players. Can you explain?
Apologies for the not-so-well-explained post. Hopefully this clarifies what I'm working on a bit better.

My goal is to weight possessions in my play-by-play database by how important they are. At the end of games, we often see both teams running with their backups on the court while the starters sit on the bench when the game is essentially already over (e.g., up by 27 with 2:30 remaining in the 4th quarter). I wanted a way to determine how "important" each possession is. Initially, my approach was to build a logistic regression model that would yield the probability of a team winning the game based on the current state of the game (e.g., up by 17 with 7:00 left in the 3rd quarter and each team has 3 of their starters on the court = 86% chance of winning, or whatever the number is). I was able to build this model pretty easily, and the outputs made sense from a probability standpoint. I trained the model on a subset of the data and tested the results on a held out testing dataset. I created the ROC curve by evaluating the model's predicted winners against the actual outcomes of that game. The input variables to that model were 'Minutes_Remaining', 'Score_Differential', and 'Starters_on_Court', with the 'Starters_on_Court' being the total number of starters from both teams who were on the court for a given possession (the min of which is 0, the max of which would be 10).
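
In case the setup is unclear, here is a minimal sketch of roughly what that model looks like (scikit-learn; "possessions.csv", the column names, and the single interaction term are placeholders rather than my exact pipeline):

Code:

# Minimal sketch of the win-probability logistic regression described above.
# "possessions.csv" and the column names are placeholders, not my real schema.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve, roc_auc_score

df = pd.read_csv("possessions.csv")          # one row per possession

X = df[["Minutes_Remaining", "Score_Differential", "Starters_on_Court"]].copy()
X["Time_x_Diff"] = X["Minutes_Remaining"] * X["Score_Differential"]  # one interaction term
y = df["Team_Won_Game"]                      # 1 if the team on offense won that game

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# ROC: held-out predicted win probabilities vs. the actual outcomes of those games
probs = model.predict_proba(X_test)[:, 1]
fpr, tpr, _ = roc_curve(y_test, probs)
print("AUC:", roc_auc_score(y_test, probs))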

This approach "worked" somewhat, but there is a fundamental flaw with it. Logistic regression models assume that each observation in the dataset (each possession in this case) is independent and identically distributed (iid). The independence assumption is clearly violated since there is a lot of overlap from possessions within the same game, but this violation isn't really all that concerning, as there is enough variation across a full season that it mostly works itself out. However, the assumption of the possessions being identically distributed is a huge problem. This is problematic because teams make coaching decisions based on the state of the game and adjust accordingly. In other words, when the logistic regression model projects something as an 80% probability of winning, what it is actually saying is "if the lineups don't change, then team X has an 80% probability of winning". But as soon as substitutions are made, and a team puts their entire starting lineup back on the floor, the model's prior probability of winning is no longer valid. The same thing happens in the opposite direction: if a team had its starters on the floor and then decided to sit them all, the model's estimate would change. The other thing I noticed is that coaching decisions don't even remotely follow these likelihoods of winning at any given point throughout the game. In the 2nd and 3rd quarters particularly, teams will often run at full strength even when the point differential is very large (for example, just this week, Dallas was still running Luka Doncic even though they were up by 50 points in the 2nd quarter).

What I really want to accomplish is to be able to flag "garbage time" possessions and to weight those less than possessions where each team is playing to win, and doing so at full strength. I'm defining "garbage time" as the possessions in a game where the game is effectively over (say 3 minutes remaining and up by 32 points), and teams are running with their weakest lineups in an attempt to get rookies and scrubs some experience. I wanted my weights to follow how coaches actually make these lineup decisions rather than using the probability of winning estimates that are output by my logistic regression model. That's why I decided to look at how many starters were on the court in different scenarios. I also looked at the number of starters on the court with respect to time remaining and by point differentials, but surprisingly, the time remaining component didn't really affect these decisions nearly as much as I thought it would. Coaches generally don't "throw in the towel" until well into the 4th quarter, even if they're down by 30+ points, and the same is true with teams who have the lead.

The box plots I posted show how many combined starters from both teams (max of 10) are on the court vs. how large the score differential is for each quarter. This gives me a pretty good idea of how coaches make decisions about their lineup strengths when they're up by 5 points, 15 points, or 25 points in the 4th quarter. I can use this information to reweight each possession in my play-by-play database for when I'm doing RAPM (and similar) calculations. I can also give additional weight to possessions where the game is really close late in the 4th quarter or in OT if I want. This should help the RAPM calculations to overlook, or at least significantly devalue, possessions where Karl-Anthony Towns is tearing it up against the Lakers practice squad while LeBron and AD are sitting on the bench because they're up by 36 points in the 4th quarter. It helps to keep players' RAPM values more "honest" and better reflects their true abilities.
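
For anyone who wants to play with the idea, here is a rough sketch of how that box-plot relationship could be turned into possession weights (the file and column names are placeholders, and the median/0.2-floor/linear-scaling choices are arbitrary ones I'm still experimenting with):

Code:

# Rough sketch: derive a per-possession weight from how many combined starters
# are typically on the court for a given quarter / score-differential bucket.
# "possessions.csv", the column names, and the scaling choices are placeholders.
import numpy as np
import pandas as pd

pbp = pd.read_csv("possessions.csv")

pbp["diff_bin"] = pd.cut(pbp["Score_Differential"].abs(),
                         bins=[0, 5, 10, 15, 20, 25, 100], include_lowest=True)

typical = (pbp.groupby(["Quarter", "diff_bin"], observed=True)["Starters_on_Court"]
              .median()
              .reset_index()
              .rename(columns={"Starters_on_Court": "typical_starters"}))
pbp = pbp.merge(typical, on=["Quarter", "diff_bin"], how="left")

# Full-strength situations (10 combined starters) get weight 1.0; all-bench
# garbage-time situations get down-weighted but never to zero.
pbp["weight"] = 0.2 + 0.8 * pbp["typical_starters"] / 10.0

# These weights then plug into the RAPM ridge regression as sample weights, e.g.
# Ridge(alpha=...).fit(X_stints, y_margin, sample_weight=stint_weights)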

Re: Garbage Time

Posted: Wed Dec 30, 2020 10:41 pm
by vzografos
rainmantrail wrote: Wed Dec 30, 2020 11:48 am Initially, my approach was to build a logistic regression model that would yield the probability of a team winning the game based on the current state of the game (e.g., up by 17 with 7:00 left in the 3rd quarter and each team has 3 of their starters on the court = 86% chance of winning, or whatever the number is). I was able to build this model pretty easily, and the outputs made sense from a probability standpoint. I trained the model on a subset of the data and tested the results on a held out testing dataset. I created the ROC curve by evaluating the model's predicted winners against the actual outcomes of that game.
Yes but you see that's my question.
At any given time t your model predicts a win probability. Let's say your model is called Rainman_model and it predicts at time t=40 (assume time does not reset to 0 at every quarter) a probability that the home team wins of 67%.

Now, assume my model VZ_model predicts at t=40 a probability of 70%.

How can you evaluate which of these two models is more correct at time t?

Let's look a bit further as t increases

Rainman_model VZ_model
t=41 68% 74%
t=42 69% 79%
t=43 67% 70%
t=44 68% 76%
t=45 72% 78%
t=46 91% 85%
t=47 98% 92%
t=48 100% 100%

So obviously the Home team won. But which model is more accurate overall?

I am not even talking about training and testing right now. I am just asking how you evaluate these two specific time series if you don't have ground-truth data of the "correct" predictions at time t. I mean both models predict the outcome of the game equally accurately. But which is better?

Now assume a 3rd model, let's call it Random_model, that predicts as follows:

Rainman_model VZ_model Random_model
t=41 68% 74% 50%
t=42 69% 79% 56%
t=43 67% 70% 90%
t=44 68% 76% 64%
t=45 72% 78% 48%
t=46 91% 85% 72%
t=47 98% 92% 44%
t=48 100% 100% 100%

Now what is the answer?

Here is why I am asking this question. Your statement "the outputs made sense from a probability standpoint" is not obvious at all to me, since I cannot imagine a simple method for evaluating the performance of each of these temporal predictors at time t, or even overall, given that they all correctly predict the game.

I would be very interested in learning how you evaluated your method, because if you did what I think you did ("ROC curve by evaluating the model's predicted winners against the actual outcomes of that game") and only looked at the outcome at t=48, then I am not sure that evaluation is so meaningful.

Re: Garbage Time

Posted: Wed Dec 30, 2020 10:43 pm
by vzografos
BTW, have a look at this guy

http://stats.inpredictable.com/nba/wpBox_live.php

Re: Garbage Time

Posted: Thu Dec 31, 2020 2:33 am
by rainmantrail
vzografos wrote: Wed Dec 30, 2020 10:41 pm
rainmantrail wrote: Wed Dec 30, 2020 11:48 am Initially, my approach was to build a logistic regression model that would yield the probability of a team winning the game based on the current state of the game (e.g., up by 17 with 7:00 left in the 3rd quarter and each team has 3 of their starters on the court = 86% chance of winning, or whatever the number is). I was able to build this model pretty easily, and the outputs made sense from a probability standpoint. I trained the model on a subset of the data and tested the results on a held out testing dataset. I created the ROC curve by evaluating the model's predicted winners against the actual outcomes of that game.
Yes but you see that's my question.
At any given time t your model predicts a win probability. Let's say your model is called Rainman_model and it predicts at time t=40 (assume time does not reset to 0 at every quarter) a probability that the home team wins of 67%.

Now, assume my model VZ_model predicts at t=40 a probability of 70%.

How can you evaluate which of these two models is more correct at time t?

Let's look a bit further as t increases

Rainman_model VZ_model
t=41 68% 74%
t=42 69% 79%
t=43 67% 70%
t=44 68% 76%
t=45 72% 78%
t=46 91% 85%
t=47 98% 92%
t=48 100% 100%

So obviously the Home team won. But which model is more accurate overall?

I am not even talking about training and testing right now. I am just asking how you evaluate these two specific time series if you don't have ground-truth data of the "correct" predictions at time t. I mean both models predict the outcome of the game equally accurately. But which is better?

Now assume a 3rd model, let's call it Random_model, that predicts as follows:

Rainman_model VZ_model Random_model
t=41 68% 74% 50%
t=42 69% 79% 56%
t=43 67% 70% 90%
t=44 68% 76% 64%
t=45 72% 78% 48%
t=46 91% 85% 72%
t=47 98% 92% 44%
t=48 100% 100% 100%

Now what is the answer?

Here is why I am asking this question. Your statement "the outputs made sense from a probability standpoint" is not obvious at all to me, since I cannot imagine a simple method for evaluating the performance of each of these temporal predictors at time t, or even overall, given that they all correctly predict the game.

I would be very interested in learning how you evaluated your method, because if you did what I think you did ("ROC curve by evaluating the model's predicted winners against the actual outcomes of that game") and only looked at the outcome at t=48, then I am not sure that evaluation is so meaningful.
I don't understand your question. The "ground truth" is the easy part. By definition, it is the outcome of each game. The model's probabilistic outcomes should align with how often teams in those circumstances end up winning the game. It's pretty straight-forward. You can compare different model performances against each other by running an ANOVA test and seeing which ones yield better accuracy. As an example, I improved my model's performance by adding interaction terms between 'time_remaining' and 'score_differential', among other interactions. That yielded better performance. But ultimately, my point is that none of this matters. At least not for what I wish to accomplish. I'm arguing that a classification model such as logistic regression (or any other ML probabilistic classifier you want to use) is not the right tool for the job of weighting "garbage time" possessions.
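
To make the "which model is better" question concrete, here is a toy sketch of one simple way to score two sets of in-game probabilities against final outcomes (a Brier-style mean squared error, shown only because it fits in a few lines; the numbers are made up, and a real comparison would pool game states across thousands of games rather than a single game):

Code:

# Toy illustration: score two models' in-game win probabilities against the
# final outcomes. All numbers are made up; real evaluation pools many games.
import numpy as np

outcome = np.array([1, 1, 0, 1, 0, 1])   # 1 = that game's home team won
model_a = np.array([0.68, 0.72, 0.41, 0.91, 0.33, 0.80])
model_b = np.array([0.74, 0.78, 0.55, 0.85, 0.20, 0.60])

def brier(p, y):
    return np.mean((p - y) ** 2)          # lower is better

print("Model A Brier score:", brier(model_a, outcome))
print("Model B Brier score:", brier(model_b, outcome))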

Re: Garbage Time

Posted: Thu Dec 31, 2020 2:39 am
by rainmantrail
vzografos wrote: Wed Dec 30, 2020 10:43 pm BTW, have a look at this guy

http://stats.inpredictable.com/nba/wpBox_live.php
My model outputs would look very similar to this. I'm not saying that such a model is entirely useless for all purposes. It could certainly be used to make predictions for in-game betting markets if desired. It could also be improved by incorporating home-court advantage and team strength components if that were the goal. I'm just saying that it's not the best approach for my particular use case.

Re: Garbage Time

Posted: Thu Dec 31, 2020 2:52 am
by rainmantrail
vzografos wrote: Wed Dec 30, 2020 10:43 pm BTW, have a look at this guy

http://stats.inpredictable.com/nba/wpBox_live.php
I should probably point out that this particular model doesn't look like it's very accurate. As an example, it has ATL as a 92.3% favorite over NJN with 32 seconds left in the 4th quarter and a score of 136-139. This number is almost certainly too high.

Re: Garbage Time

Posted: Thu Dec 31, 2020 2:14 pm
by DSMok1
rainmantrail wrote: Thu Dec 31, 2020 2:52 am
vzografos wrote: Wed Dec 30, 2020 10:43 pm BTW, have a look at this guy

http://stats.inpredictable.com/nba/wpBox_live.php
I should probably point out that this particular model doesn't look like it's very accurate. As an example, it has ATL as a 92.3% favorite over NJN with 32 seconds left in the 4th quarter and a score of 136-139. This number is almost certainly too high.
Just thinking about that individual case, that sounds calibrated approximately correctly to me. 3 point lead, without the ball, 30 seconds left...

Re: Garbage Time

Posted: Thu Dec 31, 2020 2:23 pm
by vzografos
:mrgreen:

I guess the disagreement in the last two posts above proves my point.

You cannot evaluate, AT A GIVEN TIME t, the prediction accuracy of a live time-based model when all you have is the final outcome (win/loss). You can only evaluate it at the end, not during the game.

Re: Garbage Time

Posted: Thu Dec 31, 2020 2:25 pm
by vzografos
rainmantrail wrote: Thu Dec 31, 2020 2:33 am I don't understand your question. The "ground truth" is the easy part. By definition, it is the outcome of each game. The model's probabilistic outcomes should align with how often teams in those circumstances end up winning the game. It's pretty straight-forward. You can compare different model performances against each other by running an ANOVA test and seeing which ones yield better accuracy.
I don't think you understood my point.

OK, here is a simple question: of the three toy example models I gave you in my previous post, which one is better? Or more accurate, if you like?

Re: Garbage Time

Posted: Thu Dec 31, 2020 3:55 pm
by DSMok1
vzografos wrote: Thu Dec 31, 2020 2:23 pm :mrgreen:

I guess the disagreement in the last two posts above proves my point.

You cannot evaluate, AT A GIVEN TIME t, the prediction accuracy of a live time-based model when all you have is the final outcome (win/loss). You can only evaluate it at the end, not during the game.
Well, you can evaluate a ton of estimates, in bins, vs. what actually occurred to check the calibration of various regions of the estimation.
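
Something along these lines (a quick sketch; the file and column names are placeholders):

Code:

# Bin a large pool of in-game win probability estimates and compare each bin's
# average prediction to the observed win rate. Placeholder file/column names.
import numpy as np
import pandas as pd

preds = pd.read_csv("in_game_predictions.csv")   # one row per (game, timestamp)

preds["bin"] = pd.cut(preds["pred_win_prob"], bins=np.linspace(0, 1, 11))
calibration = preds.groupby("bin", observed=True).agg(
    mean_predicted=("pred_win_prob", "mean"),
    observed_win_rate=("home_team_won", "mean"),
    n=("home_team_won", "size"),
)
print(calibration)   # well-calibrated bins: mean_predicted ~ observed_win_rate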

Re: Garbage Time

Posted: Thu Dec 31, 2020 5:31 pm
by vzografos
DSMok1 wrote: Thu Dec 31, 2020 3:55 pm Well, you can evaluate a ton of estimates, in bins, vs. what actually occurred to check the calibration of various regions of the estimation.
My question was: how can you evaluate your prediction at any time during the game based only on the game outcome?