Factors Determining Production in Basketball
- 
				martinezjose
- Posts: 14
- Joined: Sun Apr 24, 2011 10:27 pm
Factors Determining Production in Basketball
Dear all,
I recently published a new paper on player evaluation metrics. The paper does not claim to be a "revolutionary" way to measure player productivity; it is simply another tool to help coaches, analysts and fans analyse basketball players.
The complete reference of the paper is
Martínez, J. A. (2012). Factors determining production (FDP) in basketball. Economic & Business Letters, 1 (1), 21-29.
You may download the paper here:
http://www.unioviedo.es/reunido/index.p ... /9346/9224
As you may read in the paper, this is a new way of considering the productivity of players in basketball, by separating points (production) from the factors determining production.
Comments are welcome!
In addition, I have made some further computations for the recently finished 2012 season.
In this spreadsheet, I have dropped players with fewer than 10 games played and fewer than five minutes played per game. Some results are very interesting.
Please be cautious. To compare players from different teams you must assume that game conditions are the same across different players and teams, and this is a somewhat dangerous assumption... But if you want to "dream" of that ideal context, you can see that, for example, LeBron did better than Kobe and Durant...
Enjoy!
			
			
					Last edited by martinezjose on Wed Jul 18, 2012 5:17 pm, edited 2 times in total.
									
			
						
										
						Re: Factors Determining Production in Basketball
Has this paper already been finalized, or is this an advance online version?
			
			
									
						
										
martinezjose
Re: Factors Determining Production in Basketball
It was published in May. It is an open-access online journal.
			
			
									
						
										
						Re: Factors Determining Production in Basketball
After a read-through, I see a number of issues.  First, there's a factual mistake on the second page.  In your example of a player who steals the ball and gets 90 lay-ups, you say he would have a Win Score of 270.  He would actually have a Win Score of 180 because you need to subtract a point for each shot attempt.  That doesn't fix your concern of Win Score not summing to point differential, of course, but that leads me to a second concern.  Berri has notably said that Win Score is a quick approximation of Wins Produced but they are not the same thing.  Also, Wins Produced (as the name suggests) is meant to relate to wins, not to point differential.  So it seems odd to criticize Win Score for not summing to point differential when it is at best two steps removed from that goal.  
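The arithmetic behind the 270 versus 180 figures can be sketched as follows. This is only an illustration: it assumes the commonly cited Win Score formula (PTS + REB + STL + 0.5*BLK + 0.5*AST - FGA - 0.5*FTA - TO - 0.5*PF), and it assumes the hypothetical player records 90 steals, converts all 90 lay-ups, and has nothing else in his box score. The function name `win_score` is made up for this example.

```python
def win_score(pts=0, reb=0, stl=0, blk=0, ast=0, fga=0, fta=0, tov=0, pf=0):
    """Win Score under the commonly cited formula (an assumption here)."""
    return (pts + reb + stl + 0.5 * blk + 0.5 * ast
            - fga - 0.5 * fta - tov - 0.5 * pf)

# Hypothetical stat line: 90 steals, each turned into a made lay-up.
steals = 90
layups_made = 90
points = 2 * layups_made

# Counting the shot attempts, as the formula requires.
with_fga = win_score(pts=points, stl=steals, fga=layups_made)

# Forgetting to subtract the shot attempts reproduces the higher figure.
without_fga = win_score(pts=points, stl=steals)

print(with_fga)     # 180.0
print(without_fga)  # 270.0
```

Either way the total is far from the 180-point swing those plays would produce on the scoreboard, which is why the correction does not change the paper's underlying concern.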
In terms of FDP, it's an interesting way to evaluate team performance. It reminds me of the method at Advanced NFL Stats, where scoring isn't used as a predictor but is assumed to be an outcome of other events on the field. But it seems a little problematic to use differences in team variables (such as home missed field goals minus away missed field goals) to evaluate individual players. For example, the home team could get an advantage in missed field goals by a) making more shots, b) having the away team miss more shots, or c) simply shooting less. The model doesn't account for made or total shots at all, and there's no method of tying defense to a particular player. To make up an example, let's say there's a game where Kobe missed 10 shots but the Lakers as a team missed 5 fewer shots than their opponent. Although the model is built from the latter information, it seems like one would have to do a lot of work to try to tie that information to the former. The model must assume that Kobe hurt his team, but that is not clear at all (what if, for example, he took 40 shots? Making 30 of 40 shots would be very good).
Related to the above issue, the lack of shot attempts or makes is problematic because players are simply punished for shooting with no regard for their accuracy. Ignoring whether or not LeBron "should" be the 14th-best All-Star, it is ludicrous to suggest that he makes a negative contribution to his team. Presumably his score in that table is due to the fact that he takes a decent amount of shots, and thus misses more shots than some players even though he shoots well. If you still wanted to leave out points as a predictor of differential, perhaps you could have removed missed field goals and used field goal percentage or something else.
In your validation, it's a little odd that you used season-level results to test a model based on game-by-game data.
Your test of consistency is a little unclear. When Berri refers to consistency, he means that good players in one year tend to be good the next year, and bad players stay bad. So he would typically look at the correlation between Wins Produced per 48 for all players in 2010 and Wins Produced per 48 for all players in 2011, for example. If you run such a correlation for a number of metrics, box score measures tend to be fairly consistent whereas something like APM is not. From the paper it sounds like you normalized your FDP score for some players and found that it was similar to normalized Win Score values. That isn't a test of the consistency of FDP so much as a test of whether FDP produces similar values to Win Score.
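A minimal sketch of the year-to-year consistency check described above, using invented per-48 ratings for four placeholder players. The numbers, the player labels and the helper name `pearson_r` are all assumptions for illustration, not data from the paper or from any real metric.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up ratings for the same players in two consecutive seasons.
season_2010 = {"A": 0.10, "B": 0.25, "C": 0.05, "D": 0.18}
season_2011 = {"A": 0.12, "B": 0.22, "C": 0.07, "D": 0.15}

# Only players present in both seasons can be compared.
players = sorted(set(season_2010) & set(season_2011))
r = pearson_r([season_2010[p] for p in players],
              [season_2011[p] for p in players])
print(round(r, 2))
```

A value near 1 would mean players keep roughly the same ordering from one season to the next, which is the sense of "consistency" meant here. Comparing normalized FDP with normalized Win Score within a season answers a different question.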
I'm also unclear on your final test with the additional NBA and Spanish League seasons. Did you run the same model on those seasons to get your R-squared values, or did you use the parameters from the rest of the paper to predict the new seasons and create R-squared values? It's possible, but very unlikely, that the parameters for 2009 would fit 2010 and 2011 even better than they fit 2009, but that seems to be what you're claiming.
			
			
									
						
										
martinezjose
Re: Factors Determining Production in Basketball
Thanks for the comments.
I think some of your comments are answered in the paper.
Regarding your last question, yes, I used the fixed estimated parameters to obtain the R-squared, as I also wrote in the paper.
You have not interpreted the results correctly. LeBron James does not make a negative contribution to his team. That would occur if, for example, he scored 0 points per minute and made 0 blocks per minute; then, as his FDP is negative, we could say that his contribution is negative. As I said in the paper, you should interpret FDP, points and blocks together. They are variables at different levels, but together they draw an overall picture of the tangible contribution of players. Please see the note below the table, where I discuss the concept of "paradoxical players".
Regarding consistency, the normalized standard deviation may be a good tool to evaluate the performance of players across seasons. I do not agree that this is not a measure of consistency of performance.
One final comment on your interpretation of my paper. A regression model yields residuals, and the model is fitted to a given data set. If you want to obtain a prediction from new data using a boundary condition that is close to an outlier, for example a player who takes 40 shots, then it is highly probable that the prediction will not be very good.
Thanks again for the comments. And yes, you are right regarding the mistake on the second page. Sorry. And as you say, this does not invalidate my reasoning.
My paper is not an attack on Berri's work. I appreciate the work of Berri and his colleagues. It is simply another way of viewing production. It has limitations, of course.
			
			
									
						
										

Re: Factors Determining Production in Basketball
Have you done out of sample prediction?
			
			
									
						
										
martinezjose
Re: Factors Determining Production in Basketball
Hi J. E.
I am not sure exactly what you mean by "out of sample prediction"... Sorry.
If you are referring to testing the model on other samples, yes, this is what I have done, following the usual procedure used in some data mining studies. From the paper:
Finally, FDP for the box-scores of the following two seasons in the NBA (2010 and
2011) and for a sample of box-scores of the Spanish ACB League from 1991 to 2010
was calculated. Observations after deleting outliers were 1106, 1125 and 485,
respectively. Prediction of the model in these three samples was computed, using the
fixed parameters estimated following the suggestions of Schmueli, Patel and Bruce
(2007). The aim was to assess the performance of the model with new data. Explained
variance for each model was 0.73, 0.74 and 0.75, which was akin to the 0.72 obtained
for the original model.
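The procedure quoted above (freeze the estimated parameters, then score new data without re-fitting) can be sketched as below. Everything numeric here is invented for illustration: the two coefficients, the feature meanings and the five "new" games are assumptions, not the parameters or data from the paper.

```python
def predict(coefs, intercept, row):
    """Linear prediction from frozen coefficients."""
    return intercept + sum(c * x for c, x in zip(coefs, row))

def r_squared(actual, predicted):
    """Explained variance of predictions against observed values."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    return 1 - ss_res / ss_tot

# Hypothetical parameters, imagined as estimated on an earlier sample
# and then held fixed (not re-estimated on the new games).
fixed_intercept = 0.0
fixed_coefs = [-1.0, 0.8]  # made-up weights for two team-level differentials

# Hypothetical new games: (feature differentials, observed point differential).
new_games = [([-3, 5], 8), ([4, -2], -5), ([0, 6], 3), ([2, 1], -2), ([-5, -1], 6)]

predicted = [predict(fixed_coefs, fixed_intercept, feats) for feats, _ in new_games]
actual = [diff for _, diff in new_games]
r2 = r_squared(actual, predicted)
print(round(r2, 3))
```

Because the parameters are not re-fitted, this value is not guaranteed to be at least as high as the in-sample figure, and it can even be negative when the frozen model predicts worse than the new sample's mean.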
			
			
									
						
										
Re: Factors Determining Production in Basketball
I guess I'm a little confused then; what's the benefit of FDP over some of the other metrics available if I need to combine it with points and blocks in some unknown way?  I saw the note below the chart; those players aren't paradoxical so much as they simply aren't evaluated fully.  Your metric says that they shoot a lot and offset that relatively less with rebounds, assists, and steals.  But if you were to include shooting accuracy in some way, they would presumably not have paradoxical scores any more.  Consensus great players like Durant and LeBron would have positive, high scores instead of being below guys like Deron Williams or Al Horford, and presumably no one on an All-Star team would be a negative contributor.
Another way, perhaps, to ask the question would be to say: if Win Score (for example) is limited because its results do not directly map to point differential, why is FDP better if good players have negative values (implying that they would have a negative contribution to point differential)? If I have to combine FDP with other measures to begin to evaluate players, it seems that it does not relate to point differential very well either.
I also wasn't trying to say that normalized scores are a bad way to evaluate players. I was saying that it doesn't evaluate consistency in the same way that others typically talk about consistency.
My Kobe example was obviously unusual, although he has taken 40 shots in a game multiple times in his career. The point was more that an individual player's missed shots may not relate to his team's fortune in any direct way. Say that Player A and Player B both missed 5 shots. Player A only took 5 shots while Player B took 15; thus Player A shot 0% while Player B shot 67%. Assuming all their other stats were the same, FDP would rank them the same while it would seem obvious that Player B helped his team more. I guess this relates to my concern about having to account for other statistics (such as points) in order to actually evaluate players.
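The Player A / Player B comparison can be written out in a few lines. This is a deliberately simplified sketch: it only contrasts "count of missed shots" with shooting accuracy and does not reproduce the actual FDP weights, and the dictionary keys and helper names are made up.

```python
def missed_shots(player):
    """Field goals attempted minus field goals made."""
    return player["fga"] - player["fgm"]

def fg_pct(player):
    """Field goal percentage as a fraction between 0 and 1."""
    return player["fgm"] / player["fga"]

# Stat lines from the example: both players miss 5 shots.
player_a = {"fga": 5, "fgm": 0}    # 0 of 5
player_b = {"fga": 15, "fgm": 10}  # 10 of 15

for name, p in (("A", player_a), ("B", player_b)):
    print(name, missed_shots(p), round(fg_pct(p), 2))
```

A term that sees only misses gives both players the same penalty, while any accuracy-aware term separates them, and that gap is the point of the example.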
I didn't think you were fighting against or dismissing Berri's work; I think I'm just more confused as to when I would use FDP as opposed to any of the other single-number measures out there.
			
			
									
						
										
martinezjose
Re: Factors Determining Production in Basketball
xkonk, FDP is a way of weighting some basic statistics to obtain a single metric that explains (with notable success) the outcome of games (point differential). When a player has a negative FDP, he is damaging his team. Obviously, if he is a good scorer, then he is also yielding good outcomes.
Think, for example, of a salesman as an analogy. A salesman could sell many products and generate high revenue for his company. At the same time, he could be a bad colleague who creates a bad atmosphere among the other salesmen, etc., i.e. a negative FDP. His supervisor has to decide whether his good results outweigh his bad behaviour, considering also the performance of the other salesmen and their behaviour as a team.
Quantitative analysis of sports is based on some premises, such as breaking with the social conventions of sports. David Berri speaks a lot about this in his books. If you think that your qualitative evaluation of a player overrides any quantitative evaluation that challenges it, then you do not need basketball metrics.
Obviously, the "laugh test", as Dean Oliver calls it, is always present. But sometimes the laugh test itself can be criticized... sometimes.
We use basketball analytics to go beyond the conventions. But again, I do not think my results are far from many of the conventional views about players.
			
			
									
						
										
martinezjose
Re: Factors Determining Production in Basketball
Another way of thinking about what FDP means is soccer. Soccer is the most popular sport in the world (though not so much in the USA). Players such as Romario, Ronaldo (not CR7) and other famous players were very good scorers, but very bad workers for their team.
Some coaches did not want such players in their teams for this latter reason (it would be a very negative FDP). These players were extraordinary scorers (they helped their team), but they were extraordinarily bad workers (they hurt their team's ability to produce goals and to prevent goals).
I think this is one of the reasons players such as Leo Messi are so awesome. Messi scores many goals and also works very hard for his team. He also assists other players.
If we go back to basketball, the question would be: is a player who produces many points but has a highly negative FDP good for a team?
In addition, some years ago we read "Why do advanced statistics hate Kobe Bryant?" What is the FDP of Bryant?...
			
			
									
						
										
Re: Factors Determining Production in Basketball
So to extend your analogy, LeBron is a bad worker?  It appears that description would apply to most of the All-Star team, right?  Everyone below Chris Paul in your chart has a negative FDP and is, as you say, damaging their teams.  If FDP is a single metric that explains the outcomes of games, I would predict that All-Star team to lose to a bunch of average players (assuming an average player has a FDP of 0, since negative values are damaging).
			
			
									
						
										
martinezjose
Re: Factors Determining Production in Basketball
Yes, LeBron damaged his team. I do not think he is a bad worker (do not misinterpret my analogy). And the average FDP is not 0, but -0.072 for the 2012 season. His contribution to point differential was negative, but obviously he scored many points, so his coach surely prefers to count on LeBron. However, his coach probably thinks that he needs players with positive FDP to improve overall team production.
Michael Jordan also had a negative career FDP (-0.14), but Magic Johnson had a positive FDP (0.02).
Dennis Rodman had a high positive FDP (0.12). Was Rodman the best player in the history of the NBA (as some metrics seemed to say some years ago)? I do not think so. Maybe he was one of the best "team players" in history.
I think coaches want to have on their roster both great scorers (probably players with negative FDP) and players with positive FDP, in order to improve team performance.
Will Coach K form a line-up in the Summer Olympics with Williams, Durant, Kobe and Melo? Ummmmm, probably not. They are all great scorers but they have poor FDP.
Think about the importance of players such as Ibaka, Collison or Perkins for the Thunder, or Anthony or Haslem for the Heat.
			
			
									
						
										
Re: Factors Determining Production in Basketball
martinezjose wrote: "As you may read in the paper, this is a new form of considering productivity of players in basketball, by separating points (production) from factors determining production. Comments are welcome!"
I'm curious... why did you use data from only 3 semi-random seasons?
regards,
wiLQ @ http://weaksideawareness.wordpress.com
			
Re: Factors Determining Production in Basketball
After going through this paper, it strikes me as an odd rating system. Players are penalized for missing shots, but not rewarded for making them. Basically, this system comes down to rebounds + blocks - FGA (especially 3FGA).
As I go through the spreadsheet, I'm finding results that can only be described as absurd. Examples:
- Stiemsma is the top-ranked player for the Celtics, with Rondo being the only other player in positive territory (barely). Kevin Garnett = a negative for the Celtics. Dead last on Boston in FDP: Paul Pierce.
- Dead last for Chicago: Derrick Rose.
- Varejao was the best player on Cleveland last season. Kyrie Irving -- 2nd to last on his own team.
- Apparently, Dallas REALLY screwed up in amnestying Brendan Haywood, who was #1 on their team in FDP. Going by this, they should have dumped Nowitzki, who was worst on the Mavs in this stat.
- Chris Paul and Blake Griffin both hurt the Clippers this season.
- Kobe wasn't just the worst player on the Lakers last season, he was one of the worst in the league, according to FDP.
- According to FDP, Miami was carried to their title by a Big Two of Joel Anthony and Udonis Haslem. Lebron, Bosh and Wade were all negatives in FDP.
- Not a single T-wolves player was in positive FDP territory, including Kevin Love.
- Shelden Williams was the best Net; Deron Williams the worst.
- FDP would seem to suggest that OKC could improve if they could rid themselves of Durant and Westbrook (3rd worst and worst respectively) to open playing time for Nick Collison, Cole Aldrich and Kendrick Perkins.
- Tim Duncan, Manu Ginobili and Tony Parker -- all had negative FDP scores. As did every player for the Spurs except Kawhi Leonard.
			
			
									
						
										
martinezjose
Re: Factors Determining Production in Basketball
wiLQ wrote: "I'm curious... why did you use data from only 3 semi-random seasons?"
Hi wilq, because this was the data available. I did not have box scores for seasons before 2006 in a friendly format. Thanks to www.nbastuffer.com, I obtained the box scores from 2007 to 2011.
Anyway, I think 3 seasons provide a reasonably large sample to fit a model, but obviously I always prefer larger samples when they are available.