Ideas for data to extract from PBP

schtevie
Posts: 377
Joined: Thu Apr 14, 2011 11:24 pm

Re: Ideas for data to extract from PBP

Post by schtevie »

Daniel, what I make of it is the KGification of the Boston offense. For better or, far more likely, for worse. Nash has about 9% more of his assists within three feet of the basket. This is about the same % of "excess" assists Rondo has at long mid-range.

What would be interesting to see is the Nash/Rondo comparison, for lineups not including KG (and perhaps the removal of one or more of Stoudemire/Frye/Hill, who have played a similar mid-range role, on the other side).
Crow
Posts: 10536
Joined: Thu Apr 14, 2011 11:10 pm

Re: Ideas for data to extract from PBP

Post by Crow »

DSMok1 wrote:Interesting. Pierce and (particularly) Garnett are both exceptionally good midrange shooters.

Exceptionally good midrange shooters? What data specifically are you referring to? I am not sure what link, if any, you are using.

I see Garnett as an exceptionally good midrange shooter at Hoopdata (about 47%) but overall those shots are still below league average eFG% for all shots. Pierce for all his midrange shots is quite near league average for midrange shots and those shots are dramatically below league average eFG% for all shots. Discretionary midrangers that are not wide open are probably just passable for Garnett and often a poor choice by Pierce.
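To make the comparison concrete, here is a minimal sketch of the arithmetic. The 47% figure is the one quoted above; the league-average eFG% is a placeholder I chose for illustration, not a sourced number, and the function names are made up.

```python
def efg(fgm: int, fg3m: int, fga: int) -> float:
    """Effective FG%: a made three counts as 1.5 made field goals."""
    if fga == 0:
        raise ValueError("no attempts")
    return (fgm + 0.5 * fg3m) / fga


def points_per_shot(efg_pct: float) -> float:
    """Expected points per field goal attempt, ignoring free throws."""
    return 2.0 * efg_pct


# Hypothetical sample: 47 makes on 100 midrange twos (about the 47% quoted above).
midrange = efg(fgm=47, fg3m=0, fga=100)

# ASSUMPTION: placeholder league-average eFG% for all shots, not a sourced figure.
LEAGUE_EFG = 0.49

gap = points_per_shot(midrange) - points_per_shot(LEAGUE_EFG)
print(f"midrange eFG% = {midrange:.3f}, gap vs. league = {gap:+.3f} points per shot")
```

For a two-pointer, eFG% equals plain FG%, so a shooter has to clear the league eFG% mark on twos alone before those shots beat the average attempt.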
DSMok1
Posts: 1119
Joined: Thu Apr 14, 2011 11:18 pm
Location: Maine

Re: Ideas for data to extract from PBP

Post by DSMok1 »

schtevie wrote:Daniel, what I make of it is the KGification of the Boston offense. For better or, far more likely, for worse. Nash has about 9% more of his assists within three feet of the basket. This is about the same % of "excess" assists Rondo has at long mid-range.

What would be interesting to see is the Nash/Rondo comparison, for lineups not including KG (and perhaps the removal of one or more of Stoudemire/Frye/Hill, who have played a similar mid-range role, on the other side).
I agree, it's mostly about KG.
Crow wrote:Exceptionally good midrange shooters? What data specifically are you referring to? I am not sure what link, if any, you are using.

I see Garnett as an exceptionally good midrange shooter at Hoopdata (about 47%) but overall those shots are still below league average eFG% for all shots. Pierce for all his midrange shots is quite near league average for midrange shots and those shots are dramatically below league average eFG% for all shots. Discretionary midrangers that are not wide open are probably just passable for Garnett and often a poor choice by Pierce.
I guess I was primarily thinking of Garnett, who is certainly one of the best in the midrange. Certainly above the threshold where shooting them contributes to the overall team efficiency.
Developer of Box Plus/Minus
APBRmetrics Forum Administrator
Twitter.com/DSMok1
schtevie
Posts: 377
Joined: Thu Apr 14, 2011 11:24 pm

Re: Ideas for data to extract from PBP

Post by schtevie »

DSMok1 wrote:I guess I was primarily thinking of Garnett, who is certainly one of the best in the midrange. Certainly above the threshold where shooting them contributes to the overall team efficiency.
Daniel, I am curious as to how you can be certain about the second point.

To be certain requires establishing the baseline for some kind of implicit comparison.

The simplest baseline - average PPP for mid-range vs. non-mid-range shots - clearly shows the former to be inferior. Even for KG.

Another baseline, introducing a simple notion of opportunity cost, makes certainty more likely. If a certain fraction of KG's mid-range shots are taken late in the shot clock, his below-global-average mid-range shot might, in fact, be above average given the alternative. And definitively answering a question like this (and the more important related question of shot clock optimization) is why gathering shot clock time/distance data is very important.

But then there is a third baseline, expanding the concept of opportunity cost, where certainty seems to me to be less likely. Who's to say that KG (in particular) has been optimally deployed throughout his career in taking so many long 2s relative to other shots? The recent plot Jeremias provided reminds us of this point. The Celtics could well be more efficient as a team if the shots Rondo assists KG on were coming around the basket.
J.E.
Posts: 852
Joined: Fri Apr 15, 2011 8:28 am

Re: Ideas for data to extract from PBP

Post by J.E. »

Click. Rondo to Allen (blue), Pierce (red), Garnett (yellow). These are total numbers, although it would probably be a good idea to normalize by each shooter's number of offensive possessions.

edit: CP3's graph looks almost exactly like Rondo's if you put "rim" and "1ft" in the same bin
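The normalization could look roughly like this. It is only a sketch: the counts, possession totals, and bin names below are invented for illustration, and `per_100_possessions` is a hypothetical helper, not something from the actual extraction code.

```python
def per_100_possessions(assisted_makes, possessions):
    """Convert raw assisted-make counts into rates per 100 of the shooter's
    offensive possessions, so heavy-minute shooters don't dominate the plot."""
    rates = {}
    for shooter, by_bin in assisted_makes.items():
        poss = possessions.get(shooter, 0)
        if poss <= 0:
            continue  # skip shooters without a usable possession count
        rates[shooter] = {b: 100.0 * n / poss for b, n in by_bin.items()}
    return rates


# ASSUMPTION: made-up counts of Rondo-assisted makes by distance bin.
assisted = {
    "Allen": {"close": 10, "far": 30},
    "Garnett": {"close": 15, "far": 60},
}
# ASSUMPTION: made-up offensive possessions played by each shooter.
poss = {"Allen": 2000, "Garnett": 3000}

rates = per_100_possessions(assisted, poss)
for shooter, by_bin in rates.items():
    print(shooter, by_bin)
```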
kpascual
Posts: 50
Joined: Thu Mar 01, 2012 7:02 pm

Re: Ideas for data to extract from PBP

Post by kpascual »

J.E. wrote:
kpascual wrote:How about 5 man units?
You mean something like this http://stats-for-the-nba.appspot.com/PB ... br_ids.rar ?
See viewtopic.php?f=2&t=8033
Yes I did, and I am an idiot because I've been to your site many times and have seen this. Please ignore me.
J.E.
Posts: 852
Joined: Fri Apr 15, 2011 8:28 am

Re: Ideas for data to extract from PBP

Post by J.E. »

For the '11-'12 season I get the following distribution of 2 pointers taken by distance

Does anyone have any suggestions on what bins I should use? I'm thinking [0-3] (around the hoop), [4-13] (midrange), [14-24] (longer midrange and long distance 2)

I don't want to create too many bins, because I still want to use a player's FG% for that bin, which I can only do if that player has more than X shots in that bin.
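A rough sketch of that binning, assuming shots come in as (distance in feet, made) pairs. The bin labels, the `fg_pct_by_bin` helper, and the sample shots are made up here; X is left as the `min_shots` parameter because no value has been settled on.

```python
from bisect import bisect_right

# Lower edges of the 2nd and 3rd bins, giving [0-3], [4-13], [14-24].
BIN_EDGES = [4, 14]
BIN_LABELS = ["close", "mid", "far"]


def bin_for(distance_ft: int) -> str:
    """Map a 2-point shot distance to one of the three proposed bins."""
    return BIN_LABELS[bisect_right(BIN_EDGES, distance_ft)]


def fg_pct_by_bin(shots, min_shots):
    """Return FG% per bin, or None where the player does not have
    more than `min_shots` attempts in that bin."""
    made = {label: 0 for label in BIN_LABELS}
    att = {label: 0 for label in BIN_LABELS}
    for distance_ft, was_made in shots:
        label = bin_for(distance_ft)
        att[label] += 1
        made[label] += int(was_made)
    return {
        label: (made[label] / att[label] if att[label] > min_shots else None)
        for label in BIN_LABELS
    }


# ASSUMPTION: tiny invented sample, just to exercise the function.
shots = [(1, True), (2, False), (3, True), (10, True),
         (15, True), (20, False), (22, False)]
print(fg_pct_by_bin(shots, min_shots=2))
```

Adding another bin (say for floaters and hooks) only means adding one edge and one label, but every extra bin lowers the number of players who clear the threshold.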
J.E.
Posts: 852
Joined: Fri Apr 15, 2011 8:28 am

Re: Ideas for data to extract from PBP

Post by J.E. »

Here's where And1s happen by shot distance

Not too surprising. Somewhere around '10 bbr started to list them as "1 ft" or "2 ft", instead of using "at rim" like they did before. Not sure why
DSMok1
Posts: 1119
Joined: Thu Apr 14, 2011 11:18 pm
Location: Maine

Re: Ideas for data to extract from PBP

Post by DSMok1 »

J.E. wrote:For the '11-'12 season I get the following distribution of 2 pointers taken by distance

Does anyone have any suggestions on what bins I should use? I'm thinking [0-3] (around the hoop), [4-13] (midrange), [14-24] (longer midrange and long distance 2)

I don't want to create too many bins, because I still want to use a player's FG% for that bin, which I can only do if that player has more than X shots in that bin.
Those bins look reasonable. It might be worthwhile to have a shorter range bin that can catch the floaters/hooks as distinct from jumpers (perhaps 4-8 or so?).
J.E.
Posts: 852
Joined: Fri Apr 15, 2011 8:28 am

Re: Ideas for data to extract from PBP

Post by J.E. »

A bit of a data dump with some valuable PBP info like:
-rebounds split into afterFG and afterFT
-And1s (split into subsequent FT missed/made)
-defensive fouls drawn
-offensive fouls drawn
-away assists
-away blocks (split into defense/offense got the ball "good block/bad block")
-live/dead TOs
-(un)assisted makes(2s) from close/mid/far

and maybe more. The first couple of columns are standard BoxScore data, up until "POINTS"

http://stats-for-the-nba.appspot.com/data/2006.txt
http://stats-for-the-nba.appspot.com/data/2007.txt
etc.

Please tell me if you spot any errors
DSMok1
Posts: 1119
Joined: Thu Apr 14, 2011 11:18 pm
Location: Maine

Re: Ideas for data to extract from PBP

Post by DSMok1 »

What is the source for this data? Basketball Reference?
v-zero
Posts: 520
Joined: Sat Oct 27, 2012 12:30 pm

Re: Ideas for data to extract from PBP

Post by v-zero »

DSMok1 wrote:What is the source for this data? Basketball Reference?
Yeah, you can tell from the unique IDs.
mystic
Posts: 470
Joined: Mon Apr 18, 2011 10:09 am

Re: Ideas for data to extract from PBP

Post by mystic »

J.E. wrote: Please tell me if you spot any errors
Age: you are adding up the ages of players who played on multiple teams in a given season. Right at the start of the 2006 file, Jim Jackson is listed with an age of 70, because he played for both the Suns and the Lakers.
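One way to avoid this when merging a player's team stints is to sum only the counting stats, carry age over unchanged, and recompute any percentage from the summed totals. This is just a sketch: the row layout, the `merge_stints` name, and the shot counts are assumptions, not the actual file format.

```python
def merge_stints(rows):
    """Merge per-team rows into one row per player.
    Counting stats are summed; age is taken once; FG% is recomputed
    from the summed makes and attempts rather than added or averaged."""
    merged = {}
    for r in rows:
        m = merged.setdefault(r["player"], {"age": r["age"], "fgm": 0, "fga": 0})
        m["fgm"] += r["fgm"]
        m["fga"] += r["fga"]
    for m in merged.values():
        m["fg_pct"] = m["fgm"] / m["fga"] if m["fga"] else None
    return merged


# ASSUMPTION: invented shot counts; age 35 is implied by the summed value of 70.
rows = [
    {"player": "Jim Jackson", "team": "PHO", "age": 35, "fgm": 30, "fga": 90},
    {"player": "Jim Jackson", "team": "LAL", "age": 35, "fgm": 10, "fga": 30},
]
merged = merge_stints(rows)
print(merged["Jim Jackson"])
```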
KAN
Posts: 10
Joined: Thu Oct 18, 2012 2:44 pm

Re: Ideas for data to extract from PBP

Post by KAN »

I'm not sure if it is too late for this, but it would be nice to have counterpart stats from the play-by-play data publicly available, similar to what 82games.com does.
J.E.
Posts: 852
Joined: Fri Apr 15, 2011 8:28 am

Re: Ideas for data to extract from PBP

Post by J.E. »

mystic wrote:Age: you are adding up the ages of players who played on multiple teams in a given season. Right at the start of the 2006 file, Jim Jackson is listed with an age of 70, because he played for both the Suns and the Lakers.
Right, thanks. Percentages are also messed up for players who played on multiple teams.
KAN wrote:I'm not sure if it is too late for this, but it would be nice to have counterpart stats from the play-by-play data publicly available, similar to what 82games.com does.
You can never tell for sure who is defending whom via PBP, so I'm not too excited about trying to extract counterpart data.