Ideas for data to extract from PBP

J.E.
Posts: 852
Joined: Fri Apr 15, 2011 8:28 am

Ideas for data to extract from PBP

Post by J.E. »

I'm in the process of extracting player-specific data from PBP data and need some ideas on what else to extract. So far I plan to record the following actions:

Made/Missed Shots by distance, with 3-5 distance bins, maybe: a) dunk/layup, b) close range, c) mid-range jumper, d) 3-pointer. Made shots split into assisted and unassisted
Assists. Split by shot distance and by whether the shot was an And1
Offensive/Defensive Rebounds. Maybe split into 2-3 different types depending on where the shot was coming from (FT, close range, etc.)?
Team Offensive/Defensive Rebounds
Steals. Maybe separate those that led to fast/easy buckets (look for layups/dunks <5 secs after steals)
Charges drawn
Turnovers. offensive 3sec, illegal screen, dribble TO, bad pass, offensive foul, what else?
Team turnovers: 24sec
Blocks, 2 types: a) defending team got the ball b) offense gets the ball back
Defensive goaltends
Defensive fouls: shooting, non-shooting (further split into: loose ball, defensive 3sec, standard personal, flagrant?)
Fouls drawn: shooting, non-shooting
And1s: split into different shot distances?
FTs, remove And1 FTs from the total number of FTs

If you think something important is missing from this list, tell me. Whatever it is, the information needs to be present in the bbr (Basketball-Reference) PBP files.
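
A rough sketch of the steal split from the list above (steals followed by a layup/dunk less than 5 seconds later). The `Event` record, its field names, and the `"steal"` / `"made_shot"` / `"layup"` / `"dunk"` labels are all made up for illustration; they are not the bbr PBP format, which would need its own parser. It also assumes the clock is stored as seconds elapsed in the period and ignores possession changes inside the window.

```python
from dataclasses import dataclass


@dataclass
class Event:
    clock: float         # seconds elapsed in the period, increasing (assumed)
    team: str
    kind: str            # "steal", "made_shot", "missed_shot", ... (hypothetical labels)
    shot_type: str = ""  # "layup", "dunk", "jumper", ... (hypothetical labels)


EASY_SHOTS = {"layup", "dunk"}


def split_steals(events, window=5.0):
    """Split steals into (easy_bucket, other) lists.

    A steal counts as leading to an easy bucket when the stealing team
    makes a layup or dunk less than `window` seconds after the steal.
    """
    easy, other = [], []
    for i, ev in enumerate(events):
        if ev.kind != "steal":
            continue
        scored = False
        for nxt in events[i + 1:]:
            if nxt.clock - ev.clock >= window:
                break  # events are time-ordered, so nothing later can qualify
            if (nxt.team == ev.team and nxt.kind == "made_shot"
                    and nxt.shot_type in EASY_SHOTS):
                scored = True
                break
        (easy if scored else other).append(ev)
    return easy, other


# Toy events, not real PBP data
pbp = [
    Event(100.0, "BOS", "steal"),
    Event(103.0, "BOS", "made_shot", "layup"),
    Event(200.0, "PHX", "steal"),
    Event(212.0, "PHX", "made_shot", "dunk"),
]
easy, other = split_steals(pbp)
print(len(easy), len(other))
```

The 5-second window is a parameter, so it is easy to check how sensitive the split is to that cutoff.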
DSMok1
Posts: 1119
Joined: Thu Apr 14, 2011 11:18 pm
Location: Maine
Contact:

Re: Ideas for data to extract from PBP

Post by DSMok1 »

Information on shot clock remaining would be useful as well, if possible.

It certainly would be nice to get assist locations figured out; I did some research a while back indicating that some assists were far more valuable than others, to a large extent explaining Steve Nash's incredible offensive value shown by RAPM.
Developer of Box Plus/Minus
APBRmetrics Forum Administrator
Twitter.com/DSMok1
J.E.
Posts: 852
Joined: Fri Apr 15, 2011 8:28 am

Re: Ideas for data to extract from PBP

Post by J.E. »

DSMok1 wrote:Information on shot clock remaining would be useful as well, if possible.
Yeah. Good idea
It certainly would be nice to get assist locations figured out;
You mean location of the passer or the shooter? bbr doesn't have location for either, but it has distance of the shooter
For shotblocking, distance of the shot that was blocked might be useful
DSMok1
Posts: 1119
Joined: Thu Apr 14, 2011 11:18 pm
Location: Maine
Contact:

Re: Ideas for data to extract from PBP

Post by DSMok1 »

No, I meant locations assisted to, which is what you had already mentioned. A pretty important piece of data. Assists leading to dunks are very valuable, assists leading to 20-footers not so much so.

Shot blocking--some sort of information on who the shooter was would be interesting. Is Serge Ibaka blocking Dirk (his man) or is he blocking a PG from the weakside? Trying to identify whether blocks are on or off ball would be interesting.
J.E.
Posts: 852
Joined: Fri Apr 15, 2011 8:28 am

Re: Ideas for data to extract from PBP

Post by J.E. »

DSMok1 wrote:Shot blocking--some sort of information on who the shooter was would be interesting. Is Serge Ibaka blocking Dirk (his man) or is he blocking a PG from the weakside? Trying to identify whether blocks are on or off ball would be interesting.
Unfortunately I can't tell for sure who is defending who(m?). I don't think I can reliably tell from the PBP whether it was On/Off-Ball either. Even when a C blocks a G it might have been On-Ball because of a P&R->Switch situation. Might be one of those situations where you need optical tracking data
DSMok1
Posts: 1119
Joined: Thu Apr 14, 2011 11:18 pm
Location: Maine
Contact:

Re: Ideas for data to extract from PBP

Post by DSMok1 »

J.E. wrote:
DSMok1 wrote:Shot blocking--some sort of information on who the shooter was would be interesting. Is Serge Ibaka blocking Dirk (his man) or is he blocking a PG from the weakside? Trying to identify whether blocks are on or off ball would be interesting.
Unfortunately I can't tell for sure who is defending who(m?). I don't think I can reliably tell from the PBP whether it was On/Off-Ball either. Even when a C blocks a G it might have been On-Ball because of a P&R->Switch situation. Might be one of those situations where you need optical tracking data
Still, some sort of ID/size of the blockee for each block would be interesting.
kpascual
Posts: 50
Joined: Thu Mar 01, 2012 7:02 pm

Re: Ideas for data to extract from PBP

Post by kpascual »

How about 5 man units? I remember Ryan Parker doing this a few years ago, and it's probably how basketballvalue does it, too. I think I tried doing this over a weekend way back when (https://github.com/kpascual/nbascrape/b ... fiveman.py), but couldn't quite get it to work for all cases.
J.E.
Posts: 852
Joined: Fri Apr 15, 2011 8:28 am

Re: Ideas for data to extract from PBP

Post by J.E. »

kpascual wrote:How about 5 man units?
You mean something like this http://stats-for-the-nba.appspot.com/PB ... br_ids.rar ?
See viewtopic.php?f=2&t=8033
Crow
Posts: 10536
Joined: Thu Apr 14, 2011 11:10 pm

Re: Ideas for data to extract from PBP

Post by Crow »

The dividing line between close range and the shortest mid-range jumper is pretty important. Should the line be set at 3 feet, 4 or 5? I am not 100% fixed on any one of these numbers. Hoopdata uses 3 feet and 5 bins overall. I am fine with that approach but might tinker with it slightly, and if you use fewer than 5 bins you will have to. A 4 foot shot seems more like an under 3 foot shot than the rest of the 3-9 foot category to me. The 5 foot shot is a tougher call.

Having a different data structure might be worthwhile. One alternative approach (just throwing it out there, not saying it is necessarily the best) would be to use: at the rim as 0-4, short jumper as 5-10, longer mid-range jumper as 11-17, then 18-23. Comparison of the two data structures could possibly give even more fine detail depending on where the dividing lines are and if any are shared.
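
For what it's worth, here is a minimal sketch of a binner with configurable dividing lines, so the 3 vs 4 vs 5 foot question can be tested instead of argued. The function names and the "long 2" label are my own placeholders (the 18-23 bin has no name in the post above), and distances are assumed to be whole feet with threes flagged separately.

```python
from bisect import bisect_left


def make_binner(upper_bounds, labels):
    """Build a function mapping shot distance (ft) to a bin label.

    upper_bounds[i] is the largest distance, inclusive, that falls in
    labels[i]. Two-pointers beyond the last bound go into the last bin.
    """
    if len(upper_bounds) != len(labels):
        raise ValueError("need one label per upper bound")

    def bin_shot(distance_ft, is_three=False):
        if is_three:
            return "3pt"
        idx = bisect_left(upper_bounds, distance_ft)
        return labels[min(idx, len(labels) - 1)]

    return bin_shot


# The alternative structure from the post: 0-4, 5-10, 11-17, 18-23
alt_bins = make_binner(
    [4, 10, 17, 23],
    ["at rim", "short jumper", "longer mid-range", "long 2"],
)

# Same labels, but with the first dividing line moved to under 3 feet
rim_under_3 = make_binner(
    [2, 10, 17, 23],
    ["at rim", "short jumper", "longer mid-range", "long 2"],
)

for d in (2, 4, 5):
    print(d, alt_bins(d), "|", rim_under_3(d))
```

Running the same shots through two binners shows exactly which distances get reclassified when a dividing line moves, which is the comparison of data structures suggested above.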
EvanZ
Posts: 912
Joined: Thu Apr 14, 2011 10:41 pm
Location: The City
Contact:

Re: Ideas for data to extract from PBP

Post by EvanZ »

DSMok1 wrote:
J.E. wrote:
DSMok1 wrote:Shot blocking--some sort of information on who the shooter was would be interesting. Is Serge Ibaka blocking Dirk (his man) or is he blocking a PG from the weakside? Trying to identify whether blocks are on or off ball would be interesting.
Unfortunately I can't tell for sure who is defending who(m?). I don't think I can reliably tell from the PBP whether it was On/Off-Ball either. Even when a C blocks a G it might have been On-Ball because of a P&R->Switch situation. Might be one of those situations where you need optical tracking data
Still, some sort of ID/size of the blockee for each block would be interesting.
I'm working on this too with the ESPN shot data.
J.E.
Posts: 852
Joined: Fri Apr 15, 2011 8:28 am

Re: Ideas for data to extract from PBP

Post by J.E. »

Here's an assist profile for Rondo and Nash since '08. You need to subtract 1 from the X-Axis description to get the actual distance. "1" in the chart is "at rim", etc. If you combine "at rim" and "1 ft"("2" in the chart), I'd say they look almost the same. Makes me wonder how much I'd gain from doing assist splitting into shot distances
DSMok1
Posts: 1119
Joined: Thu Apr 14, 2011 11:18 pm
Location: Maine
Contact:

Re: Ideas for data to extract from PBP

Post by DSMok1 »

J.E. wrote:Here's an assist profile for Rondo and Nash since '08. You need to subtract 1 from the X-Axis description to get the actual distance. "1" in the chart is "at rim", etc. If you combine "at rim" and "1 ft"("2" in the chart), I'd say they look almost the same. Makes me wonder how much I'd gain from doing assist splitting into shot distances
They look way different to me! It's just you aren't using an ideal presentation type for comparison. I'd love to see them overlaid (line chart) with each one totaling to 100%. Rondo has way more mid-range assists.
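
Something like the following would put both players on the same scale, with each profile totaling 100% before overlaying. This is only a sketch: the function names are mine, and the counts are placeholder numbers, not the actual Rondo/Nash data.

```python
def to_percent(counts):
    """Convert {shot_distance: assist_count} to {shot_distance: percent of total}."""
    total = sum(counts.values())
    if total == 0:
        return {d: 0.0 for d in counts}
    return {d: 100.0 * n / total for d, n in counts.items()}


def overlay(profile_a, profile_b):
    """Return rows of (distance, pct_a, pct_b) over every distance seen in either profile."""
    pct_a, pct_b = to_percent(profile_a), to_percent(profile_b)
    distances = sorted(set(pct_a) | set(pct_b))
    return [(d, pct_a.get(d, 0.0), pct_b.get(d, 0.0)) for d in distances]


# Placeholder counts keyed by shot distance in feet (NOT real data)
player_a = {0: 50, 1: 30, 15: 20}
player_b = {0: 120, 1: 40, 15: 20, 24: 20}

for distance, a, b in overlay(player_a, player_b):
    print(f"{distance:>2} ft  A: {a:5.1f}%  B: {b:5.1f}%")
```

The rows can be fed straight into a line chart, one line per player, so differences in the mid-range share are visible regardless of total assist volume.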
J.E.
Posts: 852
Joined: Fri Apr 15, 2011 8:28 am

Re: Ideas for data to extract from PBP

Post by J.E. »

DSMok1 wrote:They look way different to me! It's just you aren't using an ideal presentation type for comparison. I'd love to see them overlaid (line chart) with each one totaling to 100%. Rondo has way more mid-range assists.
Click. I'd say Rondo's higher % of midrange assists can easily be explained by who he's playing with (Pierce, mostly)
DSMok1
Posts: 1119
Joined: Thu Apr 14, 2011 11:18 pm
Location: Maine
Contact:

Re: Ideas for data to extract from PBP

Post by DSMok1 »

Interesting. Pierce and (particularly) Garnett are both exceptionally good midrange shooters.

I think it's interesting data, but I'm not sure what to make of it!
schtevie
Posts: 377
Joined: Thu Apr 14, 2011 11:24 pm

Re: Ideas for data to extract from PBP

Post by schtevie »

Cool graph! (But I think you're talking about KG. In fact, it seems like it's all KG. Paul Pierce mostly helped himself.)