Ideas for data to extract from PBP
Posted: Tue Oct 23, 2012 11:41 am
by J.E.
I'm in the process of extracting player-specific data from PBP data and need some ideas on what else to extract. So far I plan to record the following actions:
Made/Missed Shot from distance X, with 3-5 bins for distance, maybe: a) dunk/layup, b) close range, c) mid-range jumper, d) 3-pointer. Made shots split into assisted and unassisted
Assists. Split by shot distance and by whether the shot was an And1
Offensive/Defensive Rebounds. Maybe split into 2-3 different types depending on where the shot was coming from (FT, close range, etc.)?
Team Offensive/Defensive Rebounds
Steals. Maybe separate out those that led to fast/easy buckets (look for layups/dunks <5 secs after steals)
Charges drawn
Turnovers. offensive 3sec, illegal screen, dribble TO, bad pass, offensive foul, what else?
Team turnovers: 24sec
Blocks, 2 types: a) defending team got the ball b) offense gets the ball back
Defensive goaltends
Defensive fouls: shooting, non-shooting (further split into: loose ball, defensive 3sec, standard personal, flagrant?)
Fouls drawn: shooting, non-shooting
And1s: split into different shot distances?
FTs, remove And1 FTs from the total number of FTs
If you think something important is missing from this list, tell me. Whatever it is, the information needs to be present in bbr PBP files
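To make the steals item concrete, here is a rough sketch of the "layup/dunk <5 secs after a steal" check. The `Event` record and its field names are made up for illustration; they are not bbr's actual PBP format. It assumes events are already parsed and in chronological order.

```python
from dataclasses import dataclass

@dataclass
class Event:
    # Hypothetical parsed PBP row; field names are assumptions, not bbr's schema.
    period: int
    secs_left: float  # seconds remaining in the period
    team: str
    player: str
    kind: str         # e.g. "steal", "made_layup", "made_dunk"

EASY_MAKES = {"made_layup", "made_dunk"}

def steals_leading_to_easy_buckets(events, window=5.0):
    """Return steals followed by a made layup/dunk by the stealing team
    less than `window` seconds later in the same period."""
    hits = []
    for i, ev in enumerate(events):
        if ev.kind != "steal":
            continue
        for nxt in events[i + 1:]:
            # Stop scanning once we leave the period or the time window.
            if nxt.period != ev.period or ev.secs_left - nxt.secs_left >= window:
                break
            if nxt.team == ev.team and nxt.kind in EASY_MAKES:
                hits.append(ev)
                break
    return hits
```

The same scan-forward pattern should work for the two block types (who gets the ball next) and for tagging And1 FTs, with a different stop condition.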
Re: Ideas for data to extract from PBP
Posted: Tue Oct 23, 2012 12:27 pm
by DSMok1
Information on shot clock remaining would be useful as well, if possible.
It certainly would be nice to get assist locations figured out; I did some research a while back indicating that some assists were far more valuable than others, to a large extent explaining Steve Nash's incredible offensive value shown by RAPM.
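On the shot clock point, a minimal sketch of how it might be estimated from game-clock timestamps alone. This assumes the start of each possession has already been identified, and it ignores partial resets (fouls, kicked balls, etc.), so treat the result as an approximation. The function and parameter names are made up.

```python
def shot_clock_remaining(possession_start_secs, shot_secs, full_clock=24.0):
    """Estimate the shot clock (in seconds) left when a shot goes up.

    Both arguments are seconds remaining in the period. Returns None when
    the possession began with less than a full shot clock left in the
    period, since the shot clock is not the binding constraint then.
    """
    if possession_start_secs < full_clock:
        return None
    elapsed = possession_start_secs - shot_secs
    # Clamp at zero: timestamp rounding can make a possession look too long.
    return max(0.0, full_clock - elapsed)
```

For example, a possession starting at 10:00 with a shot at 9:50 would be `shot_clock_remaining(600.0, 590.0)`, i.e. 14 seconds left.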
Re: Ideas for data to extract from PBP
Posted: Tue Oct 23, 2012 12:46 pm
by J.E.
DSMok1 wrote:Information on shot clock remaining would be useful as well, if possible.
Yeah. Good idea
It certainly would be nice to get assist locations figured out;
You mean location of the passer or the shooter? bbr doesn't have location for either, but it has the distance of the shooter
For shotblocking, distance of the shot that was blocked might be useful
Re: Ideas for data to extract from PBP
Posted: Tue Oct 23, 2012 12:51 pm
by DSMok1
No, I meant locations assisted to, which is what you had already mentioned. A pretty important piece of data. Assists leading to dunks are very valuable, assists leading to 20-footers not so much so.
Shot blocking--some sort of information on who the shooter was would be interesting. Is Serge Ibaka blocking Dirk (his man) or is he blocking a PG from the weakside? Trying to identify whether blocks are on or off ball would be interesting.
Re: Ideas for data to extract from PBP
Posted: Tue Oct 23, 2012 1:05 pm
by J.E.
DSMok1 wrote:Shot blocking--some sort of information on who the shooter was would be interesting. Is Serge Ibaka blocking Dirk (his man) or is he blocking a PG from the weakside? Trying to identify whether blocks are on or off ball would be interesting.
Unfortunately I can't tell for sure who is defending who(m?). I don't think I can reliably tell from the PBP whether it was On/Off-Ball either. Even when a C blocks a G it might have been On-Ball because of a P&R->Switch situation. Might be one of those situations where you need optical tracking data
Re: Ideas for data to extract from PBP
Posted: Tue Oct 23, 2012 1:37 pm
by DSMok1
J.E. wrote:DSMok1 wrote:Shot blocking--some sort of information on who the shooter was would be interesting. Is Serge Ibaka blocking Dirk (his man) or is he blocking a PG from the weakside? Trying to identify whether blocks are on or off ball would be interesting.
Unfortunately I can't tell for sure who is defending who(m?). I don't think I can reliably tell from the PBP whether it was On/Off-Ball either. Even when a C blocks a G it might have been On-Ball because of a P&R->Switch situation. Might be one of those situations where you need optical tracking data
Still, some sort of ID/size of the blockee for each block would be interesting.
Re: Ideas for data to extract from PBP
Posted: Tue Oct 23, 2012 2:04 pm
by kpascual
How about 5 man units? I remember Ryan Parker doing this a few years ago, and it's probably how basketballvalue does it, too. I think I tried doing this over a weekend way back when (https://github.com/kpascual/nbascrape/b ... fiveman.py), but couldn't quite get it to work for all cases.
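For reference, the basic idea behind 5 man units can be sketched as below. This is not the code from the linked file, just a minimal illustration: it takes the five players on the floor at the start of a period as given and updates the set on each substitution. The event dict keys (`type`, `in`, `out`) are assumptions.

```python
def track_lineups(starters, events):
    """Attach the current five-man unit to every non-substitution event.

    `starters` is the five players on court when the period begins;
    `events` is a chronological list of dicts with hypothetical keys.
    """
    on_court = set(starters)
    tagged = []
    for ev in events:
        if ev["type"] == "sub":
            on_court.discard(ev["out"])
            on_court.add(ev["in"])
        else:
            # frozenset so the unit can be used as a dict key later on
            tagged.append((ev, frozenset(on_court)))
    return tagged
```

Knowing who starts each period is the part this sketch leaves out; it has to come from somewhere else or be inferred.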
Re: Ideas for data to extract from PBP
Posted: Tue Oct 23, 2012 2:56 pm
by J.E.
Re: Ideas for data to extract from PBP
Posted: Tue Oct 23, 2012 6:55 pm
by Crow
The dividing line between close range and the shortest mid-range jumper is pretty important. Should the line be set at 3 feet, 4 or 5? I am not 100% fixed on any one of these numbers. Hoopdata uses 3 feet and 5 bins overall. I am fine with that approach but might tinker with it slightly, and if you use fewer than 5 bins you will have to. A 4 foot shot seems more like an under 3 foot shot than the rest of the 3-9 foot category to me. The 5 foot shot is a tougher call.
Having a different data structure might be worthwhile. One alternative approach (just throwing it out there, not saying it is necessarily the best) would be to use: at the rim as 0-4, short jumper as 5-10, longer mid-range jumper as 11-17, then 18-23. Comparing the two data structures could possibly give even finer detail, depending on where the dividing lines are and whether any are shared.
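A quick sketch of how the alternative binning could be coded so the dividing lines are easy to move around. It assumes the shot distance is available in whole feet and that 3-pointers are flagged separately; the function and constant names are made up.

```python
from bisect import bisect_left

# Inclusive upper edge (in feet) of each two-point bin in the alternative scheme.
ALT_UPPER = [4, 10, 17, 23]
ALT_LABELS = ["0-4", "5-10", "11-17", "18-23"]

def bin_shot(distance_ft, is_three, upper=ALT_UPPER, labels=ALT_LABELS):
    """Map a shot to a distance bin; 3-pointers always get their own bin."""
    if is_three:
        return "3-pointer"
    idx = bisect_left(upper, distance_ft)
    # Put anything past the last edge into the last two-point bin.
    idx = min(idx, len(labels) - 1)
    return labels[idx]
```

Moving the first cutoff from 4 feet to 3 or 5 is then just a matter of passing different `upper` and `labels` lists, which makes it cheap to compare schemes on the same shots.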
Re: Ideas for data to extract from PBP
Posted: Wed Oct 24, 2012 5:41 am
by EvanZ
DSMok1 wrote:J.E. wrote:DSMok1 wrote:Shot blocking--some sort of information on who the shooter was would be interesting. Is Serge Ibaka blocking Dirk (his man) or is he blocking a PG from the weakside? Trying to identify whether blocks are on or off ball would be interesting.
Unfortunately I can't tell for sure who is defending who(m?). I don't think I can reliably tell from the PBP whether it was On/Off-Ball either. Even when a C blocks a G it might have been On-Ball because of a P&R->Switch situation. Might be one of those situations where you need optical tracking data
Still, some sort of ID/size of the blockee for each block would be interesting.
I'm working on this too with the ESPN shot data.
Re: Ideas for data to extract from PBP
Posted: Wed Oct 24, 2012 2:27 pm
by J.E.
Here's an assist profile for Rondo and Nash since '08. You need to subtract 1 from the X-Axis description to get the actual distance. "1" in the chart is "at rim", etc. If you combine "at rim" and "1 ft"("2" in the chart), I'd say they look almost the same. Makes me wonder how much I'd gain from doing assist splitting into shot distances
Re: Ideas for data to extract from PBP
Posted: Wed Oct 24, 2012 2:32 pm
by DSMok1
J.E. wrote:Here's an assist profile for Rondo and Nash since '08. You need to subtract 1 from the X-Axis description to get the actual distance. "1" in the chart is "at rim", etc. If you combine "at rim" and "1 ft"("2" in the chart), I'd say they look almost the same. Makes me wonder how much I'd gain from doing assist splitting into shot distances
They look way different to me! It's just you aren't using an ideal presentation type for comparison. I'd love to see them overlaid (line chart) with each one totaling to 100%. Rondo has way more mid-range assists.
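The "each one totaling to 100%" part could be done along these lines before plotting. This is only a sketch: the input is assumed to be a dict of assist counts keyed by shot distance in feet, and the function name is made up.

```python
def assist_share_by_distance(counts):
    """Convert assist counts per shot distance into percentages summing to 100."""
    total = sum(counts.values())
    if total == 0:
        return {d: 0.0 for d in counts}
    # Sort by distance so the result can be plotted directly as a line.
    return {d: 100.0 * n / total for d, n in sorted(counts.items())}
```

With both players on the same percentage scale, differences in total assist volume drop out and only the shape of the distribution is compared.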
Re: Ideas for data to extract from PBP
Posted: Wed Oct 24, 2012 2:40 pm
by J.E.
DSMok1 wrote:They look way different to me! It's just you aren't using an ideal presentation type for comparison. I'd love to see them overlaid (line chart) with each one totaling to 100%. Rondo has way more mid-range assists.
Click. I'd say Rondo's higher % of midrange assists can easily be explained by who he's playing with (Pierce, mostly)
Re: Ideas for data to extract from PBP
Posted: Wed Oct 24, 2012 3:30 pm
by DSMok1
Interesting. Pierce and (particularly) Garnett are both exceptionally good midrange shooters.
I think it's interesting data, but I'm not sure what to make of it!
Re: Ideas for data to extract from PBP
Posted: Wed Oct 24, 2012 3:32 pm
by schtevie
Cool graph! (But I think you're talking about KG. In fact, it seems like it's all KG. Paul Pierce mostly helped himself.)