Ideas for data to extract from PBP
I'm in the process of extracting player specific data from PBP data and need some ideas on what else to extract. So far I plan to record the following actions:
Made/Missed Shot from distance X, with 3-5 bins for distance, maybe: a) dunk/layup, b) close range, c) mid-range jumper, d) 3-pointer. For made shots, split into assisted and unassisted
Assists. Split by shot distance and by whether the shot was an And1
Offensive/Defensive Rebounds. Maybe split into 2-3 different types depending on where the shot was coming from (FT, close range, etc.)?
Team Offensive/Defensive Rebounds
Steals. Maybe separate those that led to fast/easy buckets (look for layups/dunks <5 secs after steals)
Charges drawn
Turnovers. offensive 3sec, illegal screen, dribble TO, bad pass, offensive foul, what else?
Team turnovers: 24sec
Blocks, 2 types: a) defending team got the ball b) offense gets the ball back
Defensive goaltends
Defensive fouls: shooting, non-shooting (further split into: loose ball, defensive 3sec, standard personal, flagrant?)
Fouls drawn: shooting, non-shooting
And1s: split into different shot distances?
FTs, remove And1 FTs from the total number of FTs
If you think something important is missing from this list, tell me. Whatever it is, the information needs to be present in bbr PBP files
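One of the items above, flagging steals that lead to a quick layup or dunk, could be sketched roughly as follows. This is only a minimal sketch: the `Event` record and its field names are hypothetical and not the bbr PBP format, and events are assumed to be already parsed and sorted by game time.

```python
from dataclasses import dataclass


@dataclass
class Event:
    # Hypothetical parsed PBP record; field names are assumptions.
    t: float             # seconds elapsed in the game
    team: str            # team credited with the event
    kind: str            # e.g. "steal", "made_shot", "missed_shot"
    shot_type: str = ""  # e.g. "layup", "dunk", "jumper"


def steals_leading_to_easy_buckets(events, window=5.0):
    """Return indices of steals followed, less than `window` seconds later,
    by a made layup or dunk from the stealing team.

    Assumes `events` is sorted by `t`.
    """
    hits = []
    for i, ev in enumerate(events):
        if ev.kind != "steal":
            continue
        for nxt in events[i + 1:]:
            if nxt.t - ev.t >= window:
                break  # outside the "<5 secs" window
            if (nxt.team == ev.team and nxt.kind == "made_shot"
                    and nxt.shot_type in ("layup", "dunk")):
                hits.append(i)
                break
    return hits
```

The 5 second window is the cutoff suggested in the list; it is a parameter so other cutoffs are easy to try.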
Re: Ideas for data to extract from PBP
Information on shot clock remaining would be useful as well, if possible.
It certainly would be nice to get assist locations figured out; I did some research a while back indicating that some assists were far more valuable than others, to a large extent explaining Steve Nash's incredible offensive value shown by RAPM.
Re: Ideas for data to extract from PBP
DSMok1 wrote: Information on shot clock remaining would be useful as well, if possible.
Yeah. Good idea.
DSMok1 wrote: It certainly would be nice to get assist locations figured out;
You mean location of the passer or the shooter? bbr doesn't have location for either, but it has distance of the shooter.
For shotblocking, distance of the shot that was blocked might be useful
Re: Ideas for data to extract from PBP
No, I meant locations assisted to, which is what you had already mentioned. A pretty important piece of data. Assists leading to dunks are very valuable, assists leading to 20-footers not so much so.
Shot blocking--some sort of information on who the shooter was would be interesting. Is Serge Ibaka blocking Dirk (his man) or is he blocking a PG from the weakside? Trying to identify whether blocks are on or off ball would be interesting.
Re: Ideas for data to extract from PBP
DSMok1 wrote: Shot blocking--some sort of information on who the shooter was would be interesting. Is Serge Ibaka blocking Dirk (his man) or is he blocking a PG from the weakside? Trying to identify whether blocks are on or off ball would be interesting.
Unfortunately I can't tell for sure who is defending who(m?). I don't think I can reliably tell from the PBP whether it was On/Off-Ball either. Even when a C blocks a G it might have been On-Ball because of a P&R->Switch situation. Might be one of those situations where you need optical tracking data.
Re: Ideas for data to extract from PBP
DSMok1 wrote: Shot blocking--some sort of information on who the shooter was would be interesting. Is Serge Ibaka blocking Dirk (his man) or is he blocking a PG from the weakside? Trying to identify whether blocks are on or off ball would be interesting.
J.E. wrote: Unfortunately I can't tell for sure who is defending who(m?). I don't think I can reliably tell from the PBP whether it was On/Off-Ball either. Even when a C blocks a G it might have been On-Ball because of a P&R->Switch situation. Might be one of those situations where you need optical tracking data.
Still, some sort of ID/size of the blockee for each block would be interesting.
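Pulling the blockee and the distance of the blocked shot out of a PBP line might look something like the sketch below. The line shape it matches ("<shooter> misses 2-pt <shot> from N ft (block by <blocker>)") is an assumption about how the text is worded, not something confirmed in this thread, and the player names are placeholders.

```python
import re

# Assumed wording of a blocked-shot line; adjust to the actual PBP text.
BLOCK_RE = re.compile(
    r"^(?P<shooter>.+?) misses (?P<pts>[23])-pt (?P<shot>.+?)"
    r"(?: from (?P<dist>\d+) ft| (?P<rim>at rim))?"
    r" \(block by (?P<blocker>.+?)\)$"
)


def parse_block(line):
    """Return a dict describing a blocked shot, or None if the line
    does not match the assumed format."""
    m = BLOCK_RE.match(line.strip())
    if m is None:
        return None
    if m.group("rim"):
        distance = 0
    elif m.group("dist"):
        distance = int(m.group("dist"))
    else:
        distance = None  # no distance given in the text
    return {
        "blocker": m.group("blocker"),
        "blockee": m.group("shooter"),
        "shot": m.group("shot"),
        "distance_ft": distance,
    }
```

With blocker and blockee IDs in hand, size or position of the blockee would have to come from a separate player table.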
Re: Ideas for data to extract from PBP
How about 5 man units? I remember Ryan Parker doing this a few years ago, and it's probably how basketballvalue does it, too. I think I tried doing this over a weekend way back when (https://github.com/kpascual/nbascrape/b ... fiveman.py), but couldn't quite get it to work for all cases.
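For what it's worth, the basic bookkeeping for 5-man units can be sketched as below. This is not the code from the linked repository, just a minimal illustration that assumes the starters are known and every substitution is logged as a hypothetical `(time, player_out, player_in)` tuple. Quarter starts, where the lineup may change without a logged substitution, are not handled and are probably among the cases that cause trouble.

```python
def lineup_stints(starters, subs, end_time):
    """Split a game into stints with a constant five-man unit.

    starters: iterable of 5 player ids
    subs:     list of (time, player_out, player_in), sorted by time
    Returns a list of (start, end, frozenset_of_players).
    """
    on_court = set(starters)
    stints = []
    start = 0.0
    for t, p_out, p_in in subs:
        if p_out not in on_court:
            # The log and our state disagree; better to fail loudly.
            raise ValueError(f"{p_out} is not on the court at t={t}")
        if t > start:
            # Close the current stint before applying the change.
            stints.append((start, t, frozenset(on_court)))
            start = t
        on_court.remove(p_out)
        on_court.add(p_in)
    if end_time > start:
        stints.append((start, end_time, frozenset(on_court)))
    return stints
```

Several substitutions at the same timestamp are folded into one lineup change, so no zero-length stints are produced.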
Re: Ideas for data to extract from PBP
kpascual wrote: How about 5 man units?
You mean something like this http://stats-for-the-nba.appspot.com/PB ... br_ids.rar ?
See viewtopic.php?f=2&t=8033
Re: Ideas for data to extract from PBP
The dividing line between close range and the shortest mid-range jumper is pretty important. Should the line be set at 3 feet, 4 or 5? I am not 100% fixed on any one of these numbers. Hoopdata uses 3 feet and 5 bins overall. I am fine with that approach but might tinker with it slightly, and if you use fewer than 5 bins you will have to. A 4 foot shot seems more like an under 3 foot shot than the rest of the 3-9 foot category to me. The 5 foot shot is a tougher call.
Having a different data structure might be worthwhile. One alternative approach (just throwing it out there, not saying it is necessarily the best) would be to use: at the rim as 0-4, short jumper as 5-10, longer mid-range jumper as 11-17, then 18-23. Comparison of the two data structures could possibly give even more fine detail depending on where the dividing lines are and if any are shared.
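To make the two binning schemes easy to compare, the cut points can be kept as data rather than hard-coded. A rough sketch follows: the alternative edges are the ones proposed above, while the 10-15 and 16-23 cut points in the Hoopdata-like scheme are an assumption filled in for illustration. Three-pointers are passed in as a flag, since distance alone does not identify them.

```python
from bisect import bisect_right

# Lower edge (in feet) of each two-point bin.
ALT_EDGES = [0, 5, 11, 18]
ALT_LABELS = ["at rim (0-4)", "short jumper (5-10)",
              "mid-range (11-17)", "long two (18-23)"]

# Assumed Hoopdata-like scheme; only the 3 ft line and the 3-9 ft
# category are mentioned in the thread.
HOOPDATA_LIKE_EDGES = [0, 3, 10, 16]
HOOPDATA_LIKE_LABELS = ["at rim (<3)", "3-9 ft", "10-15 ft", "16-23 ft"]


def shot_bin(distance_ft, is_three, edges, labels):
    """Return the bin label for a shot.

    Twos longer than the last edge fall into the last two-point bin.
    """
    if is_three:
        return "3-pointer"
    return labels[bisect_right(edges, distance_ft) - 1]
```

Binning the same shots under both schemes shows where they disagree, e.g. a 4 foot shot is "at rim" in one and "3-9 ft" in the other.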
Re: Ideas for data to extract from PBP
DSMok1 wrote: Shot blocking--some sort of information on who the shooter was would be interesting. Is Serge Ibaka blocking Dirk (his man) or is he blocking a PG from the weakside? Trying to identify whether blocks are on or off ball would be interesting.
J.E. wrote: Unfortunately I can't tell for sure who is defending who(m?). I don't think I can reliably tell from the PBP whether it was On/Off-Ball either. Even when a C blocks a G it might have been On-Ball because of a P&R->Switch situation. Might be one of those situations where you need optical tracking data.
DSMok1 wrote: Still, some sort of ID/size of the blockee for each block would be interesting.
I'm working on this too with the ESPN shot data.
Re: Ideas for data to extract from PBP
Here's an assist profile for Rondo and Nash since '08. You need to subtract 1 from the X-Axis description to get the actual distance. "1" in the chart is "at rim", etc. If you combine "at rim" and "1 ft"("2" in the chart), I'd say they look almost the same. Makes me wonder how much I'd gain from doing assist splitting into shot distances
Re: Ideas for data to extract from PBP
J.E. wrote: Here's an assist profile for Rondo and Nash since '08. You need to subtract 1 from the X-Axis description to get the actual distance. "1" in the chart is "at rim", etc. If you combine "at rim" and "1 ft"("2" in the chart), I'd say they look almost the same. Makes me wonder how much I'd gain from doing assist splitting into shot distances
They look way different to me! It's just you aren't using an ideal presentation type for comparison. I'd love to see them overlaid (line chart) with each one totaling to 100%. Rondo has way more mid-range assists.
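Rescaling each player's assist counts so they total 100% is a small step before overlaying them. A minimal sketch, with made-up counts for two placeholder players (these are not the real Rondo or Nash numbers):

```python
def to_percent_profile(counts):
    """Convert {distance_bin: assist_count} into percentages summing to 100."""
    total = sum(counts.values())
    if total == 0:
        return {k: 0.0 for k in counts}
    return {k: 100.0 * v / total for k, v in counts.items()}


# Made-up illustration data, NOT actual player numbers.
player_a = {"at rim": 50, "mid-range": 30, "three": 20}
player_b = {"at rim": 120, "mid-range": 30, "three": 50}

profile_a = to_percent_profile(player_a)
profile_b = to_percent_profile(player_b)

# Per-bin difference in percentage points (positive: A relies on it more).
diff = {k: profile_a[k] - profile_b[k] for k in profile_a}
```

The two profiles can then go on the same line chart, since raw assist totals no longer dominate the comparison.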
Re: Ideas for data to extract from PBP
DSMok1 wrote: They look way different to me! It's just you aren't using an ideal presentation type for comparison. I'd love to see them overlaid (line chart) with each one totaling to 100%. Rondo has way more mid-range assists.
Click. I'd say Rondo's higher % of midrange assists can easily be explained by who he's playing with (Pierce, mostly)
Re: Ideas for data to extract from PBP
Interesting. Pierce and (particularly) Garnett are both exceptionally good midrange shooters.
I think it's interesting data, but I'm not sure what to make of it!
Re: Ideas for data to extract from PBP
Cool graph! (But I think you're talking about KG. In fact, it seems like it's all KG. Paul Pierce mostly helped himself.)