Data of players on court

Posted: Mon Apr 29, 2013 2:49 pm
by liurical
Hi all

First time posting here! I've been writing code to scrape and organize NBA PBP data and have run into a problem I'm not sure how to solve: how do I determine which 10 players are on the court over the course of a game? I can get the starting lineup from the box score, but it's difficult to determine which players the coaches put on the floor at the beginning of each quarter. For those who have solved this, did you just deduce them from the player names appearing in subsequent plays and substitutions? Or is there a cleaner, more efficient way?

Any advice would be greatly appreciated!

Re: Data of players on court

Posted: Mon Apr 29, 2013 4:17 pm
by DSMok1
First of all, use the substitutions and track lineups from each substitution both forward and backward. If a player was never substituted for in a quarter, check the PBP for a player appearing there who isn't already tracked. Finally, if a player both played the entire quarter and never showed up in the PBP for that quarter, check the game's box score manually for a player whose tracked minutes come up 12 short of his box-score minutes.
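A minimal Python sketch of the first two steps, under assumptions that are not in the thread: the PBP has already been parsed into chronological rows for one team in one quarter, using a hypothetical `Event` schema (`kind`, `player`, `player_in`, `player_out`). The idea is that a player whose first appearance in the quarter is anything other than checking in (checking out, or being credited with a play) must have been on the floor when the quarter began; replaying the substitutions forward from that set then gives the lineup after every event. Real data needs extra filtering, since a play credited to a bench player (a technical foul, for example) would be misread as an on-court appearance.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    """One PBP row for a single team (hypothetical schema)."""
    kind: str                         # "sub" or "play"
    player: Optional[str] = None      # "play" rows: player credited with the play
    player_in: Optional[str] = None   # "sub" rows: player checking in
    player_out: Optional[str] = None  # "sub" rows: player checking out


def quarter_starters(events: list[Event]) -> set[str]:
    """Work backward from substitutions and plays to the players on court
    at the start of the quarter."""
    seen: set[str] = set()
    starters: set[str] = set()
    for ev in events:
        if ev.kind == "sub":
            # Checking out before we ever saw them check in: started the quarter.
            if ev.player_out and ev.player_out not in seen:
                starters.add(ev.player_out)
            seen.update(p for p in (ev.player_out, ev.player_in) if p)
        elif ev.kind == "play" and ev.player:
            # Credited with a play before any substitution: already on the floor.
            if ev.player not in seen:
                starters.add(ev.player)
            seen.add(ev.player)
    return starters


def lineups(starters: set[str], events: list[Event]) -> list[frozenset[str]]:
    """Replay substitutions forward; returns the lineup after each event."""
    on_court = set(starters)
    result = []
    for ev in events:
        if ev.kind == "sub":
            if ev.player_out:
                on_court.discard(ev.player_out)
            if ev.player_in:
                on_court.add(ev.player_in)
        result.append(frozenset(on_court))
    return result


# Toy quarter: a fifth player plays throughout but never shows up in the PBP.
events = [
    Event("play", player="A"),
    Event("sub", player_in="F", player_out="B"),
    Event("play", player="F"),
    Event("play", player="C"),
    Event("sub", player_in="B", player_out="F"),
    Event("play", player="D"),
]
starters = quarter_starters(events)
print(sorted(starters))  # ['A', 'B', 'C', 'D']
```

Only four starters come out of the toy quarter, which is exactly the situation where the box-score check in the last step is needed.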

As far as I know, that's about the only way to do it.
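The final box-score check can be partly automated. This sketch assumes box-score minutes arrive as `"MM:SS"` strings and that the PBP tracking has already credited each player with a number of seconds played; both formats and the function names are hypothetical. A player whose box-score time exceeds the tracked time by almost exactly 12 minutes (or a multiple of 12) most likely played full quarters without ever appearing in the PBP. The `tolerance` parameter absorbs small clock discrepancies; its 10-second default is an arbitrary choice.

```python
QUARTER_SECONDS = 12 * 60  # NBA regulation quarter (overtime periods are 5 minutes)


def parse_mmss(text: str) -> int:
    """Convert a box-score minutes string such as '34:12' to seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)


def untracked_full_quarters(box_minutes: dict[str, str],
                            tracked_seconds: dict[str, int],
                            tolerance: int = 10) -> dict[str, int]:
    """Return {player: k} for players whose box-score time exceeds the time
    credited by PBP tracking by about k full quarters (k >= 1)."""
    result = {}
    for player, mmss in box_minutes.items():
        gap = parse_mmss(mmss) - tracked_seconds.get(player, 0)
        k = round(gap / QUARTER_SECONDS)
        if k >= 1 and abs(gap - k * QUARTER_SECONDS) <= tolerance:
            result[player] = k
    return result


box = {"A": "36:00", "E": "24:30", "G": "10:05"}
tracked = {"A": 36 * 60, "E": 12 * 60 + 30, "G": 10 * 60 + 5}
print(untracked_full_quarters(box, tracked))  # {'E': 1}
```

The box score does not say which quarter was missed, so the result still has to be matched against the quarters where fewer than five starters could be identified from the PBP.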

Re: Data of players on court

Posted: Mon Apr 29, 2013 5:09 pm
by v-zero
It's an incredible PITA to get very good accuracy. Unless you know, or are learning, some Bayesian modelling techniques, it will be a very unrewarding exercise. The box score isn't solved yet, so I would suggest starting there.

Re: Data of players on court

Posted: Mon Apr 29, 2013 5:37 pm
by liurical
Thanks, DSMok1! It's great to get confirmation on the method; I really couldn't think of another way.

Re: Data of players on court

Posted: Mon Apr 29, 2013 5:45 pm
by liurical
v-zero wrote: I use the PBP, Plus-Minus (pictorial variant) and box score pages from BBR to get a virtually perfect dataset. It's an incredible PITA to get very good accuracy. Unless you know, or are learning, some Bayesian modelling techniques, it will be a very unrewarding exercise. The box score isn't solved yet, so I would suggest starting there.
I understand; thank you for your advice as well.
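A cross-check against BBR's pictorial Plus-Minus page, as v-zero describes, could look roughly like this. The thread does not say how that page is parsed, so this sketch simply assumes each player's on-court time for one team is already available as `(start, end)` intervals in seconds since tip-off; the data layout and `lineup_segments` are hypothetical. It splits the game at every entry and exit and flags any stretch where the team does not have exactly five players, pointing to an interval that needs checking against the PBP and box score.

```python
# Hypothetical layout: (start, end) seconds since tip-off, end exclusive.
Stint = tuple[int, int]


def on_court_at(stints: dict[str, list[Stint]], t: int) -> frozenset[str]:
    """Players whose stints cover second t."""
    return frozenset(p for p, spans in stints.items()
                     if any(start <= t < end for start, end in spans))


def lineup_segments(stints: dict[str, list[Stint]],
                    game_seconds: int = 48 * 60) -> list[tuple[int, int, frozenset[str]]]:
    """Split the game at every entry/exit and return (start, end, lineup)
    for each stretch. 48 minutes covers regulation only; overtime needs more."""
    cuts = sorted({0, game_seconds} | {
        x for spans in stints.values() for start, end in spans
        for x in (start, end) if 0 <= x <= game_seconds
    })
    return [(a, b, on_court_at(stints, a)) for a, b in zip(cuts, cuts[1:])]


# Toy team: E's stint ends at 600 but F's begins at 610, leaving a 10-second hole.
stints = {
    "A": [(0, 2880)],
    "B": [(0, 2880)],
    "C": [(0, 2880)],
    "D": [(0, 2880)],
    "E": [(0, 600)],
    "F": [(610, 2880)],
}
segments = lineup_segments(stints)
flagged = [(a, b) for a, b, lineup in segments if len(lineup) != 5]
print(flagged)  # [(600, 610)]
```

Working with change points rather than scanning every second keeps the number of lineup checks proportional to the number of substitutions.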