nba.com now has play by play data back to 1997

Post by kpascual »

sbs wrote: With some clever programming, the boxscore range should be able to resolve all of the problems caused by last-names-only entries in the play-by-play.
That's what I've been doing, but I wouldn't call it clever:
https://github.com/kpascual/nbascrape/b ... _nbacom.py
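One way to read sbs's suggestion, as a minimal sketch: match each surname in the play-by-play against the full names on that team's box-score roster for the same game, and leave anything missing or ambiguous for manual review. This is not the code in kpascual's repository (the link above is truncated); the function name `resolve_name` and the roster format are assumptions, and the names in the example are made up.

```python
from typing import List, Optional


def resolve_name(roster: List[str], pbp_name: str) -> Optional[str]:
    """Map a surname-only play-by-play name to a full name from one team's
    box-score roster for the same game (hypothetical helper).

    Returns None when there is no match or more than one match, so the
    row can be fixed by hand instead of being silently mislabeled."""
    target = " ".join(pbp_name.split()).lower()
    matches = [
        full
        for full in roster
        # Compare whole trailing words so multi-word surnames still match.
        if full.lower() == target or full.lower().endswith(" " + target)
    ]
    return matches[0] if len(matches) == 1 else None


# Hypothetical roster, for illustration only.
roster = ["John Smith", "Mike Smith", "Carl Van Dyke"]
print(resolve_name(roster, "Van Dyke"))  # Carl Van Dyke
print(resolve_name(roster, "Smith"))     # None: two players share the surname
```

Teammates with the same surname, and names carrying a suffix such as "Jr.", are exactly the cases this returns None for; the box score alone cannot settle those.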
colts18 wrote: No problem. Will you be able to parse out the pbp data from 97-00?
I can't promise I have time to do it, but I bet EvanZ either has already done it or is doing it right now.

Post by colts18 »

kpascual wrote:
colts18 wrote: No problem. Will you be able to parse out the pbp data from 97-00?
I can't promise I have time to do it, but I bet EvanZ either has already done it or is doing it right now.
What does the parsing-out process entail? I don't know anything about this, so I'd like to find out. How long does it usually take to parse out this data?

Post by EvanZ »

kpascual wrote:
I can't promise I have time to do it, but I bet EvanZ either has already done it or is doing it right now.
Haha. I'm now definitely considering it. :D

It would be great for my site to have all the data going back that far. Although, it could get rather expensive to host what would probably be about 10 GB worth of pbp and matchup data.

Post by AcrossTheCourt »

I just wanted to say I'd love to see the end result of this work. You could estimate the plus/minus value of Bulls-era Jordan and get a more complete understanding of prime Shaq! That's a gold mine. I've always wanted to attempt an adjusted plus/minus model, but unfortunately I have too much real research work right now. (If they pay you, you kinda have to do it.)

Post by DSMok1 »

EvanZ wrote:
kpascual wrote:
I can't promise I have time to do it, but I bet EvanZ either has already done it or is doing it right now.
Haha. I'm now definitely considering it. :D

It would be great for my site to have all the data going back that far. Although, it could get rather expensive to host what would probably be about 10 GB worth of pbp and matchup data.
The raw, parsed files could be hosted elsewhere (or were you referring to NBAWOWY?)
Developer of Box Plus/Minus
APBRmetrics Forum Administrator
Twitter.com/DSMok1

Post by colts18 »

EvanZ wrote:
kpascual wrote:
I can't promise I have time to do it, but I bet EvanZ either has already done it or is doing it right now.
Haha. I'm now definitely considering it. :D

It would be great for my site to have all the data going back that far. Although, it could get rather expensive to host what would probably be about 10 GB worth of pbp and matchup data.

How does this parsing out process work? Is there some kind of code to make it work?


Does nbawowy extend to before this season? Could this raw, parsed 1997-2000 pbp be added to nbawowy, or do you need a different kind of pbp?

Post by EvanZ »

@Daniel, the way my site works is that the app is running off Nodejitsu and the database is running off MongoLab. The plan I'm on right now is 1 GB of storage for about $10/mo. Not a big deal. But it goes up steeply from there: 4 GB is $40/mo, and every GB after that is another $11, so 10 GB would be over $100/mo. Since I don't make money off the site, that's over $1,200/yr coming out of my pocket. And for all I know, the database might end up being around 15 GB to go back to '97.
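To make that arithmetic explicit, here is a small sketch that uses only the tiers quoted in the post ($40/mo for 4 GB, then $11 for each additional GB). It is not MongoLab's actual price list; billing partial gigabytes as whole ones is an assumption, and sizes below 4 GB are rejected because the post only gives the 1 GB price.

```python
import math


def monthly_cost(gb: float) -> int:
    """Monthly hosting cost in dollars for a database of at least 4 GB,
    using only the tiers quoted in the post: $40 for 4 GB plus $11 per extra GB."""
    if gb < 4:
        raise ValueError("only tiers of 4 GB and up are quoted")
    extra_gb = math.ceil(gb - 4)  # assumption: partial GB billed as a whole GB
    return 40 + 11 * extra_gb


for size in (10, 15):
    print(size, "GB:", monthly_cost(size), "$/mo,", 12 * monthly_cost(size), "$/yr")
# 10 GB: 106 $/mo, 1272 $/yr
# 15 GB: 161 $/mo, 1932 $/yr
```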

@colts18, right now nbawowy is just this season. My play-by-play comes from NBC Sports, which was the cleanest, easiest source to parse that I could find (for example, it provides full names on every play). Unfortunately, before I could scrape prior seasons from the site (which I think they had at one point), they took them all down (I'm assuming at the request of the NBA).
How does this parsing out process work? Is there some kind of code to make it work?
Yes, there is some kind of code. The primary challenge of parsing play-by-play is determining who is on the court. Using the NBC dataset, I've got a very, very low error rate. It's almost a perfect process with very little manual correction involved. Every day it takes me about 5 minutes to update the site. The one good thing about going back and dealing with old data is that once you do it, you don't have to mess with it again. I'd love to add it to my site, but like Ken said, it's a matter of finding time.
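A minimal sketch of the "who is on the court" step, assuming a simplified event format (the `Event` fields and function names below are hypothetical, not NBC's or nba.com's). Within one period, anyone who records a play or is subbed out before being subbed in is assumed to have started the period; the lineup is then rolled forward through each substitution. A player who plays the whole period without appearing in any event goes undetected, which is the kind of case that still needs manual correction.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Set, Tuple


@dataclass
class Event:
    team: str
    player: Optional[str] = None   # player credited on a non-substitution play
    sub_in: Optional[str] = None   # set only on substitution events
    sub_out: Optional[str] = None


def period_starters(events: List[Event]) -> Dict[str, Set[str]]:
    """Infer each team's players on the floor at the start of ONE period."""
    starters: Dict[str, Set[str]] = {}
    entered: Set[Tuple[str, str]] = set()  # (team, player) who came in via a sub
    for ev in events:
        team_starters = starters.setdefault(ev.team, set())
        if ev.sub_in is not None:
            # Someone subbed out before ever being subbed in must have started.
            if ev.sub_out is not None and (ev.team, ev.sub_out) not in entered:
                team_starters.add(ev.sub_out)
            entered.add((ev.team, ev.sub_in))
        elif ev.player is not None and (ev.team, ev.player) not in entered:
            team_starters.add(ev.player)
    return starters


def tag_lineups(events: List[Event]) -> List[Tuple[Event, Dict[str, FrozenSet[str]]]]:
    """Pair each event of one period with every team's lineup after that event."""
    on_court = {team: set(names) for team, names in period_starters(events).items()}
    tagged = []
    for ev in events:
        if ev.sub_in is not None:
            lineup = on_court.setdefault(ev.team, set())
            lineup.discard(ev.sub_out)
            lineup.add(ev.sub_in)
        tagged.append((ev, {team: frozenset(p) for team, p in on_court.items()}))
    return tagged


# Made-up events for one team in one period.
events = [
    Event("HOME", player="A"),
    Event("HOME", sub_in="F", sub_out="B"),
    Event("HOME", player="F"),
    Event("HOME", player="C"),
]
for ev, lineups in tag_lineups(events):
    print(sorted(lineups["HOME"]))
# ['A', 'B', 'C'] for the first event, then ['A', 'C', 'F'] for the other three.
# Two starters never appear in the log, so only three of the five are found.
```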

Hey, Ken. There's a D3 meetup in SF tonight at Trulia. Any chance you're going? I'll be there.

Post by colts18 »

How long does it take to parse out a season's worth of pbp data? Are you able to do an APM on that data?

Post by EvanZ »

colts18 wrote:
How long does it take to parse out a season's worth of pbp data? Are you able to do an APM on that data?
The short answer is it hardly takes any time at all once you've written the code. The longer answer is it takes a lot of time to write the code.

And yes, once you do that, you can calculate APM or RAPM or whatever.
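Once every possession is tagged with the ten players on the floor, a common route to RAPM is ridge regression on stints. The sketch below is a minimal version under stated assumptions: each stint is a hypothetical `(home_players, away_players, margin_per_100, possessions)` tuple, home players get +1 and away players -1, rows are weighted by possessions, and the penalty `lam` is an arbitrary placeholder. It leaves out the home-court intercept and the cross-validation normally used to choose the penalty, and it solves the normal equations with plain Gaussian elimination so no libraries are needed.

```python
from typing import Dict, List, Sequence, Tuple

# (home players, away players, point margin per 100 possessions, possessions)
Stint = Tuple[Sequence[str], Sequence[str], float, float]


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def rapm(stints: List[Stint], lam: float = 1.0) -> Dict[str, float]:
    """Ridge-regularized plus/minus: minimize
    sum_s poss_s * (margin_s - x_s . beta)^2 + lam * |beta|^2."""
    players = sorted({p for home, away, _, _ in stints for p in (*home, *away)})
    idx = {p: i for i, p in enumerate(players)}
    n = len(players)
    A = [[0.0] * n for _ in range(n)]  # accumulates X^T W X
    b = [0.0] * n                      # accumulates X^T W y
    for home, away, margin, poss in stints:
        row = {idx[p]: 1.0 for p in home}
        row.update({idx[p]: -1.0 for p in away})
        for i, xi in row.items():
            b[i] += poss * xi * margin
            for j, xj in row.items():
                A[i][j] += poss * xi * xj
    for i in range(n):
        A[i][i] += lam  # the ridge penalty keeps A invertible
    return dict(zip(players, solve(A, b)))
```

With a single made-up stint, `rapm([(["A"], ["B"], 10.0, 100.0)])` splits the +10 between the two players and shrinks it toward zero: A comes out at 1000/201 (about 4.98) and B at the negative of that. A larger `lam` shrinks both further.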

Post by kpascual »

EvanZ wrote: Hey, Ken. There's a D3 meetup in SF tonight at Trulia. Any chance you're going? I'll be there.
Dammit, I missed it. I was signed up to go, but forgot I had a rec-league basketball game. But yeah, we should actually hang out sometime. Sloan conference? Other meetups?
EvanZ wrote:
colts18 wrote:
How long does it take to parse out a season's worth of pbp data? Are you able to do an APM on that data?
The short answer is it hardly takes any time at all once you've written the code. The longer answer is it takes a lot of time to write the code.

And yes, once you do that, you can calculate APM or RAPM or whatever.
This. The primary constraint isn't time; it's usually effort.

Post by colts18 »

EvanZ wrote:
colts18 wrote:
How long does it take to parse out a season's worth of pbp data? Are you able to do an APM on that data?
The short answer is it hardly takes any time at all once you've written the code. The longer answer is it takes a lot of time to write the code.

And yes, once you do that, you can calculate APM or RAPM or whatever.
Do you think you would be able to do it? Or is it too hard?

Post by EvanZ »

It's not about whether it's "hard". For me it's just getting the time to do it. Maybe, maybe not. Can't guarantee anything.

Post by J.E. »

If someone converts the data into text files with the format

gameid TAB linenumber TAB time TAB [team_id] description

I can take a crack at it using the parser I wrote for bbr PBP.

(I don't want to do the HTML-to-text conversion because, as I said, I think at some point the PBP will appear on bbr, for which I already have a crawler/converter.)
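For anyone converting pages into the layout J.E. describes, here is a minimal sketch of writing and reading one line. It assumes `[team_id]` means an optional team ID written in literal square brackets before the description (it could equally just mean the field is optional); `PbpLine`, `format_line`, `parse_line` and the sample values are hypothetical.

```python
from typing import NamedTuple, Optional


class PbpLine(NamedTuple):
    game_id: str
    line_number: int
    time: str
    team_id: Optional[str]
    description: str


def format_line(row: PbpLine) -> str:
    """gameid TAB linenumber TAB time TAB [team_id] description"""
    prefix = f"[{row.team_id}] " if row.team_id else ""
    # A tab inside the description would shift the columns, so replace it.
    desc = row.description.replace("\t", " ")
    return f"{row.game_id}\t{row.line_number}\t{row.time}\t{prefix}{desc}"


def parse_line(text: str) -> PbpLine:
    """Inverse of format_line: split on the first three tabs only."""
    game_id, line_number, time, rest = text.rstrip("\n").split("\t", 3)
    team_id = None
    if rest.startswith("[") and "] " in rest:
        team_id, rest = rest[1:].split("] ", 1)
    return PbpLine(game_id, int(line_number), time, team_id, rest)


# Hypothetical row; writes G1, 1, 12:00 and "[HOME] Jump ball" separated by tabs.
print(format_line(PbpLine("G1", 1, "12:00", "HOME", "Jump ball")))
```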

Post by colts18 »

How long would it take to do that for the 4 seasons? If it's not too long, and someone taught me how to do it, I guess I could try.

Post by J.E. »

Python with urllib is a good place to start.
Further, you could use the BeautifulSoup library or go the laborious way with string.split and string.replace.
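A minimal Python 3 sketch along those lines. It swaps BeautifulSoup for the standard library's `html.parser` so it runs without extra packages, and it assumes the play-by-play sits in an HTML table with one play per `<tr>`; the real page layout, and therefore which columns hold the time and description, is not known here. `fetch_html`, `RowCollector` and `table_rows` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List, Optional
from urllib.request import urlopen


def fetch_html(url: str) -> str:
    """Download a page as text (no particular nba.com URL is assumed)."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


class RowCollector(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None

    def _end_row(self) -> None:
        if self._row:  # skip rows with no cells
            self.rows.append(self._row)
        self._row, self._cell = None, None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._end_row()  # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Collapse runs of whitespace inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag in ("tr", "table"):
            self._end_row()


def table_rows(html: str) -> List[List[str]]:
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    return parser.rows


sample = "<table><tr><th>Time</th><th>Play</th></tr><tr><td>12:00</td><td>Jump  ball</td></tr></table>"
print(table_rows(sample))  # [['Time', 'Play'], ['12:00', 'Jump ball']]
```

Each resulting row can then be written out in whatever text layout the downstream parser expects.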