Play by Play JSON Feeds

Posted: Thu Jan 30, 2014 3:05 am
by nileriver
After several posts asking about getting play by play data, I have been learning how to do web scraping. In another thread I found a link to a Sports Illustrated JSON feed (http://data.sportsillustrated.cnn.com/j ... yplay.json). I was wondering what other JSON or XML feeds exist that others use to get data.

Re: Play by Play JSON Feeds

Posted: Thu Jan 30, 2014 4:56 pm
by kohanz
NBA.com/stats has JSON feeds for their data as well. If you learn to use the developer tools in Chrome to sniff out what data is being loaded for a given page (already described in another thread on this topic), discovering JSON feeds becomes easier.

However, the main thing I want to post is for people to try and use these feeds as responsibly as they can. Don't hammer the feed with tons of requests. Once a game is finished, the PBP data isn't going to change, so for example in my app (not yet released), once the game is over, I never hit that feed again. During games, I might query it every 5 minutes at most, which is reasonable.
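
For anyone who wants to build that policy into a scraper, here is a minimal sketch in Python. The `GamePoller` class and its method names are made up for illustration and are not taken from any real app or library. It only decides whether a request is allowed, and you would plug in your own fetching code. The 5-minute interval comes from the post above; everything else is an assumption.

```python
import time
from typing import Optional

# Assumption: at most one request per 5 minutes while a game is live.
POLL_INTERVAL_SECONDS = 5 * 60


class GamePoller:
    """Hypothetical helper that decides whether a feed may be requested again."""

    def __init__(self, interval: float = POLL_INTERVAL_SECONDS) -> None:
        self.interval = interval
        self.last_fetch: Optional[float] = None  # time of the last request
        self.finished = False                    # True once the game is final

    def should_fetch(self, now: Optional[float] = None) -> bool:
        # A finished game never changes, so never request it again.
        if self.finished:
            return False
        if now is None:
            now = time.monotonic()
        # First request for this game is always allowed.
        if self.last_fetch is None:
            return True
        # Otherwise wait until the full interval has elapsed.
        return now - self.last_fetch >= self.interval

    def record_fetch(self, game_is_final: bool, now: Optional[float] = None) -> None:
        # Call this right after each successful request.
        if now is None:
            now = time.monotonic()
        self.last_fetch = now
        self.finished = self.finished or game_is_final
```

You would call `should_fetch()` before each request and `record_fetch(...)` after it, passing whatever "game is final" flag your feed exposes.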

Re: Play by Play JSON Feeds

Posted: Thu Jan 30, 2014 6:58 pm
by nileriver
I definitely agree about being responsible with the number of requests you send to a server. I have been using Firebug in Chrome to look at the structure of various websites (NBA, ESPN, and SI) and was hoping others would share their experiences.

Re: Play by Play JSON Feeds

Posted: Sat Feb 01, 2014 7:15 am
by EvanZ
kohanz wrote: However, the main thing I want to post is for people to try and use these feeds as responsibly as they can. Don't hammer the feed with tons of requests. Once a game is finished, the PBP data isn't going to change, so for example in my app (not yet released), once the game is over, I never hit that feed again. During games, I might query it every 5 minutes at most, which is reasonable.
Their servers should be set up to handle thousands of requests per second. I don't think anyone should worry about their scraper being mistaken for a DoS attack.

Re: Play by Play JSON Feeds

Posted: Sat Feb 01, 2014 8:15 pm
by nileriver
EvanZ wrote:
kohanz wrote: However, the main thing I want to post is for people to try and use these feeds as responsibly as they can. Don't hammer the feed with tons of requests. Once a game is finished, the PBP data isn't going to change, so for example in my app (not yet released), once the game is over, I never hit that feed again. During games, I might query it every 5 minutes at most, which is reasonable.
Their servers should be set up to handle thousands of requests per second. I don't think anyone should worry about their scraper being mistaken for a DoS attack.
It is always best practice not to put unnecessary load on a server, no matter how small. If you have already scraped the information, there is no reason to hit the page again. Taking the time to make sure your code is not running redundant tasks is important. Beyond the load on the server, redundant requests also slow down your own code. I agree that the impact would be negligible on the servers that the NBA or ESPN have. However, you should be grateful that they provide this information and respectful in the way you grab it.
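
One way to avoid hitting a page you have already scraped is a simple on-disk cache. The sketch below is only an illustration. `load_play_by_play`, the `fetch` callback, and the one-file-per-game layout are all assumptions, not anything the feeds themselves require. It also assumes you only cache games that are already final, since a live game's data would go stale.

```python
import json
from pathlib import Path
from typing import Callable


def load_play_by_play(game_id: str, cache_dir: Path,
                      fetch: Callable[[str], dict]) -> dict:
    """Return play-by-play data for a finished game, fetching at most once.

    `fetch` is a hypothetical callback that downloads and parses the feed
    for `game_id`; it is only called when no cached copy exists.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file = cache_dir / f"{game_id}.json"

    # Already scraped: read the local copy and skip the network entirely.
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))

    # Not cached yet: fetch once, then save for every later run.
    data = fetch(game_id)
    cache_file.write_text(json.dumps(data), encoding="utf-8")
    return data
```

Reading a local file is also much faster than a network request, so this helps your own run time as well as the server.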

Re: Play by Play JSON Feeds

Posted: Mon Feb 03, 2014 9:30 pm
by kohanz
EvanZ wrote:
kohanz wrote: However, the main thing I want to post is for people to try and use these feeds as responsibly as they can. Don't hammer the feed with tons of requests. Once a game is finished, the PBP data isn't going to change, so for example in my app (not yet released), once the game is over, I never hit that feed again. During games, I might query it every 5 minutes at most, which is reasonable.
Their servers should be set up to handle thousands of requests per second. I don't think anyone should worry about their scraper being mistaken for a DoS attack.
My recommendation is not centered on a concern about DoS. The website being scraped generally pays considerable amounts for that data, and I wouldn't recommend drawing attention to yourself by re-purposing that data for your own projects. I mean, if it's just for hobby analysis at home, that's fine, but I think websites such as vorped and nbawowy (which, Evan, I'm a big fan of) start to enter a bit of a grey area. As long as they don't get enough traffic for the bigger sites to notice, and as long as they aren't monetized, I think they'll be fine, but it's basically a case of not being noticed. I also don't think the sellers of that type of data would be thrilled to find escalating amounts of scraping.