Public basketball data and analysis github
Posted: Sun Nov 11, 2012 11:49 pm
I've opened up a subset of my basketball github to the public:
https://github.com/galizur/basketball-public
The code is in R, Ruby, Bash and SQL and presumes a PostgreSQL database. This isn't meant to be anything special or cutting edge, just a little introduction to some data science using basketball. My day job is doing quantitative analysis for the San Diego Padres.
It includes basic college and pro game and performance data, basic feature detection for predicting pro performance from college performance, and power rankings for amateur and pro teams. The power rankings for college teams pool teams within divisions, then measure pool strength from 2002-2012. This lets you measure relative team strength between D1, D2 and D3 and also estimate all of the NCAA on the same scale. Also included are home/away factors, plus a demonstration that distance traveled by teams impacts performance. Distances are calculated by geocoding cities with Yahoo's PlaceFinder API, then computing the great circle distance between them. There are lots of obvious improvements that can be made.
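For anyone curious about the great circle step, here is a minimal sketch in Python using the haversine formula. This is not taken from the repository (the code there is in R, Ruby, Bash and SQL). The function name, the Earth radius constant and the hard-coded coordinates are assumptions for illustration only; in practice the coordinates would come from the geocoding step.

```python
import math

# Mean Earth radius in km; an assumption of this sketch, not a value from the repo.
EARTH_RADIUS_KM = 6371.0088


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in decimal degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


# Approximate city coordinates, hard-coded here purely for illustration.
durham = (35.994, -78.899)
lawrence = (38.972, -95.235)
print(round(great_circle_km(*durham, *lawrence)), "km")
```

This treats the Earth as a sphere, which is probably fine for a travel-distance feature. It measures straight-line distance between cities, not the actual route a team travels.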
I've included play-by-play data for 10,000 NCAA games in XML, plus parsed versions in CSV files. I haven't done anything with this data yet, however.
My Twitter:
https://twitter.com/octonion
My (sometimes) blog:
http://angrystatistician.blogspot.com
-Chris