SQL, Databases, and Basketball Stats

Home for all your discussion of basketball statistical analysis.
mystic
Posts: 470
Joined: Mon Apr 18, 2011 10:09 am

Re: SQL, Databases, and Basketball Stats

Post by mystic »

A mod just separated this from the original thread. A reasonable decision, because the talk about SQL etc. had little to do with the Rockets job offer, wouldn't you agree?
bbstats
Posts: 227
Joined: Thu Apr 21, 2011 8:25 pm
Location: Boone, NC

Re: SQL, Databases, and Basketball Stats

Post by bbstats »

Ah I didn't realize I created a fuss. Didn't pay attention 'til it had been relocated. Carry on!
JohnHasADHD
Posts: 21
Joined: Wed Feb 15, 2012 2:16 am

Re: SQL, Databases, and Basketball Stats

Post by JohnHasADHD »

I'm a bit late - but if you want to do data analysis of any significance, I would think SQL would be vital - and it's not just the concept of creating the databases.

When you know your database (or create it) you can get the data you want, when you want it, how you want it.

I have been working slowly this offseason on getting shot information and boxscore information for every player and every game. So far I believe I've successfully downloaded the stats for the last two NBA seasons (though I might have to tweak the shot information to indicate at what point in the quarter the shot was taken).

As a lark, and as a Sixers fan, I wanted to take a look at 4th quarter shots last season... Louis Williams really took a lot of them, so I was curious who might be taking them this year and wanted to see how it had happened last year. If you don't know SQL - how would you do that quickly and easily?
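For readers who have not seen this kind of query, here is a minimal sketch using Python's built-in `sqlite3` module. The `shots` table, its column names, and the sample rows are all hypothetical placeholders invented for illustration (not real stats and not the actual schema described above); the point is only that a "who takes the 4th quarter shots" question reduces to one `GROUP BY`.

```python
import sqlite3


def fourth_quarter_shots(conn, team):
    """Return (player, attempts, makes) for 4th-quarter shots, most attempts first."""
    query = """
        SELECT player, COUNT(*) AS attempts, SUM(made) AS makes
        FROM shots
        WHERE team = ? AND quarter = 4
        GROUP BY player
        ORDER BY attempts DESC, player
    """
    return conn.execute(query, (team,)).fetchall()


# Hypothetical schema: one row per shot attempt (made is 1 or 0).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shots ("
    "player TEXT, team TEXT, game_id INTEGER, quarter INTEGER, made INTEGER)"
)

# Placeholder rows, not real data.
sample_rows = [
    ("Player A", "PHI", 1, 4, 1),
    ("Player A", "PHI", 1, 4, 0),
    ("Player A", "PHI", 2, 4, 1),
    ("Player B", "PHI", 1, 4, 0),
    ("Player B", "PHI", 1, 2, 1),  # 2nd quarter, excluded by the filter
    ("Player C", "BOS", 1, 4, 1),  # other team, excluded by the filter
]
conn.executemany("INSERT INTO shots VALUES (?, ?, ?, ?, ?)", sample_rows)
conn.commit()

result = fourth_quarter_shots(conn, "PHI")
for player, attempts, makes in result:
    print(player, attempts, makes)
```

Swapping the `WHERE` clause (a different quarter, a season column, a time-remaining column) is how the same table would answer the follow-up questions.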

In this day and age, with the volume (and granularity) of data, you can't do advanced statistical analysis without knowing SQL.

Of course you need to know other stuff too. I can normalize the crap out of any data and query any database to give me exactly what I want, but the analysis portion of it still comes slowly to me. Advanced statistics is where I need to advance, but you definitely need to know SQL.
mystic
Posts: 470
Joined: Mon Apr 18, 2011 10:09 am

Re: SQL, Databases, and Basketball Stats

Post by mystic »

JohnHasADHD wrote: As a lark, and as a Sixers fan, I wanted to take a look at 4th quarter shots last season... Louis Williams really took a lot of them, so I was curious who might be taking them this year and wanted to see how it had happened last year. If you don't know SQL - how would you do that quickly and easily?
Klick me!

Took me about 10 sec to get that information. The good thing today is that bbref actually knows how to handle a database; thus, to get information like you desired, you really don't need to know SQL.
JohnHasADHD
Posts: 21
Joined: Wed Feb 15, 2012 2:16 am

Re: SQL, Databases, and Basketball Stats

Post by JohnHasADHD »

Interesting little toy - not exactly what I looked at when I was looking at my data, but interesting nonetheless.
v-zero
Posts: 520
Joined: Sat Oct 27, 2012 12:30 pm

Re: SQL, Databases, and Basketball Stats

Post by v-zero »

John, I don't accept your argument. Yes, databasing is very useful for data mining, and I agree that it is optimal for that purpose, but the fact of the matter is that you can mine data without proper databasing. And what if you have almost no interest in data mining? Believe what you will, but I think you'll find that you struggle with analysis because you are blinded by data. Data mining is very prone to over-fitting issues, and beyond that is often little more than pattern detection. Proper analysis requires a lot more than a nice SQL database with lots of numbers. Analytics should be all about building a suitable frame into which to slot those numbers and bring forth some genuine insight.

Anyway, I am waffling.
JohnHasADHD
Posts: 21
Joined: Wed Feb 15, 2012 2:16 am

Re: SQL, Databases, and Basketball Stats

Post by JohnHasADHD »

Well, I don't struggle with analysis because I am blinded by data. I struggle with analysis because, while teaching myself databases and Ruby (but not Perl) comes easy to me, teaching myself statistical analysis comes harder.

The reason one might need SQL for a job like the one advertised (as the original post asked) is that if you don't know SQL you might not work as easily with the raw data that the job provides you. It's possible that the job listed required not only analysis of the data but also organization of the data so that the whole statistical department can analyze it. Without a SQL background, organizing the raw data can be rather difficult. (I know you can do a lot with Excel and VBScript, but I've never found it easier than with a database. The ONLY time I've built a complex Excel workbook instead of a database is in relation to my day job, because I don't know how to program in the native language for Microsoft Dynamics, and building a production worksheet based on past and projected sales and current raw and finished goods inventory just worked better in Excel. At the same time, there's nothing really advanced calculation-wise in the 15 worksheets involved, just a lot of cascading calculations.)
v-zero
Posts: 520
Joined: Sat Oct 27, 2012 12:30 pm

Re: SQL, Databases, and Basketball Stats

Post by v-zero »

Yeah, I agree with all that. I just didn't like what seemed to be a "you can't do analysis without SQL" suggestion.
mikez
Posts: 32
Joined: Fri Apr 15, 2011 3:11 am

Re: SQL, Databases, and Basketball Stats

Post by mikez »

SQL's nice because it's widely known/used/accepted, but it's not the only environment in which you can operate a database. We found it easiest to find a good person to develop (and then, later, operate/manage/continue to develop) our initial database in SQL vs other alternatives, and I imagine Houston did the same when Daryl left here to go there.

But you could, for example, use 4D, some NoSQL system, or some other environment to build a big database - it's just harder to find coders in other environments, especially when you consider that your database may need to interface smoothly with external vendors. SQL (in whatever flavor) is simply just a more widely used/understood option, though as Evan notes this may be changing.

All this, of course, is assuming you actually need to store large amounts of data in (and run many varied queries from) some sort of non-flat-file format. Many analyses we do work perfectly fine with easy-to-acquire data stored in flat delimited files or even directly in Excel, etc. So for the purposes of the majority of non-professional analysts, SQL may not ever be needed, especially given the excellent resources (e.g. basketball-reference, basketballvalue, etc.) now available online that were not available when we first started doing this stuff.
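As a rough illustration of the flat-file route, here is a minimal sketch using only Python's standard `csv` module. The file contents, column names, and the `total_points` helper are hypothetical placeholders, not real data; in practice the text would come from a downloaded delimited file rather than an inline string.

```python
import csv
import io
from collections import defaultdict

# Placeholder contents of a small delimited file (not real stats).
FLAT_FILE = """player,game_id,points
Player A,1,20
Player A,2,10
Player B,1,8
"""


def total_points(csv_text):
    """Sum the points column per player from comma-delimited text."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["player"]] += int(row["points"])
    return dict(totals)


totals = total_points(FLAT_FILE)
print(totals)
```

For a single narrow question like this, no database is involved at all, which is the point being made above.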

In addition, plenty of people analyze data provided to them by their database people without actually knowing any SQL. This is true both in and out of the NBA - in many industries I imagine it describes the vast majority of analysts. It was certainly true in the consulting firm where I used to work; we just had a database guy who would get us files that we then could analyze in SAS or SPSS or Excel or whatever else we were using on a particular project.

There's no question, though, that in nearly any analytical field, employment is more likely if you're familiar with the database resources used by your potential employer. For the majority of NBA teams doing this stuff, for various reasons, it's probably some flavor of SQL.

-MZ
kpascual
Posts: 50
Joined: Thu Mar 01, 2012 7:02 pm

Re: SQL, Databases, and Basketball Stats

Post by kpascual »

I believe SQL is absolutely valuable, though not essential, for doing data analysis. It really depends on the use case: if you're working on data sets where the number of records is in the thousands or hundreds of thousands, Excel is just fine. Same if your analysis has a relatively narrow or well-defined scope.

You start to see the value of SQL when your data doesn't fit into two-dimensional squares, or more accurately, tends to be more about relationships across objects (hence the term "relational database"), or when you don't necessarily know what you're trying to solve.

I'm personally not a huge fan of the NoSQL movement as it stands now. I dabbled with Mongo when determining the backend of my vorped website, but just thought a relational DB was overall a better option. At my prior job, I found that even when using the newfangled Hadoop map/reduce technologies, I kept wanting a SQL interface instead of the pseudo-scripting language they created. In the end, data analysis is about getting data and (hopefully) answers as painlessly as possible, and I find SQL to be the least painful interface for doing so. But YMMV.
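To make "relationships across objects" concrete, here is a minimal sketch with Python's built-in `sqlite3`. The three tables (`players`, `games`, `boxscores`), their columns, and the rows are hypothetical placeholders chosen for illustration, not the schema of any real site. Each fact lives in one table, and a join stitches them together on demand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE players (player_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE games (game_id INTEGER PRIMARY KEY, game_date TEXT);
    CREATE TABLE boxscores (
        player_id INTEGER REFERENCES players(player_id),
        game_id   INTEGER REFERENCES games(game_id),
        points    INTEGER
    );

    -- Placeholder rows, not real data.
    INSERT INTO players VALUES (1, 'Player A'), (2, 'Player B');
    INSERT INTO games VALUES (1, '2012-11-01'), (2, '2012-11-03');
    INSERT INTO boxscores VALUES (1, 1, 20), (1, 2, 10), (2, 1, 8);
    """
)

# One line per player per game: names and dates come from their own tables.
game_log = conn.execute(
    """
    SELECT p.name, g.game_date, b.points
    FROM boxscores AS b
    JOIN players AS p ON p.player_id = b.player_id
    JOIN games AS g ON g.game_id = b.game_id
    ORDER BY p.name, g.game_date
    """
).fetchall()

# The same tables answer a different question without reshaping anything.
season_totals = conn.execute(
    """
    SELECT p.name, SUM(b.points) AS total_points
    FROM boxscores AS b
    JOIN players AS p ON p.player_id = b.player_id
    GROUP BY p.name
    ORDER BY p.name
    """
).fetchall()

print(game_log)
print(season_totals)
```

In a spreadsheet the player name and game date would typically be repeated on every row; here they are stored once and looked up by key.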

For background, I've been doing data analysis for various Silicon Valley companies over the past few years, and it's not a stretch to say I speak more SQL than English day-to-day. In fact my basketball site is really just an exercise in data warehousing (fancy term for a big database). I'd be happy to share any knowledge or do a write-up if anyone's interested in expanding into this realm.
Crow
Posts: 10533
Joined: Thu Apr 14, 2011 11:10 pm

Re: SQL, Databases, and Basketball Stats

Post by Crow »

When will team and player pages update at vorped.com?

I hadn't noticed this before: http://vorped.com/bball/index.php/referee

Have you (or anyone else) done anything with RAPM or APM factors in relation to other stats in SQL?

I alluded to it before, but if one thinks the stats are interconnected then I would think it would be helpful to work in a relational database over a non-relational database. Am I being naive / overly simplistic in that view?
kpascual
Posts: 50
Joined: Thu Mar 01, 2012 7:02 pm

Re: SQL, Databases, and Basketball Stats

Post by kpascual »

ESPN decided to eff with me and not provide play-by-play data for certain games. Since my shot charts are actually tied to play-by-play events, things kind of went to hell, so I've spent the last 2 weeks integrating NBA.com play-by-play so it won't happen again until NBA.com messes with me.

TL;DR Should be fixed by the weekend.

I haven't personally done any RAPM stuff, since it seems many here are doing good work in that area.

I think a relational database can help in so many ways, with the most prominent being in organizing and ensuring the accuracy of your data.
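One way a database helps with accuracy is through declared constraints, sketched below with Python's built-in `sqlite3`. The table and column names and the `try_insert` helper are hypothetical and only for illustration. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`; other database engines behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
conn.executescript(
    """
    CREATE TABLE players (
        player_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL UNIQUE
    );
    CREATE TABLE shots (
        shot_id   INTEGER PRIMARY KEY,
        player_id INTEGER NOT NULL REFERENCES players(player_id),
        quarter   INTEGER NOT NULL CHECK (quarter >= 1),
        made      INTEGER NOT NULL CHECK (made IN (0, 1))
    );
    """
)


def try_insert(conn, params):
    """Insert one shot row; return False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO shots (player_id, quarter, made) VALUES (?, ?, ?)",
            params,
        )
        return True
    except sqlite3.IntegrityError:
        return False


conn.execute("INSERT INTO players VALUES (1, 'Player A')")  # placeholder row

good_row = try_insert(conn, (1, 4, 1))       # valid shot
unknown_player = try_insert(conn, (99, 4, 1))  # no such player_id
bad_made_flag = try_insert(conn, (1, 4, 2))    # made must be 0 or 1

print(good_row, unknown_player, bad_made_flag)
```

Bad rows are rejected at load time instead of quietly skewing totals later, which is hard to guarantee in a spreadsheet.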

I sense an aversion to using databases, and I just don't get it... it might be an aversion to doing command-line-y things or to writing "real" code, or to learning how exactly to install a database on your computer (which to be fair can be a huge PITA).

But if you have the slightest curiosity about SQL/databases, I say dive in headfirst... it'll be time well-spent. It's a worthwhile skill to have if you want to do data analysis for a living.
DSMok1
Posts: 1119
Joined: Thu Apr 14, 2011 11:18 pm
Location: Maine

Re: SQL, Databases, and Basketball Stats

Post by DSMok1 »

kpascual, if you could write/direct folks to a good tutorial for getting started in SQL for sports stats, I'm sure you'd get a lot (okay, maybe an overstatement) of interested readers.
Developer of Box Plus/Minus
APBRmetrics Forum Administrator
Twitter.com/DSMok1
Crow
Posts: 10533
Joined: Thu Apr 14, 2011 11:10 pm

Re: SQL, Databases, and Basketball Stats

Post by Crow »

Thanks Ken for the reply.

I have been able to do most of what I wanted in Excel & Access to date. I was intimidated by the command line syntax; but, as I said earlier, after looking through some SQL books I have overcome that initial intimidation. I will probably take a course on it / read more in the future. I will still need a database and research questions that require it to get more experience but those can be assembled. I tend to be more interested in season stats than play by play details but I can certainly see that the play by play level analysis would be important to be able to do.
EvanZ
Posts: 912
Joined: Thu Apr 14, 2011 10:41 pm
Location: The City

Re: SQL, Databases, and Basketball Stats

Post by EvanZ »

sqlfiddle is a nice playground for experimenting/learning SQL without even having to install your own server (which is not a big deal, but anyway).

Like I mentioned earlier, that Coursera database course can really get a person quickly up to speed with the basics of SQL/relational theory.

Related to the play-by-play issues, does anyone know if Aaron B. is done with basketball-value? If so, I need to write my own parser as well.