
statistical formulas

Posted: Thu Oct 13, 2011 11:51 pm
by dougthonus
I've been trying to practice some of my development skills, so I decided a good project would be an advanced statistical website for a wide variety of basketball leagues [NBA, NCAA, D-League, and the major foreign leagues].

I have game log and season total data available, and I was wondering if there is a list of statistical formulas for all of the various metrics out there, showing how to calculate them from that data. I've found some of the formulas by looking around the web, but I wasn't sure if anyone had a comprehensive list of things worth calculating along with the formulas for them.
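As a starting point, here is a minimal Python sketch of three widely published box-score formulas (effective FG%, true shooting %, and a basic possession estimate), assuming season totals have already been parsed into one record per player or team. The `BoxTotals` container and its field names are hypothetical. The coefficient on FTA in the possession estimate differs between sources (0.44 here, 0.4 in some team formulas), so it is worth matching whichever glossary you standardize on.

```python
from dataclasses import dataclass


@dataclass
class BoxTotals:
    """Hypothetical container for one player's (or team's) counting stats."""
    pts: int   # points
    fgm: int   # field goals made
    fga: int   # field goals attempted
    fg3m: int  # three-pointers made
    fta: int   # free throws attempted
    orb: int   # offensive rebounds
    tov: int   # turnovers


def effective_fg_pct(b: BoxTotals) -> float:
    """eFG% = (FGM + 0.5 * 3PM) / FGA; a made three counts as 1.5 made twos."""
    return (b.fgm + 0.5 * b.fg3m) / b.fga if b.fga else 0.0


def true_shooting_pct(b: BoxTotals) -> float:
    """TS% = PTS / (2 * (FGA + 0.44 * FTA)); 0.44 approximates FTAs per trip."""
    shooting_attempts = b.fga + 0.44 * b.fta
    return b.pts / (2 * shooting_attempts) if shooting_attempts else 0.0


def estimated_possessions(b: BoxTotals) -> float:
    """Basic estimate: FGA - ORB + TOV + 0.44 * FTA.

    Assumption: the 0.44 FTA coefficient; some sources use 0.4 instead.
    """
    return b.fga - b.orb + b.tov + 0.44 * b.fta
```

For made-up totals of 1000 points on 360/750 shooting with 80 threes, 250 FTA, 60 offensive rebounds and 120 turnovers, this gives an eFG% of 400/750, a TS% of 1000/1720, and about 920 possessions. Because every league's box score has these columns, the same functions can run unchanged over NBA, NCAA and foreign-league data.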

Re: statistical formulas

Posted: Fri Oct 14, 2011 2:54 am
by Crow
Hey.

Here are some spots you can check:

http://www.bepress.com/jqas/vol3/iss3/1/

http://www.basketball-reference.com/about/glossary.html

http://www.nbastuffer.com/component/opt ... /catid,41/

http://www.nbastuffer.com/component/opt ... /catid,42/

http://hoopdata.com/advancedstats.aspx
http://hoopdata.com/teamff.aspx
(click on the black glossary bar to see a formula for a stat)


I assume you have looked at, or will look at, what other sites already have, and maybe try to fill in gaps they miss or organize it differently.

Some other sites, beyond the most obvious, include:

http://www.hoopsstats.com/

http://www.draftexpress.com/stats.php
(Did you have a hand in developing those? If so, what do you want to do differently?)

http://queencityhoops.com/playerPage.ph ... Glen+Davis

http://kenpom.com


The leagues beyond the NBA and NCAA have advanced stats available, but they could use someone making an effort to highlight the advanced stat leaders.

Putting Regularized APM (http://stats-for-the-nba.appspot.com/) and traditional APM (http://basketballvalue.com/topplayers.p ... =2010-2011) side by side with each other and with other advanced stats would be handy; to my knowledge, that has not yet been done anywhere on a regular, comprehensive basis. I'd go to a site that had all of them in one place, especially if it had back years too.

If you really wanted to dig deeper, you could look around this site and the sites of people here and elsewhere and add rarer advanced formulas to the summary: things like EWins, EZPM, WARP, Lambda PM, Advanced SPM and others. That could raise their public visibility and encourage further study of them and comparison with better-known metrics.

I'd be interested in hearing more about what you plan to do (and any more detailed questions you might want user feedback on), and I look forward to seeing it. I get the impression you are very well equipped for the project.

Re: statistical formulas

Posted: Fri Oct 14, 2011 4:34 am
by dougthonus
First, thanks for all the links. I've seen some of them, but not all of them, so I will check them all out.

I did develop DraftExpress's stats. Without getting into too long a story about my personal journey as a developer, let's just say the code used to create them is about 8 years old and horribly designed, written in C++ in Visual Studio 6.0. It is procedural and doesn't use regex, so it's really difficult to update, maintain, and add new things to.

I've always wanted to redo the stats, but I no longer work there [too difficult with my day job]. I recently transitioned to being a C# developer, but I don't have a C# background and have learned mostly on the fly through Google searches and whatnot. I've become fairly competent in C#, and while tooling around with it I thought it would help to create a large-scale project that manages back-end code, a database, and a web interface all in one.

Since I'm familiar with basketball stats and see a lot of holes in what is presently available, I thought I'd try my hand at writing something along those lines.

I basically want to create something like Basketball-Reference, but with an emphasis on foreign leagues and more advanced metrics included. My new design should let me add as many foreign leagues as I can find stats for pretty easily. (I estimate that once the initial design phase is done, I can add a league in about 2-3 hours.) Once I have the formulas in place, I can easily put the advanced stats on all leagues.

Anyway, since most of the work is in grabbing the raw data, parsing it, normalizing it, and putting it in a database, I figured I'd try to create as many stats as can be calculated, then create a web interface that lets you highly customize the output (select whatever fields you want, compare players across leagues, whatever).
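One way to read "normalizing" here is mapping each league's differently named columns onto a single canonical schema, so that adding a league means writing a new mapping rather than new parsing code, and the customizable output is then just a projection over the chosen fields. The sketch below is a minimal Python illustration under that assumption; the league keys, source column names (`PLAYER`, `Joueur`, and so on) and `COLUMN_MAPS` are hypothetical, not any real feed's layout.

```python
import csv
import io

# Hypothetical per-league column mappings: source column -> canonical field.
# Adding a league means adding one entry here, not writing a new parser.
COLUMN_MAPS = {
    "nba": {"PLAYER": "player", "PTS": "pts", "MIN": "minutes"},
    "proa": {"Joueur": "player", "Pts": "pts", "Min": "minutes"},
}

# Canonical fields that should be stored as numbers rather than text.
NUMERIC_FIELDS = {"pts", "minutes"}


def normalize(league: str, raw_csv: str) -> list[dict]:
    """Parse one league's raw CSV into records keyed by canonical field names."""
    mapping = COLUMN_MAPS[league]
    records = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        record = {"league": league}
        for source_col, canonical in mapping.items():
            value = row[source_col].strip()
            record[canonical] = float(value) if canonical in NUMERIC_FIELDS else value
        records.append(record)
    return records


def select_fields(records: list[dict], fields: list[str]) -> list[dict]:
    """Keep only the user-chosen fields, like a customizable output view."""
    return [{f: r[f] for f in fields} for r in records]
```

For example, `select_fields(normalize("nba", "PLAYER,PTS,MIN\nSmith,20,30\n") + normalize("proa", "Joueur,Pts,Min\nDupont,15,25\n"), ["league", "player", "pts"])` yields one row per player from both leagues with identical keys, which is what makes cross-league comparison straightforward downstream.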

I might do some advanced play-by-play stuff for NCAA/NBA since I can get play-by-play data for them, but I'm not sure how much interest I have in that. Having worked with pbp data before, I've found it highly unreliable, and working around the massive number of data entry errors is a difficult process. I have always wanted to create a more dynamic version of 82games.com, which I love, but which doesn't give you an easy way to compare all their great stats.
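A common first defense against play-by-play entry errors is to cross-check pbp-derived totals against the box score and flag games or players where they disagree. The Python sketch below shows that idea for points only, assuming the pbp has already been reduced to `(player, event_type)` pairs; the event type names and the `POINTS` table are hypothetical placeholders for whatever codes a real feed uses.

```python
from collections import Counter

# Hypothetical point value of each scoring event type; anything else counts 0.
POINTS = {"FT_MADE": 1, "2PT_MADE": 2, "3PT_MADE": 3}


def pbp_points(events: list[tuple[str, str]]) -> Counter:
    """Sum points per player from (player, event_type) play-by-play rows."""
    totals = Counter()
    for player, event_type in events:
        totals[player] += POINTS.get(event_type, 0)
    return totals


def find_mismatches(
    events: list[tuple[str, str]], box_points: dict[str, int]
) -> dict[str, tuple[int, int]]:
    """Return {player: (pbp_points, box_points)} wherever the two sources disagree."""
    from_pbp = pbp_points(events)
    players = set(from_pbp) | set(box_points)
    return {
        p: (from_pbp.get(p, 0), box_points.get(p, 0))
        for p in players
        if from_pbp.get(p, 0) != box_points.get(p, 0)
    }
```

Flagged players could then be excluded from pbp-based metrics or queued for manual review; the same pattern extends to rebounds, turnovers, or minutes reconstructed from substitutions.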

I don't really have any specific goals at this point. I don't know that I'll create anything that hasn't already been created, nor am I certain I have the time to bring the whole project to a point where I can share the output. I've got a half hour or so a night to work on it, and it can be tough to maintain motivation. However, I figure I might as well aim to create something awesome rather than think small, and see where it goes.

Long term, I'd like to experiment with regression software as well, to compare players across leagues, see what various stats mean and how they translate when guys cross leagues, and possibly correlate various stats with team winning. That's more of a side project I'd only take on in a best-case scenario where the above is completed, though.
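The simplest version of a cross-league translation is a one-variable least-squares fit: take players who appeared in both leagues, pair a stat from league A (say, points per 40 minutes) with the same stat in league B, and fit `B = slope * A + intercept`. The Python sketch below implements ordinary least squares directly; the pairing of players and the choice of stat are assumptions, and a single-variable fit like this ignores age, role, and the selection bias of which players actually make the jump.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for ys ~ slope * xs + intercept.

    xs: a stat in the origin league; ys: the same players' stat in the new league.
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

With made-up pairs `xs = [10, 20, 30]` and `ys = [8, 14, 20]`, the fit gives a slope of 0.6 and an intercept of 2, meaning each point per 40 in the origin league would be expected to carry over as 0.6 points in the new one. Real translation work would want more players, more predictors, and a proper regression package.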

Re: statistical formulas

Posted: Fri Oct 14, 2011 5:16 am
by Crow
"I have always wanted to create a more dynamic version of 82games.com which I love, but doesn't give you an easy way to compare all their great stats."

A lot of people have wanted easier comparative tools for that data. That sounds like a really promising direction: seeing top and bottom 25s, means, medians and standard deviations, player comparisons, etc. Basketball-Reference has a number of good, flexible tools. Similar finders applied to other data would be a welcome addition.
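The summary views mentioned above (top and bottom 25s, mean, median, standard deviation) are cheap to compute once a stat is stored per player. The Python sketch below uses only the standard `statistics` module; the input shape, a dict from player name to stat value, is an assumption, and it reports the sample standard deviation.

```python
import statistics


def summarize(values_by_player: dict[str, float], n: int = 25) -> dict:
    """Top-n, bottom-n, mean, median and sample standard deviation for one stat."""
    # Highest values first; ties keep their original order (sorted is stable).
    ranked = sorted(values_by_player.items(), key=lambda kv: kv[1], reverse=True)
    values = [v for _, v in ranked]
    return {
        "top": ranked[:n],
        "bottom": ranked[-n:][::-1],  # lowest value first
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
    }
```

Running it over any column a user selects, per league and season, would give the kind of leaderboards and distribution context described above; player-vs-player comparison could reuse the mean and standard deviation to express each player's value as a z-score.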

Advanced play by play stuff would be great too, though I can understand it might be in phase 2 or 3.

Good luck and hope it is fun & rewarding for you.

Re: statistical formulas

Posted: Fri Oct 14, 2011 2:00 pm
by Mike G
dougthonus wrote: I basically want to create something like basketball reference, but with an emphasis on foreign leagues and more advanced metrics included. ..
The Advanced Statistics area at b-r.com hasn't really advanced much in several years. A bunch of us here have had suggestions that have gone unheeded. A rival site might be just what is needed to either inspire them to get on the ball, or to just redefine 'advanced'.

So I encourage you to create an alternative user-oriented stats interface. Hopefully you'll use this forum for feedback and reference.

Re: statistical formulas

Posted: Sat Oct 15, 2011 6:42 pm
by dougthonus
I appreciate the encouragement.

It's always easier to stay on the ball if you feel there is value in what you intend to do. I'll certainly be looking for feedback and suggestions, as well as help with formulas and such, as I make progress.

For whatever reason, I decided to start with the French Pro A league as my initial testbed of data, but once I get all the base data parsed I should be able to add new leagues fairly quickly.