the state of APBR

Home for all your discussion of basketball statistical analysis.
ampersand5
Posts: 262
Joined: Sun Nov 23, 2014 6:18 pm

the state of APBR

Post by ampersand5 »

The purpose of this thread is partially educational, partially documentary and partially to clarify.

Over the past two years, there have been several occasions when I thought RAPM was going to be lost forever. The first, I think, was when Jerry hadn't updated his website for a while and I assumed he had started working for an NBA team and would delete his site. More recently, when RPM appeared on ESPN, I thought he might have sold the stat (or his services) to ESPN and would be done with his website. This community is full of websites and posters that suddenly go missing, never to be heard from again. Maybe it's because I've been involved in the academic world, but I view the work being done here not only as an important advancement in our collective understanding of basketball, but also as an important educational tool.

I will focus on xRAPM specifically because of its academic-style history and its value as a stat.

Adjusted Plus Minus was first implemented in WINVAL by Wayne Winston (business school professor) and Jeff Sagarin. This was followed by Dan Rosenbaum (economics professor) publishing the first paper on the metric and advancing it. Then came Joe Sill, a Caltech PhD and former NASA analyst, who authored an award-winning paper on the idea of taking the adjusted plus-minus numbers and running them through a Tikhonov regularization (ridge regression). Jeremias Engelmann took this idea and started using priors, adjusting the stat for a number of factors including the score of the game, aging curves, coaches and others.

So many people have been involved in the creation of xRAPM, and it would be extremely unfortunate if it were to disappear.
I realized that, despite how many people rely on the stat, we really understand very little about it.

If Jerry's site were to be taken down, could someone else recreate it? Does everyone understand what's actually in xRAPM when using it for analysis?
I personally couldn't do either of those things.

To get to the point - I want this board to create an explainer or guide on how someone could reproduce the same xRAPM numbers on their own. I view this as important for several reasons. Most importantly, so we never lose it.
Aside from this, I find that by laying out clear instructions, we can better understand exactly what is in xRAPM, so we can make better criticisms of the stat, leading to improvements. Moreover, I view this as a great opportunity to educate and involve more fans in the analytics movement. Many people want to engage in the world of stats but do not have the skills to manipulate data and create metrics. Giving people the ability to create and work with data on their own will make it easier for newcomers to get involved in the APBR community. Additionally, it will broaden the base of people doing analysis beyond those who are great at math/Excel/programming.

Lastly, I view this as a great opportunity to document what's going on in this community at such an important time.

I understand that most people here are experts at what they do and don't care about this sort of stuff, but I hope some of you will find this worthwhile to participate in.

Please share any thoughts you have on the current state of the APBR community/analytics movement, the documentation of xRAPM or anything else.

Thanks!
permaximum
Posts: 416
Joined: Tue Nov 27, 2012 7:04 pm

Re: the state of APBR

Post by permaximum »

I couldn't agree more, as an amateur who doesn't know much about statistics or math. I just have a vision of what should be possible with the data we have today, and I learn everything myself by testing things on real-world data, reading papers written in different languages, etc.

As for xRAPM, no one save J.E. himself can reproduce the metric. I believe the metric is at the point where even J.E. can't be 100% sure whether "some stuff" is in it or not.
ampersand5
Posts: 262
Joined: Sun Nov 23, 2014 6:18 pm

Re: the state of APBR

Post by ampersand5 »

The lack of responses in this thread is disheartening. The ability of this community to include outsiders, to encourage full and frank disclosure of work so that it can be criticized, and to preserve and share work is important.
Mike G
Posts: 6175
Joined: Fri Apr 15, 2011 12:02 am
Location: Asheville, NC

Re: the state of APBR

Post by Mike G »

We do get and give plenty of critical comments here. It's always best to put forth theories on this forum. Eventually, your theories will get run through the mill here, so you can't avoid it forever.

People come and go all the time. I don't expect any different, and often I forget that so-and-so was once a regular poster here. Meanwhile, one should just keep on keepin' on, or not.

Rather than be disheartened, maybe rephrase what it is you want to see discussed. Put an edge or hook on it, if you must.
ampersand5
Posts: 262
Joined: Sun Nov 23, 2014 6:18 pm

Re: the state of APBR

Post by ampersand5 »

Mike G wrote:We do get and give plenty of critical comments here. It's always best to put forth theories on this forum. Eventually, your theories will get run through the mill here, so you can't avoid it forever.

People come and go all the time. I don't expect any different, and often I forget that so-and-so was once a regular poster here. Meanwhile, one should just keep on keepin' on, or not.

Rather than be disheartened, maybe rephrase what it is you want to see discussed. Put an edge or hook on it, if you must.
Thanks.

Essentially, it boils down to this.

People with the requisite skills and knowledge to program and create their own data/metrics are fine. These people can figure things out on their own, start from scratch, carry on their own projects and occasionally discuss their work on here.

The issue is that there are a huge number of people interested in basketball statistics who don't have that knowledge.
Mike G
Posts: 6175
Joined: Fri Apr 15, 2011 12:02 am
Location: Asheville, NC

Re: the state of APBR

Post by Mike G »

Maybe I'm just jaded, but why is that a particular "issue"? It's true of any area of endeavor: You start with curiosity, tinker with numbers, see what seems to work.

It's a millionth of the work it was when I started tinkering in the '80s. Had to grab the Sporting News at the end of the NBA season and hand calculate from that printed text. Maybe the trouble with starting today is that there are so many ready sources of both raw totals and a bunch of derived metrics. It can be intimidating, but that's how it is at first.

Maybe it's also just too obvious: This is APBR Metrics. Ask some questions, and you will get some answers.
AcrossTheCourt
Posts: 237
Joined: Sat Feb 16, 2013 11:56 am

Re: the state of APBR

Post by AcrossTheCourt »

I think this is bogus. We saw what happened the last time something similar occurred. When JE took down the "pure" RAPM (plus/minus only) stats and when he was inactive for a while (working for a team, I assume), people stepped up and put out RAPM ratings. v-zero did it for one season, shutupandjam has every season up on his site, and talkingpractice has recent seasons up on gotbuckets.

We know JE's general methodology and while we may not be able to replicate it exactly, we could get pretty close. Plus, I'm sure there are others with their own hybrid models like IPV from talkingpractice.

There's a lot of basketball discussion that goes on beyond this place and if we should worry about anything, it's the cold-war era in stats in front offices where (up to) 30 different stats departments compete with each other and share little, if any, public information.
ampersand5
Posts: 262
Joined: Sun Nov 23, 2014 6:18 pm

Re: the state of APBR

Post by ampersand5 »

AcrossTheCourt wrote:I think this is bogus. We saw what happened when something similar happened. When JE took down the "pure" RAPM (plus/minus only) stats and when he was inactive for a while (working for a team, I assume), people stepped up and put out RAPM ratings. v-zero did it for one season, shutupandjam has every season up on his site, and talkingpractice has recent seasons up on gotbuckets.

We know JE's general methodology and while we may not be able to replicate it exactly, we could get pretty close. Plus, I'm sure there are others with their own hybrid models like IPV from talkingpractice.

There's a lot of basketball discussion that goes on beyond this place and if we should worry about anything, it's the cold-war era in stats in front offices where (up to) 30 different stats departments compete with each other and share little, if any, public information.
Thanks for the response. I think your last point is bang on - and a large factor in why I wish more information were public.
Crow
Posts: 10624
Joined: Thu Apr 14, 2011 11:10 pm

Re: the state of APBR

Post by Crow »

The bulk of the original post focused on RPM / XRAPM. The logical responder would be JE. If he doesn't want to respond or can't right now, that's that. I recall he indicated he'd reveal his methods in more detail once, but that was before the ESPN relationship. Maybe someday. For now, there is the history of clues in past threads. Has the OP really dug into that? (Daniel and I both made pretty big efforts to preserve the past after a sabotage hack, but it is unclear to me how much newcomers these days do in terms of reading the archives.)

I have held off on saying anything further about the general state of APBRmetrics so far, in part because I've addressed it a number of times before; and few others have spoken up about it in the past either, so try not to take the low degree of response too personally. The simple answer is that it is what it is, in the context of many now working for teams, others being more guarded even before getting there, and many probably being dissatisfied for one reason or another.

There was talk years ago by a few of a more active analytic association beyond bulletin board level, but no action that I am aware of. Sloan took a lot of that energy and serves some purposes. There are a few other academic conferences, and one (perhaps soon to be two) new analytic journals as of next year with some basketball content. There are other resources, and much of the community primarily operates outside this hub. There is Twitter for new work notification and one-to-one or small group talk, mostly pretty casual but sometimes substantive. This board can still serve a role not filled by the alternatives, but how much depends on individuals. It may not be as much as it once was or could still be, but talking about the state seems to yield little change. The discussion of basketball analytics itself is the goal and the main motivator. All you or anyone else can do is try to make it what you want and see what happens... and then decide what to do next.
Crow
Posts: 10624
Joined: Thu Apr 14, 2011 11:10 pm

Re: the state of APBR

Post by Crow »

In the almost 4 years since the restart of the board after the hack, there have been about 400 people who have made at least one comment. About 100 have made twenty or more. Not bad, could be better or both.

This past month there might have been about 500 comments total or under 20 per day. Probably mostly from ten people. If 50, 100 or more were more active? Maybe there would be more learning.
ampersand5
Posts: 262
Joined: Sun Nov 23, 2014 6:18 pm

Re: the state of APBR

Post by ampersand5 »

Crow - I want to personally thank you for all that you've done. Your post made a lot of salient points.

In light of the recent posts, I still think it would be wise to create a guide/how-to on creating RAPM.

I think it would serve as a great tool for encouraging and teaching new members to get involved. Moreover, it would also put everyone on the same page and allow for further and more comprehensive discussions on it.

Thoughts?
Crow
Posts: 10624
Joined: Thu Apr 14, 2011 11:10 pm

Re: the state of APBR

Post by Crow »

Thank you.

I should have said that your call to document RPM / XRAPM was appropriate and timely and I hope it gets a response similar to what Daniel has done with BPM. Whether or not it gets a detailed summary, we still appreciate what Jerry has done and continues to do with the public metric.
ampersand5
Posts: 262
Joined: Sun Nov 23, 2014 6:18 pm

Re: the state of APBR

Post by ampersand5 »

I'm going to compile all of the publicly available information on what went into RAPM, in addition to contacting all of the individuals who have posted RAPM stats online. Hopefully, we can collaboratively create a how-to guide for RAPM so anyone can make and adjust it.

For anyone reading this who wants to be involved, or has knowledge of what goes into making RAPM, please contact me or post here.

Thanks!
Crow
Posts: 10624
Joined: Thu Apr 14, 2011 11:10 pm

Re: the state of APBR

Post by Crow »

Thank you for the undertaking; I hope you get cooperation.
mystic
Posts: 470
Joined: Mon Apr 18, 2011 10:09 am

Re: the state of APBR

Post by mystic »

Eli Witus once made a pretty good intro for the calculation of APM. The preparation of the data can be the same for RAPM.

http://www.countthebasket.com/blog/2008 ... lus-minus/
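
As a rough illustration of that data preparation, here is a minimal Python sketch of the usual APM layout as I understand it: one row per stint (a stretch with no substitutions), +1 for each home player on the floor, -1 for each away player, 0 for everyone else, with home margin per 100 possessions as the target and possessions as weights. The stint dictionary keys and the `build_design_matrix` name are made up for this example, and the toy lineups have two players per side only to keep it short (real stints have five).

```python
def build_design_matrix(stints):
    """Turn stints into (players, X, y, weights) for an APM-style regression.

    Each stint is a dict with hypothetical keys:
    home, away (lists of player names), home_pts, away_pts, possessions.
    """
    players = sorted({p for s in stints for p in s["home"] + s["away"]})
    col = {p: j for j, p in enumerate(players)}
    X, y, w = [], [], []
    for s in stints:
        if s["possessions"] <= 0:
            continue  # a stint with no possessions carries no information
        row = [0] * len(players)
        for p in s["home"]:
            row[col[p]] = 1    # home player on the floor
        for p in s["away"]:
            row[col[p]] = -1   # away player on the floor
        X.append(row)
        # Home point margin, scaled to 100 possessions
        y.append(100.0 * (s["home_pts"] - s["away_pts"]) / s["possessions"])
        w.append(s["possessions"])
    return players, X, y, w


# Toy data: 2-on-2 stints, purely illustrative
stints = [
    {"home": ["A", "B"], "away": ["C", "D"], "home_pts": 6, "away_pts": 4, "possessions": 5},
    {"home": ["A", "E"], "away": ["C", "D"], "home_pts": 2, "away_pts": 5, "possessions": 4},
    {"home": ["B", "E"], "away": ["C", "D"], "home_pts": 0, "away_pts": 0, "possessions": 0},
]
players, X, y, w = build_design_matrix(stints)
```

The same `X`, `y` and `w` can be fed to either a plain least-squares fit (APM) or a regularized one (RAPM); only the fitting step differs.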

glmnet seems to be the easiest-to-handle package to calculate RAPM values in R and has a cross validation option included to get the optimal lambda.

http://cran.r-project.org/web/packages/ ... index.html
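
For readers who don't use R, the same idea (ridge regression plus cross validation to pick lambda) can be sketched in Python with NumPy. This is not glmnet's API or its algorithm; it is a plain closed-form weighted ridge fit with a simple k-fold search, run on synthetic data. The function names `ridge_fit` and `cv_lambda`, the lambda grid and the simulated stints are all assumptions made for the example.

```python
import numpy as np


def ridge_fit(X, y, w, lam):
    """Weighted ridge: minimise sum_i w_i * (y_i - x_i . b)^2 + lam * ||b||^2."""
    Xw = X * w[:, None]                       # each row scaled by its weight
    A = X.T @ Xw + lam * np.eye(X.shape[1])   # X'WX + lam * I
    return np.linalg.solve(A, Xw.T @ y)       # solve against X'Wy


def cv_lambda(X, y, w, lambdas, k=5, seed=0):
    """Pick the lambda with the lowest weighted out-of-fold squared error."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % k       # balanced random fold labels
    scores = []
    for lam in lambdas:
        err = 0.0
        for f in range(k):
            train, test = folds != f, folds == f
            beta = ridge_fit(X[train], y[train], w[train], lam)
            resid = y[test] - X[test] @ beta
            err += np.sum(w[test] * resid ** 2)
        scores.append(err / w.sum())
    return lambdas[int(np.argmin(scores))], scores


# Synthetic stints: 20 players, 5-on-5, noisy margins per 100 possessions
rng = np.random.default_rng(42)
n_stints, n_players = 400, 20
true_rating = rng.normal(0.0, 3.0, n_players)
X = np.zeros((n_stints, n_players))
for i in range(n_stints):
    on_court = rng.choice(n_players, size=10, replace=False)
    X[i, on_court[:5]] = 1.0     # home five
    X[i, on_court[5:]] = -1.0    # away five
w = rng.integers(2, 13, n_stints).astype(float)   # possessions per stint
y = X @ true_rating + rng.normal(0.0, 12.0, n_stints)

lambdas = [0.1, 1.0, 10.0, 100.0, 1000.0]
best_lam, scores = cv_lambda(X, y, w, lambdas)
beta = ridge_fit(X, y, w, best_lam)   # one regularized rating per player
```

The sketch leaves out an intercept (home-court advantage) for brevity, and the lambda values here should not be assumed comparable to the ones glmnet reports, since packages may scale the penalty differently.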

R is probably the most commonly used software for statistical analysis. Other helpful tools and scripting or programming languages include Python, Ruby, Matlab (Octave) and Fortran. I have used Matlab as well as Fortran in the past to calculate RAPM values. Calculating RAPM without any programming skills will be next to impossible, but anyone interested in stats and numbers shouldn't have a hard time handling scripting languages like R, Python, Ruby or Matlab/Octave. Fortran is more powerful in the end, and is mostly used in scientific environments when building models (like weather/climate models or other dynamical system models; I used it when working with glacier and impact models).

I think the biggest obstacle will be getting a clean matchup file from which to generate those numbers. Right now I adjust each pbp file by manually adding the lineups at the start of each quarter (meaning I have to watch the game video, because I have not found a reliable source from which to extract that information automatically). That helps a lot with the parsing process and limits sources of error. If that could be a group effort and the result made publicly available, it would be a really good start.
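
To show why the quarter-start lineups help so much, here is a small Python sketch of the parsing step for one team: once the starters of each period are known, the substitution events are enough to track who is on the floor for every scoring event. The event format (tuples of period, kind, data) and the `points_by_lineup` name are invented for this illustration and do not correspond to any real pbp feed; the toy lineups have two players instead of five.

```python
from collections import defaultdict


def points_by_lineup(period_starters, events):
    """Sum one team's points per lineup, given starters for each period.

    period_starters: {period: set of players on the floor at its start}
    events: chronological (period, kind, data) tuples, where kind is
            "sub" with data (player_out, player_in) or "score" with data points.
    """
    totals = defaultdict(int)
    period, on_court = None, set()
    for ev_period, kind, data in events:
        if ev_period != period:
            # New period: reset to the known lineup instead of guessing
            period = ev_period
            on_court = set(period_starters[period])
        if kind == "sub":
            player_out, player_in = data
            if player_out not in on_court:
                # A cheap consistency check that flags errors in the file
                raise ValueError(f"{player_out} is not on the floor in period {period}")
            on_court.remove(player_out)
            on_court.add(player_in)
        elif kind == "score":
            totals[frozenset(on_court)] += data
    return dict(totals)


# Toy example with 2-player "lineups"
starters = {1: {"A", "B"}, 2: {"B", "C"}}
events = [
    (1, "score", 2),
    (1, "sub", ("A", "C")),
    (1, "score", 3),
    (2, "score", 2),
    (2, "sub", ("B", "A")),
    (2, "score", 1),
]
lineup_points = points_by_lineup(starters, events)
```

Without the per-period starters, the lineup at the start of a quarter has to be inferred from later events, which is where many parsing errors come from.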

The presentation of results might also be an important part (not for me, I'm happy with a list of numbers), in order to help with the interpretation. I can't give any good advice here, there are for sure people around with better skills in that part (thinking about Kirk Goldsberry or EvanZ here; Daniel also has some nice graphical presentations).

If there are questions regarding the underlying math, I am always willing to share some information (or give some useful links, though I'm likely not skilled enough to serve as a good "teacher"), as long as I see real interest in understanding the math and some motivation to learn about matrix algebra (because, to my taste, matrix notation makes the process much, much easier to comprehend). I do tend to give straight, honest answers rather than polite ones, which in my experience a lot of people find difficult.

Hope that is at least a helpful start ...