I recently became aware of several additional guides to working with NBA play-by-play data in R.
Ramiro Bentes put together this guide: https://nbainrstats.netlify.app/post/ad ... play-data/
It's a whole sequence of posts with a lot of good R code.
Building off this, Ahmed Cheema built a couple of versions of RAPM using R:
https://www.thespax.com/nba/calculating ... asketball/
https://www.thespax.com/nba/quantifying ... ince-1997/
I note that the RAPM values found there appear to be more compressed than I would expect.
Also, Jerry Engelmann recently posted some new career RAPM data on Twitter. Career RAPM data is problematic because a single career-long rating cannot account for aging.
https://twitter.com/JerryEngelmann/stat ... 3153179776
And subsequently:
https://twitter.com/JerryEngelmann/stat ... 9741604261
The first one doesn't use aging curves at all, which distorts results for players at the beginning or end of their careers: the regression still treats them as superstars because they were at their peak.
The second one should be better, but it will also have issues with players who don't age along a standard curve.
NBA play by play data and RAPM in R, plus new career RAPM
Re: NBA play by play data and RAPM in R, plus new career RAPM
Death, taxes, and people getting weird (wrong?) results when running RAPM in R
My first guess was that it's just missing a rubber-band effect adjustment. But they're so compressed, it doesn't seem like that would fix the whole issue.
Re: NBA play by play data and RAPM in R, plus new career RAPM
Could it just be that the wrong lambda was selected? He indicated that he was using a prior similar to the one I created that you used.
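To illustrate why lambda is a plausible suspect, here is a minimal sketch of plain ridge regression on made-up stint data, fit with two penalty values. Everything in it (the toy design matrix, the margins, the lambda values 1 and 50, and the helper names `solve` and `ridge`) is hypothetical and is not taken from the spax articles. It only shows the general effect: a larger lambda pulls every coefficient toward zero, so the spread between the best and worst player shrinks.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * v for x, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ridge(X, y, lam):
    """Ridge coefficients: solve (X'X + lam * I) beta = X'y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


# Toy stints: +1 = player on the home side, -1 = away side, 0 = off the floor.
X = [[1, -1, 0], [1, 0, -1], [0, 1, -1], [-1, 1, 0], [1, -1, 0], [0, -1, 1]]
# Made-up home point margin per 100 possessions for each stint.
y = [12.0, 8.0, -3.0, -10.0, 9.0, 4.0]

small = ridge(X, y, lam=1.0)   # light penalty
large = ridge(X, y, lam=50.0)  # heavy penalty

spread_small = max(small) - min(small)
spread_large = max(large) - min(large)
print("lambda=1  spread:", spread_small)
print("lambda=50 spread:", spread_large)
```

If the published values look compressed, comparing the spread of ratings across a range of lambdas like this is a quick sanity check, though it says nothing about which lambda was actually used there.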
Re: NBA play by play data and RAPM in R, plus new career RAPM
It looks like the lineups tutorial is in R but that the actual lineups were pulled using Python and the RAPM calculation was done in Python. So can't blame R this time!
From the spax article (emphasis mine):
I began this project by scraping play-by-play data for every regular season and postseason game since 1997. Then I used the ideas in this tutorial (applying it to Python) to get lineup data for each possession in the play-by-play. I was successfully able to do this for every dataset except for the 1997 regular season, which contained a lot of missing information. The data used in final RAPM calculations is almost entirely complete from the 1997 postseason to Game 6 of the 2021 Finals.
In order to address the greater importance of the postseason, I doubled playoff possessions to increase their weight in calculations. At the end of the data collection process, I had compiled 859,049 stints across 5,972,736 possessions.
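The article doesn't say how the doubling was implemented, so the following is only a sketch under an assumption: that "doubling" means giving playoff stints a weight of 2 in the regression. The function name `weighted_normal_equations`, the `is_playoff` flags, and the toy numbers are all hypothetical. It shows that a weight of 2 produces the same normal equations as literally duplicating the playoff rows, so either approach feeds the same inputs to the regression.

```python
def weighted_normal_equations(X, y, w):
    """Return (X'WX, X'Wy) for per-stint weights w."""
    p = len(X[0])
    xtwx = [[sum(wi * row[i] * row[j] for row, wi in zip(X, w))
             for j in range(p)] for i in range(p)]
    xtwy = [sum(wi * row[i] * yi for row, yi, wi in zip(X, y, w))
            for i in range(p)]
    return xtwx, xtwy


# Toy stints for two players: +1 = on the home side, -1 = on the away side.
X = [[1, -1], [-1, 1], [1, -1]]
y = [5.0, -2.0, 10.0]               # made-up margins
is_playoff = [False, False, True]   # hypothetical flag per stint

# Option A: weight playoff stints by 2.
w = [2.0 if flag else 1.0 for flag in is_playoff]
weighted = weighted_normal_equations(X, y, w)

# Option B: append a second copy of each playoff stint, all weights 1.
X_dup = X + [row for row, flag in zip(X, is_playoff) if flag]
y_dup = y + [yi for yi, flag in zip(y, is_playoff) if flag]
duplicated = weighted_normal_equations(X_dup, y_dup, [1.0] * len(y_dup))

print(weighted == duplicated)
```

One consequence worth keeping in mind: with a fixed lambda, upweighting rows makes the penalty relatively weaker for players with a lot of playoff possessions, so their ratings get shrunk less than those of players who rarely made the postseason.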