At a basic level, developing an SPM model just involves regressing some player stats against a RAPM sample, but what is the best way to do that? Is it:
1. Regress a multi-year player sample against a multi-year RAPM sample (e.g. Kevin Durant's 2010-2015 box score stats against Durant's 2010-2015 RAPM)
2. Regress individual player seasons against a multi-year RAPM sample (e.g. each of Kevin Durant's 2010, 2011, 2012, 2013, 2014, and 2015 box score stats against Durant's 2010-2015 RAPM)
3. Regress individual player seasons against only that year's RAPM (e.g. Kevin Durant's 2010 box score stats against his 2010 RAPM)
4. Something else
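To make the difference between options 1-3 concrete, here is a minimal sketch of how the training rows would be assembled in each case. Everything in it is an assumption for illustration: the players, the numbers, the single composite `box_stat` feature, the minutes weighting, and the helper names `build_training_rows` and `fit_line` are all hypothetical, and a real SPM would use many box score features.

```python
# Toy records: (player, season, minutes, box_stat, single_year_rapm).
# All values are made up for illustration only.
SEASONS = [
    ("A", 2010, 3000, 6.0, 4.0),
    ("A", 2011, 2800, 7.0, 6.5),
    ("B", 2010, 1500, 1.0, -1.0),
    ("B", 2011, 2000, 2.0, 1.5),
]
# Hypothetical multi-year RAPM per player over the same span.
MULTI_YEAR_RAPM = {"A": 5.0, "B": 0.5}


def build_training_rows(option):
    """Return (x, y, weight) rows for option 1, 2, or 3."""
    if option == 1:
        # One row per player: minutes-weighted multi-year box stat
        # against multi-year RAPM.
        rows = []
        for player, target in MULTI_YEAR_RAPM.items():
            recs = [r for r in SEASONS if r[0] == player]
            minutes = sum(r[2] for r in recs)
            pooled = sum(r[2] * r[3] for r in recs) / minutes
            rows.append((pooled, target, minutes))
        return rows
    if option == 2:
        # One row per player-season, each against the multi-year RAPM.
        return [(box, MULTI_YEAR_RAPM[p], m) for p, _, m, box, _ in SEASONS]
    if option == 3:
        # One row per player-season, against that season's own RAPM.
        return [(box, rapm, m) for _, _, m, box, rapm in SEASONS]
    raise ValueError("option must be 1, 2, or 3")


def fit_line(rows):
    """Minutes-weighted least squares for a single feature."""
    w_sum = sum(w for _, _, w in rows)
    x_bar = sum(w * x for x, _, w in rows) / w_sum
    y_bar = sum(w * y for _, y, w in rows) / w_sum
    sxy = sum(w * (x - x_bar) * (y - y_bar) for x, y, w in rows)
    sxx = sum(w * (x - x_bar) ** 2 for x, _, w in rows)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


for option in (1, 2, 3):
    rows = build_training_rows(option)
    print(option, len(rows), fit_line(rows))
```

The only thing that changes between the options is what counts as a row and which RAPM value it is paired with; the regression step itself is the same.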
Best practices for SPM development?
Re: Best practices for SPM development?
If you are trying to understand a single season, do #3. #1 might be good too with some year-weighted scheme. If you are trying to project, probably use some version of #1.
My other opinion / advice would be to make sure defense gets equal weight. Probably display it separately. And do something on shot defense rather than leaving it out.
Re: Best practices for SPM development?
Replying to RyanRiot (Mon Mar 25, 2019 9:20 pm), on options 1-4 above:
For BPM I sort of did a hybrid of 1 and 2: I used the individual season statistics, but then aggregated them iteratively within the regression onto the multi-year RAPM. This was computationally challenging, so I used method 1 for feature selection and then the hybrid approach for fine-tuning the coefficients.
Good multi-year RAPM is the key. Single-year RAPM is quite noisy, so it ends up with a lot of statistical shrinkage to reduce that noise, which leaves not a lot of signal to work with.
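As a rough illustration of why a smaller sample gets shrunk harder, consider a deliberately simplified one-player case: estimating a single value from n noisy observations with a ridge penalty. Minimizing the squared error plus penalty * b^2 gives b = sum(y) / (n + penalty), so the plain average is multiplied by n / (n + penalty). Real RAPM is a joint ridge regression over all players on lineup data, so this is only a sketch of the mechanism, and the penalty and sample sizes below are hypothetical.

```python
def ridge_mean(observations, penalty):
    """Ridge estimate of one constant from noisy observations.

    Minimizes sum((y - b)**2) + penalty * b**2, whose solution is
    b = sum(y) / (n + penalty).
    """
    return sum(observations) / (len(observations) + penalty)


def shrink_factor(n_obs, penalty):
    """Fraction of the plain average that survives the ridge penalty."""
    return n_obs / (n_obs + penalty)


PENALTY = 3000  # hypothetical ridge penalty, in the same units as n_obs

# Hypothetical sample sizes: one season versus five seasons of data.
print(shrink_factor(1000, PENALTY))  # single year: most of the estimate is shrunk away
print(shrink_factor(5000, PENALTY))  # multi-year: more of the estimate survives
```

With the same penalty, the larger multi-year sample keeps a larger share of its raw estimate, which is the sense in which it has more signal to regress against.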
Re: Best practices for SPM development?
I think the right answer is whatever ends up being best out-of-sample.
I do #1. However, to get the right SPM blend, I use cross-validation, holding out each of the seasons within the data set in turn.
So you'll get the following for a 2000-2018 data set, where (- YYYY) means that season is excluded:
Multi-Year RAPM 2000-2018 (- 2000) + Box Score 2000-2018 (- 2000)
Multi-Year RAPM 2000-2018 (- 2001) + Box Score 2000-2018 (- 2001)
Multi-Year RAPM 2000-2018 (- 2002) + Box Score 2000-2018 (- 2002)
...
Multi-Year RAPM 2000-2018 (- 2018) + Box Score 2000-2018 (- 2018)
It's computationally expensive to generate box score data and multi-year RAPM with each year excluded, but it allows you to test out-of-sample (OOS) against individual seasons.
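The leave-one-season-out loop described above might look roughly like the sketch below. It rests on several assumptions that are not from the post: the toy data is made up, a single composite `box_stat` stands in for the full box score, the out-of-sample target is taken to be the held-out season's own RAPM, and the "multi-year RAPM excluding a season" is stubbed as a minutes-weighted average of single-season values. In practice that last step means re-running the RAPM regression without the held-out season, which is where the computational cost comes from.

```python
# Toy records: (player, season, minutes, box_stat, single_year_rapm).
# All values are made up for illustration only.
SEASONS = [
    ("A", 2016, 2500, 5.0, 4.0),
    ("A", 2017, 2600, 6.0, 5.5),
    ("A", 2018, 2400, 5.5, 4.0),
    ("B", 2016, 1800, 1.0, 0.5),
    ("B", 2017, 2000, 1.5, -0.5),
    ("B", 2018, 2100, 2.0, 1.0),
    ("C", 2016, 1200, -2.0, -1.5),
    ("C", 2017, 1000, -1.0, -2.5),
    ("C", 2018, 1500, -1.5, -1.0),
]


def pooled_without(season_out):
    """Per-player (box_stat, target, minutes) with one season excluded.

    The target is a stand-in for multi-year RAPM: a minutes-weighted
    average of single-season values, NOT a re-run RAPM regression.
    """
    totals = {}
    for player, season, minutes, box, rapm in SEASONS:
        if season == season_out:
            continue
        t = totals.setdefault(player, [0.0, 0.0, 0.0])
        t[0] += minutes
        t[1] += minutes * box
        t[2] += minutes * rapm
    return [(bx / m, rp / m, m) for m, bx, rp in totals.values()]


def fit_line(rows):
    """Minutes-weighted least squares for a single feature."""
    w_sum = sum(w for _, _, w in rows)
    x_bar = sum(w * x for x, _, w in rows) / w_sum
    y_bar = sum(w * y for _, y, w in rows) / w_sum
    sxy = sum(w * (x - x_bar) * (y - y_bar) for x, y, w in rows)
    sxx = sum(w * (x - x_bar) ** 2 for x, _, w in rows)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def leave_one_season_out():
    """Fit on all other seasons, then score RMSE on the held-out one."""
    errors = {}
    for season_out in sorted({r[1] for r in SEASONS}):
        slope, intercept = fit_line(pooled_without(season_out))
        held = [r for r in SEASONS if r[1] == season_out]
        sq = [(intercept + slope * box - rapm) ** 2
              for _, _, _, box, rapm in held]
        errors[season_out] = (sum(sq) / len(sq)) ** 0.5
    return errors


print(leave_one_season_out())
```

Comparing the per-season errors across candidate models is one way to pick the blend that holds up best out-of-sample.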