Best practices for SPM development?

RyanRiot · Post by **RyanRiot** » Mon Mar 25, 2019 9:20 pm

At a basic level, developing an SPM model just involves regressing some player stats against a RAPM sample, but what is the best way to do that? Is it:

1. Regress a multi-year player sample against a multi-year RAPM sample (e.g. Kevin Durant's 2010-2015 box score stats against Durant's 2010-2015 RAPM)

2. Regress individual player seasons against a multi-year RAPM sample (e.g. Kevin Durant's 2010, 2011, 2012, 2013, 2014, and 2015 box score stats against Durant's 2010-2015 RAPM)

3. Regress individual player seasons against only that year's RAPM (e.g. Kevin Durant's 2010 box score stats against his 2010 RAPM)

4. Something else

Crow · Post by **Crow** » Mon Mar 25, 2019 10:15 pm

If you are trying to understand a season, do #3. #1 might be good too in some year-weighted fashion. If you are trying to project, probably some version of #1.

My other advice would be to make sure defense gets equal weight, and probably to display it separately. Also include something on shot defense rather than leaving it out.

DSMok1 · Post by **DSMok1** » Tue Mar 26, 2019 1:40 pm

RyanRiot wrote: ↑Mon Mar 25, 2019 9:20 pm At a basic level, developing an SPM model just involves regressing some player stats against a RAPM sample, but what is the best way to do that? Is it:

1. Regress a multi-year player sample against a multi-year RAPM sample (e.g. Kevin Durant's 2010-2015 box score stats against Durant's 2010-2015 RAPM)

2. Regress individual player seasons against a multi-year RAPM sample (e.g. Kevin Durant's 2010, 2011, 2012, 2013, 2014, and 2015 box score stats against Durant's 2010-2015 RAPM)

3. Regress individual player seasons against only that year's RAPM (e.g. Kevin Durant's 2010 box score stats against his 2010 RAPM)

4. Something else

For BPM I sort of did a hybrid of 1 and 2--I used the individual season statistics, but then aggregated them iteratively within the regression onto the multi-year RAPM. This was computationally challenging, so I used method 1 for feature selection and then the hybrid approach for fine-tuning the coefficients.

Good multi-year RAPM is the key. Single-year RAPM is quite noisy, so it needs a lot of statistical shrinkage to reduce that noise, which leaves little signal to work with.
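
To make the aggregation idea concrete, here is a minimal Python sketch on synthetic data. It is not BPM's actual procedure: the stat columns, coefficients, noise levels, and the ridge penalty `lam` are all made-up assumptions, and the model is purely linear. For a linear model, the minutes-weighted average of per-season predictions equals the prediction from minutes-weighted average stats, so options 1 and 2 only diverge once nonlinear terms are involved.

```python
import numpy as np

rng = np.random.default_rng(0)
n_players, n_stats = 200, 3
true_coefs = np.array([2.0, -1.0, 0.5])  # made-up values for the synthetic data

# One row per player-season: 1 to 6 seasons per player (hypothetical sample).
seasons_per_player = rng.integers(1, 7, size=n_players)
player_ids = np.repeat(np.arange(n_players), seasons_per_player)
skill = rng.normal(size=(n_players, n_stats))
season_stats = skill[player_ids] + 0.5 * rng.normal(size=(len(player_ids), n_stats))
minutes = rng.uniform(200.0, 3000.0, size=len(player_ids))


def aggregate_by_player(player_ids, minutes, values):
    """Minutes-weighted average of per-season rows for each player."""
    players = np.unique(player_ids)
    total_min = np.array([minutes[player_ids == p].sum() for p in players])
    agg = np.array([
        np.average(values[player_ids == p], axis=0, weights=minutes[player_ids == p])
        for p in players
    ])
    return agg, total_min


def fit_weighted_ridge(X, y, weights, lam=1.0):
    """Weighted ridge regression; the intercept is left unpenalized."""
    Xd = np.column_stack([np.ones(len(X)), X])
    penalty = lam * np.eye(Xd.shape[1])
    penalty[0, 0] = 0.0
    lhs = Xd.T @ (weights[:, None] * Xd) + penalty
    rhs = Xd.T @ (weights * y)
    return np.linalg.solve(lhs, rhs)


# Stand-in for a multi-year RAPM target (synthetic, one value per player).
multi_year_stats, total_minutes = aggregate_by_player(player_ids, minutes, season_stats)
multi_year_rapm = multi_year_stats @ true_coefs + rng.normal(scale=0.5, size=n_players)

# Weight players by total minutes, rescaled so the weights average to 1.
weights = total_minutes / total_minutes.mean()
beta = fit_weighted_ridge(multi_year_stats, multi_year_rapm, weights)

# Predict each season separately, then aggregate the predictions per player.
season_pred = beta[0] + season_stats @ beta[1:]
pred_from_seasons, _ = aggregate_by_player(player_ids, minutes, season_pred)
pred_from_aggregate = beta[0] + multi_year_stats @ beta[1:]
```

In this linear setup `pred_from_seasons` and `pred_from_aggregate` agree to floating-point precision. With nonlinear terms (interactions, position adjustments, and so on) they would not, which is one plausible reason a hybrid fit needs an iterative procedure.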

sbs · Post by **sbs** » Wed Mar 27, 2019 9:48 pm

I think the right answer is whatever ends up being best out-of-sample.

I do #1. However, to get the right SPM blend I use cross-validation, holding out each season in the data set in turn.

So you'll get the following for a 2000-2018 data set:
Multi-Year RAPM 2000-2018 (- 2000) + Box Score 2000-2018 (- 2000)
Multi-Year RAPM 2000-2018 (- 2001) + Box Score 2000-2018 (- 2001)
Multi-Year RAPM 2000-2018 (- 2002) + Box Score 2000-2018 (- 2002)
...
Multi-Year RAPM 2000-2018 (- 2018) + Box Score 2000-2018 (- 2018)

It's computationally expensive to generate box score data and multi-year RAPM with each year excluded, but it allows you to test OOS against individual seasons.
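
The hold-one-season-out loop can be sketched in Python as below. This is a simplified illustration, not the exact pipeline described above: it uses synthetic player-season rows, a plain ridge regression, and a hypothetical list of candidate penalties standing in for "the blend". It also skips the expensive step of refitting multi-year RAPM without the held-out season; in a real run the training target would be rebuilt for every fold.

```python
import numpy as np


def fit_ridge(X, y, lam):
    """Ridge regression with an unpenalized intercept."""
    Xd = np.column_stack([np.ones(len(X)), X])
    penalty = lam * np.eye(Xd.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(Xd.T @ Xd + penalty, Xd.T @ y)


def predict(beta, X):
    return beta[0] + X @ beta[1:]


def leave_one_season_out_rmse(X, y, seasons, lam):
    """Average out-of-sample RMSE, holding out one season at a time."""
    errors = []
    for held_out in np.unique(seasons):
        train = seasons != held_out
        test = ~train
        beta = fit_ridge(X[train], y[train], lam)
        resid = y[test] - predict(beta, X[test])
        errors.append(np.sqrt(np.mean(resid ** 2)))
    return float(np.mean(errors))


# Synthetic stand-in data: 19 seasons (2000-2018), 40 player rows per season.
rng = np.random.default_rng(1)
seasons = np.repeat(np.arange(2000, 2019), 40)
X = rng.normal(size=(len(seasons), 4))
y = X @ np.array([1.5, -0.5, 0.8, 0.0]) + rng.normal(scale=1.0, size=len(seasons))

# Hypothetical candidate penalties; pick the one with the lowest OOS error.
candidate_lams = [0.01, 1.0, 10.0, 100.0, 10000.0]
cv_scores = {lam: leave_one_season_out_rmse(X, y, seasons, lam) for lam in candidate_lams}
best_lam = min(cv_scores, key=cv_scores.get)
```

Splitting by season rather than by random rows keeps all of a season's players in the same fold, so the score reflects how the model would do on a season it has never seen.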
