Guides to Creating RAPM

dwm8

Re: Guides to Creating RAPM

Post by dwm8 »

DSMok1 wrote:These results look like no regression to the prior/mean is occurring (i.e. this is pure APM). I'm almost certain that's what's happening. I don't know enough about the code to troubleshoot it though. Anyone else?
I'm getting lambda.1se = 5.146436 if that means anything to anyone.
mystic
Post by mystic »

DSMok1 wrote: These results look like no regression to the prior/mean is occurring (i.e. this is pure APM). I'm almost certain that's what's happening. I don't know enough about the code to troubleshoot it though. Anyone else?
Well, there is some regression to the mean, because the used lambda is bigger than 0 (in fact I get about 3 with the used code and raw data). The reason is alpha, the elasticnet parameter, is set to 1 by default. Add alpha=0 in cv.glmnet and lambda.1se will be nearly 3000 and the results from the regression will look much different.

Also, dwm8, for the cv you can use the option "parallel=TRUE", when you use the package doParallel. That should make use of multiple cores and will decrease the computing time. It is also not necessary to set nfolds=100, 10 should be sufficient to have a stable lambda.
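
A minimal sketch of the call described above, run on simulated data rather than the actual lineup matrix. The matrix x, the response y, the weights w and their dimensions are all made up for illustration; only the argument names (alpha, nfolds, weights, parallel) and the lambda.min / lambda.1se fields come from glmnet itself.

```r
library(glmnet)

set.seed(1)
n <- 400   # number of stints (hypothetical)
p <- 20    # number of players (hypothetical)

# Stand-in for the lineup matrix: +1 home player, -1 away player, 0 off court
x <- matrix(sample(c(-1, 0, 1), n * p, replace = TRUE, prob = c(0.25, 0.5, 0.25)), n, p)
w <- sample(2:20, n, replace = TRUE)                    # possessions per stint
y <- as.vector(x %*% rnorm(p)) + rnorm(n, sd = 10)      # noisy rating differential

# alpha = 0 requests a pure ridge penalty (the default alpha = 1 is the lasso)
cv <- cv.glmnet(x, y, weights = w, alpha = 0, nfolds = 10)

# To use several cores, register a backend first and pass parallel = TRUE:
#   library(doParallel); registerDoParallel(cores = 2)
#   cv <- cv.glmnet(x, y, weights = w, alpha = 0, nfolds = 10, parallel = TRUE)

# Refit at the chosen lambda and drop the intercept to get per-player values
fit  <- glmnet(x, y, weights = w, alpha = 0, lambda = cv$lambda.1se)
rapm <- as.matrix(coef(fit))[-1, 1]
```

Comparing cv$lambda.min and cv$lambda.1se on your own data shows how much extra shrinkage the 1se rule adds.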
dwm8
Post by dwm8 »

mystic wrote:
DSMok1 wrote: These results look like no regression to the prior/mean is occurring (i.e. this is pure APM). I'm almost certain that's what's happening. I don't know enough about the code to troubleshoot it though. Anyone else?
Well, there is some regression to the mean, because the used lambda is bigger than 0 (in fact I get about 3 with the used code and raw data). The reason is alpha, the elasticnet parameter, is set to 1 by default. Add alpha=0 in cv.glmnet and lambda.1se will be nearly 3000 and the results from the regression will look much different.

Also, dwm8, for the cv you can use the option "parallel=TRUE", when you use the package doParallel. That should make use of multiple cores and will decrease the computing time. It is also not necessary to set nfolds=100, 10 should be sufficient to have a stable lambda.
I figured 10 folds would be enough, but I bumped it up to 100 to see if the results would get better (which they didn't). The computing time wasn't an issue for me, only took a couple of minutes.

I know I included the matrices and vectors in the Google Sheets spreadsheet, but the exact .csv's used can be found at the link below, including the entire 39192 x 479 lineup matrix.

https://www.dropbox.com/sh/5x1q8wbdlseu ... eTo-a?dl=0

On another note, I tried out lambda = 500 just to see what would happen, and I still ended up with bad results, just with everything regressed closer to 0 (N'Diaye and Benson still leading the charge at 12.4 and 8.5, respectively). Any help/thoughts would be greatly appreciated!
AcrossTheCourt
Post by AcrossTheCourt »

mystic wrote:
dwm8 wrote:
sndesai1 wrote:based on reading the glmnet documentation, i would think it involves creating a vector of your priors and using it as the "offset" parameter
Thanks for your response. I was looking through that as well, but noticed that it says the offset vector is "A vector of length nobs," which doesn't make sense to me. Wouldn't a vector for priors be a vector of length nvars, i.e. a vector with each value corresponding to a player, not a lineup?

I also just saw your PM and will reply in more detail there, but I want to reply here to that specific question. You basically calculate the expected value of a specific matchup (5 vs. 5) and then subtract it from the real value in order to create a prior-informed RAPM. To do that in GLMNET, you can create the vector of prior values for each player and then combine it with the design matrix via matrix multiplication (design matrix %*% prior). This creates the necessary vector of length nobs, which is then used as the offset to calculate the prior-informed RAPM.

Assuming you want RAPM for year y with a prior based only on year x, you can instead add the data from year x to the set and run the regression on the whole sample. The results should be nearly identical, with the differences explained by rounding.

I want to add that when using GLMNET for RAPM, I do the cross-validation with nfolds=10; at that point the resulting lambda is usually stable. Also, lambda.1se should be used for running the ridge regression in order to get better predictive values. With a big enough sample, using lambda.min would give values not much different from using no lambda at all (which means APM values instead).
I just want to reiterate what Mystic said here. It's basically how I calculate RAPM.

Alpha > 0 adds a lasso (L1) component, so the model can drop some variables, and the whole point of RAPM is to calculate estimates for every player. Alpha = 0 means it's a ridge regression.

Also, you can play around with the penalty factor. For instance, I tested a looser penalty for rookies in prior-informed RAPM.
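
The offset construction quoted above (design matrix %*% prior) can be sketched in closed form with base R. Everything below is an illustration under assumptions: the data are simulated, the prior values are invented, and ridge_with_prior is a hypothetical helper name, not anything from glmnet or from anyone's actual script in this thread.

```r
# Prior-informed ridge: shrink each player toward a prior instead of toward 0.
set.seed(2)
n <- 300   # stints (hypothetical)
p <- 6     # players (hypothetical)
X <- matrix(sample(c(-1, 0, 1), n * p, replace = TRUE), n, p)  # stand-in design matrix
w <- sample(2:20, n, replace = TRUE)                           # possessions per stint
prior <- c(4, 2, 0, 0, -1, -3)                                 # invented prior ratings
y <- as.vector(X %*% (prior + rnorm(p))) + rnorm(n, sd = 12)

ridge_with_prior <- function(X, y, w, lambda, prior) {
  offset <- as.vector(X %*% prior)   # expected value of each matchup, length nobs
  XtW <- t(X * w)                    # t(X) %*% diag(w) without building diag(w)
  # Ridge on the residual (actual minus expected) gives deviations from the prior
  delta <- solve(XtW %*% X + lambda * diag(ncol(X)), XtW %*% (y - offset))
  prior + as.vector(delta)           # add the prior back to the shrunken deviations
}

rapm_prior <- ridge_with_prior(X, y, w, lambda = 3000, prior = prior)
rapm_zero  <- ridge_with_prior(X, y, w, lambda = 3000, prior = rep(0, p))
```

With a zero prior this reduces to ordinary ridge, and as lambda grows the estimates approach the prior rather than 0. In glmnet the same offset vector would go into the offset argument, with the prior added back to the fitted coefficients afterwards; note that glmnet's lambda is on a different scale from the lambda in this closed form, so the two values are not directly comparable.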
dwm8
Post by dwm8 »

AcrossTheCourt wrote: I just want to reiterate what Mystic said here. It's basically how I calculate RAPM.

Alpha > 0 adds a lasso (L1) component, so the model can drop some variables, and the whole point of RAPM is to calculate estimates for every player. Alpha = 0 means it's a ridge regression.

Also, you can play around with the penalty factor. For instance, I tested a looser penalty for rookies in prior-informed RAPM.
Thanks for the note. As I mentioned, I've manually adjusted the penalty factor, though I'm still finding that the resulting coefficients seem flawed. Do you have any idea why my results are so extreme? I believe I've followed most people's general guidelines as closely as I can, but I haven't come across anyone with results as bad or noisy as mine.
xkonk
Post by xkonk »

It's been a while since I've looked at RAPM-type data... can you spell out what the non-x variables are? In particular, how did you make y and the weights? I assumed y is point differential per 100 possessions and weight is the number of possessions for a particular match-up, but then I don't think the numbers make sense. I should be able to use y and the weight to get to a whole number score difference for each stint, right? That doesn't look to be true for what I downloaded from dropbox.
mystic
Post by mystic »

dwm8 wrote: Thanks for the note. As I mentioned, I've manually adjusted the penalty factor, though I'm still finding that the resulting coefficients seem flawed. Do you have any idea why my results are so extreme? I believe I've followed most people's general guidelines as closely as I can, but I haven't come across anyone with as bad/noisy of results as I've gotten.
That is an issue with GLMNET, but I have not yet been able to figure out the main reason. When you run the calculation as I described in this thread (you need to add the weights to that equation, obviously, but I think I already described that in the PM, as well as how to incorporate a prior), this is the result using the bbv dataset for the 2012 RS (lambda = 3000):

Name                        RAPM
Gibson, Taj                 4.88
Nowitzki, Dirk              4.24
Bonner, Matt                4.14
Parker, Tony                3.92
Gallinari, Danilo           3.84
Griffin, Blake              3.79
Harden, James               3.69
James, LeBron               3.65
Udrih, Beno                 3.64
Granger, Danny              3.47
Carter, Vince               3.44
Gasol, Marc                 3.44
Paul, Chris                 3.23
Ginobili, Manu              3.18
Curry, Stephen              3.12
Lucas, John                 3.10
Garnett, Kevin              3.02
Anderson, Ryan              3.01
Bosh, Chris                 2.94
Duncan, Tim                 2.92
Deng, Luol                  2.82
Conley, Mike                2.79
Westbrook, Russell          2.72
Young, Thaddeus             2.70
Aldridge, LaMarcus          2.69
Udoh, Ekpe                  2.68
Allen, Tony                 2.61
Wade, Dwyane                2.56
Frye, Channing              2.48
Rose, Derrick               2.47
Butler, Caron               2.46
Gasol, Pau                  2.45
Smith, Josh                 2.42
Collison, Nick              2.38
Sanders, Larry              2.36
Millsap, Paul               2.34
Rubio, Ricky                2.33
Korver, Kyle                2.33
Miller, Mike                2.25
Fields, Landry              2.21
Randolph, Zach              2.21
Lin, Jeremy                 2.14
Iguodala, Andre             2.14
Howard, Dwight              2.14
Felton, Raymond             2.12
Brand, Elton                2.08
Chalmers, Mario             2.06
Durant, Kevin               1.99
Nash, Steve                 1.93
Miller, Andre               1.93
Novak, Steve                1.90
Pondexter, Quincy           1.88
Love, Kevin                 1.85
Green, Danny                1.84
George, Paul                1.84
Rondo, Rajon                1.81
Anderson, James             1.80
Magloire, Jamaal            1.70
Bradley, Avery              1.70
Harrellson, Josh            1.67
Nelson, Jameer              1.67
West, David                 1.67
Matthews, Wes               1.63
Sefolosha, Thabo            1.62
Smith, J.R.                 1.61
Splitter, Tiago             1.58
Kidd, Jason                 1.56
Butler, Jimmy               1.51
Aminu, Al-Farouq            1.51
Bryant, Kobe                1.50
Stuckey, Rodney             1.49
Williams, Louis             1.49
Harrington, Al              1.49
Teague, Jeff                1.47
James, Mike                 1.46
Hibbert, Roy                1.42
Richardson, Jason           1.38
Anthony, Joel               1.37
Johnson, Amir               1.34
Smith, Jason                1.34
Radmanovic, Vladimir        1.31
Jefferson, Al               1.30
Mbah a Moute, Luc           1.29
Gordon, Eric                1.29
Ibaka, Serge                1.24
Meeks, Jodie                1.23
Robinson, Nate              1.22
Ilyasova, Ersan             1.22
Anderson, Alan              1.21
Vasquez, Greivis            1.20
Brewer, Ronnie              1.19
Dalembert, Samuel           1.19
Kapono, Jason               1.18
Thomas, Isaiah              1.17
Noah, Joakim                1.16
Koufos, Kosta               1.15
Pargo, Jannero              1.15
Bledsoe, Eric               1.14
Uzoh, Ben                   1.14
Varejao, Anderson           1.10
Jordan, DeAndre             1.09
Thomas, Kurt                1.06
Johnson, Joe                1.06
Dragic, Goran               1.05
Kleiza, Linas               1.04
Williams, Sean              1.03
Humphries, Kris             1.03
Shumpert, Iman              1.00
Hamilton, Jordan            0.99
Parsons, Chandler           0.99
Lee, Courtney               0.95
Hill, George                0.94
Billups, Chauncey           0.94
Wallace, Gerald             0.92
Sessions, Ramon             0.92
Thompson, Jason             0.91
Mahinmi, Ian                0.90
Whiteside, Hassan           0.90
Holiday, Jrue               0.90
Price, A.J.                 0.89
Bynum, Andrew               0.89
Battier, Shane              0.88
Barnes, Matt                0.88
Hudson, Lester              0.87
Arenas, Gilbert             0.87
Haywood, Brendan            0.86
Booker, Trevor              0.85
Allen, Lavoy                0.85
Stackhouse, Jerry           0.84
Hill, Grant                 0.83
Camby, Marcus               0.83
Hamilton, Richard           0.82
Foster, Jeff                0.81
Scola, Luis                 0.79
Jeffries, Jared             0.78
Battie, Tony                0.77
Forbes, Gary                0.77
Pierce, Paul                0.77
Dunleavy, Mike              0.76
Dudley, Jared               0.74
Williams, Deron             0.73
Fisher, Derek               0.73
Favors, Derrick             0.72
Bogut, Andrew               0.71
Beaubois, Rodrigue          0.71
Turkoglu, Hedo              0.71
Lowry, Kyle                 0.70
Asik, Omer                  0.70
Haddadi, Hamed              0.69
Hilario, Nene               0.68
Anthony, Carmelo            0.64
Lawson, Ty                  0.63
Maxiell, Jason              0.62
Stone, Julyan               0.60
Macklin, Vernon             0.59
Davis, Baron                0.59
Johnson, James              0.57
Horford, Al                 0.55
Boozer, Carlos              0.52
Ayon, Gustavo               0.52
Wright, Brandan             0.51
Seraphin, Kevin             0.51
Wilkins, Damien             0.50
Gortat, Marcin              0.49
Mills, Patrick              0.48
Bass, Brandon               0.47
Ford, T.J.                  0.47
McGrady, Tracy              0.43
Harris, Tobias              0.43
Bynum, Will                 0.41
Bargnani, Andrea            0.41
Moore, E'Twaun              0.39
Alabi, Solomon              0.39
Irving, Kyrie               0.39
Mason, Roger                0.39
Andersen, Chris             0.38
Ellis, Monta                0.36
Maynor, Eric                0.36
Martin, Cartier             0.35
Williams, Jordan            0.35
Lopez, Robin                0.34
Fernandez, Rudy             0.32
Cook, Daequan               0.31
Brewer, Corey               0.31
Dampier, Erick              0.30
Rush, Brandon               0.30
Henry, Xavier               0.30
Diogu, Ike                  0.29
Derozan, DeMar              0.28
Ebanks, Devin               0.28
Benson, Keith               0.27
Bayless, Jerryd             0.27
Thompson, Mychel            0.25
Watson, Earl                0.23
Turiaf, Ronny               0.23
Kaman, Chris                0.23
Foote, Jeff                 0.22
Lopez, Brook                0.20
Hobson, Darington           0.20
Faried, Kenneth             0.20
Greene, Donte               0.20
Hollins, Ryan               0.18
Balkman, Renaldo            0.17
Tinsley, Jamaal             0.16
Smith, Jerry                0.15
Neal, Gary                  0.13
Fesenko, Kyrylo             0.13
Reid, Ryan                  0.11
N'Diaye, Hamady             0.11
Budinger, Chase             0.11
Honeycutt, Tyler            0.10
Hayward, Gordon             0.09
Green, Gerald               0.07
Moore, Mikki                0.07
Pendergraph, Jeff           0.07
Jordan, Jerome              0.06
Emmett, Andre               0.06
Blake, Steve                0.04
Brown, Shannon              0.01
Leuer, Jon                  0.01
Singleton, Chris            0.01
Tolliver, Anthony          -0.01
Brockman, Jon              -0.01
Harris, Manny              -0.02
Chandler, Wilson           -0.02
Duhon, Chris               -0.03
Dawson, Eric               -0.05
Price, Ronnie              -0.06
Diaw, Boris                -0.07
Harris, Devin              -0.07
Barron, Earl               -0.08
O'Neal, Jermaine           -0.10
Cousins, DeMarcus          -0.10
Collison, Darren           -0.11
Gibson, Daniel             -0.11
Pekovic, Nikola            -0.12
Azubuike, Kelenna          -0.12
Byars, Derrick             -0.12
Smith, Greg                -0.12
Gadzuric, Dan              -0.12
Chandler, Tyson            -0.13
Pietrus, Mickael           -0.13
Johnson, Carldell          -0.13
Smith, Ishmael             -0.13
Haslem, Udonis             -0.14
Hill, Jordan               -0.14
Johnson, Trey              -0.15
Wright, Chris              -0.16
Randolph, Anthony          -0.17
Pachulia, Zaza             -0.17
Johnson, Ivan              -0.18
Johnson, Armon             -0.19
Mohammed, Nazr             -0.20
Skinner, Brian             -0.20
Thomas, Malcolm            -0.21
Mack, Shelvin              -0.21
Dyson, Jerome              -0.22
Beasley, Michael           -0.23
Batum, Nicolas             -0.24
Allen, Ray                 -0.24
Pavlovic, Sasha            -0.25
Brown, Kwame               -0.26
Martin, Kevin              -0.27
Singleton, James           -0.27
World Peace, Metta         -0.28
Scalabrine, Brian          -0.28
Williams, Shelden          -0.29
Martin, Kenyon             -0.29
Thornton, Marcus           -0.30
Wallace, Ben               -0.30
Landry, Carl               -0.30
Vucevic, Nikola            -0.30
Ahearn, Blake              -0.30
Lee, David                 -0.31
Leonard, Kawhi             -0.32
Orton, Daniel              -0.32
Webster, Martell           -0.32
Barbosa, Leandro           -0.32
Murphy, Troy               -0.32
Afflalo, Arron             -0.33
Boykins, Earl              -0.34
Dentmon, Justin            -0.34
Carroll, DeMarre           -0.34
Jackson, Stephen           -0.35
Silas, Xavier              -0.36
Morris, Markieff           -0.36
Evans, Maurice             -0.36
Williams, Marvin           -0.36
Villanueva, Charlie        -0.37
James, Damion              -0.38
Miller, Brad               -0.38
Garcia, Francisco          -0.39
Jones, James               -0.40
Walker, Bill               -0.41
Horner, Dennis             -0.41
Simmons, Bobby             -0.42
West, Delonte              -0.43
Ubiles, Edwin              -0.43
Johnson, Chris             -0.45
Ellington, Wayne           -0.45
Collins, Jason             -0.45
Farmar, Jordan             -0.47
Elson, Francisco           -0.49
Jackson, Reggie            -0.49
Gay, Rudy                  -0.49
Milicic, Darko             -0.51
Adrien, Jeff               -0.51
Douglas, Toney             -0.56
Najera, Eduardo            -0.57
Ridnour, Luke              -0.57
Amundson, Louis            -0.58
Harangody, Luke            -0.59
Ariza, Trevor              -0.60
Hawes, Spencer             -0.61
Turner, Evan               -0.62
Summers, DaJuan            -0.63
Samuels, Samardo           -0.63
Howard, Josh               -0.63
Morris, Marcus             -0.63
Johnson, JaJuan            -0.63
Miles, C.J.                -0.64
Nocioni, Andres            -0.65
Kennedy, D.J.              -0.65
Thompson, Klay             -0.66
Leslie, Travis             -0.67
Outlaw, Travis             -0.68
Okafor, Emeka              -0.68
Gee, Alonzo                -0.69
Mayo, O.J.                 -0.71
Pittman, Dexter            -0.71
Williams, Elliot           -0.72
Cook, Brian                -0.73
Hayes, Chuck               -0.78
Selby, Josh                -0.79
Higgins, Cory              -0.79
Daniels, Marquis           -0.79
Jamison, Antawn            -0.81
Evans, Jeremy              -0.83
Hinrich, Kirk              -0.84
Livingston, Shaun          -0.85
Howard, Juwan              -0.88
Carter, Anthony            -0.89
Terry, Jason               -0.89
Burks, Alec                -0.92
Stevenson, DeShawn         -0.93
Przybilla, Joel            -0.94
Ivey, Royal                -0.95
Gladness, Mickell          -0.95
Jerebko, Jonas             -0.98
Barea, Jose                -0.98
Walton, Luke               -1.01
Marion, Shawn              -1.02
Aldrich, Cole              -1.03
Redick, J.J.               -1.03
Diop, DeSagana             -1.03
Williams, Derrick          -1.05
Goudelock, Andrew          -1.05
Harper, Justin             -1.07
Joseph, Cory               -1.07
Calderon, Jose             -1.07
Lewis, Rashard             -1.07
Blatche, Andray            -1.08
Eyenga, Christian          -1.08
Smith, Nolan               -1.08
Bell, Raja                 -1.08
Patterson, Patrick         -1.09
Wafer, Von                 -1.10
Gray, Aaron                -1.10
Fortson, Courtney          -1.11
Wall, John                 -1.12
Carroll, Matt              -1.12
McGuire, Dominic           -1.13
Watkins, Darryl            -1.14
Morris, Darius             -1.15
Jefferson, Richard         -1.16
Lee, Malcolm               -1.16
Childress, Josh            -1.16
Williams, Shawne           -1.21
Davis, Josh                -1.24
Stephenson, Lance          -1.25
Wright, Dorell             -1.25
Russell, Walker            -1.25
Gooden, Drew               -1.26
Davis, Ed                  -1.27
Erden, Semih               -1.28
Perkins, Kendrick          -1.28
Moon, Jamario              -1.29
Cardinal, Brian            -1.29
Butler, Rasual             -1.31
Williams, Terrence         -1.31
Jennings, Brandon          -1.32
Wilcox, Chris              -1.32
Daye, Austin               -1.35
Thabeet, Hasheem           -1.36
Petro, Johan               -1.37
Smith, Craig               -1.38
Biedrins, Andris           -1.38
Jianlian, Yi               -1.39
Parker, Anthony            -1.42
Monroe, Greg               -1.42
Telfair, Sebastian         -1.43
Hansbrough, Tyler          -1.45
Foye, Randy                -1.48
Curry, Eddy                -1.51
Harris, Terrel             -1.53
Liggins, DeAndre           -1.54
Tyler, Jeremy              -1.54
Okur, Mehmet               -1.56
Cunningham, Dante          -1.57
Hughes, Larry              -1.58
Richardson, Quentin        -1.58
Crawford, Jamal            -1.59
Young, Nick                -1.60
Mullens, Byron             -1.61
Thompkins, Trey            -1.61
Vesely, Jan                -1.61
Brooks, Marshon            -1.62
Williams, Mo               -1.63
Hayward, Lazar             -1.63
Gordon, Ben                -1.64
Gomes, Ryan                -1.67
Davis, Glen                -1.68
Hickson, J.J.              -1.75
Blair, DeJuan              -1.75
Brackins, Craig            -1.75
Salmons, John              -1.76
Odom, Lamar                -1.77
Brown, Derrick             -1.78
Mozgov, Timofey            -1.78
Thomas, Lance              -1.85
Young, Sam                 -1.90
White, D.J.                -1.90
Cole, Norris               -1.94
Evans, Tyreke              -1.99
McRoberts, Josh            -2.00
Jones, Dominique           -2.05
Babbitt, Luke              -2.07
Green, Willie              -2.10
Flynn, Jonny               -2.13
Watson, C.J.               -2.13
Jones, Dahntay             -2.17
Thomas, Tyrus              -2.21
Crawford, Jordan           -2.23
Williams, Reggie           -2.24
Morrow, Anthony            -2.26
Kanter, Enes               -2.27
Bibby, Mike                -2.29
Belinelli, Marco           -2.30
Gaines, Sundiata           -2.30
Johnson, Wesley            -2.33
Stoudemire, Amare          -2.41
Jones, Solomon             -2.43
Delfino, Carlos            -2.44
Maggette, Corey            -2.46
Stiemsma, Greg             -2.50
Knight, Brandon            -2.60
Prince, Tayshaun           -2.61
Jack, Jarrett              -2.61
Redd, Michael              -2.64
Henderson, Gerald          -2.65
Fredette, Jimmer           -2.72
Augustin, D.J.             -2.78
Speights, Marreese         -2.82
Jenkins, Charles           -2.98
Pargo, Jeremy              -3.02
Casspi, Omri               -3.04
Clark, Earl                -3.13
Evans, Reggie              -3.16
McGee, JaVale              -3.20
Walker, Kemba              -3.23
Thompson, Tristan          -3.35
Biyombo, Bismack           -3.50
Warrick, Hakim             -3.52
Dooling, Keyon             -3.57
Sloan, Donald              -3.92
When I use GLMNET with the same raw data (the dataset from bbv), I also get "weird" results. I noticed that before and simply calculated RAPM via my own script (which isn't that hard to write if you know matrix algebra and follow my description in the mentioned thread). So I would argue this is not the result of you doing something wrong; rather, something in GLMNET is causing those weird results. It is not reasonable at all to get values as high as GLMNET produces for players with so few possessions.
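
For reference, the matrix-algebra route can be sketched as below. This is not mystic's actual script, which is not shown in the thread; it is the textbook weighted ridge solution, written under the assumptions of no intercept and possession counts as weights. The data are simulated and ridge_rapm is a hypothetical helper name.

```r
# Weighted ridge regression in closed form:
#   beta = (X' W X + lambda * I)^-1  X' W y,   with W = diag(w)
ridge_rapm <- function(X, y, w, lambda) {
  XtW <- t(X * w)   # row i of X scaled by w[i], then transposed
  as.vector(solve(XtW %*% X + lambda * diag(ncol(X)), XtW %*% y))
}

set.seed(3)
n <- 500   # stints (hypothetical)
p <- 8     # players (hypothetical)
X <- matrix(sample(c(-1, 0, 1), n * p, replace = TRUE), n, p)
w <- sample(2:20, n, replace = TRUE)
y <- as.vector(X %*% rnorm(p, sd = 2)) + rnorm(n, sd = 15)

apm  <- ridge_rapm(X, y, w, lambda = 0)      # no penalty: plain weighted APM
rapm <- ridge_rapm(X, y, w, lambda = 3000)   # penalized: values pulled toward 0
```

With lambda = 0 this matches a weighted least-squares fit without intercept, which is a quick sanity check before comparing against glmnet output.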
dwm8
Post by dwm8 »

xkonk wrote:It's been a while since I've looked at RAPM-type data... can you spell out what the non-x variables are? In particular, how did you make y and the weights? I assumed y is point differential per 100 possessions and weight is the number of possessions for a particular match-up, but then I don't think the numbers make sense. I should be able to use y and the weight to get to a whole number score difference for each stint, right? That doesn't look to be true for what I downloaded from dropbox.
Here's what I used:

X = 39192 x 479 lineup matrix
y = 39192 x 1 results vector (only includes stints where both home and away teams got a possession, value is difference between home and away points per 100 possessions)
weights = 39192 x 1 vector of observation weights (sum of home and away possessions for each matchup)

I was able to get normal results (very similar to those of mystic) by going through and using matrix multiplication, though I cannot get the numbers to work using glmnet in R.

Another question that I believe has been addressed somewhere on this forum, though I can't determine where: what are the benefits of adding a prior into the calculation of RAPM? I'm guessing the main one would be to establish a baseline for low-minute players so that they aren't all regressed to 0, but is there anything else? I hear people saying that it makes RAPM a more predictive statistic, but what does that mean? I've always seen it as an assessment of how much impact a player has or has not made in a season, so how does prediction factor in?
xkonk
Post by xkonk »

dwm8 wrote: Here's what I used:

X = 39192 x 479 lineup matrix
y = 39192 x 1 results vector (only includes stints where both home and away teams got a possession, value is difference between home and away points per 100 possessions)
weights = 39192 x 1 vector of observation weights (sum of home and away possessions for each matchup)

I was able to get normal results (very similar to those of mystic) by going through and using matrix multiplication, though I cannot get the numbers to work using glmnet in R.
I downloaded the data you provided, so I know what you used. I was asking more how you made those matrices. Just at the beginning of the file, you have entries where the results are -83.333 and -8.333 and the weights are 7. 7 doesn't turn either of those into a whole number difference in points. Shouldn't I be able to do something like the first entry, where the difference of 40 and weight of 10 suggests that the home-away difference was 40*10/100 = 4?
dwm8 wrote:Another question that I believe has been addressed somewhere on this forum, though I can't determine where: what are the benefits of adding a prior into the calculation of RAPM? I'm guessing the main one would be to establish a baseline for low-minute players so that they aren't all regressed to 0, but is there anything else? I hear people saying that it makes RAPM a more predictive statistic, but what does that mean? I've always seen it as an assessment of how much impact a player has or has not made in a season, so how does prediction factor in?
Adding a prior helps the algorithm keep players closer to a better estimate of their actual value, assuming that the previous year's RAPM or SPM or whatever is a better estimate than 0. That means the algorithm can make a better guess with less data in the current year. Having the information that LeBron is really good, the algorithm can comfortably say that in the current season he's really good as opposed to pretty good. If you use player ratings to predict a later season, they're more accurate when they think LeBron is really good as opposed to pretty good. Or to describe it a different way, predictions are typically better when estimated values are regressed to the mean. RAPM with a prior regresses the player estimates to something like each player's mean instead of regressing them to a mean of 0. It turns out that regressing to an individual mean is better.
mystic
Post by mystic »

xkonk wrote:Just at the beginning of the file, you have entries where the results are -83.333 and -8.333 and the weights are 7. 7 doesn't turn either of those into a whole number difference in points. Shouldn't I be able to do something like the first entry, where the difference of 40 and weight of 10 suggests that the home-away difference was 40*10/100 = 4?
No, you shouldn't. -83.333 is the difference between the home rating and the away rating, where each rating may be based on a different number of possessions. In that specific case the home team scored 2 points in 4 possessions, which gives them 50 as their rating, and the away team scored 4 points in 3 possessions, which gives them a rating of 133.333. 50 - 133.333 = -83.333. Completely normal result. The issue dwm8 encountered is not explained by the data used.
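
The arithmetic for that stint, spelled out in R. The rating helper is a hypothetical name; the numbers are the ones from the example above.

```r
# Points per 100 possessions for one side of a stint
rating <- function(points, possessions) 100 * points / possessions

home <- rating(2, 4)   # 2 points in 4 possessions
away <- rating(4, 3)   # 4 points in 3 possessions

y      <- home - away  # response: difference of the two ratings
weight <- 4 + 3        # weight: home possessions plus away possessions

# Because the two ratings have different denominators, y * weight / 100
# does not recover the actual point difference (2 - 4 = -2).
implied_diff <- y * weight / 100
actual_diff  <- 2 - 4
```

Here y is about -83.333 with a weight of 7, while implied_diff is about -5.83, which is why the whole-number check only works when both teams have the same number of possessions.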
dwm8
Post by dwm8 »

Has anyone here had success breaking RAPM into offensive and defensive splits that make sense? As I mentioned, I was able to get some solid overall RAPM numbers thanks to help from mystic, but I'm having some problems while trying to get the splits of the data.

The two methods I am aware of for splitting RAPM are:

1) Eli Witus' method at http://www.countthebasket.com/blog/2008 ... lus-minus/

2) EvanZ's method of creating an offensive and defensive variable for each player and using the offensive team's rating as the dependent variable

Using my same 2012 dataset, I tried both methods and got very extreme results. The offensive splits were very high and the defensive splits low (Antawn Jamison had a combined RAPM of -0.8 with an ORAPM of 13.9 and a DRAPM of -14.7). I'd be interested in hearing about how you guys go about performing these calculations and what the results generally look like. Thanks for all of the help, everyone!
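
A rough sketch of how the second layout (an offensive and a defensive variable per player, offensive team's rating as the response) might be built. This is only my reading of the one-line description above, not EvanZ's code. The two-on-two toy rows, the player names, the -1 coding for defenders and the centering of y are all assumptions made for illustration.

```r
players <- c("A", "B", "C", "D")   # hypothetical players, two-on-two for brevity

# One row per (stint, team on offense)
rows <- list(
  list(off = c("A", "B"), def = c("C", "D"), pts = 6, poss = 5),
  list(off = c("C", "D"), def = c("A", "B"), pts = 4, poss = 5)
)

cols <- c(paste0(players, "_off"), paste0(players, "_def"))
X <- matrix(0, nrow = length(rows), ncol = length(cols), dimnames = list(NULL, cols))
y <- numeric(length(rows))
w <- numeric(length(rows))

for (i in seq_along(rows)) {
  r <- rows[[i]]
  X[i, paste0(r$off, "_off")] <- 1    # offensive indicator for players on offense
  X[i, paste0(r$def, "_def")] <- -1   # assumed sign: positive defensive value = fewer points allowed
  y[i] <- 100 * r$pts / r$poss        # offensive rating of the team with the ball
  w[i] <- r$poss                      # weight by possessions
}

# y is now an offensive rating near 100 rather than a differential near 0,
# so this sketch centers it (or, equivalently, an intercept could be fitted).
y_centered <- y - weighted.mean(y, w)
```

Whether an uncentered response is what produced the large, offsetting offensive and defensive values reported above is only a guess on my part, but it is cheap to rule out.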
saberpowers
Post by saberpowers »

mystic wrote:That is an issue with GLMNET, but I could not figure out yet, what the main reason is. When you run the calculation like I described in this thread (you need to add the weight to that equation, obviously, but I think I described that already in the PM as well as how to incorporate a prior), that's the result when using the bbv dataset for the 2012 RS (lambda = 3000):

...

When I use GLMNET, I also get "weird" results when using the same raw data (dataset from bbv). I noticed that before and just simply calculated RAPM via my own script (which isn't that hard to write, if you know matrix algebra and follow my description in the mentioned thread). Thus, I argue that is not the result of you doing something wrong, but something in the GLMNET script is doing something, which causes those weird results. It is not reasonable at all to get such high values for players with such low amount of possessions like GLMNET is producing.
In glmnet, set the standardize argument to FALSE. What's happening is that for players with small numbers of possessions, their indicator variables are being standardized to have the same variance as all other players' indicators. So those variables are scaled up, which means that a small coefficient on the standardized scale corresponds to a large coefficient on the original scale for a player with little playing time. So players with less playing time see their coefficients regularized less, in a sense.

So for example, do

fit = glmnet(x, y, alpha = 0, standardize = FALSE, ...)
Also, if you want it to run faster, try coding x as a sparse matrix using the sparseMatrix() function in the Matrix package. Lastly, I would actually recommend using 'lambda.min' in cv.glmnet() rather than 'lambda.1se'. I did not follow the argument you were making earlier in favor of 'lambda.1se' over 'lambda.min'. In fact, I'd be very interested if you could sway me to start using 'lambda.1se' myself instead of 'lambda.min'.
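
To illustrate why standardizing matters here, a small base R sketch with two simulated indicator columns. The playing-time shares and names are invented, and glmnet's internal scaling differs in detail (it uses weights and a 1/n variance), so treat this as the idea only. If column j is divided by its standard deviation s_j before a ridge fit, a penalty of lambda on the standardized coefficient is the same as a penalty of lambda * s_j^2 on the original coefficient.

```r
set.seed(42)
n <- 10000   # stints (hypothetical)

# Indicator column: +1 or -1 when the player is on court, 0 otherwise
on_court <- function(share) {
  rbinom(n, 1, share) * sample(c(-1, 1), n, replace = TRUE)
}

starter     <- on_court(0.30)    # on court in about 30% of stints
benchwarmer <- on_court(0.005)   # on court in about 0.5% of stints

s <- c(starter = sd(starter), benchwarmer = sd(benchwarmer))

# Effective ridge penalty on the original scale, relative to the starter
relative_penalty <- s^2 / s[["starter"]]^2
```

The benchwarmer's effective penalty comes out as a small fraction of the starter's, which matches the pattern of low-possession players keeping extreme values until standardize = FALSE is set.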
permaximum
Post by permaximum »

Glmnet definitely produces high values for players with low possessions. It uses an efficient method to provide a very accurate "estimate". It's not the real deal. Unfortunately, as far as R goes, I don't know of any other package that includes a weight option for penalized regressions.