DSMok1 wrote: These results look like no regression to the prior/mean is occurring (i.e. this is pure APM). I'm almost certain that's what's happening. I don't know enough about the code to troubleshoot it though. Anyone else?
I'm getting lambda.1se = 5.146436, if that means anything to anyone.
Guides to Creating RAPM
Re: Guides to Creating RAPM
DSMok1 wrote: These results look like no regression to the prior/mean is occurring (i.e. this is pure APM). I'm almost certain that's what's happening. I don't know enough about the code to troubleshoot it though. Anyone else?
Well, there is some regression to the mean, because the lambda used is bigger than 0 (in fact I get about 3 with the posted code and raw data). The reason is that alpha, the elastic-net parameter, is set to 1 by default. Add alpha=0 in cv.glmnet and lambda.1se will be nearly 3000, and the results from the regression will look much different.
Also, dwm8, for the CV you can use the option "parallel=TRUE" when you load the doParallel package. That makes use of multiple cores and decreases the computing time. It is also not necessary to set nfolds=100; 10 should be sufficient to get a stable lambda.
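For anyone following along, here is a minimal sketch of that call, assuming `X` is the lineup design matrix, `y` the per-100-possession results vector, and `w` the possession weights that dwm8 describes later in the thread (the object names are mine, not from the post):
Code:
library(glmnet)
library(doParallel)

registerDoParallel(cores = 4)     # let cv.glmnet spread the folds over several cores

# alpha = 0 gives a pure ridge penalty; the glmnet default (alpha = 1) is the lasso,
# which is why the regularization looked like it wasn't happening.
cv_fit <- cv.glmnet(X, y, weights = w, alpha = 0,
                    nfolds = 10, parallel = TRUE)

cv_fit$lambda.1se                                        # the lambda discussed above
rapm <- as.matrix(coef(cv_fit, s = "lambda.1se"))[-1, 1] # per-player ridge estimates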
Re: Guides to Creating RAPM
mystic wrote: Well, there is some regression to the mean, because the lambda used is bigger than 0 (in fact I get about 3 with the posted code and raw data). The reason is that alpha, the elastic-net parameter, is set to 1 by default. Add alpha=0 in cv.glmnet and lambda.1se will be nearly 3000, and the results from the regression will look much different.
I figured 10 folds would be enough, but I bumped it up to 100 to see if the results would get better (which they didn't). The computing time wasn't an issue for me; it only took a couple of minutes.
I know I included the matrices and vectors in the Google Sheets spreadsheet, but the exact .csv's used can be found at the link below, including the entire 39192 x 479 lineup matrix.
https://www.dropbox.com/sh/5x1q8wbdlseu ... eTo-a?dl=0
On another note, I tried out lambda = 500 just to see what would happen, and I still ended up with bad results, just everything regressed closer to 0 (N'Diaye and Benson still leading the charge at 12.4 and 8.5, respectively). Any help/thoughts would be greatly appreciated!
Re: Guides to Creating RAPM
I just want to reiterate what Mystic said here. It's basically how I calculate RAPM.
mystic wrote:
sndesai1 wrote: based on reading the glmnet documentation, i would think it involves creating a vector of your priors and using it as the "offset" parameter
dwm8 wrote: Thanks for your response. I was looking through that as well, but noticed that it says the offset vector is "A vector of length nobs," which doesn't make sense to me. Wouldn't a vector for priors be a vector of length nvars, i.e. a vector with each value corresponding to a player, not a lineup?
I also just saw your PM and will reply in more detail there, but I want to reply here to that specific question. You basically calculate the expected value of a specific matchup (5 vs. 5) and then subtract it from the real value in order to create a prior-informed RAPM. To do that in GLMNET you can create the vector of the prior values for each player and then combine it with the design matrix via matrix multiplication (design matrix %*% prior). This creates the necessary vector of length nobs, which would be used as the offset in order to calculate the prior-informed RAPM (see the sketch after this quote).
Assuming you want RAPM for year y with a prior based only on year x, you can instead include the data from year x in the sample and run the regression on the whole set. The results should be nearly identical, with the differences explained by rounding.
I want to add that when using GLMNET for RAPM, I do the cross-validation with nfolds=10; that's when the resulting lambda is usually stable. Also, lambda.1se should be used for running the ridge regression in order to get better predictive values. Using lambda.min with a big enough sample would result in values not much different from using no lambda at all (which means APM values instead).
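To make the offset idea concrete, here is a minimal sketch in R. It assumes `X`, `y`, and `w` are the design matrix, results vector, and possession weights described elsewhere in the thread, and `prior` is a hypothetical vector of length nvars holding each player's prior value:
Code:
library(glmnet)

# Expected value of each stint given only the priors: design matrix %*% prior
# produces the vector of length nobs that mystic describes.
prior_offset <- as.vector(X %*% prior)

# Ridge regression on the part of the result the priors do not explain.
cv_fit <- cv.glmnet(X, y, weights = w, offset = prior_offset,
                    alpha = 0, nfolds = 10)

# The coefficients are adjustments on top of the prior, so the
# prior-informed RAPM is prior + adjustment (drop the intercept).
adj  <- as.matrix(coef(cv_fit, s = "lambda.1se"))[-1, 1]
rapm <- prior + adj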
Alpha > 0 means the model will drop some variables (lasso-style selection), and the whole point of RAPM is to calculate an estimate for every player. Alpha = 0 means it's a pure ridge regression.
Also, you can play around with the penalty factor. For instance, I tested a looser penalty for rookies in prior-informed RAPM.
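Continuing the previous sketch, a quick illustration of the penalty-factor idea, assuming `rookie_cols` is a hypothetical vector of column indices for the rookie columns in `X`:
Code:
# penalty.factor rescales lambda per coefficient; values below 1 mean
# less shrinkage, i.e. a looser penalty for those players.
pf <- rep(1, ncol(X))
pf[rookie_cols] <- 0.5   # the exact value is a modeling choice

cv_fit <- cv.glmnet(X, y, weights = w, offset = prior_offset,
                    alpha = 0, nfolds = 10, penalty.factor = pf)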
Re: Guides to Creating RAPM
AcrossTheCourt wrote: I just want to reiterate what Mystic said here. It's basically how I calculate RAPM.
Thanks for the note. As I mentioned, I've manually adjusted the penalty factor, though I'm still finding that the resulting coefficients seem flawed. Do you have any idea why my results are so extreme? I believe I've followed most people's general guidelines as closely as I can, but I haven't come across anyone with results as bad/noisy as I've gotten.
Re: Guides to Creating RAPM
It's been a while since I've looked at RAPM-type data... can you spell out what the non-x variables are? In particular, how did you make y and the weights? I assumed y is point differential per 100 possessions and weight is the number of possessions for a particular match-up, but then I don't think the numbers make sense. I should be able to use y and the weight to get to a whole number score difference for each stint, right? That doesn't look to be true for what I downloaded from dropbox.
Re: Guides to Creating RAPM
dwm8 wrote: Thanks for the note. As I mentioned, I've manually adjusted the penalty factor, though I'm still finding that the resulting coefficients seem flawed. Do you have any idea why my results are so extreme?
That is an issue with GLMNET, but I have not yet figured out what the main reason is. When you run the calculation like I described in this thread (you need to add the weight to that equation, obviously, but I think I already described that in the PM, as well as how to incorporate a prior), these are the results when using the bbv dataset for the 2012 RS (lambda = 3000):
Code:
Name RAPM
Gibson, Taj 4.88
Nowitzki, Dirk 4.24
Bonner, Matt 4.14
Parker, Tony 3.92
Gallinari, Danilo 3.84
Griffin, Blake 3.79
Harden, James 3.69
James, LeBron 3.65
Udrih, Beno 3.64
Granger, Danny 3.47
Carter, Vince 3.44
Gasol, Marc 3.44
Paul, Chris 3.23
Ginobili, Manu 3.18
Curry, Stephen 3.12
Lucas, John 3.10
Garnett, Kevin 3.02
Anderson, Ryan 3.01
Bosh, Chris 2.94
Duncan, Tim 2.92
Deng, Luol 2.82
Conley, Mike 2.79
Westbrook, Russell 2.72
Young, Thaddeus 2.70
Aldridge, LaMarcus 2.69
Udoh, Ekpe 2.68
Allen, Tony 2.61
Wade, Dwyane 2.56
Frye, Channing 2.48
Rose, Derrick 2.47
Butler, Caron 2.46
Gasol, Pau 2.45
Smith, Josh 2.42
Collison, Nick 2.38
Sanders, Larry 2.36
Millsap, Paul 2.34
Rubio, Ricky 2.33
Korver, Kyle 2.33
Miller, Mike 2.25
Fields, Landry 2.21
Randolph, Zach 2.21
Lin, Jeremy 2.14
Iguodala, Andre 2.14
Howard, Dwight 2.14
Felton, Raymond 2.12
Brand, Elton 2.08
Chalmers, Mario 2.06
Durant, Kevin 1.99
Nash, Steve 1.93
Miller, Andre 1.93
Novak, Steve 1.90
Pondexter, Quincy 1.88
Love, Kevin 1.85
Green, Danny 1.84
George, Paul 1.84
Rondo, Rajon 1.81
Anderson, James 1.80
Magloire, Jamaal 1.70
Bradley, Avery 1.70
Harrellson, Josh 1.67
Nelson, Jameer 1.67
West, David 1.67
Matthews, Wes 1.63
Sefolosha, Thabo 1.62
Smith, J.R. 1.61
Splitter, Tiago 1.58
Kidd, Jason 1.56
Butler, Jimmy 1.51
Aminu, Al-Farouq 1.51
Bryant, Kobe 1.50
Stuckey, Rodney 1.49
Williams, Louis 1.49
Harrington, Al 1.49
Teague, Jeff 1.47
James, Mike 1.46
Hibbert, Roy 1.42
Richardson, Jason 1.38
Anthony, Joel 1.37
Johnson, Amir 1.34
Smith, Jason 1.34
Radmanovic, Vladimir 1.31
Jefferson, Al 1.30
Mbah a Moute, Luc 1.29
Gordon, Eric 1.29
Ibaka, Serge 1.24
Meeks, Jodie 1.23
Robinson, Nate 1.22
Ilyasova, Ersan 1.22
Anderson, Alan 1.21
Vasquez, Greivis 1.20
Brewer, Ronnie 1.19
Dalembert, Samuel 1.19
Kapono, Jason 1.18
Thomas, Isaiah 1.17
Noah, Joakim 1.16
Koufos, Kosta 1.15
Pargo, Jannero 1.15
Bledsoe, Eric 1.14
Uzoh, Ben 1.14
Varejao, Anderson 1.10
Jordan, DeAndre 1.09
Thomas, Kurt 1.06
Johnson, Joe 1.06
Dragic, Goran 1.05
Kleiza, Linas 1.04
Williams, Sean 1.03
Humphries, Kris 1.03
Shumpert, Iman 1.00
Hamilton, Jordan 0.99
Parsons, Chandler 0.99
Lee, Courtney 0.95
Hill, George 0.94
Billups, Chauncey 0.94
Wallace, Gerald 0.92
Sessions, Ramon 0.92
Thompson, Jason 0.91
Mahinmi, Ian 0.90
Whiteside, Hassan 0.90
Holiday, Jrue 0.90
Price, A.J. 0.89
Bynum, Andrew 0.89
Battier, Shane 0.88
Barnes, Matt 0.88
Hudson, Lester 0.87
Arenas, Gilbert 0.87
Haywood, Brendan 0.86
Booker, Trevor 0.85
Allen, Lavoy 0.85
Stackhouse, Jerry 0.84
Hill, Grant 0.83
Camby, Marcus 0.83
Hamilton, Richard 0.82
Foster, Jeff 0.81
Scola, Luis 0.79
Jeffries, Jared 0.78
Battie, Tony 0.77
Forbes, Gary 0.77
Pierce, Paul 0.77
Dunleavy, Mike 0.76
Dudley, Jared 0.74
Williams, Deron 0.73
Fisher, Derek 0.73
Favors, Derrick 0.72
Bogut, Andrew 0.71
Beaubois, Rodrigue 0.71
Turkoglu, Hedo 0.71
Lowry, Kyle 0.70
Asik, Omer 0.70
Haddadi, Hamed 0.69
Hilario, Nene 0.68
Anthony, Carmelo 0.64
Lawson, Ty 0.63
Maxiell, Jason 0.62
Stone, Julyan 0.60
Macklin, Vernon 0.59
Davis, Baron 0.59
Johnson, James 0.57
Horford, Al 0.55
Boozer, Carlos 0.52
Ayon, Gustavo 0.52
Wright, Brandan 0.51
Seraphin, Kevin 0.51
Wilkins, Damien 0.50
Gortat, Marcin 0.49
Mills, Patrick 0.48
Bass, Brandon 0.47
Ford, T.J. 0.47
McGrady, Tracy 0.43
Harris, Tobias 0.43
Bynum, Will 0.41
Bargnani, Andrea 0.41
Moore, E'Twaun 0.39
Alabi, Solomon 0.39
Irving, Kyrie 0.39
Mason, Roger 0.39
Andersen, Chris 0.38
Ellis, Monta 0.36
Maynor, Eric 0.36
Martin, Cartier 0.35
Williams, Jordan 0.35
Lopez, Robin 0.34
Fernandez, Rudy 0.32
Cook, Daequan 0.31
Brewer, Corey 0.31
Dampier, Erick 0.30
Rush, Brandon 0.30
Henry, Xavier 0.30
Diogu, Ike 0.29
Derozan, DeMar 0.28
Ebanks, Devin 0.28
Benson, Keith 0.27
Bayless, Jerryd 0.27
Thompson, Mychel 0.25
Watson, Earl 0.23
Turiaf, Ronny 0.23
Kaman, Chris 0.23
Foote, Jeff 0.22
Lopez, Brook 0.20
Hobson, Darington 0.20
Faried, Kenneth 0.20
Greene, Donte 0.20
Hollins, Ryan 0.18
Balkman, Renaldo 0.17
Tinsley, Jamaal 0.16
Smith, Jerry 0.15
Neal, Gary 0.13
Fesenko, Kyrylo 0.13
Reid, Ryan 0.11
N'Diaye, Hamady 0.11
Budinger, Chase 0.11
Honeycutt, Tyler 0.10
Hayward, Gordon 0.09
Green, Gerald 0.07
Moore, Mikki 0.07
Pendergraph, Jeff 0.07
Jordan, Jerome 0.06
Emmett, Andre 0.06
Blake, Steve 0.04
Brown, Shannon 0.01
Leuer, Jon 0.01
Singleton, Chris 0.01
Tolliver, Anthony -0.01
Brockman, Jon -0.01
Harris, Manny -0.02
Chandler, Wilson -0.02
Duhon, Chris -0.03
Dawson, Eric -0.05
Price, Ronnie -0.06
Diaw, Boris -0.07
Harris, Devin -0.07
Barron, Earl -0.08
O'Neal, Jermaine -0.10
Cousins, DeMarcus -0.10
Collison, Darren -0.11
Gibson, Daniel -0.11
Pekovic, Nikola -0.12
Azubuike, Kelenna -0.12
Byars, Derrick -0.12
Smith, Greg -0.12
Gadzuric, Dan -0.12
Chandler, Tyson -0.13
Pietrus, Mickael -0.13
Johnson, Carldell -0.13
Smith, Ishmael -0.13
Haslem, Udonis -0.14
Hill, Jordan -0.14
Johnson, Trey -0.15
Wright, Chris -0.16
Randolph, Anthony -0.17
Pachulia, Zaza -0.17
Johnson, Ivan -0.18
Johnson, Armon -0.19
Mohammed, Nazr -0.20
Skinner, Brian -0.20
Thomas, Malcolm -0.21
Mack, Shelvin -0.21
Dyson, Jerome -0.22
Beasley, Michael -0.23
Batum, Nicolas -0.24
Allen, Ray -0.24
Pavlovic, Sasha -0.25
Brown, Kwame -0.26
Martin, Kevin -0.27
Singleton, James -0.27
World Peace, Metta -0.28
Scalabrine, Brian -0.28
Williams, Shelden -0.29
Martin, Kenyon -0.29
Thornton, Marcus -0.30
Wallace, Ben -0.30
Landry, Carl -0.30
Vucevic, Nikola -0.30
Ahearn, Blake -0.30
Lee, David -0.31
Leonard, Kawhi -0.32
Orton, Daniel -0.32
Webster, Martell -0.32
Barbosa, Leandro -0.32
Murphy, Troy -0.32
Afflalo, Arron -0.33
Boykins, Earl -0.34
Dentmon, Justin -0.34
Carroll, DeMarre -0.34
Jackson, Stephen -0.35
Silas, Xavier -0.36
Morris, Markieff -0.36
Evans, Maurice -0.36
Williams, Marvin -0.36
Villanueva, Charlie -0.37
James, Damion -0.38
Miller, Brad -0.38
Garcia, Francisco -0.39
Jones, James -0.40
Walker, Bill -0.41
Horner, Dennis -0.41
Simmons, Bobby -0.42
West, Delonte -0.43
Ubiles, Edwin -0.43
Johnson, Chris -0.45
Ellington, Wayne -0.45
Collins, Jason -0.45
Farmar, Jordan -0.47
Elson, Francisco -0.49
Jackson, Reggie -0.49
Gay, Rudy -0.49
Milicic, Darko -0.51
Adrien, Jeff -0.51
Douglas, Toney -0.56
Najera, Eduardo -0.57
Ridnour, Luke -0.57
Amundson, Louis -0.58
Harangody, Luke -0.59
Ariza, Trevor -0.60
Hawes, Spencer -0.61
Turner, Evan -0.62
Summers, DaJuan -0.63
Samuels, Samardo -0.63
Howard, Josh -0.63
Morris, Marcus -0.63
Johnson, JaJuan -0.63
Miles, C.J. -0.64
Nocioni, Andres -0.65
Kennedy, D.J. -0.65
Thompson, Klay -0.66
Leslie, Travis -0.67
Outlaw, Travis -0.68
Okafor, Emeka -0.68
Gee, Alonzo -0.69
Mayo, O.J. -0.71
Pittman, Dexter -0.71
Williams, Elliot -0.72
Cook, Brian -0.73
Hayes, Chuck -0.78
Selby, Josh -0.79
Higgins, Cory -0.79
Daniels, Marquis -0.79
Jamison, Antawn -0.81
Evans, Jeremy -0.83
Hinrich, Kirk -0.84
Livingston, Shaun -0.85
Howard, Juwan -0.88
Carter, Anthony -0.89
Terry, Jason -0.89
Burks, Alec -0.92
Stevenson, DeShawn -0.93
Przybilla, Joel -0.94
Ivey, Royal -0.95
Gladness, Mickell -0.95
Jerebko, Jonas -0.98
Barea, Jose -0.98
Walton, Luke -1.01
Marion, Shawn -1.02
Aldrich, Cole -1.03
Redick, J.J. -1.03
Diop, DeSagana -1.03
Williams, Derrick -1.05
Goudelock, Andrew -1.05
Harper, Justin -1.07
Joseph, Cory -1.07
Calderon, Jose -1.07
Lewis, Rashard -1.07
Blatche, Andray -1.08
Eyenga, Christian -1.08
Smith, Nolan -1.08
Bell, Raja -1.08
Patterson, Patrick -1.09
Wafer, Von -1.10
Gray, Aaron -1.10
Fortson, Courtney -1.11
Wall, John -1.12
Carroll, Matt -1.12
McGuire, Dominic -1.13
Watkins, Darryl -1.14
Morris, Darius -1.15
Jefferson, Richard -1.16
Lee, Malcolm -1.16
Childress, Josh -1.16
Williams, Shawne -1.21
Davis, Josh -1.24
Stephenson, Lance -1.25
Wright, Dorell -1.25
Russell, Walker -1.25
Gooden, Drew -1.26
Davis, Ed -1.27
Erden, Semih -1.28
Perkins, Kendrick -1.28
Moon, Jamario -1.29
Cardinal, Brian -1.29
Butler, Rasual -1.31
Williams, Terrence -1.31
Jennings, Brandon -1.32
Wilcox, Chris -1.32
Daye, Austin -1.35
Thabeet, Hasheem -1.36
Petro, Johan -1.37
Smith, Craig -1.38
Biedrins, Andris -1.38
Jianlian, Yi -1.39
Parker, Anthony -1.42
Monroe, Greg -1.42
Telfair, Sebastian -1.43
Hansbrough, Tyler -1.45
Foye, Randy -1.48
Curry, Eddy -1.51
Harris, Terrel -1.53
Liggins, DeAndre -1.54
Tyler, Jeremy -1.54
Okur, Mehmet -1.56
Cunningham, Dante -1.57
Hughes, Larry -1.58
Richardson, Quentin -1.58
Crawford, Jamal -1.59
Young, Nick -1.60
Mullens, Byron -1.61
Thompkins, Trey -1.61
Vesely, Jan -1.61
Brooks, Marshon -1.62
Williams, Mo -1.63
Hayward, Lazar -1.63
Gordon, Ben -1.64
Gomes, Ryan -1.67
Davis, Glen -1.68
Hickson, J.J. -1.75
Blair, DeJuan -1.75
Brackins, Craig -1.75
Salmons, John -1.76
Odom, Lamar -1.77
Brown, Derrick -1.78
Mozgov, Timofey -1.78
Thomas, Lance -1.85
Young, Sam -1.90
White, D.J. -1.90
Cole, Norris -1.94
Evans, Tyreke -1.99
McRoberts, Josh -2.00
Jones, Dominique -2.05
Babbitt, Luke -2.07
Green, Willie -2.10
Flynn, Jonny -2.13
Watson, C.J. -2.13
Jones, Dahntay -2.17
Thomas, Tyrus -2.21
Crawford, Jordan -2.23
Williams, Reggie -2.24
Morrow, Anthony -2.26
Kanter, Enes -2.27
Bibby, Mike -2.29
Belinelli, Marco -2.30
Gaines, Sundiata -2.30
Johnson, Wesley -2.33
Stoudemire, Amare -2.41
Jones, Solomon -2.43
Delfino, Carlos -2.44
Maggette, Corey -2.46
Stiemsma, Greg -2.50
Knight, Brandon -2.60
Prince, Tayshaun -2.61
Jack, Jarrett -2.61
Redd, Michael -2.64
Henderson, Gerald -2.65
Fredette, Jimmer -2.72
Augustin, D.J. -2.78
Speights, Marreese -2.82
Jenkins, Charles -2.98
Pargo, Jeremy -3.02
Casspi, Omri -3.04
Clark, Earl -3.13
Evans, Reggie -3.16
McGee, JaVale -3.20
Walker, Kemba -3.23
Thompson, Tristan -3.35
Biyombo, Bismack -3.50
Warrick, Hakim -3.52
Dooling, Keyon -3.57
Sloan, Donald -3.92
Re: Guides to Creating RAPM
xkonk wrote: It's been a while since I've looked at RAPM-type data... can you spell out what the non-x variables are? In particular, how did you make y and the weights?
Here's what I used:
X = 39192 x 479 lineup matrix
y = 39192 x 1 results vector (only includes stints where both home and away teams got a possession, value is difference between home and away points per 100 possessions)
weights = 39192 x 1 vector of observation weights (sum of home and away possessions for each matchup)
I was able to get normal results (very similar to mystic's) by working through the matrix multiplication directly, though I cannot get the numbers to work using glmnet in R.
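For reference, a minimal sketch of one way to do that ridge solve directly with matrix algebra, assuming `X`, `y`, and `w` are the objects listed above and `lambda` is set by hand (note this lambda is not on the same scale as glmnet's, which rescales by the number of observations):
Code:
# Weighted ridge in closed form: beta = (X'WX + lambda*I)^-1 X'Wy, with W = diag(w)
p      <- ncol(X)
lambda <- 3000                          # e.g. the value mystic reports

XtWX <- crossprod(X, X * w)             # X' W X  (w recycles down the rows of X)
XtWy <- crossprod(X, y * w)             # X' W y
beta <- solve(XtWX + lambda * diag(p), XtWy)

rapm <- setNames(as.vector(beta), colnames(X))
head(sort(rapm, decreasing = TRUE), 10) # top 10 players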
Another question that I believe has been addressed somewhere on this forum, though I can't determine where: what are the benefits of adding a prior into the calculation of RAPM? I'm guessing the main one would be to establish a baseline for low-minute players so that they aren't all regressed to 0, but is there anything else? I hear people saying that it makes RAPM a more predictive statistic, but what does that mean? I've always seen it as an assessment of how much impact a player has or has not made in a season, so how does prediction factor in?
Re: Guides to Creating RAPM
dwm8 wrote: Here's what I used: ...
I downloaded the data you provided, so I know what you used. I was asking more how you made those matrices. Just at the beginning of the file, you have entries where the results are -83.333 and -8.333 and the weights are 7; 7 doesn't turn either of those into a whole-number difference in points. Shouldn't I be able to do something like the first entry, where the difference of 40 and weight of 10 suggests that the home-away difference was 40*10/100 = 4?
dwm8 wrote: Another question that I believe has been addressed somewhere on this forum, though I can't determine where: what are the benefits of adding a prior into the calculation of RAPM? ...
Adding a prior helps the algorithm keep players closer to a better estimate of their actual value, assuming that the previous year's RAPM or SPM or whatever is a better estimate than 0. That means the algorithm can make a better guess with less data in the current year. Having the information that LeBron is really good, the algorithm can comfortably say that in the current season he's really good, as opposed to just pretty good. If you use player ratings to predict a later season, they're more accurate when they think LeBron is really good as opposed to pretty good. Or, to describe it a different way, predictions are typically better when estimated values are regressed to the mean. RAPM with a prior regresses the player estimates to something like each player's own mean instead of regressing them all to a mean of 0. It turns out that regressing to an individual mean is better.
Re: Guides to Creating RAPM
xkonk wrote: Just at the beginning of the file, you have entries where the results are -83.333 and -8.333 and the weights are 7. 7 doesn't turn either of those into a whole number difference in points. Shouldn't I be able to do something like the first entry, where the difference of 40 and weight of 10 suggests that the home-away difference was 40*10/100 = 4?
No, you shouldn't. -83.333 is the difference between the home rating and the away rating, where each rating may be based on a different number of possessions. In that specific case the home team scored 2 points in 4 possessions, which gives them a rating of 50, and the away team scored 4 points in 3 possessions, which gives them a rating of 133.333. 50 - 133.333 = -83.333. Completely normal result. The issue dwm8 encountered is not explained by the data used.
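So, per stint, the entries work out roughly like this (a sketch with hypothetical per-stint variables, using mystic's example numbers):
Code:
home_rating <- 100 * home_pts / home_poss   # 100 * 2 / 4 = 50
away_rating <- 100 * away_pts / away_poss   # 100 * 4 / 3 = 133.333

y_entry <- home_rating - away_rating        # 50 - 133.333 = -83.333
w_entry <- home_poss + away_poss            # 4 + 3 = 7 possessions of weight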
Re: Guides to Creating RAPM
Has anyone here had success breaking RAPM into offensive and defensive splits that make sense? As I mentioned, I was able to get some solid overall RAPM numbers thanks to help from mystic, but I'm having some problems trying to get the splits working.
The two methods I'm aware of for splitting RAPM are:
1) Eli Witus' method at http://www.countthebasket.com/blog/2008 ... lus-minus/
2) EvanZ's method of creating an offensive and defensive variable for each player and using the offensive team's rating as the dependent variable
Using my same 2012 dataset, I tried both methods and got very extreme results. The offensive splits were very high and the defensive splits low (Antawn Jamison had a combined RAPM of -0.8 with an ORAPM of 13.9 and a DRAPM of -14.7). I'd be interested in hearing about how you guys go about performing these calculations and what the results generally look like. Thanks for all of the help, everyone!
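For what it's worth, here is a rough sketch of one way to set up a stacked offense/defense design along the lines of method 2 (not necessarily EvanZ's exact implementation): every stint becomes two rows, one with the home team on offense and one with the away team on offense, and the dependent variable is the offensive team's rating. All object names here (`home_idx`, `away_idx`, the per-stint points and possessions) are illustrative, not from any of the posts:
Code:
library(glmnet)

n_stints  <- length(home_poss)
n_players <- 479

# Columns 1..n_players are offense indicators, the next n_players are defense
# indicators. (For a full season a sparse Matrix would be preferable, but a
# plain matrix keeps the sketch simple.)
OD   <- matrix(0, nrow = 2 * n_stints, ncol = 2 * n_players)
y_od <- numeric(2 * n_stints)
w_od <- numeric(2 * n_stints)

for (i in seq_len(n_stints)) {
  # Home team on offense against the away defense
  OD[2 * i - 1, home_idx[i, ]]             <- 1
  OD[2 * i - 1, n_players + away_idx[i, ]] <- 1
  y_od[2 * i - 1] <- 100 * home_pts[i] / home_poss[i]
  w_od[2 * i - 1] <- home_poss[i]

  # Away team on offense against the home defense
  OD[2 * i, away_idx[i, ]]                 <- 1
  OD[2 * i, n_players + home_idx[i, ]]     <- 1
  y_od[2 * i] <- 100 * away_pts[i] / away_poss[i]
  w_od[2 * i] <- away_poss[i]
}

od_fit  <- cv.glmnet(OD, y_od, weights = w_od, alpha = 0, nfolds = 10)
od_coef <- as.matrix(coef(od_fit, s = "lambda.1se"))[-1, 1]
orapm <- od_coef[1:n_players]                      # offensive estimates
drapm <- od_coef[(n_players + 1):(2 * n_players)]  # positive = opponents score more,
                                                   # so flip the sign if you want
                                                   # higher = better defense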
Re: Guides to Creating RAPM
mystic wrote: That is an issue with GLMNET, but I could not figure out yet what the main reason is. When you run the calculation like I described in this thread ..., that's the result when using the bbv dataset for the 2012 RS (lambda = 3000):
...
When I use GLMNET, I also get "weird" results when using the same raw data (dataset from bbv). I noticed that before and just calculated RAPM via my own script (which isn't that hard to write, if you know matrix algebra and follow my description in the mentioned thread). Thus, I argue that this is not the result of you doing something wrong, but of something the GLMNET routine is doing that causes those weird results. It is not reasonable at all to get such high values for players with so few possessions, as GLMNET is producing.
In glmnet, set the standardize argument to FALSE. What's happening is that for players with small numbers of possessions, their indicator variables are being standardized to have the same variance as all other players' indicators. So those variables are scaled up, which means that smaller values of the coefficients for players with less playing time correspond to larger values of the coefficients for players with more playing time. So players with less playing time see their coefficients regularized less, in a sense.
So for example, do
Code:
fit = glmnet(x, y, alpha = 0, standardize = FALSE, ...)
Re: Guides to Creating RAPM
Glmnet definitely produces high values for players with low possessions. It uses an efficient method to provide a very accurate "estimate"; it's not the exact solution. Unfortunately, I don't know of any other package that includes a weights option for penalized regressions, as far as R goes.