permaximum wrote:
A Gravity Well wrote:
What are the alternatives to not including it as a variable?
Calculate HCA at the team level from the SRS home/away difference relative to the league average, but don't forget to include the effect of B2Bs. This way you'll have the data you need for every arena instead of being limited to the 2000+ PBP data (1996+ if you're lucky). Once you have HCA (per 100 offensive and defensive possessions) for all arenas, adjust the possession scores (the results vector J.E. pointed out above) in the regression, e.g. 300 (possession score * 100) - 2.4 (HCA) when the home team is on offense, or 400 + 2.4 (HCA) when the home team is on defense. This way you won't have separate HCA values for offense and defense, but you'll have a more reliable value.
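A minimal sketch of that adjustment, assuming each possession's points are scaled by 100 and that `hca_per100` (a hypothetical name) is the home team's HCA in points per 100 possessions. The sign flips depending on which side the home team is on, matching the 300 - 2.4 / 400 + 2.4 example.

```python
def adjusted_possession_score(points: int, home_on_offense: bool, hca_per100: float) -> float:
    """Scale one possession to a per-100 score and remove the home team's HCA.

    hca_per100 is a hypothetical input: the home team's HCA per 100 possessions.
    """
    score = points * 100.0  # e.g. a 3-point possession becomes 300
    if home_on_offense:
        # Home team scored with the help of home court: take the HCA back out
        return score - hca_per100
    # Home team was defending: the road offense was held down by HCA, so add it back
    return score + hca_per100


print(adjusted_possession_score(3, True, 2.4))
print(adjusted_possession_score(4, False, 2.4))
```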
BTW, the score/MOV/rating (whatever you call it) for each possession should always be positive in your regression if you're following what I described.
Edit: If I were you, I would just use the values J.E. found in this thread. There are HCA values for teams along with B2B and rest effects. Since he calculated them via one big regression (2002-2014, I guess), the sample size should be enough. Then I would do what I described above.
Edit2: It looks like those HCA values are per game, not per 100 possessions. So you simply need to compute "HCA*2/1.95". Remove the Charlotte Hornets from the list and use the Charlotte Bobcats' value. Then normalize those values. HCA values should never be negative.
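The cleanup steps could be sketched like this. The "HCA*2/1.95" conversion and the Charlotte handling come straight from the post; what "normalize" means is not spelled out, so shifting the values to hit a chosen league-average HCA (`league_mean`, a hypothetical parameter) is only one possible reading. The team values below are made-up placeholders, not J.E.'s numbers.

```python
def per_game_to_per100(hca_per_game: float) -> float:
    # Conversion exactly as given in the post: HCA * 2 / 1.95
    return hca_per_game * 2 / 1.95


def prepare_hca(team_hca_per_game: dict[str, float], league_mean: float) -> dict[str, float]:
    values = dict(team_hca_per_game)
    # Drop the Hornets entry so Charlotte is represented only by the Bobcats' value
    values.pop("Charlotte Hornets", None)
    converted = {team: per_game_to_per100(v) for team, v in values.items()}
    # ASSUMPTION: "normalize" = shift every team so the mean equals league_mean
    shift = league_mean - sum(converted.values()) / len(converted)
    normalized = {team: v + shift for team, v in converted.items()}
    if any(v < 0 for v in normalized.values()):
        raise ValueError("HCA values should never be negative after normalizing")
    return normalized


# Placeholder inputs for illustration only
hca = prepare_hca(
    {"Charlotte Bobcats": 2.6, "Charlotte Hornets": 3.1, "Denver Nuggets": 4.0, "Utah Jazz": 3.7},
    league_mean=3.0,
)
print(hca)
```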
If I'm understanding you and the fine posters here correctly:
To calculate HCA for each team over N years:
1/0/-1 dummy variables: 1 for Offense, -1 for Defense
Result vector of points per 100 possessions for the offensive unit of the matchup
b2b.rh.ot = second night of a back-to-back (b2b) of the road/home (rh) variety, first game finished in overtime (ot)
r.2.r = rest (r): last played two (2) days ago, last game played on the road (r)
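One way to generate those column names from a schedule, as a hypothetical helper. It assumes "rh" means the first game was on the road and the second at home, and that a back-to-back without overtime is simply named without the ".ot" suffix; neither detail is confirmed above.

```python
def rest_column(days_since_last: int, last_venue: str, today_venue: str, last_went_ot: bool) -> str:
    """Hypothetical helper: build a rest-variable name like 'b2b.rh.ot' or 'r.2.r'.

    last_venue / today_venue are 'r' (road) or 'h' (home).
    """
    if last_venue not in ("r", "h") or today_venue not in ("r", "h"):
        raise ValueError("venues must be 'r' or 'h'")
    if days_since_last < 1:
        raise ValueError("days_since_last must be at least 1")
    if days_since_last == 1:
        # Second night of a back-to-back: venue of game 1, then game 2 (assumed order)
        name = f"b2b.{last_venue}{today_venue}"
        return name + ".ot" if last_went_ot else name
    # Otherwise: rest, days since the last game, venue of that last game
    return f"r.{days_since_last}.{last_venue}"


print(rest_column(1, "r", "h", True))   # road game, then home game, first went to OT
print(rest_column(3, "r", "h", False))  # Atlanta in the example below
```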
First game: Boston Celtics @ Atlanta Hawks
Atlanta last played three days ago on the road, so r.3.r is switched on (1 when on offense, -1 when on defense)
Boston last played two days ago at home, so r.2.h is switched on (1 when on offense, -1 when on defense)
hca.o is 1 when Atlanta is on offense
hca.d is -1 when Atlanta is on defense
Atlanta tallies a 105.7 offensive rating
Boston tallies a 99.9 offensive rating
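The Boston @ Atlanta example, written out as the two regression rows it produces under this scheme. This is a sketch: team identity or player columns are left out, and the per-team names "ATL.hca.o"/"ATL.hca.d" are an assumed naming. Entries are summed rather than overwritten, which makes visible the problem J.E. raised: when both teams share a rest state, the +1 and -1 cancel to 0.

```python
from collections import defaultdict


def _row(entries: list[tuple[str, int]]) -> dict[str, int]:
    # Sum values per column, so identical rest states for both teams cancel out
    row: dict[str, int] = defaultdict(int)
    for column, value in entries:
        row[column] += value
    return dict(row)


def game_rows(home: str, home_rest: str, away_rest: str, home_ortg: float, away_ortg: float):
    """Two rows per game in the original 1/-1 scheme: one per offensive unit."""
    home_on_offense = _row([(home_rest, 1), (away_rest, -1), (f"{home}.hca.o", 1)])
    away_on_offense = _row([(away_rest, 1), (home_rest, -1), (f"{home}.hca.d", -1)])
    return [(home_on_offense, home_ortg), (away_on_offense, away_ortg)]


def to_matrix(rows):
    # Dense design matrix X and result vector y over all columns that appear
    columns = sorted({c for features, _ in rows for c in features})
    X = [[features.get(c, 0) for c in columns] for features, _ in rows]
    y = [result for _, result in rows]
    return columns, X, y


rows = game_rows("ATL", "r.3.r", "r.2.h", 105.7, 99.9)
columns, X, y = to_matrix(rows)
print(columns)
print(X, y)
```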
Modifications
J.E. mentioned not using 1/-1 for the same variable -- should rest effects, then, be split into offensive and defensive halves?
Don't use hca.o and hca.d for each team -- just use one column "hca", set to 1 when the home offense is on the court
Go possession by possession rather than game by game to account for gamestate considerations (More on this to come)
Travel effects, but implementation seems beyond arduous -- you would need each team's travel schedule to track whether they return home between road games four days apart, and then whether they arrive the day of the game, the day before, or earlier. Have columns that are bins of distance from home or from the last game played? 0-150, 151-300...
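A sketch of the modified rows, assuming one reading of the list above: rest states split into ".o" and ".d" columns coded 1 (never -1), and a single per-team "hca" column set to 1 only when the home team has the ball. The travel bins are a hypothetical helper using 150-mile widths; which team's travel attaches to which row is left open here, just as it is in the post.

```python
import math


def modified_game_rows(home: str, home_rest: str, away_rest: str, home_ortg: float, away_ortg: float):
    """Two rows per game, with offensive/defensive rest halves and one 'hca' column."""
    # Home team on offense: its rest state on offense, road team's rest state on defense
    home_on_offense = {f"{home_rest}.o": 1, f"{away_rest}.d": 1, f"{home}.hca": 1}
    # Road team on offense: no hca column, since it only fires for the home offense
    away_on_offense = {f"{away_rest}.o": 1, f"{home_rest}.d": 1}
    return [(home_on_offense, home_ortg), (away_on_offense, away_ortg)]


def distance_bin(miles: float, width: int = 150) -> str:
    """Hypothetical travel column name: 0-150, 151-300, 301-450, ..."""
    if miles < 0:
        raise ValueError("distance cannot be negative")
    if miles <= width:
        return f"travel.0-{width}"
    k = math.ceil(miles / width)  # 151..300 -> 2, 301..450 -> 3
    return f"travel.{(k - 1) * width + 1}-{k * width}"


print(modified_game_rows("ATL", "r.3.r", "r.2.h", 105.7, 99.9))
print(distance_bin(420))
```

Because the halves are separate columns, two teams in the same rest state no longer cancel each other out.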
And then
After getting the values for the rest effects and each team's home court over N years, when running RAPM for players in a specific year, adjust the result vector for each possession (or each lineup stint, i.e. series of possessions) by the previously found effects present for it: home court, rest effects, travel effects (?) and gamestate. (If using gamestate, likely just go possession by possession, as a stint can have multiple gamestates within it.)
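In a linear model, "adjusting the result vector" amounts to subtracting the fitted contribution of the already-estimated effects before the player regression sees it. A minimal sketch, assuming the effects are stored as a column-to-coefficient dict; the coefficient values below are made up for illustration.

```python
def adjust_result(raw: float, features: dict[str, float], effects: dict[str, float]) -> float:
    """Remove previously estimated effects (rest, HCA, travel...) from one result.

    Columns with no known effect (e.g. player columns) contribute nothing here.
    """
    nuisance = sum(effects.get(column, 0.0) * value for column, value in features.items())
    return raw - nuisance


# Hypothetical coefficients from the earlier multi-year regression
effects = {"r.3.r.o": 0.8, "r.2.h.d": -0.5, "ATL.hca": 2.4}
possession = {"r.3.r.o": 1, "r.2.h.d": 1, "ATL.hca": 1}
print(adjust_result(100.0, possession, effects))
```

The same function works per possession or per stint; for stints, `features` would hold the stint's averaged or summed indicators, which is where gamestate becomes awkward.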
To drill down further, run the same regression that found each team's home court advantage, but with a single league-wide home court advantage (offensive and defensive halves) instead of one per team -- not for the numbers themselves, but for the ratio between the two values. Then use that ratio to divide each team's HCA between offensive and defensive possessions when adjusting the result vector.
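The ratio split could look like this. It assumes both league-wide coefficients are expressed as positive point values (the home offense scores more, and the home defense allows less), and the numbers in the example are hypothetical. The adjustment signs follow the convention from the earlier post: subtract when the home team is on offense, add when it is on defense.

```python
def split_team_hca(team_hca: float, league_hca_o: float, league_hca_d: float) -> tuple[float, float]:
    """Split a team's total HCA into offensive/defensive parts by the league-wide ratio."""
    total = league_hca_o + league_hca_d
    if total <= 0:
        raise ValueError("league-wide HCA halves must sum to a positive value")
    share_o = league_hca_o / total
    return team_hca * share_o, team_hca * (1 - share_o)


def adjust_for_hca(score: float, home_on_offense: bool, hca_o: float, hca_d: float) -> float:
    # Home offense: remove its offensive HCA. Home defense: credit the road offense.
    return score - hca_o if home_on_offense else score + hca_d


# Hypothetical: league halves of 1.6 (offense) and 1.2 (defense), team total of 3.5
hca_o, hca_d = split_team_hca(3.5, 1.6, 1.2)
print(hca_o, hca_d)
print(adjust_for_hca(300.0, True, hca_o, hca_d))
```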