Kirk Goldsberry Article & Sloan Paper - Databall

Posted: Thu Feb 06, 2014 7:05 pm
by jkeen39
Interesting article from Kirk Goldsberry here about another approach to developing an all-in-one value for each player called EPV or Expected Possession Value: http://grantland.com/features/expected- ... analytics/

The paper behind the article, which Kirk worked on for the upcoming Sloan Conference (I haven't had a chance to read it yet, but I wanted to get the dialogue going and see what others thought), can be found here: http://www.sloansportsconference.com/wp ... l-Time.pdf

I really appreciated Kirk discussing this topic and the logic behind it makes a lot of sense, but we are clearly in the infancy stages of using SportVU data. Regarding the Spurs possession he breaks down, there are already holes to poke in the analysis. Kirk talks about the value of Duncan setting a good screen up top for Parker. Yet in the model, only Parker gets credit for the increase in EPV for his drive. Another problem is that the model seems to heavily weight what the person with the ball is doing, which would seem to favor guys that have the ball in their hands a lot (and in the paper, many of the top EPVA players are PGs or ball-dominant players like Jamal Crawford). At the end of the day, it seems like this model is trying to determine an overall value for each player (similar to PER or WARP), but I take issue with a model that doesn’t have LeBron or Durant in its Top 10 (yet has Jamal Crawford and Greivis Vasquez). Similarly, it has Kevin Love and Russell Westbrook as two of the worst players, even worse than horribly inefficient players like Evan Turner, Austin Rivers and Rudy Gay. Perhaps some of our current “advanced statistics” are overvaluing these guys, but I have a hard time seeing how this initial attempt is an improvement on PER, ORtg, or WARP. The article does clarify that it is based on 2012-13 SportVU data, which only some teams had, and therefore sample sizes for some players (including LeBron) were limited. It will be interesting to see what the results bear out if they re-run the model on the 2013-14 season.

Anyways, here are the top 10 players from the 2012-13 season in "EPVA":

1. Chris Paul 3.48
2. Dirk Nowitzki 2.60
3. Deron Williams 2.52
4. Stephen Curry 2.50
5. Jamal Crawford 2.50
6. Greivis Vasquez 2.46
7. LaMarcus Aldridge 2.40
8. Steve Nash 2.09
9. Wesley Matthews 2.06
10. Damian Lillard 1.95

And here are the Top 10 Worst players:
1. Ricky Rubio -3.33
2. Kevin Love -2.38
3. Russell Westbrook -2.07
4. Evan Turner -1.90
5. Austin Rivers -1.84
6. Rudy Gay -1.75
7. Jrue Holiday -1.51
8. Paul George -1.49
9. Chris Singleton -1.48
10. Roy Hibbert -1.44

The other big issue (and perhaps this is their next step) is that the model doesn’t delve into assigning value on the defensive side and I think that area is going to be even more challenging in determining the value each player delivers. Simple things like having a foot in the lane and helping off your guy provide value in stopping a drive but will be difficult to quantify. That being said, I think we have a number of data points from SportVU and other sources that can clarify the value provided by a player on the defensive side. And perhaps the idea of coming up with another all-in-one model isn't the best approach. Even such a model that provided a player's overall value (both offensively and defensively) wouldn't necessarily tell you what aspects of their skills on offense and defense would fit on another team (if the aim was trying to understand how trades, free agents, etc. would impact a team).

Overall, I think the concept is sound and I'm sure Kirk and others are working on fine-tuning it. I think the current value at the team level would be understanding what passes to try to either stop or to encourage the other team to make, leading to a lower EPVA. However, I think teams already know a lot of this stuff and many of the best defensive teams focus on forcing mid-range shots, taking away Corner 3's, etc. Anyways, I do appreciate the discussion and think it will be interesting where this heads as teams and outside analysts have more time to comb through the SportVU data.

Re: Kirk Goldsberry Article & Sloan Paper - Databall

Posted: Thu Feb 06, 2014 7:11 pm
by Crow
I wrote this last night. It repeats some of the points just made but I'll post it as is to save time.

The Pointwise paper for Sloan uses spatial data analysis with a calculation of expected point value based on the available options to find EPV-added. (It shares much of the goal I had in this thread viewtopic.php?f=2&t=8462 but uses a different technique. It also shares much of the goal I had several years ago when I wondered conceptually about trying to use specialized software (and perhaps some calculus) to capture team player movement and assign value for micro-level movements / actions.) I am glad to see some folks with the right skills, tools and opportunity to actually dig in and produce a good start on what could be a gold mine path. The amount of human and computing resources used explains why it has been out of reach for all but the most ambitious and provisioned.

The article focuses on value of possessions used. It does not get at a player’s personal off the ball value though I think the principles and techniques laid out could do so (i.e. comparing the value of passing to a player in a spot and moment to the league average value for passes for all positions and players). It should be able to handle the decision to prepare help defense or not for defensive EPV-added as well.

The article suggests that it considered closeness of the defender in computing EPV of shots from spots. I would like to hear more about that. In the long run there is a need to have the EPV-added for defense fully developed, including for help defense, because the value of a defender being close to his man is linked to the value of the help defense he could provide by being at a different distance: less close to his man and closer to other players. The value of a player off the ball being in one spot or another is also affected by what it does to the defensive player’s options and EPVs in slightly or very different spots. The chance of an uncontested or lightly contested shot can be affected a lot by very small movements of offensive and / or defensive players.

The article states that “Standard techniques such as regression, analysis of variance, and generalized linear models are ill-suited to these problems.” Maybe so but I still think there might be some value trying to use them to generalize the value of being in certain spots at certain times.

We are some distance from getting at overall value which would be the sum of EPV-added from possessions used, EPV-added from off the ball offensive action and EPV-added from defense (on the ball and help). I’m hopeful that the authors of the article will get there and publish the results publicly but there is of course the chance that this level of achievement will be brought and kept private thru consultation with a team or teams. Eventually these values (the roll-up and the parts) need to be compared to RAPM values. It is possible that EPV-added could help identify significant errors in specific player RAPM estimates and that RAPM and its factor level parts might be able to help find and better characterize important elements of a player’s performance not fully defined by EPV-added studies.

I’d think it would be useful to analyze set plays. The 10-40 most common by a team or league as a whole. After looking at the actual plays and the EPV movement, I would think one could develop more reliable coaching advice about what players should do on the ball and off on those plays and consider major breaks / innovations from the designed play that increased EPV. On defense I wonder if this pathway will eventually be able to tell a team’s analysts and coaches how much of a difference in EPV-added there is for a player being 1-2 feet in any direction on defense when facing a certain play or in general terms and which of the thousands of such differences that the coaches and players should most focus on.

Shot satisfaction findings should be an important addition to the discussion of the efficiency of “shot-creators”. I agree that pass satisfaction is also important to pursue, especially for guys like Rubio and Rondo.

With Rubio and Love being #1 and #2 on negative EPV-added behavior for a total of almost –6, it would seem to be appropriate to ask if their actual basketball IQ upon which they base their behavior is usually bad or if their coaching is usually bad for them or both?

Westbrook as 3rd worst on EPV-added. Would knowing that change anything Presti, Brooks and Westbrook are doing / not doing with his FG frequency and passing behavior? It probably should even if he has overall strong offensive RAPM playing the way Russell wants / knows how to play.

Evan Turner 4th worst? Will anyone give anything worth much for him in this trade market? I would not give much value to get him or give him a large role.

George and Hibbert both in the 10 worst. Still work to do there.

Re: Kirk Goldsberry Article & Sloan Paper - Databall

Posted: Thu Feb 06, 2014 9:04 pm
by steveshea
Eventually, I believe the authors will have to address the underlying set of instances, and maybe partition the space into the meaningful instances and others. This will be very challenging, but I worry that players' EPVs are being unfairly damaged by passes early in a play. Consider the following example. At the beginning of a play, Rubio might pass the ball to a Pekovic or some other non 3-pt threat at the top of the key, not because Rubio believes he's setting Pekovic up to score, but because it's the first pass in a sequence of events that are designed to result in a high quality shot. Is Rubio accruing negative points for this exchange? Might this type of ball movement be hurting the Indiana duo in the bottom 10 as well?

In general, are there instances where taking a slight decrease in EPV is likely to improve future EPV in the possession? As another example, suppose a player gets the ball on the block and anticipates a double team. After a kick out, the wing player dumps the ball back to the block where the player is single-covered. If the player on the perimeter is not a good 3 point shooter, this might result in a negative for the man on the post (for the kickout). I'm not sure.

As stated above, one solution may be to attempt to rule out these misleading instances/decisions (by partitioning the space). I have some ideas as to how to do this objectively, but every idea I come up with, I then poke several holes through.

Another possibility is to go away from the Markovian model, and instead look for something that is future variable length (or sometimes called finitarily) Markov. In such a model, you could choose not to assign any individual a very negative score for a decision during a possession that eventually led to a high quality shot. This would address any negative coming from the first few passes of a well-designed play.
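To make the concern concrete, here is a rough sketch contrasting memoryless, per-action credit with a crude outcome-aware variant. Everything in it is hypothetical: the possession, the EPV numbers, the "Wing" player, the 1.2 threshold, and both function names are invented for illustration. It is not the paper's model, and it is not a true variable-length Markov model either, only a toy showing how the two accounting rules differ.

```python
from collections import defaultdict

# Hypothetical possession: (decision maker, EPV before action, EPV after action).
# All numbers are made up for illustration.
possession = [
    ("Rubio", 1.00, 0.93),    # opening pass to the top of the key
    ("Pekovic", 0.93, 1.05),  # handoff that starts the set
    ("Wing", 1.05, 1.30),     # drive and kick to an open shooter
]

def per_action_credit(actions):
    """Memoryless credit: each player gets the EPV change of his own action."""
    credit = defaultdict(float)
    for player, before, after in actions:
        credit[player] += after - before
    return dict(credit)

def outcome_aware_credit(actions, good_shot_epv=1.2):
    """Assumed alternative: if the possession ends in a high-value state,
    actions that temporarily lowered EPV are not penalized."""
    final_epv = actions[-1][2]
    credit = defaultdict(float)
    for player, before, after in actions:
        delta = after - before
        if final_epv >= good_shot_epv and delta < 0:
            delta = 0.0  # forgive the early dip
        credit[player] += delta
    return dict(credit)

memoryless = per_action_credit(possession)
forgiving = outcome_aware_credit(possession)
print("per-action:   ", {p: round(v, 2) for p, v in memoryless.items()})
print("outcome-aware:", {p: round(v, 2) for p, v in forgiving.items()})
```

Under the first rule Rubio is docked for the opening pass; under the second he is not. The catch is that the second rule no longer sums to the possession's total EPV change, which is one of the holes any such fix has to deal with.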

Re: Kirk Goldsberry Article & Sloan Paper - Databall

Posted: Thu Feb 06, 2014 10:48 pm
by Mike G
Ugh

Re: Kirk Goldsberry Article & Sloan Paper - Databall

Posted: Thu Feb 06, 2014 11:32 pm
by bchaikin
The other big issue (and perhaps this is their next step) is that the model doesn’t delve into assigning value on the defensive side and I think that area is going to be even more challenging in determining the value each player delivers.

ricky rubio plays 31 min/g, shoots 36% on 2s, 34% on 3s, but takes just 7.7 FGA/g, 2.8 FTA/g, i.e. that's only 9.1 ScOpp/g (scoring opportunities per game)...
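The 9.1 figure is consistent with counting each free-throw attempt as half a scoring opportunity. That weighting is an assumption inferred from the numbers quoted above, not a definition given in the post. A quick check:

```python
# Assumption (not stated in the post): one scoring opportunity is one FGA
# or a pair of FTA, so ScOpp = FGA + 0.5 * FTA.
def scoring_opportunities(fga: float, fta: float, ft_weight: float = 0.5) -> float:
    """Per-game scoring opportunities under the assumed FTA weighting."""
    return fga + ft_weight * fta

rubio_sc_opp = scoring_opportunities(fga=7.7, fta=2.8)
print(f"{rubio_sc_opp:.1f} ScOpp/g")
```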

when i run a simulation with him playing 32 min/g, the t-wolves go 48-34...

when i change his 2pt FG% to 50%, and his 3pt FG% to 40%, and rerun the simulation, the t-wolves go just 3-4 wins better, finishing anywhere from 51-31 to 52-30. not a huge difference, but again he simply does not shoot that much...

yet he also leads the league with 8.4 ST/100min when the league average PG gets just 3.8 ST/100min - that's 215-220 steals playing 32 min/g and 82 games...
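The season total follows directly from the quoted rate. A short sketch of the arithmetic, using only the figures in the post (32 min/g, 82 games, steals per 100 minutes); the function name is just for illustration:

```python
# Figures quoted in the post; the rate is read as steals per 100 minutes played.
MINUTES_PER_GAME = 32
GAMES = 82

def season_steals(steals_per_100_min: float) -> float:
    """Project a full-season steal total from a per-100-minute rate."""
    total_minutes = MINUTES_PER_GAME * GAMES  # 2624 minutes
    return steals_per_100_min * total_minutes / 100

rubio = season_steals(8.4)
average_pg = season_steals(3.8)
print(f"rubio: {rubio:.0f}, average PG: {average_pg:.0f}, gap: {rubio - average_pg:.0f}")
# prints "rubio: 220, average PG: 100, gap: 121"
```

So the 8.4 rate lands at the top of the 215-220 range, roughly 120 more steals over a season than a league-average PG in the same minutes.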

when i change his steal rate from 8.4 to 3.8, and rerun the simulation (with him shooting his current 36% on 2s and 34% on 3s) the team finishes anywhere from 44-38 to 45-37, 3-4 wins worse...

so this "huge" change in his shooting - 36%->50% on 2s and 34%->40% on 3s - is equivalent in wins, about 3-4 wins, to a drop in steal rate from 8.4 to 3.8 ST/100min...

any time we address the capabilities of an individual player in terms of wins generated compared to other players, we need to address all that he does, and not just concentrate on one "bad" aspect of his game...

even if someone was to say "...well, if rubio shot better he would then likely take more shots than just 7.7 FGA/g...", which is most likely true, we can still model this, by simply adjusting in the simulation how often he shoots per touch...

but the fact is that his high rate of steals - plus his excellent defensive rebounding for a PG, and overall man defense - often gets overlooked because it's easy to pick on his awful shooting...

since 1977-78, looking at all PGs ages 21-23 that played at least 1000 total minutes in the nba (214 PGs), ricky rubio has the 6th highest per minute defensive rebounding rate, and the 7th highest steal rate...

since just 1989-90, looking at all PGs ages 21-23 that played at least 1000 total minutes in the nba (138 PGs), ricky rubio has the 4th highest per minute defensive rebounding rate, and the best/highest steal rate...

so he certainly has a ton of value in terms of wins generated over and above the multitude of other PGs that in the same age range shot similar overall (sebastian telfair, troy hudson, frank johnson, etc.)...

also rubio has shot a 46.7% ScFG% (2s, 3s, and FTs) in this age range, jason kidd shot a 46.8% ScFG%...

Re: Kirk Goldsberry Article & Sloan Paper - Databall

Posted: Thu Feb 06, 2014 11:36 pm
by Bobbofitos
Mike G wrote:Ugh
Agree w/ this.

I was very excited for Grantland when I saw the title, but there is a lot of work to be done. The results spat out seem so wrong.

Re: Kirk Goldsberry Article & Sloan Paper - Databall

Posted: Fri Feb 07, 2014 3:47 am
by Bobbofitos
X-post from another forum, since I think it does a great job of showing why this metric is very flawed...
Neither.

EPVA's correspondence with offensive value is (or could be) hurt by its dependence on the distribution of situations a player finds himself in.

Imagine a player who is capable of off-the-ball teleportation (!). As such, he attempts ten assisted wide-open corner 3s a game. Unfortunately, he's not a particularly great shooter, knocking down those looks at "only" a 34% clip--4.2% below league average. He's docked 1.26 EPVA per game on this set of plays alone, despite the fact that his ability to help generate additional >1.02 EPV possessions at will--many of which could be bail-outs--is great for his team and very valuable (though much less so when they're playing the Tallahassee Time-Freezers). He'd perform better according to EPVA if he avoided these favorable situations altogether. Rubio, unguarded and near the rim, could improve his EPVA by flailing his arms vigorously in a don't-give-me-the-ball manner, or by somersaulting out of the path of incoming passes.
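The numbers in the teleporter example can be reproduced with a few lines. This assumes, as a simplification, that a shot's EPVA is the shooter's expected points minus the league-average expected points for the same look; the 38.2% league average is implied by "34% ... 4.2% below league average" rather than quoted directly.

```python
# Inputs from the example above; LEAGUE_PCT is implied, not quoted.
ATTEMPTS_PER_GAME = 10
POINTS_PER_MAKE = 3
PLAYER_PCT = 0.34
LEAGUE_PCT = PLAYER_PCT + 0.042

def expected_points(make_pct: float) -> float:
    """Expected points of a single 3-point attempt."""
    return POINTS_PER_MAKE * make_pct

player_epv = expected_points(PLAYER_PCT)  # value of each look he actually takes
league_epv = expected_points(LEAGUE_PCT)  # baseline the shot is judged against
epva_per_game = ATTEMPTS_PER_GAME * (player_epv - league_epv)

print(f"points per attempt: {player_epv:.2f}")            # 1.02
print(f"EPVA per game on these shots: {epva_per_game:.2f}")  # -1.26
```

Each attempt is worth 1.02 points, a strong possession by any standard, yet the player is charged 1.26 per game because the comparison is to the average shooter in that spot rather than to the average possession.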

EPVA is okay at the specific thing it can do, but it's purportedly a measure of "offensive value" and a means of quantifying decision-making ability*. All of this might sound like just a semantics argument, but I'd have a gripe with RAPM too, given its current aims, if I found out the formula was "[miles traveled on horseback per year]^2 - 10."


*The title of section 3.1 is "EPV-Added: Does Chris Paul make better decisions than the league-average player?"
(different author)

Re: Kirk Goldsberry Article & Sloan Paper - Databall

Posted: Fri Feb 07, 2014 3:53 am
by v-zero
Another obvious issue to add here is that what a player does in possession N may well not only impact that possession, but also future possessions whose value depends on what the defense expects of a player given their recent behaviour.

Re: Kirk Goldsberry Article & Sloan Paper - Databall

Posted: Fri Feb 07, 2014 4:39 am
by sethypooh21
Agree with most of the critiques, but we should definitely keep in mind this is the first iteration of something, rather than a finished product. I was thinking about something similar with respect to game-state EV just the other day as a way to measure shot selection (or specifically quantify bad shot selection), and this model, while not perfect in any way, moves a little in that direction.

Unfortunately, since it's Grantland/Goldsberry that published, we're in for weeks of "well I read Ricky Rubio is the worst offensive player in the league" stuff.

Re: Kirk Goldsberry Article & Sloan Paper - Databall

Posted: Fri Feb 07, 2014 4:40 am
by sethypooh21
v-zero wrote:Another obvious issue to add here is that what a player does in possession N may well not only impact that possession, but also future possessions whose value depends on what the defense expects of a player given their recent behaviour.
This is a great point too. "Why wouldn't Steph Curry just launch a top-of-the-arc 3 on 100% of the possessions he dribbles down?" is a question that gets asked.

Re: Kirk Goldsberry Article & Sloan Paper - Databall

Posted: Fri Feb 07, 2014 7:09 am
by mtamada
Mike G wrote:Ugh
Gotta agree. I'm reminded of DB's results that showed Dennis Rodman was the best player in the NBA. But he was merely running regressions in a different functional form rather than doing pathbreaking analysis, and moreover made the mistake of believing his own models. (The econometrician Henri Theil once wrote: "Models are to be used, not believed.")

Goldsberry's group is doing pathbreaking analysis (but, so is every person who's working with the new video data), and he's done good work before, and hopefully they will not fall into the trap of believing their own numbers. Instead they will hopefully keep working on and tweaking and improving their models.

Bottom line: their techniques for assigning credit for raising or lowering EPV have flaws that one could drive a truck through. Resulting in useless stats. That doesn't mean that their work is useless; every bit of research utilizing the new data is going to be flawed, tentative, experimental, etc. Eventually we will get better models and better results, thanks to the work of Goldsberry and all the others working on these data.

But those results are just not ready for prime time, or even semi-prime time. If Goldsberry were a professor (which ironically enough, he is), that's the kind of result that you share with your buddy down the hall, and then go back to the computer to work some more until you get something useful. I wouldn't even put results like that into a working paper, nor share them in a seminar, and certainly not present them at a national conference. Get better results that are at least semi-useful, and then start sharing them outside your department.

It's exciting work, but the results are still half-baked. Or maybe even just quarter-baked.

Re: Kirk Goldsberry Article & Sloan Paper - Databall

Posted: Fri Feb 07, 2014 5:29 pm
by bbstats
This sort of highlights the benefits of regression-style player rating systems: they don't really require *any* theory, and give us reasonable results. What professor Kirk has made here is 100% "theory-based," in the same vein as ORTG, DRTG, etc.

I think the main issue here, as you all have been pointing out, is that his theory-based model has some huge caveats (i.e. giving all credit to the scorer, none to the screener etc).

This is why I really like models that combine theory and regression, i.e. BoP SPM, A4PM etc.

Personally, having 2013 Russell Westbrook at the bottom of my metric would make me IMMEDIATELY re-evaluate my metric...I think a big problem with the metric is saying that a player's decision 100% impacts their rating. What if something is a set play that the coach always runs for an inefficient chucker like Monta Ellis? It wouldn't be fair to dock the passer points there...

So perhaps the larger problem of why this metric fails somewhat is its inability to be predictive. One could argue that yes, the expected values changed EXACTLY as Goldsberry has measured...however, an inability to predict future possessions makes it worth much less as a metric.

/my2cents

Re: Kirk Goldsberry Article & Sloan Paper - Databall

Posted: Fri Feb 07, 2014 7:10 pm
by AcrossTheCourt
Years ago when people were first developing models, there was a "laugh test." If Shaq wasn't at the top, then your model was wrong. I think we can apply that here with LeBron (and Durant.)

And how do you not assign credit for the screener with SportVU data? That seems like one of the most basic, coolest features of having that spatial data. I'll assume it will be included with the next model because that's one of the stats I really wanted with SportVU.

Goldsberry really loves his ShotScore metric, which gives credit for shooting above the average from zones rather than for actual points. That looks like the main problem here, and it's why Westbrook (below average shooter in many zones but still gets to the rim/line a lot) is penalized while Chris Paul and Dirk (elite midrange shooters) are at the top. I understand hitting a tough midrange shot when the clock is winding down is valuable, but it appears he's giving too much credit here.

Re: Kirk Goldsberry Article & Sloan Paper - Databall

Posted: Fri Feb 07, 2014 9:09 pm
by KirkG
Be nice you guys... I'm delicate. But, yeah, thanks for reading, we were careful to really temper our enthusiasm about the "results" and try to emphasize the ideas... We view this as a nice first step and not some kind of panacea.

The model is full of shortcomings right now, and everyone involved remains unsatisfied. Eventually - like ASAP - we need to consider the impacts of off-ball players and individual defenders not to mention 1,000 other things.

Still I'm really proud of the Grantland piece and the Sloan article... you guys are the best, and if you ever have ideas or suggestions or criticism, I'm all ears. For those coming to Sloan this year, please use that opportunity to tell me what we can do better.

kg

Re: Kirk Goldsberry Article & Sloan Paper - Databall

Posted: Fri Feb 07, 2014 10:45 pm
by sethypooh21
Thanks for responding Kirk. I think the biggest problem is only sort of in your control - given the podium you have, the "results" are going to be amplified and refracted across the basketball internet. "Ricky Rubio, worst offensive player in the league" is now sort of a meme. I understand why you have to include examples in the article, but it still leaves a bad taste that these things quickly become conventional wisdom among a certain stratum of observers.