APBRmetrics

The discussion of the analysis of basketball through objective evidence, especially basketball statistics.

All times are UTC




Posted: Fri Apr 15, 2011 10:44 am

Joined: Thu Apr 14, 2011 11:10 pm
Posts: 2245
dsparks



Joined: 22 Feb 2008
Posts: 61


Posted: Mon Feb 25, 2008 6:59 am Post subject: NBA playing style similarity network diagram
Hello all,
I am wondering if I could have some feedback on my work on the network diagrams here: http://arbitrarian.wordpress.com/2008/0 ... -networks/

There are links to methodological details within that post. I am under the impression that others have used factor analysis to isolate statistics on which to do matching, but I was attempting to go as "objective" as possible. I would sincerely appreciate any commentary you may have, including:
a) Do the comparisons seem generally valid?
b) Are there comparisons that surprise you, but on reflection, make sense?
c) Are there comparisons that are so counterintuitive that they couldn't possibly be valid?

Thank you for your time.
David


http://arbitrarian.wordpress.com
Mike G



Joined: 14 Jan 2005
Posts: 3615
Location: Hendersonville, NC

Posted: Mon Feb 25, 2008 9:01 am
Fascinating, to say the least. At first glance, it looks 3-D. Of course, there are even more D's than that.

Mostly I'm seeing intuitive matches. Browsing the perimeter, we see rebounders who don't score, scorer-rebounders who don't pass, etc.

Near Larry Bird, I find Kevin Garnett and Julius Erving. OK. But there's Tayshaun Prince -- other than height, almost the opposite player. Statistically-similar Grant Hill is clear over on the far side of the field.

Following this link:
http://arbitrarian.wordpress.com/2008/0 ... -matching/
... you've described a formula:

# Generate set of names and all boxscore counting statistics.
# For each player in the set, generate a set of ratios for each boxscore stat over every other.
# Percentile these, so that ratios with typically low values (e.g. pts/fga) are not outweighed by those with typically high values (e.g. min/bk).
# Find the Euclidean distance between each pair of players in n^2-space, by finding the square root of the sum of squared differences between each player’s ratios
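The four quoted steps can be sketched in Python roughly as follows. This is only a minimal sketch under stated assumptions, not the blog's actual code: the three players and their totals are invented, the percentile definition (share of the pool at or below a value) is one simple choice among several, and zero totals are not handled.

```python
from itertools import permutations
from math import sqrt

# Hypothetical career totals, invented purely for illustration.
players = {
    "A": {"pts": 2400, "tr": 1000, "as": 600},
    "B": {"pts": 1200, "tr": 500, "as": 300},    # half of A's output, same mix
    "C": {"pts": 2400, "tr": 300, "as": 1500},   # A's points, different mix
}

def stat_ratios(stats):
    """Ratio of each stat over every other stat (assumes no zero totals)."""
    return {(a, b): stats[a] / stats[b] for a, b in permutations(stats, 2)}

def percentile_ranks(values):
    """Share of the pool at or below each value (an assumed percentile definition)."""
    return [sum(v <= x for v in values) / len(values) for x in values]

def ratio_distances(pool):
    names = list(pool)
    raw = {name: stat_ratios(pool[name]) for name in names}
    profile = {name: [] for name in names}
    for key in raw[names[0]]:
        # Percentile each ratio across the pool so large-valued ratios do not dominate.
        ranks = percentile_ranks([raw[name][key] for name in names])
        for name, rank in zip(names, ranks):
            profile[name].append(rank)
    # Euclidean distance between every pair of percentile profiles.
    return {
        (p, q): sqrt(sum((x - y) ** 2 for x, y in zip(profile[p], profile[q])))
        for p in names
        for q in names
    }

distances = ratio_distances(players)
print(distances[("A", "B")])  # 0.0: identical ratios despite half the volume
print(distances[("A", "C")])  # greater than zero
```

Players A and B come out at distance zero because only the mix of stats enters the calculation, not the volume.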


So, if Prince gets 2.4 times as many points as rebounds, he is considered similar to Bird (in this ratio); even if one player is a 24-10 producer, while the other is a 12-5 guy. Am I understanding that correctly?

I also do similarities using boxscore stats and Euclidean distance. I just use per-minute rates, pace-standardized. A 24-10 player (per 36) looks more similar to a 25-8 player than to a 12-5 player. In a game, they're more interchangeable.
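As a small worked example of the per-minute comparison, using only the points and rebounds figures quoted above (a sketch; pace standardization and the remaining boxscore stats are left out):

```python
from math import dist

# Points and rebounds per 36 minutes, from the example in the post.
player_a = (24, 10)
player_b = (25, 8)
player_c = (12, 5)

print(dist(player_a, player_b))  # sqrt(1 + 4), about 2.24
print(dist(player_a, player_c))  # sqrt(144 + 25) = 13

# The ratios method sees A and C as identical on this one ratio.
print(player_a[0] / player_a[1] == player_c[0] / player_c[1])  # True
```

On per-minute rates the 24-10 player sits far closer to the 25-8 player, while on the points-to-rebounds ratio alone he is indistinguishable from the 12-5 player.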
_________________
`
36% of all statistics are wrong
Ed Küpfer



Joined: 30 Dec 2004
Posts: 787
Location: Toronto

Posted: Mon Feb 25, 2008 9:33 am
Thank you, I've been waiting for someone to get on this. Miscellaneous observations:

* In one of your posts, you wrote

dsparks wrote:
... trying to come up with a way to estimate players’ positions from the data. I did come up with a pretty novel method, which works about 70% of the time...

Can you describe more precisely what the 70% means?

* From experience, I know the "percentalization" method of normalisation works fine. However, I am concerned about the choice of inputs, which is something I had trouble with. For example, you have

dsparks wrote:
“Big” = (tr+bk)/(fga+tr+as+st+bk)

If you add turnovers to that, would it change the distances significantly? I would have more confidence if you validated by substituting (or differently weighting) other stats. (Wait, now I read this, which seems to account for it. I'm a little confused now as to what the Big formula is for.)

* I love the output graph, but the names are hard to read. Can you make the font a different colour than the nodes?


_________________
ed
dsparks



Joined: 22 Feb 2008
Posts: 61


Posted: Mon Feb 25, 2008 10:58 am Post subject: Thanks for responses
First of all, thanks a lot for responding. This is just a free-time hobby for me, and while I think it's interesting, I don't really have the level of basketball (or statistical) knowledge I would like to have, so I appreciate this community's feedback and interest.

Quote:

So, if Prince gets 2.4 times as many points as rebounds, he is considered similar to Bird (in this ratio); even if one player is a 24-10 producer, while the other is a 12-5 guy. Am I understanding that correctly?


Using the ratios method I described, yes, this is correct.

Quote:

I also do similarities using boxscore stats and Euclidean distance. I just use per-minute rates, pace-standardized. A 24-10 player (per 36) looks more similar to a 25-8 player than to a 12-5 player. In a game, they're more interchangeable.


This is a good idea. I went ahead and made a network diagram using (un-pace-adjusted) per-minute statistics, and posted that here: http://arbitrarian.wordpress.com/2008/0 ... revisited/

The matches look as good as or better than those from the ratios methodology, I would say, especially if you're looking for "substitutability" comparisons. Here, your 24/10 guy looks more like a 25/8 guy than a 12/5 guy. Bird is still most closely matched with Garnett and Webber (which I love, because I think all-around guys like that are just awesome--Bird is one of LeBron's closest matches, too), and Prince is way far away. The only downside is that you lose the "fun counterintuitives" like Mullin and Jordan's proximity. Also, I think there is something to be said for the idea that, using the ratios method, you're getting two players who have a similar probability distribution over what they will do when given a possession, regardless of per-minute efficiency. However, I'm glad you suggested the per-minute method--it looks good, and I love an excuse to make another diagram.

Quote:
In one of your posts, you wrote

dsparks wrote:
... trying to come up with a way to estimate players’ positions from the data. I did come up with a pretty novel method, which works about 70% of the time...

Can you describe more precisely what the 70% means?


This quote refers to something that's not published on the blog, but can be seen here: http://www.duke.edu/~dbs9/envisioning/f ... itions.txt
The p0s column is an estimated position, based on "ideal types". I don't want to get all into the details now, but the methodology is similar to the ratio matching method. Mostly, I was just messing around, but the results were interesting enough to at least publish in part. What I mean by 70% is that in the small sample I link to in the .txt above, I believe I have a 70% accuracy rate in correctly identifying each player's dominant position. Again, I haven't fully vetted this, but if there's enough interest, I'd be willing to investigate it further.

Quote:
dsparks wrote:
“Big” = (tr+bk)/(fga+tr+as+st+bk)

If you add turnovers to that, would it change the distances significantly? I would have more confidence if you validated by substituting (or differently weighting) other stats. (Wait, now I read this, which seems to account for it. I'm a little confused now as to what the Big formula is for.)


The Big, Shooter, and Guard characterizations, as well as the color scheme in general, are just that: used for the color scheme and not the analysis. Raw box score stats are used to generate the ratios and proximities. However, I think that the RGB scheme does a pretty good job of highlighting playing style, broadly construed, and I think the aesthetic is extremely compelling. I would be willing to listen to other suggestions for color-classification, although I'm pretty smitten with this one, especially given that I'm not basing any real conclusions on it.
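For illustration, the "Big" share quoted above could be computed and turned into a color channel along these lines. This is a minimal sketch: the two stat lines are invented, and the 0-255 scaling is an assumption, since the thread does not say how the shares are mapped to colors or give the Shooter and Guard formulas.

```python
def big_share(tr, bk, fga, as_, st):
    """'Big' = (tr + bk) / (fga + tr + as + st + bk), as quoted in the thread."""
    total = fga + tr + as_ + st + bk
    return (tr + bk) / total if total else 0.0

def to_channel(share):
    """Map a 0..1 share onto a 0..255 color channel (an assumed scaling)."""
    return round(255 * share)

# Hypothetical career totals, invented for illustration.
center = dict(tr=8000, bk=1500, fga=9000, as_=1000, st=500)
guard = dict(tr=2000, bk=100, fga=9000, as_=5000, st=1200)

print(big_share(**center), to_channel(big_share(**center)))
print(big_share(**guard), to_channel(big_share(**guard)))
```

The rebounding, shot-blocking center gets a much stronger "Big" channel than the guard, which is the kind of contrast the color scheme is meant to show.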

Quote:
I love the output graph, but the names are hard to read. Can you make the font a different colour than the nodes?

If you go to my update post here: http://arbitrarian.wordpress.com/2008/0 ... revisited/, at the bottom, I've made a high-contrast version just for you.
Mountain



Joined: 13 Mar 2007
Posts: 1527


Posted: Mon Feb 25, 2008 1:41 pm
Glad you made it here. I'd been by your site a few times in the past and noted it. I came back to see the recent work and almost posted a link to it, but I haven't yet blown up the charts to where I can read them, or gone through all the supporting documents. I may comment later.
dsparks



Joined: 22 Feb 2008
Posts: 61


Posted: Mon Feb 25, 2008 7:46 pm Post subject: Another diagram, with a twist
I hope I'm not getting redundant, but I have put together yet another diagram--this time loosening the network bonds a little bit, to produce a multitude of smaller clusters. If you're not bored yet of the whole idea, you might find it interesting:


The microcosmic NBA petri dish: http://arbitrarian.wordpress.com/2008/0 ... etri-dish/
_________________
David

http://arbitrarian.wordpress.com
Harold Almonte



Joined: 04 Aug 2006
Posts: 616


Posted: Mon Feb 25, 2008 8:03 pm
I think you need to put some Cartesian order to that dish.
Mountain



Joined: 13 Mar 2007
Posts: 1527


Posted: Tue Feb 26, 2008 12:47 am
There are some significant threads showcasing some of Ed's work with similarity studies, cluster or factor analysis, and what have you in the back pages of the forum. He can point you to them, or you can find them by searching his posts.

But for what it may be worth in response to this statement in your most recent article:

"I would be very interested in collectively coming up with a sort of “baller’s taxonomy,” wherein we try and identify the different clusters using some more subjective terms. I think we could come up with a better vocabulary to describe players and define playing styles. If you have any ideas, please put them in the comments"

this thread had my earlier attempt at a list of player types / labels and some discussion with Ed and others that you might find interesting or of use.

http://tinyurl.com/2grwce
findingneema



Joined: 25 Feb 2008
Posts: 34
Location: Atlanta, GA

Posted: Fri Feb 29, 2008 2:42 pm
So I decided to try a neural net clustering of these distances, using the correlations between each of the players. It seems like a 4x3 model fits pretty well. Some highlights:

12 nodes, having ([13 67 102] [6 24 12] [44 33 20] [92 59 28]) members

nodes (1,2) and (1,3) are the most similar, comprising 169 players
nodes (4,1) and (4,2) are the next most similar, comprising 151 players

2 broad classes are formed, a big men class, comprising nodes (1,1),(1,2),(1,3),(2,2),(2,3) and the not big men being the remainder

node (1,1) is the node furthest from any other, with no big name players (some members include Derrick McKey, Tom Gugliotta, and George Lynch)
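The posted counts can be cross-checked with a few lines of Python. This is a sketch that assumes the four bracketed groups are rows 1-4 and the entries within each are columns 1-3, which is the reading that reproduces the 169 and 151 totals quoted above.

```python
# Member counts as posted, read as rows 1-4 by columns 1-3 (an assumed layout).
members = [
    [13, 67, 102],
    [6, 24, 12],
    [44, 33, 20],
    [92, 59, 28],
]

def size(row, col):
    """Member count of node (row, col), using the post's 1-based labels."""
    return members[row - 1][col - 1]

total = sum(sum(row) for row in members)
big_nodes = [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3)]
big_men = sum(size(r, c) for r, c in big_nodes)

print(total)                      # 500
print(size(1, 2) + size(1, 3))    # 169
print(size(4, 1) + size(4, 2))    # 151
print(big_men, total - big_men)   # 218 282
```

Under that reading the twelve nodes cover 500 players, with 218 in the big men class and 282 in the remainder.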

NBA Top50 players (many of them are not in this data set):
(1,2)
Dave Cowens, Wes Unseld
(1,3)
Kareem Abdul-Jabbar, Patrick Ewing, Elvin Hayes, Moses Malone, Kevin McHale, Hakeem Olajuwon, Shaquille O'Neal, Robert Parish, David Robinson
(2,3)
Charles Barkley, Julius Erving, Karl Malone
(3,1)
Scottie Pippen
(3,2)
Larry Bird, Clyde Drexler, James Worthy
(3,3)
George Gervin, Michael Jordan
(4,2)
Tiny Archibald, Magic Johnson, John Stockton, Isiah Thomas
dsparks



Joined: 22 Feb 2008
Posts: 61


Posted: Fri Feb 29, 2008 3:01 pm Post subject: I'm glad to see others have worked on this, too
Harold: When I read your post, I didn't know what Cartesian order was, but then I looked it up, and I think you're right, except I don't know how to apply it to my plots.

Mountain: The thread you reference is a good read. I think it's interesting how there are so many approaches one could take to a classification system. I am, personally, a little reluctant to identify classes first, and then put players into them. This board is great because it seems like there are so many smart people with so many talents and ideas, just waiting to take a crack at any problem.

findingneema: Neural net clustering is a cool idea, albeit yet another subject I know little of. I would love to see a .csv of your output, if you're willing to share. I also seem to get a distinction between big men and ~big men in my analysis. I wonder if this is the "primary" fundamental axis of difference among players, or if this finding is just a function of our methods or the available statistics. For example, would we instead expect the "primary" axis to be defensive-leaning vs. offensive-leaning? Or scoring-minded vs. other-things-minded? This is all interesting stuff to me.
_________________
David

http://arbitrarian.wordpress.com
Ed Küpfer



Joined: 30 Dec 2004
Posts: 787
Location: Toronto

Posted: Fri Feb 29, 2008 3:25 pm
I'm so happy to see some other folks try their hand at this, especially folks who (unlike me) know what they're doing.

I am concerned about validation. When I do classification, I usually run it at the player-season level -- I can then check each player-season's class against adjacent seasons. dsparks, I may have missed the post where you described your dataset -- are those career numbers for each player, or a particular season?

Also, your concern about a priori classes is relevant. However, from my perspective, I know certain classes of players exist -- these classes may be subjective to some degree, but there is wide consensus on the types. For example, the low-usage rebounding big defender is "out there" in some real sense. My primary interest is in a) seeing if these a priori classes "exist" as statistical classes (answer so far: some do, some don't), and b) seeing how well we can classify these players into homogeneous groups (answer: some well, some not so well).
_________________
ed
dsparks



Joined: 22 Feb 2008
Posts: 61


Posted: Fri Feb 29, 2008 3:34 pm
Ed: Your validation concern is a good one. The statistics I'm using are for the players' careers (actually, it's even narrower than that: the part of each player's career that falls between 1979-80 and 2006-07, which cuts a lot of careers short, like Kareem, for example--so this is not an ideal dataset. I offer a somewhat lame reason for my selection of these years here: http://arbitrarian.wordpress.com/2007/08/08/my-dataset/). If I have time, sometime in the near future, I may rerun one of the plots using the best single-seasons over this span, to see if we get Jordan matching Jordan, etc. I think this would be a good test of methodological validity, and also very interesting in itself.
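One way to frame the "Jordan matching Jordan" test is to ask how often a player-season's nearest neighbor is another season by the same player. The sketch below is only an illustration: the player names and per-36 lines are invented, and plain Euclidean distance on three stats stands in for whichever distance is actually used.

```python
from math import dist

# Hypothetical per-36 (pts, reb, ast) lines for player-seasons, invented for illustration.
seasons = {
    ("Scorer", 1990): (30.0, 6.0, 5.5),
    ("Scorer", 1991): (29.0, 6.2, 5.8),
    ("Big", 1990): (14.0, 12.0, 1.5),
    ("Big", 1991): (13.0, 12.5, 1.2),
    ("Point", 1990): (15.0, 3.0, 11.0),
    ("Point", 1991): (16.0, 3.2, 10.5),
}

def nearest(key):
    """Closest other player-season to the given (player, year) key."""
    others = (k for k in seasons if k != key)
    return min(others, key=lambda k: dist(seasons[key], seasons[k]))

def self_match_rate():
    """Share of player-seasons whose nearest neighbor is the same player."""
    hits = sum(nearest(key)[0] == key[0] for key in seasons)
    return hits / len(seasons)

print(self_match_rate())
```

A rate near 1 on real data would suggest the distances capture something stable about a player's style; a low rate would point to noise or to players whose roles changed from year to year.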

I really like your idea of attempting to statistically locate these groups you (and all of us) know actually exist. I think another interesting step would be to classify teams based on their assortment of player types, or to identify collections of player types that work well together. Like, for example, is having a "star guard", a "star big", and a bunch of "glue guys" a recipe for success? Does pairing a scoring SG with a defensive PG lead to better outcomes? Etc, etc. Keep up the good work, and keep me posted on what you come up with.
_________________
David

http://arbitrarian.wordpress.com
findingneema



Joined: 25 Feb 2008
Posts: 34
Location: Atlanta, GA

Posted: Fri Feb 29, 2008 3:40 pm
Some attempts at class descriptions:
(1,1) - Some not-so-notable forwards who tended to score about 10ppg (see above for examples)
(1,2) - PF and C who tend not to be big scorers, but do get boards (e.g. Marcus Camby, Vlade Divac, Chris Kaman, Horace Grant, Dennis Rodman, Ben Wallace)
(1,3) - PF and C who put the ball in the basket (e.g. the above Top50 players, Elton Brand, Carlos Boozer, Yao Ming, Dwight Howard, Shawn Kemp, Ralph Sampson, Tim Duncan)
(2,1) - really small group, similar to node (3,1), G-F types with about 12 ppg and 4 rbg, (e.g. Blue Edwards, Bonzi Wells, Rodney Rogers)
(2,2) - forwards who score mostly from the wing, (e.g. Josh Howard, Sam Perkins, Luol Deng, Rasheed Wallace, Lamar Odom, Shawn Marion)
(2,3) - PF known for scoring, transitional node between (1,3) and (3,3), (e.g. Karl Malone (though he's almost as close to (1,3)), Charles Barkley, Dr. J, Tom Chambers)
(3,1) - wing F, not as good rebounders as (2,2), many known for 3pt shooting and/or defense, (e.g. Dan Majerle, Toni Kukoc, Mike Miller, Kyle Korver, Bruce Bowen, Scottie Pippen, Tayshaun Prince)
(3,2) - wing G/F, who shoot and score, (e.g. Chris Mullin, Vince Carter, Ron Artest, Rashard Lewis, Kobe Bryant, Ray Allen, Tracy McGrady, Paul Pierce, LeBron James, Michael Redd)
(3,3) - transitional group between (2,3) and (4,3), wing scorers, often pretty complete players, (e.g. Michael Jordan, Carmelo Anthony, Richard Jefferson, Dominique Wilkins, Adrian Dantley)
(4,1) - almost all PG, (e.g. Danny Ainge, Gary Payton, Jason Terry, Deron Williams, Derek Fisher, Tim Hardaway, Baron Davis, Joe Johnson, Jason Kidd)
(4,2) - PG who can score and SG mostly, (e.g. Monta Ellis, Isiah Thomas, John Stockton, Mark Price, Joe Dumars, Chris Paul, Chauncey Billups, Manu Ginobili, Gilbert Arenas, Allen Iverson, Reggie Miller, Magic Johnson)
(4,3) - SG, not generally 3pt shooters, (e.g. Richard Hamilton, Dwyane Wade, Calvin Murphy, Reggie Theus)

Last edited by findingneema on Fri Feb 29, 2008 3:47 pm; edited 1 time in total
findingneema



Joined: 25 Feb 2008
Posts: 34
Location: Atlanta, GA

Posted: Fri Feb 29, 2008 3:45 pm
Ed Küpfer wrote:
I'm so happy to see some other folks try their hand at this, especially folks who (unlike me) know what they're doing.


Thanks :)

Quote:
Also, your concern about a priori classes is relevant. However, from my perspective, I know certain classes of players exist -- these classes may be subjective to some degree, but there is wide consensus on the types. For example, the low-usage rebounding big defender is "out there" in some real sense. My primary interest is in a) seeing if these a priori classes "exist" as statistical classes (answer so far: some do, some don't), and b) seeing how well we can classify these players into homogeneous groups (answer: some well, some not so well).


That class most definitely does exist, node (1,2) in my analysis. In fact, the players who score less and board and D-up more, like Ben Wallace, Dennis Rodman, and Marcus Camby, are the most prototypical members of the class (i.e. they are closest to the node center and furthest from (1,3), which has scoring big men).
findingneema



Joined: 25 Feb 2008
Posts: 34
Location: Atlanta, GA

Posted: Fri Feb 29, 2008 3:54 pm Post subject: Re: I'm glad to see others have worked on this, too
dsparks wrote:
findingneema: Neural net clustering is a cool idea, albeit yet another subject I know little of. I would love to see a .csv of your output, if you're willing to share. I also seem to get a distinction between big men and ~big men in my analysis. I wonder if this is the "primary" fundamental axis of difference among players, or if this finding is just a function of our methods or the available statistics. For example, would we instead expect the "primary" axis to be defensive-leaning vs. offensive-leaning? Or scoring-minded vs. other-things-minded? This is all interesting stuff to me.


Working on it (the .csv). I would definitely argue the primary separation is big men vs not big men. Rebounds, blocks, and 3pt shooting, and to a lesser extent assists, are big separators that will overwhelm the differences between scorers and non-scorers. So what I see is that scoring tends to break up blocks of big vs not big. I need to go back and read all your posts and get a better feel for your algorithm to calculate the distances, so I can better understand what's going on. But I do have some more goodies coming.

Ed Küpfer



Joined: 30 Dec 2004
Posts: 785
Location: Toronto

Posted: Fri Feb 29, 2008 3:56 pm
findingneema: I wonder if you could post some "defining" summary stats for each group -- that would help put your verbal descriptions into context. For example, the intraclass mean of ppg for node X is 1.5 standard deviations higher than the global mean or something like that.
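A summary of that kind could be produced roughly as follows. This is a minimal sketch: the ppg values are invented, and using the population standard deviation of all players as the yardstick is an assumption.

```python
from statistics import mean, pstdev

# Hypothetical ppg values per node, invented for illustration.
ppg_by_node = {
    "(1,2)": [8.0, 9.5, 10.0],
    "(1,3)": [14.0, 15.5, 16.0],
    "(3,3)": [20.0, 21.0, 22.0],
}

def node_z_scores(groups):
    """Each node's mean expressed in global standard deviations from the global mean."""
    everyone = [x for values in groups.values() for x in values]
    mu, sigma = mean(everyone), pstdev(everyone)
    return {node: (mean(values) - mu) / sigma for node, values in groups.items()}

for node, z in node_z_scores(ppg_by_node).items():
    print(node, round(z, 2))
```

Repeating this for each stat gives a compact profile per node, with large positive or negative values marking the stats that define the group.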
_________________
ed
findingneema



Joined: 25 Feb 2008
Posts: 34
Location: Atlanta, GA

Posted: Fri Feb 29, 2008 4:14 pm
So the original data was a symmetric 500x500 matrix of distances:



After re-ordering by node (and ordering within the node by distance to node center) the matrix looks like this:



The color scale goes from blue = 0.0 to dark red = 0.5 (the max in the data is ~0.62, but there are relatively few points out there).

If you follow along the diagonal (dark blue boxes), you can find all 12 nodes. Nodes (1,2) and (1,3) are #2 and #3, relatively big, and you can pretty easily see the differences between them. You can pretty easily see how the nodes compare to each other this way, especially the bigger ones.
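The re-ordering step can be sketched as below. The 4x4 matrix and node labels are invented, and because the thread does not define "distance to node center" for a distance matrix, the mean distance to the other members of the same node is used here as a stand-in. That choice is an assumption, not the method actually used.

```python
def reorder_by_node(dist, labels):
    """Group rows and columns by node, most central members first within each node."""
    def centrality(i):
        mates = [j for j, lab in enumerate(labels) if lab == labels[i] and j != i]
        # Stand-in for distance to node center: mean distance to node-mates.
        return sum(dist[i][j] for j in mates) / len(mates) if mates else 0.0

    order = sorted(range(len(labels)), key=lambda i: (labels[i], centrality(i)))
    reordered = [[dist[i][j] for j in order] for i in order]
    return order, reordered

# Hypothetical symmetric distances for four players in two nodes.
dist = [
    [0.0, 0.5, 0.1, 0.4],
    [0.5, 0.0, 0.45, 0.1],
    [0.1, 0.45, 0.0, 0.5],
    [0.4, 0.1, 0.5, 0.0],
]
labels = [(1, 2), (4, 1), (1, 2), (4, 1)]

order, reordered = reorder_by_node(dist, labels)
print(order)
for row in reordered:
    print(row)
```

After re-ordering, small distances collect in square blocks along the diagonal, one block per node, which is what makes the nodes visible in a heat map.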
findingneema



Joined: 25 Feb 2008
Posts: 34
Location: Atlanta, GA

Posted: Fri Feb 29, 2008 4:18 pm
Ed Küpfer wrote:
findingneema: I wonder if you could post some "defining" summary stats for each group -- that would help put your verbal descriptions into context. For example, the intraclass mean of ppg for node X is 1.5 standard deviations higher than the global mean or something like that.


I would, but I need the original data that the distances were calculated from (either that or some time to write some perl to parse some data from basketball-reference.com). Let me see what I can do.
dsparks



Joined: 22 Feb 2008
Posts: 61


Posted: Fri Feb 29, 2008 4:22 pm Post subject: I can give you the original data
Here is the original data (note that this has 1000 players, just truncate it at 500 to get what you're looking for):
http://dsparks.googlepages.com/NBA1000careersextra.csv

I tell you what, though: if you're interested in writing perl to scrape the basketball-reference site, I would kill to have a crack at their Game Log data: http://www.basketball-reference.com/fc/pgl_finder.cgi
_________________
David

http://arbitrarian.wordpress.com
dsparks



Joined: 22 Feb 2008
Posts: 61


Posted: Fri Feb 29, 2008 5:06 pm Post subject: Season-by-season network diagram
I just threw together an individual season version of the network diagram. It seems to reinforce the validity of the method. It has a fair amount of interesting information, since it allows you to witness players' evolution over time. Anyway, I thought I'd share. Sorry about the double post.

Here's the link:
http://arbitrarian.wordpress.com/2008/0 ... k-diagram/
_________________
David

http://arbitrarian.wordpress.com
jkubatko



Joined: 05 Jan 2005
Posts: 702
Location: Columbus, OH

Posted: Fri Feb 29, 2008 6:11 pm
findingneema wrote:
I would, but I need the original data that the distances were calculated from (either that or some time to write some perl to parse some data from basketball-reference.com). Let me see what I can do.


dsparks wrote:
I tell you what, though: if you're interested in writing perl to scrape the basketball-reference site, I would kill to have a crack at their Game Log data: http://www.basketball-reference.com/fc/pgl_finder.cgi


Would you two walk into Target and openly discuss in front of the store manager what DVDs you were going to steal? Because that's essentially what you're doing here.
_________________
Regards,
Justin Kubatko
Basketball-Reference.com
findingneema



Joined: 25 Feb 2008
Posts: 34
Location: Atlanta, GA

Posted: Fri Feb 29, 2008 6:22 pm Post subject: Re: I can give you the original data
dsparks wrote:
Here is the original data (note that this has 1000 players, just truncate it at 500 to get what you're looking for):
http://dsparks.googlepages.com/NBA1000careersextra.csv


Working my way through it, here's some relevant info on scoring vs non-scoring big men ( (1,3) vs (1,2), respectively):

Over their careers:
(1,3) made 50% more shots, took 45% more shots, made 88% more free throws, took 80% more free throws, and scored 56% more points than (1,2), and all of these are significant with Student's t-test at p < 0.01 (yes, I understand the multiple t-test problem, not worrying about it now). On a per game basis, (1,3) scored 5.5 more ppg than (1,2). There was no significant difference in any rebounding category.
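For readers unfamiliar with the multiple-testing caveat, here is a minimal sketch of a two-sample t statistic together with a Bonferroni-adjusted threshold. The samples are invented, the pooled-variance form of Student's t is an assumption (the post does not say which variant was run), and Bonferroni is only one common correction, not something the post applies.

```python
from math import sqrt
from statistics import mean, variance

def student_t(xs, ys):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    nx, ny = len(xs), len(ys)
    pooled = ((nx - 1) * variance(xs) + (ny - 1) * variance(ys)) / (nx + ny - 2)
    t = (mean(xs) - mean(ys)) / sqrt(pooled * (1 / nx + 1 / ny))
    return t, nx + ny - 2

def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

# Hypothetical ppg samples for two nodes, invented for illustration.
scoring_bigs = [14.0, 15.0, 16.0]
rebounding_bigs = [9.0, 10.0, 11.0]

t, df = student_t(scoring_bigs, rebounding_bigs)
print(round(t, 3), df)
print(bonferroni_alpha(0.05, 5))
```

With five tests and a family-wise level of 0.05, each test has to clear roughly 0.01. Counting the rebounding categories that were also tested would push that per-test bar lower still.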
findingneema



Joined: 25 Feb 2008
Posts: 34
Location: Atlanta, GA

Posted: Fri Feb 29, 2008 6:26 pm
jkubatko wrote:
findingneema wrote:
I would, but I need the original data that the distances were calculated from (either that or some time to write some perl to parse some data from basketball-reference.com). Let me see what I can do.


dsparks wrote:
I tell you what, though: if you're interested in writing perl to scrape the basketball-reference site, I would kill to have a crack at their Game Log data: http://www.basketball-reference.com/fc/pgl_finder.cgi


Would you two walk into Target and openly discuss in front of the store manager what DVDs you were going to steal? Because that's essentially what you're doing here.


With all due respect, I was referring to using perl to parse data MANUALLY downloaded from the site using the CSV function. In fact, I downloaded stats for players who played > 24 mpg between 1978-1979 and 2006-2007 that way, 6 pages of players.

And your analogy, btw, is bogus.
jkubatko



Joined: 05 Jan 2005
Posts: 702
Location: Columbus, OH

Posted: Fri Feb 29, 2008 8:10 pm
I don't want to hijack what has been an interesting thread, so just two quick replies.

findingneema wrote:
With all due respect, I was referring to using perl to parse data MANUALLY downloaded from the site using the CSV function. In fact, I downloaded stats for players who played > 24 mpg between 1978-1979 and 2006-2007 that way, 6 pages of players.


Understood. That's why I added the option to convert to CSV, so people could use the data for small-scale research projects.

findingneema wrote:
And your analogy, btw, is bogus.


Hyperbole might be a better word for it. The thing that rankled me was the game log data remark. I have spent hundreds of hours and thousands of dollars putting that game log data together.
_________________
Regards,
Justin Kubatko
Basketball-Reference.com
Ryan J. Parker



Joined: 23 Mar 2007
Posts: 708
Location: Raleigh, NC

Posted: Fri Feb 29, 2008 8:24 pm
I know one thing is for sure: if you put it out there people are going to (if they have to) steal it. Sad, but true.

BTW, when did a CSV option become available? Sounds great.

You need to work on college... that is a very underserved market. Sadly, I'm trying to parse my way through individual SoCon team website PBP data because no one seems to offer it all on one site. It obviously comes from the same DB, but I just can't seem to get to it in the same place.

/rant
dsparks



Joined: 22 Feb 2008
Posts: 61


Posted: Fri Feb 29, 2008 8:30 pm
Quote:

Hyperbole might be a better word for it. The thing that rankled me was the game log data remark. I have spent hundreds of hours and thousands of dollars putting that game log data together.


I don't think findingneema meant anything malicious, but I have to admit that I was totally scheming. My apologies, no offense was meant. I understand your desire to protect what you have put so much into compiling, it's very impressive indeed. Please accept my apologies for the inappropriate behavior.
_________________
David

http://arbitrarian.wordpress.com
findingneema



Joined: 25 Feb 2008
Posts: 34
Location: Atlanta, GA

Posted: Fri Feb 29, 2008 8:49 pm
Some per-game stats (the averages are weighted; the std devs are not, which screws them up; I'm t-testing things):

min, fgm-fga, 3pm-3pa, ftm-fta, tr, ast, stl, bk, to, pts
all: 29.6, 5.1-10.9, 0.5-1.4, 2.7-3.6, 5.2, 3.1, 1.0, 0.6, 2.0, 13.5

and how nodes differ from the whole (including pts on all):
(1,1) 26.8 min, 10.4 pts
(1,2) 27.0 min, 3.7-7.6 fgm-fga, 0.0-0.1 3pm-3pa, 7.0 tr, 1.6 ast, 0.9 blk, 9.2 pts
(1,3) 0.0-0.1 3pm-3pa, 7.8 tr, 1.9 ast, 1.2 blk, 14.8 pts
(2,1) 25.5 min, 4.1 tr, 11.3 pts
(2,2) 6.7 tr, 1.8 ast, 13.7 pts
(2,3) 32.8 min, 6.8-13.7 fgm-fga, 0.2-0.6 3pm-3pa, 4.5-6.0 ftm-fta, 7.5 tr, 18.4 pts
(3,1) 1.1-2.9 3pm-3pa, 1.7-2.2 ftm-fta, 4.0 tr, 10.9 pts
(3,2) 33.8 min, 6.6-14.5 fgm-fga, 1.1-3.2 3pm-3pa, 17.9 pts
(3,3) 32.6 min, 8.0-16.0 fgm-fga, 0.2-0.8 3pm-3pa, 4.8-5.9 ftm-fta, 20.9 pts
(4,1) 1.9-2.4 ftm-fta, 2.9 tr, 4.9 ast, 0.2 blk, 11.8 pts
(4,2) 3.1 tr, 5.5 ast, 0.2 blk, 14.2 pts
(4,3) 3.0 tr, 0.2 blk, 16.0 pts
Mountain



Joined: 13 Mar 2007
Posts: 1527


Posted: Fri Feb 29, 2008 10:35 pm
I attempted to match my typology names from 2 years ago to the nodes and the data provided by findingneema:


(1,2) - PF and C who tend not to be big scorers, but do get boards (e.g. Marcus Camby, Vlade Divac, Chris Kaman, Horace Grant, Dennis Rodman, Ben Wallace)

(1,2) 27.0 min, 3.7-7.6 fgm-fga, 0.0-0.1 3pm-3pa, 7.0 tr, 1.6 ast, 0.9 blk, 9.2 pts

Center defender
Lane clogger
Rugged Rebounder
Quick jumper
Post defender
Shot blocker


(1,3) - PF and C who put the ball in the basket (e.g. the above Top50 players, Elton Brand, Carlos Boozer, Yao Ming, Dwight Howard, Shawn Kemp, Ralph Sampson, Tim Duncan)
(1,3) 0.0-0.1 3pm-3pa, 7.8 tr, 1.9 ast, 1.2 blk, 14.8 pts

Post scorer


(1,1) - Some not-so-notable forwards who tended to score about 10ppg (see above for examples)

(1,1) 26.8 min, 10.4 pts


(2,1) - really small group, similar to node (3,1), G-F types with about 12 ppg and 4 rbg, (e.g. Blue Edwards, Bonzi Wells, Rodney Rogers)

(2,1) 25.5 min, 4.1 tr, 11.3 pts

Swingman
Post 3


(2,2) - forwards who score mostly from the wing, (e.g. Josh Howard, Sam Perkins, Luol Deng, Rasheed Wallace, Lamar Odom, Shawn Marion)

(2,2) 6.7 tr, 1.8 ast, 13.7 pts

Tall 3
Shooting power forward
High post
Shooting 5


(2,3) - PF known for scoring, transitional node between (1,3) and (3,3), (e.g. Karl Malone (though he's almost as close to (1,3)), Charles Barkley, Dr. J, Tom Chambers)

(2,3) 32.8 min, 6.8-13.7 fgm-fga, 0.2-0.6 3pm-3pa, 4.5-6.0 ftm-fta, 7.5 tr, 18.4 pts

All round power forward



(3,1) - wing F, not as good rebounders as (2,2), many known for 3pt shooting and/or defense, (e.g. Dan Majerle, Toni Kukoc, Mike Miller, Kyle Korver, Bruce Bowen, Scottie Pippen, Tayshaun Prince)

(3,1) 1.1-2.9 3pm-3pa, 1.7-2.2 ftm-fta, 4.0 tr, 10.9 pts

Shooting 3
Defender 3
Strong defender 2
Point 3


(3,2) - wing G/F, who shoot and score, (e.g. Chris Mullin, Vince Carter, Ron Artest, Rashard Lewis, Kobe Bryant, Ray Allen, Tracy McGrady, Paul Pierce, LeBron James, Michael Redd)

(3,2) 33.8 min, 6.6-14.5 fgm-fga, 1.1-3.2 3pm-3pa, 17.9 pts


(4,3) - SG, not generally 3pt shooters, (e.g. Richard Hamilton, Dwyane Wade, Calvin Murphy, Reggie Theus)

(4,3) 3.0 tr, 0.2 blk, 16.0 pts

Classic 2
Spot up shooter
Playmaker 2


(3,3) - transitional group between (2,3) and (4,3), wing scorers, often pretty complete players, (e.g. Michael Jordan, Carmelo Anthony, Richard Jefferson, Dominique Wilkins, Adrian Dantley)

(3,3) 32.6 min, 8.0-16.0 fgm-fga, 0.2-0.8 3pm-3pa, 4.8-5.9 ftm-fta, 20.9 pts

Penetrator 2
Power 3


(4,1) - almost all PG, (e.g. Danny Ainge, Gary Payton, Jason Terry, Deron Williams, Derek Fisher, Tim Hardaway, Baron Davis, Joe Johnson, Jason Kidd)

(4,1) 1.9-2.4 ftm-fta, 2.9 tr, 4.9 ast, 0.2 blk, 11.8 pts

Pure point
Drive and kick
Backup point
Veteran stabilizer


(4,2) - PG who can score and SG mostly, (e.g. Monta Ellis, Isiah Thomas, John Stockton, Mark Price, Joe Dumars, Chris Paul, Chauncey Billups, Manu Ginobili, Gilbert Arenas, Allen Iverson, Reggie Miller, Magic Johnson)

(4,2) 3.1 tr, 5.5 ast, 0.2 blk, 14.2 pts

Shooting point
Tweener


I combined a few nodes and some labels perhaps could be applied to another node, especially if aided by seeing more player names.
Chicago76



Joined: 06 Nov 2005
Posts: 98


PostPosted: Sun Mar 02, 2008 2:25 am Post subject: Reply with quote
This is really interesting and very good work guys.

A couple of comments:

1-The per minute data is great. It may remove some of the "fun" comparisons like Mullin/Jordan, but in my mind it's better for that exact reason. Simple ratios don't tell the whole story because players posting nearly identical ratios can be on the court for different reasons. The low usage guy (a Prince) could be on the court for defense, while a high usage player is on the court for offense.

2-The season network is interesting because of the ability to track player progression. It corroborates the overall analysis, but it illustrates that the same player can be substantially different year to year. Bird in particular (to the right of the diagram) was very easy to interpret.

A couple of suggestions (may involve a bit of raw data manipulation) that I think could substantially improve where you're going:

1-Pace adjustment. Mike G alluded to the per min/pace adjusted data. The first part you took him up on, much to the improvement of the analysis. If you can, I'd take him up on the second part too. You're working across eras here and pace could differ as much as +/- 30-35% in the data. A guy playing 36 minutes a night taking 15 shots a game on a 90 poss/game team is a lot different than a guy taking 15 shots a game for the Nuggets 15-20 years ago.

2-Truncate career data to peak ages (maybe ages 25-29). You've compared players who are new to the league to those who logged 15-20 years. As we've seen in the season network, players change a lot over the course of their careers, and guys with longer careers tend to regress to the mean. Kareem is a great example--great PER stats and rebounding stats in his 20s, not so great by his mid to late 30s. It waters down his career stats. By including later age data for great players, you run the risk of minimizing the variance/differentiators in your analysis. Great players look a lot different than merely decent players at 28. At 32-35, the difference is just as great. It's just that the great players from their 20s now look like merely good players, while the other guys are at home watching SportsCenter with the rest of us at 32-35. The SportsCenter watchers don't have those years count against them, while the greats slide closer to the middle of your network.

Lumping in full careers of some players vs. guys who have only played at their peak may lead to some misleading results.
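
To make the two suggestions above concrete, here is a minimal Python sketch. It assumes each player-season comes with an age, minutes, a counting stat (FGA is used here) and a team pace figure in possessions per game. The Season class, the function names, the 90-possession baseline and the 112.5 pace in the usage note are all made up for illustration; none of it comes from the diagrams or data in this thread.

```python
from dataclasses import dataclass

REFERENCE_PACE = 90.0  # assumed baseline (possessions per game), chosen arbitrarily


@dataclass
class Season:
    age: int          # player's age in that season
    minutes: float    # season total minutes
    fga: float        # season total field-goal attempts
    team_pace: float  # team possessions per game (assumed to be available)


def pace_adjusted_per36(total, minutes, team_pace, reference_pace=REFERENCE_PACE):
    """Per-36-minute rate, rescaled to a common reference pace."""
    per36 = total / minutes * 36.0
    return per36 * reference_pace / team_pace


def peak_seasons(seasons, lo=25, hi=29):
    """Keep only seasons played at ages lo..hi inclusive."""
    return [s for s in seasons if lo <= s.age <= hi]


def peak_fga_per36(seasons, lo=25, hi=29):
    """Pace-adjusted FGA per 36 minutes over the peak-age window.

    Each season's total is pace-adjusted first, then the seasons are
    pooled, so seasons with more minutes carry more weight.
    Returns None if the player has no minutes in the window.
    """
    peak = peak_seasons(seasons, lo, hi)
    minutes = sum(s.minutes for s in peak)
    if minutes == 0:
        return None
    adjusted = sum(s.fga * REFERENCE_PACE / s.team_pace for s in peak)
    return adjusted / minutes * 36.0
```

With these made-up numbers, 15 shots per 36 minutes on a 90-possession team stays at 15, while the same 15 shots on a hypothetical 112.5-possession team scales down to 12. The 25-29 window is just the range floated above and is easy to change through lo and hi.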
findingneema



Joined: 25 Feb 2008
Posts: 34
Location: Atlanta, GA

PostPosted: Mon Mar 03, 2008 4:22 pm Post subject: Reply with quote
Chicago76 wrote:
A couple of suggestions (may involve a bit of raw data manipulation) that I think could substantially improve where you're going:

1-Pace adjustment. Mike G alluded to the per min/pace adjusted data. The first part you took him up on, much to the improvement of the analysis. If you can, I'd take him up on the second part too. You're working across eras here and pace could differ as much as +/- 30-35% in the data....

2-Truncate career data to peak ages (maybe ages 25-29). You've compared players who are new to the league to those who logged 15-20 years....

Lumping in full careers of some players vs. guys who have only played at their peak may lead to some misleading results.


As for the pace, I totally agree. Does actual possession data exist for that far back, or do we have to extrapolate from team shots, rebounds, and turnovers?
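
If it does come down to extrapolating, one commonly used box-score approximation is possessions = FGA - ORB + TOV + 0.44 * FTA. A rough Python sketch follows, assuming team totals for those four stats are available. The function names are made up for illustration, the 0.44 free-throw weight is a convention rather than a measured constant, and whether every component was recorded in older seasons is a separate question that this does not answer.

```python
def estimate_possessions(fga, fta, orb, tov, ft_weight=0.44):
    """Approximate team possessions from box-score totals.

    A field-goal attempt that is rebounded by the offense does not end
    the possession, so offensive rebounds are subtracted. ft_weight is
    the assumed share of free-throw attempts that end a possession.
    """
    return fga - orb + tov + ft_weight * fta


def pace_per_48(team_poss, opp_poss, team_minutes):
    """Possessions per 48 minutes, averaging both teams' estimates.

    team_minutes is total player-minutes for the team (240 for one
    regulation game), so team_minutes / 5 is the game length in minutes.
    """
    game_minutes = team_minutes / 5.0
    return 48.0 * (team_poss + opp_poss) / (2.0 * game_minutes)
```

Averaging the two teams' estimates smooths out some of the error in either one, since both sides of a game get nearly the same number of possessions.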

I agree on the peak years thing in principle, but I'm pretty sure I have a better, less biased way of doing it. Once I get the player season data (thanks Ed; or dsparks, do you have yours available somewhere?), I know exactly what I'm going to do.