How do I measure noise?

Posted: Sun Mar 29, 2015 4:07 pm
by TeamEd
So, I want to revisit the question of measuring noise in a dataset. This is related to my project looking at productivity based on in-game splits, but I figure this is worth asking in a new thread.

I have formulas that tell me how much players increase or decrease their rate of recording box score stats when trailing. My set goes back to 1996-97. This data appears to be OK for shot attempts, but quite noisy for assists, rebounds and blocks. I want to measure this noise to see if the numbers I'm finding are useful.

To do this, I think I need to run a linear regression of season X numbers on season X-1 numbers to get a correlation coefficient that tells me if the measure predicts itself. I don't really know how to do this, but I think I could figure it out. The problem as I see it is that I need to do this for every player in the dataset individually, and then weight the results by minutes played or something to get an overall correlation coefficient for the data... or something. I expect I'll also want to see if a three-year average is predictive where season-by-season numbers aren't.

Anyway. I've googled and haven't found any tutorials. Although, I'm also not sure what exactly I'm looking for.

So, ignoring the above if it doesn't make sense: I have a set of new stats. I think they might be noisy. How do I measure this noise?

/ This exceeds my J-School education.

Re: How do I measure noise?

Posted: Mon Mar 30, 2015 5:03 am
by NateTG
Are there any tools for dealing with data or statistics that you're familiar with already?

You can find lots of stuff for doing regressions using R (which is a tool used for doing statistics).

If you run a regression, you can check the coefficient of determination (http://en.wikipedia.org/wiki/Coefficien ... ermination) or something similar to see how good your predictions are.
TeamEd wrote:This data appears to be OK for shot attempts, but quite noisy for assists, rebounds and blocks.
Can you explain this a little more? Is this just because there are fewer assists, blocks and rebounds than shot attempts, or something else?
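
As a rough illustration of the regression-plus-coefficient-of-determination check mentioned above, here is a minimal sketch in plain Python rather than R. The numbers are made up, and the names `fit_line` and `r_squared` are hypothetical helpers, not part of any library.

```python
# Hypothetical values for illustration; not taken from the dataset in this thread.
prev_season = [0.10, -0.05, 0.20, 0.00, 0.15, -0.10]  # change in rate when trailing, season X-1
this_season = [0.08, -0.02, 0.15, 0.03, 0.10, -0.04]  # same players, season X


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


slope, intercept = fit_line(prev_season, this_season)
r2 = r_squared(prev_season, this_season, slope, intercept)
print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r2:.3f}")
```

An R^2 near 1 would suggest last season's number predicts this season's well; a value near 0 would suggest the stat is mostly noise from one year to the next.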

Re: How do I measure noise?

Posted: Mon Mar 30, 2015 10:13 pm
by TeamEd
I don't really have a lot of experience with stats tools. I'll have a look at R.

What I mean is that the year-over-year numbers I'm getting for the change in shooting rate when behind appear to be fairly consistent. Year over year, the changes in assist rate, block rate, etc. appear to be closer to random. I expect it's a sample size thing.
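
One way to see why sample size could matter here: if each possession is treated as an independent yes/no trial (a simplifying assumption), the standard error of an observed rate is sqrt(p * (1 - p) / n). The sketch below uses made-up rates and a made-up possession count, and `rate_standard_error` is a hypothetical helper name.

```python
import math


def rate_standard_error(rate, opportunities):
    """Binomial standard error of an observed rate, assuming independent trials."""
    return math.sqrt(rate * (1.0 - rate) / opportunities)


# Hypothetical per-possession rates over the same number of trailing possessions.
trailing_possessions = 1500
for label, rate in [("shot attempts", 0.20), ("blocks", 0.02)]:
    se = rate_standard_error(rate, trailing_possessions)
    # Relative error: how large the noise is compared with the rate itself.
    print(f"{label}: rate={rate:.2f} se={se:.4f} relative={se / rate:.1%}")
```

Under these assumptions the rarer event has the larger relative error, which is consistent with blocks and assists looking noisier than shot attempts.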

Re: How do I measure noise?

Posted: Mon Apr 06, 2015 3:09 pm
by Chris Hoffman
Call me silly, but the only stats that relate to your study are assists. Rebounds may turn up an offensive rebound or a defensive rebound. So what percent of rebounds per team are offensive and defensive? That may help you get a handle on your rebounds.

For the question "does the rate of shots increase when trailing?", blocks are irrelevant, and so is whether or not the shots go in. It sounds like you are asking whether a team pulls the trigger more when it is behind. So to simplify your study, only look at shots taken. Get a handle on that one stat, then add the complexity of assists.

I would argue that even inbounding the ball could be an assist, because the person who wants the ball will not inbound it. Also, the number of assists per possession could be a number you want to look at if the data exists. The question is whether you have enough data to account for the complexity of assists; if not, ignore it for now. Just my opinions.

Is an assist only counted when the shot goes in? If so, then assists depend on whether or not the shot goes in, which makes them a stat you do not need to look at given your question. Hope it makes sense, and check my logic here. I don't like assists because they only account for the last pass a player receives before the field goal. If it takes a team three passes around the arc in rapid succession to find someone open for a three pointer, who gets the assist if the shot goes in? The last person to pass the ball. But you could argue that the entire dynamic and synergy of the three passes is what created the opening for the three point shot to be taken, and thus all three passes should be attributed as assists for each respective player.

-happy Easter
-chris

Re: How do I measure noise?

Posted: Mon Apr 06, 2015 5:19 pm
by DSMok1
What you need, Ed, is to do year-to-year correlations and then back out how much noise there is vs. how much signal. Here's one method: http://blog.philbirnbaum.com/2011/08/ta ... -kind.html

Alternatively, you could do a simple regression to predict year 2 from year 1, and see what the slope and fit of the curves look like. That will give a good intuitive grasp.

I wouldn't worry too much about any changes in true signal between year 1 and year 2 at this point.
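
To make the year-to-year idea concrete, here is a minimal sketch. It is not necessarily the method in the linked post; it assumes a simple model where each observed value is a stable true value plus independent noise with the same variance in both years. Under that assumption the year-to-year correlation r estimates the share of observed variance that is signal, and 1 - r the share that is noise. The data and the helper names `pearson_r` and `signal_noise_split` are hypothetical.

```python
import math


def mean(values):
    return sum(values) / len(values)


def variance(values):
    """Population variance."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def signal_noise_split(year1, year2):
    """Split observed variance into signal and noise (assumed model: value = signal + noise)."""
    r = pearson_r(year1, year2)
    observed_var = (variance(year1) + variance(year2)) / 2.0
    # A negative r would give a negative signal estimate; read that as "no detectable signal".
    return r, r * observed_var, (1.0 - r) * observed_var


# Hypothetical values: one entry per player, same player order in both lists.
year1 = [0.10, -0.05, 0.20, 0.00, 0.15, -0.10]
year2 = [0.08, -0.02, 0.15, 0.03, 0.10, -0.04]
r, signal_var, noise_var = signal_noise_split(year1, year2)
print(f"r={r:.3f} signal variance={signal_var:.5f} noise variance={noise_var:.5f}")
```

Under the same assumptions, the slope from regressing year 2 on year 1 should come out close to r, so a slope well below 1 is another sign that a good part of the year 1 value was noise.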

Re: How do I measure noise?

Posted: Tue Apr 07, 2015 3:09 pm
by TeamEd
Ok. This seems do-able where R is proving hard to get into. I was thinking of something along these lines, but I was getting stuck on (1) figuring out how to set up my tables and (2) how to weight each player's contribution to an overall measurement of noise. I'll get on this. Thanks for the advice.
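
For the table setup and weighting questions, one possible layout is a single row per player-season, paired up into (season X-1, season X) rows, with one pooled weighted correlation computed across all pairs instead of a separate correlation per player. The sketch below is only an illustration: the rows are invented, `consecutive_pairs` and `weighted_r` are hypothetical names, and using the smaller of the two seasons' minutes as the weight is an assumption, not an established rule.

```python
import math

# Hypothetical long-format table: (player, season start year, minutes, stat value).
rows = [
    ("Player A", 2012, 2400, 0.12),
    ("Player A", 2013, 2600, 0.09),
    ("Player A", 2014, 2500, 0.11),
    ("Player B", 2012, 900, -0.20),
    ("Player B", 2013, 1100, 0.05),
    ("Player C", 2013, 1800, 0.02),
    ("Player C", 2014, 2000, 0.04),
]


def consecutive_pairs(rows):
    """Return (previous value, next value, weight) for each player's back-to-back seasons."""
    by_key = {(player, season): (mins, val) for player, season, mins, val in rows}
    pairs = []
    for (player, season), (mins, val) in sorted(by_key.items()):
        following = by_key.get((player, season + 1))
        if following is not None:
            # Assumed weighting: the pair is only as reliable as its smaller sample.
            pairs.append((val, following[1], min(mins, following[0])))
    return pairs


def weighted_r(pairs):
    """Weighted Pearson correlation over (x, y, weight) triples."""
    total = sum(w for _, _, w in pairs)
    mx = sum(w * x for x, _, w in pairs) / total
    my = sum(w * y for _, y, w in pairs) / total
    sxy = sum(w * (x - mx) * (y - my) for x, y, w in pairs)
    sxx = sum(w * (x - mx) ** 2 for x, _, w in pairs)
    syy = sum(w * (y - my) ** 2 for _, y, w in pairs)
    return sxy / math.sqrt(sxx * syy)


pairs = consecutive_pairs(rows)
print(f"{len(pairs)} season pairs, weighted r = {weighted_r(pairs):.3f}")
```

The same pairing step would work in a spreadsheet: one row per pair, with columns for the earlier value, the later value, and the weight.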

On the note about looking at blocks/rebounds/steals etc.: I think there might be some interesting stuff there on a per-player basis, but there are obviously going to be more issues in those numbers than in the larger samples for shots.

And, yeah. Assists are an imperfect measure of passing. Ideally I'd want to know how raw passing rate increases or decreases, but I don't have that data. It is what it is.