It's an article of faith in these parts (and much of the rest of the internet) that roster decisions should not be made based on shooting percentages, as they are "unsustainable". Certainly there is evidence that some players shoot sustainably better than others, but the same is not regarded as being true of teams - Daoust's take on the Quinn era Leafs to the contrary.
Being a trained social scientist I wanted to get into the numbers and check this myself... So I did, only to find the same thing as pretty much everyone else - i.e., nothing much sustained from year to year (see below for gory details). But then I had the following conversation with myself:
Q: Why are we including defensemen in here? Their job on the offensive end of the ice is to keep the puck in, cycle it, and bomb the odd shot up from the blue line. At the NHL level that is more like playing pinball than paintball. Yes, some people have harder shots than others, but even the toughest of bombers are pretty much dependent on deflections and generating rebounds for the forwards. In fact, it's like playing pinball where someone else is responsible for using the flippers and bumping the table, and all you can do is fire the initial ball. There's not much reason to expect this to be very repeatable, is there?
A: Not that I can think of. Why not ditch those guys and see if we can get a cleaner read on the data.
Q: And while we're at it, what about the bottom end forwards? Surely you are kidding yourself if you expect the Orrs of the world to reliably do anything with a hockey stick. Or, at least, anything legal. Heckopete, even the good fourth liners aren't really picked for their scoring prowess, so much as their ability to play a tight defensive game, and chip a few pucks at the net, just on the off-chance.
A: Stumped again. Let's dump those guys too.
So I ran another analysis based only on a team's top 6 forwards, and, wouldn't you know it, the results came out a bit different this time.
Wanna see how that turned out? Let's start at the beginning:
(WHOLE) TEAM SHOOTING PERCENTAGE
I grabbed team statistics from Behind The Net (that's a link directly to the Excel file), which has numbers covering 5 years (from the 2007-8 season to the 2011-12 one). Looking at the 5V5 numbers I calculated shooting percentages by dividing each team's GF/60 by their SF/60 in that same year. Over the course of those 5 years, teams ranged from a shooting percent low of 7.45% (the Isles) to a high of 9.18% (Pittsburgh). The fact that pretty much every hockey fan ever would have expected those teams to emerge in that order seems like prima facie evidence that SOMETHING reliable is going on, because if it was random, the NYI would have had just as high a chance of being on top as PITT did.
Anyway, there are 3 tests that I ran on each data set:
- A series of simple correlations, comparing each team's shooting percentage in year X with their percentage in the previous year (i.e., in year X-1).
- An overall analysis that compiles every pair of consecutive years into one overall correlation.
- An analysis which looks at each team's odds of moving up or down the pecking order (i.e., to a higher or lower decile).
1 - Series of correlations
|Years||r||Significance|
|2007 to 2008||.15||NS|
|2008 to 2009||-.07||NS|
|2009 to 2010||-.15||NS|
|2010 to 2011||.15||NS|
Not much of anything going on here. The negative correlation from 2009 to 2010 seems, to some considerable extent, driven by Washington, who shot a mediocre 8% in every other year but hit a bizarrely anomalous NHL-high 11% in 2009. That's small samples for you.
2 - Overall correlation
To get away from the small samples, I stacked every pair of consecutive years on top of each other, like this:
|ANA 2007 sh%||ANA 2008 sh%|
|ANA 2008 sh%||ANA 2009 sh%|
|ANA 2010 sh%||ANA 2011 sh%|
|TOR 2007 sh%||TOR 2008 sh%|
|TOR 2008 sh%||TOR 2009 sh%|
|TOR 2009 sh%||TOR 2010 sh%|
Now technically this is a no-no, because you have the same data points included in both columns. For instance, ANA's 2008 shooting percent is in the right column (being correlated with their 2007 %), and also in the left column (being correlated with the 2009 %). That violates the assumption of independent measures, which means that the probability test results could be thrown off. That said, I'm not inclined to worry about this very much, because I doubt it'll make much difference to the final result in practical terms, and we're not exactly trying to publish this in Science here anyway. You could fix this by re-doing the analysis using only non-overlapping pairs of years, but I promise you, it would give basically the same result.
Anyway, if you combine all the scores like this, you get an overall correlation of r = .03, which is not statistically significant... and even if it were, it is still so close to zero that it wouldn't make any real difference to anything.
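For anyone who wants to try the stacking step themselves, here is a minimal Python sketch. The team codes and percentages are made up for illustration (they are not the Behind The Net numbers), and `pearson_r` and `stack_pairs` are names invented for this example. It assumes each team has one value per consecutive season.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def stack_pairs(sh_pct, step=1):
    """Turn {team: {year: sh%}} into two columns: year X-1 and year X.

    step=1 stacks every consecutive pair (overlapping, as described above).
    step=2 keeps only non-overlapping pairs (2007-08, 2009-10, ...).
    Assumes the years for each team are consecutive seasons.
    """
    prev, curr = [], []
    for by_year in sh_pct.values():
        years = sorted(by_year)
        for i in range(0, len(years) - 1, step):
            prev.append(by_year[years[i]])
            curr.append(by_year[years[i + 1]])
    return prev, curr

# Made-up numbers, for illustration only.
sh_pct = {
    "AAA": {2007: 8.1, 2008: 8.4, 2009: 7.9, 2010: 8.2, 2011: 8.0},
    "BBB": {2007: 7.6, 2008: 8.9, 2009: 8.3, 2010: 7.7, 2011: 8.5},
}

prev, curr = stack_pairs(sh_pct)               # overlapping pairs
r_all = pearson_r(prev, curr)

prev_no, curr_no = stack_pairs(sh_pct, step=2)  # non-overlapping pairs
r_non_overlapping = pearson_r(prev_no, curr_no)

print(f"overlapping: r = {r_all:.2f} over {len(prev)} pairs")
print(f"non-overlapping: r = {r_non_overlapping:.2f} over {len(prev_no)} pairs")
```

Passing `step=2` is one simple way to get the non-overlapping version mentioned above, at the cost of halving the number of pairs.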
3 - Odds of moving up or down
The correlation coefficients above all implicitly assume that the effects of one year's shooting percentage on the next year's would happen in a linear way. While that's often a reasonable assumption, it isn't always, so I divided the shooting percents for each year up into deciles, and looked to see how much teams moved from one year to the next.
To illustrate what this means, the 3 teams that shot the worst in a given year were placed into a bucket labelled "0", and the next 3 worst shooting teams were put together into a bucket labelled "1", and so on, all the way up to the best 3 shooting teams that were put into a bucket labelled "9". That exercise was repeated for each of the 5 years. You can then subtract the decile rank that any team achieved in a given year from the rank they managed the previous year, and see if they stayed in about the same spot (relative to the rest of the league), or if they moved considerably up or down.
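The bucketing can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the original analysis: the 30 team names and percentages are invented, ties are broken by input order, and the function names are hypothetical. It assumes the number of teams is a multiple of 10.

```python
from collections import Counter

def decile_ranks(sh_pct_by_team):
    """Map each team to a bucket from 0 (worst shooters) to 9 (best)."""
    ordered = sorted(sh_pct_by_team, key=sh_pct_by_team.get)
    per_bucket = len(ordered) // 10  # 3 teams per bucket in a 30-team league
    return {team: i // per_bucket for i, team in enumerate(ordered)}

def decile_moves(year1, year2):
    """Absolute change in bucket for every team present in both seasons."""
    r1, r2 = decile_ranks(year1), decile_ranks(year2)
    return {team: abs(r2[team] - r1[team]) for team in r1 if team in r2}

# Invented league: the year-2 order is the exact reverse of year 1.
teams = [f"T{i:02d}" for i in range(30)]
year1 = {t: 7.0 + 0.1 * i for i, t in enumerate(teams)}
year2 = {t: 7.0 + 0.1 * (29 - i) for i, t in enumerate(teams)}

moves = decile_moves(year1, year2)
move_counts = Counter(moves.values())  # how many teams moved 0, 1, 2, ... buckets
print(sorted(move_counts.items()))
```

Tallying `move_counts` across every pair of consecutive years gives the observed side of the comparison.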
I calculated the moves that each team made across each pair of consecutive years, and counted how many teams moved zero spots, how many moved one spot, how many moved two, all the way up to the number who moved the maximum of 9 spots (I didn't care in which direction). If shooting percentages are consistent then you would expect these numbers to be fairly small, as the bad shooting teams would keep on shooting worse than everyone else, the good shooting teams would continue shooting better, and everyone else would stay somewhere in the middle.
At first blush this analysis looks promising - you see a lot of low numbers, and not many high ones. But then if you think about it more, that's exactly what you would expect to see. After all, it's difficult for a team to move up or down 9 spots, because the only way you can do it is to start off as one of the bottom 3 teams and move to the top, or to be one of the top 3 teams and move down. If you are one of the 24 teams who started somewhere in between then you cannot possibly move 9 spots, regardless of how you do in the second year. In contrast, anybody can move two spots. In fact, most teams have a double chance of moving two spots, because they can go either two up, OR go two down.
You can game out the likelihood of any given team moving any given number of spots, and plot that against the actual amount of movement seen. Assuming that I did this right, here is what you get:
"1" here means: "you were in the exact same decile on year 2 that you were on year 1", and "10" here means: "you started in the bottom decile and moved to the top, or you started in the top and moved to the bottom". As you can see there's a very slight trend for teams to stay a bit more stationary in the rankings than you would expect, but it doesn't seem very pronounced.
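The chance baseline can be worked out by brute force. The sketch below assumes that, if shooting percentage were pure noise, a team's year-2 decile would be equally likely to be any of the 10 buckets regardless of where it started. The function name is made up for this example. Note that it counts "no move" as 0, whereas the chart labels above are shifted up by one.

```python
from collections import Counter
from fractions import Fraction

def chance_move_distribution(n_buckets=10):
    """Probability of each absolute move if year-2 rank is pure chance."""
    # Enumerate every (start bucket, end bucket) combination once.
    counts = Counter(
        abs(start - end)
        for start in range(n_buckets)
        for end in range(n_buckets)
    )
    total = n_buckets ** 2
    return {k: Fraction(counts[k], total) for k in range(n_buckets)}

expected = chance_move_distribution()
for move, prob in expected.items():
    print(f"move {move}: {float(prob):.0%}")
```

Under that assumption, staying put happens 10% of the time, a one-bucket move 18% of the time, and a full 9-bucket swing only 2% of the time, so a pile of small moves is what randomness looks like here.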
It just doesn't look good for the idea that team shooting percentage is sustainable.
TOP 6 FORWARDS SHOOTING PERCENTAGE
Unfortunately BehindTheNet.ca don't give us the data we need here. Their statistics for individual players give no indication of the number of shots that they took, so we don't have any way of calculating shooting percentages. Fortunately SkinnyFish was able to point me at Hockey Reference, where they do have the requisite numbers. The interface was a bit clumsy, but if you keep manually editing the URL to get different years, and cut 'n pasting the numbers off each page into Excel, you can get what you need. I scooped the numbers for 7 consecutive years (2006-7 through to 2012-13). Unfortunately they don't break out ES from PP shooting percentages, so I was forced to use the overall numbers, but I did try a way to get around this a bit (see below).
Once I wrestled these into SAS, here were the next steps:
- Delete defensemen
- Delete anyone who played fewer than 40 games in a given year (let's keep our sample sizes up here people)
- Recode "WPG" as "ATL", because it's the same team FFS.
- Delete anyone whose team was coded as "TOT" - I assume that means they were traded mid-year.
- Rank the players on each team, in each year, by average ice time, and delete everyone who wasn't in the top 6.
- Calculate an average shooting percent for those top 6 forwards on each team, in each year
- Repeat the same 3 analyses that I ran above for whole team shooting percents.
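The original work was done in SAS. As a rough equivalent, the filtering steps above might look like this in Python. Everything here is an assumption made for illustration: the field names (`pos`, `gp`, `toi`, `goals`, `shots`), the sample rows, and the choice to average the players' individual percentages rather than pool their goals and shots.

```python
from collections import defaultdict

def top6_team_sh_pct(rows, min_games=40, top_n=6):
    """Average sh% of each team's top forwards by ice time, per season."""
    groups = defaultdict(list)
    for row in rows:
        if row["pos"] == "D":        # delete defensemen
            continue
        if row["gp"] < min_games:    # delete anyone under the games cut-off
            continue
        if row["team"] == "TOT":     # delete combined multi-team rows
            continue
        team = "ATL" if row["team"] == "WPG" else row["team"]  # same franchise
        groups[(team, row["year"])].append(row)

    result = {}
    for key, players in groups.items():
        # Keep only the forwards with the most average ice time.
        top = sorted(players, key=lambda p: p["toi"], reverse=True)[:top_n]
        # One reading of "average": the mean of the individual percentages.
        # Players with zero shots are skipped to avoid dividing by zero.
        pcts = [100.0 * p["goals"] / p["shots"] for p in top if p["shots"] > 0]
        if pcts:
            result[key] = sum(pcts) / len(pcts)
    return result

# Made-up rows; top_n=2 just keeps the example short.
rows = [
    {"player": "F1", "team": "WPG", "year": 2012, "pos": "C", "gp": 82, "toi": 20.0, "goals": 20, "shots": 200},
    {"player": "F2", "team": "WPG", "year": 2012, "pos": "LW", "gp": 70, "toi": 18.0, "goals": 10, "shots": 200},
    {"player": "F3", "team": "WPG", "year": 2012, "pos": "RW", "gp": 60, "toi": 10.0, "goals": 30, "shots": 100},
    {"player": "D1", "team": "WPG", "year": 2012, "pos": "D", "gp": 80, "toi": 25.0, "goals": 5, "shots": 100},
    {"player": "F4", "team": "WPG", "year": 2012, "pos": "C", "gp": 10, "toi": 22.0, "goals": 4, "shots": 20},
    {"player": "F5", "team": "TOT", "year": 2012, "pos": "C", "gp": 75, "toi": 21.0, "goals": 25, "shots": 150},
]

team_pct = top6_team_sh_pct(rows, top_n=2)
print(team_pct)
```

The output is keyed by (team, year), which is the shape the three analyses need as input.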
1 - Series of correlations
|Years||r||Significance|
|2006 to 2007||.41||p<.05|
|2007 to 2008||.26||NS|
|2008 to 2009||.14||NS|
|2009 to 2010||.21||NS|
|2010 to 2011||.13||NS|
|2011 to 2012||.19||NS|
Only one of these was significant, but they were all consistently in a positive direction. The fact that most of them are not significant is almost certainly due to the small sample sizes here. There are only 30 observations (i.e., teams) each year, and that is simply not enough to reliably tell whether differences in the .13 to .26 range are significant or not. Think of a bigger sample size here like having a bigger magnifying glass - the bigger the one you have, the smaller the objects are that you can see through it. There may or may not be an object under your lens, but without a strong enough lens to see it, you will always conclude that it isn't there, because you are simply too blind to see it.
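To put rough numbers on the magnifying glass, here is a small Python sketch that uses the Fisher z transformation to approximate a two-sided p-value for a correlation. This is a textbook normal approximation, not necessarily the test that SAS ran, and the function name is invented for the example.

```python
from math import atanh, sqrt
from statistics import NormalDist

def approx_p_value(r, n):
    """Approximate two-sided p-value for 'true correlation is zero'.

    Uses the Fisher z transformation: atanh(r) * sqrt(n - 3) is roughly
    standard normal when the true correlation is zero.
    """
    z = atanh(r) * sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Single-season comparisons: 30 teams each.
for r in (0.13, 0.26, 0.41):
    print(f"n=30   r={r:.2f}  p is about {approx_p_value(r, 30):.3f}")

# All six pairs of seasons stacked: 180 observations.
print(f"n=180  r=0.21  p is about {approx_p_value(0.21, 180):.3f}")
```

Under this approximation a correlation of .41 clears p<.05 with 30 teams while .26 does not, yet a correlation of only .21 clears p<.01 once there are 180 observations.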
2 - Overall correlation
We resolve the sample size problem by combining all of the years into one data set with 180 observations (i.e., 30 teams X 6 pairs of consecutive years). As I say above this is technically an ixnay due to the issues with correlated errors that it creates, but I'm doing it anyway. This gives pretty much the answer you would expect: A correlation of 0.21 that is significant at p<.01.
That's not an ENORMOUS correlation, but it's not a tiny one either. It means that if you know the shooting percent of a team's top 6 forwards in one year, then you can take a better than random guess at their shooting percent for next year.
You can see here that there is a fair amount of turbulence, but there also seems to be a fairly consistent trend for teams to move less than you would expect by chance - that is basically a picture, right there, of teams' top 6 forwards tending to shoot consistently better or worse than the rest of the league, from year to year.
What about power play time?
These numbers are for all goals, not just ES. So I hear you say: "maybe that consistency is just because some teams are better than others at drawing penalties. We know that people shoot a higher percentage on the PP, so maybe you're just seeing consistency in drawing PP time here."
On some level you have me there. I can't completely rule that out. But while I didn't have numbers that broke out the number of PP vs. ES (or SH) shots they took, I did have the number of goals that they scored in each context. That allows me to try to statistically correct for the effect of PP time. I'll explain how in the next paragraph, but it's technical, so if you aren't statistically minded just skip it.
I computed a ratio of the goals scored at ES vs. on the PP as a proxy for the amount of PP offensive activity that a forward was part of (technically I divided ESG by PPG+1, to avoid an awful lot of divisions by 0). I then ran a regression that used this ratio score to predict shooting percent - this shows a small but significant negative relationship (b= -0.11, p<.01), indicating that players who scored more of their goals at ES had very slightly lower shooting percents. I then saved the residual from this analysis as a "clean" measure of what a player's sh% would have been had they had the same amount of PP activity as an average player.
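Here is a minimal sketch of that correction in Python, using an ordinary one-predictor least squares fit. The player numbers are invented and the function names are hypothetical, so this shows the mechanics only and will not reproduce the b = -0.11 result.

```python
def es_pp_ratio(es_goals, pp_goals):
    """ES goals divided by (PP goals + 1), which avoids dividing by zero."""
    return es_goals / (pp_goals + 1)

def ols_residuals(x, y):
    """Fit y = intercept + slope * x and return (slope, intercept, residuals)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    return slope, intercept, residuals

# Made-up forwards: (ES goals, PP goals, raw shooting percent).
players = [(20, 9, 14.0), (15, 4, 11.5), (12, 0, 9.0), (8, 1, 10.0)]

ratios = [es_pp_ratio(esg, ppg) for esg, ppg, _ in players]
raw_pct = [pct for _, _, pct in players]

slope, intercept, cleaned = ols_residuals(ratios, raw_pct)
print(f"slope = {slope:.3f}")
print("residual sh% per player:", [round(v, 2) for v in cleaned])
```

The residuals are centred on zero, so a positive value means a player shot better than the ES/PP mix alone would predict. These residuals then stand in for raw shooting percent in the team-level averages.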
Using this statistically cleaned measure of shooting percent gives pretty much the same results as using the raw numbers. The "overall" correlation (i.e., from test 2) was still .21, and still p<.01, and the decile graph looks about the same:
What all of this shows is that if you look only at the players whose job is to do the high skilled part of your scoring - and let's face it, the ones who score the bulk of your goals for you - then shooting percentage actually is a repeatable skill. Granted the effects aren't large here, but keep in mind that a team's top 6 forwards can change a fair bit from one year to the next - either through trades, or because someone gets injured one year and plays fewer than 40 games (which would eliminate them from my cut off), or because a slight shift in average ice time bumps players in and out of the top 6... there are a lot of ways to get turbulence here. But despite all of that, there is still a moderate amount of consistency in the shooting percentage of a team's top core of forwards.