Anyone who reads this blog regularly (and indeed, anyone who has any basic knowledge of statistics) knows the pitfalls of small sample sizes. In fact, small sample size (SSS) became a bit of a mantra around here last season, as we tried to project the performance of goalies, skaters, and the entire team after the Leafs started off on a relatively hot streak. While most of the season continued on the same course, eventually the SSS caught up with the team, and the Leafs came crashing back to earth.
With the lockout over, we're now faced with a shortened season, and many around here have been saying "anything can happen". Maybe James Reimer can go on a hot streak and we can make the PLAYOFFS!!!1 Maybe Carey Price will do the same and the Habs will win the division. Maybe Kessel will pot 40 in 48. Maybe he'll start off slow and not break 10. Maybe the Leafs get lucky for 48 games, or maybe we end up with another lottery pick.
With this question in mind (and inspired by something I read today, as well as provocation from theninjareg), I ran a few Monte Carlo simulations to see just how this season could shake out compared to a regular 82 game season.
(For all simulations, I simulated 1,000,000 seasons of both 48 and 82 games in Matlab, and present the summaries below).
To begin with, what are the chances someone like Phil Kessel has a crazy hot streak in a 48 game season? Similarly, what are the chances he gets in a rut and has a terrible season? And how does this compare to an 82 game season?
Let's take his career shooting percentage of 10.8% and round down to an even 10. Let's also imagine that he averages 3 shots each game (slightly below his career average of 3.35). For my simulation, I assumed that on each shot he had a 10% chance of scoring. Chance still determines how much he actually scores, both in a game and in a season. Over 1,000,000 imaginary seasons, though, we get a good sense of how many goals he'll score (and how far his observed SH% can stray from his "true" SH%).
The SH% are presented here:
On the vertical axis is the number of events in the simulation. Because it's 1,000,000 simulations, you can think of it as percent (e.g., 2.5 = 25%). On the horizontal axis are shooting percentages, binned into .05 percentages.
What you'll see is probably what you'd expect if you know anything about SSS: over the course of a longer season, the observed SH% is much closer to the real SH%. Specifically, in an 82 game season the shooting percentage will fall within 2.5% of the "real" value about 48% of the time. However, that value drops to about 42% in a 48 game season. If we widen the window to 3.75%, it encompasses about 95% of outcomes in a full season. That is, in an 82 game season, assuming 3 shots a game, there's a 95% chance that Kessel scores between 15 and 33 goals. Now I realize that 18 goals is a large range, but in a 48 game season, the chance of landing inside that window drops to about 84%.
The upshot of the SSS is the increased chances at the extremes. There's a better than 8% chance that in a 48 game season someone like Kessel will shoot better than 13.75%, more than twice the chance in an 82 game season (3.4%). Unfortunately the same is true at the other end: there's about an 8% chance that a 10% shooter has a SH% of less than 6.75%. That would put a player like Kessel on a pace of scoring fewer than 15 goals in a full season. The chances are small, but they're real.
Based on this, I would put the chaos factor for shooters between 8 - 10% for a short season (compared to a regular one). In other words, about 10% of the outcomes will be towards the extremes in this season compared to a regular season.
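For anyone who wants to poke at the shooter numbers themselves, here is a minimal sketch of that kind of simulation. It is not my Matlab code: it's plain Python, it runs 20,000 seasons rather than 1,000,000 so it finishes quickly, and the function names (`simulate_sh_pct`, `share`) and the seed are made up for illustration. The only inputs taken from above are 3 shots a game, a 10% chance on each shot, and the 13.75% and 6.75% thresholds.

```python
import random

def simulate_sh_pct(games, shots_per_game=3, true_sh=0.10,
                    seasons=20_000, seed=1):
    """Return one observed season SH% per simulated season.

    Assumption: every shot is an independent 10% chance, as in the post.
    """
    rng = random.Random(seed)
    shots = games * shots_per_game
    results = []
    for _ in range(seasons):
        goals = sum(rng.random() < true_sh for _ in range(shots))
        results.append(goals / shots)
    return results

def share(values, predicate):
    """Fraction of simulated seasons for which predicate(value) is true."""
    return sum(1 for v in values if predicate(v)) / len(values)

full_sh = simulate_sh_pct(82)   # regular season
short_sh = simulate_sh_pct(48)  # lockout-shortened season

for label, season in (("82 games", full_sh), ("48 games", short_sh)):
    hot = share(season, lambda s: s > 0.1375)
    cold = share(season, lambda s: s < 0.0675)
    print(f"{label}: SH% above 13.75% in {hot:.1%} of seasons, "
          f"below 6.75% in {cold:.1%}")
```

With fewer seasons the tail percentages will wobble a little from run to run (change the seed to see it), but the short season should show fatter tails at both ends every time.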
Ah, league-average goaltending. Wouldn't that be nice? Since we don't know what that looks like, I simulated a goalie with a true SV% of .910, who sees 30 shots a game. I also assumed that a starting goalie would play about 35 games in a 48 game season, vs 62 in an 82 game season.
Here, SV% is plotted in bins of .010, which seems like a large value, but it turns out that SV% can vary substantially in a shortened season. If a goalie plays 62 games, there's about a 53% chance of finishing within .005 of his "true" SV% of .910 (.905 - .915). In a shortened season though, this drops all the way to 41%. In other words, the majority of the time a goalie with a true SV% of .910 will have a SV% greater than .915 (league average) or less than .905 (Gustavsson-esque).
Interestingly, the short season seems to affect goalies on the extreme ends a lot more than it did shooters. In a short season, there's about a 4.5% chance that the goalie's SV% ends up below .895 (Toskala-esque), compared to about 1% of the time in a regular season (actual Toskala). Again, the same is true at the other end as well: there's about a 4% chance in a shortened season of a .910 goalie having a SV% above .925!
Overall, I'd say there's greater chaos for the goalies in a shortened season, somewhere between 10-15% more than during a regular season.
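The goalie numbers can also be sanity-checked without any random sampling. If every shot is an independent save-or-goal event (which is what the simulation assumes), season save totals follow a binomial distribution, so the probabilities can be summed exactly. The sketch below does that in plain Python; it is a cross-check under that assumption, not the simulation I ran, and `binom_pmf` and `prob_sv_in` are names invented here.

```python
from math import exp, lgamma, log

def binom_pmf(k, n, p):
    """Probability of exactly k saves on n shots, computed in log space
    so that large n does not overflow."""
    log_coef = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coef + k * log(p) + (n - k) * log(1 - p))

def prob_sv_in(low, high, games, shots_per_game=30, true_sv=0.910):
    """P(low <= season SV% <= high) for a goalie with the given true SV%."""
    shots = games * shots_per_game
    return sum(binom_pmf(k, shots, true_sv)
               for k in range(shots + 1)
               if low <= k / shots <= high)

for games in (62, 35):  # starter workload: full season vs. short season
    near = prob_sv_in(0.905, 0.915, games)
    low_tail = prob_sv_in(0.0, 0.895, games)
    high_tail = prob_sv_in(0.925, 1.0, games)
    print(f"{games} games: within .005 of .910 = {near:.1%}, "
          f"below .895 = {low_tail:.1%}, above .925 = {high_tail:.1%}")
```

The exact sums should land close to the simulated percentages quoted above; any small gaps come from sampling noise and from how the histogram bins were drawn.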
We can do the same thing for teams as well. Last season the Ottawa Senators and Washington Capitals squeaked into the playoffs with a .561 points percentage. Meanwhile the Buffalo Sabres were on the outskirts with a .543. So let's take .55 as the value for a bubble team, and see how their seasons play out over the two season lengths.
At the team level, there's a little bit of skew to the outcomes. In a full season, teams are much more likely (~12.5% more likely, to be exact) to get within .1 of their expected points % of .55, but this difference will skew downwards slightly (.475 - .575 points %). That is, through a full season, a team with a true points % of .55 will probably miss out on the playoffs the vast majority of the time: assuming a cut-off of about .575, this team will miss the playoffs about 70% of the time, and usually just barely.
In a shortened season, however, the chance of missing the playoff cutoff for a .55 team drops to 62%. In other words, in a shortened season, a team with a true points percentage of .55 has about an 8% greater chance of making the playoffs just due to random chance and SSS. Moreover, the chances are about 3X greater that they finish between .675 and .725 points % (a ridiculous pace of roughly 111 to 119 points over 82 games)!
Obviously, the situation with a team's point % is complicated by goalies and scorers' luck, as well as that of the other teams, but again the point should be obvious: chance will come into play a fair bit this season - probably much more so than for individuals (~15 - 20% greater outcomes in the extremes).
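Here is a rough sketch of how a team-level version could be set up. Big caveat: the per-game probabilities below (48% win, 14% overtime or shootout loss, 38% regulation loss) are hypothetical values picked only because they average out to 1.10 points a game, i.e. a .550 points %. They are not the inputs behind the numbers above, so the output will not match those figures exactly. The function name, the seed, and the 20,000-season count are likewise just for illustration.

```python
import random

# Hypothetical outcome probabilities for a bubble team (an assumption):
# expected points per game = 2 * 0.48 + 1 * 0.14 = 1.10, a .550 points %.
P_WIN = 0.48
P_OT_LOSS = 0.14

def simulate_points_pct(games, seasons=20_000, seed=1):
    """Return one season points % per simulated season."""
    rng = random.Random(seed)
    results = []
    for _ in range(seasons):
        points = 0
        for _ in range(games):
            r = rng.random()
            if r < P_WIN:
                points += 2          # win
            elif r < P_WIN + P_OT_LOSS:
                points += 1          # overtime / shootout loss
            # otherwise a regulation loss: 0 points
        results.append(points / (2 * games))
    return results

full_team = simulate_points_pct(82)
short_team = simulate_points_pct(48)

for label, season in (("82 games", full_team), ("48 games", short_team)):
    miss = sum(1 for p in season if p < 0.575) / len(season)
    elite = sum(1 for p in season if p >= 0.675) / len(season)
    print(f"{label}: below the .575 cutoff {miss:.1%}, "
          f"at .675 or better {elite:.1%}")
```

Even with made-up inputs, the pattern should be the same one described above: the short season pushes more outcomes out to the extremes.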
Conclusions - Applying Bayes' Theorem to Season Outcomes
Obviously, luck will influence this season a great deal. For players I think that will be somewhere between 10-15% more than in regular seasons. For teams, that number could be as high as 20%.
So why should we care?
Well, when it comes to evaluating players and teams after the season is done, it will be important to remember just how much chance will have played a role.
Take, for example, the case of a goalie. Let's say that in his first two years he had a SV% of .910. Let's also assume that this goalie is fairly young and untested, so we're only somewhat confident (say, 75%) that the .910 is the "true" SV%. Then, let's say in one single season he has a SV% of .920. In a normal season, we could assume that the probability of him getting a .920 given a true SV% of .910 is about 25% (based on our model). Let's also assume that the probability of getting a .920 if his real SV% is a .920 is about 55% (the value obtained for our "real" value in the model). Then, according to Bayes' theorem, we should revise the "prior" to about 58%. That is, the chance that the goalie's true SV% is .910 has dropped from 75% to 58%, and we should now be much less confident that his true SV% is .910 (and not, say, something closer to .920).
If, however, that season happened to be a shortened one, like this season, then our revised estimate will be much different. Assuming, now, a 30% chance of getting a .920 SV% if the real SV% is .910, and a 40% chance of getting it if .920 is the "real" value, our new estimate of the probability of his true SV% being .910 is 69%. That is, based on his performance this season, the chance that his true SV% is .910 has only dropped from 75% to 69%.
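The arithmetic behind those two updates is short enough to write out. This is just Bayes' theorem for two competing hypotheses (true SV% is .910 vs. true SV% is .920), using the 75% prior and the likelihoods assumed in the two paragraphs above; the function name `posterior` is my own.

```python
def posterior(prior, like_h, like_alt):
    """P(H | data) when the only alternative to H has probability 1 - prior.

    like_h   = P(data | H), here P(.920 season | true SV% is .910)
    like_alt = P(data | not H), here P(.920 season | true SV% is .920)
    """
    numerator = prior * like_h
    return numerator / (numerator + (1 - prior) * like_alt)

full_season = posterior(0.75, 0.25, 0.55)   # 0.1875 / 0.325, about 0.577
short_season = posterior(0.75, 0.30, 0.40)  # 0.225 / 0.325, about 0.692

print(f"Full season:  P(true SV% is .910) = {full_season:.3f}")
print(f"Short season: P(true SV% is .910) = {short_season:.3f}")
```

The same .920 season moves our belief a lot less when it comes from 35 starts than when it comes from 62.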
In other words, we should be extremely cautious about judging the performance of players and teams in this shortened season. Specifically, any GM who's willing to sign a player to a contract based on one anomalous performance this season should be tarred and feathered. While any individual outcome may look unlikely to have happened by chance alone, if we have prior seasons to judge a player on, far more weight should be given to those years than to this crazy, chaotic season.
Also, there's a good chance that if the Leafs squeak into the playoffs this year, it will have been due to chance - their own, and the bad luck of other teams. Selling off assets to make it happen would be insane, since the larger sample sizes of the following seasons will put us much closer to our "true" value.
That said, Go Leafs Go!