This summer has seen a relatively revolutionary shift in the hockey world's perception of the usefulness of analytics. Hockey bloggers are being hired by NHL teams and there seems to be a groundswell amongst the league's traditionalist leaders to finally embrace the usefulness of the information analytics is providing.
As a math teacher - one who spends time in classrooms educating students specifically on statistics and statistical reasoning - the part of the drawn-out debate that has most intrigued me is the "eye test" vs "stats" argument.
On the one side you have the traditional hockey watcher. The one that feels scouting is best done by "watching the games" and making decisions based on impressions you have of the things you're seeing. This is based on a large amount of experience watching hour after hour of players skating, shooting, hitting and scoring. The traditionalist 'knows' what the important aspects of the game are. They have had these key points hammered into their head through conventional descriptions by the greatest hockey minds the world has seen over the past 60+ years.
They know that dump and chase hockey is safer than run and gun. They know that shot blocking players who sacrifice their bodies for the good of the team are the glue that holds a team together defensively. They know that fancy skill players are a luxury. Most skill guys are only useful when the puck is on their stick and they can't be relied upon when the going gets tough.
I could go on and on with a number of other old saws that you hear from traditionalist viewers with regularity...but I won't. That isn't my point in this posting. See the funny thing is, I actually agree with a bunch of the things I just wrote because they are true depending on the context in which they are applied. The more we delve into analytics and statistics, the more we begin to see some of the traditional viewpoints borne out by the numbers.
Unfortunately - what the eye test fails to recognize is that it is not easy to identify actual skills by sight alone. The eye test does a very poor job of comparing players on the same team - let alone across multiple teams. What I want to talk about in this posting is why this happens and what we can do to mitigate it.
Heuristics and Our Lazy Brain
Human psychology is a very interesting area of study with almost universal application to any field involving decision making - particularly as more study is done on how we fall prey to the wiring in our own heads. Much of this has been analyzed and researched by 2002 Nobel Prize winner Daniel Kahneman and his now deceased research associate Amos Tversky. I'd like to discuss some of the relevant points of their work as they relate to this topic.
Part of the way our brain processes the vast quantities of information we are presented with daily is to use shorter process loops based on general rules that have become hardwired into our brains through evolution. These mental short cuts are known as "heuristics" - and many of them lead to cognitive biases when we need to sort through more complicated problems or processes.
Another way of thinking of this is - these mental short cuts are helpful when you're solving fairly simple problems, or when you're trying to decide something in the moment and time is at a premium. But when you have a lot of time to analyze something, and a lot of information to sort through, these short cuts actually get you into trouble and can lead you horribly astray. The types of problems we encounter when analyzing large complicated systems aren't the sort of thing our brains are evolved to handle efficiently.
Humans tend to overemphasize observations that are recent or stand out for their noteworthiness (availability heuristic). In terms of watching hockey games this means we intrinsically over-value the plays leading up to goals or scoring chances for or against - rather than the mundane run of play that happens in between. This over-valuation of rare results, in comparison to the mundane yet far more prevalent process occurring in the interim, can lead to large miscalculations and mistakes when it comes to sorting through the data we have available.
Humans also tend to form opinions based on our first impressions and then shift expectations based on what we have initially been presented with (Anchoring and Adjustment). This would be the type of thinking that confounds interpretations of individual players or even long term beliefs about what styles of play are most effective.
Additionally there are heuristics that create an escalation of commitment which we regularly see as the "Sunk Cost Fallacy". We over-value things that we have put effort, time, and money into. This leads to the type of cognitive biases that many first year economics students are warned about - and while this is a very basic concept it continues to pervade virtually all aspects of how professional sports franchises operate. It also explains why so many Leafs fans are loath to give up on a player who is likely at peak value and only likely to decline in usefulness. If you've ever heard the mantra "sell high, buy low" this is exactly the type of logic that is ruined by the sunk cost fallacy.
So what do all of these heuristics have in common? As I mentioned earlier, they are largely mental short cuts that our "Lazy Lizard Brain" evolved to speed up decision making under pressure. They often lead to poorly considered snap judgments and serve us best when we are working with a limited amount of data, particularly in areas where the penalty for making a poor decision is outweighed by the penalty for taking too long to reach a decision at all.
Put another way - when you're wandering around naked on a grass savannah and you don't know or understand that much about the inner workings of your surroundings, your ability to predict and evaluate outcomes is probably lacking. As an ignorant great ape who recently came down out of the trees, it would make sense to rely on the things you know and stick with them. You probably don't get a lot of second chances to try those things that almost killed you the first time, and if you find a system that works and will help you live for another few weeks until you can track down some fresh meat or edible plant matter it's probably best to continue with that system.
Unfortunately the modern world affords you a lot more time and information with which to make decisions, and gut instincts that were honed during humanity's time as a savannah-wandering, grub-eating survivalist may not be ideally suited to a realm where large amounts of money are exchanged and people's jobs (and thus lives) depend on sound planning and long term foresight.
Analytics and Decision Making
The major advancements in data analytics are being felt all over the globe in a wide variety of fields. Business and economics, politics, demographics, health, and yes even sports. The key point of this shift is to improve decision making. We now have lots of information we didn't have before and we can sift through it and find answers to questions that we haven't been asking. Often we stumble across things in the information that we didn't even know were happening and this leads us to other areas of interesting analysis which furthers our understanding of what's happening.
Let me bring this back to why we use certain statistics in hockey and where this is trending long term. Much debate has taken place (and apparently continues to) around the value of shot vs goal based statistics in hockey. All of this is frankly an issue of sampling.
On the one side you have a group of individuals that wish to "value" certain plays - be it goals or scoring chances - that they rightly consider to be of more importance in determining the outcome of hockey games. In this instance what we have is a determined effort to value results above process. Players who are perceived to "regularly" produce results are valued more highly than those perceived to "regularly" fail to produce results.
On the opposite side you have the group that would suggest there are significant issues with that assessment (full disclosure - I am a member of this second group). Foremost is the issue of sample sizes, randomness and rarity. Despite the very concrete value of goals and scoring chances, they occur at surprisingly irregular intervals in comparison to shot attempts.
A single NHL game generally has around 80-90 shot attempts at 5v5, 40-50 shots on goal at 5v5, 30-40 total scoring chances, and only 4 or 5 goals at 5v5. What you can logically infer from this information is that shot attempts occur more frequently than shots on goal and scoring chances, and vastly more frequently than goals.
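To make the sampling point concrete, here is a minimal Python sketch under simplifying assumptions of my own: every event is treated as an independent coin flip that a truly average team wins 50% of the time, and the per-game counts are rough midpoints of the ranges above. The function name is hypothetical. It estimates how much an observed share of events can wobble from randomness alone after a given number of games.

```python
import math

# Rough per-game 5v5 event counts, taken as midpoints of the ranges quoted above.
# These are illustrative assumptions, not measured values.
EVENTS_PER_GAME = {
    "shot attempts": 85,
    "shots on goal": 45,
    "scoring chances": 35,
    "goals": 5,
}


def share_standard_error(events_per_game, games, true_share=0.5):
    """Standard error of an observed share under a simple binomial model."""
    n = events_per_game * games
    return math.sqrt(true_share * (1.0 - true_share) / n)


if __name__ == "__main__":
    for games in (1, 20, 82):
        print(f"After {games} game(s):")
        for name, per_game in EVENTS_PER_GAME.items():
            se = share_standard_error(per_game, games)
            print(f"  {name:<16} +/- {100 * se:.1f} percentage points")
```

Under this simple model the random spread shrinks with the square root of the number of events, so over the same stretch of games a goal-based share is a little over four times noisier than a shot-attempt share (the square root of 85/5).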
Scoring chances and goals are excellent to track, but a problem arises when we consider what happens in between these events. The amount of randomness and noise present in individual series of plays in hockey extends to individual games and beyond - it interferes with accurate assessment beyond individual seasons' worth of play. Consider that for an entire NHL team - skill doesn't overtake luck in terms of its impact on the NHL standings until the season is almost 90% complete. For individual players (goaltenders or skaters) the best assessments we have indicate that we require over 3 complete seasons of play to be relatively confident in assessments of their true NHL talent level.
I don't make these statements lightly and think this is extremely important for the average follower to recognize. The "eye ball test" will be found lacking if you are relying on less frequent discrete chunks or packets of information. Relying solely on goals or scoring chances will lead to missing a LOT of information that is relevant to the assessment of player skills in areas that do not directly connect to scoring goals.
The fact that players are often misleadingly graced or victimized by positive or negative outcomes is best displayed by the extreme variability in statistics such as PDO across multiple seasons. This is what makes +/- largely useless as a tool for player assessment on its own.
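For readers unfamiliar with PDO: as it is commonly defined, it is the sum of shooting percentage and save percentage (for a team, or for a player while he is on the ice), usually at 5v5, so the league-wide average is 100 by construction. The Python sketch below is only an illustration with made-up numbers; the function name and inputs are hypothetical.

```python
def pdo(goals_for, shots_for, goals_against, shots_against):
    """PDO as commonly defined: shooting % plus save %, on a 0-200 scale."""
    shooting_pct = 100.0 * goals_for / shots_for
    save_pct = 100.0 * (1.0 - goals_against / shots_against)
    return shooting_pct + save_pct


if __name__ == "__main__":
    # Hypothetical stretch of average results: 8% shooting, 92% saves.
    print(pdo(goals_for=8, shots_for=100, goals_against=8, shots_against=100))
    # Hypothetical hot streak over the same number of shots.
    print(pdo(goals_for=12, shots_for=100, goals_against=6, shots_against=100))
```

A few extra goals either way over a hundred shots moves the number a long way, which is why a player's PDO (and with it his +/-) can swing sharply from one season to the next without his play changing much.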
In signal processing the inability to distinguish between multiple signals or the distortion of information being delivered by these signals is referred to as aliasing. Aliasing arises during reconstruction of an inadequately sampled signal. The information that is recorded as part of the sample being examined requires a reconstruction of the signal that lies in between the samples. The farther apart the recorded samples, the more likely aliasing is to occur.
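The aliasing idea can be shown in a few lines of Python. This is a toy sketch with numbers I picked for illustration (a 9 Hz wave, a 1 Hz wave, and two sampling rates); the helper names are hypothetical. Sampled only 10 times per second, the fast wave produces exactly the same readings as an inverted slow wave, so the two cannot be told apart; sampled 100 times per second, they are clearly different.

```python
import math


def sample_sine(freq_hz, sample_rate_hz, n_samples):
    """Readings of sin(2*pi*f*t) taken at evenly spaced sample times."""
    return [
        math.sin(2 * math.pi * freq_hz * k / sample_rate_hz)
        for k in range(n_samples)
    ]


def max_abs_diff(a, b):
    """Largest absolute gap between two equally long lists of readings."""
    return max(abs(x - y) for x, y in zip(a, b))


def compare(sample_rate_hz, n_samples=20):
    """Gap between a 9 Hz sine and an inverted 1 Hz sine at this sample rate."""
    fast = sample_sine(9, sample_rate_hz, n_samples)
    slow_inverted = [-x for x in sample_sine(1, sample_rate_hz, n_samples)]
    return max_abs_diff(fast, slow_inverted)


if __name__ == "__main__":
    print("Gap at 10 samples/sec: ", compare(10))   # effectively zero
    print("Gap at 100 samples/sec:", compare(100))  # clearly non-zero
```

The sparse sampler has no way of knowing which wave it was looking at, and whatever fills the gap between readings is supplied by the observer rather than by the data.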
By limiting ourselves to samples of data that occur farther apart in time, we leave our interpretation of what happens open to our biases and the failings of the heuristics that victimize our brains regularly. Thus in order to make fair and reasonable assessments, it behooves analysts to seek out the most detailed samples of data available. For the past 7 years the most detailed samples of data available for feasible analysis have been NHL game sheets, and specifically shot attempt counts (used as an indicator of which team is in possession of the puck at each time stamp).
The Future of Analytics
The most recent stage of hockey analytics has focused on the groundbreaking work by Eric Tulsky regarding Zone Entries, and the follow up work being pursued around Zone Exits initiated by Oilers bloggers Jonathan Willis and Derek Zona. Linkages have been demonstrated between carrying the puck into the offensive zone or out of the defensive zone and shot attempt or scoring chance generation. While this is a fundamental shift away from some classic views in the sport regarding the value of Dump & Chase (D&C) hockey, it should still be pointed out that the top possession team in the NHL - the Los Angeles Kings - regularly employs D&C to great effect.
The real future though - in my opinion - lies in the data collection wave that is about to crest on the NHL's shores. Eventually debates over sample size will become a moot point as it is within our grasp to literally record EVERYTHING that is happening on the ice at all times (accurately I might add). At the 2014 NHL Board of Governors meetings and the Entry Draft, discussions were held between the league and competitors Stats LLC (the purveyors of SportVU), and Powerscout Hockey regarding the implementation of a video tracking system for NHL teams.
Powerscout has displayed its system already during Sportsnet CHL telecasts (most recently during the 2014 Memorial Cup), and made a presentation of its system at this past week's 2014 Joint Statistical Meetings in Boston. While many North Americans are aware of SportVU thanks to its work with the NBA in recent seasons (most tellingly detailed in this piece profiling the use of the system by the Toronto Raptors), Powerscout should not be shrugged aside as a small fish in this pond.
The system employed by Powerscout is built upon proprietary technology originally developed and implemented by Prozone Sport - a European corporation that has worked with over 300 professional football outfits including Real Madrid FC, Manchester United FC and Arsenal FC. It should be noted that co-panelists at the 2014 JSM with Powerscout were the authors of a Prozone sponsored and facilitated research paper originating from Liverpool John Moores University on the topic of Advanced Data Analytics in Soccer.
Other companies that are pushing their way slowly into the data analytics realm in hockey include Stathletes, who were profiled in this Globe and Mail piece back on July 7th based on their Small Business Challenge. Stathletes are currently contracted with multiple NHL teams - and are entering the CHL realm as an outsourced data collection service that sends reports to the teams they are contracted with.
There still remains the realm of RFID tracking and accelerometer data on offer from companies like Catapult Sports, whose wearable technology developed at the Australian Institute of Sport (AIS) is already employed by the Philadelphia Flyers and Toronto Raptors in a training setting. The NFL just signed an agreement last week with Zebra Technologies to use their MotionWorks RFID tags on every player to collect as much useful data as possible and store it in databases for future analysis.
The final example I wish to present is the data services provided by one of the world's largest and most profitable Enterprise Business Software companies, SAP AG of Germany, to the German Football Federation. Publicized during the most recent FIFA World Cup in Brazil, the power that SAP can bring to bear in developing software for this type of sports analysis is frankly unrivaled.
While it appears superficially like hockey and the NHL are lagging other sports in the analytics realm (and in many ways they obviously are) - the world of Big Data and collection of useful information is rapidly spreading into every major business globally. That includes the NHL and in all likelihood the data available to all 30 NHL teams will drastically exceed what is available to the public (assuming they continue to keep the information private).
Once that data is available, the debates around the value of the "eye test" and subjective shot counting, scoring chance tracking, etc. will all frankly become moot. If you pair RFID tags, accelerometers, video tracking systems, and high powered data collection with machine learning algorithms you will entirely revolutionize the way that sport is broken down by the teams making use of it. Any team not making use of these capabilities will lose virtually any edge they have whatsoever.
This summer may seem like a sea change, but in my estimation this is just the beginning. I'm looking forward to it, and hopefully all of us can soon end the debates over whose information is more useful.