Update on that NHL shot distance story from yesterday: As many in the analytics community have noted, the league found the "glitch" in their system that was counting crease shots and attempts as being several feet away. I was told fix was made Wed night.— Greg Wyshynski (@wyshynski) October 17, 2019
The hope now is that the fix can be applied retroactively and corrected data can be pushed for the first few games. For the rest of us, we should all consider how we test the tools we use, and how much we trust them. Data validation and software testing are dull, but necessary.
Everything we see is an illusion. We don’t think about it that way, but what we see is a fiction created in our brains based on the information our optic nerves supply it with. Fiction can be true, though, and we expect it to be true even if we know it’s not real. Abstraction of reality is something humans love deeply.
Someone paid $90 million for a David Hockney painting, a self-portrait. And you could, if you liked, simply go get a photograph of Hockney instead, posed in the same way. But in the fiction of art we expect to find something reality alone can’t provide. We use our analytical and logical brains to make abstractions of reality and then study them just like we might a painting, and some people (me, for example) might say that sport in and of itself is an abstraction of a more complex reality. We play games to control the world, to understand it, to measure it out on a rectangle of a certain size.
The shot plot above is the abstraction that drives hockey analytics. And last night, in a plot twist no one seemed to see coming, we learned that what we thought was happening wasn’t real:
Yeah, so it looks like there is something very different about how the NHL is recording event location coordinates this season... and, umm, it's not great. Thread incoming— EvolvingWild (@EvolvingWild) October 15, 2019
The narrator of every hockey game is not the person calling the play-by-play on television; it’s someone else, sitting up at the top of the arena, creating an abstraction of the game with numbers. The Official Scorers (it’s not a one-person job in the NHL) note down every event they see on the ice. Everything we know about a hockey game comes from them. Hits, giveaways, shot blocks: all the things the television broadcaster talks about come from the scorers. But they also note down genuinely useful information, seemingly by accident. The location of every shot is recorded, along with the type of shot, who took it, and where it went (miss, goal, etc.). This information, along with the lists of other, less meaningful events, is included in the play-by-play data sheet published to the web in real time for every game.
Every shot plot, heat map, line graph, or table of shot data from an NHL hockey game comes from that same source. And the source has always been thought to be reliable.
It’s not perfect, and we’ve always understood that. What the Official Scorer does is assign x,y coordinates to every shot (blocked shots are recorded at the location of the block), just like the app up above lets you do. It’s a guess, or more politely, an approximation.
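For the curious, here is a minimal sketch of how a public model might turn one of those scorer-assigned x,y pairs into the shot distance and angle that shot plots and Expected Goals models lean on. It assumes coordinates in feet measured from centre ice, with each goal line 89 feet from centre (x = ±89) and y = 0 down the middle of the rink, and that each shot is aimed at the net on its own side of centre. The `ShotEvent` record and its field names are hypothetical, not the NHL’s actual data format.

```python
import math
from dataclasses import dataclass

GOAL_LINE_X = 89.0  # assumed: goal lines sit 89 ft either side of centre ice

@dataclass
class ShotEvent:
    # Hypothetical record; field names are illustrative only.
    shooter: str
    shot_type: str  # e.g. "wrist", "slap"
    result: str     # e.g. "goal", "miss", "save"
    x: float        # feet from centre ice along the length of the rink
    y: float        # feet from the middle of the rink, across its width

def distance_to_net(shot: ShotEvent) -> float:
    """Straight-line distance in feet from the recorded spot to the centre of the goal."""
    dx = GOAL_LINE_X - abs(shot.x)  # negative if the shot was recorded behind the goal line
    return math.hypot(dx, shot.y)

def angle_to_net(shot: ShotEvent) -> float:
    """Degrees away from the line running straight out from the net (0 = dead centre)."""
    dx = GOAL_LINE_X - abs(shot.x)
    return math.degrees(math.atan2(abs(shot.y), dx))

# A shot recorded on an end-zone faceoff dot, assumed here to be 20 ft out and 22 ft off centre.
dot_shot = ShotEvent("Ceci", "wrist", "goal", x=69.0, y=22.0)
print(round(distance_to_net(dot_shot), 1), round(angle_to_net(dot_shot), 1))
```

Under those assumptions, a shot from the faceoff dot, like the Ceci goal, comes out at roughly 30 feet and about 48 degrees. Record it a few feet off and both numbers move with it.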
It’s really easy to see where Cody Ceci shoots this puck from: it’s right on the faceoff dot.
Here’s where it was recorded:
The shot indicated as the Ceci goal is the small dot to the right of the big blob, just inside the circle. This shot plot is from Moneypuck and updates live in-game. It’s not just a plot of unblocked shots; it’s an Expected Goals plot. The big blob was a goal, but it’s big because Moneypuck’s Expected Goals model gave it a higher chance of becoming a goal, based on the shot location, shot type, and some other factors.
Here’s the Auston Matthews goal that is represented by that big blob:
That dot isn’t quite in the right spot either, but it isn’t off by as much as the Ceci goal is.
This kind of error is normal and expected. Built into the practice of using the big pile of shot data from thousands of hockey games is the understanding that some of it is wrong. That’s just how all data collection and analysis works. The surprise isn’t that the locations of those two goals were a little off. That’s not the plot twist; that’s just the beginning.
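The difference between this kind of everyday scorer error and what turned out to be going on this season is the difference between random noise and systematic bias. A small simulation shows why that matters: zero-mean noise in recorded distance mostly cancels out over thousands of shots, while a consistent outward shift does not. The shot distances, the four-foot noise level and the five-foot shift are all made up for illustration.

```python
import random
import statistics

def simulate_recorded_distances(true_distances, noise_sd, bias, seed=0):
    """Recorded distance = true distance + constant bias + zero-mean Gaussian noise."""
    rng = random.Random(seed)
    return [d + bias + rng.gauss(0.0, noise_sd) for d in true_distances]

# Hypothetical "true" shot distances in feet; not real NHL data.
rng = random.Random(42)
true = [rng.uniform(5.0, 60.0) for _ in range(20_000)]

noisy = simulate_recorded_distances(true, noise_sd=4.0, bias=0.0)    # ordinary scorer error
shifted = simulate_recorded_distances(true, noise_sd=4.0, bias=5.0)  # same error plus a shift

true_mean = statistics.fmean(true)
print(round(statistics.fmean(noisy) - true_mean, 2))    # near 0: the errors wash out
print(round(statistics.fmean(shifted) - true_mean, 2))  # near 5: the bias does not
```

More data fixes the first problem. It does nothing for the second.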
Evolving Wild found some anecdotes as well:
This Johansen goal was recorded 9 ft from the net, 35.5% xG. These were the two highest xG events this season. We updated our model, which meant much lower xG values overall, but we were expecting a ceiling of ~70%, not 40%. pic.twitter.com/7uGx6CRd1N— EvolvingWild (@EvolvingWild) October 15, 2019
Their examples aren’t the point. The real clue is the weirdly low Expected Goals value on what are supposed to be the two best shots so far this season. The Twitter thread details their findings, but I’m going to give you one set of their images to hopefully make this issue clear:
These are all the unblocked shots from last season within 30 feet of the net. And now this season so far:
The white space around the blue paint makes it clear that the recorded locations of shots have moved out from the net in some way. Less obvious to the untrained eye is that the darker black concentration has stepped away as well.
After further analysis, Evolving Wild believe the effect is a compression of the x coordinate away from the net, while shots from the blueline area are not affected.
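Evolving Wild’s actual analysis is in their thread. As a rough, hypothetical illustration of the kind of check that can surface a shift like this, you could bin the unblocked-shot distances from two seasons into bands and compare the share of shots in each. An outward compression would show up as a smaller share in the bands nearest the net, while the bands out near the blueline would barely move. The band edges below are arbitrary.

```python
from collections import Counter

def distance_band_shares(distances, band_edges):
    """Fraction of all shots falling in each [lo, hi) distance band, in feet.

    Shots outside every band still count toward the total.
    """
    bands = list(zip(band_edges, band_edges[1:]))
    counts = Counter()
    for d in distances:
        for lo, hi in bands:
            if lo <= d < hi:
                counts[(lo, hi)] += 1
                break
    return {band: counts[band] / len(distances) for band in bands}

def compare_seasons(old, new, band_edges=(0, 10, 20, 30, 45, 65)):
    """Change in each band's share of shots: new season minus old season."""
    old_shares = distance_band_shares(old, band_edges)
    new_shares = distance_band_shares(new, band_edges)
    return {band: new_shares[band] - old_shares[band] for band in old_shares}

# Tiny made-up samples: the close-in shots drift outward, the long shot stays put.
print(compare_seasons(old=[5, 8, 15, 25, 50], new=[12, 14, 18, 27, 50]))
```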
Why is this happening?
I want to emphasize that this is not a case of players suddenly shooting from different locations en masse. Evolving Wild haven’t uncovered a wholesale change-up in how players are playing the game, and they’ve confirmed it’s not an early season statistical blip that happens every year like power play opportunities. This is a difference in the way the data is recorded.
One possible explanation is that the NHL Official Scorers are simply using different equipment to make their assessments of locations, and a different sort of error has crept in, but EW have a different theory:
To us, this seems to confirm that the NHL is now likely recording the location of the jersey/player and not the actual shot location/where the puck left the stick. This is a huge loss as that is *very important* piece of information. Time to have a beer and take a break.— EvolvingWild (@EvolvingWild) October 15, 2019
If they are correct, then the NHL is preparing the way for tracking technology that will be more accurate (we hope) but will bring in a new and unexpected discrepancy. People aren’t dots. A hockey player with a stick in his hands describes a very large area of effect on the ice. If the puck ceases to be the object that’s noted, and the player’s jersey becomes the object of focus, what is gained in removing human error will be lost in another way.
UPDATE: EW are moving away from the theory that the scorers were recording shot locations by body position. To be very clear: tracking technology has not yet been implemented, and this change is not related to that technology at all, since it’s not being used.
If you go back and watch the Ceci goal again and imagine noting where the shoulder of his jersey is, you can see that the data the NHL produced is less wrong. Ditto the Matthews goal. Some of what we’re seeing in specific examples is likely just human error, but overall, if EW are correct in their assessment, shot locations from now on will no longer be comparable to the historical information.
Why does this matter?
The location of shots doesn’t affect the Corsi or Fenwick calculations. Those are just counts of all shot attempts and all unblocked shot attempts, so locations don’t matter. When I say that the Leafs have a 55% Corsi right now, we can be as confident in that number as we’ve always been. After six games, we should be happy with it, but not quite ready to expect the rest of the season to average out to that number.
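To see why location can’t touch these numbers, here is a minimal sketch of Corsi% and Fenwick% for one game. It assumes every attempt is credited to the shooting team and that the list already holds only the situations you care about (5-on-5, say); the `Attempt` record and its result labels are hypothetical. Moving every recorded location changes nothing, because the coordinates are never read.

```python
from dataclasses import dataclass

@dataclass
class Attempt:
    # Hypothetical event record; labels and fields are illustrative only.
    team: str    # team credited with the attempt (assumed: the shooting team)
    result: str  # "goal", "save", "miss" or "block"
    x: float     # recorded location in feet; never read below
    y: float

def corsi_pct(attempts, team):
    """Share of all shot attempts (goals, saves, misses and blocked shots) taken by `team`."""
    cf = sum(1 for a in attempts if a.team == team)
    return cf / len(attempts)

def fenwick_pct(attempts, team):
    """Like Corsi, but counting only unblocked attempts."""
    unblocked = [a for a in attempts if a.result != "block"]
    ff = sum(1 for a in unblocked if a.team == team)
    return ff / len(unblocked)

game = [
    Attempt("TOR", "goal", 80.0, 3.0),
    Attempt("TOR", "miss", 60.0, -20.0),
    Attempt("TOR", "block", 50.0, 10.0),
    Attempt("BOS", "save", -70.0, 5.0),
]
# Record every attempt five feet farther from the net it attacks.
moved = [Attempt(a.team, a.result, a.x - 5.0 if a.x > 0 else a.x + 5.0, a.y) for a in game]

print(corsi_pct(game, "TOR"), corsi_pct(moved, "TOR"))      # identical
print(fenwick_pct(game, "TOR"), fenwick_pct(moved, "TOR"))  # identical
```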
But when I tell you Expected Goals information about a team or a player, you should assume it’s wrong.
We now think these two articles are based partly on flawed information:
Expected Goals models in hockey all follow a similar form. They start with the location of unblocked shots and use calculations from the thousands of historical shots on record to determine the chance that a shot taken in that way and from that spot will be a goal, if taken by a league-average shooter on a league-average goalie. This means the result is largely dependent on, and will fairly closely mirror, Corsi or Fenwick results, but it has been refined to weed out high-volume but low-quality shooters, or to give high-quality shooters their due. Expected Goals can also tell you more clearly how well your defenders perform and, in particular, how good your goalie really is.
If shots in general are being recorded a few feet farther from where they actually take place, mostly in terms of distance from the net, then the historical Expected Goals calculation, applied to this new data, will show everyone as having lower Expected Goals than they would have if the puck’s location at the time of the shot were recorded accurately. But by how much? We simply don’t know. We’ll be getting the wrong impression of the quality of the play leading up to a shot, and of the quality of the shooter.
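Here is a toy version of that effect. It uses a logistic curve in distance and angle, one simple and common form for an Expected Goals model, though public models typically add shot type, rebounds and more, and some use more elaborate machine-learning methods. The coefficients and the game’s shots are invented for illustration, and pushing every shot five feet farther out while leaving the angle alone is a simplification of what a recording shift would really do.

```python
import math

# Hypothetical coefficients for illustration only; not fitted to real data.
INTERCEPT = -0.5
DISTANCE_COEF = -0.08  # change in log-odds per foot from the net
ANGLE_COEF = -0.01     # change in log-odds per degree off centre

def expected_goal(distance_ft, angle_deg):
    """Probability that an unblocked shot becomes a goal, via a logistic curve."""
    z = INTERCEPT + DISTANCE_COEF * distance_ft + ANGLE_COEF * angle_deg
    return 1.0 / (1.0 + math.exp(-z))

def expected_goals(shots, extra_feet=0.0):
    """Total xG for (distance, angle) shots, optionally recorded `extra_feet` farther out."""
    return sum(expected_goal(d + extra_feet, a) for d, a in shots)

# Hypothetical shots from one game: (distance in feet, angle in degrees).
shots = [(8, 5), (15, 20), (25, 30), (40, 10), (55, 35)]

true_xg = expected_goals(shots)
recorded_xg = expected_goals(shots, extra_feet=5.0)
print(f"xG as taken: {true_xg:.2f}, xG as recorded 5 ft farther out: {recorded_xg:.2f}")
```

Every shot loses some value in this toy model, and the close-in shots, which carry most of the Expected Goals, lose the most in absolute terms.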
Arvind wrote about the Leafs’ top six, and he used these charts to compare Auston Matthews this year to last year:
This first image shows information unaffected by shot location. You see the Matthews line with fewer shots against (CA/60) and more shots for (CF/60). That likely means the line is spending a greater percentage of time in the offensive zone. Good stuff, exactly what we want to see.
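CF/60 and CA/60 are simply shot attempts for and against scaled to a rate per 60 minutes of ice time, so lines that play different minutes can be compared. A minimal sketch, with made-up numbers rather than the Matthews line’s actual totals:

```python
def per_60(count: int, toi_minutes: float) -> float:
    """Scale an event count to a rate per 60 minutes of time on ice."""
    if toi_minutes <= 0:
        raise ValueError("time on ice must be positive")
    return count * 60.0 / toi_minutes

# Hypothetical totals for one line over a few games.
corsi_for, corsi_against, toi = 70, 50, 65.0
print(f"CF/60 = {per_60(corsi_for, toi):.1f}, CA/60 = {per_60(corsi_against, toi):.1f}")
```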
But when we analyze the Expected Goals and actual goal results, something troubling appears. Expected Goals are down at both ends of the ice, and yet actual goals, both for and against, are up.
And now we know why. When we assumed that this year’s NHL shot location data was as reliable as it ever was, the only explanation was that Matthews and his linemates were producing play of such poor quality that it washed out all the benefit of their improved Corsi, but that they were getting lucky enough with actual goals going in that it was hard to see, just from their boxcars, that they had improvements to make to their play.
What’s still true is that actual goals are a really bad way to judge a game, a player, a team, or a season.
Now that we understand the Expected Goals model was applying expectations based on the old shot-location method to the new data, we can see that the Expected Goals at both ends of the ice are understated, and the number of actual goals scored above Expected is correspondingly overstated.
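The arithmetic behind that last point is simple: goals above expected is actual goals minus Expected Goals, so any shortfall in the Expected Goals total shows up one-for-one as extra “overperformance”. The numbers below are hypothetical.

```python
def goals_above_expected(actual_goals: float, expected_goals: float) -> float:
    """Actual goals minus Expected Goals; positive reads as finishing better than expected."""
    return actual_goals - expected_goals

# Hypothetical: the same 6 goals judged against an accurate xG total
# and against one understated because shots were recorded too far out.
actual = 6
xg_if_recorded_correctly = 5.0
xg_as_recorded = 3.5

print(goals_above_expected(actual, xg_if_recorded_correctly))  # 1.0
print(goals_above_expected(actual, xg_as_recorded))            # 2.5
```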
I also noticed this “problem” of the Leafs doing well in Corsi (55%) and more poorly in Expected Goals, and I saw it as a team-wide phenomenon. I even said this:
The Leafs are grading out, as a team, at below league average offensively. And yet, if you shoved that red blob up towards the net ten feet, you’re pretty much good. Okay, maybe 20 feet.
If you mentally adjust for this change in data recording, you can take the HockeyViz heat maps of Maple Leafs shooting in our articles and imagine them with all the shots, both for and against, closer to the net, and therefore of higher quality in general. You get a Maple Leafs team that is a high-octane offence and really rather bad defensively.
Suddenly the Maple Leafs make sense again.
The Eye Test and Mental Adjustments
The purpose of an unreliable narrator in fiction isn’t just to pull the rug out from under you and tumble you into confusion. It’s supposed to make you go back and reexamine everything that’s gone before: what you thought, and why you thought it.
When I wrote that joking statement above about shoving the red blob up a few feet, I was revealing that I was unsettled by what the shot location results were showing for this season. But I wasn’t at the point where I could have found the issue the way EW did. My eye test of the Leafs backed the view that they weren’t getting to the net the way they normally do. I was more surprised by the better defensive results the data showed.
So how do we reconcile this? Perhaps the most likely answer is that the Leafs in general are not shooting as well as in the past, but the data recording issues are exaggerating it. Some early anecdotal work suggests the discrepancy is more like four or five feet, not 10 or more.
Trying to mentally adjust Expected Goals results is a minefield of cognitive bias. When fans take something like Corsi and apply a mental adjustment for something they believe matters, like zone starts, they end up adjusting the data to what they want to be true. That’s human nature. Don’t even try it, is my advice.
It’s entirely possible that in some days or weeks, we’ll have a mathematical fix for this that will align these new results with the old format and all will be as normal. But for now a useful tool is, if not broken, held together with duct tape and hope.
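Nobody knows yet what that fix will look like. Purely as a hypothetical sketch of one standard way to line up two distributions, quantile mapping would take each new-season distance and replace it with the old-season distance at the same percentile. That only makes sense under the assumption that, league-wide, players really are shooting from the same places as before, which is what Evolving Wild’s work points to; any genuine league-wide change would be erased along with the recording error.

```python
import bisect

def build_quantile_map(old_distances, new_distances):
    """Return a function that maps a new-season shot distance onto the old-season scale.

    Hypothetical recalibration: assumes the true league-wide shot distribution
    did not change, so any difference between seasons is recording error.
    """
    if not old_distances or not new_distances:
        raise ValueError("both seasons need at least one shot")
    old_sorted = sorted(old_distances)
    new_sorted = sorted(new_distances)

    def adjust(distance):
        # Number of new-season shots recorded closer than this one.
        rank = bisect.bisect_left(new_sorted, distance)
        # Same relative rank in the old season; integer maths avoids float rounding.
        idx = min(rank * len(old_sorted) // len(new_sorted), len(old_sorted) - 1)
        return old_sorted[idx]

    return adjust

adjust = build_quantile_map(old_distances=[5, 8, 15, 25, 30], new_distances=[10, 12, 14, 18, 35])
print(adjust(14))  # 15: the middle new-season shot maps to the middle old-season shot
```

Because the mapping is built from league-wide data and never changes the order of shots, a team that really is getting to the net less than everyone else would still rank lower after the adjustment; only the common shift gets removed.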
Expected Goals is the base of most of the tools that attempt to isolate individual player performance. For years, the carping hordes of naysayers have claimed that measuring hockey results with shots doesn’t allow for the context that viewers’ impressions of play bring. They might choose to seize on this problem now and claim their big aha moment. They will want to run about adding exactly the right amount of mental adjustment to get the answer they knew all along was true.
The next time someone decides that the blond man, soft as gossamer or angel’s wings, might shoot a lot but does it all from the outside, the counterargument is going to be harder to make. At the same time, the big, tall, square-jawed hero who seems to have taken a big step defensively this season might not actually be as advertised.
I don’t want to return to a world where softness is the contextualizing value applied to Corsi. I don’t want to hear that some viewer’s recent impressions, steeped in confirmation bias, are the right opinion. I don’t want to rely on my own impressions! I want to remember they’re suspect. I want the lesson of this event to be that the tools we use to evaluate data are good ones. So good, we can find our own mistakes or the mistakes of others.
In these desolate times, where truth itself seems under siege, where “I reject your reality and substitute my own” isn’t a joke but a modus operandi of the braying pontificators who pollute society, I’ve held pretty tightly to the good-hearted and honest efforts of sports analysis that seek the most true things there are to be said. This has to be a hard bump in the road for them as they try to resolve how to rebuild their tools, how to tell truer truths again. I have confidence in them that they’ll find a way to handle this.
For now though, maybe a Corsi-focused analysis of the Leafs is in order. Coming soon at PPP.