As hockey fans, we're all pretty familiar with metrics of player performance. Typically, these are presented as raw numbers. For example, Nazem Kadri had 50 points last year. As a purely descriptive statement, there's no issue with saying that. However, you need more information to interpret that effectively. In particular, you need to know how it relates to Kadri's peers, in this case, the rest of the league.
That's the basic idea behind era-adjusted numbers, which are very common in baseball but haven't really taken hold in hockey. As times change, the interpretation of statistics does as well. As we all know, 50 points in the '80s isn't as impressive as 50 points in today's goal-scoring environment. To interpret what 50 points means in today's context, you have to present it alongside what the average hockey player achieves in that context. Now, for points, this may not be that big of an issue. Because points are so ubiquitous, hockey fans have essentially memorized reference points for evaluating a player's scoring ability. We immediately recognize someone who scores 80+ points as a superstar, and someone with fewer than 5 points as Colton Orr. That's not as easy to do for stats that are just starting to become common in mainstream analysis, such as Zone Start %, Points per 60, and so on. For these metrics, comparing a player's numbers to those of his peers becomes more important.
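One simple way to put a raw number in context is to express it as a percentage of the league average, in the spirit of baseball's OPS+ (100 = league average). This is a simplified sketch, not the actual OPS+ or ERA+ formula, which also apply park adjustments; the function name and the league numbers below are made up for illustration.

```python
from statistics import mean


def relative_index(player_value: float, league_values: list[float]) -> float:
    """Express a raw stat as a percentage of the league average (100 = average).

    Simplified '+'-style index for illustration only; unlike baseball's
    ERA+/OPS+, it applies no park or other adjustments.
    """
    league_avg = mean(league_values)
    if league_avg == 0:
        raise ValueError("league average is zero, so the index is undefined")
    return 100 * player_value / league_avg


# Hypothetical point totals for two tiny "leagues" (made-up numbers, not real data).
high_scoring_era = [60, 70, 80, 90, 50]  # average 70
low_scoring_era = [30, 40, 50, 60, 20]   # average 40

# The same 50 points reads very differently depending on the scoring environment.
print(round(relative_index(50, high_scoring_era), 1))  # 71.4
print(relative_index(50, low_scoring_era))             # 125.0
```

For a stat where lower is better, the ratio would have to be flipped (league average over player value), which is how ERA+ is oriented.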
I wanted to create a tool that made it easier to do just that. I also wanted the comparisons to be more visual than simply presenting the metrics scaled to a baseline, the way ERA+ and OPS+ are in baseball. There's nothing wrong with that approach, but I think (for me, anyway) adding a visual element makes the information easier to digest and easier to discuss. That led me to the idea of using radar charts. Ted Knutson of StatsBomb popularized them in soccer, and his work is essentially what gave me the idea. A player's performance in several key attributes is plotted against certain reference points (the 50th percentile, for example), which gives the reader an idea of the raw numbers the player has compiled as well as how they rank across the league.
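The league reference points drawn on each axis (the 25th, 50th and 75th percentiles) can be computed per attribute from the full dataset. A minimal sketch using Python's standard `statistics.quantiles`; the data layout (a dict of attribute name to a list of league values) and the numbers are assumptions for illustration, not the format of the original script.

```python
from statistics import quantiles


def reference_points(league: dict[str, list[float]]) -> dict[str, tuple[float, float, float]]:
    """Return the (25th, 50th, 75th) percentiles of each attribute across the league."""
    refs = {}
    for attribute, values in league.items():
        # n=4 returns the three quartile cut points. method="inclusive" treats the
        # lowest value as the 0th percentile and the highest as the 100th.
        q25, q50, q75 = quantiles(values, n=4, method="inclusive")
        refs[attribute] = (q25, q50, q75)
    return refs


# Hypothetical per-60 rates for five forwards (illustrative numbers only).
league = {
    "G60": [0.0, 0.5, 0.8, 1.1, 2.3],
    "A60": [0.4, 0.9, 1.2, 1.5, 2.0],
}
refs = reference_points(league)
print(refs["G60"])  # (0.5, 0.8, 1.1)
print(refs["A60"])  # (0.9, 1.2, 1.5)
```

Different percentile definitions give slightly different cut points on small samples; with a full league of forwards the choice matters very little.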
The Chart, and The Interpretation
This is the chart I generated as of January 9th for Jakub Voracek. I'm using him as an example because his chart is simple to look at and understand. All data is 5-on-5, from war-on-ice.com. The attributes, starting from 'G60' and going counter-clockwise, are: Goals per 60 minutes, Assists per 60 minutes, Relative Corsi For %, Corsi % of Teammates, Offensive Zone Start %, Personal Shooting %, On-Ice Shooting %, and Penalty Differential (raw). The dataset only includes forwards, so it doesn't accidentally compare Voracek to Roman Polak. The rings represent even intervals between the floor of the lowest value and the ceiling of the highest value for that particular attribute. For example, the lowest value in G/60 is 0, and the highest is 2.3 (Rick Nash). Therefore, the jump from one ring to the next is (ceiling(2.3) - floor(0)) / (number of rings, including the outer boundary) = (3 - 0) / 5 = 0.6. The 'origin' and outer boundary of the chart are unlabeled for readability.
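The ring arithmetic described above can be sketched in Python. The function names are hypothetical, and mapping a value linearly between the origin and the outer boundary is an assumption that follows from the rings being even intervals; only the endpoints of the G/60 list (0 and 2.3) come from the text.

```python
import math

N_RINGS = 5  # the outer boundary counts as a ring, as in the text


def ring_scale(values: list[float], n_rings: int = N_RINGS) -> tuple[int, int, float]:
    """Return (origin, outer boundary, ring step) for one attribute.

    The origin is the floor of the lowest value and the outer boundary is the
    ceiling of the highest value; the rings split that range evenly.
    """
    origin = math.floor(min(values))
    boundary = math.ceil(max(values))
    if boundary == origin:
        raise ValueError("all values are the same integer; the scale has zero width")
    return origin, boundary, (boundary - origin) / n_rings


def radius(value: float, origin: int, boundary: int) -> float:
    """Map a raw value to its distance from the origin, as a fraction of the boundary."""
    return (value - origin) / (boundary - origin)


# G/60 example: lowest value 0, highest 2.3 (the middle values are made up).
g60 = [0.0, 0.5, 0.8, 1.1, 2.3]
origin, boundary, step = ring_scale(g60)
print(origin, boundary, step)  # 0 3 0.6

# Interior ring values (the origin and outer boundary are left unlabeled).
print([round(origin + k * step, 2) for k in range(1, N_RINGS)])  # [0.6, 1.2, 1.8, 2.4]

# The top value sits inside the boundary because 3 is the ceiling of 2.3.
print(round(radius(2.3, origin, boundary), 2))  # 0.77
```

Attributes that can be negative, such as Relative Corsi For % or penalty differential, work the same way, since `math.floor` rounds a negative minimum further down.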
We can immediately see that Voracek is an elite point scorer this year: around the 75th percentile in G/60, and well beyond it in A/60. He also seems to be a solid possession player relative to his teammates, although he benefits from playing with very high-quality teammates (near the 75th percentile) and from very favourable zone starts (above the 75th percentile). Both his personal and on-ice shooting percentages are well above average, so we can expect some regression. Lastly, his penalty differential is right around average.
Obviously, this tool doesn't delve very deep. It can't say how much of Voracek's scoring is boosted by his zone starts or his high shooting percentages. What it can do is provide a high-level overview of how Voracek (or any other player) has been used, how he's produced, and what his relative strengths are. More importantly, it can do that in one concise, understandable image. I think this could be useful for player evaluation (midseason or post-season assessment), as well as for getting a basic idea of a player you aren't too familiar with, for example when quickly evaluating trades or free-agent acquisitions. Ultimately, though, it doesn't matter what I think. It matters what the community thinks of this tool, and whether they feel it has potential. Feel free to chime in with improvements, ideas, and criticisms.
As I've alluded to, I'm aware that the current iteration of the radar chart is by no means a finished product, and there are certainly things I want to improve. A few of them include:
- Adding a table on the side with the actual raw values for each attribute
- Creating a radar chart for defencemen (I don't particularly care what percentage a defenceman shoots, because it's even more random than it is for forwards, and their point rate isn't that important)
- Potentially getting rid of the 25th/75th percentile plots, as they may result in information overload. What do you guys think about that?
- Adding functionality to chart one player against another (e.g. Nazem Kadri vs Tyler Bozak on the same plot). From what I can tell, this wouldn't be that difficult, and it could be very useful.
- As of right now, if someone touches the outer boundary, it means they possess the most extreme (positive) value in the dataset. However, the player with the most extreme (positive) value won't necessarily touch the outer boundary. This is because of the methodology I detailed earlier, which makes the 'origin' the floor of the lowest value and the outer boundary the ceiling of the highest value: Rick Nash's 2.3 G/60, the highest in the dataset, sits inside a boundary of 3. I considered making the outer boundary the 95th percentile instead, so that it would be an easy marker of the 'elite' class for each attribute. Ultimately, I decided against it because you lose information on the chart when comparing players within that top 5%, since anyone beyond the boundary has to be capped at it.
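The trade-off in the last bullet can be seen with a small sketch that compares the two choices of outer boundary. The 21-player attribute is made up, and capping values at the boundary is an assumption about how a 95th-percentile boundary would be drawn; the function names are hypothetical.

```python
import math
from statistics import quantiles


def outer_boundary(values: list[float], mode: str = "ceiling") -> float:
    """Outer ring of the chart: ceiling of the max, or (alternative) the 95th percentile."""
    if mode == "ceiling":
        return math.ceil(max(values))
    if mode == "p95":
        # quantiles(n=100) returns the 1st..99th percentile cut points; index 94 is the 95th.
        return quantiles(values, n=100, method="inclusive")[94]
    raise ValueError(f"unknown mode: {mode!r}")


def plotted_radius(value: float, origin: float, boundary: float) -> float:
    """Fraction of the way from origin to boundary, capped so nothing leaves the chart."""
    r = (value - origin) / (boundary - origin)
    return min(max(r, 0.0), 1.0)


# Hypothetical attribute with 21 players: 0.0, 0.1, ..., 1.8, then 2.0 and 2.3.
values = [k / 10 for k in range(19)] + [2.0, 2.3]
origin = math.floor(min(values))

for mode in ("ceiling", "p95"):
    boundary = outer_boundary(values, mode)
    top_two = [round(plotted_radius(v, origin, boundary), 2) for v in (2.0, 2.3)]
    print(mode, boundary, top_two)
# ceiling 3 [0.67, 0.77]  -> the two leaders stay distinguishable
# p95 2.0 [1.0, 1.0]      -> both are pinned to the outer ring
```

With the ceiling boundary the leader never leaves the chart and stays distinct from the runner-up; with the 95th-percentile boundary the outer ring means "elite", but everyone past it collapses onto the same point.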
My ultimate goal for this project is to make a web tool that lets people generate these charts whenever they want, for whoever they want. At present, the hang-up is getting the data into the format my script requires. Currently, I download a CSV from war-on-ice.com and run a quick macro on it for formatting. From there, the program is pretty much automated, and I just have to specify the player name. I think it's possible to write a batch file to automate that first part, but I'm not very familiar with batch files, so it'll involve some exploration on my part. I'm excited to see what others have to say, positive or negative, so once again, please let me know if you have any comments, suggestions, or criticisms.
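One option for the formatting step is to have the script parse the CSV itself instead of relying on a macro. A minimal sketch with Python's standard `csv` module, assuming the export has a name column, a position column, and one column per attribute; the column names used here (Name, Pos, G60, A60) are hypothetical, and the real war-on-ice.com layout may differ.

```python
import csv
import io

# Hypothetical column names: the real war-on-ice.com export may label these differently.
NAME_COL = "Name"
POS_COL = "Pos"
ATTRIBUTES = ["G60", "A60"]


def load_forwards(csv_text: str) -> dict[str, dict[str, float]]:
    """Parse CSV text into {player name: {attribute: value}}, skipping defencemen."""
    players = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row[POS_COL].strip().upper() == "D":
            continue  # the charts compare forwards only
        players[row[NAME_COL].strip()] = {a: float(row[a]) for a in ATTRIBUTES}
    return players


# Tiny made-up sample in the assumed layout.
sample = """Name,Pos,G60,A60
Player A,C,1.10,2.00
Player B,D,0.30,0.90
Player C,R,0.80,1.20
"""
print(load_forwards(sample))
# {'Player A': {'G60': 1.1, 'A60': 2.0}, 'Player C': {'G60': 0.8, 'A60': 1.2}}
```

For a downloaded file, the text would come from `open(path, newline="")` before being passed in. Percentage columns exported with a "%" sign would need that character stripped before `float()`.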
So I've gotten some really awesome feedback already. One suggestion I've seen a few times is to just plot the percentiles, as opposed to the raw values. I whipped up a quick chart doing that, so let me know if it's an improvement.
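Plotting percentiles instead of raw values means converting each attribute to a percentile rank within the league before drawing the chart. A minimal sketch; "percentage of league values less than or equal to the player's value" is one of several tie-handling conventions, and the function names and numbers are hypothetical.

```python
from bisect import bisect_right


def percentile_rank(value: float, league_values: list[float]) -> float:
    """Percentage of league values that are less than or equal to `value` (0-100)."""
    ordered = sorted(league_values)
    return 100 * bisect_right(ordered, value) / len(ordered)


def to_percentiles(player: dict[str, float], league: dict[str, list[float]]) -> dict[str, float]:
    """Replace each raw attribute value with its percentile rank within the league."""
    return {attr: percentile_rank(v, league[attr]) for attr, v in player.items()}


# Hypothetical league and player (illustrative numbers only).
league = {
    "G60": [0.0, 0.5, 0.8, 1.1, 2.3],
    "A60": [0.4, 0.9, 1.2, 1.5, 2.0],
}
player = {"G60": 1.1, "A60": 2.0}
print(to_percentiles(player, league))  # {'G60': 80.0, 'A60': 100.0}
```

Because every axis then shares the same 0-100 scale, the 25th/50th/75th reference lines become concentric circles and the floor/ceiling ring scaling is no longer needed; the cost is that raw magnitudes disappear from the chart, which is where a side table of raw values would help.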