Previously I noted, as shown in the table below, that the large year-to-year variation in the correlation (R-squared) of Fenwick Close to P% should raise suspicions among users of Fenwick Close about the reliability of that correlation. This variability is shown by the average R-squared of 35.6% ± 15.1% over the previous six seasons. Note that the standard deviation is over 40% of the mean R-squared (15.1%/35.6% ≈ 42%), which suggests at best a wide-bodied bell curve of limited utility and at worst a multi-modal distribution.
The Concern with Wide Body Gaussian Distributed Data
To illustrate a potential problem with Fenwick Close, consider the following example of a multi-modal distribution (the distribution shown is tri-modal) that is approximated with a wide-body Gaussian (normal/bell curve). Applying a linear regression, which assumes the data is normally distributed, ignores the existence of these real modes. That is, the linear regression ignores "reality", which can lead to erroneous or false conclusions because the real effects occurring at these three peaks are overlooked. To see if this applies to Fenwick Close vs P%, we need to test the data for normality.
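The masking effect described above can be sketched numerically. This is only an illustration with made-up numbers: the three component means (-3, 0, 3) and the common spread (0.6) are assumptions chosen to give a clearly tri-modal shape, not values taken from the Fenwick Close data. The single "wide body" Gaussian is given the same mean and variance as the mixture.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical tri-modal mixture: three equally weighted normal components.
MODES = [NormalDist(-3.0, 0.6), NormalDist(0.0, 0.6), NormalDist(3.0, 0.6)]


def mixture_pdf(x):
    """Density of the equal-weight three-component mixture at x."""
    return sum(m.pdf(x) for m in MODES) / len(MODES)


# Moment-matched single Gaussian: same mean and variance as the mixture.
mix_mean = sum(m.mean for m in MODES) / len(MODES)
mix_var = sum(m.variance + (m.mean - mix_mean) ** 2 for m in MODES) / len(MODES)
wide_gaussian = NormalDist(mix_mean, sqrt(mix_var))


def count_peaks(pdf, grid):
    """Count strict local maxima of pdf evaluated on an evenly spaced grid."""
    ys = [pdf(x) for x in grid]
    return sum(1 for i in range(1, len(ys) - 1) if ys[i - 1] < ys[i] > ys[i + 1])


grid = [i / 20 for i in range(-120, 121)]  # -6.0 to 6.0 in steps of 0.05
mixture_peaks = count_peaks(mixture_pdf, grid)
gaussian_peaks = count_peaks(wide_gaussian.pdf, grid)

if __name__ == "__main__":
    print("peaks in mixture:", mixture_peaks)
    print("peaks in wide Gaussian:", gaussian_peaks)
    # Compare the two densities at a peak (x = 3.0) and in a valley (x = 1.5).
    for x in (3.0, 1.5):
        print(x, round(mixture_pdf(x), 4), round(wide_gaussian.pdf(x), 4))
```

In this toy setup the wide Gaussian understates the density at each real peak and overstates it in the valleys between them, which is the sense in which a single bell curve hides the modes.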
To test normality (that is, whether the data follows a typical bell curve for which linear regression is applicable), six seasons of Fenwick Close vs P% data from all 30 NHL teams were tested, and the data failed the Anderson-Darling normality test (46%). Therefore, Fenwick Close vs P% does not conform to a Gaussian or normal distribution, which is confirmed visually below in the plot of predicted P'% - P%. Note that Fenwick Close on its own passes the normality test; however, P% and, more critically, the error term (residuals) fail it, and normality of the residuals is an underlying assumption of linear regression models.
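For readers who want to try this kind of check themselves, here is a minimal sketch of an Anderson-Darling normality test applied to regression residuals, written with only the Python standard library. It is not the tool used for the result above. The function names, the small-sample adjustment, and the 0.752 cut-off (a commonly tabulated 5% critical value when the mean and standard deviation are estimated from the sample) are my assumptions for illustration, and the two test samples are synthetic.

```python
from math import log
from statistics import NormalDist, mean, stdev


def ols_residuals(x, y):
    """Fit y = a + b*x by ordinary least squares and return the residuals."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def anderson_darling(values):
    """Adjusted Anderson-Darling statistic for normality, with the mean
    and standard deviation estimated from the sample itself."""
    n = len(values)
    m, s = mean(values), stdev(values)
    std_normal = NormalDist()
    # Clamp probabilities so log() never sees exactly 0 or 1.
    p = [min(max(std_normal.cdf((v - m) / s), 1e-12), 1 - 1e-12)
         for v in sorted(values)]
    total = sum((2 * i + 1) * (log(p[i]) + log(1 - p[n - 1 - i]))
                for i in range(n))
    a2 = -n - total / n
    return a2 * (1 + 0.75 / n + 2.25 / n ** 2)


CRITICAL_5_PERCENT = 0.752  # assumed cut-off; see the note above


def looks_normal(values):
    """True when the adjusted statistic is below the assumed 5% cut-off."""
    return anderson_darling(values) < CRITICAL_5_PERCENT


# Synthetic samples of 180 points (six seasons x 30 teams, as in the text).
n = 180
symmetric = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]  # ideal bell curve
skewed = [-log(1 - (i + 0.5) / n) for i in range(n)]  # long right tail

if __name__ == "__main__":
    print("symmetric sample passes:", looks_normal(symmetric))
    print("skewed sample passes:", looks_normal(skewed))
```

With real data one would pass the Fenwick Close values and P% values to `ols_residuals` and then run `looks_normal` on the result.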
The distribution shape is better described as an asymmetric multi-modal distribution with small positive skew.
Implications of Non-Normal Distributions
The fact that the distribution is not a well-behaved normal curve has some implications:
- Over the seasons, a higher number of teams have a slightly negative Fenwick (the median is negative)
- Teams with a small positive or small negative shot differential win less often or more often, respectively, than a typical normal distribution would suggest (this happens frequently)
- Teams who outshoot by a large margin lose more than a typical normal distribution would suggest (this happens moderately)
- Teams who are outshot by a large margin win more than a typical normal distribution would suggest (this happens less frequently).
In summary, Fenwick (shot differential) is a less important factor than we might expect from a normal/Gaussian distribution.
What does this mean about Correlation?
But the more important point is that because Fenwick Close vs P% fails the normality test, a linear regression is not appropriate, and the R-squared correlation between Fenwick Close and P% is uncertain and cannot be trusted. The residuals (error term) have excessive skew (too many errors in the same direction), which introduces a systematic error into the linear regression. The likely source of this error is P% itself, which I tested and found not to be normally distributed, and/or perhaps the linear relationship between Fenwick Close and P% is simply not valid.
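To make "excessive skew" concrete, here is a small sketch of the usual moment-based skewness measure. The two residual lists are invented purely for illustration and are not the Fenwick Close residuals. A value near zero means errors are balanced on both sides; a positive value means a long tail of large positive errors.

```python
from statistics import mean, median


def sample_skewness(values):
    """Moment-based skewness: third central moment over the 1.5 power of
    the second central moment (both computed with divisor n)."""
    n = len(values)
    m = mean(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical residuals, invented for illustration only.
balanced = [-2.0, -1.0, 0.0, 1.0, 2.0]
lopsided = [-1.0, -1.0, -1.0, -1.0, 4.0]

if __name__ == "__main__":
    print("balanced skew:", sample_skewness(balanced))
    print("lopsided skew:", sample_skewness(lopsided))
    print("lopsided mean vs median:", mean(lopsided), median(lopsided))
```

The lopsided list also shows how a positively skewed set of values can have a negative median even though its mean is zero: most entries sit slightly below the mean, and one large positive entry balances them.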
How to Correct These Errors (And Future Work)
Correcting the systematic error may be as simple as removing the team points from shootouts, or perhaps ignoring results from "blowout" games, but further work is needed, as the systematic errors make the regression and correlation suspect. Alternatively, if Fenwick Close is telling us something about "winning", it may be missing some important information, such as goaltending SV% (just speculation), that would make the relationship to winning (P%) more statistically robust.
Another possible remedy is to apply a nonlinear transformation to Fenwick Close. That is, the relationship between Fenwick and P% may be logarithmic, quadratic, exponential, or some other non-linear expression. Finally, another source of error could be outliers. If this is the case, then we need to ignore the tails of the distribution, which would mean that teams with extremely high or low Fenwick Close are excluded because they distort the fit and introduce these systematic errors.
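As a rough sketch of how these two remedies could be explored, the snippet below compares R-squared for a few single-term transformations of the predictor and shows one simple way to trim the tails. Everything here is an assumption for illustration: the helper names, the 10% trimming fraction, and the toy data, which is deliberately built so that the response is exactly logarithmic in the predictor. It says nothing about which transformation, if any, suits real Fenwick Close data. The "quadratic" option uses the squared value alone rather than a full two-term polynomial.

```python
from math import log
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation, i.e. R-squared of a one-predictor linear fit."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy ** 2 / (sxx * syy)


def compare_transforms(x, y):
    """R-squared of y against x, log(x) and x squared (x must be positive)."""
    transforms = {"linear": lambda v: v, "log": log, "quadratic": lambda v: v * v}
    return {name: r_squared([f(v) for v in x], y) for name, f in transforms.items()}


def trim_tails(x, y, fraction=0.1):
    """Drop the lowest and highest `fraction` of points, ranked by x."""
    pairs = sorted(zip(x, y))
    k = int(len(pairs) * fraction)
    kept = pairs[k:len(pairs) - k] if k else pairs
    return [p[0] for p in kept], [p[1] for p in kept]


# Toy data, invented for illustration: the response is exactly log(predictor).
toy_fenwick = [float(v) for v in range(40, 61)]
toy_response = [log(v) for v in toy_fenwick]

scores = compare_transforms(toy_fenwick, toy_response)
trimmed_x, trimmed_y = trim_tails(toy_fenwick, toy_response)

if __name__ == "__main__":
    for name, value in scores.items():
        print(name, round(value, 6))
    print("points kept after trimming:", len(trimmed_x))
```

On real data, any gain in R-squared from a transformation or from trimming would still need to be followed by a fresh check of the residuals for normality.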
The linear correlation of Fenwick Close to P% is statistically uncertain, but based on observation one could conjecture that Fenwick Close may still be a useful rule of thumb or approximation in hockey analysis. However, care must be taken, as one cannot rule out the possibility that Fenwick Close vs P% may lead to erroneous conclusions.