[Ed note: Things are slightly busier than usual at what a cynical observer would refer to as “my real job”, but I had half this post written and I wanted to finish and post it while it could still be remotely considered “topical”. Please enjoy this mostly unedited, poor excuse for content and, as always, thx u 5 reading.]
I don’t really “do” hockey analytics. I’m just a guy who knows how to math and read, or at least that’s what I put on my resume. I still try to keep abreast of what’s going on in StatsWorld (AKA the least exciting theme park in existence) because it broadens my horizons in terms of my ability to understand what’s happening on the ice, and gives me another interface with which to connect to general hockey discussions.
That said, I know stats aren’t everyone’s jam. But here’s something to consider: if you’re a person who dismisses analytics as unimportant, first of all, that’s really cool because I had no idea Brian Burke read this blog¹, and secondly, you’re on the wrong side of history. Mathematical analysis started trumping human intuition in a bunch of other fields before it got to sports. No one raves about their stockbroker saying “I love this guy! Doesn’t use any mathematical tools at all, he just goes with his gut!”, and analogous attitudes really shouldn’t persist in sports once Don Cherry goes off the air. If this fact bothers you, you ought to start making your uneasy peace now. Science works, and it is precisely for that reason that it’s not going anywhere.
(If you’re simply the sort of person who dismisses analytics in hockey as uninteresting…I can’t really help with that and you should probably stop reading now. I promise I’ll be back next month with a funny post. Sorry.)
Anyway, in a halfhearted attempt to stay ahead of the curve, I went to the Rochester Institute of Technology Hockey Analytics Conference last weekend to talk hockey with some of my favourite nerds, and also Steve Burtch. Based on what I’ve seen and heard, here is what I think you need to know about where hockey stats are at and where hockey stats are going.
1. There is no one-size-fits-all analytics background
When all you’ve got is a hammer, everything starts to look like a nail. When you’ve got a finance background, as Kyle Stitch does, you can use that to apply beta values to analyze player consistency and contract risk. When you’ve got a business background, as Carolyn Wilke does, you’re particularly good at examining performance expectations with respect to contract size. When you’re a former goalie like Nick Mercadante, you become a lawyer, but then you also talk a lot about goalies. Hockey analytics is made all the richer by the wide variety of backgrounds in the community, and you ignore people who are coming from a “non-traditional” background at your own peril.
2. It is difficult to make predictions, especially about the future
At the team level, it’s been shown pretty consistently that shots (shots, in this case, being used in the Micah Blake McCurdy sense of the word, which is to say a shot on net, a miss, or a blocked shot) are a good indicator of team success in the long term. There will always be exceptions, but in the aggregate, shots are cool and good. However, if one is trying to use analytics to determine an individual’s talent level, things get a little trickier. Ideally, you want a metric that measures talent to be fairly repeatable, because you’re assuming talent is constant. By that standard, goaltending and defense still lack a comprehensive and repeatable metric that can be used with authority. Evaluating junior players across different leagues also remains an underdeveloped area in hockey analytics.
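For anyone who wants to see what “repeatable” looks like in practice, here’s a minimal Python sketch. It assumes you already have per-game shot attempt counts (on net + missed + blocked, the broad definition above) for a handful of teams; the function names are made up for illustration, and the odd/even split-half correlation is just one common way to check repeatability, not the only one. The idea: if a team’s share of shot attempts in its odd-numbered games predicts its share in its even-numbered games, the metric is probably capturing something real rather than noise.

```python
from math import sqrt


def attempts(on_net: int, missed: int, blocked: int) -> int:
    """Shot attempts in the broad sense: on net, missed, or blocked."""
    return on_net + missed + blocked


def cf_pct(cf: int, ca: int) -> float:
    """Percentage of all shot attempts (for + against) that were taken by your side."""
    return 100.0 * cf / (cf + ca)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def split_half_repeatability(games_by_team: list[list[tuple[int, int]]]) -> float:
    """Correlate each team's CF% in odd-numbered games with its CF% in even-numbered games.

    games_by_team[i] is a hypothetical list of (cf, ca) tuples, one per game, for team i.
    A result near 1 suggests the metric is repeatable; near 0 suggests it is mostly noise.
    """
    odd_half, even_half = [], []
    for games in games_by_team:
        odd, even = games[0::2], games[1::2]  # games 1, 3, 5... vs games 2, 4, 6...
        odd_half.append(cf_pct(sum(g[0] for g in odd), sum(g[1] for g in odd)))
        even_half.append(cf_pct(sum(g[0] for g in even), sum(g[1] for g in even)))
    return pearson(odd_half, even_half)
```

With real data you’d want a full season (or more) per team, since a few games per half will make almost any metric look like noise.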
The state of the union is that while goaltending, defense, and drafting are three very important considerations for a hockey team’s long-term success, they are also the areas for which it has proven most difficult to create useful metrics. When The Non-Believers say “Stats can’t tell you about what’s important”, in a way they’re right (for now).
However, if you MUST trot out a stat about any of these things, please bear in mind that goaltending talent is differentiated most by save percentage on “high danger” shots, relative shot percentage (CorsiRel%) is a pretty repeatable metric for defensemen except when it’s not, and your guess is as good as mine when it comes to drafting.
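Since I brought them up, here’s roughly how those two numbers are computed, again as a Python sketch rather than anyone’s official formula. Relative Corsi here is taken to mean the team’s shot attempt share with the player on the ice minus its share with him off the ice. The high-danger save percentage assumes somebody has already sorted shots into “high danger” or not, which is the hard, site-specific part and is not attempted here. The function and argument names are hypothetical.

```python
def cf_pct(cf: int, ca: int) -> float:
    """Percentage of all shot attempts (for + against) that were taken by your side."""
    return 100.0 * cf / (cf + ca)


def corsi_rel(on_cf: int, on_ca: int, off_cf: int, off_ca: int) -> float:
    """Relative Corsi: team CF% with the player on the ice minus team CF% with him off.

    Positive means the team controls a bigger share of attempts when he's out there.
    """
    return cf_pct(on_cf, on_ca) - cf_pct(off_cf, off_ca)


def high_danger_sv_pct(hd_shots_on_goal: int, hd_goals_against: int) -> float:
    """Save percentage restricted to shots on goal already labelled "high danger".

    The labelling itself is assumed to happen upstream; definitions vary by source.
    """
    return (hd_shots_on_goal - hd_goals_against) / hd_shots_on_goal
```

For example, a defenseman whose team takes 55% of attempts with him on the ice and 48% without him comes out at +7 CorsiRel%; whether that +7 shows up again next season is the “except when it’s not” part.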
3. This ain’t a scene, it’s a damn arms race
One of the more interesting trends is manual creation of new data sets. Jen Lute Costella in particular seems to have an ability to marshal a large number of loyal followers that is rivaled only by Voldemort (look, you may not have agreed with everything The Dark Lord did, but you can’t deny he was a fantastic community organizer), and she has used her army of devotees to track everything about every goal for the past seven years. This has resulted in a terrifyingly large data set which, I’m told, has information that goes out to column DO in Excel. Ryan Stimson’s Passing Project is another great example of people going out there and collecting data they wanted but didn’t have.
As the analytics community continues to approach the limits of what can be learned with shot-based metrics, the real cutting edge work is going to be done by people who have an infrastructure in place to create new, accurate data sets. As such, the days of becoming the next big hockey analytics superstar with nothing more than a spreadsheet and a dream are likely over at this point. You’re gonna need to bring some friends.
4. Work being done on the old data is going to inform what to look for in the new data
It seems reasonable to assume that more data is always better, but I don’t think that’s true. Everyone in analytics is trying to find the needle of truth inside the haystack of data, and simply dumping more data onto the haystack does not necessarily mean better inferences can be drawn. Faulty assumptions may mean that you end up going down a rabbit hole for a very long time. You still have to have an idea of where to look for what you’re trying to find. As such, a strong fundamental understanding of what’s important will be paramount in future research.
A good example of this was Micah Blake McCurdy’s excellent presentation on zone starts. Intuitively you’d think that player usage must have an effect on player shot metrics, but Micah has shown that this effect is exceptionally small on average. The idea of a player who gets sheltered or buried is mostly a myth, and an assumption that usage is important could easily have led to bad analysis down the line.
5. This new data is going to be difficult to interpret without knowledge of hockey systems
I would like to use an example from Jen’s spiffy data set to illustrate this point. One of the things Jen and Her Merry Band of Geeks tracked was the zone in which the 1st and 2nd assists originated. One of the things I noticed was that Kyle Turris has five times as many assists that originate in the defensive zone as Bobby Ryan. Now, is this because Kyle Turris plays with better puck moving defensemen, or is it because Bobby Ryan is not very effective off the rush? One couldn’t tell you without watching a lot of video or having a great deal of knowledge about Ottawa’s breakouts and offensive zone systems.
Context is important when using a stat, and it will only become more so in the future.
6. Integration is the watchword at the professional level
One of my least favourite tropes on Twitter is “If Hockey Team X hired an analytics expert, they could spend $100K a year to save $5 million a year.” There seems to be this idea that a smart person with access to war-on-ice.com should be able to pop up at meetings and say “Don’t sign Dan Girardi!” and fix the New York Rangers forever. However, as professionals such as Jack Han and Matt Pfeffer made clear when they spoke at length about their experiences within hockey organizations, organizational buy-in is necessary for an effective contribution. Unless an analytics specialist is involved at multiple levels of the hockey operation, they will simply be the person who delivers graphs to the coach or GM.
BONUS OBSERVATION: 7. The phrase “driving possession” needs to die.
“Drive possession” is just a way of saying “be good at hockey” for people who want to sound smart. It’s a term so catch-all that it’s been rendered largely meaningless. In the future, come specific or don’t come at all.
1. Low key, I think Brian Burke is one of the smartest guys in hockey. For one thing, his ability to troll a community that consists almost entirely of ostensibly intelligent people is unmatched. Dale Tallon could hold a press conference solely to announce “People who are interested in stats do not understand hockey, nor will they ever experience the loving touch of another human.”, and he wouldn’t even trend on Facebook. Only Brian Burke possesses that rare and wondrous ability to be totally confident in his own ignorance in such a way as to infuriate those around him. Also, if you take Burke at his word (which is not something I believe you should necessarily do), he is a guy who willingly eschews information that most people agree would make his job easier while still being quite good at his job. Think about that for a minute: Brian Burke intentionally handicaps himself, and yet is still one of the world’s thirty or twenty or fifteen best hockey general managers. That is amazing to me. Don’t dismiss Brian Burke; he’s got a lot more going for him than he’s given credit for.