Make sure to follow Travis on Twitter!
Famous statistician Bill James -- through his profound work in the field of baseball analytics -- has watched as his findings and developments in sabermetrics have paved the way for the new-age thinking that every baseball organization now relies upon when conducting business.
While James' work has been applied to both individual players and teams as a whole, it's his work in win probability models that has led to the spread of more complex and intricate analytics in other sports across the globe. If you're a fan of baseball or have spent more than a fair share of time in the classroom, there's a better-than-average chance that the formula below looks familiar:
Pythagorean expectation is a formula invented by Bill James to estimate how many games a baseball team "should" have won based on the number of runs they scored and allowed. Comparing a team's actual and Pythagorean winning percentage can be used to evaluate how lucky that team was (by examining the variation between the two winning percentages). The term is derived from the formula's resemblance to the Pythagorean theorem.
Win% = RS^2 / (RS^2 + RA^2) = 1 / (1 + (RA/RS)^2), where RS = runs scored and RA = runs allowed
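For anyone who would rather see it in code, here is a minimal Python sketch of the formula above. The function name and the run totals are made up for illustration; they are not any real team's numbers.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Pythagorean expectation: RS^2 / (RS^2 + RA^2)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Hypothetical season totals, for illustration only.
win_pct = pythagorean_win_pct(runs_scored=800, runs_allowed=700)
expected_wins = win_pct * 162  # assumes a 162-game schedule
print(f"Expected winning percentage: {win_pct:.3f}")
print(f"Expected wins over 162 games: {expected_wins:.1f}")
```

A team whose actual win total lands well above that expected figure was, by this measure, on the lucky side; well below, unlucky.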
His work has led to the creation of additional win probability models -- static linear, static non-linear, Pythagoras, dynamic linear, et al. While each attempts to adjust for variance not seen in standard win // loss columns through its own unique algorithm, the end goal remains the same -- finding the true productivity of a team based on win probabilities, and creating adjustments that can assist in predicting future performance.
Bill James' baseball model can't be carried over as-is, though -- win probability models have to be altered for hockey in order to yield fair outcomes. The differences, as noted in this Hockey Analytics study, are three-fold:
(a) Goal scoring in hockey is based on the Poisson process;
(b) Variance and goal margins pale in comparison to baseball's run margins on a per-game and per-season basis; and
(c) Ties are a realizable outcome.
Of the three, point (c) creates the largest deviation from other sports -- teams are rewarded for extra-time losses with a single point (half the value of a win), and as a direct result, will often play conservative, low-margin games late in order to benefit in the standings.
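To get a feel for how points (a) and (c) interact, here is a rough Python sketch that treats each team's goals as an independent Poisson count and adds up the chance of a win, a tie, or a loss after regulation. The independence assumption and the 2.7 goals-per-game rates are mine, chosen for illustration only; neither comes from the study.

```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k goals when goals arrive at the given average rate."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def regulation_outcome_probs(rate_a, rate_b, max_goals=25):
    """Return (win, tie, loss) probabilities for team A.

    Assumes both teams score as independent Poisson counts and
    truncates each team's total at max_goals.
    """
    win = tie = loss = 0.0
    for a in range(max_goals + 1):
        p_a = poisson_pmf(a, rate_a)
        for b in range(max_goals + 1):
            p = p_a * poisson_pmf(b, rate_b)
            if a > b:
                win += p
            elif a == b:
                tie += p
            else:
                loss += p
    return win, tie, loss


# Hypothetical scoring rates (goals per game), for illustration only.
win, tie, loss = regulation_outcome_probs(2.7, 2.7)
print(f"win {win:.3f}, tie {tie:.3f}, loss {loss:.3f}")
```

With scoring rates this low, a meaningful share of games ends level after sixty minutes, which is why a hockey model can't ignore ties the way a baseball model can.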
The idea of the study is to treat scoring -- as opposed to raw win // loss totals -- as the focal point for production. Teams that create the widest margins -- on average -- will produce better results than teams that create the slightest margins, or worse, run negative ones. Teams use the models to approach off-season trades and free agency in both respects: Those who underachieved may not gamble as much, and conversely, those who overachieved may look for the big splash or signing, knowing that point-regression -- based on potentially fluky results from the year prior -- remains likely.
After a fair share of research on the topic, my favorite win probability model is a branch-off of Bill James' Pythagorean expectation called PythagenPuck. The difference between the hockey model of win expectation and the baseball model is that it accounts for the lower-margin games in hockey relative to baseball, and also accounts for tie-potential -- I'll touch on that in a bit.
Below, the equation:
Pr(Win) = GF^e / (GF^e + GA^e)
e = (GFg + GAg)^p
Pr = Probability
GF = Goals for
GA = Goals against
GFg = Goals for per game
GAg = Goals against per game
p = A fitted constant
The most obvious question readers will have about the numbers is how the above equation deals with ties. The answer is simple. In the above, ties are treated as half-wins, as two ties (or OTLs) theoretically equate to one victory in the standings. The exponent part of the equation accounts for the common occurrence where two teams play to an equal score or non-score (that is, zero to zero), and subsequently, three points are awarded instead of two -- boosting win probabilities for both sides.
Per Alan Ryder, the optimal value for p is .458, producing an R-squared value similar to that witnessed in Pythagoras. I'll defer to authority here.
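Put together, the calculation can be sketched in a few lines of Python. The function name and the goal totals below are hypothetical, and multiplying the win probability by an 82-game schedule is just one simple way to turn it into a win total, not necessarily how every figure in this post was produced.

```python
def pythagenpuck_win_prob(goals_for, goals_against, games_played, p=0.458):
    """PythagenPuck: GF^e / (GF^e + GA^e), with e = (GFg + GAg)^p."""
    gf_per_game = goals_for / games_played
    ga_per_game = goals_against / games_played
    # The exponent shrinks as total scoring per game drops.
    e = (gf_per_game + ga_per_game) ** p
    gf_term = goals_for ** e
    ga_term = goals_against ** e
    return gf_term / (gf_term + ga_term)


# Hypothetical season totals, for illustration only.
games = 82
wp = pythagenpuck_win_prob(goals_for=250, goals_against=220, games_played=games)
print(f"Win probability: {wp:.2f}")
print(f"Expected wins over {games} games: {wp * games:.0f}")
```

Because the exponent is tied to goals per game, a low-scoring league yields a smaller exponent than baseball's 2, so the same goal ratio translates into a win probability closer to .500.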
So -- how does this all translate to last year's standings? Let's take a look at the Eastern Conference on a division-by-division basis, and see how the adjusted numbers vary in comparison to the real ones.
REAL -- ATL
1.) New York Rangers -- 51W
2.) Pittsburgh Penguins -- 51W
3.) Philadelphia Flyers -- 47W
4.) New Jersey Devils -- 48W
5.) New York Islanders -- 34W
ADJUSTED -- ATL
1.) Pittsburgh Penguins -- 0.62WP, 51W (+0)
2.) New York Rangers -- 0.60WP, 49W (-2)
3.) Philadelphia Flyers -- 0.58WP, 48W (+1)
4.) New Jersey Devils -- 0.52WP, 43W (-5)
5.) New York Islanders -- 0.37WP, 30W (-4)
REAL -- NE
1.) Boston Bruins -- 49W
2.) Ottawa Senators -- 41W
3.) Buffalo Sabres -- 39W
4.) Toronto Maple Leafs -- 35W
5.) Montreal Canadiens -- 31W
ADJUSTED -- NE
1.) Boston Bruins -- 0.64WP, 53W (+4)
2.) Ottawa Senators -- 0.52WP, 43W (+2)
3.) Montreal Canadiens -- 0.47WP, 40W (+9)
4.) Buffalo Sabres -- 0.47WP, 39W (+0)
5.) Toronto Maple Leafs -- 0.42WP, 35W (+0)
REAL -- SE
1.) Florida Panthers -- 38W
2.) Washington Capitals -- 42W
3.) Tampa Bay Lightning -- 38W
4.) Winnipeg Jets -- 37W
5.) Carolina Hurricanes -- 33W
ADJUSTED -- SE
1.) Washington Capitals -- 0.48WP, 39W (-3)
2.) Florida Panthers -- 0.45WP, 37W (-1)
3.) Winnipeg Jets -- 0.45WP, 37W (+0)
4.) Carolina Hurricanes -- 0.43WP, 36W (+3)
5.) Tampa Bay Lightning -- 0.39WP, 35W (-3)
WP -- Win Probability
If there are two numbers that jump off the page a bit, they're the New Jersey Devils (-5) and Montreal Canadiens (+9). The splits shouldn't surprise -- New Jersey won a ton of games in overtime last season, and conversely, Montreal lost an absurd number (16 OTL!). New Jersey was still a playoff team with their performance, but they did benefit from the +1 point far more than the average, and their sixty-minute optics weren't as pleasant as those of most 100+ point teams. And, with so many tight-margin games and only a -0.09 GFg/GAg split, Montreal should've been a lot better last year -- incredible, considering the outrage surrounding the club all year long.
What I find interesting is that Florida -- a team that many projected to (potentially) slide last year -- held up, only sliding down one win. The Panthers' -24 goal differential was gruesome for a division winner, but considering the comparables in the Southeast, it was pretty much on par with expected return.
Perhaps a blog for another day will employ PythagenPuck's win probability in an algorithm that measures player win shares to forecast the success // struggles of each team next season. With so much roster turnover in the National Hockey League and an exhaustive amount of volatility to begin with, though, such a complex study may not be able to yield accurate results. Defining unknowns is a relative impossibility in the world of mathematics, and that especially holds true in an NHL system where certain players thrive in certain environments, and the alternative -- certain players struggling in certain environments -- equally holds true.
Back with more tomorrow.
Quick side note: Both Eric Gryba and Jim O'Brien have inked new deals. Jim O'Brien's is a one-way -- pretty interesting development there.
Thanks for reading!