Playoff Prediction Model

In a previous post I discussed how I gathered up data from 75 playoff series from the five previous playoff years. In that post I provided some basic descriptive statistics of which factors were the best predictors.

I've taken the next step and put all of those factors into a multivariate regression model in order to assess the relative value of each of them. For the non-statistical readers, regression is a great tool because it lets us see which factors matter most, relative to the others, in explaining something (in this case, which teams won the playoff series).

My overall model has an R-squared of .31, and three variables are significant: Shot %, Power Play Opportunities, and Goals Against Average. In plain English, the model indicates that a) a lot of random, crazy, unpredictable stuff happens in the playoffs, but b) there is a predictable element as well. Those three things (Shot %, GAA, and PP chances) are the core of that predictable element.

After running my model on recent playoff history, I used the coefficients to try to predict each playoff series. Keep in mind that the model can only tell us about the predictable part of the playoffs; there will always be a significant random/chance/luck element. To the extent that regular season numbers can guide us, the model makes the following predictions.

Who Will Win Each Series?
Eastern Conference
ATL 95% NYR 5%
BUF 100% NYI 0%
NJD 51% TBL 49%
OTT 46% PIT 53%

Western Conference
DET 41% CGY 58%
ANA 100% MIN 0%
VAN 0% DAL 100%
NAS 31% SJS 69%

Quarter Finals
BUF 70% PIT 30%
NJD 37% ATL 63%
ANA 42% CGY 58%
SJS 73% DAL 27%

Conference Finals
BUF 91% ATL 9%
SJS 65% CGY 34%

Stanley Cup Finals
BUF 48% SJS 52%


Unfortunately, the model is not much help in predicting how long each series will go. When I ran a regression on the number of games in a series, nothing came up significant and the model explained zero variance.

Edit: How well did it work in the past?
OK, I went back and ran the model to see how well it did in the past 5 playoff seasons. Of course, since I used these years to create the model, it had better do something, even with the small R-squared.

What is a good point of comparison? Well, if we picked playoff series by a coin toss, we would expect to be right just 50% of the time. If we went with the home team every time, we would get 68% of the series right.

Total Playoff Series Correct Predictions
2001 11/15 series 73% Using Home Ice 10/15 66% Difference +7%
2002 13/15 series 87% Using Home Ice 13/15 87% Difference 0%
2003 09/15 series 60% Using Home Ice 09/15 60% Difference 0%
2004 11/15 series 73% Using Home Ice 11/15 73% Difference 0%
2006 11/15 series 73% Using Home Ice 08/15 53% Difference +20%
Total 55/75 series 73% Using Home Ice 51/75 68% Difference +5%
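
As a quick check on the arithmetic, the per-year rows can be summed directly. The short sketch below only re-adds the counts listed in the table; the variable and function names are mine, chosen for illustration. The per-year home-ice counts add up to 51 of 75, which agrees with the 68% home-ice figure quoted earlier.

```python
# Per-year counts of correctly predicted series, copied from the table rows.
# Each playoff year has 15 series (8 + 4 + 2 + 1).
SERIES_PER_YEAR = 15

model_correct = {2001: 11, 2002: 13, 2003: 9, 2004: 11, 2006: 11}
home_ice_correct = {2001: 10, 2002: 13, 2003: 9, 2004: 11, 2006: 8}


def summarize(correct_by_year):
    """Return (series picked correctly, total series, share correct)."""
    total_correct = sum(correct_by_year.values())
    total_series = SERIES_PER_YEAR * len(correct_by_year)
    return total_correct, total_series, total_correct / total_series


model_total = summarize(model_correct)
home_total = summarize(home_ice_correct)

print("Model:    {}/{} = {:.0%}".format(*model_total))
print("Home ice: {}/{} = {:.0%}".format(*home_total))
print("Difference: {:+.0%}".format(model_total[2] - home_total[2]))
```

Over all five years the model's edge on home ice is modest, and most of it comes from 2006.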

So the model did about the same as simply using home ice until last year's playoffs, where it performed much better. Why? Perhaps because the playoffs were called much more like the regular season, which made regular season statistics more useful. I noticed that playoff scoring declined from regular season levels by roughly 15% in previous years, but last year playoff scoring barely dropped at all (about 3%).

Additional Edit:
The Dependent Variable is won/lost playoff series coded 1/0.
Independent Variables are:
Which team has home ice?
Which team has better offense?
Which team has better defense?
Which team has better PP%?
Which team has better PK%?
Which team has better SV%?
Which team has better Shot %?
Which team has more PP Opportunities?
Which team has fewer Times Shorthanded?
Which team has better goal differential?
Which team has better special teams goal differential?
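
For readers who want to see the mechanics, here is a minimal sketch of this kind of setup, not my actual model or data. It assumes each predictor is coded 1/0 (does the team have the edge or not), fits an ordinary least squares regression on the 1/0 won/lost outcome, and clips the fitted value to the 0-1 range to read it as a win probability. The data is randomly generated, and the column names, the made-up effect sizes, and the `predict_win_probability` helper are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: NOT the real 75-series data set.
n_series = 75
predictors = ["home_ice", "better_shot_pct", "more_pp_opps", "better_defense"]

# Assumption: each predictor is a 1/0 indicator for "this team has the edge".
X = rng.integers(0, 2, size=(n_series, len(predictors))).astype(float)

# Made-up outcome: partly driven by the predictors, mostly noise.
latent = (0.2 + 0.15 * X[:, 1] + 0.15 * X[:, 2] + 0.2 * X[:, 3]
          + rng.normal(0.0, 0.4, n_series))
y = (latent > 0.5).astype(float)  # 1 = won the series, 0 = lost

# Ordinary least squares with an intercept column.
A = np.column_stack([np.ones(n_series), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# R-squared: share of the variance in the outcome explained by the fit.
fitted = A @ coef
ss_res = np.sum((y - fitted) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot


def predict_win_probability(features):
    """Apply the fitted coefficients to one series, clipped to [0, 1]."""
    raw = coef[0] + np.dot(coef[1:], np.asarray(features, dtype=float))
    return float(np.clip(raw, 0.0, 1.0))


print("R-squared:", round(r_squared, 2))
for name, value in zip(["intercept"] + predictors, coef):
    print(name, round(float(value), 3))
print("Team with every edge:", predict_win_probability([1, 1, 1, 1]))
```

A linear fit on a 1/0 outcome can produce raw values below 0 or above 1, which is why the sketch clips them. A logistic regression would avoid that, at the cost of losing the plain R-squared.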