The Methodology Behind Our NFL Playoff Models

This article provides background on the components of each NFL Playoff Model. Using multiple models built from different modeling techniques and variables lets us assess each game more thoroughly. For example, if every model predicts that the Pittsburgh Steelers will beat the Miami Dolphins, we can feel very confident in picking the Steelers to win. If the models are split, we can examine each model's predictors to identify where it may be relying on less-than-perfect information. For instance, if one model emphasizes defensive performance and Seattle is currently missing Earl Thomas, that model may not represent Seattle as accurately as the others. Ultimately, looking at multiple models helps us reduce the effect of the noise inherent in any single predictive model.
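As a rough illustration of the agree-or-split check described above, the sketch below tallies each model's pick for one game and lists the models that disagree with the majority. The function name, the dictionary layout, and the example picks are hypothetical and are not taken from our actual pipeline.

```python
from collections import Counter


def summarize_picks(picks):
    """Summarize agreement across models for a single game.

    `picks` maps a model name to the team that model predicts will win.
    On a tie, the team that was seen first is reported as the consensus.
    """
    if not picks:
        raise ValueError("picks must contain at least one model")
    counts = Counter(picks.values())
    consensus, votes = counts.most_common(1)[0]
    # Models that disagree with the majority are the ones worth inspecting.
    dissenters = sorted(m for m, team in picks.items() if team != consensus)
    return {
        "consensus": consensus,
        "agreement": votes / len(picks),
        "unanimous": votes == len(picks),
        "dissenters": dissenters,
    }


# Made-up picks for illustration only.
example = {
    "Model 1": "Steelers",
    "Model 2": "Steelers",
    "Model 3": "Dolphins",
    "Model 4": "Steelers",
    "Model 5": "Steelers",
    "Model 6": "Dolphins",
}
summary = summarize_picks(example)
print(summary["consensus"], summary["dissenters"])
```

When the summary is not unanimous, the dissenting models are the natural place to start checking whether a key predictor (such as an injured player) is misleading the model.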

Model 1 has the fewest variables of all the models, yet it has been the most accurate as measured by the percentage of out-of-sample games predicted correctly (71.4%). This model accounts for basic statistics (e.g., Total Yards, Points, and 3rd-down conversion rate) as well as playoff experience and performance against other playoff teams.
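For readers unfamiliar with the metric, the percentage of out-of-sample games predicted correctly is simply the share of held-out games (games the model was not fit on) where the predicted winner matches the actual winner. A minimal sketch follows; the function name and the placeholder team labels are hypothetical, and the example numbers are not our results.

```python
def pct_correct(predicted_winners, actual_winners):
    """Return the percentage of games where the predicted winner was right.

    Both lists must cover the same held-out games, in the same order.
    """
    if len(predicted_winners) != len(actual_winners):
        raise ValueError("both lists must cover the same games")
    if not actual_winners:
        raise ValueError("need at least one game to score")
    hits = sum(p == a for p, a in zip(predicted_winners, actual_winners))
    return 100.0 * hits / len(actual_winners)


# Placeholder labels, not real games: three of four picks are correct.
predicted = ["Team A", "Team B", "Team C", "Team D"]
actual = ["Team A", "Team B", "Team X", "Team D"]
print(pct_correct(predicted, actual))
```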

Model 2 contains the same variables as Model 1 but uses a different prediction technique. Although this model was less accurate on out-of-sample games (68.4%), it correctly identified more Super Bowl champions than Model 1.

Model 3 is our base model, meaning it uses only basic team statistics (e.g., Total Yards, Points, and 3rd-down conversion rate). Simple models often serve as a good baseline for predictive accuracy, and they are easier to interpret.

Model 4 uses an ensemble approach by averaging predictions from multiple modeling techniques. One unique component of this model is that it heavily weights elite individual player performance at key offensive and defensive positions. This model has predicted the most Super Bowl champions correctly.
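To make the averaging idea concrete, here is a minimal sketch of an ensemble that takes the unweighted mean of win probabilities produced by several techniques. This is an assumption for illustration: the text above does not specify which techniques Model 4 combines or whether they are weighted equally, and the function names and probabilities below are hypothetical.

```python
from statistics import mean


def ensemble_probability(probabilities):
    """Average win probabilities from several techniques (equal weights)."""
    if not probabilities:
        raise ValueError("need at least one probability")
    if any(not 0.0 <= p <= 1.0 for p in probabilities):
        raise ValueError("probabilities must lie in [0, 1]")
    return mean(probabilities)


def ensemble_pick(home, away, home_win_probs):
    """Pick the home team when the averaged home win probability is >= 0.5."""
    p_home = ensemble_probability(home_win_probs)
    winner = home if p_home >= 0.5 else away
    return winner, p_home


# Made-up outputs from three hypothetical techniques for one game.
winner, p_home = ensemble_pick("Home Team", "Away Team", [0.62, 0.55, 0.48])
print(winner, round(p_home, 2))
```

Averaging tends to smooth out the quirks of any one technique: in this made-up example, one technique favors the away team, but the averaged probability still favors the home team.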

Model 5 weights recent performance more heavily than the other models do. Significant predictors in this model include performance over the second half of the season, strength of schedule, and margin of victory.
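One common way to weight recent performance more heavily is an exponentially decaying weighted average, sketched below for a team's game-by-game point margins. This is only an illustration of the general idea: the text above does not say which weighting scheme Model 5 uses, and the function name, the `decay` parameter, and the margins are hypothetical.

```python
def recency_weighted_mean(values, decay=0.9):
    """Weighted average that favors the most recent entries.

    `values` is ordered oldest to newest. The newest value gets weight 1,
    the one before it gets `decay`, then `decay ** 2`, and so on.
    With decay=1.0 this reduces to the plain average.
    """
    if not values:
        raise ValueError("need at least one value")
    if not 0.0 < decay <= 1.0:
        raise ValueError("decay must be in (0, 1]")
    n = len(values)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Made-up point margins: two even games early, two 10-point wins late.
margins = [0, 0, 10, 10]
print(recency_weighted_mean(margins, decay=1.0))  # plain average
print(recency_weighted_mean(margins, decay=0.5))  # recent games count more
```

With these made-up margins, the plain average is 5 points, while the recency-weighted version rises to 8 points because the late-season wins dominate.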

Model 6 is built on a different data set than Models 1-5 and thus provides a distinct prediction relative to the other models. This model accounts for the amount of rest each team has had coming into the game (e.g., whether a team is coming off a bye week) and is influenced by offensive efficiency, defensive rushing efficiency, points against, passing yards, defensive third-down percentage, number of drives per game, red zone success (offense and defense), and more.