My Model Monday: Modeling NFL Injuries

Injuries are inevitable in a game as physical as NFL football. Every season, numerous star players and important contributors are sidelined, leaving their fans and fantasy owners disappointed. Injuries appear to strike at random; an elite athlete can blow out his knee on a single cut, as Dalvin Cook did last season. Still, is there a way to identify whether certain players are more injury prone than others? I dug into the data to find out how well we can predict injuries among skill position players in the NFL.

Methodology

First, I pulled together a data set of season-level data for all running backs, wide receivers, and quarterbacks since 2009 from Pro-Football Reference. I then gathered data from weekly injury reports for all weeks since 2009. I had to build an estimate of games missed for each player based on whether they were listed as questionable, doubtful, or out. For “questionable” I assigned a value of 0.5 games missed (assuming questionable players play 50% of the time), “doubtful” was 0.75, and “out” and “IR” were 1.0. I then summed these weekly values to get season totals for each player. Players who were placed on IR before the season started were excluded from the data. Finally, I joined the player data for a given year with each player’s injury data for the following season.
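The weighting and aggregation steps described above can be sketched roughly as follows. This is a minimal illustration, not the code behind this post: the tuple layout, the function names, and the choice to treat a player with no injury-report entries as missing zero games are all assumptions made for the example. Only the status weights come from the text.

```python
from collections import defaultdict

# Weights from the text: estimated games missed per weekly designation.
STATUS_WEIGHTS = {"questionable": 0.5, "doubtful": 0.75, "out": 1.0, "ir": 1.0}


def season_games_missed(reports):
    """Sum estimated games missed per (player, season).

    `reports` is an iterable of (player, season, week, status) tuples.
    This layout is hypothetical, not the actual injury-report schema.
    """
    totals = defaultdict(float)
    for player, season, _week, status in reports:
        # Unknown designations contribute nothing (an assumption).
        totals[(player, season)] += STATUS_WEIGHTS.get(status.lower(), 0.0)
    return dict(totals)


def join_with_next_season(player_stats, missed):
    """Pair each player's stats for year Y with games missed in year Y + 1.

    `player_stats` maps (player, season) -> dict of features (hypothetical).
    A player with no entry in `missed` for the next season is assumed to
    have missed zero games.
    """
    rows = []
    for (player, season), stats in player_stats.items():
        target = missed.get((player, season + 1), 0.0)
        rows.append({"player": player, "season": season, **stats,
                     "next_season_games_missed": target})
    return rows
```

For instance, a player listed as questionable, out, and doubtful in three different weeks of one season would total 0.5 + 1.0 + 0.75 = 2.25 estimated games missed.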

Using statistics from the prior season along with player characteristics like height, weight, and age, I fit two models for each position group to predict player injury risk. First, I used a logistic model to predict the probability of a player missing four or more games. This is a simple way to gauge a player’s overall injury risk. Modeling the actual number of missed games was more challenging because many players miss zero games, while some miss two to four and others miss ten or more. Eventually, I landed on a regression model that uses a zero-inflated Poisson distribution. This technique accounts for the surplus of players who miss zero games in a season.
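To make the zero-inflated Poisson idea concrete, here is a small sketch of the distribution itself. It is not the fitted model: the parameter names `pi_zero` and `lam` and the example values below are illustrative assumptions, and the sketch treats games missed as whole-number counts (how the fractional season totals were handled before fitting is not covered here). The distribution mixes a "structural zero" with probability `pi_zero` and an ordinary Poisson count otherwise.

```python
import math


def zip_pmf(k, pi_zero, lam):
    """P(Y = k) under a zero-inflated Poisson distribution.

    pi_zero: probability of a structural zero (a player who simply
             does not miss games), between 0 and 1.
    lam:     mean of the Poisson component, greater than 0.
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # Zeros come from both the structural part and the Poisson part.
        return pi_zero + (1.0 - pi_zero) * poisson
    return (1.0 - pi_zero) * poisson


def zip_mean(pi_zero, lam):
    """Expected count under the zero-inflated Poisson."""
    return (1.0 - pi_zero) * lam


def prob_at_least(n, pi_zero, lam):
    """P(Y >= n), e.g. the chance of missing four or more games."""
    return 1.0 - sum(zip_pmf(k, pi_zero, lam) for k in range(n))
```

With made-up values such as `pi_zero = 0.4` and `lam = 3.0`, `zip_pmf(0, 0.4, 3.0)` is much larger than a plain Poisson with the same mean would give, which is exactly the surplus of zero-game seasons the model is meant to capture. In a regression setting, both `pi_zero` and `lam` would depend on player features rather than being fixed numbers.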

Quarterbacks

  • Some of the strongest predictors for injuries among quarterbacks are height (taller = more injury prone), weight (heavier = more injury prone), age (older = more injury prone), and fumbles from the year prior.
  • Interestingly, the number of games missed to injury in the year prior does not seem to have much of an effect on injury risk in the following season. This is one reason why our model does not foresee Deshaun Watson as having a very high risk of injury.
  • The main driver of Jameis Winston’s modeled results was his very high total of 15 fumbles in 2017.

Wide Receivers

  • The strongest predictors of injuries for wide receivers are targets from the prior season (higher = more injury prone), height (taller = more injury prone), weight (lighter = more injury prone), yards per game from the prior season (lower = more injury prone), fumbles (higher = more injury prone), and games missed in the prior season (higher = more injury prone).
  • Many of the players at the top of the list have thin frames (Will Fuller) or missed games in the prior year (Odell Beckham).
  • Players of note with low injury risk: JuJu Smith-Schuster, Doug Baldwin, Julio Jones, Tyreek Hill.

Running Backs

  • The strongest predictors of injuries among running backs are weight (lighter = more injury prone), age (older = more injury prone), total yards from scrimmage in the year prior (higher = more injury prone), fumbles in the year prior (higher = more injury prone), and total and average games missed over a career (higher = more injury prone).
  • As one would expect, players that get a high volume of carries and receptions are also more injury prone.
  • LeSean McCoy tops the list as the most injury prone due to his small frame (210 lbs), his high volume of total yards from scrimmage in the year prior (1,586), and his 12.75 career games missed (since 2009).
  • Todd Gurley and Melvin Gordon are two high-volume backs that our models see as having relatively low injury risk in relation to other players who posted comparable statistics in 2017.

Model Accuracy

Modeling an outcome as unpredictable and volatile as NFL injuries is very difficult. While the models might be able to identify which types of players tend to get injured more frequently, predicting the actual number of games that a player will miss in a given season is less reliable. Below are plots depicting the predicted number of games missed vs. the actual games missed for quarterbacks, running backs, and wide receivers since 2013. The quarterback position has the strongest correlation between predicted and actual games missed (R^2 = 0.023). However, this is still a rather weak relationship. Wide receivers have been the most difficult to model, which makes sense, as they touch the ball less frequently and thus there is more randomness in the injuries they experience.
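For readers who want to run the same kind of check on their own predictions, here is one way an R^2 between predicted and actual games missed could be computed. It assumes R^2 means the squared Pearson correlation between the two series, which is an assumption for this sketch; the function name and inputs are hypothetical.

```python
def r_squared(predicted, actual):
    """Squared Pearson correlation between predicted and actual values."""
    n = len(predicted)
    if n != len(actual) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    # Covariance and variances (unnormalized; the 1/n factors cancel).
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    if var_p == 0 or var_a == 0:
        raise ValueError("R^2 is undefined when either series is constant")
    return cov * cov / (var_p * var_a)
```

A value near 1 means predictions track actual games missed closely, while a value near 0 means they carry little information about any individual player's season.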

Conclusion

Modeling injuries in the NFL is tough and our modeled results for the 2018 season should be viewed with some skepticism. Perhaps it is most reasonable to use them to look at a player relative to other players at the same position, rather than locking in on a single player’s predicted values. That being said, there are some interesting takeaways that this analysis has brought to light:

  1. Height and weight have been some of the stronger predictors in our models across all position groups. I expected this, but was surprised that the relationships sometimes went in different directions. For example, heavier players were less injury prone among running backs and wide receivers but more injury prone among quarterbacks.
  2. Age is a big factor. Age was a strong predictor of injuries for running backs and quarterbacks. Older players tend to be more likely to get injured than younger players (or perhaps they do not recover as quickly).
  3. Injury history was clearly a big factor for running backs and receivers, but it did not seem to stand out for quarterbacks.
  4. Fumbles in the year prior were a significant indicator for quarterbacks and wide receivers. The more I think about it, the more this fits intuition. Fumbles are often caused by big hits that jar the ball loose. Players who expose themselves to these hits or do not absorb contact well will likely fumble more often and get injured more often. And for quarterbacks, fumbles and sacks (which are also typically big hits) often come on the same play. Quarterbacks who get the ball out quickly also get sacked (and injured) less often.