NBA Modeling Methodology

Our data covers every regular season game since the 1989-1990 season, giving us 27 seasons in all. It was fun to test our models’ performance across the Jordan-dominated 90’s, the Shaqobe 00’s, and the modern pace-and-space era. There is no doubt that the style of play in the league has changed drastically over this time. This poses a challenge for modeling the most recent seasons, as most of our training data does not reflect the boom in outside shooting that we are seeing in the game today.

We used several modeling techniques, including logistic regression, random forest, and penalized regression, to build our NBA Playoff models. We evaluated models primarily by comparing their accuracy on out-of-sample predictions. That is, we trained a model on the data with one particular playoff matchup removed, then used that model to predict the held-out matchup and checked whether the prediction was correct. Repeating this for every matchup in our data means each model is only ever tested on a series it did not see during training. Variable selection was done through stepwise regression, our own basketball intuition, and, of course, trial and error. Finding the right combination of variables was challenging because so many basketball variables are strongly correlated with one another. For example, effective field goal percentage, true shooting percentage, and field goal percentage are extremely similar calculations.
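The hold-one-matchup-out procedure described above can be sketched roughly as follows. This is only an illustration, not our actual pipeline: the function names, the toy data, and the single-feature threshold "model" are all hypothetical stand-ins for the real models and variables.

```python
from typing import Any, Callable, List, Sequence, Tuple

# One playoff matchup: (feature values, outcome), where outcome is 1 if the
# first-listed team won the series and 0 otherwise. (Hypothetical layout.)
Matchup = Tuple[Sequence[float], int]


def leave_one_matchup_out_accuracy(
    matchups: List[Matchup],
    fit: Callable[[List[Matchup]], Any],
    predict: Callable[[Any, Sequence[float]], int],
) -> float:
    """Hold out each matchup in turn, train on the rest, and score the guess."""
    correct = 0
    for i, (features, outcome) in enumerate(matchups):
        training = matchups[:i] + matchups[i + 1:]  # everything except matchup i
        model = fit(training)
        if predict(model, features) == outcome:
            correct += 1
    return correct / len(matchups)


# Stand-in model (an assumption for illustration): pick the first team when a
# single feature exceeds the midpoint between the two class means.
# It assumes the training data contains at least one win and one loss.
def fit_threshold(training: List[Matchup]) -> float:
    wins = [x[0] for x, y in training if y == 1]
    losses = [x[0] for x, y in training if y == 0]
    return (sum(wins) / len(wins) + sum(losses) / len(losses)) / 2


def predict_threshold(threshold: float, features: Sequence[float]) -> int:
    return 1 if features[0] > threshold else 0


# Made-up data: difference in regular season win percentage, series outcome.
toy_matchups: List[Matchup] = [
    ([0.20], 1), ([0.15], 1), ([0.10], 1), ([0.08], 1),
    ([-0.05], 0), ([-0.10], 0), ([-0.02], 0), ([0.00], 0),
]

accuracy = leave_one_matchup_out_accuracy(toy_matchups, fit_threshold, predict_threshold)
print(f"Out-of-sample accuracy: {accuracy:.0%}")
```

Because `fit` and `predict` are passed in as arguments, the same loop could in principle wrap a logistic regression, random forest, or penalized regression without changing the evaluation logic.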

Model Accuracy

For now, we focus on three distinct models, and our predicted brackets are generated by taking the average of their outputs for each series. The team with the higher average win probability across the three models advances to the next round, and the process is repeated until we have a champion. Using this method, we have been able to correctly predict 15 of the 27 champions (56%) going back to 1990, and 25 of the 54 conference champions (46%). Overall, our model has correctly predicted the winner in 78% of playoff matchups since 1990, based on out-of-sample testing.
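To make the bracket-building step concrete, here is a minimal sketch of averaging several models' win probabilities and advancing the favorite round by round. Everything in it is an assumption for illustration: the team names, the ratings, the rating-based toy models, and the rule that a 50/50 tie goes to the first-listed team are not taken from our real models.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

# A model maps (team_a, team_b) to the probability that team_a wins the series.
WinProbModel = Callable[[str, str], float]


def average_win_probability(models: Sequence[WinProbModel], team_a: str, team_b: str) -> float:
    """Average P(team_a beats team_b) across all models."""
    return mean(model(team_a, team_b) for model in models)


def simulate_bracket(teams: Sequence[str], models: Sequence[WinProbModel]) -> str:
    """Advance the higher-average-probability team each round until one remains.

    `teams` must be ordered so adjacent pairs meet in the first round, and its
    length must be a power of two.
    """
    remaining: List[str] = list(teams)
    while len(remaining) > 1:
        winners = []
        for i in range(0, len(remaining), 2):
            a, b = remaining[i], remaining[i + 1]
            p_a = average_win_probability(models, a, b)
            winners.append(a if p_a >= 0.5 else b)  # ties go to `a` (assumption)
        remaining = winners
    return remaining[0]


def rating_model(ratings: Dict[str, float]) -> WinProbModel:
    """Toy stand-in model: win probability proportional to a made-up rating."""
    return lambda a, b: ratings[a] / (ratings[a] + ratings[b])


# Three hypothetical models that disagree on some series.
toy_models = [
    rating_model({"Team A": 60, "Team B": 40, "Team C": 55, "Team D": 50}),
    rating_model({"Team A": 50, "Team B": 52, "Team C": 58, "Team D": 45}),
    rating_model({"Team A": 62, "Team B": 45, "Team C": 50, "Team D": 51}),
]

champion = simulate_bracket(["Team A", "Team B", "Team C", "Team D"], toy_models)
print("Predicted champion:", champion)
```

Note that in this toy example the second model prefers Team B and Team C over Team A, yet the averaged probability still advances Team A, which is the kind of disagreement that averaging is meant to smooth over.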

Using these same methods and models, here is what our predictions looked like for the 2015 and 2016 NBA Playoffs, with the correct predictions marked in green. In absolutely shocking fashion, the model had the Warriors winning both years. Across both years combined, it predicted 25 of 30 series winners correctly, and missed only one winner in the 2015 playoffs: