NBA Modeling Methodology

Our data covers all regular season games since the 1989-1990 season, giving us 28 seasons in all. It was fun to test our models' performance across the Jordan-dominated ’90s, the Shaq and Kobe ’00s, and the modern pace-and-space era. The style of play in the league has changed drastically over this time. This poses a challenge for modeling the most recent seasons, because most of our training data does not reflect the boom in outside shooting that we see in the game today.

Modeling Techniques

We used several modeling techniques, including logistic regression, random forests, and penalized regression, to build our NBA Playoff models. We evaluated the models primarily by comparing the accuracy of their out-of-sample predictions. That is, we trained a model on the data with one particular playoff matchup removed, then used that model to predict the held-out matchup and checked whether the prediction was correct. Repeating this for every matchup in our data tests the model on series it was never fit to. Variable selection was done through stepwise regression, our own basketball intuition, and, of course, trial and error. Finding the right combination of variables was challenging because so many basketball variables are strongly correlated with one another. For example, effective field goal percentage, true shooting percentage, and field goal percentage are very similar calculations.
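
The leave-one-matchup-out evaluation described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not our actual code: the single win-percentage-gap feature, the six made-up matchups, and the toy midpoint-threshold classifier (`fit_threshold`, `predict`) are all hypothetical stand-ins. A logistic regression or random forest would slot into the same loop.

```python
from statistics import mean

# Each matchup: (regular-season win% gap, favorite minus underdog; 1 if the favorite won).
# These six rows are made-up illustration data, not real series.
MATCHUPS = [(0.20, 1), (0.15, 1), (0.10, 1), (0.02, 0), (-0.01, 0), (0.12, 0)]


def fit_threshold(training):
    """Toy stand-in model: a cutoff halfway between the two class means.

    Assumes both outcomes appear in the training rows.
    """
    wins = [gap for gap, outcome in training if outcome == 1]
    losses = [gap for gap, outcome in training if outcome == 0]
    return (mean(wins) + mean(losses)) / 2


def predict(threshold, gap):
    """Predict 1 (favorite wins) when the gap exceeds the learned cutoff."""
    return 1 if gap > threshold else 0


def leave_one_matchup_out_accuracy(matchups, fit, predict):
    """Hold out each matchup in turn, refit on the rest, and score the held-out pick."""
    correct = 0
    for i, (gap, outcome) in enumerate(matchups):
        training = matchups[:i] + matchups[i + 1:]  # everything except matchup i
        model = fit(training)
        correct += predict(model, gap) == outcome
    return correct / len(matchups)


accuracy = leave_one_matchup_out_accuracy(MATCHUPS, fit_threshold, predict)
print(f"out-of-sample accuracy: {accuracy:.1%}")
```

Because the model is refit inside the loop, the held-out series never influences the model that predicts it, which is what makes the resulting accuracy an out-of-sample figure.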

For the 2017-2018 playoffs, we derived our predictions by averaging the output of four distinct models. Three of these models were built on team-level statistics from the regular season, while the fourth uses player-level advanced stats such as box plus/minus, win shares, value over replacement player, and usage rate. These player-level statistics were aggregated for each team in two ways: taking the maximum value, and taking a weighted average over the top 7 players in minutes played.
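
As a rough sketch of the two player-level aggregations, the function below returns both the maximum and a weighted average for one statistic. Several details are assumptions made for illustration rather than details taken from our pipeline: the weights are minutes played, the maximum is taken over the whole roster rather than only the top 7, and the roster, field names (`minutes`, `bpm`), and values are hypothetical.

```python
def aggregate_team_stat(players, stat, top_n=7):
    """Return (roster maximum, minutes-weighted average over the top_n players by minutes)."""
    # Keep only the top_n players by minutes for the weighted average.
    top = sorted(players, key=lambda p: p["minutes"], reverse=True)[:top_n]
    max_value = max(p[stat] for p in players)
    total_minutes = sum(p["minutes"] for p in top)
    weighted_avg = sum(p[stat] * p["minutes"] for p in top) / total_minutes
    return max_value, weighted_avg


# Hypothetical three-player roster; top_n=2 keeps the arithmetic easy to follow.
roster = [
    {"name": "P1", "minutes": 3000, "bpm": 6.0},
    {"name": "P2", "minutes": 2000, "bpm": 2.0},
    {"name": "P3", "minutes": 500, "bpm": 9.0},
]
max_bpm, weighted_bpm = aggregate_team_stat(roster, "bpm", top_n=2)
print(max_bpm, weighted_bpm)
```

The maximum captures how good a team's best player is, while the weighted average reflects the quality of the players who actually carry the minutes.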

Model Accuracy

In each playoff series, the team with the higher average win probability across our four models advances to the next round, and the process repeats until we have a champion. Using this method, we correctly predicted 16 of the 28 champions (57%) going back to 1990, and 37 of the 56 conference champions (66%). Overall, our models have correctly predicted the winner in 79.5% of playoff matchups since 1990, based on out-of-sample testing.
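
The advance-the-favorite procedure can be sketched as below. This is only an illustration under stated assumptions: the two toy models (`logistic_model`, `linear_model`), the `RATINGS` values, and the four-team bracket are hypothetical, ties at exactly 0.5 go to the first-listed team, and the team list is assumed to be in bracket order with a power-of-two length. In practice there would be four models and sixteen teams.

```python
import math


def average_win_probability(team_a, team_b, models):
    """Average P(team_a beats team_b) across all models."""
    probs = [model(team_a, team_b) for model in models]
    return sum(probs) / len(probs)


def predict_bracket(teams, models):
    """Advance the higher-probability team in each series until one team remains.

    `teams` is in bracket order: adjacent pairs meet, and the winners of
    adjacent series meet in the next round. Returns the winners of each round.
    """
    rounds = []
    remaining = list(teams)
    while len(remaining) > 1:
        winners = []
        for a, b in zip(remaining[::2], remaining[1::2]):
            p_a = average_win_probability(a, b, models)
            winners.append(a if p_a >= 0.5 else b)
        rounds.append(winners)
        remaining = winners
    return rounds


# Hypothetical team strengths and two toy models standing in for the real ones.
RATINGS = {"A": 5.0, "B": 1.0, "C": 3.0, "D": 4.0}


def logistic_model(a, b):
    return 1 / (1 + math.exp(-(RATINGS[a] - RATINGS[b])))


def linear_model(a, b):
    return min(1.0, max(0.0, 0.5 + 0.05 * (RATINGS[a] - RATINGS[b])))


rounds = predict_bracket(["A", "B", "C", "D"], [logistic_model, linear_model])
print(rounds)  # winners of each round; the last entry holds the champion
```

Note that a wrong pick in an early round carries forward: later rounds are predicted for the matchups the bracket expects, not necessarily the ones that actually occur.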

Using these same methods and models, here is how our predictions performed for every NBA postseason since 1990. We generated these historical results by removing a given year from our data, refitting our models, and then using the refit models to generate a bracket for that year. Each round column counts the series winners we predicted correctly, out of 15 series per postseason.

Year	Round 1	Round 2	Conference Finals	Finals	Total Correct	% Correct
1990	7	2	1	0	10	67%
1991	6	4	1	1	12	80%
1992	7	3	2	1	13	87%
1993	7	3	2	1	13	87%
1994	6	1	1	0	8	53%
1995	6	2	1	0	9	60%
1996	6	2	2	1	11	73%
1997	8	3	2	1	14	93%
1998	6	4	2	1	13	87%
1999	6	2	0	0	8	53%
2000	8	3	1	1	13	87%
2001	5	2	1	0	8	53%
2002	8	1	1	1	11	73%
2003	7	3	1	1	12	80%
2004	7	3	1	0	11	73%
2005	6	2	1	1	10	67%
2006	8	2	0	0	10	67%
2007	5	2	1	1	9	60%
2008	6	2	2	1	11	73%
2009	6	3	1	0	10	67%
2010	7	2	1	0	10	67%
2011	6	2	1	0	9	60%
2012	6	2	2	1	11	73%
2013	6	2	1	1	10	67%
2014	6	2	2	0	10	67%
2015	8	2	2	1	13	87%
2016	7	3	2	0	12	80%
2017	6	2	2	1	11	73%
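
The totals in the table above can be re-derived with a few lines of arithmetic. The sketch below copies the four round columns from the table and assumes 15 series per postseason (8 + 4 + 2 + 1); the names `ROWS`, `SERIES_PER_YEAR`, and `percent_correct` are just illustrative.

```python
# (year, round 1, round 2, conference finals, finals), copied from the table above.
ROWS = [
    (1990, 7, 2, 1, 0), (1991, 6, 4, 1, 1), (1992, 7, 3, 2, 1), (1993, 7, 3, 2, 1),
    (1994, 6, 1, 1, 0), (1995, 6, 2, 1, 0), (1996, 6, 2, 2, 1), (1997, 8, 3, 2, 1),
    (1998, 6, 4, 2, 1), (1999, 6, 2, 0, 0), (2000, 8, 3, 1, 1), (2001, 5, 2, 1, 0),
    (2002, 8, 1, 1, 1), (2003, 7, 3, 1, 1), (2004, 7, 3, 1, 0), (2005, 6, 2, 1, 1),
    (2006, 8, 2, 0, 0), (2007, 5, 2, 1, 1), (2008, 6, 2, 2, 1), (2009, 6, 3, 1, 0),
    (2010, 7, 2, 1, 0), (2011, 6, 2, 1, 0), (2012, 6, 2, 2, 1), (2013, 6, 2, 1, 1),
    (2014, 6, 2, 2, 0), (2015, 8, 2, 2, 1), (2016, 7, 3, 2, 0), (2017, 6, 2, 2, 1),
]
SERIES_PER_YEAR = 8 + 4 + 2 + 1  # series in rounds 1, 2, conference finals, finals


def percent_correct(rounds):
    """Format one year's correct picks as a share of that year's 15 series."""
    return f"{sum(rounds) / SERIES_PER_YEAR:.0%}"


for year, *rounds in ROWS:
    print(year, sum(rounds), percent_correct(rounds))

# Column sums behind the champion and conference-champion counts quoted earlier.
champions = sum(row[4] for row in ROWS)
conference_champions = sum(row[3] for row in ROWS)
print(f"champions: {champions} of {len(ROWS)}")
print(f"conference champions: {conference_champions} of {2 * len(ROWS)}")
```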
