Over the past three years, I have put together a series of statistical models that predict NCAA Tournament games and, building off of each game, the tournament as a whole. The models started as an independent research project I did at St. Olaf College with Dr. Matt Richey. For a given game, the models use each team's statistics and information from regular season games to predict (1) each team's win probability for that game or (2) a point spread for that game. I use a handful of different models, so each game will have multiple probabilities/point spreads to consider, and the models will not always agree with each other.
What goes into the models?
The models have a Fat Eddie Lacy-sized amount of data to consider, everything from high school recruiting ranks to margin of victory. I use basic statistics (points, rebounds, assists, steals, blocks, shooting percentages), advanced statistics, as well as some metrics I have created on my own. Some variables consider every game from the regular season, while others only consider the last ten games, conference games, games vs. ranked opponents, or games vs. top 100 RPI opponents. I have also created a number of indicator variables, such as "did a team win their conference tournament?" and "did a team make the tournament last year?" Gathering, organizing, and validating the data has taken years, but there is now a ton of data to work with, and different models are able to utilize different aspects of it. The data includes every tournament game from 2001-2016, and continues to incorporate new games as they are played each year.
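To make the idea of season-window variables and indicator variables concrete, here is a minimal sketch in Python. All of the game records, stat names, and flag values are invented for illustration; this is not the actual feature set.

```python
# Hypothetical sketch of the kinds of variables described above.
# Each record holds a team's points scored/allowed and the opponent's RPI rank.
season = [
    {"points": 78, "opp_points": 70, "opp_rpi": 45},
    {"points": 65, "opp_points": 71, "opp_rpi": 12},
    {"points": 82, "opp_points": 60, "opp_rpi": 180},
    # ...one dict per regular-season game
]

def avg_margin(games):
    """Average margin of victory over a set of games."""
    return sum(g["points"] - g["opp_points"] for g in games) / len(games)

# Some variables use the full season; others use only a slice of it.
margin_full = avg_margin(season)
margin_last10 = avg_margin(season[-10:])
vs_top100 = [g for g in season if g["opp_rpi"] <= 100]

# Indicator variables are simple 0/1 flags.
features = {
    "margin_full": margin_full,
    "margin_last10": margin_last10,
    "won_conf_tourney": 1,        # hypothetical flag
    "made_tourney_last_year": 0,  # hypothetical flag
}
```

In a real pipeline, one row of features like this would be built for each team in each game, then fed to the models.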
How accurate are the models?
Since a new tournament only rolls around once per year, we need quicker ways to assess the accuracy of these models. One way of doing this is by using historical data as "testing" data. For example, we know who won every game in the 2012 tournament. So, we can generate predictions for all 2012 games (just as we would for a new tournament) and then compare those predictions to the actual outcomes to assess accuracy. Repeating this process makes it possible to examine how the models have performed for each year (or each game). Under these testing scenarios, the best models get 85-88% of games correct. The bracket below displays an example of the testing scenario for 2012 (as outlined above). Every team that advances in the bracket represents the model's prediction, with green colors indicating a correct prediction, and red colors an incorrect prediction:
When looking at the tournament as a whole, quantifying accuracy gets complicated, as one loss can make things turn ugly in a hurry (e.g., Michigan State losing in the first round in 2016). While keeping that in mind, under these testing scenarios, the best models have gotten 10 or 11 of the last 15 champions correct, along with an average of 2.9 Final Four teams, 6.9 Elite Eight teams, and 13.9 Sweet 16 teams per year.
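The testing procedure above reduces to a simple comparison of predicted winners against known results. The sketch below uses made-up matchups and outcomes purely to show the shape of the calculation.

```python
# Minimal sketch of the testing procedure: compare a model's predicted
# winners for a past tournament against the known results.
# All matchups and outcomes here are invented for illustration.
games = [
    # (predicted winner, actual winner)
    ("Kentucky", "Kentucky"),
    ("Duke", "Lehigh"),
    ("Ohio State", "Ohio State"),
    ("Kansas", "Kansas"),
]

correct = sum(1 for pred, actual in games if pred == actual)
accuracy = correct / len(games)
print(f"{correct}/{len(games)} games correct ({accuracy:.0%})")
```

Running this over every historical tournament, one year at a time, yields the per-year accuracy figures quoted above.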
What about during an actual tournament?
I have had some version of these models in place for the past three years (2014-2016), and when applying them to a real tournament (i.e., generating predictions before the games are played), the best model got 70% of the games correct in 2016, and 79% correct in 2015. There will always be games that are nearly impossible to predict, but ideally those figures creep toward the 85-88% accuracy that is achieved on the testing data. The image below shows an example of what one model predicted for the 2016 tournament, with each team colored based on their predicted win probability:
Since I have multiple models to consider, I never fill in my bracket solely using one model's numbers. Rather, I look at predictions from all of the different models before coming to a consensus (ideally, there is some agreement among the models, giving more confidence in the predictions). I have been using this method for the past three tournaments, and have gotten two Final Four teams correct each year, as well as one team playing in the championship game each year (Connecticut '14, Wisconsin '15, and Villanova '16) and one champion (Connecticut). From 2014-2016, these consensus picks averaged 5.7 Elite Eight teams correct per year.
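One simple way the consensus idea could be sketched is to pool each model's win probability for a game and check whether the models agree on the favorite. The model names and probabilities below are hypothetical, and averaging is just one of several reasonable ways to combine models.

```python
# Sketch of forming a consensus pick for one hypothetical game
# (Team A vs. Team B). Model names and probabilities are invented.
model_probs_team_a = {
    "logistic": 0.62,
    "random_forest": 0.58,
    "spread_based": 0.66,
}

# Average the models' win probabilities for Team A.
consensus = sum(model_probs_team_a.values()) / len(model_probs_team_a)

# When every model favors the same team, the pick carries more confidence.
agreement = all(p > 0.5 for p in model_probs_team_a.values())

pick = "Team A" if consensus > 0.5 else "Team B"
```

When the models split on a game, that disagreement itself is useful information: it flags the matchups where the bracket pick is closest to a coin flip.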