The following article provides a brief overview of our March Madness win probability models and their historical bracket predictions on test data. For every game, the models produce a win probability for each team, which we use to fill out the bracket as a whole by advancing the team with the higher win probability. Check out all of our predicted brackets dating back to 2001.

Prediction Example

As an example of how to interpret the predictions, the table below shows our Final Four predictions from 2016. In row 1, we give Villanova an 84% chance of beating Oklahoma; conversely, this tells us that Oklahoma has a 16% chance of winning the game. We call a prediction correct if the actual winner of the game had a win probability greater than 50% in our model. For a more detailed look at our methodology, check out this article.
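The correctness rule above can be sketched in a few lines. This is an illustrative helper, not the authors' actual code; the function name and inputs are assumptions.

```python
# Hypothetical sketch: a prediction counts as "correct" when the actual
# winner's modeled win probability exceeds 50%.
def is_prediction_correct(win_prob_team1: float, team1_won: bool) -> bool:
    """Return True if the team the model favored actually won the game."""
    winner_prob = win_prob_team1 if team1_won else 1.0 - win_prob_team1
    return winner_prob > 0.5

# From the 2016 table: Villanova was given an 84% chance vs Oklahoma,
# and Villanova won, so that prediction is scored as correct.
print(is_prediction_correct(0.84, True))   # True
```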

Team 1           Team 2      Predicted Winner (Win Probability)
villanova        oklahoma    villanova (84%)
north-carolina   syracuse    north-carolina (89%)
north-carolina   villanova   villanova (67%)

Previous Brackets

Building off of the individual game predictions outlined above, we fill out our brackets by advancing the team with the higher win probability in each game. Using this method, here are our predicted brackets for every tournament dating back to 2001*:
2001 Bracket     2002 Bracket     2003 Bracket
2004 Bracket     2005 Bracket     2006 Bracket
2007 Bracket     2008 Bracket     2009 Bracket
2010 Bracket     2011 Bracket     2012 Bracket
2013 Bracket     2014 Bracket     2015 Bracket
2016 Bracket     2017 Bracket
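The bracket-filling method described above can be sketched as a simple round-by-round loop. This is a toy illustration, not the actual implementation; `win_prob(a, b)` stands in for a model that returns the probability that `a` beats `b`.

```python
# Hypothetical sketch of filling a bracket: in each round, advance the
# team the model assigns the higher win probability, until one remains.
def fill_bracket(teams, win_prob):
    """teams: list in bracket order; win_prob(a, b): assumed P(a beats b)."""
    rounds = [list(teams)]
    while len(rounds[-1]) > 1:
        current = rounds[-1]
        winners = [a if win_prob(a, b) >= 0.5 else b
                   for a, b in zip(current[::2], current[1::2])]
        rounds.append(winners)
    return rounds  # rounds[-1][0] is the predicted champion

# Toy example with a stand-in model that always favors the first team.
demo = fill_bracket(["villanova", "oklahoma", "north-carolina", "syracuse"],
                    lambda a, b: 0.6)
print(demo[-1][0])  # villanova
```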

Model Accuracy (During Actual Tournaments)

We have used these models in the 2014, 2015, 2016, and 2017 NCAA tournaments. Each year, the models correctly predicted 2 Final Four teams. In 2016 they predicted 70% of winners correctly, and in 2017 they got 76% correct. Here is a recap of how all our models performed on 2017 predictions, and you can also find our detailed predictions from every game in 2017 here.

Model Accuracy (on Test Data)

We can look at accuracy in terms of “what percent of games are predicted correctly?” or we can assess accuracy for the tournament as a whole (by advancing the team with the higher win probability in each game), placing a higher weight on Final Four teams, champions, etc. Our composite win probability model correctly predicts the winner 84.6% of the time on historical predictions, averages 2.9 Final Four teams, and has correctly picked 9 of the past 17 champions.
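The first of those two metrics, per-game accuracy, is simply the share of games in which the modeled favorite actually won. A minimal sketch, assuming each historical game is recorded as a (predicted winner, actual winner) pair:

```python
# Hypothetical sketch of the per-game accuracy metric: the fraction of
# games where the team the model favored was the actual winner.
def per_game_accuracy(results):
    """results: iterable of (predicted_winner, actual_winner) pairs."""
    results = list(results)
    correct = sum(pred == actual for pred, actual in results)
    return correct / len(results)

# Toy example: the model gets 2 of 3 games right.
games = [("villanova", "villanova"),
         ("north-carolina", "north-carolina"),
         ("villanova", "north-carolina")]
print(per_game_accuracy(games))
```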

Check back once the bracket is released for our predictions on this year's tournament.

*These brackets were generated using average probabilities from seven separate models, advancing the team with the higher win probability in each game. For each year, all models were generated without data from that given year (e.g., for the 2017 tournament, models were built off of data from 2001-2016). Each separate model utilizes a different collection of variables and/or a different prediction technique such as logistic regression, boosting, or lasso regression.
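The composite step described in this footnote, averaging win probabilities across several separate models, can be sketched as follows. The model functions here are stand-ins, not the actual seven fitted models.

```python
# Hypothetical sketch of the composite prediction: average the win
# probabilities from several component models (each fit with different
# variables and/or techniques), then advance the higher-probability team.
from statistics import mean

def composite_win_prob(models, team_a, team_b):
    """Average P(team_a beats team_b) across all component models."""
    return mean(m(team_a, team_b) for m in models)

# Toy stand-ins for models built with different variable sets/techniques.
models = [lambda a, b: 0.80, lambda a, b: 0.90, lambda a, b: 0.70]
p = composite_win_prob(models, "villanova", "oklahoma")
print(round(p, 2))  # 0.8
```

In practice each component model would also be refit for every tournament year using only data from the other years, as the footnote describes, so the composite never sees the year it is predicting.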