The following article provides a preview of what our predictions will look like for the upcoming 2017 NCAA tournament. For every game, the models produce a win probability for each team, which we use to fill out the bracket as a whole. Check out all of our predicted brackets dating back to 2001.
As an example of how to interpret the predictions, the table below shows our Final Four predictions from last year’s tournament. In row 1, we give Villanova an 88% chance of beating Oklahoma, which means Oklahoma has a 12% chance of winning the game. We call a prediction correct if the actual winner of the game had a win probability greater than 50% in our model. For a more detailed look at our methodology, check out this article.
| Team 1 | Team 2 | Predicted Winner (Win Probability) |
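The rule for reading a single prediction can be sketched in a few lines of Python. This is only an illustration of the interpretation described above, not our actual code: the `GamePrediction` type and the function names are hypothetical, and the only number taken from the article is Villanova's 88%.

```python
from typing import NamedTuple


class GamePrediction(NamedTuple):
    """One predicted game (hypothetical structure, for illustration only)."""
    team1: str
    team2: str
    p_team1: float  # model's probability that team1 wins


def win_probability(game: GamePrediction, team: str) -> float:
    """Return the model's win probability for either team in the game."""
    if team == game.team1:
        return game.p_team1
    if team == game.team2:
        # The two probabilities always sum to 1.
        return 1.0 - game.p_team1
    raise ValueError(f"{team} is not playing in this game")


def is_correct(game: GamePrediction, actual_winner: str) -> bool:
    """A prediction counts as correct if the actual winner had > 50%."""
    return win_probability(game, actual_winner) > 0.5


game = GamePrediction("Villanova", "Oklahoma", 0.88)
oklahoma_chance = win_probability(game, "Oklahoma")  # roughly 0.12
villanova_pick_correct = is_correct(game, "Villanova")
```

Note that under this rule a game predicted at exactly 50/50 can never be counted as correct, since neither team exceeds 50%.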
Building on the individual game predictions outlined above, we advance the team with the higher win probability in each game to fill out our brackets. Using this method, here is what our brackets look like for every tournament dating back to 2001*:
2001 Bracket 2002 Bracket 2003 Bracket
2004 Bracket 2005 Bracket 2006 Bracket
2007 Bracket 2008 Bracket 2009 Bracket
2010 Bracket 2011 Bracket 2012 Bracket
2013 Bracket 2014 Bracket 2015 Bracket
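The bracket-filling step described above (advance whichever team has the higher win probability, round by round) can be sketched as follows. This is a minimal sketch under stated assumptions: `fill_bracket` and `toy_win_prob` are hypothetical names, the four teams and their ratings are made up, and the toy probability function merely stands in for a real model. Sending the first-listed team through on an exact 50/50 game is also an assumption; the article does not say how ties are handled.

```python
import math
from typing import Callable, List


def fill_bracket(teams: List[str],
                 win_prob: Callable[[str, str], float]) -> List[List[str]]:
    """Fill a single-elimination bracket by always advancing the favorite.

    `teams` is in bracket order (teams[0] plays teams[1], and so on) and its
    length must be a power of two. `win_prob(a, b)` returns the probability
    that `a` beats `b`. Returns the list of advancing teams for each round.
    """
    if len(teams) < 2 or len(teams) & (len(teams) - 1):
        raise ValueError("number of teams must be a power of two (>= 2)")
    rounds = []
    current = list(teams)
    while len(current) > 1:
        winners = []
        for i in range(0, len(current), 2):
            a, b = current[i], current[i + 1]
            # Assumption: on an exact 50/50 game, the first-listed team advances.
            winners.append(a if win_prob(a, b) >= 0.5 else b)
        rounds.append(winners)
        current = winners
    return rounds


# Toy stand-in for a model: made-up ratings pushed through a logistic curve.
ratings = {"A": 4.0, "B": 1.0, "C": 2.0, "D": 3.0}


def toy_win_prob(a: str, b: str) -> float:
    return 1.0 / (1.0 + math.exp(-(ratings[a] - ratings[b])))


rounds = fill_bracket(["A", "B", "C", "D"], toy_win_prob)
# rounds[0] holds the first-round winners, rounds[-1] the predicted champion.
```

Because later matchups depend on earlier picks, the function only ever needs probabilities for the games that actually occur in the predicted bracket.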
We can look at accuracy in terms of “what percent of games are predicted correctly?” or we can assess accuracy for the tournament as a whole (by advancing the team with the higher win probability in each game). Our best model correctly predicts the winner 87.1% of the time, averages 6.5 correct Elite 8 teams and 2.8 correct Final 4 teams per tournament, and has correctly picked 11 of the past 16 champions. The table below provides some summary statistics on how the predictions from this model have performed in each year:
| Year | % Correct | 1st Round | Sweet 16 | Elite 8 | Final 4 | Champ |
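The tournament-level counts in a table like this one can be computed by comparing the predicted bracket with the actual one, round by round. The sketch below is one plausible way to do that, not necessarily how our numbers were produced; the function name, the dictionary layout, and the placeholder team letters are all hypothetical.

```python
from typing import Dict, List


def teams_correct_by_round(predicted: Dict[str, List[str]],
                           actual: Dict[str, List[str]]) -> Dict[str, int]:
    """Count, for each round, how many predicted teams actually reached it."""
    return {
        round_name: len(set(predicted[round_name]) & set(actual[round_name]))
        for round_name in predicted
    }


# Placeholder data: letters stand in for team names.
predicted = {"Final4": ["A", "B", "C", "D"], "Champ": ["A"]}
actual = {"Final4": ["A", "C", "E", "F"], "Champ": ["C"]}

counts = teams_correct_by_round(predicted, actual)
```

Averaging these per-round counts across years gives figures of the same kind as the Elite 8 and Final 4 averages quoted above.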
Check back once the bracket is released to see our predictions for this year’s tournament.
*These brackets were generated using average probabilities from eight separate models, advancing the team with the higher win probability in each game. For each year, all models were fit without data from that year (e.g., for the 2016 tournament, models were built on data from 2001-2015). Each of the eight models uses a different collection of variables and/or a different prediction technique, such as logistic regression, boosting, or lasso regression.
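The two ideas in this footnote, averaging the eight models and holding out the year being predicted, can be sketched as follows. The function names are hypothetical and the eight probabilities are made-up numbers, not output from our models. The sketch assumes a simple unweighted mean and reads "without data from that given year" literally, so every other available year is used for training; if only earlier years were intended, the filter would be `y < held_out_year` instead.

```python
from statistics import mean
from typing import Iterable, List


def ensemble_probability(model_probs: Iterable[float]) -> float:
    """Average the win probabilities produced by the individual models."""
    return mean(model_probs)


def training_years(all_years: Iterable[int], held_out_year: int) -> List[int]:
    """Return the years used to fit models when predicting `held_out_year`."""
    return [y for y in all_years if y != held_out_year]


# Made-up probabilities from eight hypothetical models for a single game.
probs = [0.90, 0.85, 0.88, 0.91, 0.86, 0.87, 0.89, 0.88]
avg = ensemble_probability(probs)  # approximately 0.88

years_for_2016 = training_years(range(2001, 2017), 2016)  # 2001 through 2015
```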