With Super Bowl LI behind us, it is time for a forecasting post-mortem. It was a historic game for the Patriots. They now have the distinction of being the only team to play in 9 Super Bowls, the only team to win in overtime and the only team to make such a big comeback in the second half. If you have been following my previous articles, you will know that my predictions were less than perfect. In this article, I delve into lessons from a bad forecast. In addition, we have a few more Super Bowl visualizations that are interesting and informative.
Lesson 1: The first step is the most important
There is an old saying, “The best time to start something is 20 years ago, the second best time is now.” Last week, I knew very little about Super Bowl history. Today, I have a database of Super Bowl results through history, several predictors and a first-level analysis. On Sunday, another data point was added.
When I am working with clients to set up a new Business Intelligence project, the earliest reports and analysis are often disappointing. There is so much data that needs to be collected, cleansed and aggregated to feed a Business Intelligence system. Often the data is initially inconsistent and incomplete, as it has not previously been scrutinized. This cleansing and completion of the data is the first step, but it’s a step that doesn’t happen until BI is in place. Those first reports are the first iteration of an iterative process that will continue over the life of the project.
We started these articles with a chart showing all 50 Super Bowls. That chart now shows all 51:
Lesson 2: The iterative process involves the tools and the data
You might have noticed two differences between the chart above and the same chart presented in earlier articles. First, the chart has picked up an additional data point, Super Bowl LI. Second, the scaling of the lower bars has changed. The previous chart scaled the upper and lower bars independently, which visually overstated the scoring of the losing team. The chart has been updated so that both sets of bars share one scale and accurately reflect the numbers.
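To make the scaling issue concrete, here is a minimal Python sketch of the two approaches. It is not the charting tool's actual logic. The function names and the scores are placeholders I made up for illustration and are not taken from my Super Bowl database.

```python
def scale_independent(winner_scores, loser_scores, max_length=100.0):
    """Old behaviour: each series is scaled against its own maximum."""
    upper_factor = max_length / max(winner_scores)
    lower_factor = max_length / max(loser_scores)
    upper = [score * upper_factor for score in winner_scores]
    lower = [score * lower_factor for score in loser_scores]
    return upper, lower


def scale_shared(winner_scores, loser_scores, max_length=100.0):
    """New behaviour: both series are scaled against one shared maximum."""
    if len(winner_scores) != len(loser_scores):
        raise ValueError("each game needs one winner score and one loser score")
    shared_max = max(max(winner_scores), max(loser_scores))
    if shared_max <= 0:
        raise ValueError("at least one score must be positive")
    factor = max_length / shared_max
    upper = [score * factor for score in winner_scores]
    lower = [score * factor for score in loser_scores]
    return upper, lower


# Placeholder scores for two imaginary games (illustration only).
winners = [40, 20]
losers = [10, 5]

old_upper, old_lower = scale_independent(winners, losers)
new_upper, new_lower = scale_shared(winners, losers)

print(old_lower)  # losing bars drawn as long as the winning bars
print(new_lower)  # losing bars drawn in proportion to the winning bars
```

With independent scaling, a 10-point losing score is drawn as long as a 40-point winning score. With the shared scale it is drawn at a quarter of the length.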
Similarly, I’ve made a change to the single game dashboard. Overtime has been added to properly show the results of Super Bowl LI. This was an obvious change which had to be made sooner rather than later. The dashboard’s predictor should also be updated, but those changes require more analysis.
Here is the dashboard for Super Bowl LI:
Lesson 3: Wrong guesses improve processes more quickly than right guesses
When we predict correctly, we aren’t likely to change any processes. The process and tools don’t get iterated and there is no evolution. I don’t recommend making changes when things look right. But, this being the first prediction with this model, it’s almost certainly going to be at least partially wrong. The wrong prediction leads us to re-evaluate each of the pieces in the prediction process.
Let’s have a look at what our dashboard would have looked like if Atlanta had won:
Comparing the two dashboards, there are several predictors that I disregarded in my analysis that should be looked at once again. Maybe these predictors are better than my estimation. Maybe they should have been disregarded. Regardless, each should be re-evaluated based on this most recent outcome.
Lesson 4: Be careful with Correlation and Causation
Let’s just be honest: all of these predictors have some level of correlation with the outcome, but even for the strongest correlation, there is almost certainly no causation going on. As a result, none of these predictors will ever be close to 100% accurate. That doesn’t mean correlations are meaningless, just that they are not as reliable as full-on causation. Take the teams’ season records; here is the season record visualization for the last 10 years plus Super Bowl LI:
This chart appears to show an inverse correlation between season record and winning the Super Bowl. There have been two exceptions to this correlation in the last 11 years: Super Bowl XLII and the game held last Sunday.
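One rough way to put a number on a relationship like this is to count how often the team with the better season record actually won. The Python sketch below assumes each game is stored as a pair of regular-season win counts (winner first, loser second). The function name and the sample pairs are invented placeholders, not real Super Bowl results.

```python
def better_record_win_rate(games):
    """Fraction of games won by the team with more regular-season wins.

    `games` is a list of (winner_season_wins, loser_season_wins) pairs.
    Games where both teams had the same record are skipped.
    Returns None when no game can be compared.
    """
    decided = [(w, l) for w, l in games if w != l]
    if not decided:
        return None
    better_won = sum(1 for w, l in decided if w > l)
    return better_won / len(decided)


# Invented placeholder records (illustration only, not real results).
sample_games = [(10, 13), (11, 12), (14, 11), (12, 12)]

rate = better_record_win_rate(sample_games)
print(f"Better record won {rate:.0%} of the comparable games")
```

A rate well below 50% would point to an inverse relationship, and a rate near 50% would suggest that season record tells us very little. With only 11 games, either reading should be treated with caution.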
Lesson 5: There will always be outliers and exceptions
Every rule is bound to have exceptions. This is true of causation, correlations and every predictor we might look at. The problem is identifying what might be usable data and what should be thrown out. Taking our Bubble visualization, there are several areas that appear to be outliers:
The Purple arrow is pointing to a single entry that might be an exception to be ignored. It’s possible the areas circled in Green, where there is no previous Super Bowl record, should be thrown out altogether. Similarly, the area circled in Red might be an outlier. In this case, there are two options: throw the area out, or create a secondary rule. It’s likely that where a single predictor doesn’t work, a short iteration of rules might.
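The idea of a secondary rule can be sketched as a short chain of rules tried in order, where each rule may abstain and hand the decision to the next one. The Python sketch below is only an illustration. The field names (`teams`, `prior_appearances`, `season_wins`) and both rules are hypothetical and are not the predictors from my dashboard.

```python
from typing import Callable, Optional, Sequence

# A rule looks at one game and returns the predicted winner,
# or None when it has no opinion.
Rule = Callable[[dict], Optional[str]]


def more_appearances_rule(game: dict) -> Optional[str]:
    """Hypothetical primary rule: pick the team with more prior appearances.

    Abstains when either team has no Super Bowl history (the kind of
    area that might otherwise be thrown out) or when the counts are tied.
    """
    first, second = game["prior_appearances"]
    if min(first, second) == 0 or first == second:
        return None
    return game["teams"][0] if first > second else game["teams"][1]


def better_record_rule(game: dict) -> Optional[str]:
    """Hypothetical secondary rule: pick the team with more season wins."""
    first, second = game["season_wins"]
    if first == second:
        return None
    return game["teams"][0] if first > second else game["teams"][1]


def predict(game: dict, rules: Sequence[Rule]) -> Optional[str]:
    """Return the pick of the first rule that does not abstain."""
    for rule in rules:
        pick = rule(game)
        if pick is not None:
            return pick
    return None


rules = [more_appearances_rule, better_record_rule]

# Invented example game: team B has never appeared before,
# so the primary rule abstains and the secondary rule decides.
game = {"teams": ("A", "B"), "prior_appearances": (3, 0), "season_wins": (11, 13)}
print(predict(game, rules))
```

Keeping each rule small and letting it abstain makes it easy to re-evaluate one rule after a wrong forecast without disturbing the others.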
Circling back to Super Bowl LI. This is the first time a team has played 9 times in the Super Bowl. It is also the first time we have gone into overtime and the biggest comeback in Super Bowl history. It is possible any one of these factors could make Super Bowl LI a ‘red circled’ outlier or the basis for a secondary rule.
Bonus: Word Cloud of Super Bowl winners/losers
That’s the winners in blue, the losers in gold. Notice the Patriots have had the most wins and losses while the Falcons cannot be seen. Maybe this was the visualization to use for predictions.