As Super Bowl LI (that’s 51 for the Roman-numerally challenged) rapidly approaches, millions of people are looking forward to the big game. Chief among them are the diehard fans of the Atlanta Falcons and the New England Patriots. But there are also NFL fans, football fans, sports fans and the many among us who are eager to see which new commercials will premiere in the most expensive 30-second slots of the year. I’m more of a fan of college football, but I am a bigger fan of data visualization, and the Super Bowl data has taught me some important lessons over the last couple of weeks. I present four of those lessons here.
Before plunging in, let’s back up to the purpose of the visualization. It’s great to be able to see all 51 Super Bowls at a glance, but what am I looking for? Let’s look at it like a business. Super Bowls represent our sales. We want to see the sales, pick out the trends, identify the outliers and, most importantly, use past data to predict the future. In short, we are trying to predict the result of the next Super Bowl based on the available data.
I gathered the data from a variety of sources and loaded the results into a SQL Server database. In this case, the database is unimportant. We could create visualizations from a couple of Excel spreadsheets or flat files just as easily. The visualizations were created using IBM Business Analytics 11 (aka Cognos 11). The data was modeled in IBM Cognos Framework Manager 11.0, and the dimensional data was modeled in Transformer 10.2.2.
Lesson 1: It doesn’t take much data to be too much data to analyze on sight
Unlike your sales data, which can encompass millions of distinct records, the Super Bowl data is rather modest. In fact, the entire data set is 153 rows or records and maybe 20 columns. Altogether, it looks something like this:
Actually, that’s not all the data. It’s about a quarter of the data. All of it would take up several pages of this blog post. The point is that it’s impossible to show even a small data set in one view, and if we can’t see it in one view, we can’t analyze it on sight. Sure, we can sort, filter and total the data, but that’s a lot of time and a lot of work, and the results would be insufficient for our needs. If this were sales data, we would be better off trying to make sense out of the wallpaper (at least we could find obvious patterns in it).
If you would like to get your hands on this data, please post a comment and we can push the data out to you in whatever format you prefer.
Lesson 2: Modeling data correctly sets the stage, but provides no answers
I have been known to say that the only data that models easily is sample data designed for that purpose. As noted above, we aren’t dealing with much data, but it still requires modeling. There are only two tables: Super Bowl and Super Bowl Teams. Super Bowl Teams is a fact table, carrying facts such as Score and Season wins/losses. Super Bowl is dimensional. It includes columns that look like facts (Attendance, Temperature) but are not aggregable. Now that it’s modeled, let’s see what we have:
Again, that’s only some of the data and it may be easier to work with, but it’s still withholding its secrets. With sales data we’re right back to the wallpaper again.
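For readers who think in code, here is a minimal sketch of the two-table shape described above. It is not the actual Framework Manager model: the class and field names are made up for illustration, and the only rows are the first two Super Bowls, whose results are public record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuperBowl:
    """Dimension row: one per game (field names are illustrative)."""
    number: int
    numeral: str


@dataclass(frozen=True)
class SuperBowlTeam:
    """Fact row: one per team per game (field names are illustrative)."""
    game_number: int
    team: str
    score: int


# Sample rows only: the first two Super Bowls, not the full data set.
GAMES = [SuperBowl(1, "I"), SuperBowl(2, "II")]
TEAM_ROWS = [
    SuperBowlTeam(1, "Green Bay Packers", 35),
    SuperBowlTeam(1, "Kansas City Chiefs", 10),
    SuperBowlTeam(2, "Green Bay Packers", 33),
    SuperBowlTeam(2, "Oakland Raiders", 14),
]


def join(games, team_rows):
    """Attach each fact row to its game, the way a report query would."""
    by_number = {game.number: game for game in games}
    return [(by_number[row.game_number], row) for row in team_rows]


def total_rows(game_count):
    """Rows across both tables: one per game plus two team rows per game."""
    return game_count + 2 * game_count


joined = join(GAMES, TEAM_ROWS)
print(len(joined))      # every game shows up once per team
print(total_rows(51))   # rows for 51 games under this two-table shape
```

After the join, each game appears twice, once per team. Any game-level column carried along (Attendance, say) would be counted twice if it were summed, which is one way to see why those columns look like facts but don’t aggregate. Under this shape, 51 games also work out to 153 rows across the two tables, which is at least consistent with the row count quoted earlier.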
Lesson 3: You can lead your data to a cube, but you can’t make it talk
There’s more than one way to model data, and dimensionally modeled data facilitates analysis. This is where our solution for the task at hand parts ways with a solution for sales data. Sales data and dimensional modeling go together like…the Super Bowl and chicken wings? (Insert your own analogy here.) Any sales data will have a half dozen easy dimensions and a handful of measures. Run it into a cube and within an hour of drilling, slicing and dicing it will be telling you its secrets. The Super Bowl data just doesn’t have a structure that lends itself to a dimensional model. The dimensions aren’t so obvious. We would almost certainly end up with dozens of rather flat dimensions. The true power of dimensional modeling is aggregation, and with flat dimensions there isn’t much to aggregate. The measures aren’t so aggregable either. Game count, score, season records, past wins and losses – none of these are numbers that add up very well.
Ignoring my own advice, I created a Cognos PowerCube anyway. Then I sliced and diced and modified the cube model and worked it some more. I tried highlighting, charted the data and experimented with the measures, and the data just wasn’t showing me anything new. This chart, showing the number of times one team lost to another, was typical of the bland results:
The Dallas Cowboys beat the Buffalo Bills twice, but they lost to the Pittsburgh Steelers twice. At least we’ve found something, but this data has more interesting stories to tell.
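The same "who lost to whom" count does not need a cube. The sketch below is only an illustration, not how the Cognos chart was built: it tallies loser/winner pairs with Python’s standard `collections.Counter`, using just the four games mentioned above (Super Bowls X, XIII, XXVII and XXVIII). The function and variable names are my own.

```python
from collections import Counter

# (winner, loser) pairs for the four games mentioned in the text.
RESULTS = [
    ("Pittsburgh Steelers", "Dallas Cowboys"),  # Super Bowl X
    ("Pittsburgh Steelers", "Dallas Cowboys"),  # Super Bowl XIII
    ("Dallas Cowboys", "Buffalo Bills"),        # Super Bowl XXVII
    ("Dallas Cowboys", "Buffalo Bills"),        # Super Bowl XXVIII
]


def count_losses(results):
    """Count how many times each team lost to each opponent."""
    return Counter((loser, winner) for winner, loser in results)


losses = count_losses(RESULTS)
for (loser, winner), times in sorted(losses.items()):
    print(f"{loser} lost to {winner}: {times}")
```

A count like this answers one narrow question correctly, and that is about all it does, which matches the bland result the chart gave.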
Lesson 4: A little visualization makes the bland much more interesting
I threw that cube away, but that one chart got me thinking about how I might like to see the data. What would all 50 games look like at a glance? Here are the last twelve:
Let’s add just a little bit of visualization:
That’s all 50 Super Bowls at a glance. On top are winning teams, color-coded to team colors. On the bottom, losing teams, also color-coded by team colors. Suddenly, we can see so much, so quickly:
Green Bay won the first two Super Bowls:
The Steelers had a nice run in the late 70s:
The Broncos had a bad spell:
But they made up for it in later years:
And those two losses the Buffalo Bills had to Dallas:
That was two in a row at the tail end of a four-game losing streak for Buffalo – ouch.
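Streaks like these jump out of the chart, and they can also be pulled out of the data directly. The following is a rough sketch rather than part of the Cognos work: it groups consecutive games by losing team using `itertools.groupby`. The sample covers only Super Bowls XXI through XXVIII, the function name is hypothetical, and the approach assumes the list has no missing games.

```python
from itertools import groupby

# (game number, losing team) for Super Bowls XXI through XXVIII.
LOSERS = [
    (21, "Denver Broncos"),
    (22, "Denver Broncos"),
    (23, "Cincinnati Bengals"),
    (24, "Denver Broncos"),
    (25, "Buffalo Bills"),
    (26, "Buffalo Bills"),
    (27, "Buffalo Bills"),
    (28, "Buffalo Bills"),
]


def losing_streaks(losers, min_length=2):
    """Find runs of consecutive games lost by the same team.

    Returns (team, first_game, last_game, length) tuples. Assumes the
    input includes every game in the range, with no gaps.
    """
    ordered = sorted(losers)  # order by game number
    streaks = []
    for team, run in groupby(ordered, key=lambda row: row[1]):
        games = [number for number, _ in run]
        if len(games) >= min_length:
            streaks.append((team, games[0], games[-1], len(games)))
    return streaks


for team, first, last, length in losing_streaks(LOSERS):
    print(f"{team}: {length} straight losses (games {first} to {last})")
```

Denver’s loss in game 24 does not extend its earlier run because Cincinnati lost game 23 in between, so only the back-to-back losses count as a streak here.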
We’ve just opened the door to all the possibilities and I’ve only highlighted a few of the many stories this one chart has to tell. I’m a big step closer to understanding Super Bowl trends and I haven’t even scratched the surface of visualizations yet.
This is the first post in a Super Bowl analysis journey. In the coming days we will dig deeper with more expansive and meaningful visualizations. On Friday, we will use everything learned to make an educated prediction.
If you have questions, please feel free to leave a comment and I will do what I do best: provide answers.
Most importantly, look for Part 2 of my analysis on Super Bowl LI.