So, what’s an OLAP Cube, anyway?

As "An Excel User in a Cubed Kingdom" I’m starting my exploration of this new found land with a simple question: what is an OLAP cube? In plain English, please…

I like this simple, non-technical definition: a cube is a set of predefined answers. It’s up to you to select the right questions.

OK, let’s detail this.

Imagine that you have a very large database with the usual business data: orders, customers, sales representatives… Now you want to know how much a customer category ordered over the last year. You query the database and you get the answer. Then you want monthly sales. Query it again. Dig a little deeper to see what products that category ordered. Query it once more.

What is happening behind the curtain? Each time you enter a new query the system looks at each transaction (or a subset) and performs the necessary calculations to answer your query. You’ll get your answers, but it will be painfully slow: depending on your query and the database size, it may take hours. It is not an option.

But you don’t really need to see each individual order, do you? If you only need to know monthly sales, why should your system go through each transaction? If you pre-aggregate that data, you’ll get your answers much, much, faster, because there aren’t five million records, just 100.000. You’ll be able to actually work, instead of staring at your monitor, waiting for an answer.

This is what a cube does. It provides faster answers by eliminating the unnecessary detail for the task at hand. You shouldn’t look at a cube as an unique, condensed version of the database. While you have a virtually infinite number of questions that the database can answer, a cube focus on providing answers to a small set of questions. That’s why you can have different cubes (marketing, fin, sales), all of them getting data from the same source. They all answer different sets of questions.

When designing a cube (it’s your job, not IT’s), resist the temptation of a one-size-fits-all cube. Clearly define a coherent set of questions for your fundamental business needs and make sure they are answered once the cube becomes available.

If you are exploring your data you will not want to wait one hour each time you make a change. On the other end, a fast cube with no data to explore is useless. There is  a fine balance between maximum flexibility and maximum performance.

An OLAP cube not only ensures that you retrieve the right data from the database but also allows you to explore it efficiently. Two good reasons to add OLAP cubes to your toolbox.

Next time we’ll see how plain English can describe the structure of a basic OLAP cube.

An Excel User in a Cubed Kingdom

I’ve been using Excel for my entire professional career, most of the time in large corporations where adding a piece of software to the standard IT structure would be some kind of heresy. When I have a business need that can be solve by an out-of-the-box Excel installation that’s the path I follow (I’m also a power user of the company’s formal BI tool, so I know where to draw the line).

Over time, I’ve developed a framework that helps me to solve problems from a very specific point of view: how to minimize file size, how to minimize calculation time, how to deploy, how to update, and so on. This is the logic that you often must follow, and you tend to believe it is the best one. At that point you must start a conversation (a very fashionable word nowadays..) with someone that doesn’t share that logic.

Take OLAP cubes, for example. I never use them. I use the corporate BI tool or I create a 100% pure Excel application. But then Andreas told me about XLCubed and how you could deploy online your always-updated-file. That was a turning point because for me those are two killer features that I’ve been yearning for a long time.

I started to play with the tool, but still using the same logic. And was plain wrong. Sure I could use all my Excel background, but I needed to adjust it to a different logic. Using an OLAP cube you get a new set of functions that simplifies much of your work and you need to reevaluate some Excel functions because some of them will perform better under this new environment. You’ll leverage your Excel background to create a a new logic at a higher level.

I’m an experienced Excel user, probably like you, and this is just Excel on steroids. I’m leaving my comfort zone, one foot at a time, and I’ll document and share with you my learning curve. So come with me to the Cubed Kingdom and we’ll walk through this together.

First stop, next post: what is a cube, anyway?

Gemini – Smarter Excel Dashboards with End-User BI

I attended beginning this week the Microsoft BI Conference in Seattle where Microsoft presented an interesting new product that might change the way you build your Excel dashboards in the future: “Gemini”, an End-User Business Intelligence plug in for Excel.

End-User Business Intelligence has been around for quite a while with some of the most prominent Applix TM/1 , PALO, PowerOLAP and MIS Alea. These tools where tailored for non-IT proficient Excel users to allow them to build real Business Intelligence solutions with Excel, without leaving Excel or having to learn complicated BI or Data Warehousing techniques. All you have to learn is that a cube is like a very powerful multidimensional spreadsheet that can easily aggregate and hold large amounts of data. To pull out data from the cube you use a formula like this…

=LookupCube(“Sales”, “Units”, “Hats”, “Store 19”, “Oct-2007”)

…returns unit sales information consolidated by product line and region for the month shown. And this formula…

= LookupCube(“Sales”, “Sales USD”, “Hats”, “Store 19”, “Oct-2007 YTD”)

…returns year-to-date sales in US dollars for the same product line and region

Continue reading “Gemini – Smarter Excel Dashboards with End-User BI”

(de)Faults in Excel Charting

I recently spoke at SQLBitsIII, and an aspect which went down well was a simple overview on how to make the most important aspect of a graph, namely the underlying data, the prime focus and clear and easy to read. I also had the opportunity to attend Stephen Few’s Information Visualisation Workshops in London, which I’d thoroughly recommend. Stephen also spent some time, as part of a much more detailed overall agenda, on how a typical default chart can be morphed into an effective display.

So it’s back to basics this week, and how to improve the standard, out of the box Excel chart. Unfortunately, despite it’s pervasiveness, the default chart settings, which many users will never stray from, are in the case of Office 2007 not ideal, and in earlier versions, pretty awful. In this piece I’ll outline a few simple steps which can turn the default visual delights of the Excel graph into something you need not be embarrassed to put on the projector.

I’m using ‘classic’ Excel as my start point, because it’s still the incumbent in most organisations (and also because it’s worse). The example is for column charts, but the majority of the tips are valid for any chart type. As our start point we have the unit sales data for 3 products across 6 countries, as a default Excel Column chart, below.

BadChart

 

 

 

 

 

Nice. It’s wrong in a lot of ways, but how many hundred times have you seen this or a version of this? It’s well trodden ground if you have read Tufte, Few at al, but the key recommendations to improve things are surprisingly simple, and quick to implement.

1) Remove the Clutter and noise

The purpose of the chart is to display the data of interest clearly and concisely. It’s not to distract the user with pretty shading or 3D effects etc. Although the default chart is no-frills, there are a number of items which are adding nothing, or have undue prominence, and in doing so detract from the overall goal.

  • The Plot Area
    • The grey background to the plot area adds nothing, so we remove it
    • The border on the plot area – remove it also (numerous studies have shown we only need two axes to effectively group and visualise data)
  • Gridlines
    • The default gridlines are black, too visually intense. They are there for reference when required, not the prime focus, so are best muted – set them to a light grey.

2) Axes and Legend

The axes frame the chart, and are a key point of reference; however they should not draw the focus from the chart itself. As with the gridlines, they should be toned down.

  • Change the default black font colour to charcoal / dark grey
  • Change the default axis colour from black to charcoal / dark grey
  • Typically reduce the font size to 8

Rules for the legend are similar to those for the axis

  • Change font from black to grey
  • Remove border or change it’s colour to very light grey
  • Typically reduce font size to 8
  • For a clustered column, my preference is for the Legend positioned at the bottom, and reading across in the display order of the columns.

3) Columns and colour

The black column borders add nothing, and as such should be removed, they are another form of Tufte’s ‘non-data ink’.

On to colour, and unfortunately Excel’s default chart fills are heavily saturated plum and wine with a light cream..  So I’d strongly suggest changing the chart colour palette. For column charts, there is typically a reasonable block of colour for each series, so the colour scheme shouldn’t be too bold, or it becomes an eyesore. You should aim for mid-intensity colours of similar saturation (unless one is intended to stand out), pastels tend to work well.

 

All the steps above are simple and fairly fast to action, with one exception, the colour scheme. Unless you already have pre-prepared palettes it’s possible to spend an age trying to get the ideal combination – remember the 80/20 rule!

GoodChart2

 

 

 

 

 

 

In my example above, which hopefully you’ll agree is an improvement, I’ve used the colour palette from our upcoming ‘Chart Tamer’ product. Chart Tamer is a lot more than just a colour palette, but that aspect has benefited from minds with much more expertise in colour than mine, and I’ll go with their choice over mine every time!

Household Income Distribution 1967 – 2005 As Small Multiples Chart

In my last post I tied to fix an overloaded line chart Jorge presented in a recent post about loss aversion:

image
Jorge asked "does it make any sense to add those nine series to a single chart?
My attempt to fix the chart by using some color coding, has its shortcomings that caused quite some discussion.

image

So again, how can you give the users all the data they expect while keeping the chart clean and readable?

D Kelly O’Day pointed out "More data or better colors won’t help a poor chart type selection" and presented a dot plot

image

Lets try to select the right chart type. In Chart Rules, As Simple as Possible, But Not Any Simpler! I presented an easy to learn set of rules to determine the best chart type .


1. Determine the relationship you want to display

In our case a we have a Distribution Relationship, we want to show the Distribution of the Income Levels


2. Determine if you want to emphasize individual values or the overall pattern and

emphasize individual values or the overall pattern  and Determine the chart type

As we want to emphasize individual values a column chart works best.

image

This chart already gives us a good feel for the income distribution in 1967- Looks like a almost perfect bell distribution with a belly for the mid income levels. But how did things change from 1967 to 2005? Lets create a set of small multiples to show the situation in 1967, 2005 and the increase from 1967 to 2005.

image

Using Color to Group and Label in Charts

Jorge wrote in a recent post about loss aversion, the fact that “people strongly prefer avoiding losses than acquiring gains”.

Loss aversion […]: Translated to chart-making, it means that there is a “tendency to avoid losing data at any cost”. The chart below shows you the Money Income Of Households as published.

image

Take the above chart, for instance: does it make any sense to add those nine series to a single chart?

Remove irrelevant data series and you risk a mutiny on the Bounty, even if relevant trends are easier to detect. It is absurd, but very human.

So, how can you give the users all the data they expect while keeping the chart clean and readable?

Continue reading “Using Color to Group and Label in Charts”

Scott MacCloud Pesents Google Chrome

Scott MacCloud comic writer and expert in visual communication created a set of wonderful, information dense comic pages for Google’s new web browser Google Chrome. He presents on 38 pages the technical concepts behind Google Chrome so that even a layman can understand them.

image

Regarding Google Chrome I found an interesting article over at the Cooper Journal, where Tim wrote that Google Chrome exists, contrary to the believe of the computer press, for one reason: To provide a framework for web-based applications to look, feel, and act like desktop applications. I can’t wait to see the end of the area of crippled web interfaces.

I really can recommend you to add Scott MacCloud’s excellent book Understanding Comics to you data visualization book shelf. A very inspiring book that explains the inner workings of comics and visual communication in general, and most of the insights apply for the design of visual interfaces, usability and data visualization:

Continue reading “Scott MacCloud Pesents Google Chrome”

Smart Dashboard Ranking Tables

Chandoo and Robert over at the PHD blog have a nice a 4 post series of posts about Creating KPI Dashboards in Microsoft Excel.

I really recommend you to read Robert’s articles. Having scrolling in your sorted table is just a really smart addition to your Excel dashboard.

Yesterdays post was about Adding One Click Sort:

Continue reading “Smart Dashboard Ranking Tables”

Heatmap Tables with Excel

The following article shows how to create Heatmaps in Excel 2003, for Excel 2007 & 2010 please see the updated article “Heatmap Tables with Excel – Revisited

This Heatmap Table shows you the revenues and the discounts of a company over the course of one year per product group. The size of a bubble shows the revenue made in a particular month and the bubble color shows the discount rate given. The discount rate has been encoded as a range of green colors, ranging from a light green, for low discounts to a dark green for high discounts. The years and product totals are shown at the right and bottom as an integrated part of the table.

image

Tufte often talks about the integration of number, images, and word, and I think he’s quite right. A way archive this in Excel is to integrate charts into tables, so called graphical tables, a very effective means to archive More Information Per Pixel. I already wrote about graphical tables here, here, and here.

 

Continue reading “Heatmap Tables with Excel”

Interactive English League Bumps Chart

This bumps chart shows you the English Premier League 2007 – 2008. It allows you to highlight and compare two teams by clicking a team in in the table or a line in the bumps charts.

image

 

Interactively highlighting a data series in a large data set is very powerful. All lines in the data set are set to light Gray to show you the big picture and the patterns and general tends in the data set. Once you have identified an interesting data series you click it and we highlight it with a bright, saturated color. This has the effect of bringing it into the foreground and allows the user to see the details in the context of the other data series.

As a bumps chart has the lines equally spaced on the value axis you can put an Excel table next to the chart that serves as a legend on one side, and as a detailed ranking table on the other side.

 

Continue reading “Interactive English League Bumps Chart”