Saturation variable function for treemaps

For those of you not familiar with a treemap, it is simply a bunch of boxes somehow arranged inside a bigger box; each of them also contains its own set of smaller boxes, and so on. Each box has an associated weight that determines its size.

It is a powerful tool to visualize your information, yet it can soon become overcomplicated, not to say absolutely chaotic.

The first problem when you want to design a treemap is how to layout the boxes within the available region. Luckily this is an already studied issue, and there are lots of excellent recipes out there.

After you implement your favourite treemap algorithm, you will realise that the smaller boxes are too small (just look at the bottom right corner of the picture). It is nearly impossible to extract any useful information from them. So why bother at showing them at all? Let's just limit the maximum number of boxes to a fixed number and keep the rest outside, or inside a special box containing all those little tiny pieces you don't want to be out there, distracting the user.

This approach works remarkably well, depending on what you are trying to show. Since boxes' sizes are determined by their weight (importance), bigger boxes are the most important, and having a clean view of them is important, specially if you want to display some additional information inside them (not necessarily more boxes).

Note that for visualizations where fragmentation is an important property to analise (such as a filesystem), just removing the smaller boxes from sight won't work. But let's assume this is not the case and move on.

The next problem (and motivation for this post) is that even if you limit the amount of the boxes, there is still the possibility that an extremely big box will "eat" the rest. And all you'll see is a huge box taking all of the space, with no clue of where the rest of your boxes has gone.

First approach is not using a linear function to generate the weights of the boxes. By taking the square root of each weight before processing, the different between boxes will be much smoother, and the overall treemap will look better, even if there is no "big bad box".

However, if the big box is really big you will need a more powerful function, such as a logarithm, but now a normal treemap will look as if every box weighted the same. Dead end here.

Specifying the maximum possible size of a box is an interesting approach. By forcing that no weight would ever be greater than half of the total sum, you solve the big bad box problem. But soon you will meet the small badder box. Consider the case where all boxes are relatively big, except for one considerably smaller. And you want to show it, there is no excuse for sending it to the "I don't care" group of boxes.

Now taking the reverse approach seems a more reasonable solution. Limiting the minimum size of the boxes will allow you to always have some room assured to render at least a description of the box. And even if you have an XL box, forcing the rest to have a minimum size will also limit the size of the big one (remember that the available rendering area is limited!).

The naive approach of setting if size < MIN: size = MIN for each box may work, but the problem is that the "small" boxes will all look the same size, even if they shouldn't be.

So we finally arrive to the reason of this post. What we are really looking for is a function that will assure that every box has a minimum size, yet maintain a relation between their sizes and their original weights.

The square root worked fine at the beginning, so we'll begin from there. Note that a bigger index will allow more flattening of the boxes. Or a smaller exponent, which is the same (remember that sqrt(x) = x^0.5).

What we want is an exponent that, applied to every weight, will ensure that every box has at least a fixed percentage of the maximum (let's say 2%).

Consider that all boxes sizes are a1, a2, a3, a4... a10. And that a1 is the smallest one, being just 1% of the total. Mathematically, what we are looking for is some x such that:

a1^x > 0.02 * Sum(a^x)

But we want x to be as big as possible, while verifying the relation. Remember that a small x will make everything look the same size, and we want everything to look as different as possible while respecting minimum sizes.

Note that the minimum allowed percentage must be small enough to allow all of your boxes to fit. If you are planning to draw 10 boxes and want each of them to take at least half of the available room, then you are in trouble.

Since there is no exact way of solving the equation (maybe for the inequation there is, just setting a ridiculously small value of x), a good approach is using binary search. Though Newton-Ralphson could be faster, it is much more difficult to implement, and we are not looking for a very accurate value, we can stick to the simpler algorithm and keep our conscience clear.

As a side note: I included the link to Inequations in wikipedia because the Windows Live Writer spell checker doesn't recognise it. It is a word: if wikipedia says so, it is true, so take that, WLW!

This snippet of code does just what we want in just 15 lines (counting the empty ones that are just there for readability). It also ensures that if it is not necessary to apply any function, then a square root will be applied anyway to make it look smoother.

const double MINIMUM = 0.02;

double exp = 0.5, max = 0.5, min = 0;
IEnumerable<Double> original = new List<Double>(values);
IEnumerable<Double> result = new List<Double>(values);

do
{
    exp = (max + min) / 2;
    result = original.Select(x => Math.Pow(x, exp));

    if (result.Min() < MINIMUM * result.Sum()) max = exp;
    else min = exp;
}
while (max - min > 0.0001);

Though we are using this method for rendering a treemap, it can be used in several other contexts where you need to ensure a minimum value for each item in a collection while keeping a relation between the items.

Also note that just by keeping the value of x it is possible to reconstruct the original sequence of values, so the transformation is a bijection.

Now guess why I included the link to the definition of bijection in Wikipedia.

Saturation variable function for treemaps

Visualizing multi-dimensional data

Silverlight Loaded Events