Category Archives: Design

Brunel 0.8: Enhanced Color mapping

The two main new features of Brunel 0.8 are an enhanced UI for building (as described by Dan) and a thorough reworking of our code for mapping data to color. This post is going to talk about the latter, with a lot of examples!

Twelve Ways to Color Africa

The data set we are using is from http://opengeocode.org. We took a subset of the countries and data columns (CSV data) for this exercise.

These examples use some prototype code for geographic maps that we are going to introduce in a later version of Brunel (probably v1.0, slated for January), but the maps look so nice that we wanted to use them for this article. Please do not depend on the current functionality; consider this an “advance preview”, highly subject to change.

Because there are a lot of maps, these are not live versions, but static images — click on them to open up a Brunel editor window where you can see it live and make changes.

The Brunel language reference describes the improvements to the color command in detail. Here we just show examples!

Categorical Colors

The above two images are created by the following Brunel:

  • map('africa') x(name) color(language) label(iso) tooltip(#all) style('.label{opacity:0.5;text-shadow:none}')
  • map('africa') x(name) color(language:[white, nominal]) label(iso) tooltip(#all) style('.label{opacity:0.5;text-shadow:none}')

For all our examples, the only change is the color command, so from now on we’ll show just that.

If you use a simple color command, as in the first example, Brunel chooses a suitable palette. In this case “language” is a categorical field, so it chooses a nominal palette. This is a palette of 19 colors chosen to be visually distinct.

The second example specifies which colors we want in the output space. The first category in the “language” field is special, so we ask for a palette consisting of white, then all the usual colors from the nominal palette.
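
The default assignment can be pictured as a lookup from each category to the next palette entry. The sketch below is a minimal Python illustration under our own assumptions: categories are taken in order of first appearance, the palette wraps around when it runs out, and `NOMINAL` is a placeholder list, not Brunel’s real 19-color palette.

```python
# Minimal sketch of categorical color assignment. Assumptions (not Brunel's
# documented behavior): categories take palette entries in order of first
# appearance, and the palette repeats when there are more categories than colors.

def assign_categorical_colors(values, palette):
    """Return a dict mapping each distinct value to a color from `palette`."""
    mapping = {}
    for value in values:
        if value not in mapping:
            # The n-th new category gets palette entry n, wrapping around.
            mapping[value] = palette[len(mapping) % len(palette)]
    return mapping


# Hypothetical stand-in for a "nominal" palette (not Brunel's real colors).
NOMINAL = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728"]

languages = ["Arabic", "English", "French", "Arabic", "Portuguese"]
# In the spirit of color(language:[white, nominal]): white first, then the rest.
colors = assign_categorical_colors(languages, ["white"] + NOMINAL)
print(colors)
# {'Arabic': 'white', 'English': '#1f77b4', 'French': '#ff7f0e', 'Portuguese': '#2ca02c'}
```

Putting “white” at the front of the list is all it takes to give the first category special treatment while every later category keeps the usual palette.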

Because we know the data well, we can hand-craft a color mapping that better reflects the language patterns. I used color(language:[white, red, yellow, green, cyan, green, green, blue, blue, blue, blue, gray, gray, gray, gray, gray]), which uses red for language lists containing Arabic, green when they contain English, and blue when they contain French. I mixed the colors to show lists where the languages are mixed.

The geographical similarities in languages can be seen easily in the chart, but the colors are a bit bright, which leads to the next point.

For areas and “large” shapes, Brunel automatically creates muted versions of colors, so that colors like “red” and “green” are less visually dominant and distracting. This can be altered by adding a “=” to the list of colors, which means “leave the colors unmuted”, or a series of asterisks, which means “mute them more”. Here are a couple of examples, using the same basic palette as the previous one:

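One simple way to think about muting is blending each color part of the way toward a gray. This Python sketch is hypothetical: the gray target, the 0.25 step per level, and treating level 0 as “=” and higher levels as extra asterisks are our assumptions, not Brunel’s documented behavior.

```python
# Hypothetical sketch of color muting: blend each color toward a mid gray.
# The gray target, the 0.25 step per level, and the level numbering are all
# assumptions for illustration, not Brunel's actual rule.

def hex_to_rgb(code):
    code = code.lstrip("#")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb):
    return "#" + "".join(f"{round(c):02x}" for c in rgb)


def mute(color, level, gray=(128, 128, 128), step=0.25):
    """Level 0 leaves the color alone (like "="); each extra level
    (like an extra "*") moves it another `step` of the way toward gray."""
    t = min(1.0, max(0.0, level * step))
    return rgb_to_hex(tuple(c + (g - c) * t for c, g in zip(hex_to_rgb(color), gray)))


for level in range(3):
    print(level, mute("#ff0000", level))
```

Clamping the blend factor at 1.0 means that piling on more levels eventually yields plain gray rather than overshooting past it.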

If you have a small, fixed number of categories in your field, you can use palettes carefully designed to work well for that number. Rather than provide them in Brunel, we suggest going directly to a site that allows you to select them (Cynthia Brewer’s ColorBrewer is the standout recommendation), copying the array of color codes and pasting it directly into the Brunel code.

For the example on the right, we did exactly that, using en:[‘#beaed4’, ‘#7fc97f’]) as our colors (the quotes are optional in this list).

Color Ranges

For numeric data, we want to map the data values to a smoothly changing range of colors. So, instead of defining individual values, we define intermediate points on a smoothly changing scale of colors, using the same syntax pattern as for categorical data. We color by the latitude of the capital city, rather than a more informative variable, so that the color changes can be seen more clearly.

On the left we specified color as color(capital_lat), so we get Brunel’s default blue-red sequential scale. This uses a variety of hues, again taken from ColorBrewer, to provide points along a linear scale of color. On the right we use an explicit color mapping from ColorBrewer, color(capital_lat:['#8c510a', '#bf812d', '#dfc27d', '#f6e8c3', '#f5f5f5', '#c7eae5', '#80cdc1', '#35978f', '#01665e']), where we simply went to the site, found a scale we liked and used its Export > JavaScript option. Note that Brunel automatically adapts to the number of colors in the palette.
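
The way a list of stops is spread across a numeric range can be sketched as piecewise-linear interpolation. In this minimal Python version the stops are evenly spaced over the data range, so any number of colors works. Interpolating in RGB is our assumption (Brunel renders with D3 and may blend in another color space), and the latitude range is illustrative only.

```python
# Minimal sketch of a multi-stop sequential scale: spread the color stops
# evenly over the data range and interpolate linearly in RGB between
# neighbors. Linear RGB interpolation is an assumption; Brunel/D3 may
# interpolate in a different color space.

def hex_to_rgb(code):
    code = code.lstrip("#")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb):
    return "#" + "".join(f"{round(c):02x}" for c in rgb)


def make_sequential_scale(stops, lo, hi):
    """Return a function mapping a value in [lo, hi] to a hex color."""
    if len(stops) < 2:
        raise ValueError("need at least two color stops")
    rgbs = [hex_to_rgb(s) for s in stops]
    segments = len(rgbs) - 1  # works for any number of stops

    def scale(value):
        t = 0.0 if hi == lo else (value - lo) / (hi - lo)
        t = min(1.0, max(0.0, t))          # clamp out-of-range values
        pos = t * segments
        i = min(int(pos), segments - 1)    # index of the segment's left stop
        frac = pos - i
        a, b = rgbs[i], rgbs[i + 1]
        return rgb_to_hex(tuple(x + (y - x) * frac for x, y in zip(a, b)))

    return scale


# The nine ColorBrewer stops quoted above; the latitude range is illustrative.
BRBG = ["#8c510a", "#bf812d", "#dfc27d", "#f6e8c3", "#f5f5f5",
        "#c7eae5", "#80cdc1", "#35978f", "#01665e"]
scale = make_sequential_scale(BRBG, -30.0, 30.0)
print(scale(-30.0), scale(0.0), scale(30.0))  # #8c510a #f5f5f5 #01665e
```

Because nine stops divide the range into eight equal segments, the middle of the range lands exactly on the fifth, neutral stop; with a different number of stops, only `segments` changes.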

The two charts above show the difference between asking for color(capital_lat:reds) and color(capital_lat:red). When a plural is used, Brunel gives a palette that uses multiple hues, with the general tone of the requested color. With a singular color request, you get only shades of that exact hue. Generally we would recommend the former unless you have a specific reason to need the single-hue version.
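
To make the singular case concrete, here is a hypothetical Python sketch of a single-hue ramp: the hue stays fixed and only the lightness changes. The lightness range and saturation are arbitrary choices of ours, and this is not how Brunel generates its shades.

```python
import colorsys

# Hypothetical sketch of a single-hue ramp, in the spirit of color(x:red):
# hold the hue fixed and vary only lightness. The lightness range and
# saturation are assumptions; Brunel's actual shades may differ.

def single_hue_shades(hue_degrees, count, light=0.9, dark=0.3, saturation=0.8):
    """Return `count` hex colors of one hue, ordered from light to dark."""
    shades = []
    for k in range(count):
        frac = k / (count - 1) if count > 1 else 0.0
        lightness = light + (dark - light) * frac
        r, g, b = colorsys.hls_to_rgb(hue_degrees / 360.0, lightness, saturation)
        shades.append("#" + "".join(f"{round(c * 255):02x}" for c in (r, g, b)))
    return shades


print(single_hue_shades(0, 5))  # five shades of one red hue, light to dark
```

A multi-hue palette such as “reds” would instead drift slightly in hue as it darkens, which usually gives more distinguishable steps.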

We can specify multiple colors in the same way as for categorical data, using color(capital_lat:[purpleblues, reds]) on the left and color(capital_lat:[blue, red]) on the right. When exactly two colors are defined, we stitch them together, running through a neutral central color, to make a diverging color scale that highlights the low and high values of the field.
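
The two-color case can be sketched as two linear blends that meet at a neutral color. In this minimal Python illustration, white as the neutral color and the optional `mid` argument are our assumptions; `mid` shows the kind of input-space control the Summary describes as future work, such as pinning the neutral color to latitude 0 instead of the middle of the data range.

```python
# Minimal sketch of a two-color diverging scale: low color at the minimum,
# a neutral color at the midpoint, high color at the maximum, with linear RGB
# blending on each side. White as the neutral and the optional `mid`
# argument are assumptions for illustration.

def hex_to_rgb(code):
    code = code.lstrip("#")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb):
    return "#" + "".join(f"{round(c):02x}" for c in rgb)


def make_diverging_scale(low_color, high_color, lo, hi, mid=None, neutral="#ffffff"):
    """Map [lo, hi] to low_color -> neutral -> high_color.

    By default `mid` is the center of the data range; passing mid=0.0
    instead pins the neutral color to a meaningful value such as the equator.
    """
    if mid is None:
        mid = (lo + hi) / 2
    low, neu, high = hex_to_rgb(low_color), hex_to_rgb(neutral), hex_to_rgb(high_color)

    def blend(a, b, t):
        t = min(1.0, max(0.0, t))
        return rgb_to_hex(tuple(x + (y - x) * t for x, y in zip(a, b)))

    def scale(value):
        if value <= mid:
            t = 0.0 if mid == lo else (value - lo) / (mid - lo)
            return blend(low, neu, t)
        t = 1.0 if hi == mid else (value - mid) / (hi - mid)
        return blend(neu, high, t)

    return scale


# Illustrative latitude range only (not the article's data).
centered = make_diverging_scale("#0000ff", "#ff0000", -30.0, 40.0)
pinned = make_diverging_scale("#0000ff", "#ff0000", -30.0, 40.0, mid=0.0)
print(centered(5.0), pinned(0.0))  # #ffffff #ffffff
```

With an asymmetric range of -30 to 40, the default midpoint is 5, so the centered and pinned scales disagree about which latitude is drawn white.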

Summary

Mapping data to color is a tricky business, and in version 0.8 of Brunel our goal is twofold: first, to ensure that if you only specify a field, a suitable mapping is generated; and second, to allow the output space of colors to be customized for user needs. In future versions of Brunel we will add mapping for the input space so that, for example, we could tie the value mapped to white in the last example to the equator, rather than simply to the midpoint of the data range. Look for that in a few months!

Brunel: Open Source Visualization Language

BRUNEL is a high-level language that describes visualizations in terms of composable actions. It drives a visualization engine (D3) that performs the actual rendering and interactivity. It aims to be as simple as possible while describing a wide variety of charts, and to let those charts be used in Java, JavaScript, Python and R systems that deliver web-based interactive visualizations.


At the end of the article is a list of resources, but first, some examples. The dataset I am using for these comes from BoardGameGeek, which I processed to describe the top 2000 games listed as of Spring 2015. Each chart below is a fully interactive visualization running in its own frame. I’ve added the Brunel description for each chart as a caption below its image, so you can go to the Builder at any time and paste the command into the edit box to try out new things.

data('sample:BGG Top 2000 Games.csv') bubble color(rating) size(voters) sort(rating) label(title) tooltip(title, #all) legends(none) style('* {font-size: 7pt}') top(rating:100)

This shows the top 100 games, with a tooltip view for details on the games. They are packed together in a layout where the location has no strong meaning; the goal is to show as much data in as small a space as possible!
In the Builder, you can change the number in top(rating:100) to show the top 1000 or 2000, or to show the bottom 100. You could also add x(numplayers) to divide up the groups by recommended number of players.

data('sample:BGG Top 2000 Games.csv') line x(published) y(categories) color(categories) size(voters:200) opacity(#selection) sort(categories) top(published:1900) sum(voters) legends(none) | data('sample:BGG Top 2000 Games.csv') bar y(voters) stack polar color(playerage) label(playerage) sum(voters) legends(none) at(15, 60, 40, 90) interaction(select:mouseover)

This example shows some live interactive features; hover over the pie chart to update the main chart. The main chart shows the number of people voting for games in different categories over time, and the pie chart shows the recommended minimum age to enjoy a game. So when you hover over ‘6’, for example, you can see that there have been no good sci-fi games for younger players in the last 10 years. Use the mouse to pan and zoom the chart (drag to pan, double-click to zoom).

data('sample:BGG Top 2000 Games.csv') treemap x(designer, mechanics) color(rating) size(#count) label(published) tooltip(#all, title) mean(rating) min(published) list(title:50) legends(none)

Head to the Builder Site to modify this. You could try:

  • change the list of fields in x(…): reorder them, or use fields like ‘numplayers’ or ‘language’
  • remove the ‘legends(none)’ command to show a legend
  • change size to ‘voters’ — and add a ‘sum(voters)’ command to show the total number of voters rather than just counts for each treemap tile

Do you want to know more?

Follow the links below: the gallery and cookbook examples will take you to the Brunel Builder site, where you can create your own visualizations and grab some JavaScript code to embed them in your web pages … which is exactly how I built the examples above!

Comics and Visualization

Understanding Comics (book cover), by Scott McCloud

Although this book is over a decade old now (and Scott has written a number of later books that follow on from this one), it is still highly valuable reading, and famous artists have praised it as a fundamental resource for comic book writers. I read it from the perspective of a visualization expert and found a number of interesting points, especially in the earlier sections. He defines comics as “juxtaposed pictorial and other images in deliberate sequence, intended to convey information and/or to produce an aesthetic response in the viewer” (p. 9), which, to my mind, allows many visualizations to fit his definition! Small multiples presented in a “deliberate sequence”, such as a trellis display, fit this definition particularly well, so I was encouraged to read on. Some highlights of the book, from my point of view:

  • The use of simpler icons and symbols to make depictions of reality more universal. That argument resonates more strongly with me than Tufte’s data-ink concept: I am more convinced by the argument that additional detail is bad when it makes the high-level picture harder to understand, because it draws us too far into the physicality of the shapes being used.
  • McCloud presents a triangular space, the vertices of which are “reality”, “language” and “the picture plane” into which comic styles can be placed. I think there is also value in looking at various styles of visualization and seeing where they fit in. Treemaps, for example, have more “realistic” versions using cushions, while keeping the same structure. Scientific, geographic or fluid display visualizations are more realistic than, say, statistical graphics.
  • “Less is More” applied to the number of intermediate representations used: for visualizations of, say, a process evolving over time, we should not simply slice at even time intervals, but instead look for the important features we want to show, and show fewer frames.
  • Lots of good material on how time is perceived when displayed as a sequence.
  • “Can Emotions be Visible?” is the motivating question for chapter five. I would be very curious to see if we could apply his ideas to visualizations; maybe people like pie charts because they seem warm, serene and quiet, whereas a line chart with gridlines seems rational, conservative and dynamic?

As an aside, I included a comic in my book Visualizing Time, more as whimsy than anything else, but I’m glad that it gives me at least a tenuous link with Scott McCloud’s highly recommended book!

Every Now and Again, a Pie can be Good

It is hard to find anyone in visualization today with much time for pie charts. In fact it seems de rigueur to disdain them. And yet we see an awful lot of them. Now, I’m not going to claim that they are a good, general purpose chart, but I do always like to think of times when a chart will actually work well.

When Pie Charts Work At All

One well-known requirement for a pie to have a chance of working is that the data represent a fraction of a whole. That’s the big selling point of pie charts — each data row should represent a fraction of the overall data. So pies work best for percentages and fractions, and second-best for counts, populations, weights — things for which there is a natural feeling that summing them all up and saying “that represents 100%” is good.

On the side of evil are cases where the numbers must not be summed: if the data represent means (for groups of different sizes) or temperatures in degrees Fahrenheit, then a pie representation is flat-out wrong. It’s not a bad rule to say:

Only Use a Pie if it makes sense to think of the data values as summing to 100%
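
Part of this rule can be expressed in code: a pie only needs the fraction of the total that each value represents, and the corresponding angle. This minimal Python sketch uses made-up counts. It can reject negative values or a zero total, but it cannot tell whether summing the values is meaningful (counts yes, means or temperatures no), so that judgment stays with the chart author.

```python
# Minimal sketch: turn part-of-whole values into pie fractions and angles.
# The code only checks for non-negative values and a non-zero total;
# whether the values may sensibly be summed is not something it can detect.

def pie_slices(values):
    """Return (fraction, degrees) for each value."""
    if any(v < 0 for v in values):
        raise ValueError("pie slices need non-negative values")
    total = sum(values)
    if total == 0:
        raise ValueError("values sum to zero, so there is no whole to divide")
    return [(v / total, 360.0 * v / total) for v in values]


print(pie_slices([50, 30, 20]))  # [(0.5, 180.0), (0.3, 108.0), (0.2, 72.0)]
```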

The second rule I’d suggest is based on people’s inability to judge angles accurately. Pies do not work well for that, so if readers need to judge numbers accurately, do not use a pie. Pies work well for “A is about twice as big as B” or “C is definitely smaller in the second pie”. They are not good for “C is very slightly lower than D” or “B is just under 33%”. Stating it positively:

Use a Pie if the goal is to make broad comparisons, not detailed ones.

Finally, I’d offer a third suggestion, rather than a rule. It’s based on the observation that a bar chart (a natural competitor to a pie chart) is very often improved by ordering — high to low values, for example. Pies can often look radically different when categories are re-ordered, and although it is sometimes suggested that you do this ordering for pies, I think that a pie for categories that can be re-ordered would almost certainly look better in another form. Instead I would suggest the following:

Use a Pie when the categories have a natural order

When Pie Charts Work Well

Stephen Few (Save the Pies for Dessert: http://www.perceptualedge.com/articles/08-21-07.pdf) cites a study showing that the case where pies have been shown to be actively superior to bar charts is when it makes sense to compare sums of categories (e.g. the sum of the first two against the sum of the other two). The reason is that in a pie you can easily compare the angles spanned by multiple segments, whereas in a bar chart that is not easy.

Survey Data: Bar Chart and Pie Chart
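
One way to see why contiguous sums are easy to read from a pie: neighboring slices form a single wedge, so their combined angle is just the difference between two cumulative boundaries. This Python sketch uses made-up counts for four ordered survey answers; it is our illustration, not the method used in the study Few cites.

```python
from itertools import accumulate

# Minimal sketch: the slices for categories start..stop-1 form one arc, whose
# angle is the difference between two cumulative slice boundaries.

def slice_boundaries(values):
    """Cumulative boundary angles in degrees, starting at 0 and ending at 360."""
    total = sum(values)
    return [0.0] + [360.0 * c / total for c in accumulate(values)]


def combined_angle(values, start, stop):
    """Angle spanned by the contiguous slices values[start:stop]."""
    bounds = slice_boundaries(values)
    return bounds[stop] - bounds[start]


responses = [10, 25, 40, 25]  # made-up counts for four ordered survey answers
print(combined_angle(responses, 0, 2), combined_angle(responses, 2, 4))  # 126.0 234.0
```

This also shows why a natural category order matters: only categories that sit next to each other combine into a single wedge.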

Vega: A New Grammar-Based Specification for Visualizations

I’m a big fan of using languages for visualization rather than canned chart types. I’ve been working with the Grammar of Graphics approach for a number of years, within SPSS and now IBM, and about 95% of the visualizations in my book “Visualizing Time” are Grammar-based. It’s pretty safe to say it’s my preferred approach.

Protovis (the forerunner of D3, to a great extent) was built on a Grammar approach; Bostock and Heer’s 2009 article (on Heer’s site at http://hci.stanford.edu/jheer/files/2009-Protovis-InfoVis.pdf) gives a very good statement of the benefits of the Grammar-based approach as opposed to the “Chart Type” approach:

The main drawback of [the chart type] approach is that it requires a small, closed system. If the desired chart type is not supported, or the desired visual parameter is not exposed in the interface, no recourse is available to the user and either the visualization design must be compromised or another tool adopted. Given the high cost of switching tools, and the iterative nature of visualization design, frequent compromise is likely.

From the Vaults: How to Speak Visualization

In English, we use many different words to describe the same basic objects. In one survey, researchers Dieth and Orton explored which words were used for the place where a farmer might keep his cow, depending on where the speaker resided in England. The results include words like byre, shippon, mistall, cow-stable, cow-house, cow-shed, neat-house or beast-house. We see the same situation in visualization, where a two-dimensional chart with data displayed as a collection of points, using one variable for the horizontal axis and one for the vertical, is variously called a scatterplot, a scatter diagram, a scatter graph, a 2D dotplot or even a star field.

There have been a number of attempts to form taxonomies, or categorizations, of visualizations. Most software packages for creating graphics, such as Microsoft Excel, focus on the type of graphical element used to display the data and then sub-classify from that. This has one immediate problem, in that plots with multiple elements are hard to classify (should we classify a chart with bars and points as a bar chart with point additions, or instead as a point chart with bars added?). Other authors have started with the dimensionality of the data (one-dimensional, two-dimensional, etc.) and used that as a basic classification criterion, but that has similar problems.

Visualizations are too numerous, too diverse and too exciting to fit well into a taxonomy that divides and subdivides. In contrast to the evolution of animals and plants, which did occur essentially in a tree-like manner, with branches splitting and sub-splitting, information visualization techniques have been invented more by a compositional approach. We take a polar coordinate system, combine it with bars, and achieve a Rose diagram. We put a network in 3D. We add texture, shape and size mappings to all the above. We split it into panels. This is why a traditional taxonomy of information visualization is doomed to be unsatisfying. It is based on a false analogy with biology and denies the basic process by which visualizations have been created: composition.
