Tag Archives: vis

Maps Preview

A short update today; we have been working on intelligent mapping for Brunel 1.0 (due in January) and since it’s a subject many people are interested in, we thought we’d put up a “work in progress” video showing how things are progressing. It’s a rough video, so you get to see my inability to type accurately as well as some rough transitions. Showing the video at full resolution is recommended.

Usual disclaimers apply: this is planned for v1.0 in January, we expect it to work as described, but no guarantees — Enjoy!

Brunel 0.8: Enhanced Color mapping

The two main new features of Brunel 0.8 are an enhanced UI for building (as described by Dan) and a thorough re-working of our code for mapping data to color. This post is going to talk about the latter, with a lot of examples!

Twelve Ways to Color Africa

The data set we are using is from http://opengeocode.org. We took a subset of the countries and data columns (CSV data)  for this exercise.

These examples use some prototype code for geographic maps that we are going to introduce into a later version of Brunel (probably v1.0, slated for January), but maps look so nice that we wanted to use them for this article. Please do not depend on the current functionality; consider this an “advance preview” and highly subject to change.

Because there are a lot of maps, these are not live versions, but static images — click on them to open up a Brunel editor window where you can see it live and make changes.

The Brunel language reference describes the improvements to the color command in detail. Here we just show examples!

Categorical Colors

[Images: africa2, africa3]

The above two images are created by the following Brunel:

  • map('africa') x(name) color(language) label(iso) tooltip(#all) style('.label{opacity:0.5;text-shadow:none}')
  • map('africa') x(name) color(language:[white, nominal]) label(iso) tooltip(#all) style('.label{opacity:0.5;text-shadow:none}')

For all our examples, the only changes are the color statement, so from now on we’ll just refer to the color command.

If you use a simple color command, as in the first example, Brunel chooses a suitable palette. In this case “language” is a categorical field, so it chooses a nominal palette. This is a palette of 19 colors chosen to be visually distinct.

The second example specifies which colors we want in the output space. The first category in the “language” field is special, so we ask for a palette consisting of white, then all the usual colors from the nominal palette.
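One way to picture the second example: the output space is just an ordered list of colors, and each category takes the next color in turn. Here is a minimal Python sketch of that idea. It is not Brunel's implementation; the `assign_colors` helper, the short stand-in `NOMINAL` list, the order-of-first-appearance rule and the wrap-around behavior are all our assumptions for illustration.

```python
from itertools import cycle

# Hypothetical stand-in for a nominal palette; NOT Brunel's actual 19 colors.
NOMINAL = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728"]


def assign_colors(categories, palette):
    """Map each distinct category, in order of first appearance, to a color.

    Assumption: colors repeat if there are more categories than palette entries.
    """
    mapping = {}
    colors = cycle(palette)
    for category in categories:
        if category not in mapping:
            mapping[category] = next(colors)
    return mapping


# In the spirit of color(language:[white, nominal]):
# white for the first category, then the usual nominal colors.
palette = ["white"] + NOMINAL
print(assign_colors(["none", "ar", "en", "ar", "fr"], palette))
```

The point of the sketch is only that prepending a color shifts every later category along by one; the nominal colors themselves are untouched.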

[Image: africa4]

Because we know the data well, we can hand-craft a color mapping here that reflects the language patterns better. I used color(language:[white, red, yellow, green, cyan, green, green, blue, blue, blue, blue, gray, gray, gray, gray, gray]) to use red for lists containing Arabic, green when they contain English, and blue when they contain French. I mixed the colors to show lists where the languages are mixed.

The geographical similarities in languages can be seen pretty easily in the chart, but the colors are a bit bright. Which leads to the following …

For areas and “large” shapes, Brunel automatically creates muted versions of colors, so names like “red” and “green” are less visually dominant and distracting. This can be altered by adding a “=” to the list of colors, which means “leave the colors unmuted”, or a series of asterisks, which means “mute them more”. Here are a couple of examples, using the same basic palette as the previous one:

[Images: africa5, africa6]
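To give a feel for what “muting” means, here is a small Python sketch. How Brunel actually mutes colors is not described in this post, so the `mute` function, the blend-toward-white approach and the amounts used are purely our assumptions for illustration.

```python
def mute(rgb, amount=0.3):
    """Blend an (r, g, b) color toward white.

    amount = 0 leaves the color unchanged; amount = 1 gives pure white.
    This is an assumed stand-in, not Brunel's actual muting rule.
    """
    return tuple(round(c + (255 - c) * amount) for c in rgb)


red = (255, 0, 0)
print(mute(red, 0.0))  # unmuted, like asking to leave the colors alone
print(mute(red, 0.3))  # a default level of muting (assumed value)
print(mute(red, 0.6))  # muted more
```

Whatever the exact formula, the effect is the same: large filled areas keep their hue but stop shouting.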

[Image: africa7]

If you have a smaller fixed number of categories in your field, you can use palettes carefully designed to work well for that number. Rather than provide them in Brunel, our suggestion is to go directly to a site that allows you to select them (Cynthia Brewer’s site ColorBrewer is the standout recommendation), copy the array of color codes, and paste them directly into the Brunel code.

For the example on the right, we did exactly that, using en:['#beaed4', '#7fc97f']) as our colors (the quotes are optional in this list).

Color Ranges

For numeric data, we want to map the data values to a smoothly changing range of colors. So, instead of defining individual values, we define values which are intermediate points on a smoothly changing scale of colors. We do this using the same syntax pattern as for categorical data. We are using the latitude of the capital city to color by, rather than a more informative variable, so the color changes can be seen more clearly.

[Images: africa8, africa13]

On the left we specified color as color(capital_lat) so we get Brunel’s default blue-red sequential scale. This uses a variety of hues, again taken from ColorBrewer, to provide points along a linear scale of color. On the right we use an explicit color mapping from ColorBrewer, color(capital_lat:['#8c510a', '#bf812d', '#dfc27d', '#f6e8c3', '#f5f5f5', '#c7eae5', '#80cdc1', '#35978f', '#01665e']), where we simply went to the site, found a scale we liked and used the export>JavaScript method. Note that Brunel will adapt to the number of colors in the palette automatically.
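To make “points along a linear scale of color” concrete, here is a minimal Python sketch of interpolating between palette stops. It is not Brunel's code: the helper names are ours, and we assume evenly spaced stops, straight RGB interpolation and a made-up latitude range of -30 to 37. Brunel may well do this differently.

```python
def hex_to_rgb(code):
    """Convert '#rrggbb' to an (r, g, b) tuple of ints."""
    code = code.lstrip("#")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb):
    """Convert an (r, g, b) tuple of ints to '#rrggbb'."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def interpolate(stops, t):
    """Color at position t in [0, 1] along evenly spaced stops (needs >= 2 stops)."""
    t = min(max(t, 0.0), 1.0)
    scaled = t * (len(stops) - 1)
    i = min(int(scaled), len(stops) - 2)  # index of the stop just below t
    frac = scaled - i                     # how far we are toward the next stop
    a, b = hex_to_rgb(stops[i]), hex_to_rgb(stops[i + 1])
    return rgb_to_hex(tuple(round(x + (y - x) * frac) for x, y in zip(a, b)))


def color_for(value, lo, hi, stops):
    """Map a data value in [lo, hi] linearly onto the palette."""
    return interpolate(stops, (value - lo) / (hi - lo))


# The ColorBrewer palette used in the example on the right.
STOPS = ["#8c510a", "#bf812d", "#dfc27d", "#f6e8c3", "#f5f5f5",
         "#c7eae5", "#80cdc1", "#35978f", "#01665e"]

# Hypothetical latitude range; a capital midway through it gets the middle stop.
print(color_for(3.5, -30.0, 37.0, STOPS))
```

Because the position is computed from the number of stops, the same function works whether the palette has three colors or nine, which is the “adapts automatically” behavior described above.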

[Images: africa9, africa10]

The above two charts show the difference between asking for color(capital_lat:reds) and color(capital_lat:red). When a plural is used, it gives a palette that uses multiple hues, with the general tone of the color being requested. With a singular color request, you only get shades of that exact hue. Generally we would recommend the former unless you have some specific reason to need the single-hue version.

[Images: africa11, africa12]

We can specify multiple colors in the same way as we do for categorical data, using color(capital_lat:[purpleblues, reds]) on the left and color(capital_lat:[blue, red]) on the right. When we have exactly two colors defined, we stitch them together, running through a neutral central color, to make a diverging color scale that highlights the low and high values of the field.
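The stitching idea can be sketched in a few lines of Python. Again, this is an illustration under our own assumptions rather than Brunel's implementation: the `diverging` function, the near-white `NEUTRAL` midpoint and the plain RGB blending are all hypothetical.

```python
NEUTRAL = (245, 245, 245)  # assumed near-white central color


def lerp(a, b, t):
    """Linear blend between two (r, g, b) colors; t = 0 gives a, t = 1 gives b."""
    return tuple(round(x + (y - x) * t) for x, y in zip(a, b))


def diverging(low, high, t, neutral=NEUTRAL):
    """Color at t in [0, 1]: low -> neutral over the first half, neutral -> high over the second."""
    if t < 0.5:
        return lerp(low, neutral, t / 0.5)
    return lerp(neutral, high, (t - 0.5) / 0.5)


blue, red = (0, 0, 255), (255, 0, 0)
for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(t, diverging(blue, red, t))
```

Note that the neutral color here sits at the midpoint of the scale, which is exactly the limitation the summary below mentions: it marks the middle of the data range, not a meaningful value such as the equator.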

Summary

Mapping data to color is a tricky business, and in version 0.8 of Brunel our goal is twofold: first, to ensure that if you only specify a field, a suitable mapping is generated, and second, to allow the output space of colors to be customized for user needs. In future versions of Brunel we will add mapping for the input space, so, for example, we could tie the value mapped to white in the last example to be the equator, not simply midway through the data range. Look for that in a few months!

Villains of Doctor Who

I’ve always been a big Doctor Who fan; growing up with the BBC and seeing many incarnations of the Doctor striding across the TV screens, defeating his enemies armed with intelligence, loquaciousness, and a small (admittedly sonic) screwdriver. In particular I recall being terrified of the villains in “The Talons of Weng-Chiang”, which, nowadays, do not seem particularly scary. But the villains of Who have always been magical!

So I was excited to find a data set of Doctor Who villains through 2013 (courtesy of The Guardian) and used it for a short Brunel demo video. I left it as-is for the video, but it was clear the data set needed a bit of cleaning. The column names were more like descriptions than titles, which was annoying, but the biggest issue was the Motivation column, which was more like a description than a categorization. So I edited the data a little, changing the column titles and then providing a manual clustering of motivation into smaller categories. That gave three motivation columns: Motivation_Long, the original; and Motivation_Med and Motivation_Short, my groupings of those original categories. With these changes, I saved the resulting CSV file as DoctorWhoVillains.csv. You can check out an overview of the motivation columns in the Brunel Builder.

As usual with data analysis, it took way longer to do the data prep work than to use the results! I quite like this summary visualization, which is simply three shorter ones joined together with Brunel’s ‘|’ operator:

Doctor Who Villains through 2013

The bubble chart and word cloud show pretty much the same information — the cloud scales the size by the Log of the number of stories (otherwise the Daleks tend to exterminate any ability to see lesser villains) and is limited to the top 80-ish villains by appearance count. The bottom chart shows when villains first appeared and their motivation. The label in each cell is a representative villain from that cell, so the Sensorites are a representative dominating villain from the 1960-1965 era. The years have been binned into half decades. At a glance, it looks like extermination and domination are common themes early on, whereas self interest is more of a New Who (post-2000) thing. Serving Big Bad is evenly spread out over time.

 The Brunel script for this is quite long, as I wanted to place stuff carefully and add styling:

data('http://brunelvis.org/data/DoctorWhoVillains.csv') bubble color(Motivation_Short) size(Episodes) sort(First:ascending) label(Villain) tooltip(Villain, motivation_long, titles) at(0, 0, 60, 60) style('* {font-size: 7pt}')
| data('http://brunelvis.org/data/DoctorWhoVillains.csv') cloud x(Villain) color(motivation_short) size(LogStories) sort(first:ascending) top(episodes:80) at(40, 0, 100, 55) style(':nth-child(odd) { font-family:Impact;font-style:normal }') style(':nth-child(even) { font-family:Times; font-style:italic }') style('font-size:100px')
| data('http://brunelvis.org/data/DoctorWhoVillains.csv') x(motivation_short) y(first) color(episodes:gray) sort(episodes) label(villain) tooltip(titles) bin(first:10) sum(episodes) mode(villain) list(titles) legends(none) at(0, 60, 100, 100) style('label-location:box')

 Without the data and decoration statements, this is what it looks like, three charts concatenated together with ‘|’ to make a visualization system:

bubble color(Motivation_Short) size(Episodes) sort(First:ascending) label(Villain) tooltip(Villain, motivation_long, titles) 
| cloud x(Villain) color(motivation_short) size(LogStories) sort(first:ascending) top(episodes:80)
| x(motivation_short) y(first) color(episodes:gray) sort(episodes) label(villain) tooltip(titles) bin(first:10) sum(episodes) mode(villain) list(titles) legends(none)

 I was curious about when villains first appeared, so came up with this chart — stacking villains in their year of first appearance (click on it for the live version):

And here are a couple of additional samples I made along the way …

Blogging With Brunel

 

Here’s a short video showing how to create a blog entry using Brunel in a few minutes. The data comes from The Guardian and is completely unmodified, as you can see in the video from the fairly odd column names! I’m making a cleaner version of the data and hope to have some samples of that up in a  few days.

The video is high resolution (1920 x 1080) and about 60M. It’s probably best viewed expanded out to full screen.

A Quick Look at Lots of Songs …

Songs by year, rating and genre

iTunes information about my songs, showing year, genre and my ratings

A quick visualization of the songs in my iTunes database. I was curious to see if there were any sweet spots in my listening history. As always, showing correlation between three different variables is hard, and here I wanted one dot per song, so the density is quite high (clicking on the image to show it full size is recommended).

Perhaps the most interesting thing for me personally was that I thought I liked Alternative music more than I actually appear to. I notice especially that the 2010-2015 bin for mid-value rating is dominated by Alternative!

Appropriate Mappings

Vox Article on viral memes and charitable giving

First, a disclaimer. This is not a post about the actual issues this article raises; just about the presentation of those claims. The image from the article has appeared in numerous places and been referenced by a number of news sources, as well as appearing in my Facebook and Twitter feeds.

And it’s a bad image.

One minor issue is that it is hard to work out which circle relates to which disease, as the name of the disease only appears in the legend, so you are constantly moving your eyes from the grey dot on the left to the legend, then to the grey dot on the right. It is hard to make much sense of it. The fact that the legend doesn’t seem to have any order to it doesn’t help either. If this were 20 diseases instead of eight, the chart would be doomed!

Kudos for picking appropriate colors though. It helps that they used a natural mapping (pink <–> breast cancer; red <–> AIDS) that might help a bit.

The more worrying issue is that it makes a classic distortion mistake; look at the right side and rapidly answer the question, using just the images, not the text: “How many more deaths are there due to the purple disease than the blue disease?” 

Using the image as a guide, your answer is likely to be in the range 10 to 20 times as many, because the ratio of the areas is about that amount. When you look at the text, though, it’s actually only about four times. The numbers are not encoding the area, which is what we see, but they are encoding the radius (or diameter), which we do not immediately perceive.

The result is a sensationalist chart. It takes a real difference, but sensationalizes it by exaggerating the difference dramatically. If you want to use circles, map the variable of interest to AREA, not RADIUS. It fits our perceptions much more truthfully. (It’s not actually perfect; we tend to see small circles as larger than they really are; but it’s much, much better.)
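The arithmetic behind the distortion is easy to check. The short Python sketch below uses made-up values with a true 4:1 ratio (they are not the numbers from the chart), and the function names are ours.

```python
import math


def area_ratio_when_radius_mapped(a, b):
    """Ratio of circle AREAS seen by the reader when values a and b are mapped to radius."""
    return (a / b) ** 2


def radius_for_area(value, scale=1.0):
    """Radius of a circle whose area is scale * value, i.e. area proportional to the data."""
    return math.sqrt(scale * value / math.pi)


big, small = 4.0, 1.0  # hypothetical values with a true ratio of 4

# Mapping to radius: the reader sees a 16x difference in area.
print(area_ratio_when_radius_mapped(big, small))

# Mapping to area: the radii differ only by a factor of 2, and the areas by the true 4.
print(radius_for_area(big) / radius_for_area(small))
```

A true ratio of about four turning into a perceived ratio of about sixteen is right in the “10 to 20 times” range described above.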

So, here’s a reworking:

Where We Donate vs. Diseases That Kill

I tried to keep close to the original color mappings, as they are pretty good, but have used width to encode the variable of interest, keeping the height of the rectangle fixed. I also labeled the items on both sides so we can see much more easily that heart disease kills about 4x as many people as Chronic Obstructive Pulmonary Disease. 

I also added some links between the two disease rankings to help visually link the two and aid navigation. The result is, I believe, not only more truthful, but easier to use. In short, it works.

Comics and Visualization

Understanding Comics book cover; Scott McCloud

Although this book is over a decade old now (and Scott has a number of later books that follow on from this one), this is still a highly valuable book to read, getting great reviews from famous artists as a fundamental resource for comic book writers. I read this from the perspective of a visualization expert, and found a number of interesting points in the book, especially the earlier sections. He defines comics as “juxtaposed pictorial and other images in deliberate sequence, intended to convey information and/or to produce an aesthetic response in the viewer” (p.9), which, to my mind, allows many visualizations to fit his definition! The concept of small multiples, when presented in a “deliberate order” such as via a trellis display, fits particularly well into this definition, so I was encouraged to read on. Some highlights of the book, from my point of view:

  • The use of simpler icons / symbols to make depictions of reality more universal; that argument resonates more strongly with me than Tufte’s data-ink concept. I feel more convinced by the argument that additional detail is bad when it makes it harder for us to understand the high-level picture because it draws us too much into the physicality of the shapes being used.
  • McCloud presents a triangular space, the vertices of which are “reality”, “language” and “the picture plane” into which comic styles can be placed. I think there is also value in looking at various styles of visualization and seeing where they fit in. Treemaps, for example, have more “realistic” versions using cushions, while keeping the same structure. Scientific, geographic or fluid display visualizations are more realistic than, say, statistical graphics.
  • “Less is More” applied to the number of intermediate representations used. This argues that for visualizations of, say, a process evolving over time, we should not simply slice at even times, but instead look for important features we want to show, and show fewer frames.
  • Lots of good stuff on how time is perceived when displayed as a sequence.
  • “Can Emotions be Visible?” is the motivating question for chapter five. I would be very curious to see if we could apply his ideas to visualizations; maybe people like pie charts because they seem warm, serene and quiet, whereas a line chart with gridlines is rational, conservative and dynamic?

As an aside, I included a comic in my book on Visualizing Time, more as a whimsy than anything else, but I’m glad that I have at least a tenuous link with Scott McCloud’s highly recommended book!

Visualizing Tennis

I’m a member of the American Statistical Association’s “Statistics in Sport” section (http://www.amstat.org/sections/sis/) and I’m also British by birth, so Andy Murray’s success at Wimbledon this year was interesting to me for two reasons. I took a look at some of the data on Murray (collected by IBM’s SlamTracker initiative — http://2013.usopen.org/en_US/slamtracker/ ) with a view to doing a little visual analysis, so now I have another reason to be interested …

I found some data on his performance over a few years leading up to Wimbledon 2013 and wanted to look at trends. Now usually I prefer to create several linked visualizations and look at them together, but for this data I found that several of the stats I was interested in worked nicely when plotted in the same system. Here’s what I came up with:

Image

Vega: A New Grammar-Based Specification for Visualizations

I’m a big fan of using languages for visualization rather than canned chart types. I’ve been working with the Grammar of Graphics approach for a number of years within SPSS and now IBM, and my book “Visualizing Time” is composed 95% of Grammar-based visualizations. It’s pretty safe to say it’s my preferred approach.

Protovis (the forerunner of D3, to a great extent) was built on the Grammar approach; Bostock and Heer’s 2009 article (on Heer’s site at http://hci.stanford.edu/jheer/files/2009-Protovis-InfoVis.pdf) gives a very good statement of the benefits of the Grammar-based approach as opposed to the “Chart Type” approach:

The main drawback of [the chart type] approach is that it requires a small, closed system. If the desired chart type is not supported, or the desired visual parameter is not exposed in the interface, no recourse is available to the user and either the visualization design must be compromised or another tool adopted. Given the high cost of switching tools, and the iterative nature of visualization design, frequent compromise is likely.

Chord Display (Music)

iTunes Music with a RAVE Chord visualization

I took the data from my last post, aggregated up some fields and made a Chord Diagram for it, using RAVE. I was lazy and didn’t do a stellar job on rolling up years, so the year indicated is actually the center of a 4-year span — so 2007 is actually [2005.5, 2009.5] which is a little odd.

No big insights here: podcasts are all recent; alternative music is mostly recent too (Eels and Killers are artists with a large number of songs in my library). Interesting that I didn’t buy a lot of music from around 1999 …

I thought there were more packages that could do chord visualizations, but was only able to find some D3 examples.