Tag Archives: datavis

Maps Preview

A short update today; we have been working on intelligent mapping for Brunel 1.0 (due in January) and since it’s a subject many people are interested in, we thought we’d put up a “work in progress” video showing how things are progressing. It’s a rough video, so you get to see my inability to type accurately as well as some rough transitions. Showing the video at full resolution is recommended.

Usual disclaimers apply: this is planned for v1.0 in January, we expect it to work as described, but no guarantees — Enjoy!

Villains of Doctor Who

I’ve always been a big Doctor Who fan, growing up with the BBC and seeing many incarnations of the Doctor striding across the TV screens, defeating his enemies armed with intelligence, loquaciousness, and a small (admittedly sonic) screwdriver. In particular I recall being terrified of the villains in “The Talons of Weng-Chiang”, which, nowadays, do not seem particularly scary. But the villains of Who have always been magical!

So I was excited to find a data set of Doctor Who villains through 2013 (courtesy of The Guardian) and used it for a short Brunel demo video. I left it as-is for the video, but it was clear the data set needed a bit of cleaning. The column names were more like descriptions than titles, which was annoying, but the biggest issue was the Motivation column, which was more like a description than a categorization. So I edited the data a little, changing the column titles and then manually clustering the motivations into smaller sets of categories. That gave three motivation columns: Motivation_Long, the original, plus Motivation_Med and Motivation_Short, my groupings of those original categories. With these changes, I saved the resulting CSV file as DoctorWhoVillains.csv. You can check out an overview of the motivation columns in the Brunel Builder.
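The grouping step amounts to a lookup from each long description to a short category. Here is a minimal Python sketch of that idea. The real clustering was done by hand, so the keyword rules and sample rows below are invented for illustration; only the column names Motivation_Long and Motivation_Short and the category labels come from this post.

```python
# Hypothetical keyword rules: the actual grouping was manual, and these
# keywords and sample rows are invented purely for illustration.
KEYWORD_TO_SHORT = [
    ("exterminat", "Extermination"),
    ("rule", "Domination"),
    ("conquer", "Domination"),
    ("serv", "Serving Big Bad"),
]


def short_motivation(long_text, default="Self Interest"):
    """Return the first short category whose keyword occurs in long_text."""
    lowered = long_text.lower()
    for keyword, category in KEYWORD_TO_SHORT:
        if keyword in lowered:
            return category
    return default


def add_short_column(rows):
    """Return copies of the rows with a Motivation_Short field added."""
    return [
        dict(row, Motivation_Short=short_motivation(row["Motivation_Long"]))
        for row in rows
    ]


# Made-up rows standing in for the real CSV contents.
sample = [
    {"Villain": "Villain A", "Motivation_Long": "Exterminate all other life forms"},
    {"Villain": "Villain B", "Motivation_Long": "To rule the galaxy"},
    {"Villain": "Villain C", "Motivation_Long": "Stealing treasure"},
]
grouped = add_short_column(sample)
```

In practice a hand-built table of long-to-short pairs is likely more reliable than keyword matching, which is one reason the prep work takes so long.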

As usual with data analysis, it took way longer to do the data prep work than to use the results! I quite like this summary visualization, which is simply three shorter ones joined together with Brunel’s ‘|’ operator:

Doctor Who Villains through 2013

The bubble chart and word cloud show pretty much the same information. The cloud scales the size by the log of the number of stories (otherwise the Daleks tend to exterminate any ability to see lesser villains) and is limited to the top 80-ish villains by appearance count. The bottom chart shows when villains first appeared and their motivation. The label in each cell is a representative villain from that cell, so the Sensorites are a representative dominating villain from the 1960-1965 era. The years have been binned into half decades. At a glance, it looks like extermination and domination are common themes early on, whereas self-interest is more of a New Who (post-2000) thing. Serving Big Bad is evenly spread out over time.
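To make those three transformations concrete, here is a small Python sketch of log scaling, keeping the top rows, and binning years into half decades. It is not how Brunel does it internally. The base-10 logarithm, the function names, and the sample rows are all assumptions made for illustration.

```python
import math


def log_size(stories):
    """Compress large counts so one huge value does not swamp the rest.

    Base 10 is an assumption; the post only says "log".
    """
    return math.log10(stories)


def half_decade(year):
    """Start year of the five-year bin containing year."""
    return year - year % 5


def top_by(rows, field, n):
    """Keep the n rows with the largest value of field."""
    return sorted(rows, key=lambda row: row[field], reverse=True)[:n]


# Invented rows, not taken from the real data set.
villains = [
    {"Villain": "Villain A", "Stories": 100, "First": 1963},
    {"Villain": "Villain B", "Stories": 10, "First": 1967},
    {"Villain": "Villain C", "Stories": 1, "First": 2005},
]

for row in top_by(villains, "Stories", 2):
    print(row["Villain"], log_size(row["Stories"]), half_decade(row["First"]))
```

With these made-up numbers, a villain with 100 stories gets a size only twice that of one with 10 stories, which is why the smaller names stay readable.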

The Brunel script for this is quite long, as I wanted to place stuff carefully and add styling:

data('http://brunelvis.org/data/DoctorWhoVillains.csv') bubble color(Motivation_Short) size(Episodes) sort(First:ascending) label(Villain) tooltip(Villain, motivation_long, titles) at(0, 0, 60, 60) style('* {font-size: 7pt}')
| data('http://brunelvis.org/data/DoctorWhoVillains.csv') cloud x(Villain) color(motivation_short) size(LogStories) sort(first:ascending) top(episodes:80) at(40, 0, 100, 55) style(':nth-child(odd) { font-family:Impact; font-style:normal }') style(':nth-child(even) { font-family:Times; font-style:italic }') style('font-size:100px')
| data('http://brunelvis.org/data/DoctorWhoVillains.csv') x(motivation_short) y(first) color(episodes:gray) sort(episodes) label(villain) tooltip(titles) bin(first:10) sum(episodes) mode(villain) list(titles) legends(none) at(0, 60, 100, 100) style('label-location:box')

Without the data and decoration statements, it comes down to three charts concatenated with ‘|’ to make a visualization system:

bubble color(Motivation_Short) size(Episodes) sort(First:ascending) label(Villain) tooltip(Villain, motivation_long, titles) 
| cloud x(Villain) color(motivation_short) size(LogStories) sort(first:ascending) top(episodes:80)
| x(motivation_short) y(first) color(episodes:gray) sort(episodes) label(villain) tooltip(titles) bin(first:10) sum(episodes) mode(villain) list(titles) legends(none)

I was curious about when villains first appeared, so I came up with this chart, which stacks villains in their year of first appearance (click on it for the live version):

And here are a couple of additional samples I made along the way …

Blogging With Brunel

 

Here’s a short video showing how to create a blog entry using Brunel in a few minutes. The data comes from The Guardian and is completely unmodified, as you can see in the video from the fairly odd column names! I’m making a cleaner version of the data and hope to have some samples of that up in a few days.

The video is high resolution (1920 x 1080) and about 60 MB. It’s probably best viewed expanded out to full screen.

Brunel: Open Source Visualization Language

BRUNEL is a high-level language that describes visualizations in terms of composable actions. It drives a visualization engine (D3) that performs the actual rendering and interactivity. The aim is a language that is as simple as possible while still describing a wide variety of potential charts, and that can be used in Java, JavaScript, Python and R systems that want to deliver web-based interactive visualizations.


At the end of the article is a list of resources, but first, some examples. The dataset I am using for these is taken from BoardGameGeek, which I processed to create a data set describing the top 2000 games listed as of Spring 2015. Each chart below is a fully interactive visualization running in its own frame. I’ve added the Brunel description for each chart below each image as a caption, so you can go to the Builder anytime and copy the command into the edit box to try out new things.

data('sample:BGG Top 2000 Games.csv') bubble color(rating) size(voters) sort(rating) label(title) tooltip(title, #all) legends(none) style('* {font-size: 7pt}') top(rating:100)

This shows the top 100 games, with a tooltip view for details on the games. They are packed together in a layout where the location has no strong meaning: the goal is to show as much data in as small a space as possible!

In the Builder, you can change the number in top(rating:100) to show the top 1000, 2000 … or show the bottom 100. You could also add x(numplayers) to divide up the groups by recommended number of players.

data('sample:BGG Top 2000 Games.csv') line x(published) y(categories) color(categories) size(voters:200) opacity(#selection) sort(categories) top(published:1900) sum(voters) legends(none) | data('sample:BGG Top 2000 Games.csv') bar y(voters) stack polar color(playerage) label(playerage) sum(voters) legends(none) at(15, 60, 40, 90) interaction(select:mouseover)

This example shows some live interactive features; hover over the pie chart to update the main chart. The main chart shows the number of people voting for games in different categories over time, and the pie chart shows the recommended minimum age to enjoy a game. So when you hover over ‘6’, for example, you can see that there have been no good sci-fi games for younger players in the last 10 years. Use the mouse to pan and zoom the chart (drag to pan, double-click to zoom).

data('sample:BGG Top 2000 Games.csv') treemap x(designer, mechanics) color(rating) size(#count) label(published) tooltip(#all, title) mean(rating) min(published) list(title:50) legends(none)

Head to the Builder Site to modify this. You could try:

  • change the list of fields in x(…): reorder them or use fields like ‘numplayers’, ‘language’
  • remove the ‘legends(none)’ command to show a legend
  • change size to ‘voters’ — and add a ‘sum(voters)’ command to show the total number of voters rather than just counts for each treemap tile

Do you want to know more?

Follow the links below; gallery and cookbook examples will take you to the Brunel Builder Site where you can create your own visualizations and grab some JavaScript code to embed them in your web pages … which is exactly how I built the above examples!

Visualizing Tennis

I’m a member of the American Statistical Association’s “Statistics in Sport” section (http://www.amstat.org/sections/sis/) and I’m also British by birth, so Andy Murray’s success at Wimbledon this year was interesting to me for two reasons. I took a look at some of the data on Murray (collected by IBM’s SlamTracker initiative — http://2013.usopen.org/en_US/slamtracker/ ) with a view to doing a little visual analysis, so now I have another reason to be interested …

I found some data on his performance over a few years leading up to Wimbledon 2013 and wanted to look at trends. Now usually I prefer to create several linked visualizations and look at them together, but for this data I found that several of the stats I was interested in worked nicely when plotted in the same system. Here’s what I came up with:

Image

Every Now and Again, a Pie can be Good

It is hard to find anyone in visualization today with much time for pie charts. In fact it seems de rigueur to disdain them. And yet we see an awful lot of them. Now, I’m not going to claim that they are a good, general purpose chart, but I do always like to think of times when a chart will actually work well.

When Pie Charts Work At All

One well-known requirement for a pie to have a chance of working is that the data represent a fraction of a whole. That’s the big selling point of pie charts — each data row should represent a fraction of the overall data. So pies work best for percentages and fractions, and second-best for counts, populations, weights — things for which there is a natural feeling that summing them all up and saying “that represents 100%” is good.

On the side of evil is when the numbers must not be summed — if the data represent means (for different sized groups) or degrees Fahrenheit, then a pie representation is flat-out wrong. It’s not a bad rule to say:

Only Use a Pie if it makes sense to think of the data values as summing to 100%

The second rule I’d suggest is based on people’s inability to judge angles accurately. Pies do not work well for that, so if you need to judge numbers accurately, do not use a pie. Pies work well for “A is about twice as big as B” or “C is definitely smaller in the second pie”. They are not good for “C is very slightly lower than D” or “B is just under 33%”. Stating it positively:

Use a Pie if the goal is to make broad comparisons, not detailed ones.

Finally, I’d offer a third suggestion, rather than a rule. It’s based on the observation that a bar chart (a natural competitor to a pie chart) is very often improved by ordering — high to low values, for example. Pies can often look radically different when categories are re-ordered, and although it is sometimes suggested that you do this ordering for pies, I think that a pie for categories that can be re-ordered would almost certainly look better in another form. Instead I would suggest the following:

Use a Pie when the categories have a natural order

When Pie Charts Work Well

Stephen Few (Save the Pies for Dessert: http://www.perceptualedge.com/articles/08-21-07.pdf) quotes a study showing that pies can be actively superior to bar charts when it makes sense to compare sums of categories (e.g. the sum of the first two against the sum of the second two). The reason is that in a pie you can compare angles for multiple segments easily, whereas in a bar chart that is not easy.

Survey Data: Bar Chart and Pie Chart
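As a rough illustration of the "sums of categories" comparison, the Python sketch below converts counts to shares of the whole and then compares the first two categories against the last two. The survey labels and counts are invented for this example and are not the data behind the figure.

```python
def shares(counts):
    """Convert counts to percentage shares of the whole (they sum to 100)."""
    total = sum(counts.values())
    return {label: 100.0 * value / total for label, value in counts.items()}


# Invented survey responses with a natural order, as the third rule suggests.
survey = {
    "Strongly agree": 30,
    "Agree": 25,
    "Disagree": 20,
    "Strongly disagree": 25,
}

pct = shares(survey)

# In a pie these sums are adjacent wedges read as one angle; in a bar chart
# the reader has to stack two bars mentally.
agree = pct["Strongly agree"] + pct["Agree"]
disagree = pct["Disagree"] + pct["Strongly disagree"]
print(agree, disagree)
```

Here the combined "agree" wedge covers a little over half the circle, which is the kind of broad comparison the rules above say a pie handles well.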
