Villains of Doctor Who

I’ve always been a big Doctor Who fan; growing up with the BBC and seeing many incarnations of the Doctor striding across the TV screens, defeating his enemies armed with intelligence, loquaciousness, and a small (admittedly sonic) screwdriver. In particular I recall being terrified of the villains in “The Talons of Weng-Chiang“, which, nowadays, do not seem particularly scary. But the villains of Who have always been magical!

So I was excited to find a data set of Doctor Who villains through 2013 (courtesy of The Guardian) and used it for a short Brunel demo video. I left it as-is for the video, but it was clear the data set needed a bit of cleaning. The column names were more like descriptions than titles, which was annoying, but the biggest issue was the Motivation column, which was more like a description than  categorization. So I edited the data a little — changing the column titles and then providing a manual clustering of motivation into smaller categories, creating three motivation columns: Motivation_Long, the original; and Motivation_Med, Motivation_Short — my groupings of those original categories. With these changes, I saved the resulting CSV file as DoctorWhoVillains.csv. You can check out an overview of the motivation columns in the Brunel Builder.

As usual with data analysis, it took way longer to do the data prep work than to use the results! I quite like this summary visualization, which is simply three shorter ones joined together with Brunel’s ‘|’ operator:

Doctor Who Villains through 2103

Doctor Who Villains through 2103

The bubble chart and word cloud show pretty much the same information — the cloud scales the size by the Log of the number of stories (otherwise the Daleks tend to exterminate any ability to see lesser villains) and is limited to the top 80-ish villains by appearance count. The bottom chart shows when villains first appeared and their motivation. The label in each cell is a representative villain from that cell, so the Sensorites are a representative dominating villain from the 1960-1965 era. The years have been binned into half decades. At a glance, it looks like extermination and domination are common themes early on, whereas self interest is more of a New Who (post-2000) thing. Serving Big Bad is evenly spread out over time.

 The Brunel script for this is quite long, as I wanted to place stuff carefully and add styling:

data('http://brunelvis.org/data/DoctorWhoVillains.csv') bubble color(Motivation_Short) size(Episodes) sort(First:ascending) label(Villain) tooltip(Villain, motivation_long, titles) at(0, 0, 60, 60) style('* {font-size: 7pt}') | data('http://brunelvis.org/data/DoctorWhoVillains.csv') cloud x(Villain) color(motivation_short) size(LogStories) sort(first:ascending) top(episodes:80) at(40, 0, 100, 55) style(':nth-child(odd) { font-family:Impact;font-style:normal') style(':nth-child(even) { font-family:Times; font-style:italic') style('font-size:100px') | data('http://brunelvis.org/data/DoctorWhoVillains.csv') x(motivation_short) y(first) color(episodes:gray) sort(episodes) label(villain) tooltip(titles) bin(first:10) sum(episodes) mode(villain) list(titles) legends(none) at(0, 60, 100, 100) style('label-location:box')

 Without the data and decoration statements, this is what it looks like — three charts concatenated together with the ‘|’ to make an visualization system:

bubble color(Motivation_Short) size(Episodes) sort(First:ascending) label(Villain) tooltip(Villain, motivation_long, titles) 
| cloud x(Villain) color(motivation_short) size(LogStories) sort(first:ascending) top(episodes:80)
| x(motivation_short) y(first) color(episodes:gray) sort(episodes) label(villain) tooltip(titles) bin(first:10) sum(episodes) mode(villain) list(titles) legends(none)

 I was curious about when villains first appeared, so came up with this chart — stacking villains in their year of first appearance (click on it for the live version):

And here are a couple of additional samples I made along the way …