Author Archives: Dan Rope

Data APIs and Visualization: The Sum is Greater Than the Parts

U.S. Government data has always been available for free to the public, but it can be tricky to find data curated to the point that it is easily consumed by all of our fancy visualization tools.  The folks over at datausa.io have done some interesting work building a data API that works across a variety of common and useful government statistics.  I was curious to see what the potential might be when we take a simple web data API and feed its content to a simple visualization language…

The mechanics turned out to be easy, since the datausa.io API can return results formatted as CSV, which is exactly what Brunel Visualization consumes.  So a data query essentially boils down to a URL placed inside a Brunel data() statement.  Visualizations can even be created immediately by pasting these URLs into the “data” section of the Brunel Visualization online app.
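
For instance, a quick check in Python (a minimal sketch, assuming the pandas library and that the endpoint is still live) confirms the API returns plain CSV that any CSV reader can consume:

import pandas as pd

# The same query used in the example below; the endpoint returns plain CSV,
# so anything that reads CSV (pandas here, or Brunel's data() statement)
# can consume it straight from the URL.
url = 'http://api.datausa.io/api/csv?show=soc&sumlevel=3'
df = pd.read_csv(url)
print(df.head())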

So, on to some examples.  This first one uses workforce data from the ACS PUMS data provided by the US Census Bureau.  The top graph shows a heatmap of wages by hours worked per week (binned), colored by age for full-time employees.  The age value is the median of the average ages of the occupations in the bin.  Note: it would probably be better here to calculate a weighted average of age using the field containing the number of people within each occupation.  Click on a cell to see its occupations in the bubble chart below.  The size of each bubble represents the number of people in the occupation, and the color corresponds to the Gini coefficient.  Higher (darker) Gini values indicate greater wage inequality for the occupation.
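
For what it’s worth, here is a rough sketch of that weighted alternative in Python (pandas assumed; the field names are the ones used in the Brunel code below):

import pandas as pd

df = pd.read_csv('http://api.datausa.io/api/csv?show=soc&sumlevel=3')
# Weight each occupation's average age by its head count instead of taking
# a plain median of the per-occupation averages; in the chart this would be
# applied within each wage/hours bin rather than over the whole table.
weighted_age = (df['avg_age_ft'] * df['num_ppl_ft']).sum() / df['num_ppl_ft'].sum()
print(weighted_age)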

It’s interesting to poke around the outlying cells to find the occupations with the highest wages and shortest hours, or vice versa.  Also, the occupations in these outlying cells seem to have the most consistent wages.

The full code (including retrieving the data) for the above example is:

data('http://api.datausa.io/api/csv?show=soc&sumlevel=3') 
x(avg_wage_ft) y(avg_hrs_ft) color(avg_age_ft:red) median(avg_age_ft)  
    bin(avg_wage_ft, avg_hrs_ft)  interaction(select) | 
bubble x(soc_name) color(gini_ft:blue) size(num_ppl_ft) label(soc_name) 
    sum(num_ppl_ft) tooltip(#all) interaction(filter)

As expected, a lot of government data is summarized geographically.  This next example uses health metrics aggregated at the state level from the University of Wisconsin’s County Health Rankings.  The histograms show the distributions of four of these metrics across all 50 states.  Roll the mouse over a histogram bar to highlight (brush) the states that correspond to those values, or click on your state (if you reside in the US) to see where its values land for each metric.

Again, the full source code is:

data('http://api.datausa.io/api/csv/?show=geo&sumlevel=state&required=adult_obesity,health_care_costs,diabetes,excessive_drinking&year=latest') 
map key(geo_name) opacity(#selection) tooltip(geo_name,adult_obesity,health_care_costs,diabetes,
    excessive_drinking) at(0,0,100,50) interaction(select) |  
bar x(adult_obesity) axes(x) y(#count) bin(adult_obesity) opacity(#selection) stack 
    interaction(select:mouseover) at(0,50,50,75)  |  
bar x(health_care_costs) axes(x) y(#count) bin(health_care_costs) opacity(#selection) stack 
    interaction(select:mouseover) at(0,75,50,100)  |  
bar x(diabetes) axes(x) y(#count) bin(diabetes) opacity(#selection) stack  
    interaction(select:mouseover) at(50,50,100,75) | 
bar x(excessive_drinking) axes(x) y(#count) bin(excessive_drinking) opacity(#selection) stack 
    interaction(select:mouseover) at(50,75,100,100)

Having served in a government data agency in the past, I am well aware that a major concern with this type of flexibility is the potential for misuse when the fine print about what the data are and what they represent goes unread.  Nonetheless, powerful data APIs combined with flexible, rapid visualization design provide significant and interesting learning opportunities.

Snowzilla!

After shoveling the driveway several times and burning through the Netflix queue, one way to counteract cabin fever is to hunt down some snowfall data and play around with it.  So I found some data over at the National Weather Service that contains snowfall depth measurements collected from a variety of sources around the region at various points during the storm.

The map shows the maximum snowfall depth at any given location recorded from Friday until Sunday.  The deepest measurements are labeled. The area near West Virginia clearly bore the brunt of the storm, but there were some areas closer to DC that came close.  Everyone pretty much got a lot of snow.

[Figure: snowzilla_map]

Brunel Code:

map('usa') + x(Lat) y(Lon) max(Snowfall) color(Snowfall:blues) tooltip(City,Snowfall) style("stroke-width:0;opacity:.4;size:15px") + 
map('labels')  + x(Lat) y(Lon) max(Snowfall) top(Snowfall:10)  label(Snowfall, '"') text style("font-family:Impact;fill:darkblue") tooltip(City,Snowfall)
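
As a sanity check on the map’s max(Snowfall) aggregation, here is a minimal pandas sketch; the file name is hypothetical, but the column names match the Brunel code above:

import pandas as pd

# Hypothetical local copy of the NWS measurements used in these charts
df = pd.read_csv('snowfall.csv')  # columns: City, State, County, Lat, Lon, Time, Snowfall, Source
# Maximum depth recorded at each location over the whole storm
max_depth = df.groupby(['City', 'Lat', 'Lon'], as_index=False)['Snowfall'].max()
print(max_depth.nlargest(10, 'Snowfall'))  # the deepest measurements, as labeled on the map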

Apparently there was a bit of controversy regarding the exact snowfall measurement at Washington National Airport.  To look into this, I added a timeline graph linked to the map so I could see the snowfall amounts at different points in time.  The data are binned to roughly two hours, and the bins are colored by the number of measurements taken within each time range.  Clicking on a bin shows those measurements on the map, and I zoomed in to the airport.  I do not appear to have all the data behind the issue, but I can see measurements in nearby areas and who took them.  Perhaps the upshot is that the difference matters for historical and business reasons, but it probably won’t make your back feel any better.

[Figure: snowzilla_zoom_map]

Brunel Code:

map('usa') at(0,0,100,75) + 
x(Lat) y(Lon) size(Snowfall:200%) max(Snowfall) color(Source) label(Snowfall, City) 
    tooltip(Snowfall, City, Source) interaction(filter) at(0,0,100,75) + 
map('labels') at(0,0,100,75) | 
x(Time) bin(Time:20) color(#selection) opacity(#count) interaction(select) 
    tooltip(Time) at(0,85,100,100)
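
The timeline’s binning can be sketched the same way; bin(Time:20) asks Brunel for roughly 20 bins, which over a storm of about 36 hours works out to the two-hour ranges mentioned above (same hypothetical file as before):

import pandas as pd

df = pd.read_csv('snowfall.csv', parse_dates=['Time'])
# Count the measurements falling in each ~2-hour window, mirroring
# bin(Time:20) with opacity(#count) on the timeline.
counts = df.set_index('Time').resample('2H').size()
print(counts)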

Since the storm lasted nearly 36 hours, it can also be interesting to look at the depths over time.  The variable-sized paths below show that the snow generally started to pile up Friday night, and that Maryland and West Virginia seemed to reach their peak a little sooner than Virginia.  The boxes overlaid on the paths show the number of measurements taken at binned time intervals.  More measurements were taken in Virginia and Maryland, and most seem to have been taken toward the end of the major accumulation.

[Figure: snowzilla_time]

Brunel Code:

path x(Time) y(State) color(State) bin(Time:20)  size(Snowfall) max(Snowfall) legends(none) + x(Time) y(State) color(#count) bin(Time:30) style('height:20px')
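
To put a number on the peak-timing observation, a rough pandas sketch (again with the hypothetical file from above):

import pandas as pd

df = pd.read_csv('snowfall.csv', parse_dates=['Time'])
# For each state, find when the deepest measurement was reported, to compare
# how soon Maryland and West Virginia peaked relative to Virginia.
peaks = df.loc[df.groupby('State')['Snowfall'].idxmax(), ['State', 'Time', 'Snowfall']]
print(peaks.sort_values('Time'))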

Lastly, if you are familiar with the area, you’ll quickly recognize the county names.  Below is a cloud with county names sized by the max snowfall depths and colored by their state.  Counties with larger snowfall amounts appear more towards the center.  Most names are nearly the same size because everyone got a lot of snow!

[Figure: snowzilla_cloud]

Brunel Code:

cloud color(State) size(Snowfall:150%) label(County) max(Snowfall) sort(Snowfall)  style('.element {font-family:Impact;}')

Blogging with Brunel: Part 2

Graham’s earlier post demonstrated how you can use the online Brunel app (http://brunelvis.org/try) to try out Brunel and include live visualizations that show your data within blog posts or other online content.

Partly because it is somewhat self-serving (we will use this very blog for exactly that purpose), but mostly because we feel this is something Brunel can offer, we have updated the app with several new features and a new look.

The video demonstrates all of the new features, but the major ones are:

  • The Gallery and Cookbook are now integrated directly into the app
  • More editing features (titles, descriptions and resizing)
  • Uploaded or referenced data will show the individual data fields–selecting any one will show a quick visualization of that field
  • More deployment options.  Specifically, there are two new options that produce self-contained visualizations that do not require our service for deployment.

Feel free to give it a spin at http://brunelvis.org/try.  Thoughts?  Ideas?  Feature requests?  Issues?  Let us know on GitHub.

Using Brunel in IPython/Jupyter Notebooks

Analytics and visualization often go hand-in-hand.  One of the great things about notebooks such as IPython/Jupyter is that they provide a single interface to numerous data analysis technologies that often can be used together.  So, using Brunel within notebooks is a very natural fit.  For example, I can use a wide variety of python libraries to cleanse, shape and analyze data–and then use Brunel to visualize those results.

Additionally, coming up with a good visualization is a highly iterative process:  Try something, look at the results and refine until done.   So, again, the notebook metaphor of having live code execution near the results is extremely convenient.  Lastly, since notebook cells containing output can themselves be interactive, direct manipulation techniques such as brushing/linking, filtering and selection are also available.

To try this out, we have provided an integration of Brunel for Python 3 that runs in IPython/Jupyter notebooks.  Details on how to install and get started are on the PyPI site.  The video above gives a very small taste of the kinds of things that are possible.
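
For the flavor of it, here is a minimal notebook sketch of the workflow described above; the %brunel magic and its exact syntax are documented on the PyPI page, so treat this as illustrative rather than definitive:

import pandas as pd
import brunel  # the Brunel extension for Jupyter, installed per the PyPI instructions

# Shape the data with ordinary Python tooling first...
df = pd.read_csv('http://api.datausa.io/api/csv?show=soc&sumlevel=3')
df = df[df['num_ppl_ft'] > 0]

# ...then hand the DataFrame to Brunel in the same notebook.
# (Magic syntax per the package docs; shown here as a sketch.)
%brunel data('df') x(avg_wage_ft) y(avg_hrs_ft) color(avg_age_ft) tooltip(#all) :: width=800, height=400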