On Learning Data Visualization

Thursday, March 4, 2010

Font Standardization Across Platforms

This visualization uses the data collected in VisiBone's Font Survey.

Thursday, February 18, 2010

The Design of Everyday Things

Today I read Donald Norman's "The Design of Everyday Things," which investigates the principles of design that make a product intuitive or not to the user. Peppered with anecdotes, the book harps on common ways in which modern design baffles users. Norman presents seven guiding principles of good design:

1) Use both knowledge in the world and knowledge in the head
2) Simplify the structure of tasks
3) Make things visible: bridge the gulfs of Execution and Evaluation
4) Get the mappings right
5) Exploit the power of constraints, both natural and artificial
6) Design for error
7) When all else fails, standardize

Granted, some of these might be confusing to someone who has not read the book, but I suppose that is the whole point of reading it. Anyway, I chose to read this book in order to get a better theoretical understanding of how to design user interfaces, which is a key component of interactive data visualization. While the projects completed within this thesis might not be terribly interactive, it is a field I would like to pursue more in the future.

Tuesday, February 9, 2010

Hourlybinx Project

For the past week or so I've been pushing out data to my hourlybinx Twitter account. I record when I wake up, when I go to bed, when I eat meals, and when I drink caffeine or alcohol. In addition, I check in every hour to record my energy and happiness levels.

Once I had amassed enough data, I used the Python twitter module to connect to the Twitter API and pull all of it. I then parsed the tweets in Python, exported them as a CSV, and read that CSV into Processing to graph the information.
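The parse-and-export step might look roughly like the sketch below. The API fetch is left out and replaced by a few hard-coded sample tweets, and the tweet wording ("energy 6 happiness 7"), the timestamps, and the column names are all assumptions made up for illustration, not the format the account actually uses.

```python
import csv
import io

# Hypothetical sample data standing in for what the Twitter API returns:
# (timestamp, tweet text) pairs. The text format is an assumption.
SAMPLE_TWEETS = [
    ("2010-02-02 09:00", "energy 6 happiness 7"),
    ("2010-02-02 10:00", "energy 7 happiness 7"),
    ("2010-02-02 10:15", "caffeine"),
]

FIELDS = ["timestamp", "event", "energy", "happiness"]


def parse_tweet(timestamp, text):
    """Turn one tweet into a flat row; only hourly check-ins carry numbers."""
    words = text.split()
    row = {"timestamp": timestamp, "event": text, "energy": "", "happiness": ""}
    if len(words) == 4 and words[0] == "energy" and words[2] == "happiness":
        row["event"] = "checkin"
        row["energy"] = int(words[1])
        row["happiness"] = int(words[3])
    return row


def tweets_to_csv(tweets):
    """Return CSV text (header plus one line per tweet)."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for timestamp, text in tweets:
        writer.writerow(parse_tweet(timestamp, text))
    return out.getvalue()


csv_text = tweets_to_csv(SAMPLE_TWEETS)
```

Keeping one row per tweet, with empty cells for fields that do not apply, makes the file easy to read line by line on the Processing side.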

The final project can be viewed here. All of the aforementioned data is shown on a day-by-day basis, and there is user input to switch between days. The project uses a sample set of 200 tweets, but I am still running the Hourlybinx Project, and at some point I will update the site to reflect the larger data set.

I'm not sure how long I am going to continue running this project, but if I get enough data I could start doing weekday averages or something else. Hourlybinx data is public and freely available to anyone else wishing to use it.

Saturday, January 30, 2010

Monday, December 7, 2009

Graph Representation

The final example in Visualizing Data was a graph representation. To start off, we used a short text sample and graphed it in such a way that each word merited its own box, and a line was drawn between each pairing of words that appeared in sequence in the text. The display of the words on the page followed a physics simulation algorithm (specifically mimicking a spring), causing the words to try to arrange themselves at the lowest energy state between their connections. If the user clicked and dragged on a word, this would a) fix its position, b) turn its color red, and c) adjust the other words into a new arrangement of lowest energy.
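To make the idea concrete, here is a minimal Python sketch of the two pieces: building edges from neighboring words, and relaxing the layout. It assumes a spring-only model (Hooke's law on each edge, no repulsion between unconnected words), which is simpler than a full force-directed layout and is not taken from the book's code. The function names and parameters are my own.

```python
import math


def build_edges(text):
    """Connect each pair of words that appear next to each other in the text."""
    words = text.lower().split()
    edges = set()
    for a, b in zip(words, words[1:]):
        if a != b:
            # Undirected edge: store the pair in sorted order to avoid duplicates.
            edges.add((a, b) if a < b else (b, a))
    return sorted(edges)


def relax(positions, edges, fixed=(), rest_length=1.0, stiffness=0.1, steps=200):
    """Repeatedly nudge connected words toward rest_length apart.

    Words listed in `fixed` (for example, one the user dragged) never move.
    """
    pos = {word: list(p) for word, p in positions.items()}
    for _ in range(steps):
        for a, b in edges:
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            dist = math.hypot(dx, dy) or 1e-9
            # Positive when the edge is stretched, negative when compressed.
            pull = stiffness * (dist - rest_length) / dist
            if a not in fixed:
                pos[a][0] += dx * pull
                pos[a][1] += dy * pull
            if b not in fixed:
                pos[b][0] -= dx * pull
                pos[b][1] -= dy * pull
    return pos
```

For example, `relax({"a": (0, 0), "b": (3, 0)}, [("a", "b")], fixed={"a"})` leaves "a" in place and slides "b" toward a distance of `rest_length` from it.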

While that doesn't look terrible with such a small data set, it is easy to see how this would quickly get to be too much. In the next iteration, we made every object a small yellow circle, and only changed it into a word if selected. The example text for the next three is the first chapter of "Huckleberry Finn."

Same idea with this one, but the radius of each circle is now weighted by the word's frequency in the text. The colors are also less jarring.
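One way to do the weighting, sketched in Python: count each word and scale the radius by the square root of its relative frequency, so that the circle's area (rather than its radius) is proportional to the count. The square-root choice and the `max_radius` parameter are my assumptions, not necessarily what the book's example does.

```python
import math
from collections import Counter


def word_radii(text, max_radius=20.0):
    """Map each word to a circle radius; the most frequent word gets max_radius."""
    counts = Counter(text.lower().split())
    if not counts:
        return {}
    top = max(counts.values())
    # sqrt keeps circle area proportional to the word count.
    return {word: max_radius * math.sqrt(n / top) for word, n in counts.items()}
```

With `word_radii("a a a a b", max_radius=10.0)`, "a" gets radius 10 and "b", which is a quarter as frequent, gets radius 5 (a quarter of the area).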

The possibilities for this type of representation are pretty big. I am excited to start playing around with it. In closing, here's a picture of how another program rendered this same data set: (Graphviz, filetype .dot)
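For reference, a .dot file for an undirected word graph is just a list of `"a" -- "b";` lines inside a `graph` block. The helper below is a small Python sketch that writes one from a list of word pairs. The function name and the graph name `words` are placeholders of mine.

```python
def to_dot(edges, name="words"):
    """Return Graphviz DOT text for an undirected graph of word pairs."""

    def quote(word):
        # Escape embedded double quotes so the DOT string stays valid.
        return '"%s"' % word.replace('"', '\\"')

    lines = ["graph %s {" % name]
    for a, b in edges:
        lines.append("    %s -- %s;" % (quote(a), quote(b)))
    lines.append("}")
    return "\n".join(lines)
```

Saving the result to a file and running a Graphviz layout tool on it should produce a picture comparable to the one above.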

Thursday, December 3, 2009

Treemaps

For the latest chapter, we harnessed the power of Treemaps (history here) to generate a quick-and-dirty visual comparison of objects. The magnitude of some attribute determines the area given to that object in the representation, giving a visual patchwork that conveys the weight of each object relative to the others in its class.
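The core idea, area proportional to magnitude, fits in a few lines. The Python sketch below uses the simplest possible layout (side-by-side vertical strips, often called slice-and-dice), which is not necessarily the layout the chapter's Treemap code uses. The function name and tuple format are mine.

```python
def slice_layout(items, x, y, width, height):
    """Split a rectangle into vertical strips with areas proportional to weight.

    items is a list of (name, weight) pairs; returns (name, x, y, w, h) tuples.
    """
    total = sum(weight for _, weight in items)
    rects = []
    for name, weight in items:
        strip_width = width * weight / total
        rects.append((name, x, y, strip_width, height))
        x += strip_width  # the next strip starts where this one ends
    return rects
```

Calling `slice_layout([("a", 3), ("b", 1)], 0, 0, 100, 50)` gives "a" a strip 75 wide and "b" a strip 25 wide. Applying the same function again inside one strip, with width and height swapped, gives the nested look of a Treemap.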

To quickly illustrate the power of this method, the chapter starts off with the example of displaying word frequency in Mark Twain's "Following the Equator":

After laying the conceptual foundation, we turn the idea of Treemaps to a more complex and useful application. The final project asks the user to select a directory to start at, and then maps the files and folders contained in that directory to their relative data usage. Opening window:

I selected my "Pictures" folder, and the first screen appears as such:

As one might gather from the picture, the boxes represented are assigned a hue based on their location from the top-left to bottom-right corners. In this screenshot, focus is given to the "2009-11" folder, which brings the brightness of this box up, and simultaneously dims the other boxes in the field. Clicking on a box causes it to display a recursive Treemap of the folder contents inside of it.

This process is demonstrated in the screenshot below, which zooms in on the folder "Snapshots" and then highlights the folder "2009_05_24", which contains a Treemap of all of the files found within that folder. Emphasis is on the file IMG_0087.JPG.
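The data behind each level of the map is just the disk usage of every file and folder directly inside the selected directory. A rough Python sketch of that step follows. The function names are mine, and the small demo at the end builds a throwaway directory with made-up file names so the example runs anywhere.

```python
import os
import tempfile


def folder_size(path):
    """Total size in bytes of all regular files under path, recursively."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full):  # skips broken symlinks
                total += os.path.getsize(full)
    return total


def child_sizes(path):
    """Map each direct child of path (file or folder) to its size in bytes."""
    sizes = {}
    for entry in os.scandir(path):
        if entry.is_dir(follow_symlinks=False):
            sizes[entry.name] = folder_size(entry.path)
        elif entry.is_file(follow_symlinks=False):
            sizes[entry.name] = entry.stat().st_size
    return sizes


# Demo on a temporary directory (hypothetical names and sizes).
with tempfile.TemporaryDirectory() as demo:
    os.mkdir(os.path.join(demo, "2009-11"))
    with open(os.path.join(demo, "2009-11", "a.jpg"), "wb") as f:
        f.write(b"x" * 300)
    with open(os.path.join(demo, "notes.txt"), "wb") as f:
        f.write(b"x" * 100)
    demo_sizes = child_sizes(demo)
```

Clicking into a folder then amounts to calling `child_sizes` again on that folder's path and laying out the result inside its box.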

One final touch that I want to mention is that the value of each hue is adjusted based on the most recent modification date of each folder/file. The timescale on that, moreover, uses an algorithm that evaluates all objects displayed on the screen and computes a logarithmic approximation of the set. This is displayed in the following screenshot: (mouse emphasis on the folder "2008-12-15")
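I have not reproduced the book's exact algorithm, but the general idea of a logarithmic time scale can be sketched in Python as below. The assumptions here are mine: age is measured in days, newer items are drawn brighter, and the scale is stretched between the newest and oldest item currently on screen.

```python
import math


def brightness_by_age(ages_in_days, darkest=0.3, brightest=1.0):
    """Map each item's age to a brightness, using a log scale over the visible set."""
    # log(1 + age) so that an age of 0 days is well defined.
    logs = {name: math.log(1.0 + age) for name, age in ages_in_days.items()}
    lo, hi = min(logs.values()), max(logs.values())
    span = hi - lo
    result = {}
    for name, value in logs.items():
        t = 0.0 if span == 0 else (value - lo) / span  # 0 = newest, 1 = oldest
        result[name] = brightest - t * (brightest - darkest)
    return result
```

The point of the log scale is that the difference between "yesterday" and "last week" stays visible even when the oldest file on screen is several years old.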

Tuesday, December 1, 2009

Zip Code Project

A couple of days ago I finished reading the Processing Handbook, which gave a very solid overview of the Processing language. I now have a good feel for what the language is capable of doing, and thus a better understanding of what I can put to use in my own visualizations.

After the diversion into theory, I finished the Zip Code project in Visualizing Data. In this example, we create a faux-population density map by mapping the latitude and longitude of the center of each zip code in the US. The data pre-processing that went into this exact project was quite interesting. In addition to putting the data in a friendly text format (no commas, reformatting the city names from all-caps), we had to convert the latitude and longitude points to a projective view of the US, since this is the view that most people are used to seeing. (Map below and algorithm from here)
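As an illustration of what such a conversion involves, here is a Python sketch of the Albers equal-area conic projection, a common choice for maps of the US. Treat the specifics as assumptions: the standard parallels (29.5 and 45.5 degrees north) and the origin (23 degrees north, 96 degrees west) are the values commonly used for the lower 48 states, and I am not claiming they match the linked algorithm exactly.

```python
import math


def albers(lat_deg, lon_deg, lat1=29.5, lat2=45.5, lat0=23.0, lon0=-96.0):
    """Project latitude/longitude (degrees) to x, y on a unit sphere.

    lat1, lat2 are the standard parallels; (lat0, lon0) maps to (0, 0).
    """
    phi, lam = math.radians(lat_deg), math.radians(lon_deg)
    phi0, lam0 = math.radians(lat0), math.radians(lon0)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)

    n = (math.sin(phi1) + math.sin(phi2)) / 2.0
    c = math.cos(phi1) ** 2 + 2.0 * n * math.sin(phi1)
    rho0 = math.sqrt(c - 2.0 * n * math.sin(phi0)) / n
    rho = math.sqrt(c - 2.0 * n * math.sin(phi)) / n
    theta = n * (lam - lam0)

    x = rho * math.sin(theta)          # east is positive
    y = rho0 - rho * math.cos(theta)   # north is positive
    return x, y
```

The output is in sphere-radius units, so it still has to be scaled and flipped vertically to fit screen coordinates, where y grows downward.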

After formatting the data, we arrive at a scatterplot of zip codes. This example gets interesting, however, by letting the user type in digits, which in turn highlights the zip codes that start with those digits. Below is the map for "9":

If one types in all five digits of a zip code, the name of the town with that zip code is displayed on the screen:
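The matching logic behind both behaviors is a plain string-prefix test. A minimal Python sketch, using a tiny hand-picked lookup table in place of the full data file (the table and function names are mine):

```python
# A few sample entries standing in for the full zip code file.
ZIP_TOWNS = {
    "10001": "New York",
    "48104": "Ann Arbor",
    "90210": "Beverly Hills",
    "94110": "San Francisco",
}


def matching_zips(typed, zip_towns):
    """Zip codes that start with the digits typed so far."""
    return sorted(z for z in zip_towns if z.startswith(typed))


def town_for(typed, zip_towns):
    """Town name once all five digits are typed and match; otherwise None."""
    return zip_towns.get(typed) if len(typed) == 5 else None
```

Keeping zip codes as strings rather than integers matters here: it preserves leading zeros and makes the prefix test trivial.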

The final component of this code is a zoom function that can be activated by clicking "zoom" at the bottom right. This zooms the viewing window in on the zip codes that start with the digits typed. Here is "4":
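A zoom like this can be reduced to two steps: find the bounding box of the highlighted points, then pick the largest scale at which that box still fits the window. The Python sketch below shows that arithmetic only. It is my own simplification, and it ignores any animation or padding the real sketch may apply.

```python
def bounding_box(points):
    """Smallest axis-aligned box (min_x, min_y, max_x, max_y) around the points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def fit_scale(box, window_width, window_height):
    """Largest uniform scale at which the box fits inside the window."""
    min_x, min_y, max_x, max_y = box
    box_w = max(max_x - min_x, 1e-9)  # guard against a single point
    box_h = max(max_y - min_y, 1e-9)
    return min(window_width / box_w, window_height / box_h)
```

Using the smaller of the two ratios keeps the aspect ratio intact, so the map is not stretched when the matching zip codes form a tall or wide cluster.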