Monday, December 7, 2009

Graph Representation

The final example in Visualizing Data was a graph representation. To start off, we used a short text sample and graphed it so that each word got its own box, with a line drawn between each pair of words that appeared in sequence in the text. The layout of the words on the page followed a physics simulation (specifically, one mimicking a spring), causing the words to try to arrange themselves at the lowest-energy state allowed by their connections. If the user clicked and dragged a word, this would a) fix its position, b) turn it red, and c) shift the other words into a new lowest-energy arrangement.
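To make the physics step concrete, here is a minimal sketch of one relaxation pass, written in Python rather than Processing. It assumes each connection behaves like a spring with a preferred length; the names `relax`, `words_to_edges`, `rest_length`, and `stiffness` are made up for illustration, and the book's code may well differ (for instance by also pushing unconnected words apart).

```python
import math

def words_to_edges(text):
    """Pair up words that appear next to each other in the text."""
    words = text.lower().split()
    return sorted(set(zip(words, words[1:])))

def relax(positions, edges, fixed=frozenset(), rest_length=50.0, stiffness=0.1):
    """Move every non-fixed node one step toward its springs' rest length.

    positions maps a word to an (x, y) tuple; fixed holds the words the
    user has dragged, which stay where they are.
    """
    forces = {name: [0.0, 0.0] for name in positions}
    for a, b in edges:
        ax, ay = positions[a]
        bx, by = positions[b]
        dx, dy = bx - ax, by - ay
        dist = math.hypot(dx, dy) or 1e-9  # avoid dividing by zero
        # Hooke's law: pull together when stretched, push apart when squeezed.
        pull = stiffness * (dist - rest_length) / dist
        forces[a][0] += pull * dx
        forces[a][1] += pull * dy
        forces[b][0] -= pull * dx
        forces[b][1] -= pull * dy
    return {
        name: (x, y) if name in fixed else (x + forces[name][0], y + forces[name][1])
        for name, (x, y) in positions.items()
    }
```

Calling `relax` once per frame lets the layout settle gradually, and passing a dragged word in `fixed` keeps it pinned while its neighbors rearrange around it.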


While that doesn't look terrible with such a small data set, it is easy to see how this would quickly become too much. In the next iteration, we draw every object as a small yellow circle and only show it as a word when it is selected. The example text for the next three images is the first chapter of "Huckleberry Finn."


Same idea with this one, but the radius of each circle is now weighted by frequency in the text. Also, less jarring colors.
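One way the frequency weighting could work is sketched below in Python. It assumes the circle's area, not its radius, should track the word count, so the radius grows with the square root of the count; `word_radii` and `max_radius` are hypothetical names, and the scaling actually used for the screenshot may be different.

```python
import math
from collections import Counter

def word_radii(text, max_radius=20.0):
    """Give each word a radius so that circle area is proportional to its count."""
    counts = Counter(text.lower().split())
    if not counts:
        return {}
    top = max(counts.values())
    # Area grows with r squared, so take the square root of the relative count.
    return {word: max_radius * math.sqrt(count / top) for word, count in counts.items()}
```

Scaling the radius linearly instead would make frequent words look far more dominant than they are, since a circle with twice the radius covers four times the area.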


The possibilities for this type of representation are pretty big. I am excited to start playing around with it. In closing, here's a picture of how another program rendered this same data set: (Graphviz, filetype .dot)

Thursday, December 3, 2009

Treemaps

For the latest chapter, we harnessed the power of Treemaps (history here) to generate a quick-and-dirty visual comparison of objects. The magnitude of some attribute determines the area given to each object in the representation, producing a visual patchwork that conveys the weight of each object relative to the others in its class.
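As a rough illustration of the area rule, here is the simplest treemap layout, usually called slice-and-dice, sketched in Python. It is a stand-in chosen for brevity and not necessarily the layout the book's code uses; `slice_layout` is a made-up name.

```python
def slice_layout(weights, x, y, w, h):
    """Split the rectangle (x, y, w, h) into strips whose areas match the weights.

    weights maps a name to a positive number; the result maps each name
    to its own (x, y, width, height) rectangle.
    """
    total = sum(weights.values())
    rects = {}
    offset = 0.0
    horizontal = w >= h  # cut along the longer side
    for name, weight in weights.items():
        share = weight / total
        if horizontal:
            rects[name] = (x + offset, y, w * share, h)
            offset += w * share
        else:
            rects[name] = (x, y + offset, w, h * share)
            offset += h * share
    return rects
```

Applying the same function again inside one of the returned rectangles, with that item's children as the weights, gives the nested patchwork.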

To quickly illustrate the power of this method, the chapter starts off with the example of displaying word frequency in Mark Twain's "Following the Equator":


After laying the conceptual foundation, we apply Treemaps to a more complex and useful problem. The final project asks the user to select a starting directory, and then sizes the files and folders contained in that directory by their relative disk usage. Opening window:


I selected my "Pictures" folder, and the first screen appears as such:


As one might gather from the picture, the boxes represented are assigned a hue based on their location from the top-left to bottom-right corners. In this screenshot, focus is given to the "2009-11" folder, which brings the brightness of this box up, and simultaneously dims the other boxes in the field. Clicking on a box causes it to display a recursive Treemap of the folder contents inside of it.
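The numbers behind those boxes have to come from walking the directory tree. A minimal Python sketch of that walk follows, assuming a folder's weight is simply the sum of the file sizes beneath it; `folder_tree` is a hypothetical helper, not a function from the book.

```python
import os

def folder_tree(path):
    """Return (name, size_in_bytes, children) for a file or folder."""
    name = os.path.basename(path.rstrip(os.sep)) or path
    if os.path.isfile(path):
        return (name, os.path.getsize(path), [])
    try:
        entries = sorted(os.listdir(path))
    except OSError:
        entries = []  # unreadable folders count as empty
    children = []
    for entry in entries:
        full = os.path.join(path, entry)
        if os.path.islink(full):
            continue  # skip symlinks so the walk cannot loop
        children.append(folder_tree(full))
    # A folder weighs as much as everything inside it.
    return (name, sum(child[1] for child in children), children)
```

Each returned node carries its children, so drilling into a folder only needs the subtree that is already in memory.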

This process is demonstrated in the screenshot below, which zooms in on the folder "Snapshots" and then highlights the folder "2009_05_24", showing a Treemap of all the files found within that folder. Emphasis is on the file IMG_0087.JPG.


One final touch that I want to mention is that the value of each hue is adjusted based on the most recent modification date of each folder/file. The timescale for that adjustment, moreover, comes from an algorithm that evaluates all objects displayed on the screen and computes a logarithmic approximation of the set. This is displayed in the following screenshot: (mouse emphasis on the folder "2008-12-15")
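One plausible reading of that timescale, sketched in Python: take the age of every item on screen, put the ages on a logarithmic scale, and stretch that scale between the newest and oldest items. This is an interpretation rather than the book's code, and `age_brightness` and `darkest` are made-up names.

```python
import math

def age_brightness(mod_times, now, darkest=0.3):
    """Map modification times (in seconds) to a brightness in [darkest, 1.0].

    The newest item on screen gets 1.0 and the oldest gets `darkest`,
    with everything in between spaced logarithmically by age.
    """
    ages = {name: max(now - t, 1.0) for name, t in mod_times.items()}
    lo = math.log(min(ages.values()))
    hi = math.log(max(ages.values()))
    span = hi - lo
    result = {}
    for name, age in ages.items():
        frac = 0.0 if span == 0 else (math.log(age) - lo) / span
        result[name] = 1.0 - frac * (1.0 - darkest)
    return result
```

The log scale keeps a file changed yesterday visibly distinct from one changed last week, even when the oldest item on screen is years old.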

Tuesday, December 1, 2009

Zip Code Project

A couple of days ago I finished reading the Processing Handbook, which gave a very solid overview of the Processing language. I now have a good feel for what the language is capable of, and thus a better understanding of what I can put to use in my own visualizations.

After the diversion into theory, I finished the Zip Code project in Visualizing Data. In this example, we create a faux-population density map by mapping the latitude and longitude of the center of each zip code in the US. The data pre-processing that went into this project was quite interesting. In addition to putting the data in a friendly text format (no commas, city names converted from all-caps), we had to convert the latitude and longitude points to a projected view of the US, since this is the view that most people are used to seeing. (Map below and algorithm from here)
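The post does not name the projection, so the Python sketch below assumes an Albers equal-area conic projection, a common choice for maps of the contiguous US. The default origin and standard parallels are values often used for such maps and are assumptions here, as is the function name `albers`.

```python
import math

def albers(lat, lon, lat0=23.0, lon0=-96.0, lat1=29.5, lat2=45.5):
    """Project latitude/longitude in degrees to flat (x, y) on a unit sphere.

    lat0/lon0 is the map origin; lat1/lat2 are the two standard parallels.
    """
    phi, lam = math.radians(lat), math.radians(lon)
    phi0, lam0 = math.radians(lat0), math.radians(lon0)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    n = (math.sin(phi1) + math.sin(phi2)) / 2
    c = math.cos(phi1) ** 2 + 2 * n * math.sin(phi1)
    rho = math.sqrt(c - 2 * n * math.sin(phi)) / n
    rho0 = math.sqrt(c - 2 * n * math.sin(phi0)) / n
    theta = n * (lam - lam0)
    # x grows eastward and y grows northward from the origin.
    return rho * math.sin(theta), rho0 - rho * math.cos(theta)
```

The projected values still need to be scaled to the window, and y has to be flipped, because screen coordinates grow downward.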


After formatting the data, we arrive at a scatterplot of zip codes. The example gets interesting, however, by letting the user type in digits, which in turn highlight the zip codes that start with those digits. Below is the map for "9":


If one types in all five digits of a zip code, the name of the town with that zip code is displayed on the screen:


The final component of this code is a zoom function that can be activated by clicking "zoom" at the bottom right. This zooms the viewing window in on the zip codes that start with the digits typed. Here is "4":
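The highlight and zoom steps could be sketched roughly as follows in Python. This assumes the zip codes are kept as strings keyed to already-projected (x, y, town) tuples, and that zooming means fitting the window to the bounding box of the matches; `matching`, `zoom_bounds`, and `margin` are hypothetical names.

```python
def matching(places, typed):
    """Keep only the zip codes that start with the digits typed so far."""
    return {code: place for code, place in places.items() if code.startswith(typed)}

def zoom_bounds(points, margin=0.05):
    """Return (min_x, min_y, max_x, max_y) around the points, padded by a margin."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    pad_x = (max(xs) - min(xs)) * margin
    pad_y = (max(ys) - min(ys)) * margin
    return (min(xs) - pad_x, min(ys) - pad_y, max(xs) + pad_x, max(ys) + pad_y)
```

Storing the codes as strings rather than numbers keeps leading zeros intact and makes the prefix test a single `startswith` call. Once all five digits are typed, at most one entry is left, and its town name is the one to display.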