On Learning Data Visualization

Saturday, January 30, 2010

Monday, December 7, 2009

Graph Representation

The final example in Visualizing Data was a graph representation. To start off, we took a short text sample and graphed it so that each word got its own box, with a line drawn between each pair of words that appeared in sequence in the text. The placement of the words on the page followed a physics simulation (specifically, one mimicking a spring), so the words try to settle into the lowest-energy arrangement allowed by their connections. If the user clicks and drags a word, this a) fixes its position, b) turns it red, and c) lets the other words settle into a new lowest-energy arrangement.
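As a rough sketch of the spring idea, here is what one relaxation step could look like in plain Java (Processing sketches are Java underneath). This is not the book's code: the class name, the rest length, and the stiffness constant are all my own assumptions.

```java
// Hypothetical sketch of spring relaxation; names and constants are assumptions.
class SpringLayoutSketch {
    static final double REST_LENGTH = 50.0; // assumed natural length of every edge
    static final double STIFFNESS = 0.1;    // assumed fraction of the error corrected per step

    /**
     * Performs one relaxation step over all edges.
     * pos[i] = {x, y}; edges[k] = {from, to}; fixed[i] marks nodes pinned by dragging.
     */
    static void relax(double[][] pos, int[][] edges, boolean[] fixed) {
        for (int[] e : edges) {
            int a = e[0], b = e[1];
            double dx = pos[b][0] - pos[a][0];
            double dy = pos[b][1] - pos[a][1];
            double dist = Math.sqrt(dx * dx + dy * dy);
            if (dist == 0) continue; // coincident nodes have no defined direction
            // Positive when the edge is stretched, negative when it is compressed.
            double f = STIFFNESS * (dist - REST_LENGTH) / dist;
            double mx = dx * f / 2;
            double my = dy * f / 2;
            // Each free endpoint moves half of the correction toward (or away from) the other.
            if (!fixed[a]) { pos[a][0] += mx; pos[a][1] += my; }
            if (!fixed[b]) { pos[b][0] -= mx; pos[b][1] -= my; }
        }
    }
}
```

Calling relax() once per frame lets the layout drift toward rest, and a dragged word stays put because its fixed flag is set.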

While that doesn't look terrible with such a small data set, it is easy to see how this would quickly become too much. In the next iteration, we draw every object as a small yellow circle and only turn it into a word if it is selected. The example text for the next three images is the first chapter of "Huckleberry Finn."

Same idea with this one, but the radius of each circle is now weighted by frequency in the text. Also, less jarring colors.
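Weighting by frequency comes down to counting words and turning each count into a radius. A minimal sketch, with hypothetical names; the square root is my own choice so that circle area, rather than radius, tracks the count, and a linear scale would work just as well:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; not taken from the book.
class WordFrequencySketch {
    /** Counts case-insensitive words, keeping first-seen order. */
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String w : text.toLowerCase().split("[^a-z']+")) {
            if (w.isEmpty()) continue; // split() can yield an empty leading token
            counts.merge(w, 1, Integer::sum);
        }
        return counts;
    }

    /** Assumed scaling: area grows linearly with count, so radius grows with its square root. */
    static double radiusFor(int count, double baseRadius) {
        return baseRadius * Math.sqrt(count);
    }
}
```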

This type of representation opens up a lot of possibilities, and I am excited to start playing around with it. In closing, here's a picture of how another program, Graphviz, rendered this same data set from a .dot file:

Thursday, December 3, 2009

Treemaps

For the latest chapter, we harnessed the power of Treemaps (history here) to generate a quick-and-dirty visual comparison of objects. The magnitude of some attribute determines the area given to each object, producing a visual patchwork that conveys the weight of each object relative to the others in its class.
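To make "area proportional to magnitude" concrete, here is a sketch of the simplest treemap layout I know of, often called slice-and-dice: cut the rectangle into strips whose sizes match each weight's share of the total. The class and method names are hypothetical, and this is not necessarily the layout the book's treemap library uses.

```java
// Hypothetical slice-and-dice layout; a simplification, not the book's library.
class TreemapSketch {
    /** Returns rectangles {x, y, w, h}, one per weight, that tile the given area. */
    static double[][] sliceLayout(double[] weights, double x, double y, double w, double h) {
        double total = 0;
        for (double v : weights) total += v;
        double[][] rects = new double[weights.length][];
        boolean horizontal = w >= h; // cut across the longer side
        double offset = 0;
        for (int i = 0; i < weights.length; i++) {
            double share = total > 0 ? weights[i] / total : 0;
            if (horizontal) {
                rects[i] = new double[] {x + offset, y, w * share, h};
                offset += w * share;
            } else {
                rects[i] = new double[] {x, y + offset, w, h * share};
                offset += h * share;
            }
        }
        return rects;
    }
}
```

Applying the same function to the children of each rectangle gives the nested version.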

To quickly illustrate the power of this method, the chapter starts off with the example of displaying word frequency in Mark Twain's "Following the Equator":

After laying the conceptual foundation, we apply the idea of Treemaps to a more complex and useful application. The final project asks the user to select a starting directory, and then draws the files and folders in that directory sized by their relative disk usage. Opening window:
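The weight behind each box is just the number of bytes under a file or folder. A minimal sketch of how that total could be gathered with java.io.File; the class name is hypothetical, and it does not guard against symbolic-link loops:

```java
import java.io.File;

// Hypothetical helper for totalling disk usage; not the book's code.
class FolderSizeSketch {
    /** Returns the size of a file, or the summed size of everything under a folder. */
    static long sizeOf(File f) {
        if (f.isFile()) return f.length();
        long total = 0;
        File[] children = f.listFiles(); // null if f is not a directory or cannot be read
        if (children != null) {
            for (File c : children) total += sizeOf(c);
        }
        return total;
    }
}
```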

I selected my "Pictures" folder, and the first screen appears as such:

As one might gather from the picture, each box is assigned a hue based on its position between the top-left and bottom-right corners. In this screenshot, focus is on the "2009-11" folder, which brightens that box and simultaneously dims the other boxes in the field. Clicking on a box causes it to display a recursive Treemap of the folder's contents inside of it.

This process is demonstrated in the screenshot below, which zooms in on the folder "Snapshots" and then highlights the folder "2009_05_24", which shows a Treemap of all the files found within it. Emphasis is on the file IMG_0087.JPG.

One final touch worth mentioning is that the value of each hue is adjusted based on the most recent modification date of each folder or file. The timescale, moreover, comes from an algorithm that evaluates all objects displayed on the screen and computes a logarithmic approximation of the set. This is shown in the following screenshot, with mouse emphasis on the folder "2008-12-15":
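I don't know the exact formula the book uses, so the following is only a guess at the general shape: take the newest and oldest modification times among the items on screen and place each item between them on a logarithmic age scale. Every name here is hypothetical.

```java
// Hypothetical log-scaled brightness; an assumption about the approach, not the book's formula.
class AgeBrightnessSketch {
    /** Maps a modification time to 0..1, where 1 is the newest item on screen and 0 the oldest. */
    static double brightness(long modified, long oldest, long newest) {
        if (newest <= oldest) return 1.0; // everything shares one timestamp
        // Ages are measured back from the newest item; log1p keeps an age of zero at zero.
        double age = Math.log1p(newest - modified);
        double maxAge = Math.log1p(newest - oldest);
        return 1.0 - age / maxAge;
    }
}
```

With a log scale, differences among recently changed items are stretched out, while very old items are squeezed together at the dim end.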

Tuesday, December 1, 2009

Zip Code Project

A couple of days ago I finished reading the Processing Handbook, which gave a very solid overview of the Processing language. I now have a good feel for what the language is capable of, and thus a better understanding of what I can put to use in my own visualizations.

After the diversion into theory, I finished the Zip Code project in Visualizing Data. In this example, we create a faux population density map by plotting the latitude and longitude of the center of each zip code in the US. The data pre-processing that went into this project was quite interesting. In addition to putting the data in a friendly text format (no commas, city names converted from all-caps), we had to convert the latitude and longitude points to a projected view of the US, since this is the view that most people are used to seeing. (Map below and algorithm from here)
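For anyone curious what such a conversion involves, here is a sketch of the Albers equal-area conic projection, which is a common choice for maps of the lower 48 states. I am assuming that projection here, along with the usual standard parallels (29.5 and 45.5 degrees north) and an origin at 23 degrees north, 96 degrees west. The class name is hypothetical.

```java
// Hypothetical Albers equal-area conic sketch; projection choice and constants are assumptions.
class AlbersSketch {
    static final double PHI1 = Math.toRadians(29.5);    // first standard parallel
    static final double PHI2 = Math.toRadians(45.5);    // second standard parallel
    static final double PHI0 = Math.toRadians(23.0);    // latitude of origin
    static final double LAMBDA0 = Math.toRadians(-96.0); // central meridian

    /** Returns {x, y} on a unit sphere; x grows eastward and y grows northward. */
    static double[] project(double latDeg, double lonDeg) {
        double phi = Math.toRadians(latDeg);
        double lambda = Math.toRadians(lonDeg);
        double n = (Math.sin(PHI1) + Math.sin(PHI2)) / 2;
        double c = Math.cos(PHI1) * Math.cos(PHI1) + 2 * n * Math.sin(PHI1);
        double rho0 = Math.sqrt(c - 2 * n * Math.sin(PHI0)) / n;
        double theta = n * (lambda - LAMBDA0);
        double rho = Math.sqrt(c - 2 * n * Math.sin(phi)) / n;
        return new double[] {rho * Math.sin(theta), rho0 - rho * Math.cos(theta)};
    }
}
```

Since screen coordinates grow downward, y has to be flipped when the result is scaled to pixels.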

After formatting the data, we arrive at a scatterplot of zip codes. This example gets interesting, however, by letting the user type in digits, which in turn highlight the zip codes that start with those digits. Below is the map for "9":
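The highlighting is a prefix match on whatever has been typed so far. A small sketch with hypothetical names; the codes are kept as strings so that leading zeros survive:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical prefix filter; not the book's code.
class ZipPrefixSketch {
    /** Returns the zip codes that begin with the digits typed so far. */
    static List<String> matching(List<String> zips, String typed) {
        List<String> hits = new ArrayList<>();
        for (String z : zips) {
            if (z.startsWith(typed)) hits.add(z);
        }
        return hits;
    }
}
```

An empty input matches every code, which corresponds to the starting state where nothing is singled out.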

If one types in all five digits of a zip code, the name of the town with that zip code is displayed on the screen:

The final component of this code is a zoom function that can be activated by clicking on "zoom" at the bottom right. This zooms the viewing window in on the zip codes that start with the digits typed. Here is "4":

Wednesday, November 18, 2009

Processing Interim

I decided that since I have now created several examples of fully-fledged programs in Processing, I should go and learn about the full capabilities of the language. To do so, I am taking a break to read through "Processing: A Programming Handbook," the official guide to Processing, written by its creators.

I'm learning a lot about the full capabilities of the language and, perhaps equally important, getting a solid review of key concepts in computer programming.

Thursday, November 12, 2009

Real World Data

The next chapter of "Visualizing Data" addresses real-life data, and how it can often be messy and difficult to parse. As such, most of the information in this chapter deals with background processes. Topics included sifting through website source code to find the files that actually contained the desired data, using regular expressions to parse data files, and creating strings to store the relevant information for future use.
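As a small taste of the regular-expression part, here is a sketch that pulls a name and two numbers out of one line of text. The line shape (a team name followed by wins and losses) and the sample values in the comment are made up for illustration; the real source files looked different.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for an assumed line format; not the book's code or data.
class StandingsParseSketch {
    // Assumed line shape: team name, then wins, then losses, e.g. "Red Sox  12  8".
    private static final Pattern LINE =
        Pattern.compile("^\\s*(.+?)\\s+(\\d+)\\s+(\\d+)\\s*$");

    /** Returns {name, wins, losses} as strings, or null when the line does not match. */
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) return null;
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }
}
```

Returning null for lines that don't match makes it easy to skip headers and stray markup.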

The project for this chapter dealt with correlating baseball teams' win-loss records with their salaries. After going through all of the data, we reached this basic sketch:

Not a whole lot of work went into refining the image, but the following version does use color to differentiate the sign of the correlation, line width for its magnitude, and a couple of spacing and typography improvements:

Tuesday, November 3, 2009

Time Series Graphs

Today I worked through Chapter 4 of Visualizing Data, titled "Time Series." In this chapter, we got into the actual mechanics of creating a solid graph, one that contains all of the components necessary to illustrate the data clearly. Half of the chapter covered nitty-gritty details such as axis labels, tick marks on the axes, and small lines on the graph that give the viewer a sense of scale.
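Most of those details reduce to converting a data value into a pixel position. Processing has a built-in map() function for this; below is a plain-Java version of the same linear mapping, plus a hypothetical helper that places tick marks at a fixed data interval. The class and helper names are my own.

```java
// Hypothetical axis helpers; map() mirrors the linear rescaling that Processing's map() performs.
class AxisSketch {
    /** Linearly rescales v from the range [inMin, inMax] to the range [outMin, outMax]. */
    static double map(double v, double inMin, double inMax, double outMin, double outMax) {
        return outMin + (outMax - outMin) * (v - inMin) / (inMax - inMin);
    }

    /** Pixel y positions for ticks every `interval` data units, starting at dataMin. */
    static double[] tickPositions(double dataMin, double dataMax, double interval,
                                  double plotBottom, double plotTop) {
        int count = (int) Math.floor((dataMax - dataMin) / interval) + 1;
        double[] ys = new double[count];
        for (int i = 0; i < count; i++) {
            ys[i] = map(dataMin + i * interval, dataMin, dataMax, plotBottom, plotTop);
        }
        return ys;
    }
}
```

Passing the bottom pixel as the output minimum and the top pixel as the maximum takes care of the fact that screen y grows downward.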

In addition, the chapter covered various methods of actually presenting the data in question: a series of points, a series of line segments, a smoothed line, a combination of points and lines, a solid-color area, or a bar graph.

We even added a mouseover function that displays the value of a data point when the cursor is over it.
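One way to do a mouseover like this is a simple hit test: find the plotted point closest to the cursor, and accept it only if it lies within a few pixels. A sketch with hypothetical names, assuming the points' pixel positions are already known:

```java
// Hypothetical hit test; an assumption about the approach, not the book's code.
class HoverSketch {
    /** Returns the index of the nearest point within `radius` pixels of the mouse, or -1 if none. */
    static int pointUnder(double mx, double my, double[] xs, double[] ys, double radius) {
        int best = -1;
        double bestDist = radius; // anything farther than this is ignored
        for (int i = 0; i < xs.length; i++) {
            double d = Math.hypot(xs[i] - mx, ys[i] - my);
            if (d <= bestDist) {
                best = i;
                bestDist = d;
            }
        }
        return best;
    }
}
```

In a sketch's draw loop, the returned index would pick which value to print next to the cursor.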

The original graphs drew from a table of three different subjects: Coffee, Tea, and Milk. Originally the user could switch between the graphs with the "[" and "]" keys, but in the final exercise we added tabs up top that respond to the user's mouse clicks.