Wednesday, October 14, 2009

Today I Met Processing

Today I started on the O'Reilly book, "Visualizing Data," written by Ben Fry. I breezed through the first couple introductory chapters, and then started to get my feet wet in Processing. Fry's approach is to provide the reader data, files, and source code to get an example project up and running, and then to teach new aspects of programming via adding to this example. Processing is neat! Its creators were attempting to create a visual programming language that follows the form of a scripting language. I've never done anything involving scripting languages, so it was fun to see a different approach to code.

At the end of my reading today, I was up to a map of the US (provided) that had a data point plotted in the center of each state (data provided), with the data point's color and size corresponding to the sign and magnitude, respectively, of a (provided) table of random numbers. When the user mouses over a circle, it displays the value of that point, in addition to the name of the state (table provided). Neat!
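
To make the idea concrete for myself, here is a minimal sketch of the logic in plain Python rather than Processing. It is not Fry's code: the state values, the radius range, the red/blue color choice, and all function names are my own assumptions for illustration. The rescaling helper does the same job as Processing's built-in map() function.

```python
# Hypothetical values standing in for the book's table of random numbers.
values = {"CA": 7.2, "TX": -3.1, "NY": 0.5}


def rescale(x, in_lo, in_hi, out_lo, out_hi):
    """Linearly rescale x from [in_lo, in_hi] to [out_lo, out_hi]."""
    return out_lo + (x - in_lo) * (out_hi - out_lo) / (in_hi - in_lo)


def marker_style(value, max_abs, min_r=3.0, max_r=30.0):
    """Size encodes magnitude, color encodes sign (assumed palette)."""
    radius = rescale(abs(value), 0.0, max_abs, min_r, max_r)
    color = "blue" if value >= 0 else "red"
    return radius, color


def is_hovered(mouse_x, mouse_y, cx, cy, radius):
    """True when the mouse position falls inside the circle at (cx, cy)."""
    return (mouse_x - cx) ** 2 + (mouse_y - cy) ** 2 <= radius ** 2


# Scale every marker against the largest magnitude in the table.
max_abs = max(abs(v) for v in values.values())
styles = {state: marker_style(v, max_abs) for state, v in values.items()}

for state, (radius, color) in styles.items():
    print(state, round(radius, 1), color)
```

In a real sketch, the hover test would run on every frame against the current mouse position, and the label would be drawn only for the circle that passes it.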


Fry also explained the approach that he takes to creating a data visualization, and grouped several of the Processing functions used so far according to the steps of that approach. I look forward to learning more of them!

Tuesday, October 13, 2009

To Clarify, Add Detail

Today I read Edward Tufte's "Envisioning Information," which is intended as a sequel of sorts to his "Visual Display of Quantitative Information." While largely addressing the same body of information, EI tended to be less axiomatic than VDQI, and tended towards the "immersion technique" of teaching. In short, it was the coffee-table version.

Nevertheless, EI did provide a solid review of the concepts presented in VDQI, and provided me with more examples of Tufte's principles in action. For the purposes of this thesis, I imagine it will largely serve as a source of information in my visualizations. In learning Tufte's ideas a second time, I do feel like I reached a deeper understanding. Some highlights:

- The utmost importance of multivariable comparisons of data, and the way that a sparse display raises questions about the intentions of the creator. To create the most effective visualizations, I should strive to increase the data density. (I liked that this idea was presented as "escaping flatland," a nice reference to a well-known math book.)

- The role of color within a visualization. Tufte recommends against white backgrounds and against relying too heavily on black, opting instead for a neutral or grey base color scheme. Bright colors should be used sparingly, to provide emphasis on top of this base without visually overwhelming the viewer. The point about bright colors was fairly intuitive, but I personally lean towards clean black and white designs, so the muted base is something new that I will have to incorporate into my work. The muted color scheme does strike me as a little outdated, however, so I will also have to find a way to reconcile these two ideas.

- Given the pop nature of this book, I did find several visualizations that completely captivated me. I am intrigued by the idea of movement notation, which is a symbolic representation of dance choreography. My absolute favorite, though, was learning about Oliver Byrne's visual reinterpretation of Euclid's Elements, which is available online.

Thursday, October 8, 2009

The Revelation of the Complex

While the first half of "The Visual Display of Quantitative Information" addressed data visualization in the practical realm, the second half of the book approached it from the theoretical. The chapters ranged from the particular (eliminate moiré vibrations, avoid grids) to the philosophical (maximize data density, work with large data sets). A few points stood out to me:

- Quartile Plots
The "box plot" is a standard sight in statistics classes, but Tufte provides an alternative way of depicting the same idea - all in the name of cutting down on non-data-ink. The alternative, a quartile plot, astounded me with its visual simplicity, while still effectively conveying all of the information.
- Different levels of depth found in graphics
"Graphics can be designed to have at least three viewing depths:

1) what is seen from a distance, an overall structure usually aggregated from an underlying microstructure

2) what is seen up close and in detail, the fine structure of the data

3) what is seen implicitly, underlying the graphic - that which is behind the graphic"


- Shrink Principle
The idea of the shrink principle is that effective graphics can be shrunk way down and still retain their information. Tufte also included an illustration from famed visualist Bertin, who demonstrated several techniques of shrinking data while maintaining the given relationship between variables. I find this beautiful.
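
Going back to the quartile plot for a moment: it needs only five numbers (minimum, first quartile, median, third quartile, maximum), which is part of why it can be drawn with so little ink. Below is a rough sketch of computing those numbers and drawing a text version, with lines for the outer quarters, a blank for the middle half, and a mark at the median. The sample data, the function names, and the glyphs are my own assumptions, and this is only an approximation of the look of Tufte's drawing.

```python
import statistics


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)


def render_quartile_plot(summary, width=41):
    """Draw the outer quarters as dashes and the median as 'o'.

    Assumes the data is not constant (max must exceed min).
    """
    lo, q1, med, q3, hi = summary

    def col(x):
        # Map a data value to a character column.
        return round((x - lo) / (hi - lo) * (width - 1))

    cells = [" "] * width
    for c in range(col(lo), col(q1) + 1):
        cells[c] = "-"
    for c in range(col(q3), col(hi) + 1):
        cells[c] = "-"
    cells[col(med)] = "o"
    return "".join(cells)


# Hypothetical sample data.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
summary = five_number_summary(sample)
print(summary)
print(render_quartile_plot(summary))
```

Everything a box plot shows is still recoverable from the line: the ends of the dashes mark the extremes and the quartiles, and the mark in the gap is the median.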

Monday, October 5, 2009

Data Visualization Standards

To start off my project in Data Visualization, I started reading Edward Tufte's "The Visual Display of Quantitative Information." The book is split into two halves, Graphical Practice and Theory of Data Graphics. Graphical Practice dealt with data visualizations from a practical and real-world perspective, specifically addressing their unique ability to convey complex relationships between data variables, and how often this ability is exploited in modern journalism. As a student who was planning on approaching this project from a purely theoretical perspective (primarily due to time constraints), I was pleasantly surprised to encounter these ideas as my introduction. Tufte lists six guidelines for data visualization integrity, as follows:


55 - Graphical Integrity - The representation of number, as physically measured on the surface of the graphic itself, should be directly proportional to the numerical quantities represented.


55 - Clear, detailed, and thorough labeling should be used to defeat graphical distortion and ambiguity. Write out explanations of the data on the graphic itself. Label important events in the data.


60 - Show data variation, not design variation.


67 - In time-series displays of money, deflated and standardized units of monetary measurements are nearly always better than nominal units.


70 - The number of information-carrying (variable) dimensions depicted should not exceed the number of dimensions in the data.


73 - Graphics must not quote data out of context.
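
The guideline about deflated monetary units is the easiest of these to try out in code. Here is a small sketch of converting nominal amounts to real (base-year) amounts using a price index. The spending figures and the index values are made up for illustration, and the function name is my own.

```python
def to_real(nominal, index, base_index):
    """Convert a nominal amount to base-year units using a price index."""
    return nominal * base_index / index


# Hypothetical yearly spending in nominal dollars and a hypothetical price index.
years = [2007, 2008, 2009]
nominal = [100.0, 110.0, 121.0]
price_index = [100.0, 105.0, 110.0]

base_index = price_index[0]  # express everything in 2007 dollars
real = [to_real(n, i, base_index) for n, i in zip(nominal, price_index)]

nominal_growth = nominal[-1] / nominal[0] - 1
real_growth = real[-1] / real[0] - 1

for year, n, r in zip(years, nominal, real):
    print(year, n, round(r, 2))
print(round(nominal_growth, 3), round(real_growth, 3))
```

With these made-up numbers, the nominal series grows about twice as fast as the deflated one, which is exactly the kind of exaggeration the guideline is meant to prevent.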


I have a friend working on his Masters in Journalism, who is currently trying to create a set of minimum standards for journalistic integrity. (Jonathan Stray, Journalism Commons) My friend's work and Tufte's guidelines dovetail nicely, and I was pleased to have a reminder of the real-world application and repercussions of data visualization before heading into the theory.