As my final project for CSC 572 in Winter Quarter 2011, I elected to implement a scientific visualization project. The purpose of such a project is to convey information that may be overly technical, or in some other way difficult to comprehend, in such a manner that the average person can draw conclusions from it without the prerequisite knowledge. For this project I chose to visualize the 2007 Toxics Release Inventory data for the State of California, created by the Environmental Protection Agency. The source data can be found at data.gov.
The goal of this project was to take this specific set of data and present it in such a manner that, with minimal explanation, people could take what they were seeing and draw appropriate conclusions from it. You will note, for instance, on the image at the right, that while there are numerous small white dots across the map, there are three relatively large squares in three distinct locations. Given that the dots show locations of toxic releases and that the largest of them is 4.4 million pounds, you could conclude that we should be concerned about what happened at those three locations and how to deal with those releases.
The data you are looking at is the Environmental Protection Agency's report of documented toxic releases for California in 2007. The data can be found at the above link and contains much more detail than will be presented here. Each entry is broken down by geographic location, chemical released, amount, and numerous other flags and subdivisions that could be used to parse the data however you see fit. I have taken this data and modified it by removing many of the unnecessary details, such as physical address, the breakdown of releases into air, water, etc., and basically anything that wasn't needed for my plotter.
An example line from my data document:
33.857955 -118.299213 0.0025 YES PBT YES NO
While the fields are separated by spaces here, they are in fact tab delimited in the actual document, which makes conversion to Excel simple.
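As a rough illustration of how a line like the example above could be read, here is a minimal sketch. The `ReleaseRecord` type and `parseRecord` function are hypothetical names, not the project's actual parser. It assumes the first three columns are latitude, longitude, and total release amount, and it keeps the remaining flag columns as raw text rather than guessing what each one means.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type; field names are illustrative, not the project's.
struct ReleaseRecord {
    double latitude = 0.0;
    double longitude = 0.0;
    double totalRelease = 0.0;        // assumed to be the third column
    std::vector<std::string> flags;   // remaining columns, kept as raw text
};

// Parses one tab- or space-delimited line into a record.
// Returns false if the three leading numeric columns cannot be read.
bool parseRecord(const std::string& line, ReleaseRecord& out) {
    std::istringstream in(line);
    ReleaseRecord rec;
    if (!(in >> rec.latitude >> rec.longitude >> rec.totalRelease)) {
        return false;
    }
    std::string token;
    while (in >> token) {             // operator>> skips tabs and spaces alike
        rec.flags.push_back(token);
    }
    out = rec;
    return true;
}
```

Because stream extraction treats tabs and spaces the same way, this sketch would accept both the tab-delimited file and the space-separated form shown above.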
This project uses the X and Y axes for the longitude and latitude, respectively, of each of these locations, and dot size to show the relative amount released at each location. The minimum point size is two, so the points can be clearly seen when looking at the full map, and the upper bound is set at ten. All the points are scaled using the formula:
float size = std::max((10/zoom*(data[i].getTotalRelease() - minTR) / maxTR),2.0);
Note that the minimum point size is 2.0, chosen so the points are easy to view even when their releases are minor (less than a pound in some cases).
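To make the formula easier to experiment with, it can be wrapped in a small standalone function. This is only a sketch: `pointSize` and its parameter names are placeholders, and it assumes all values are floats and that `zoom` is positive.

```cpp
#include <algorithm>

// Linear point size, mirroring the formula above.
// Assumes zoom > 0 and maxTR > 0; names are placeholders for illustration.
float pointSize(float totalRelease, float minTR, float maxTR, float zoom) {
    const float kScale = 10.0f;    // largest point size when zoom is 1
    const float kMinSize = 2.0f;   // floor so tiny releases stay visible
    float scaled = kScale / zoom * (totalRelease - minTR) / maxTR;
    return std::max(scaled, kMinSize);
}
```

With a minimum of zero and a maximum of 4.4 million pounds at a zoom of 1, the largest site comes out at size 10, while a site releasing half a pound falls far below 2 and is raised to the floor.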
Using the arrow keys to shift the image around and the Z and X keys to zoom in and out, you can produce images similar to the one at the left, where you can focus on specific areas of California. As you zoom in, the particles become more distinguishable and the relative size becomes easier to see, as the size value is also scaled by the zoom factor. This lets you move about the data and focus in on areas that are of interest to you or your purposes.
In addition to zoom, I found that the chemicals released had different flags associated with them, so I thought it would be useful to be able to highlight all the chemicals that match a given flag. By pressing the appropriate key you can choose to plot only the chemicals with a certain flag set to true. The keys and codes are: a - Clean Air Act, c - Carcinogen, m - Metal, p - Persistent, Bioaccumulative, and Toxic.
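The key-to-flag mapping above could be expressed as a small predicate that the drawing loop consults for each record. This is a sketch under assumptions: the `ChemicalFlags` struct and `passesFilter` function are hypothetical, and treating any other key as "show everything" is my guess rather than documented behavior.

```cpp
// Hypothetical per-record flags; names are illustrative only.
struct ChemicalFlags {
    bool cleanAirAct = false;
    bool carcinogen = false;
    bool metal = false;
    bool pbt = false;   // persistent, bioaccumulative, and toxic
};

// Returns true if a record should be drawn under the active filter key.
// Keys other than a, c, m, p are treated as "no filter" (an assumption).
bool passesFilter(const ChemicalFlags& f, unsigned char key) {
    switch (key) {
        case 'a': return f.cleanAirAct;
        case 'c': return f.carcinogen;
        case 'm': return f.metal;
        case 'p': return f.pbt;
        default:  return true;
    }
}
```

Keeping the filter in one function means adding a new flag only requires one new field and one new case.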
From this project I have learned quite a bit about the challenges that can come with presenting scientific or really any data to the masses. The challenge is not just to make pretty pictures that can quickly persuade people to draw the same conclusions as you but also to avoid misrepresenting the data.
For instance, in this project I had to choose between having the three large squares with a ton of fairly indistinguishable smaller dots, or using a log scale that would have given a more gradual scaling from the max to the min. I chose not to use log scaling because in this case I feel that people would have gotten the wrong impression that there were numerous heavily polluting sites, rather than the three key ones that were outpacing the rest. You will also note that in the scaling function shown, every point is divided by the massive maximum value to keep the sizes relative, but this made everything almost undetectable, which is why I added the scale factor of ten so that the max point size was clearly visible and easily distinguished.
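For comparison, the log-scale alternative that I decided against could look roughly like the sketch below. The function `logPointSize` and its parameters are hypothetical and were not part of the program; it assumes release amounts are non-negative and that `maxTR` is greater than zero.

```cpp
#include <algorithm>
#include <cmath>

// Logarithmic point size: a hypothetical alternative to the linear formula.
// Assumes totalRelease >= 0 and maxTR > 0.
float logPointSize(float totalRelease, float maxTR, float minSize, float maxSize) {
    // log1p(x) computes log(1 + x), so a release of zero maps to zero.
    float t = std::log1p(totalRelease) / std::log1p(maxTR);   // in [0, 1]
    return std::max(minSize, maxSize * t);
}
```

Under this scaling a 1,000 pound release against a 4.4 million pound maximum would already be drawn at nearly half the maximum size, whereas the linear formula leaves it at the minimum. That is the effect described above: many sites would look comparable to the three largest.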
I believe that this project was very successful in helping me to understand the challenges faced by people trying to present scientific data. Given my lack of experience with graphics, using OpenGL to create the program that generated the images you see here took a lot of patience and research, in addition to cannibalizing the in-class labs for key segments of code. This code could easily be taken and expanded to plot similar data, or even serve as a reference for a more "professional" product.
The sizes of the points are linearly related to the amount released from each site. It is possible, as mentioned previously, to apply a non-linear scaling that might be more appropriate and give the user more useful information when looking at the full map.
I also noticed that when the image is converted into a jpg for this website, the red dots become hard to distinguish in some places. This is less of a problem when I am running the program, probably due to my larger viewing size and possibly my laptop screen, but it might make sense to experiment with the colors more thoroughly.
Speaking of color, I do like the use of the background image as a reference for the dots, and the detailed geography it gives the user to locate key areas such as the Bay Area and the Central Valley. However, these colors do make it harder to distinguish small points on the map, especially when they are separate from surrounding clumps. I wonder if it might have been prudent to go with a plain outline map of California, but then some of the impact and information would have been lost. The image below illustrates this problem nicely: the circled dots are hard to make out and would be impossible to see if the image were any smaller. Perhaps a way to toggle the terrain detail on and off would be useful.
Everything you need to run this program can be downloaded by clicking on this link: 572_final_project.zip. This was developed on a Windows machine using Visual Studio 2008 (not the Express version), so I am not sure whether it will run on other platforms. Sorry.
The Scientific Visualization Lab created by Zoe Wood for CSC 572, Winter 2011, served as the primary code base for this project. In particular, the file parser code was adapted to handle more variables, with some minor changes to variable names, but was otherwise kept intact.
The tutorials on OpenGL at http://www.swiftless.com/opengltuts.html were very helpful for getting me up to speed with OpenGL, and especially with getting the map of California to appear in the background.
John Hartquist, a fellow student in the class, was a great help with the original sci viz lab, and his help throughout the quarter made this final project much easier.