Difference between revisions of "Visualizations"

From Earlham CS Department
(First Reading and Making Visualizations Tour)
(Assignments)
 
(106 intermediate revisions by 9 users not shown)
Line 1: Line 1:
== First Reading and Making Visualizations Tour ==
+
== Course Overview ==
Listed below are the assignments for each chunk. Note that everyone should read the startup materials.
+
Math/CS 484 -- The goal of our Ford/Knight project is to distill and organize the principles of visualizing large data sets. Modern science is often done by small groups of people that come from diverse backgrounds, e.g. a mathematician, a biologist, and a computer scientist. We plan to solicit input in the form of example data sets to work with from each of the natural and social science departments on campus. This work will provide a foundation for a course, or course module, which we hope to offer in the future. Must see instructor for registration.
* Startup - Everyone
 
* Web site - Leif
 
* Making presentations - Mikel
 
* News graphics - Ivan
 
* Financial data - Elena
 
      • Avoid visual distortion in data graphics  (''“The Visual Display of Quantitative Information”'', Chapter 2)
 
        o tables are the best way to show small sets of numbers (for 20 numbers or fewer, prefer a table to a graph) (p.56)
 
        o representation of numbers should be directly proportional to the numerical quantities represented (p.56)
 
        o clear, detailed labeling to defeat graphical distortion and ambiguity (p.56)
 
        o show data variation, not design variation (p.61)
 
      • Avoid inaccurate reflection of reality (''“The Visual Display of Quantitative Information”'', Chapter 2)
 
        o if plotting government spending and debt over the years, take population and inflation into account (p.68)
 
        o in time-series displays of money, deflated and standardized units of monetary measurement are nearly always better than nominal units (p.68)
 
      • Using 2 or 3 varying dimensions to show one-dimensional data is a weak and inefficient technique: the number of information-carrying (variable) dimensions depicted should not exceed the number of dimensions in the data (p.71)
 
      • Context is essential for graphical integrity (''“The Visual Display of Quantitative Information”'', Chapter 2)
 
        o in quantitative thinking, data graphics should answer the question “Compared to what?” (p.74)
 
        o graphics must not quote data out of context (p.74)
 
      • Multiples help to monitor and analyse the multi-variable processes typical of finance, combining overview with detail (''“Visual Explanations”'', p.110-111)
 
        o Blending quantitative multiples, narrative text and images is useful for monitoring data-rich processes (p.110)
 
      • Consider sparklines: datawords – data-intense, design-simple, word-sized graphics (''“Beautiful Evidence”'', p.46-63)
 
        o Tracks and compares changes over time, by showing overall trend along with local detail (p.50)
 
        o Should often be embedded in text and tables, which opens the possibility of writing with data graphics (p.49)
 
        o Daily sparkline data can be standardized and scaled depending on the content: by the range of price, inflation-adjusted price, percent change off of a market baseline (p.50)
 
        o Shows recent change in relation to many past changes, giving context for nuanced analysis and better decisions (p.50)
 
        o Efficiently displays and narrates binary data (presence/absence, occurrence/non-occurrence, win/loss) (p.54)
 
        o Shows intensity/frequency of occurrence (p.55)
 
        o Improves conventional statistical graphics within univariate and bivariate marginal distributions, as the univariate sparklines link up 2-D plots (p.57)
 
    • When designing sparklines, think about (''“Beautiful Evidence”'', p.46-63):
 
        o Variations in slope are best detected when the slopes average about 45 degrees (p.60)
 
        o Moderately greater in length than in height (p.60)
 
        o Using the maximum reasonable vertical space available under the word-like constraint, then adjust the horizontal stretch of the time-scale to meet the lumpy criterion (p.60)
 
        o Contextual methods for quantifying sparklines – choose encoding (p.61)
 
        o Changing the relative weight of the data-lines and also muting the contrast between the data and background to reduce optical noise (p.62)
 
        o Printing in single color/ judicious mix of 2/ flat color/ stochastic color methods to avoid color dots (p.62)
 
        o Avoiding strong frames around sparklines, which create unintentional optical clutter (p.62)
 
        o Resolution of sparklines is better on paper than on computer screens (p.63)
 
        o Printing and viewing a data density of 500 sparklines on A3 size paper (about 25X45 cm, or 11X17 in) places them adjacent in space, which assists comparison, search, pattern-finding, exploration, replication, and review (p.63)
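
The 45-degree guideline for sparkline slopes can be turned into a small calculation. The sketch below is one common way to do it (banking by the median absolute slope), not necessarily the procedure used in ''Beautiful Evidence''. The function name `banked_aspect_ratio` and the sample values are hypothetical, and the x values are assumed to be strictly increasing.

```python
from statistics import median


def banked_aspect_ratio(xs, ys):
    """Return the height/width ratio at which the median absolute
    segment slope is drawn at 45 degrees.

    Assumes xs is strictly increasing and ys is not constant.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) points of equal length")

    # Absolute slope of each line segment, in data units.
    slopes = [
        abs((y1 - y0) / (x1 - x0))
        for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:])
    ]
    typical_slope = median(slopes)
    if typical_slope == 0:
        raise ValueError("median slope is zero; the series is mostly flat")

    x_range = max(xs) - min(xs)
    y_range = max(ys) - min(ys)

    # A data slope s is drawn with physical slope s * (h / y_range) / (w / x_range).
    # Setting that to 1 (45 degrees) for the typical slope gives h / w.
    return (y_range / x_range) / typical_slope


if __name__ == "__main__":
    days = [0, 1, 2, 3, 4]
    prices = [0, 2, 1, 3, 2]  # made-up values for illustration
    print(banked_aspect_ratio(days, prices))
```

For this sample series the ratio comes out below 1, which agrees with the advice that a sparkline should be moderately greater in length than in height.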
 
  
* Decision making - Emily
+
== Assignments ==
* Narrative - Dee
 
* Aesthetics - Tristan
 
* Graphic design - Alex
 
* Scientific and engineering - Mobeen
 
* Animations - Ryan
 
  
As you read your chunks look for bits of guidance, advice, technique, etc. that you feel are useful.  Summarize each of these in our [[making-visualizations|Making Visualizations]] page, and make sure each entry contains an appropriate citation.
+
* [[student-solutions|Student Solutions]]
  
== First Lab (DRAFT) ==  
+
==== 16) Course Reflection ====
Measuring the real world.
+
In addition to the standard evaluation form, please take some time to reflect and write up a short response that addresses these questions.
 +
* How did the course compare to your expectations of the course?
 +
* What did you find most interesting/useful?  Least interesting/useful?
 +
* What do we need to "package" so that other students or faculty could gain from what we've done? 
 +
* What's the best format for delivering this material?  In-situ for a class or classes?  1 credit class, etc. On-demand sessions?
  
== Overview ==
+
Please turn this in with your course evaluation form to the envelope in Bobbi's office before the end of the day on Tuesday 11 December.
Math/CS 484 -- The goal of our Ford/Knight project is to distill and organize the principles of visualizing large data sets. Modern science is often done by small groups of people that come from diverse backgrounds, e.g. a mathematician, a biologist, and a computer scientist. We plan to solicit input in the form of example data sets to work with from each of the natural and social science departments on campus. This work will provide a foundation for a course, or course module, which we hope to offer in the future. Must see instructor for registration.
 
  
== Course Schedule (DRAFT) ==
+
==== 15) Third Visualization Project ====
* Week 1 -- Visualization Basics
+
Find a story and build a visualization to support it.  You may choose the data sets, although you must incorporate at least three.  You can choose to analyze/visualize one variable over multiple data sets or multiple variables over multiple data sets, include geocoding or not, etc.  Find the common thread(s) that tie your data sets together and tell the story you want to tell.
# lab on data collection
 
# begin work on course products
 
## guide -- do's and don'ts for good infographics
 
## transferable vignettes
 
## ??
 
  
* Week 2 -- Visualization Basics
+
Work in pairs:
# lab on turning reports into data into information
+
* Mobeen and Ivan
# continue work on course products
+
* Dee and Leif
 +
* Emily and Tristan
 +
* Ryan and Elena
 +
* Mikel and Alex
  
* Week 3 -- Exploratory Data Analysis
+
Use one or more of these toolchains:
# lab on EDA -- numerical and graphical summaries
+
* R
# continue work on course products
+
* gnuplot
  
* Week 4 -- Exploratory Data Analysis
+
Write up a plan for your work; include a short description of the story you are telling, the specific data sets employed, and a sketch of the visualization.  This is due in class on Tuesday 4 December.  Please bring a printout of your plan to class.  Come to class on Thursday 29 November with questions, ideas, etc.
# lab on EDA
 
# continue work on course products
 
  
* Week 5 -- Visualization Tools (notice the links below)
+
The final visualization (PDF, etc. and script(s)) is due in class on Thursday 6 December.  Come to class prepared to give a crisp (< 8 minute) presentation about your visualization.  We will be advertising this class session to science students and faculty and encouraging them to attend by bribing them with free pizza.
# Tools assignment -- low tech, high tech
 
# continue work on course products
 
  
* Week 6 -- Visualization Tools
+
==== 14) Second Visualization Project Redux ====
# Tools assignment -- critical reviews of existing visualizations
+
Take the feedback you received on Tuesday morning and, working with your partner, improve your ice core data set visualization.  Due in class on Thursday 15 November.  Remember to upload your modified script and PDF, PNG, etc. to the wiki.
# continue work on course products
 
  
* Week 7 -- Visualization Tools
+
==== 13) Second Visualization Project ====
# Tools assignment
+
Find a story and build a visualization to support it based on ice core data sets.  There are many available, e.g. from multiple locations in Antarctica and other locations.  These data sets typically contain depth, measurements of particulate matter, atmospheric chemical compositions, and various climate and date proxies.  You can choose to analyze/visualize one variable over multiple locations or multiple variables over a single location.  Include at least three dimensions, e.g. location on the earth, depth/date, and climate proxy, or depth/date, chemical marker, and climate proxy, etc. 
# continue work on course products
 
  
* Week 8 -- Visualization Tools
+
Work in pairs:
# Tools assignment
+
* Mobeen and Mikel
# continue work on course products
+
* Dee and Leif
 +
* Emily and Tristan
 +
* Ryan and Elena
 +
* Ivan and Alex
  
* Week 9 -- Projects
+
Use one or more of these toolchains:
# Projects assignment
+
* R
# continue work on course products
+
* gnuplot
  
* Week 10 -- Projects
+
Write up a plan for your work; include a short description of the story you are telling, the specific data sets employed, and a sketch of the visualization.  This is due in class on Tuesday 6 November.  Please bring a printout of your plan to class.
# Projects assignment -- documenting choices and assumptions
 
# continue work on course products
 
  
* Week 11 -- Projects
+
The final visualization (PDF, etc. and script(s)) is due in class on Tuesday 13 November.  Come to class prepared to give a crisp (< 5 minute) presentation about your visualization.
# Projects assignment
 
# continue work on course products
 
  
* Week 12 -- Projects
+
==== 12a) gnuplot Redux ====
# Projects assignment
+
Take the feedback you received on your gnuplot visualization and re-work it.  Put the updated output and the script in the usual place appropriately labeled.  Due before the start of class on Tuesday 6 November.
# continue work on course products
 
  
* Week 13 -- Projects
+
==== 12) Getting Started with gnuplot ====
# Projects assignment
+
Due in class on Tuesday 30 October
# continue work on course products
+
# Identify 3 (or more) data sets that you can use to tell a story with an environmental theme.
 +
# Develop your visualization using at least 25 '''unique''' commands in your gnuplot script.
 +
# Use color, bonus points for 3D.
 +
# Post your script and the output (PNG, JPG, etc.) on the student solutions wiki page /before class/ on Tuesday 30 October.
 +
# Come to class on Tuesday prepared to give a < 5 minute crisp presentation about your visualization.
 +
# <b>You should know what your theme/data sets are by class on Thursday 25 October.</b>
  
* Week 14 -- Projects
+
==== 11) Science Magazine Review ====
# Projects assignment
+
Due in class on Thursday 25 October
# continue work on course products
+
# Browse the issue of Science that is on reserve for this class in Wildman.  Find what you believe is a really well done viz, and a really poorly done one.  Come to class prepared to give a short (< 5 minute) tour of the two of them explaining what they are, why they are good, and why they are bad.
  
* Week 15 -- Projects
+
==== 10) Getting Started with R ====
# Projects presentation
+
Due in class on Thursday 18 October.
# complete work on course products
+
# [[first-r-lab|First R lab]] - Post your first R visualizations /before 12p on Thursday/ to the student solutions page on the wiki, and then during class on Thursday you should briefly describe/discuss each in turn (a maximum of 5 minutes each).  Make sure you watch the time so all of you have an opportunity to present your work.
 +
# Explore, or re-explore as the case may be, the R galleries.  Look at the scripts that produce the visualizations and figure-out how you might leverage some of those patterns.
  
== Short-term To Do List ==
+
==== 9) Reading ====
# Figure-out books for the library to purchase, probably put them on reserve through the fall (charlie)
+
Due in class on Tuesday 16 October.
# Look at on-line courses in this area (mic)
+
# Chapters 1 and 2 in Designing Data Visualizations (previously assigned)
 +
# Chapters 1 and 2 in Visualize This (previously assigned)
 +
# Overview, Form and Structure, Process and Time in Visual Strategies (previously assigned)
 +
# Part II (chapters 3, 4, 5, 6) in Designing Data Visualizations
  
== Examples ==
+
==== 8) First Visualization (redux) ====
* Good and Bad Statistical Graphs -- http://www.datavis.ca/gallery/
+
Due in class on Tuesday 9 October. Use the feedback you received from the class and the professors to refine and improve your first visualization. Post the revised version using your placeholder on the [[student-solutions|Student Solutions]] page and bring a printout of it to class. Come to class prepared to give a crisp 4 minute before and after presentation to the class.
* Eurozone debt - http://www.bbc.co.uk/news/business-15748696
 
* Wikileaks US embassy cables - http://datavisualization.ch/datasets/wikileaks-us-embassy-cables/
 
* Stopping SOPA and PIPA - http://visual.ly/stop-sopa
 
* Auto accident statistics in Britain - http://www.bbc.co.uk/news/magazine-16631597
 
* A snapshot of the rapidly changing world of computing, communications and technology - http://www.nytimes.com/interactive/2011/12/06/science/1206-world.html?ref=science
 
* Words by the millions - http://www.nytimes.com/2012/03/25/business/words-by-the-millions-sorted-by-software.html?_r=1&ref=technology
 
* county health ratings - http://www.countyhealthrankings.org/app
 
* live wind map - http://hint.fm/wind/index.html
 
* Factual - http://www.nytimes.com/2012/03/25/business/factuals-gil-elbaz-wants-to-gather-the-data-universe.html?ref=technology
 
* worldwide health data - http://www.youtube.com/watch?v=jbkSRLYSojo&feature=player_embedded
 
* Obama's budget proposal - http://www.nytimes.com/interactive/2012/02/13/us/politics/2013-budget-proposal-graphic.html?emc=eta1
 
* Interactive earthquake map - http://pnsn.org/tremor
 
* http://visual.ly/education-vs-incarceration - and their tool for building vizs
 
* shot analysis for NBA finals - http://www.nytimes.com/interactive/2012/06/11/sports/basketball/nba-shot-analysis.html
 
* European debt -- http://www.aljazeera.com/indepth/interactive/2012/06/20126127221845926.html
 
* Map of the Market (link behaves oddly, but you can get there) -- http://www.smartmoney.com/map-of-the-market/
 
* Gallery of R Visualizations -- http://addictedtor.free.fr/graphiques/
 
* nice quicktime example of the "starchart" Filmfinder -- http://hcil2.cs.umd.edu/video/1994/1994_visualinfo.mpg -- dated but very good
 
* 2010 U.S. Election Visualizations -- http://www.csc.ncsu.edu/faculty/healey/US_election/
 
* Gun-related deaths by US State -- http://www.aljazeera.com/indepth/interactive/2012/07/2012726141159587596.html
 
* Minard's Map of French Wine -- http://en.wikipedia.org/wiki/File:Minard%E2%80%99s_map_of_French_wine_exports_for_1864.jpg#file
 
* Minard's Map of Napoleon's Russian Invasion -- http://en.wikipedia.org/wiki/File:Minard.png#file
 
* Krulwich - http://www.npr.org/blogs/krulwich/2012/03/21/149095154/mirror-mirror-on-the-wall-do-the-data-tell-it-all?sc=fb&cc=fp
 
* Defections of Syrian Leaders -- http://www.aljazeera.com/indepth/interactive/syriadefections/2012730840348158.html
 
  
== Press ==
+
Finish the reading that was assigned earlier.
NPR did a couple of interesting segments on Big Data, visualizations, and the search for mathematicians and others who can do that stuff. (December, 2011)
 
* Part 1 - http://www.npr.org/2011/11/29/142521910/the-digital-breadcrumbs-that-lead-to-big-data?ps=rs
 
* Part 2 - http://www.npr.org/2011/11/30/142893065/the-search-for-analysts-to-make-sense-of-big-data
 
  
New York Times article from December, 2011 on bioinformatics and visualization, MicJ
+
==== 7) First Visualization ====
 +
Due in class on Tuesday 2 October, both a printout and the visualization posted on the wiki. Come to class prepared to spend about 5 minutes presenting your viz to the class on Tuesday morning.
  
== Other ==
+
==== 6) Plan for First Visualization ====
* http://www.r-bloggers.com/how-the-new-york-times-uses-r-for-data-visualization/
+
The write-up of the plan for your first visualization project is due in class on Tuesday 25 September.  This should include:
* At some point nyt.com supported a "viz lab" where people could use their data sets to build their own visualizations.  I can't find a current reference to this now.  20 January 2012
+
* The question you are going to answer or story you are going to tell
* IBM's Many Eyes -
+
* The data sets you will use (including URLs if available)
* http://www.cc.gatech.edu/~stasko/7450/syllabus.html
+
* Any numerical summaries you will produce
 +
* A hand drawn draft of the visualization
  
== Presentations ==
+
To prepare for this you should read/watch the following items <b>before</b> you design your visualization or write-up your plan.
* David McCandless: The beauty of data visualizations (TED) - http://www.ted.com/talks/david_mccandless_the_beauty_of_data_visualization.html
+
* David McCandless:  
 +
** The beauty of data visualizations (TED) - http://www.ted.com/talks/david_mccandless_the_beauty_of_data_visualization.html
 
** Military spending - http://www.guardian.co.uk/news/datablog/2010/apr/01/information-is-beautiful-military-spending
 
** Military spending - http://www.guardian.co.uk/news/datablog/2010/apr/01/information-is-beautiful-military-spending
* What we learned from 5 million books (TED) - http://www.ted.com/talks/what_we_learned_from_5_million_books.html
+
* Chapters 1 and 2 in Designing Data Visualizations (on reserve in the science library)
** Google's ngram interface: http://books.google.com/ngrams/
+
* Chapters 1 and 2 in Visualize This (on reserve in the science library)
* Baby names -- NameVoyager (http://www.babynamewizard.com/voyager)
 
* Wordle (http://www.wordle.net/ )
 
* Raw Milk Laws in the US (http://farmtoconsumer.org/raw_milk_map.htm)
 
* International Milk Production (http://chartsbin.com/view/1492)
 
* Perception in Visualization -- http://www.csc.ncsu.edu/faculty/healey/PP/
 
  
== Keywords ==
+
==== 5) Second Critique Tour ====
* infographics
+
* For this critique tour we will use IBM's Many Eyes project, http://www-958.ibm.com/software/data/cognos/manyeyes/  Before you start, spend a minute looking around the site and exploring the data sets, tools, etc. that are available. 
* Big data
+
* Browse the visualizations focusing on ones based on scientific data/questions, http://www-958.ibm.com/software/data/cognos/manyeyes/visualizations?sort=rating 
* work flow(s)
+
* Identify three (or more) visualizations that share a theme, question, or underlying data set(s).  Use the evolving guidelines, [[evaluating-infographics|Evaluating Infographics]], to produce a critique of each of the visualizations that you choose.  Write up each of those critiques.
 +
* Due in class on Thursday 20 September.
  
== The People ==
+
==== 4) First Critique Tour ====
* Mic Jackson, Mathematics & Environmental Science
+
This assignment is to be done in-class on Tuesday 11 September, 2012.  In pairs review/critique one of these infographics from http://visual.ly/ 
* Charlie Peck, Computer Science
+
# [http://visual.ly/chinese-new-dominant-language-internet Human Languages on the Internet] - Ivan, Mikel
 +
# [http://visual.ly/internet-2015 The Internet in 2015] - Leif, Dee
 +
# [http://visual.ly/internet-usage-worldwide Worldwide Internet Usage] - Elena, Emily
 +
# [http://visual.ly/how-technology-has-boosted-ecommerce-over-past-25-years Technology and eCommerce] - Tristan, Alex
 +
# [http://visual.ly/responsive-web-design-0 Responsive Web Design] - Mobeen, Ryan
  
# Diana Ainembabazi
+
Each group should:
# Ivan Babic
+
* Evaluate the infographic using the criteria listed below. 
# Leif DeJong
+
* Locate a second infographic, on Visual.ly or elsewhere,  that covers roughly the same ground and evaluate it similarly. 
# Ryan Lake
+
* Prepare and deliver a 4 minute presentation which summarizes your findings during the last portion of class this morning.
# Mobeen Ludin
 
# Emily Pavlovic
 
# Mikel Qafa
 
# Alex Reid
 
# Elena Sergienko
 
# Tristan Wright
 
  
== Tools ==
+
Consider the guidelines we are developing, [[evaluating-infographics|Evaluating Infographics]], as you examine the infographics.
* GPlates - plate tectonics visualizations, multi-platform (http://www.gplates.org/)
 
* open source visualization toolkits
 
** Prefuse ( http://prefuse.org/ ),  
 
** Flare ( http://flare.prefuse.org/ )
 
** Protovis ( http://vis.stanford.edu/protovis/ )
 
  
* groundbreaking visualization projects
+
==== 3) First Workshop - Histograms ====
** Many Eyes ( http://www.many-eyes.com )
+
This assignment is designed to consolidate your knowledge of histograms and give you experience generating one with a modest data set.  You <b>must</b> do the work by hand; you can <b>optionally</b> use a software tool to produce it as well. Make sure you document each step of your work. This workshop is due Thursday 13 September.
** IBM Visualization and Behavior Group (http://researcher.watson.ibm.com/researcher/view_project.php?id=3419)
 
  
* a review of Tableau software (http://infosthetics.com/archives/2010/06/social_visualization_software_review_tableau_public.html)
+
==== 2) First Lab - Measuring the Real World ====
* another (http://bitools.org/tableau-software/)
+
Measuring the real world, [http://cs.earlham.edu/~charliep/area.pdf the PDF].  This lab is due Sunday 9 September at 3p US-ET.  Turn in a (BW) printout of your writeup and visualization, along with the URL of the on-line (color) version of the visualization if it is available. Put the paper copy in Charlie's Box A in the wooden tower in the Math/CS/Physics lounge on the West end of second floor of Dennis Hall at Earlham College in Richmond, IN, US (planet Earth).
* a Tableau competitor (http://www.inetsoft.com/info/alternative_to_tableau_visualization_dashboards/?utm_vendor=google&utm_source=northamerica&utm_campaign=visual&utm_medium=search&utm_content=12577228682&utm_term=tableau%20software%20review&gclid=CKPZmvbyoLECFQ8CQAody2v2bg)
 
* Polaris interactive database visualization (http://www.graphics.stanford.edu/projects/polaris/)
 
* Spotfire (http://www.cs.umd.edu/hcil/spotfire/)
 
  
== Topics ==
+
==== 1) First Reading and Tips and Techniques Tour ====
# Long-term turtle size, sex, age, climate by year from Western Nebraska (JohnI)
+
Listed below are the assignments for each chunk. Note that everyone should read the startup materials.
#* Von Bertalanffy growth model, special case of Fisher models?
+
* Startup - Everyone
# Long-term iguana size, sex, age, climate (8 years only) from Bahamas (Exumas island) (JohnI)
+
* Web site - Leif
#* Von Bertalanffy growth model, special case of Fisher models?
+
* Making presentations - Mikel
# Why do turtles lay the number, size, type and frequency of eggs that they do?
+
* News graphics - Ivan
#* What are the common patterns?
+
* Financial Data - Elena
#* Which dimensions aren't accounted for? 
+
* Decision making - Emily
#** Latitude and longitude? 
+
* Narrative - Dee
#** Habitat? 
+
* Aesthetics - Tristan
#** Phylogeny?
+
* Graphic design - Alex
#** Climate?
+
* Scientific and engineering - Mobeen
#** What other data sets are available?
+
* Animations - Ryan
# How to distinguish between variations within a species vs different species
 
#* Standardized morphometric data (AOT meristic data, e.g. counts of the number of scales between body parts), size standardized
 
#* Currently using multivariate statistics, about 25 variables
 
#* Looking for one image with all populations and variables
 
#* Looking for structure
 
# Phylogenetic reconstruction, visualizing trees with multiple models (JohnI)
 
  
== Techniques ==
+
As you read your chunks look for bits of guidance, advice, technique, etc. that you feel are useful.  Summarize each of these in our [https://docs.google.com/document/d/1Lfv0PBmG4Y_50BesrMb_8UEq-Ehe6BF9IG9Vi9LK2SE/edit Tips and Techniques Google Doc], and make sure each entry contains an appropriate citation and follows the pattern/example at the top of the document.  This tour is due Sunday 2 September.
# Principal component analysis
 
# Discriminant function analysis
 
# Data conditioning and translation, CSV and XML
 
# Gridded and non-gridded data
 
# Ideas that Michael suggested
 
  
== Sources ==
+
== Resources ==
# Mic's books
+
Visualization Galleries (some with embedded tools, e.g. Many Eyes and Gapminder)
# Charlie's books
+
* Visually - http://visual.ly/ 
# Dave's viz workshop at Kean
+
* IBM's Many Eyes - http://www-958.ibm.com/software/data/cognos/manyeyes/
# Web sources
+
* Tableau Public Visualization Software - http://www.tableausoftware.com/public/
* The Organisation for Economic Co-operation and Development (OECD) statistics -- http://www.oecd.org/statistics/
+
* R Gallery - http://gallery.r-enthusiasts.com/
 +
* R code for figures in the book ''R Graphics'' -- http://www.stat.auckland.ac.nz/~paul/RGraphics/rgraphics.html
 +
* Hans Rosling's Gapminder - http://www.gapminder.org/
 +
* Thinking with Google - http://www.thinkwithgoogle.com/insights/library/infographics/
 +
* R graphics tutorials from the author of Visualize This - http://flowingdata.com/category/tutorials/
 +
* A very useful R blog:
 +
** general, with some excellent examples - http://blog.revolutionanalytics.com/graphics/
 +
** geographic maps - http://blog.revolutionanalytics.com/2009/10/geographic-maps-in-r.html
 +
**
 +
* Download a pdf copy of A Practical Guide to Geostatistical Mapping -- http://spatial-analyst.net/book/
  
== Schedule ==
+
Data Sets
* Looking for 2-3 hours of meeting time, possibly one shorter and one longer
+
* Amazon - http://aws.amazon.com/datasets
* Noon on Monday, Thursday, or Friday
+
* Google - http://www.google.com/publicdata/directory
* 4p-7p Monday, Wednesday, Thursday, Friday (modulo sport practice)
+
* US Census - http://www.census.gov/main/www/access.html
 +
* Project Gutenberg - http://www.gutenberg.org/
 +
* US Government public data - http://www.data.gov/
 +
* UK Government public data - http://data.gov.uk/
 +
* IBM's Many Eyes - http://www-958.ibm.com/software/data/cognos/manyeyes/datasets/
  
== The Plan ==
+
Advice and Technique
1) Planning items
+
* [http://visual-strategies.org/ Visual Strategies] (book website) and the [http://www.nytimes.com/2012/09/04/science/visual-strategies-transforms-data-into-art-that-speaks.html review] of it from the NY Times Science section.
* Are there any field trip opportunities?
+
 
* Figure-out what books to order
+
R
* Figure-out what are the likely conference opportunities?
+
* http://mazamascience.com/WorkingWithData/?p=958 - A script based introduction to R
* Are there any other tools besides R that we should be considering?
+
 
** GRASS?
+
gnuplot
**  
+
* Sourceforge, examples, manual - http://gnuplot.sourceforge.net/
 +
* Wikipedia gnuplot diagrams (many with source) - http://commons.wikimedia.org/wiki/Category:Gnuplot_diagrams
 +
* Tutorial, FAQ - http://t16web.lanl.gov/Kawano/gnuplot/index-e.html
 +
* Project - http://www.gnuplot.info/
 +
 
 +
Course Specific
 +
* [[viz-feedback-legend|Feedback Legend]]
 +
 
 +
== Bread Crumbs ==
 +
* Thursday 23 August
 +
*# Anscombe's data sets - http://en.wikipedia.org/wiki/Anscombe's_quartet
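
The point of Anscombe's data sets is easy to check numerically. The sketch below compares the first two of the four sets, which share the same x values. The data are transcribed from the published quartet, and the helper names `pearson_r` and `summarize` are hypothetical. It is a minimal illustration and not part of the course materials.

```python
from math import sqrt
from statistics import mean

# Anscombe's quartet, sets I and II (they share the same x values).
X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
Y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
Y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]


def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from its definition."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def summarize(xs, ys):
    """Numerical summaries that hide the difference in shape."""
    return {"mean_x": mean(xs), "mean_y": mean(ys), "r": pearson_r(xs, ys)}


if __name__ == "__main__":
    for name, ys in (("I", Y1), ("II", Y2)):
        s = summarize(X, ys)
        print(f"set {name}: mean_y={s['mean_y']:.2f}  r={s['r']:.3f}")
```

The summaries agree to about two decimal places, yet set I is roughly linear and set II follows a curve. Only a plot shows the difference.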
 +
 
 +
* Sunday 26 August (retrieve notes from board pictures)
 +
*# Relative error, absolute error, systematic error, and related topics
 +
*# Standard deviation
 +
*# Precision and accuracy
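
The topics above can be tied together in a short calculation. This is a minimal sketch under stated assumptions: the readings and the reference value are made up, the function names are hypothetical, and the mean offset from the reference stands in for systematic error while the sample standard deviation stands in for precision.

```python
from statistics import mean, stdev


def absolute_error(measured, reference):
    """Size of the difference, in the units of the measurement."""
    return abs(measured - reference)


def relative_error(measured, reference):
    """Absolute error as a fraction of the reference value."""
    return absolute_error(measured, reference) / abs(reference)


def summarize_readings(readings, reference):
    """Separate accuracy (offset of the mean) from precision (spread)."""
    centre = mean(readings)
    return {
        "mean": centre,
        "bias": centre - reference,      # systematic offset: accuracy
        "spread": stdev(readings),       # sample standard deviation: precision
        "relative_error": relative_error(centre, reference),
    }


if __name__ == "__main__":
    readings = [10.2, 10.4, 10.3, 10.5, 10.1]  # hypothetical repeated readings
    summary = summarize_readings(readings, reference=10.0)
    for key, value in summary.items():
        print(f"{key}: {value:.4f}")
```

In this made-up example the readings cluster tightly (good precision) but sit above the reference (poor accuracy), which is the pattern a systematic error produces.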
 +
 
 +
* Thursday 30 August (harvest from Mic)
 +
 
 +
* Tuesday 4 September (harvest notes from board picture)
 +
*# Histograms
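
Building a histogram by hand comes down to choosing a starting value and a bin width, then tallying. The sketch below mirrors that manual procedure. The function names and the sample values are hypothetical, and each bin is assumed to include its left edge and exclude its right edge.

```python
def histogram_counts(values, start, bin_width):
    """Tally values into equal-width bins [start + i*w, start + (i+1)*w)."""
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    if not values or min(values) < start:
        raise ValueError("values must be non-empty and not below start")

    n_bins = int((max(values) - start) // bin_width) + 1
    counts = [0] * n_bins
    for value in values:
        counts[int((value - start) // bin_width)] += 1
    return counts


def render(counts, start, bin_width):
    """Draw one text row per bin, one '#' per observation."""
    rows = []
    for i, count in enumerate(counts):
        left = start + i * bin_width
        rows.append(f"[{left}, {left + bin_width}) " + "#" * count)
    return "\n".join(rows)


if __name__ == "__main__":
    data = [12, 15, 21, 22, 22, 27, 31, 34, 35, 48]  # made-up readings
    counts = histogram_counts(data, start=10, bin_width=10)
    print(render(counts, start=10, bin_width=10))
```

Changing `bin_width` and re-running is a quick way to see how much the apparent shape of a distribution depends on the binning choice.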
 +
 
 +
* Thursday 6 September
 +
*# Answered questions about first lab.
 +
*# Demonstrated how to upload files to the wiki, used for lab reports in PDF form.
 +
 
 +
* Tuesday 11 September
 +
*# Discussion about when to aggregate, how many readings to take and related issues
 +
*# First critique tour (in-class)
 +
 
 +
* Thursday 13 September
 +
*# Last of the first critique tour presentations
 +
*# Discuss next critique tour
 +
 
 +
* Tuesday 18 September
 +
*# [http://www.radiolab.org/2010/oct/08/ Radiolab - Cities episode]
 +
*# Hans Rosling Videos
 +
*## [http://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen.html TED Talks: Hans Rosling: Stats that reshape your worldview]
 +
*##[http://www.youtube.com/watch?v=jbkSRLYSojo Hans Rosling's 200 Countries, 200 Years, 4 Minutes - The Joy of Stats]
 +
 
 +
* Thursday 20 September
 +
 
 +
* Tuesday 25 September
 +
*# In-class review and critique lab
 +
 
 +
* Thursday 27 September
 +
*# Return and review first lab
 +
*# Q and A about first visualization project
 +
 
 +
* Tuesday 2 October
 +
*# First visualization presentations
 +
 
 +
* Thursday 4 October
 +
*# First visualization presentations (two stragglers)
 +
 
 +
* Tuesday 9 October
 +
*# First visualization presentations (redux)
  
2) Things to learn
+
* Tuesday 16 October
* Is there a somewhat canonical process or technique that one can reliably apply to go from readings -> data -> information?  At which stage(s) is/are a visualization helpful?
+
*# R tour
* How to utilize geocoding attributes?
 
* How to utilize timestamp attributes?
 
  
3) Things to read
+
* Thursday 18 October
*  
+
*# Mic and Charlie at the board meeting, class reviewed R stuff
  
4) Things to do during the class
+
* Tuesday 23 October
*  
 
  
5) Questions
+
* Thursday 25 October
* Which parts of statistics do people need to know?
 
** correlation for PCA
 
* What linear algebra do people need to know?
 
** matrix operations for PCA
 
  
6) Tools
+
* Tuesday 30 October
* R under Linux/OSX
+
*# Reviewed gnuplot lab, distributed [[second-viz-project|Second Viz Project]]
  
7) Possible sources for data sets
+
== Notes ==
* John Iverson
+
[[fk-vizscidat-notes|Mic and Charlie's notes]]
** turtle birthing data
 
** phylogenetic reconstruction
 
* Mike Deibel
 
* Kathy Milar
 
* Meg Streepy
 
** GPlates - visualizing plate tectonics
 

Latest revision as of 11:12, 5 December 2012

Course Overview

Math/CS 484 -- The goal of our Ford/Knight project is to distill and organize the principles of visualizing large data sets. Modern science is often done by small groups of people that come from diverse backgrounds, e.g. a mathematician, a biologist, and a computer scientist. We plan to solicit input in the form of example data sets to work with from each of the natural and social science departments on campus. This work will provide a foundation for a course, or course module, which we hope to offer in the future. Must see instructor for registration.

Assignments

16) Course Reflection

In addition to the standard evaluation form, please reflect for a bit and write up a short piece that addresses these questions.

  • How did the course compare to your expectations of the course?
  • What did you find most interesting/useful? Least interesting/useful?
  • What do we need to "package" so that other students or faculty could gain from what we've done?
  • What's the best format for delivering this material? In-situ for a class or classes? 1 credit class, etc. On-demand sessions?

Please turn this in with your course evaluation form to the envelope in Bobbi's office before the end of the day on Tuesday 11 December.

15) Third Visualization Project

Find a story and build a visualization to support it. You may choose the data sets, although you must incorporate at least three. You can choose to analyze/visualize one variable over multiple data sets or multiple variables over multiple data sets, include geocoding or not, etc. Find the common thread(s) that tie your data sets together and tell the story you want to tell.

Work in pairs:

  • Mobeen and Ivan
  • Dee and Leif
  • Emily and Tristan
  • Ryan and Elena
  • Mikel and Alex

Use one or more of these toolchains:

  • R
  • gnuplot

Write up a plan for your work; include a short description of the story you are telling, the specific data sets employed, and a sketch of the visualization. This is due in class on Tuesday 4 December. Please bring a printout of your plan to class. Come to class on Thursday 29 November with questions, ideas, etc.

The final visualization (PDF, etc. and script(s)) is due in class on Thursday 6 December. Come to class prepared to give a crisp (< 8 minute) presentation about your visualization. We will be advertising this class session to science students and faculty and encouraging them to attend by bribing them with free pizza.

14) Second Visualization Project Redux

Take the feedback you received on Tuesday morning and, working with your partner, improve your ice core data set visualization. Due in class on Thursday 15 November. Remember to upload your modified script and PDF, PNG, etc. to the wiki.

13) Second Visualization Project

Find a story and build a visualization to support it based on ice core data sets. There are many available, e.g. from multiple sites in Antarctica and elsewhere. These data sets typically contain depth, measurements of particulate matter, atmospheric chemical compositions, and various climate and date proxies. You can choose to analyze/visualize one variable over multiple locations or multiple variables over a single location. Include at least three dimensions, e.g. location on the earth, depth/date, and climate proxy, or depth/date, chemical marker, and climate proxy, etc.

Work in pairs:

  • Mobeen and Mikel
  • Dee and Leif
  • Emily and Tristan
  • Ryan and Elena
  • Ivan and Alex

Use one or more of these toolchains:

  • R
  • gnuplot

Write up a plan for your work; include a short description of the story you are telling, the specific data sets employed, and a sketch of the visualization. This is due in class on Tuesday 6 November. Please bring a printout of your plan to class.

The final visualization (PDF, etc. and script(s)) is due in class on Tuesday 13 November. Come to class prepared to give a crisp (< 5 minute) presentation about your visualization.

12a) gnuplot Redux

Take the feedback you received on your gnuplot visualization and rework it. Put the updated output and the script in the usual place, appropriately labeled. Due before the start of class on Tuesday 6 November.

12) Getting Started with gnuplot

Due in class on Tuesday 30 October

  1. Identify 3 (or more) data sets that you can use to tell a story with an environmental theme.
  2. Develop your visualization using at least 25 unique commands in your gnuplot script.
  3. Use color; bonus points for 3D.
  4. Post your script and the output (PNG, JPG, etc.) on the student solutions wiki page /before class/ on Tuesday 30 October.
  5. Come to class on Tuesday prepared to give a < 5 minute crisp presentation about your visualization.
  6. You should know what your theme/data sets are by class on Thursday 25 October.

11) Science Magazine Review

Due in class on Thursday 25 October

  1. Browse the issue of Science that is on reserve for this class in Wildman. Find what you believe is a really well done viz, and a really poorly done one. Come to class prepared to give a short (< 5 minute) tour of the two of them, explaining what they are and why one is good and the other is bad.

10) Getting Started with R

Due in class on Thursday 18 October.

  1. First R lab - Post your first R visualizations /before 12p on Thursday/ to the student solutions page on the wiki, and then during class on Thursday you should briefly describe/discuss each in turn (a maximum of 5 minutes each). Make sure you watch the time so all of you have an opportunity to present your work.
  2. Explore, or re-explore as the case may be, the R galleries. Look at the scripts that produce the visualizations and figure out how you might leverage some of those patterns.

9) Reading

Due in class on Tuesday 16 October.

  1. Chapters 1 and 2 in Designing Data Visualizations (previously assigned)
  2. Chapters 1 and 2 in Visualize This (previously assigned)
  3. Overview, Form and Structure, Process and Time in Visual Strategies (previously assigned)
  4. Part II (chapters 3, 4, 5, 6) in Designing Data Visualizations

8) First Visualization (redux)

Due in class on Tuesday 9 October. Use the feedback you received from the class and the professors to refine and improve your first visualization. Post the revised version using your placeholder on the Student Solutions page and bring a printout of it to class. Come to class prepared to give a crisp 4-minute before-and-after presentation to the class.

Finish the reading that was assigned earlier.

7) First Visualization

Due in class on Tuesday 2 October, both a printout and the visualization posted on the wiki. Come to class prepared to spend about 5 minutes presenting your viz to the class on Tuesday morning.

6) Plan for First Visualization

The write-up of the plan for your first visualization project is due in class on Tuesday 25 September. This should include:

  • The question you are going to answer or story you are going to tell
  • The data sets you will use (including URLs if available)
  • Any numerical summaries you will produce
  • A hand drawn draft of the visualization

To prepare for this you should read/watch the following items before you design your visualization or write-up your plan.

5) Second Critique Tour

4) First Critique Tour

This assignment is to be done in class on Tuesday 11 September 2012. In pairs, review/critique one of these infographics from http://visual.ly/

  1. Human Languages on the Internet - Ivan, Mikel
  2. The Internet in 2015 - Leif, Dee
  3. Worldwide Internet Usage - Elena, Emily
  4. Technology and eCommerce - Tristan, Alex
  5. Responsive Web Design - Mobeen, Ryan

Each group should:

  • Evaluate the infographic using the criteria listed below.
  • Locate a second infographic, on Visual.ly or elsewhere, that covers roughly the same ground and evaluate it similarly.
  • Prepare and deliver a 4 minute presentation which summarizes your findings during the last portion of class this morning.

Consider the guidelines we are developing, Evaluating Infographics, as you examine the infographics.

3) First Workshop - Histograms

This assignment is designed to consolidate your knowledge of histograms and give you experience generating one with a modest data set. You must do the work by hand; you may optionally use a software tool to produce it as well. Make sure you document each step of your work. This workshop is due Thursday 13 September.
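
For checking hand work, the binning step behind a histogram can be sketched in a few lines. This is only an illustrative sketch in Python, not one of the course toolchains; the sample readings, the function names `histogram_counts` and `render_text`, and the choice of equal-width bins are all assumptions made for the example and are not part of the workshop handout.

```python
def histogram_counts(values, num_bins):
    """Split the range of `values` into equal-width bins and count each bin.

    Returns (edges, counts), where edges has num_bins + 1 entries.
    Equal-width binning is an assumption of this sketch.
    """
    if num_bins < 1:
        raise ValueError("num_bins must be at least 1")
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("all values are identical; no range to bin")
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        # The maximum value would land one past the last bin, so clamp it.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts


def render_text(edges, counts):
    """Return one text row per bin, with one '#' per observation."""
    rows = []
    for i, count in enumerate(counts):
        rows.append(f"{edges[i]:6.1f} to {edges[i + 1]:6.1f} | {'#' * count}")
    return rows


if __name__ == "__main__":
    # Hypothetical readings, made up purely for illustration.
    readings = [12, 15, 11, 19, 14, 13, 18, 20, 15, 16]
    edges, counts = histogram_counts(readings, 3)
    for row in render_text(edges, counts):
        print(row)
```

Each step in the function (find the range, choose a bin width, tally each value) corresponds to a step you would document when doing the work by hand, so the output can serve as a cross-check rather than a replacement.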

2) First Lab - Measuring the Real World

Measuring the real world, the PDF. This lab is due Sunday 9 September at 3p US-ET. Turn in a (BW) printout of your writeup and visualization, along with the URL of the on-line (color) version of the visualization if it is available. Put the paper copy in Charlie's Box A in the wooden tower in the Math/CS/Physics lounge on the West end of second floor of Dennis Hall at Earlham College in Richmond, IN, US (planet Earth).

1) First Reading and Tips and Techniques Tour

Listed below are the assignments for each chunk; note that everyone should read the startup materials.

  • Startup - Everyone
  • Web site - Leif
  • Making presentations - Mikel
  • News graphics - Ivan
  • Financial data - Elena
  • Decision making - Emily
  • Narrative - Dee
  • Aesthetics - Tristan
  • Graphic design - Alex
  • Scientific and engineering - Mobeen
  • Animations - Ryan

As you read your chunks, look for bits of guidance, advice, technique, etc. that you feel are useful. Summarize each of these in our Tips and Techniques Google Doc; make sure each entry contains an appropriate citation and follows the pattern/example at the top of the document. This tour is due Sunday 2 September.

Resources

Visualization Galleries (some with embedded tools, e.g. Many Eyes and Gapminder)

Data Sets

Advice and Technique

R

gnuplot

Course Specific

Bread Crumbs

  • Sunday 26 August (retrieve notes from board pictures)
    1. Relative error, absolute error, systematic error, and related topics
    2. Standard deviation
    3. Precision and accuracy
  • Thursday 30 August (harvest from Mic)
  • Tuesday 4 September (harvest notes from board picture)
    1. Histograms
  • Thursday 6 September
    1. Answered questions about first lab.
    2. Demonstrated how to upload files to the wiki, used for lab reports in PDF form.
  • Tuesday 11 September
    1. Discussion about when to aggregate, how many readings to take and related issues
    2. First critique tour (in-class)
  • Thursday 13 September
    1. Last of the first critique tour presentations
    2. Discuss next critique tour
  • Thursday 20 September
  • Tuesday 25 September
    1. In-class review and critique lab
  • Thursday 27 September
    1. Return and review first lab
    2. Q and A about first visualization project
  • Tuesday 2 October
    1. First visualization presentations
  • Thursday 4 October
    1. First visualization presentations (two stragglers)
  • Tuesday 9 October
    1. First visualization presentations (redux)
  • Tuesday 16 October
    1. R tour
  • Thursday 18 October
    1. Mic and Charlie at the board meeting, class reviewed R stuff
  • Tuesday 23 October
  • Thursday 25 October

Notes

Mic and Charlie's notes