Visualizations

From Earlham CS Department
Revision as of 16:23, 25 August 2012 by Elena (talk | contribs) (First Reading and Making Visualizations Tour)

First Reading and Making Visualizations Tour

Listed below are the assignments for each chunk. Note that everyone should read the startup materials.

  • Startup - Everyone
  • Web site - Leif
  • Making presentations - Mikel
  • News graphics - Ivan
  • Financial data - Elena
     •	Avoid visual distortion in data graphics (“The Visual Display of Quantitative Information”, Chapter 2)
        o A table is usually the best way to show a small set of numbers: for 20 numbers or fewer, prefer a table to a graph (p.56)
        o The representation of numbers should be directly proportional to the numerical quantities represented (p.56)
        o Use clear, detailed labeling to defeat graphical distortion and ambiguity (p.56)
        o Show data variation, not design variation (p.61)
     •	Avoid inaccurate reflection of reality (“The Visual Display of Quantitative Information”, Chapter 2)
       o If plotting government spending and debt over the years, take population and inflation into account (p.68)
       o In time-series displays of money, deflated and standardized units of monetary measurement are nearly always better than nominal units (p.68)
     •	Using 2 or 3 varying dimensions to show one-dimensional data is a weak and inefficient technique: the number of information-carrying dimensions depicted should not exceed the number of dimensions in the data (p.71)
     •	Context is essential for graphical integrity (“The Visual Display of Quantitative Information”, Chapter 2)
       o In quantitative thinking, data graphics should answer the question “Compared to what?” (p.74)
       o Graphics must not quote data out of context (p.74)
     •	Multiples help to monitor and analyse the multi-variable processes typical of finance, combining overview with detail (“Visual Explanations”, p.110-111)
       o Blending quantitative multiples, narrative text and images is useful for monitoring data-rich processes (p.110)
     •	Consider sparklines: datawords, that is, data-intense, design-simple, word-sized graphics (“Beautiful Evidence”, p.46-63)
       o Track and compare changes over time by showing the overall trend along with local detail (p.50)
       o Should often be embedded in text and tables, which opens the possibility of writing with data graphics (p.49)
       o Daily sparkline data can be standardized and scaled depending on the content: by the range of the price, the inflation-adjusted price, or the percent change from a market baseline (p.50)
       o By showing recent change in relation to many past changes, sparklines give context for nuanced analysis and better decisions (p.50)
       o Efficiently display and narrate binary data (presence/absence, occurrence/non-occurrence, win/loss) (p.54)
       o Show intensity/frequency of occurrence (p.55)
       o Improve conventional statistical graphics of univariate and bivariate marginal distributions, since univariate sparklines can link up 2-D plots (p.57)
    •	When designing a sparkline, think about (“Beautiful Evidence”, p.46-63):
       o Variations in slope are best detected when slopes average about 45 degrees (p.60)
       o Sparklines should be moderately greater in length than in height (p.60)
       o Use the maximum reasonable vertical space available under the word-like constraint, then adjust the horizontal stretch of the time-scale to meet the lumpy criterion (p.60)
       o Contextual methods for quantifying sparklines: choose an encoding (p.61)
       o Change the relative weight of the data-lines and mute the contrast between the data and the background to reduce optical noise (p.62)
       o Print in a single color, a judicious mix of 2, flat color, or stochastic color methods to avoid color dots (p.62)
       o Avoid strong frames around sparklines, which create unintentional optical clutter (p.62)
       o Resolution of sparklines is better on paper than on computer screens (p.63)
       o Printing and viewing at a data density of 500 sparklines on A3-size paper (about 30X42 cm, or 11X17 in) puts the data adjacent in space, which assists comparison, search, pattern-finding, exploration, replication, and review (p.63)
  • Decision making - Emily
  • Narrative - Dee
  • Aesthetics - Tristan
  • Graphic design - Alex
  • Scientific and engineering - Mobeen
  • Animations - Ryan
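
Two notes from the Financial data chunk lend themselves to a small worked example: deflating a money series before plotting it, and drawing the result as a word-sized sparkline. The sketch below is one possible reading of those notes, not a method taken from the books. It is written in Python purely for illustration. The function names, the block-character encoding, and all of the numbers are hypothetical.

```python
# Hypothetical illustration values; these are NOT real spending, price-index,
# or population figures.
BARS = "▁▂▃▄▅▆▇█"


def real_per_capita(nominal, price_index, population, base_index=100.0):
    """Convert nominal amounts to constant-price, per-person amounts."""
    return [
        amount * (base_index / index) / people
        for amount, index, people in zip(nominal, price_index, population)
    ]


def sparkline(values):
    """Render a series as a word-sized row of Unicode block characters."""
    low, high = min(values), max(values)
    span = high - low
    if span == 0:
        # A flat series has no variation to show; use the lowest bar throughout.
        return BARS[0] * len(values)
    return "".join(
        BARS[int((v - low) / span * (len(BARS) - 1))] for v in values
    )


nominal_spending = [100.0, 110.0, 121.0, 133.0, 146.0]  # made-up currency units
price_index = [100.0, 105.0, 110.0, 116.0, 122.0]       # made-up index, base = 100
population = [10.0, 10.4, 10.8, 11.2, 11.6]             # made-up head counts

real_spending = real_per_capita(nominal_spending, price_index, population)
print("nominal:", sparkline(nominal_spending))
print("real per capita:", sparkline(real_spending))
```

Because each sparkline here is scaled to its own minimum and maximum, the two rows compare shape rather than level; a shared scale would be needed to compare magnitudes.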

As you read your chunks, look for bits of guidance, advice, technique, etc. that you feel are useful. Summarize each of these in our Making Visualizations page, and make sure each entry contains an appropriate citation.

First Lab (DRAFT)

Measuring the real world.

Overview

Math/CS 484 -- The goal of our Ford/Knight project is to distill and organize the principles of visualizing large data sets. Modern science is often done by small groups of people who come from diverse backgrounds, e.g. a mathematician, a biologist, and a computer scientist. We plan to solicit input, in the form of example data sets to work with, from each of the natural and social science departments on campus. This work will provide a foundation for a course, or course module, which we hope to offer in the future. Must see instructor for registration.

Course Schedule (DRAFT)

  • Week 1 -- Visualization Basics
  1. lab on data collection
  2. begin work on course products
    1. guide -- do's and don'ts for good infographics
    2. transferable vignettes
    3.  ??
  • Week 2 -- Visualization Basics
  1. lab on turning reports into data into information
  2. continue work on course products
  • Week 3 -- Exploratory Data Analysis
  1. lab on EDA -- numerical and graphical summaries
  2. continue work on course products
  • Week 4 -- Exploratory Data Analysis
  1. lab on EDA
  2. continue work on course products
  • Week 5 -- Visualization Tools (notice the links below)
  1. Tools assignment -- low tech, high tech
  2. continue work on course products
  • Week 6 -- Visualization Tools
  1. Tools assignment -- critical reviews of existing visualizations
  2. continue work on course products
  • Week 7 -- Visualization Tools
  1. Tools assignment
  2. continue work on course products
  • Week 8 -- Visualization Tools
  1. Tools assignment
  2. continue work on course products
  • Week 9 -- Projects
  1. Projects assignment
  2. continue work on course products
  • Week 10 -- Projects
  1. Projects assignment -- documenting choices and assumptions
  2. continue work on course products
  • Week 11 -- Projects
  1. Projects assignment
  2. continue work on course products
  • Week 12 -- Projects
  1. Projects assignment
  2. continue work on course products
  • Week 13 -- Projects
  1. Projects assignment
  2. continue work on course products
  • Week 14 -- Projects
  1. Projects assignment
  2. continue work on course products
  • Week 15 -- Projects
  1. Projects presentation
  2. complete work on course products

Short-term To Do List

  1. Figure out which books the library should purchase, and probably put them on reserve through the fall (Charlie)
  2. Look at on-line courses in this area (Mic)

Examples

Press

NPR did a couple of interesting segments on Big Data, visualizations, and the search for mathematicians and others who can do that stuff. (December, 2011)

New York Times article from December, 2011 on bioinformatics and visualization, MicJ

Other

Presentations

Keywords

  • infographics
  • Big data
  • work flow(s)

The People

  • Mic Jackson, Mathematics & Environmental Science
  • Charlie Peck, Computer Science
  1. Diana Ainembabazi
  2. Ivan Babic
  3. Leif DeJong
  4. Ryan Lake
  5. Mobeen Ludin
  6. Emily Pavlovic
  7. Mikel Qafa
  8. Alex Reid
  9. Elena Sergienko
  10. Tristan Wright

Tools

Topics

  1. Long-term turtle size, sex, age, climate by year from Western Nebraska (JohnI)
    • Von Bertalanffy growth model, special case of Fisher models?
  2. Long-term iguana size, sex, age, climate (8 years only) from the Bahamas (Exumas island) (JohnI)
    • Von Bertalanffy growth model, special case of Fisher models?
  3. Why do turtles lay the number, size, type and frequency of eggs that they do?
    • What are the common patterns?
    • Which dimensions aren't accounted for?
      • Latitude and longitude?
      • Habitat?
      • Phylogeny?
      • Climate?
      • What other data sets are available?
  4. How to distinguish between variations within a species vs different species
    • Standardized morphometric data (as opposed to meristic data, e.g. counts of the number of scales between body parts), size standardized
    • Currently using multivariate statistics, about 25 variables
    • Looking for one image with all populations and variables
    • Looking for structure
  5. Phylogenetic reconstruction, visualizing trees with multiple models (JohnI)
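
The von Bertalanffy growth model named under topics 1 and 2 is commonly written as L(t) = L_inf * (1 - exp(-k * (t - t0))), where L_inf is the asymptotic size, k the growth rate, and t0 the theoretical age at size zero. The sketch below simply evaluates that curve, in Python for illustration only. The parameter values are hypothetical and are not estimates for any turtle or iguana population.

```python
import math


def von_bertalanffy_length(age, l_inf, k, t0=0.0):
    """Expected length at a given age: L(t) = L_inf * (1 - exp(-k * (t - t0)))."""
    return l_inf * (1.0 - math.exp(-k * (age - t0)))


# Hypothetical parameters, chosen only to show the shape of the curve.
ages = list(range(0, 31, 5))
curve = [von_bertalanffy_length(a, l_inf=200.0, k=0.15) for a in ages]

for age, length in zip(ages, curve):
    print(f"age {age:2d}: length {length:6.1f}")
```

Growth is fastest at young ages and levels off toward L_inf, which is why a plot of size against age by sex or by year can show at a glance whether the model is a plausible fit.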

Techniques

  1. Principal component analysis
  2. Discriminant function analysis
  3. Data conditioning and translation, CSV and XML
  4. Gridded and non-gridded data
  5. Ideas that Michael suggested
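
As one small instance of item 3, data translation between CSV and XML, the sketch below converts CSV text into a simple XML document using only the Python standard library. It is an illustration under stated assumptions, not the workflow the group settled on. The column names, the tag names `records` and `record`, and the sample rows are all hypothetical, and column names are assumed to be valid XML element names.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text, root_tag="records", row_tag="record"):
    """Translate CSV text with a header row into a flat XML string."""
    root = ET.Element(root_tag)
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = ET.SubElement(root, row_tag)
        for field, value in row.items():
            # Assumption: every header is usable as an XML element name
            # and every row has the same number of fields as the header.
            ET.SubElement(record, field).text = (value or "").strip()
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample rows, not real field measurements.
sample = "id,sex,length_mm\n1,F,182\n2,M,164\n"
xml_text = csv_to_xml(sample)
print(xml_text)
```

Conditioning steps such as trimming whitespace happen in one place here, which keeps the translation repeatable when the source file is updated.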

Sources

  1. Mic's books
  2. Charlie's books
  3. Dave's viz workshop at Kean
  4. Web sources

Schedule

  • Looking for 2-3 hours of meeting time, possibly one shorter and one longer
  • Noon on Monday, Thursday, or Friday
  • 4p-7p Monday, Wednesday, Thursday, Friday (modulo sport practice)

The Plan

1) Planning items

  • Are there any field trip opportunities?
  • Figure out what books to order
  • Figure out the likely conference opportunities
  • Are there any other tools besides R that we should be considering?
    • GRASS?

2) Things to learn

  • Is there a somewhat canonical process or technique that one can reliably apply to go from readings -> data -> information? At which stage(s) is a visualization helpful?
  • How to utilize geocoding attributes?
  • How to utilize timestamp attributes?
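
One possible starting point for the last two questions is to bin observations by year and by latitude/longitude grid cell, which turns scattered point records into gridded counts that can be mapped or drawn as small multiples. The Python sketch below shows that idea only. The record layout (ISO timestamp, latitude, longitude), the one-degree cell size, and the sample records are all assumptions made for illustration.

```python
import math
from collections import Counter
from datetime import datetime


def bin_observations(records, cell_degrees=1.0):
    """Count observations per (year, grid cell).

    Each record is (iso_timestamp, latitude, longitude). A grid cell is
    identified by the coordinates of its south-west corner.
    """
    counts = Counter()
    for timestamp, lat, lon in records:
        year = datetime.fromisoformat(timestamp).year
        cell = (
            math.floor(lat / cell_degrees) * cell_degrees,
            math.floor(lon / cell_degrees) * cell_degrees,
        )
        counts[(year, cell)] += 1
    return counts


# Hypothetical records, not real field observations.
records = [
    ("2011-06-03T09:15:00", 41.2, -102.7),
    ("2011-07-10T14:00:00", 41.8, -102.1),
    ("2012-06-01T08:30:00", 41.5, -101.4),
]

for (year, cell), count in sorted(bin_observations(records).items()):
    print(year, cell, count)
```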

3) Things to read

4) Things to do during the class

5) Questions

  • Which parts of statistics do people need to know?
    • correlation for PCA
  • What linear algebra do people need to know?
    • matrix operations for PCA
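
To make the two answers above concrete, the sketch below computes a correlation matrix and then extracts its first principal component with power iteration, using only matrix-vector products. It is a teaching sketch in plain Python under stated assumptions, not a substitute for a statistics package. The function names and the measurements are hypothetical, and power iteration from an all-ones start vector can fail if that vector happens to be orthogonal to the leading component.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation between every pair of variables (one list per variable)."""
    n = len(columns[0])
    standardized = []
    for col in columns:
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
        standardized.append([(x - mean) / sd for x in col])
    return [
        [sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in standardized]
        for ci in standardized
    ]


def first_component(matrix, iterations=200):
    """Leading eigenvector and eigenvalue of a symmetric matrix by power iteration."""
    vector = [1.0] * len(matrix)
    for _ in range(iterations):
        product = [sum(m * v for m, v in zip(row, vector)) for row in matrix]
        norm = math.sqrt(sum(p * p for p in product))
        vector = [p / norm for p in product]
    # Rayleigh quotient v^T M v for the unit-length vector v.
    eigenvalue = sum(
        v * sum(m * w for m, w in zip(row, vector))
        for v, row in zip(vector, matrix)
    )
    return vector, eigenvalue


# Hypothetical measurements: three variables observed on five individuals.
measurements = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],
    [5.0, 3.0, 4.0, 1.0, 2.0],
]

corr = correlation_matrix(measurements)
loadings, variance = first_component(corr)
print("loadings:", [round(x, 3) for x in loadings])
print("share of total variance:", round(variance / len(corr), 3))
```

For a correlation matrix the eigenvalues sum to the number of variables, so dividing the leading eigenvalue by that number gives the share of variance the first component explains.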

6) Tools

  • R under Linux/OSX

7) Possible sources for data sets

  • John Iverson
    • turtle birthing data
    • phylogenetic reconstruction
  • Mike Deibel
  • Kathy Milar
  • Meg Streepy
    • GPlates - visualizing plate tectonics