Difference between revisions of "Leif-big-data"

From Earlham CS Department

Revision as of 13:49, 2 December 2011

  • Project title: Influence of the Hippie Movement Bringing Indian Themes into Western Literature
  • Project data set: Google Ngrams - 1gram (English)
Project Tasks
  • Identifying and downloading the target data set
    • This project uses Google Ngrams - 1gram (English), CSV files 0-10, which can be downloaded from http://books.google.com/ngrams/datasets.
  1. Data cleaning and pre-processing

The values in the raw CSV files are separated by tabs, so I used a script to replace the tabs with commas: tr '\t' ',' < input_file.csv > output_file.csv
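
One caveat with a plain character swap is that a field which already contains a comma (the comma token itself can appear as a 1gram) would be split into two columns. The following is a minimal Python sketch of the same conversion that quotes such fields, not the script actually used in this project. The function name tabs_to_commas and the sample rows are made up for illustration.

```python
import csv
import io


def tabs_to_commas(src, dst):
    """Copy tab-separated rows from src to dst as comma-separated rows.

    Returns the number of rows written.
    """
    # QUOTE_NONE: treat quote characters in the raw data as ordinary text.
    reader = csv.reader(src, delimiter="\t", quoting=csv.QUOTE_NONE)
    # The default writer quotes any field that contains a comma or a quote.
    writer = csv.writer(dst, lineterminator="\n")
    rows = 0
    for row in reader:
        writer.writerow(row)
        rows += 1
    return rows


# Hypothetical sample rows; the second one has a comma as its first field.
sample = io.StringIO("hippie\t1967\t120\t100\t80\n,\t1967\t5\t5\t3\n")
out = io.StringIO()
count = tabs_to_commas(sample, out)
print(out.getvalue(), end="")
```

For real files, the two StringIO objects would be replaced by files opened with open(path, newline="", encoding="utf-8"), assuming the downloads are UTF-8 text.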

  2. Load the data into your Postgres instance

I used a script which, when piped into Postgres, drops any existing tables, creates the tables, copies the data in, and then indexes the tables.
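
The load script itself is not reproduced on this page. As a rough sketch of what such a script could look like, the Python function below generates SQL text with the four steps in the order described: drop, create, copy, index. Everything specific in it is an assumption: the table name, the file names, the index, and the five columns (ngram, year, match_count, page_count, volume_count), which follow my understanding of the 1gram file layout and should be checked against the downloaded files.

```python
def build_load_script(table, csv_paths):
    """Return SQL text that drops, creates, loads, and indexes one table.

    The table and column names are illustrative assumptions, not the
    project's actual schema.
    """
    lines = [
        f"DROP TABLE IF EXISTS {table};",
        f"CREATE TABLE {table} (",
        "    ngram TEXT NOT NULL,",
        "    year INTEGER NOT NULL,",
        "    match_count BIGINT NOT NULL,",
        "    page_count BIGINT NOT NULL,",
        "    volume_count BIGINT NOT NULL",
        ");",
    ]
    # \copy is a psql meta-command that reads the file on the client side.
    for path in csv_paths:
        lines.append(f"\\copy {table} FROM '{path}' WITH (FORMAT csv)")
    # Build the index after loading so it is created once over all rows.
    lines.append(
        f"CREATE INDEX {table}_ngram_year_idx ON {table} (ngram, year);"
    )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical file names for the converted CSV files.
    print(build_load_script("ngrams_1gram", ["1gram-0.csv", "1gram-1.csv"]), end="")
```

Assuming the sketch is saved as build_load.py (a hypothetical name), its output could be piped in with: python build_load.py | psql <database>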

  3. Develop queries to explore your ideas in the data
  4. Develop and document the model function you are exploring in the data
  5. Develop a visualization to show the model/patterns in the data
Tech Details
  • Node: as6
  • Path to storage space: /scratch/big-data/leif
Results
  • The visualization(s)
  • The story