Difference between revisions of "Leif-big-data"

From Earlham CS Department

Revision as of 14:49, 2 December 2011

  • Project title: Influence of the Hippie Movement Bringing Indian Themes into Western Literature
  • Project data set: Google Ngrams - 1gram (English)
Project Tasks
  • Identifying and downloading the target data set
    • This project uses the Google Ngrams - 1gram (English) data set (CSV files 0-10), which can be downloaded from http://books.google.com/ngrams/datasets.
  • Data cleaning and pre-processing
    • The values in the raw CSV files are separated by tabs, so I used a script to replace the tabs with commas: tr '\t' ',' < input_file.csv > output_file.csv
  • Load the data into your Postgres instance
    • I used a script which, when piped into Postgres, drops any existing tables, creates the tables, copies the data in, and then indexes the tables.
  • Develop queries to explore your ideas in the data
  • Develop and document the model function you are exploring in the data
  • Develop a visualization to show the model/patterns in the data
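The cleaning and loading steps above could be sketched roughly as follows. This is a minimal sketch, not the script actually used for the project: the function names, the table name ngrams_1gram, the index name, and the column names and types are all assumptions made for illustration. It also assumes that no field contains a comma or a double quote; any token that does would need extra handling before a plain tr swap is safe.

```shell
#!/bin/sh

# Convert one tab-separated file ($1) into a comma-separated file ($2),
# using the same tr command as in the task list.
# Assumption: no field contains a comma or a double quote.
tabs_to_commas() {
    tr '\t' ',' < "$1" > "$2"
}

# Print SQL that drops, creates, fills and indexes one table.
# $1 is the path to the comma-separated file.
# The table, index and column names are hypothetical.
emit_load_sql() {
    csv_path=$1
    cat <<EOF
DROP TABLE IF EXISTS ngrams_1gram;
CREATE TABLE ngrams_1gram (
    ngram        text,
    year         integer,
    match_count  bigint,
    page_count   bigint,
    volume_count bigint
);
-- Server-side COPY: the file must be readable by the Postgres server.
COPY ngrams_1gram FROM '$csv_path' WITH (FORMAT csv);
CREATE INDEX ngrams_1gram_ngram_idx ON ngrams_1gram (ngram);
EOF
}

# Example usage (the database name is a placeholder):
#   tabs_to_commas input_file.csv /scratch/big-data/leif/output_file.csv
#   emit_load_sql /scratch/big-data/leif/output_file.csv | psql your_database
```

Generating the SQL as text keeps the drop, create, copy and index steps in one place, so the whole load can be re-run from scratch by piping the output into Postgres again. If the data file lives on the client rather than on the server, psql's \copy meta-command is the usual alternative to a server-side COPY.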
Tech Details
  • Node: as6
  • Path to storage space: /scratch/big-data/leif
Results
  • The visualization(s)
  • The story