Difference between revisions of "Leif-big-data"

Revision as of 14:25, 3 December 2011

Identifying and downloading the target data set
- This project uses Google Ngrams - 1gram (English), which can be downloaded from [1] as CSV files 0-10.
Data cleaning and pre-processing
- The values in the raw CSV files are separated by tabs, so I used a script to replace the tabs with commas: tr '\t' ',' < input_file.csv > output_file.csv
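The conversion above can be wrapped in a small function that guards against one pitfall: tr replaces every tab blindly, so a file that already contains commas would end up with ambiguous columns. The following is a minimal sketch, assuming a POSIX shell; the function name tabs_to_commas and the raw/ and clean/ directories in the usage comment are hypothetical, not part of the original project.

```shell
#!/bin/sh
# Convert one tab-separated file to comma-separated.
# Refuses to run if the input already contains commas, because
# tr would then produce rows with ambiguous column boundaries.
tabs_to_commas() {
    in_file=$1
    out_file=$2
    if grep -q ',' "$in_file"; then
        echo "error: $in_file already contains commas" >&2
        return 1
    fi
    tr '\t' ',' < "$in_file" > "$out_file"
}

# Hypothetical usage: convert every downloaded file in raw/ into clean/.
# for f in raw/*.csv; do tabs_to_commas "$f" "clean/$(basename "$f")"; done
```

If the check fails for a real file, a CSV-aware tool that quotes fields would be a safer choice than tr.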
Load the data into your Postgres instance
- I used a script which, when piped into Postgres, drops any existing tables, creates the tables, copies the data in, and then indexes the tables.
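The original load script is not reproduced on this page, so the following is only a sketch of the drop, create, copy, index sequence described above. The table name onegram, the index names, and the five-column layout (word, year, match count, page count, volume count) are assumptions for illustration and should be adjusted to match the actual files. The function prints SQL to standard output so that it can be piped into psql; \copy is the psql meta-command that reads a file on the client side.

```shell
#!/bin/sh
# Emit SQL that loads one cleaned, comma-separated file.
# Table, column, and index names are hypothetical, not taken
# from the project's own script.
emit_load_sql() {
    csv_file=$1
    cat <<EOF
DROP TABLE IF EXISTS onegram;
CREATE TABLE onegram (
    word         text,
    year         integer,
    match_count  bigint,
    page_count   bigint,
    volume_count bigint
);
\\copy onegram FROM '$csv_file' WITH (FORMAT csv)
CREATE INDEX onegram_word_idx ON onegram (word);
CREATE INDEX onegram_year_idx ON onegram (year);
EOF
}

# Hypothetical usage (mydb is a placeholder database name):
# emit_load_sql output_file.csv | psql mydb
```

Creating the indexes after the copy, as the description above does, avoids updating them row by row during the bulk load. Words that contain quote characters may need extra handling in CSV mode.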
Develop queries to explore your ideas in the data
Develop and document the model function you are exploring in the data
Develop a visualization to show the model/patterns in the data

Revision as of 15:01, 2 December 2011 by Lnulric09		Revision as of 14:25, 3 December 2011 by Lnulric09
Line 1:		Line 1:
−	* Project title: ~~Influence of the Hippie Movement Bringing Indian Themes into Western Literature~~	+	* Project title: Stories in a Word
	* Project data set: Google Ngrams - 1gram (English)		* Project data set: Google Ngrams - 1gram (English)