Leif-big-data

Identifying and downloading the target data set
- This project uses the Google Ngrams 1-gram (English) data set, which can be downloaded from Google Books at [1] (CSV files 0-10).
Data cleaning and pre-processing
- The values in the raw CSV files are separated by tabs, so I used a one-line script to replace the tabs with commas: tr '\t' ',' < input_file.csv > output_file.csv
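
The `tr` one-liner is fine as long as no field already contains a comma or a double quote; a token that does would end up split across columns. A minimal sketch of a quoting-aware alternative, assuming Python 3 is available (the function name `tabs_to_commas` is hypothetical and not part of the original script):

```python
import csv
import io


def tabs_to_commas(src, dst):
    """Copy tab-separated rows from src to dst as comma-separated CSV."""
    # QUOTE_NONE on the reader: quote characters in the raw data are
    # treated as ordinary text rather than as field delimiters.
    reader = csv.reader(src, delimiter="\t", quoting=csv.QUOTE_NONE)
    # The writer quotes any field that contains a comma or a quote.
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        writer.writerow(row)


# Usage on in-memory text; for real files, pass handles opened with newline="".
out = io.StringIO()
tabs_to_commas(io.StringIO("a,b\t1900\t5\n"), out)
print(out.getvalue(), end="")
```

Here the field `a,b` is written with surrounding quotes, so it still loads as a single column.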
Load the data into your Postgres instance
- I used a script that, when piped into Postgres, drops any existing tables, creates the tables, copies the data in, and then indexes the tables.
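
The four steps (drop, create, copy, index) can be sketched as a small Python generator whose output is piped into `psql`. This is not the original script: the table name `onegram`, the column names and types, and the file path are assumptions for illustration, and the real column layout depends on which Ngrams release was downloaded.

```python
def build_load_script(table, csv_path):
    """Return SQL that drops, recreates, loads, and indexes one table."""
    # Column layout is an assumption; adjust it to match the downloaded files.
    # Names are interpolated directly, so pass trusted values only.
    return f"""\
DROP TABLE IF EXISTS {table};
CREATE TABLE {table} (
    ngram        text,
    year         integer,
    match_count  bigint,
    volume_count bigint
);
\\copy {table} FROM '{csv_path}' WITH (FORMAT csv)
CREATE INDEX {table}_ngram_idx ON {table} (ngram);
"""


if __name__ == "__main__":
    # Example: python build_load.py | psql mydb   (file and db names hypothetical)
    print(build_load_script("onegram", "output_file.csv"), end="")
```

Building the index after the copy step, as the description above does, is usually faster than loading into an already indexed table.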
Develop queries to explore your ideas in the data
- I wrote a script that queries my database for the data I specify; it is included in my shared directory.
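
As an illustration of the kind of query such a script might run, here is a minimal sketch that totals a word's occurrences per year. It is not the script from the shared directory. It uses Python's built-in `sqlite3` as a stand-in so that it runs without a Postgres server, and the table `onegram`, its columns, and the sample rows are invented for the example. Against Postgres, the same SQL would go through a database driver, using that driver's placeholder style instead of `?`.

```python
import sqlite3


def yearly_counts(conn, word):
    """Return [(year, total_match_count), ...] for one word, oldest first."""
    sql = (
        "SELECT year, SUM(match_count) FROM onegram "
        "WHERE ngram = ? GROUP BY year ORDER BY year"
    )
    # The word is passed as a bound parameter, not pasted into the SQL text.
    return conn.execute(sql, (word,)).fetchall()


# Tiny in-memory table with invented rows, only to exercise the query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE onegram (ngram TEXT, year INTEGER, match_count INTEGER)")
conn.executemany(
    "INSERT INTO onegram VALUES (?, ?, ?)",
    [("war", 1914, 10), ("war", 1914, 5), ("war", 1918, 7), ("peace", 1918, 3)],
)
print(yearly_counts(conn, "war"))
```

The per-year totals returned by a query like this are the series that would then be graphed.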
Develop and document the model function you are exploring in the data
- I am exploring what stories can be told by graphing key words.
Develop a visualization to show the model/patterns in the data
- I have included a Keynote presentation in my public directory.
