Difference between revisions of "Leif-big-data"
Revision as of 13:49, 2 December 2011
- Project title: Influence of the Hippie Movement Bringing Indian Themes into Western Literature
- Project data set: Google Ngrams - 1gram (English)
Project Tasks
- Identifying and downloading the target data set
- This project uses the Google Ngrams 1-gram (English) data set, CSV files 0-10, which can be downloaded from http://books.google.com/ngrams/datasets.
- Data cleaning and pre-processing
- The values in the raw CSV files are separated by tabs, so I used a short command to replace the tabs with commas: tr '\t' ',' < input_file.csv > output_file.csv
- Load the data into your Postgres instance
- I used a script which, when piped into Postgres, drops any existing tables, creates the tables, copies the data in, and then indexes the tables.
- Develop queries to explore your ideas in the data
- Develop and document the model function you are exploring in the data
- Develop a visualization to show the model/patterns in the data
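The cleaning step in the task list can be wrapped in a small shell helper so that every downloaded file is converted the same way. This is only a sketch: the function names and the raw/clean directory layout are hypothetical and not part of the original project. It also assumes no 1-gram itself contains a comma; if one does, the converted line gains an extra field, and that case is not handled here.

```shell
#!/bin/sh
# Convert one tab-separated file to comma-separated,
# using the same tr command as in the cleaning step.
# Usage: tabs_to_commas input_file.csv output_file.csv
tabs_to_commas() {
    tr '\t' ',' < "$1" > "$2"
}

# Convert every *.csv file in a source directory into a destination directory.
# The directory layout is a hypothetical example, not the project's actual one.
# Usage: convert_all raw_dir clean_dir
convert_all() {
    src_dir=$1
    dst_dir=$2
    mkdir -p "$dst_dir"
    for f in "$src_dir"/*.csv; do
        [ -e "$f" ] || continue   # skip when the glob matched nothing
        tabs_to_commas "$f" "$dst_dir/$(basename "$f")"
    done
}
```

For example, `convert_all raw clean` would write a comma-separated copy of each file in `raw` to `clean` and leave the originals untouched.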
Tech Details
- Node: as6
- Path to storage space: /scratch/big-data/leif
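The load step described under Project Tasks (drop, create, copy, index) could look roughly like the sketch below, which prints SQL that can then be piped into psql. The original script is not shown on this page, so everything specific here is an assumption: the table name `onegram`, the five column names (guessed from the usual 1-gram layout of ngram, year, match count, page count, volume count), the index names, and the `*.csv` file pattern.

```shell
#!/bin/sh
# Print SQL that drops, creates, loads, and indexes a 1-gram table.
# Table, column, and index names are hypothetical placeholders.
# Usage: emit_load_sql data_dir
emit_load_sql() {
    data_dir=$1

    # Drop any existing table, then recreate it.
    cat <<'EOF'
DROP TABLE IF EXISTS onegram;
CREATE TABLE onegram (
    ngram        text,
    year         integer,
    match_count  bigint,
    page_count   bigint,
    volume_count bigint
);
EOF

    # One \copy line per comma-separated file. \copy is a psql
    # meta-command, so the file is read by the client running psql.
    for f in "$data_dir"/*.csv; do
        [ -e "$f" ] || continue   # skip when the glob matched nothing
        printf '%s\n' "\\copy onegram FROM '$f' WITH (FORMAT csv)"
    done

    # Index after loading, so the bulk copy does not maintain the indexes.
    cat <<'EOF'
CREATE INDEX onegram_ngram_idx ON onegram (ngram);
CREATE INDEX onegram_year_idx ON onegram (year);
EOF
}
```

With the storage path listed above, usage might be `emit_load_sql /scratch/big-data/leif | psql ngrams`, where the database name `ngrams` is again a placeholder.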
Results
- The visualization(s)
- The story