Leif-big-data
Latest revision as of 11:43, 12 December 2011
- Project title: Stories in Words
- Project data set: Google Ngrams - 1gram (English)
Project Tasks
- Identifying and downloading the target data set
- This project uses the Google Ngrams 1gram (English) data set, which can be downloaded from Google Books at http://books.google.com/ngrams/datasets as the CSV files numbered 0-10.
- Data cleaning and pre-processing
- The raw CSV files are tab-separated, so I used a script to replace the tabs with commas: tr '\t' ',' < input_file.csv > output_file.csv
- Load the data into your Postgres instance
- I used a script that, when piped into Postgres, drops any existing tables, creates the tables, copies the data in, and then indexes them.
- Develop queries to explore your ideas in the data
- I wrote a script that queries the database for the data I specify; it is included in my shared directory.
- Develop and document the model function you are exploring in the data
- I am exploring what stories can be told by graphing key words over time.
- Develop a visualization to show the model/patterns in the data
- I have included a Keynote presentation in my public directory.
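The cleaning, loading, and query steps above can be sketched in Python as follows. This is a minimal, hypothetical sketch, not the scripts kept in the project directory. It uses Python's built-in sqlite3 module as a stand-in for Postgres and assumes each tab-separated row holds word, year, match_count, page_count, volume_count. The table name unigrams, the function names, and the sample rows are all made up for illustration (the sample numbers are not real Ngram counts).

```python
import csv
import io
import sqlite3


def tabs_to_commas(tsv_text):
    # Same job as: tr '\t' ',' < input_file.csv > output_file.csv
    # QUOTE_NONE keeps a literal double quote inside a word from being read as
    # a quoting character; the writer then quotes any field that contains a comma.
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


def load(conn, csv_text):
    # Mirrors the load script's steps: drop, create, copy the data in, index.
    # sqlite3 is a stand-in for Postgres; the column layout is assumed.
    conn.execute("DROP TABLE IF EXISTS unigrams")
    conn.execute(
        "CREATE TABLE unigrams (word TEXT, year INTEGER, match_count INTEGER, "
        "page_count INTEGER, volume_count INTEGER)"
    )
    rows = csv.reader(io.StringIO(csv_text))
    conn.executemany("INSERT INTO unigrams VALUES (?, ?, ?, ?, ?)", rows)
    conn.execute("CREATE INDEX unigrams_word_idx ON unigrams (word)")
    conn.commit()


def yearly_counts(conn, word):
    # Returns [(year, total match_count), ...] for one key word, oldest first.
    cur = conn.execute(
        "SELECT year, SUM(match_count) FROM unigrams WHERE word = ? "
        "GROUP BY year ORDER BY year",
        (word,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    # Invented sample rows, for illustration only.
    sample = "war\t1900\t10\t8\t5\nwar\t1901\t12\t9\t6\npeace\t1900\t7\t6\t4\n"
    conn = sqlite3.connect(":memory:")
    load(conn, tabs_to_commas(sample))
    print(yearly_counts(conn, "war"))  # [(1900, 10), (1901, 12)]
```

The per-year pairs returned by yearly_counts are what would be graphed for each key word. Using the csv module rather than tr means a comma inside a word is quoted instead of silently becoming an extra column. In Postgres, the executemany loop would normally be replaced by a COPY statement.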
Tech Details
- Node: as6
- Path to storage space: local machine
- Path to project files: ~lnulric09/public/big_data/
Results
- The visualization(s)
- The story