Difference between revisions of "Leif-big-data"

Latest revision as of 11:43, 12 December 2011

Project title: Stories in Words
Project data set: Google Ngrams - 1gram (English)

Project Tasks

Identifying and downloading the target data set
- This project uses Google Ngrams - 1gram (English), CSV files 0-10, which can be downloaded from Google Books at http://books.google.com/ngrams/datasets
Data cleaning and pre-processing
- The values in the raw CSV files are separated by tabs, so I used a script to replace the tabs with commas: tr '\t' ',' < input_file.csv > output_file.csv
Load the data into your Postgres instance
- I used a script which, when piped into Postgres, drops any existing tables, creates the tables, copies the data in, and then indexes the tables.
Develop queries to explore your ideas in the data
- I wrote a script that queries my database for the data I specify; it is included in my shared directory.
Develop and document the model function you are exploring in the data
- Exploring what stories I can tell by graphing key words
Develop a visualization to show the model/patterns in the data
- I have included a Keynote presentation in my public directory.
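
The cleaning and loading steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual scripts (those are in the project directory): the function names, the table name, and the index are hypothetical, and the column layout assumes each 1-gram row holds ngram, year, match count, page count, and volume count.

```shell
#!/bin/sh
# Sketch only: function names, table name, and column layout are assumptions.

# Replace every tab in file $1 with a comma and write the result to $2
# (the same tr command used in the pre-processing step).
tabs_to_commas() {
    tr '\t' ',' < "$1" > "$2"
}

# Print a psql script that drops, creates, loads, and indexes table $1
# from the comma-separated file $2.
emit_load_sql() {
    table=$1
    csv=$2
    # Unquoted heredoc: "\\copy" is emitted as "\copy", psql's client-side copy.
    cat <<SQL
DROP TABLE IF EXISTS $table;
CREATE TABLE $table (
    ngram        text,
    year         integer,
    match_count  bigint,
    page_count   bigint,
    volume_count bigint
);
\\copy $table FROM '$csv' WITH (FORMAT csv)
CREATE INDEX ${table}_ngram_idx ON $table (ngram);
SQL
}
```

With these definitions, something like `tabs_to_commas raw_0.csv clean_0.csv` followed by `emit_load_sql onegram clean_0.csv | psql my_database` would run the whole sequence (the file and database names here are placeholders). One caveat: if a word in the data itself contains a comma or a double quote, the converted file may not load cleanly as CSV, so those rows might need separate handling.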

Tech Details

Node: as6
Path to storage space: local machine
Path to project files: ~lnulric09/public/big_data/

Results

The visualization(s)
The story

@@ Line 1: / Line 1: @@
-* Project title: Stories in a Word
+* Project title: Stories in Words
 * Project data set: Google Ngrams - 1gram (English)
 ===== Project Tasks =====
 #Identifying and downloading the target data set
-#*This project uses Google Ngrams - 1gram (English) which can be downloaded from [http://books.google.com/ngrams/datasets] 0-10 CSV files.
+#*This project uses Google Ngrams - 1gram (English) which can be downloaded from Google Books at [http://books.google.com/ngrams/datasets] 0-10 CSV files.
 #Data cleaning and pre-processing
 #*The raw CSV file values are separated by TABS so I had to use a script to replace TABS with COMMAS as follows: tr '\t' ',' <input_file.csv>output_file.csv
@@ Line 10: / Line 10: @@
 #*I used a script which when piped into postgres drops existing tables, creates the tables, copies the data in, and then indexes the tables.
 #Develop queries to explore your ideas in the data
+#* I wrote a script to fish my database for the data I specify and that is included in my shared directory
 #Develop and document the model function you are exploring in the data
+#* Exploring what stories I can say about graphing key words
 #Develop a visualization to show the model/patterns in the data
+#* I have included a keynote presentation in my public directory
 ===== Tech Details =====
 * Node: as6
 * Path to storage space: local machine
-* Path to Data Setup Script
+* Path to project files: ~lnulric09/public/big_data/
-* Path to Data Extraction Script
-* Path to GNUPLOT Script
-* Path to Raw Extracted Data
-* Path to Presentation
 ===== Results =====
 * The visualization(s)
 * The story
