Difference between revisions of "Leif-big-data"

Latest revision as of 12:43, 12 December 2011

Project title: Stories in Words
Project data set: Google Ngrams - 1gram (English)

Project Tasks

Identifying and downloading the target data set
- This project uses the Google Ngrams 1-gram (English) data set, which can be downloaded from Google Books at http://books.google.com/ngrams/datasets (CSV files 0-10).
Data cleaning and pre-processing
- The values in the raw CSV files are separated by tabs, so I used a script to replace the tabs with commas, as follows: tr '\t' ',' < input_file.csv > output_file.csv
Load the data into your Postgres instance
- I used a script that, when piped into Postgres, drops any existing tables, creates the tables, copies the data in, and then indexes the tables.
Develop queries to explore your ideas in the data
- I wrote a script that queries my database for the data I specify; it is included in my shared directory.
Develop and document the model function you are exploring in the data
- Exploring what stories can be told by graphing key words.
Develop a visualization to show the model/patterns in the data
- I have included a Keynote presentation in my public directory.
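
The cleaning, loading, and querying tasks above can be sketched end to end in Python. This is only an illustration, not the project's actual scripts: it uses an in-memory SQLite database as a stand-in for Postgres, and the table name `onegram`, the five-column layout (ngram, year, match count, page count, volume count), and the sample rows are all assumptions made up for the example. One difference from a plain character swap is that the `csv` module quotes any field that already contains a comma, so those rows keep the right number of columns.

```python
import csv
import io
import sqlite3

# Made-up sample rows in an ASSUMED five-column, tab-separated layout:
# ngram, year, match_count, page_count, volume_count
SAMPLE_TSV = (
    "war\t1914\t120\t90\t40\n"
    "war\t1913\t80\t60\t30\n"
    "peace\t1914\t50\t45\t20\n"
    "a,b\t1914\t1\t1\t1\n"
)


def tabs_to_commas(tsv_text):
    """Convert tab-separated text to comma-separated text.

    Fields that already contain a comma are quoted by csv.writer,
    so they are not split into extra columns.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


def load_rows(conn, csv_text):
    """Drop, create, fill, and index the (hypothetical) onegram table."""
    conn.execute("DROP TABLE IF EXISTS onegram")
    conn.execute(
        "CREATE TABLE onegram ("
        "ngram TEXT, year INTEGER, match_count INTEGER, "
        "page_count INTEGER, volume_count INTEGER)"
    )
    rows = csv.reader(io.StringIO(csv_text))
    conn.executemany("INSERT INTO onegram VALUES (?, ?, ?, ?, ?)", rows)
    # Index the lookup column after the bulk insert.
    conn.execute("CREATE INDEX onegram_ngram_idx ON onegram (ngram)")
    conn.commit()


def yearly_counts(conn, word):
    """Return (year, match_count) pairs for one word, ordered by year."""
    cur = conn.execute(
        "SELECT year, match_count FROM onegram "
        "WHERE ngram = ? ORDER BY year",
        (word,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_rows(conn, tabs_to_commas(SAMPLE_TSV))
    for year, count in yearly_counts(conn, "war"):
        print(year, count)
```

The per-year pairs returned by `yearly_counts` are the kind of series that could then be graphed for a chosen key word.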

Tech Details

Node: as6
Path to storage space: local machine
Path to project files: ~lnulric09/public/big_data/

Results

The visualization(s)
The story

@@ Line 1: / Line 1: @@
-* Project title
+* Project title: Stories in Words
-* Project data set
+* Project data set: Google Ngrams - 1gram (English)
 ===== Project Tasks =====
 #Identifying and downloading the target data set
-#Data cleaning and pre-processing
+#*This project uses Google Ngrams - 1gram (English) which can be downloaded from Google Books at [http://books.google.com/ngrams/datasets] 0-10 CSV files.
-#Load the data into your Postgres instance
+#Data cleaning and pre-processing
-#Develop queries to explore your ideas in the data
+#*The raw CSV file values are separated by TABS so I had to use a script to replace TABS with COMMAS as follows: tr '\t' ',' <input_file.csv>output_file.csv
+#Load the data into your Postgres instance
+#*I used a script which when piped into postgres drops existing tables, creates the tables, copies the data in, and then indexes the tables.
+#Develop queries to explore your ideas in the data
+#* I wrote a script to fish my database for the data I specify and that is included in my shared directory
 #Develop and document the model function you are exploring in the data
+#* Exploring what stories I can say about graphing key words
 #Develop a visualization to show the model/patterns in the data
+#* I have included a keynote presentation in my public directory
 ===== Tech Details =====
 * Node: as6
-* Path to storage space: /scratch/big-data/leif
+* Path to storage space: local machine
+* Path to project files: ~lnulric09/public/big_data/
 ===== Results =====
 * The visualization(s)
 * The story
