Difference between revisions of "Tristan-big-data"

From Earlham CS Department

Revision as of 10:31, 3 December 2011

  • Examining Trends in a Performance Sport
  • Data set: WCA Database
Project Tasks

-Identifying and downloading the target data set

The WCA dataset was easy to obtain: it is published as a file of SQL INSERT statements, which can be downloaded from http://worldcubeassociation.org/results/misc/export.html.

-Data cleaning and pre-processing

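The issue was that the .sql file was not written in Postgres's SQL dialect (backtick-quoted names and smallint(n) display widths are MySQL syntax), so some bulk modifications to the file had to be made. The main changes were replacing smallint(n) with int and removing the backticks from around `tablename`.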
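The two substitutions described above can be sketched in Python as follows. This is only an illustration, not the script actually used for the project: the function names and file paths are hypothetical, and the backtick rule is deliberately naive (it would also strip paired backticks that happen to appear inside string data).

```python
import re

# smallint with a display width, e.g. smallint(6); the width is not valid in Postgres.
SMALLINT_WIDTH = re.compile(r"\bsmallint\(\d+\)", re.IGNORECASE)
# A name wrapped in backticks, e.g. `Results`.
BACKTICKED_NAME = re.compile(r"`([^`]+)`")


def to_postgres(line: str) -> str:
    """Apply the two rewrites from the notes to one line of the dump."""
    line = SMALLINT_WIDTH.sub("int", line)   # smallint(n) -> int
    return BACKTICKED_NAME.sub(r"\1", line)  # `tablename` -> tablename


def convert_file(src_path: str, dst_path: str) -> None:
    """Rewrite a whole dump line by line (paths are placeholders)."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(to_postgres(line))
```

Note that once the backticks are gone the names are unquoted, so Postgres folds them to lowercase; queries written later need to account for that.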

-Load the data into your Postgres instance

It took a few attempts to get the whole script to run cleanly, but it eventually ran successfully from my directory on BigFe.
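One way to make repeated load attempts easier to debug is to have psql stop at the first failing statement instead of continuing through the rest of the file. The sketch below assumes the psql client is on the PATH; the database name and file name are hypothetical placeholders, not the ones used on BigFe.

```python
import subprocess


def psql_load_command(sql_path: str, dbname: str) -> list[str]:
    """Build a psql invocation that aborts on the first SQL error."""
    return [
        "psql",
        "-d", dbname,               # target database (placeholder name)
        "-v", "ON_ERROR_STOP=1",    # stop at the first failing statement
        "-f", sql_path,             # the converted dump file
    ]


def load(sql_path: str, dbname: str) -> None:
    """Run the load; raises CalledProcessError if psql exits non-zero."""
    subprocess.run(psql_load_command(sql_path, dbname), check=True)
```

For example, `load("wca_postgres.sql", "tristan")` would run the converted file against a database named tristan, if one exists.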
  1. Develop queries to explore your ideas in the data
  2. Develop and document the model function you are exploring in the data
  3. Develop a visualization to show the model/patterns in the data
Tech Details
  • Node: as3
  • Path to storage space: /scratch/big-data/tristan
Results
  • The visualization(s)
  • The story