Tristan-big-data
Revision as of 11:08, 3 December 2011
- Examining Trends in a Performance Sport
- Data set: WCA Database
Question
- What can we discover about how people improve in a field over time? To explore this, I looked through the World Cube Association (WCA) database, which holds tens of thousands of Rubik's Cube solves by thousands of competitors.
Project Tasks
- Identifying and downloading the target data set
- The WCA dataset was easy to download as a single file of SQL insert statements. The file can be downloaded from here.
- Data cleaning and pre-processing
- The main issue was that the .sql file was written for MySQL rather than PostgreSQL, so some bulk edits to the file were needed. These were chiefly changing smallint(n) to int and removing the backticks around identifiers such as `tablename`.
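As an illustration of those two edits, here is a sketch of a MySQL-style definition and its PostgreSQL-ready form. The table and column names are hypothetical and are not copied from the WCA dump.

<syntaxhighlight lang="sql">
-- MySQL-style definition, in the form described above (hypothetical table):
--   CREATE TABLE `ExampleTable` (
--     `pos` smallint(6) NOT NULL DEFAULT '0',
--     `personName` varchar(80) NOT NULL DEFAULT ''
--   );
-- After the edits: smallint(6) becomes int and the backticks are removed.
-- PostgreSQL folds the now-unquoted names to lower case (personName -> personname).
CREATE TABLE ExampleTable (
  pos int NOT NULL DEFAULT '0',
  personName varchar(80) NOT NULL DEFAULT ''
);
</syntaxhighlight>

Because the backticks are dropped rather than turned into double quotes, every identifier ends up lower case in Postgres, which is why queries against the loaded data use names such as personname and eventid.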
- Load the data into your Postgres instance
- It took a few attempts to get the whole script running, but it eventually ran successfully in my directory on BigFe.
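The following is one way to run the converted script from psql and spot-check the load. It is a sketch: the file name WCA_export.sql is an assumption, and the check simply counts rows per event in the results table.

<syntaxhighlight lang="sql">
-- Run inside psql, from the directory holding the converted dump
-- (the file name is an assumption).
\i WCA_export.sql

-- Spot check: number of result rows per event after loading.
SELECT eventid, count(*) AS num_results
FROM results
GROUP BY eventid
ORDER BY num_results DESC;
</syntaxhighlight>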
- Develop queries to explore your ideas in the data
- Competitors with more than 100 recorded 3x3 averages (average of 5):
<syntaxhighlight lang="sql">
SELECT personname, count(average) AS num_averages
FROM results
WHERE eventid = '333'          -- 3x3x3 event
GROUP BY personname
HAVING count(average) > 100
ORDER BY num_averages;
</syntaxhighlight>
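The question is about improvement over time rather than volume. A natural next query therefore lists one competitor's 3x3 averages in date order. This sketch assumes the following:
* The dump's competitions table has id, year, month and day columns.
* results.competitionid refers to competitions.id.
* Times are stored in centiseconds.

The competitor name is a placeholder.

<syntaxhighlight lang="sql">
-- One competitor's 3x3 averages in chronological order (sketch).
-- Assumed schema: competitions(id, year, month, day),
-- with results.competitionid referring to competitions.id.
-- Times assumed to be in centiseconds; values <= 0
-- (no result, DNF, DNS) are filtered out.
SELECT c.year, c.month, c.day,
       r.average / 100.0 AS average_seconds
FROM results r
JOIN competitions c ON c.id = r.competitionid
WHERE r.eventid = '333'
  AND r.average > 0
  AND r.personname = 'Example Competitor'  -- hypothetical name
ORDER BY c.year, c.month, c.day;
</syntaxhighlight>

Filtering on personid instead of personname would avoid mixing up competitors who share a name.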
- Develop and document the model function you are exploring in the data
- Develop a visualization to show the model/patterns in the data
Tech Details
- Node: as3
- Path to storage space: /scratch/big-data/tristan
Results
- The visualization(s)
- The story