Mobeen-big-data

* Project title:
== MovieLens Data Sets ==

== Project data set ==
* This data set contains 10,000,054 ratings and 95,580 tags applied to 10,681 movies by 71,567 users of the online movie recommender service MovieLens.
* Link to data set: http://www.grouplens.org/node/12
  
 
===== Project Tasks =====
 
 
#Identifying and downloading the target data set
 
* The downloaded data is on the cluster at: /cluster/home/mmludin08/Big-Data-M
 
#Data cleaning and pre-processing  
 
* The original data was in .dat format. A Perl script and a Python script were written to change the format and clean the data.
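The conversion scripts themselves are not reproduced on this page. As a minimal sketch of the same idea, assuming the MovieLens 10M layout in which ratings.dat separates fields with "::" (UserID::MovieID::Rating::Timestamp), the Python below rewrites a .dat file as CSV so Postgres can bulk-load it. The file names and column names are illustrative, not necessarily the ones the project used.

<pre>
#!/usr/bin/env python3
# Illustrative sketch only (not the project's actual script):
# rewrite a '::'-delimited MovieLens .dat file as CSV.
import csv

def dat_to_csv(infile, outfile, columns):
    with open(infile, encoding='utf-8') as src, \
         open(outfile, 'w', newline='', encoding='utf-8') as dst:
        writer = csv.writer(dst)
        writer.writerow(columns)          # header row for COPY ... HEADER
        for line in src:
            line = line.strip()
            if line:                      # skip blank lines
                writer.writerow(line.split('::'))

if __name__ == '__main__':
    dat_to_csv('ratings.dat', 'ratings.csv',
               ['user_id', 'movie_id', 'rating', 'rated_at'])
    dat_to_csv('movies.dat', 'movies.csv',
               ['movie_id', 'title', 'genres'])
</pre>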
 
#Load the data into your Postgres instance  
 
* After cleaning, the data was uploaded to the cluster and to a laptop machine.
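The exact load procedure is not documented here; the sketch below shows one common way to do it, creating a table and bulk-loading the cleaned CSV with Postgres' COPY through psycopg2. The database name, user, table name, and column types are assumptions, not the project's recorded schema.

<pre>
# Illustrative load sketch; connection settings, table and column
# names are assumptions, not the project's actual schema.
import psycopg2

conn = psycopg2.connect(dbname='movielens', user='mmludin08')
cur = conn.cursor()

cur.execute("""
    CREATE TABLE IF NOT EXISTS ratings (
        user_id   integer,
        movie_id  integer,
        rating    numeric(2,1),
        rated_at  bigint      -- Unix timestamp from the raw data
    )
""")

# Bulk-load the CSV produced by the cleaning step.
with open('ratings.csv') as f:
    cur.copy_expert("COPY ratings FROM STDIN WITH (FORMAT csv, HEADER)", f)

conn.commit()
cur.close()
conn.close()
</pre>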
 
#Develop queries to explore your ideas in the data  
 
* SQL statements with results are on the cluster at: /cluster/home/mmludin08/Big-Data-M
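The project's actual SQL statements live only at that path. Purely as a hypothetical example of the kind of exploration this schema allows (using the assumed ratings table above plus a movies table loaded the same way from movies.csv), the query below lists the best-rated movies that have at least 1,000 ratings.

<pre>
# Hypothetical exploration query, not one of the project's recorded
# statements.  Assumes the ratings and movies tables described above.
import psycopg2

conn = psycopg2.connect(dbname='movielens', user='mmludin08')
cur = conn.cursor()

cur.execute("""
    SELECT m.title,
           round(avg(r.rating), 2) AS avg_rating,
           count(*)                AS num_ratings
    FROM   ratings r
    JOIN   movies  m ON m.movie_id = r.movie_id
    GROUP  BY m.title
    HAVING count(*) >= 1000
    ORDER  BY avg_rating DESC
    LIMIT  20
""")

for title, avg_rating, num_ratings in cur.fetchall():
    print(title, avg_rating, num_ratings)

conn.close()
</pre>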
 
#Develop and document the model function you are exploring in the data
 
 
#Develop a visualization to show the model/patterns in the data
 
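The finished model function and visualization are not recorded on this page. Purely as an illustration of what the last two tasks could look like, the sketch below pulls the distribution of rating values out of Postgres and plots it as a bar chart with matplotlib, using the same assumed table and connection settings as above.

<pre>
# Illustrative only: bar chart of how often each rating value occurs.
# Assumes the ratings table and connection settings sketched above.
import psycopg2
import matplotlib.pyplot as plt

conn = psycopg2.connect(dbname='movielens', user='mmludin08')
cur = conn.cursor()
cur.execute("""
    SELECT rating, count(*)
    FROM   ratings
    GROUP  BY rating
    ORDER  BY rating
""")
rows = cur.fetchall()
conn.close()

values = [float(r) for r, _ in rows]
counts = [c for _, c in rows]

plt.bar(values, counts, width=0.4)
plt.xlabel('Rating (stars)')
plt.ylabel('Number of ratings')
plt.title('MovieLens 10M: distribution of rating values')
plt.savefig('rating_distribution.png')
</pre>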

== Tech Details ==
* Node: as7
* Path to storage space: /scratch/big-data/mobeen

== Results ==
* The visualization(s)
* The story