Mobeen-big-data
* Project title: MovieLens Data Sets

== Project data set ==

* This data set contains 10,000,054 ratings and 95,580 tags applied to 10,681 movies by 71,567 users of the online movie recommender service MovieLens.
* Link to data set: http://www.grouplens.org/node/12
  
 
===== Project Tasks =====

# Identifying and downloading the target data set
* The downloaded data is on the cluster at: /cluster/home/mmludin08/Big-Data-M
 
# Data cleaning and pre-processing
* The original data was in .dat format; a Perl script and a Python script were written to convert the format and clean the data. A sketch of the conversion step is shown below.
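A minimal sketch of the format conversion, assuming the '::'-delimited layout of the MovieLens 10M files (lines of the form <code>UserID::MovieID::Rating::Timestamp</code> in ratings.dat). The actual Perl and Python scripts used for the project are not reproduced here, and the file names are illustrative.

<pre>
#!/usr/bin/env python3
"""Convert a '::'-delimited MovieLens .dat file to CSV.

Usage:  python3 dat_to_csv.py ratings.dat ratings.csv
"""
import csv
import sys

def dat_to_csv(src_path, dst_path, sep="::"):
    # latin-1 covers the accented characters that appear in some movie titles
    with open(src_path, encoding="latin-1") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        for line in src:
            line = line.rstrip("\n")
            if line:                      # skip any blank lines
                writer.writerow(line.split(sep))

if __name__ == "__main__":
    dat_to_csv(sys.argv[1], sys.argv[2])
</pre>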
 
# Load the data into your Postgres instance
* After cleaning, the data was loaded into Postgres on both the cluster and a laptop machine. A sketch of the loading step is shown below.
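A sketch of loading the cleaned ratings into Postgres with psycopg2 and COPY. The table layout, database name, and CSV file name are assumptions for illustration; the project's actual schema is not recorded on this page.

<pre>
#!/usr/bin/env python3
"""Create a ratings table and bulk-load the cleaned CSV with COPY."""
import psycopg2

DDL = """
CREATE TABLE ratings (
    user_id   integer,
    movie_id  integer,
    rating    numeric(2,1),
    rated_at  bigint        -- raw Unix timestamp from the data set
);
"""

conn = psycopg2.connect(dbname="movielens")       # assumed database name
with conn, conn.cursor() as cur:
    cur.execute(DDL)
    with open("ratings.csv") as f:                # output of the cleaning step
        cur.copy_expert("COPY ratings FROM STDIN WITH CSV", f)
conn.close()
</pre>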
 
# Develop queries to explore your ideas in the data
* SQL statements with their results are on the cluster at: /cluster/home/mmludin08/Big-Data-M. A few illustrative queries are sketched below.
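A few illustrative exploratory queries, run against the hypothetical <code>ratings</code> table from the loading sketch above; the SQL actually used for the project lives in the cluster directory listed above.

<pre>
#!/usr/bin/env python3
"""Run a few exploratory queries against the ratings table."""
import psycopg2

QUERIES = {
    "overall average rating":
        "SELECT round(avg(rating), 3) FROM ratings",
    "most active users":
        "SELECT user_id, count(*) AS n FROM ratings "
        "GROUP BY user_id ORDER BY n DESC LIMIT 5",
    "best-rated movies with at least 1000 ratings":
        "SELECT movie_id, count(*) AS n, round(avg(rating), 2) AS avg_rating "
        "FROM ratings GROUP BY movie_id "
        "HAVING count(*) >= 1000 ORDER BY avg_rating DESC LIMIT 5",
}

conn = psycopg2.connect(dbname="movielens")
with conn, conn.cursor() as cur:
    for label, sql in QUERIES.items():
        cur.execute(sql)
        print(label + ":", cur.fetchall())
conn.close()
</pre>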
 
# Develop and document the model function you are exploring in the data
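The page does not record which model function was actually explored, so the following is only a hedged example of the kind of model one could fit here: average movie rating as a linear function of the base-10 log of how many ratings a movie has received, using the same hypothetical <code>ratings</code> table.

<pre>
#!/usr/bin/env python3
"""Fit a simple illustrative model: mean rating vs. popularity."""
import numpy as np
import psycopg2

conn = psycopg2.connect(dbname="movielens")
with conn, conn.cursor() as cur:
    cur.execute("SELECT count(*), avg(rating) FROM ratings GROUP BY movie_id")
    rows = cur.fetchall()
conn.close()

n_ratings   = np.array([float(n) for n, _ in rows])
mean_rating = np.array([float(m) for _, m in rows])

# Least-squares fit of:  mean_rating = a * log10(n_ratings) + b
a, b = np.polyfit(np.log10(n_ratings), mean_rating, 1)
print("mean_rating ~= {:.3f} * log10(n_ratings) + {:.3f}".format(a, b))
</pre>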
 
# Develop a visualization to show the model/patterns in the data
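The visualization produced for the project is not described on this page either; as a placeholder, this sketch plots the distribution of rating values with matplotlib, again using the hypothetical table from the loading sketch.

<pre>
#!/usr/bin/env python3
"""Plot the distribution of rating values as a bar chart."""
import matplotlib.pyplot as plt
import psycopg2

conn = psycopg2.connect(dbname="movielens")
with conn, conn.cursor() as cur:
    cur.execute("SELECT rating, count(*) FROM ratings "
                "GROUP BY rating ORDER BY rating")
    rows = cur.fetchall()
conn.close()

values = [float(r) for r, _ in rows]
counts = [int(c) for _, c in rows]

plt.bar(values, counts, width=0.4)
plt.xlabel("Rating value")
plt.ylabel("Number of ratings")
plt.title("Distribution of MovieLens ratings")
plt.savefig("rating_distribution.png")
</pre>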

===== Tech Details =====

* Node: as7
* Path to storage space: /scratch/big-data/mobeen

===== Results =====

* The visualization(s)
* The story