Mobeen-big-data

Revision as of 08:38, 14 December 2011

This data set contains 10,000,054 ratings and 95,580 tags applied to 10,681 movies by 71,567 users of the online movie recommender service MovieLens.

* Link to data set: http://www.grouplens.org/node/12
=== Project Tasks ===
==== 1. Identifying and downloading the target data set ====
* The downloaded data is stored on the cluster at: /cluster/home/mmludin08/Big-Data-M
==== 2. Data cleaning and pre-processing ====
* The original data was in the .dat format. A Perl script and a Python script were written to convert the format and clean the data.
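As an illustration of this conversion step (not the original Perl or Python scripts, which are not reproduced on this page), here is a minimal Python sketch. It assumes the `::`-separated layout used by the MovieLens 10M `.dat` files, for example `UserID::MovieID::Rating::Timestamp` in `ratings.dat` and `MovieID::Title::Genres` in `movies.dat`, and writes CSV with Python's `csv` module so that titles containing commas are quoted. The function names `dat_to_rows` and `rows_to_csv` are hypothetical.

```python
import csv
import io
from typing import Iterable, List


def dat_to_rows(lines: Iterable[str], sep: str = "::") -> List[List[str]]:
    """Split '::'-separated .dat lines into lists of fields (assumed MovieLens layout)."""
    rows: List[List[str]] = []
    for line in lines:
        line = line.rstrip("\r\n")  # drop Unix or Windows line endings
        if not line.strip():
            continue  # skip blank lines
        rows.append([field.strip() for field in line.split(sep)])
    return rows


def rows_to_csv(rows: List[List[str]], header: List[str]) -> str:
    """Serialize rows as CSV text; csv.writer quotes fields such as titles containing commas."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Example with a made-up line in the assumed movies.dat layout (MovieID::Title::Genres).
sample = ["5::Example Movie, The (1999)::Comedy|Drama\n"]
print(rows_to_csv(dat_to_rows(sample), ["movie_id", "title", "genres"]))
```

Converting to CSV rather than another ad hoc delimiter lets the cleaned files be loaded directly with Postgres's CSV import, and the pipe-separated genre list is kept as a single field to be split later.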
==== 3. Load the data into your Postgres instance ====
* After cleaning, the data was uploaded to the cluster and to a laptop.
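One way to load cleaned CSV files into Postgres is a psql script of `CREATE TABLE` statements followed by client-side `\copy` commands. The sketch below only generates such a script as text; the table names, column names and types in `SCHEMA` are hypothetical, `numeric(2,1)` assumes half-star ratings between 0.5 and 5, and the CSV files are assumed to have a header row.

```python
from typing import Dict, List, Tuple

# Hypothetical schema: table/column names and types are illustrative assumptions.
# Dict insertion order is kept, so "movies" is created before "ratings" references it.
SCHEMA: Dict[str, List[Tuple[str, str]]] = {
    "movies": [("movie_id", "integer PRIMARY KEY"), ("title", "text"), ("genres", "text")],
    "ratings": [
        ("user_id", "integer"),
        ("movie_id", "integer REFERENCES movies"),
        ("rating", "numeric(2,1)"),
        ("ts", "bigint"),  # Unix timestamp in seconds
    ],
}


def load_script(schema: Dict[str, List[Tuple[str, str]]], csv_dir: str) -> str:
    """Build a psql script: CREATE TABLE for each table, then one \\copy per table."""
    lines: List[str] = []
    for table, columns in schema.items():
        cols = ",\n    ".join(f"{name} {sqltype}" for name, sqltype in columns)
        lines.append(f"CREATE TABLE {table} (\n    {cols}\n);")
    for table in schema:
        # \copy is a psql meta-command and must fit on one line; the file is read client-side.
        lines.append(f"\\copy {table} FROM '{csv_dir}/{table}.csv' WITH (FORMAT csv, HEADER true)")
    return "\n".join(lines) + "\n"


print(load_script(SCHEMA, "/tmp/movielens"))
```

Saved as `load.sql`, the output could be run with `psql -d movielens -f load.sql` (the database name is an assumption). Using `\copy` instead of server-side `COPY` means the CSV files only need to be readable by the machine running psql, which suits loading the same files on both a cluster and a laptop.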
==== 4. Develop queries to explore your ideas in the data ====
* The SQL statements and their results are stored on the cluster at: /cluster/home/mmludin08/Big-Data-M
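The actual queries are on the cluster and not reproduced here. As a hedged illustration of the kind of query this step involves, the sketch below counts ratings per year and genre. It uses Python's built-in `sqlite3` module as a self-contained stand-in for Postgres and assumes hypothetical tables `movies(movie_id, title, genres)` and `ratings(user_id, movie_id, rating, ts)` with pipe-separated genres and Unix timestamps. In Postgres the same idea would typically use `extract(year FROM to_timestamp(ts))` and `unnest(string_to_array(genres, '|'))` in place of SQLite's `strftime` and the Python-side split.

```python
import sqlite3
from typing import List, Tuple


def genre_counts_by_year(conn: sqlite3.Connection) -> List[Tuple[str, str, int]]:
    """Count ratings per (rating year, genre), most-rated genre first within each year."""
    # Normalize the pipe-separated genre list into one row per (movie, genre).
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS movie_genres (movie_id INTEGER, genre TEXT)")
    conn.execute("DELETE FROM movie_genres")  # avoid double counting on repeated calls
    movies = conn.execute("SELECT movie_id, genres FROM movies").fetchall()
    conn.executemany(
        "INSERT INTO movie_genres VALUES (?, ?)",
        [(movie_id, g) for movie_id, genres in movies for g in genres.split("|")],
    )
    query = """
        SELECT strftime('%Y', r.ts, 'unixepoch') AS year, mg.genre, COUNT(*) AS n
        FROM ratings AS r
        JOIN movie_genres AS mg ON mg.movie_id = r.movie_id
        GROUP BY year, mg.genre
        ORDER BY year, n DESC, mg.genre
    """
    return conn.execute(query).fetchall()
```

A rating of a movie tagged with several genres is counted once for each of its genres, so the per-year counts sum to more than the number of ratings in that year.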
==== 5. Develop and document the model function you are exploring in the data ====
* For this project, my aim was to discover the timeline of movie genres; in other words, I wanted to find out during which periods of time people watch which types of movies. I also tried to look for the pattern
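A minimal sketch of such a genre-timeline model, assuming that "period of time" means the calendar year (UTC) of each rating's Unix timestamp and that genres are pipe-separated as in the MovieLens movie list. `genre_timeline` returns, for each period, each genre's share of that period's genre-tagged ratings, and `peak_period` reports when a genre's share was highest. Both names are hypothetical, and normalizing by the period total is one modelling choice among several: it keeps years with very different rating volumes comparable, where raw counts would mostly track overall activity.

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone
from typing import Callable, Dict, Iterable, Tuple


def year_of(ts: int) -> int:
    """Default period: calendar year (UTC) of a Unix timestamp."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).year


def genre_timeline(
    ratings: Iterable[Tuple[int, str]],
    period: Callable[[int], int] = year_of,
) -> Dict[int, Dict[str, float]]:
    """Map each period to {genre: share of that period's genre-tagged ratings}."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for ts, genres in ratings:
        for genre in genres.split("|"):  # a multi-genre movie counts once per genre
            counts[period(ts)][genre] += 1
    timeline: Dict[int, Dict[str, float]] = {}
    for p, c in counts.items():
        total = sum(c.values())
        timeline[p] = {g: n / total for g, n in c.items()}
    return dict(sorted(timeline.items()))  # periods in ascending order


def peak_period(timeline: Dict[int, Dict[str, float]], genre: str) -> int:
    """Period in which the genre's share is highest (earliest period wins ties)."""
    return max(timeline, key=lambda p: (timeline[p].get(genre, 0.0), -p))
```

Passing a different `period` function, such as `lambda ts: datetime.fromtimestamp(ts, tz=timezone.utc).hour`, would bucket by hour of day instead; because the timestamps are UTC-based, those hours would not reflect each user's local time.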
==== 6. Develop a visualization to show the model/patterns in the data ====
 ===== Tech Details =====