Mobeen-big-data

From Earlham CS Department

Revision as of 08:38, 14 December 2011

Project title: MovieLens Data Sets

Project data set

  • This data set contains 10,000,054 ratings and 95,580 tags applied to 10,681 movies by 71,567 users of the online movie recommender service MovieLens.
  • Link to data set: http://www.grouplens.org/node/12

Project Tasks

1. Identifying and downloading the target data set

  • The downloaded data is on cluster at: /cluster/home/mmludin08/Big-Data-M

2. Data cleaning and pre-processing

  • The original data was in .dat format. A Perl script and a Python script were written to convert the data to another format and clean it.
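Here is a minimal Python sketch of the conversion step. It assumes the MovieLens 10M layout, where each line of ratings.dat is UserID::MovieID::Rating::Timestamp and each line of movies.dat is MovieID::Title::Genres, with the genres separated by `|`. This is not the original Perl or Python script. The functions `convert_dat` and `split_genres`, the column names and the sample lines are all hypothetical.

```python
import csv
import io


def convert_dat(lines, fieldnames, out):
    """Write '::'-delimited .dat lines to `out` as CSV, with a header row."""
    writer = csv.writer(out)
    writer.writerow(fieldnames)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split("::")
        if len(fields) != len(fieldnames):
            continue  # drop malformed records instead of failing the whole run
        writer.writerow(fields)  # csv.writer quotes values that contain commas


def split_genres(movie_lines):
    """Yield (movie_id, genre) pairs from movies.dat lines (MovieID::Title::Genres)."""
    for line in movie_lines:
        parts = line.strip().split("::")
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        movie_id, genres = parts[0], parts[-1]  # first field is MovieID, last is Genres
        for genre in genres.split("|"):
            yield int(movie_id), genre


if __name__ == "__main__":
    # Made-up sample lines in the assumed format, not taken from the real files.
    buf = io.StringIO()
    convert_dat(["1::122::5::838985046", "not a rating line", ""],
                ["user_id", "movie_id", "rating", "ts"], buf)
    print(buf.getvalue())
    print(list(split_genres(["7::Example Movie (1995)::Comedy|Drama"])))
```

To convert a real file, open the source with an explicit encoding and the destination with `newline=""` (as the `csv` module expects). Then pass both in place of the in-memory sample.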

3. Load the data into your Postgres instance

  • After cleaning, the data was uploaded to the cluster and to a laptop.
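One way to load the cleaned files is with a psql script that creates each table and bulk-loads it using `\copy`. The sketch below only generates that script as text. The table names, column types and CSV file names are assumptions, not the schema used in this project.

```python
def build_load_script(tables):
    """Return a psql script that creates each table and bulk-loads its CSV file.

    `tables` maps a table name to (list of (column, SQL type) pairs, CSV path).
    """
    statements = []
    for name, (columns, csv_path) in tables.items():
        cols = ",\n    ".join(f"{col} {sqltype}" for col, sqltype in columns)
        statements.append(f"CREATE TABLE {name} (\n    {cols}\n);")
        # \copy is a psql meta-command: the file is read on the client side, so it
        # does not need to be readable by the database server process.
        statements.append(f"\\copy {name} FROM '{csv_path}' WITH (FORMAT csv, HEADER true)")
    return "\n".join(statements) + "\n"


# Hypothetical table layout for the cleaned CSV files.
TABLES = {
    "ratings": (
        [("user_id", "integer"), ("movie_id", "integer"), ("rating", "real"), ("ts", "bigint")],
        "ratings.csv",
    ),
    "movie_genres": ([("movie_id", "integer"), ("genre", "text")], "movie_genres.csv"),
}

if __name__ == "__main__":
    print(build_load_script(TABLES))
```

You could save the generated script as load.sql and run it with `psql -d movielens -f load.sql`, where `movielens` is a placeholder database name. `HEADER true` tells Postgres to skip the first line of each CSV file.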

4. Develop queries to explore your ideas in the data

  • The SQL statements and their results are on the cluster at: /cluster/home/mmludin08/Big-Data-M
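The following is an example of the kind of exploratory query described above. It counts ratings and averages scores per genre. It assumes hypothetical `ratings` and `movie_genres` tables, with one row per movie and genre in `movie_genres`. To keep the sketch self-contained, it runs against an in-memory SQLite database standing in for Postgres. The query itself uses only standard SQL that Postgres also accepts.

```python
import sqlite3

# Exploratory query in plain SQL (JOIN, GROUP BY, COUNT, AVG), which Postgres
# also accepts. Table and column names are hypothetical.
RATINGS_PER_GENRE = """
SELECT g.genre, COUNT(*) AS n_ratings, AVG(r.rating) AS avg_rating
FROM ratings AS r
JOIN movie_genres AS g ON g.movie_id = r.movie_id
GROUP BY g.genre
ORDER BY n_ratings DESC, g.genre
"""


def ratings_per_genre(conn):
    """Return (genre, number of ratings, average rating) rows, most-rated first."""
    return conn.execute(RATINGS_PER_GENRE).fetchall()


def make_demo_db():
    """Build an in-memory SQLite database with a few made-up rows for testing."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE ratings (user_id INTEGER, movie_id INTEGER, rating REAL, ts INTEGER);
        CREATE TABLE movie_genres (movie_id INTEGER, genre TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO ratings VALUES (?, ?, ?, ?)",
        [(1, 10, 4.0, 0), (2, 10, 5.0, 0), (1, 20, 3.0, 0)],
    )
    conn.executemany(
        "INSERT INTO movie_genres VALUES (?, ?)",
        [(10, "Comedy"), (10, "Drama"), (20, "Drama")],
    )
    return conn


if __name__ == "__main__":
    for row in ratings_per_genre(make_demo_db()):
        print(row)
```

The join produces one row per rating and genre. As a result, a movie tagged with several genres is counted once under each of them.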

5. Develop and document the model function you are exploring in the data

  • For this project, my aim was to discover a timeline of movie genres. In other words, I wanted to find out during which periods of time people watch which types of movies. I also tried to look for the pattern
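The genre timeline can be sketched as a count of ratings per time period and genre, normalized to each genre's share of that period. The sketch makes two assumptions. First, the MovieLens rating timestamp stands in for when a movie was watched. Second, the period defaults to the calendar year in UTC. `genre_timeline` and its arguments are hypothetical names, not the model actually used in this project.

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone


def genre_timeline(ratings, movie_genres, period=lambda dt: dt.year):
    """Return {period: {genre: share}} built from rating timestamps.

    ratings: iterable of (user_id, movie_id, rating, ts), ts in Unix seconds.
    movie_genres: dict mapping movie_id -> list of genre names.
    period: maps a UTC datetime to a bucket key (calendar year by default).
    """
    counts = defaultdict(Counter)
    for _user, movie_id, _rating, ts in ratings:
        bucket = period(datetime.fromtimestamp(ts, tz=timezone.utc))
        # A movie with several genres contributes one count to each of them.
        for genre in movie_genres.get(movie_id, ()):
            counts[bucket][genre] += 1
    timeline = {}
    for bucket in sorted(counts):
        total = sum(counts[bucket].values())
        timeline[bucket] = {g: n / total for g, n in counts[bucket].items()}
    return timeline


if __name__ == "__main__":
    # Made-up ratings: two in 1970 and one in 1971 (UTC).
    ratings = [(1, 10, 4.0, 0), (2, 20, 5.0, 0), (3, 10, 3.0, 31_536_000)]
    genres = {10: ["Comedy"], 20: ["Drama", "Comedy"]}
    print(genre_timeline(ratings, genres))
    print(genre_timeline(ratings, genres, period=lambda dt: dt.hour))
```

Passing `period=lambda dt: dt.hour` instead groups the same ratings by hour of day (UTC). That covers the time-of-day reading of the question.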

6. Develop a visualization to show the model/patterns in the data

Tech Details
  • Node: as7
  • Path to storage space: /scratch/big-data/mobeen
Results
  • The visualization(s)
  • The story