Mobeen-big-data

From Earlham CS Department

Revision as of 08:32, 14 December 2011 by Mmludin08 (talk | contribs) (→‎Project data set)



Project title:

Contents

1 MovieLens Data Sets
2 Project data set
- 2.1 Project Tasks
3 #Identifying and downloading the target data set
4 #Data cleaning and pre-processing
- 4.1 Tech Details
- 4.2 Results

MovieLens Data Sets

Project data set

This data set contains 10,000,054 ratings and 95,580 tags applied to 10,681 movies by 71,567 users of the online movie recommender service MovieLens.
Link to data set: http://www.grouplens.org/node/12

Project Tasks

#Identifying and downloading the target data set

The downloaded data is on the cluster at: /cluster/home/mmludin08/Big-Data-M

#Data cleaning and pre-processing

The original data was in the .dat format. One Perl script and one Python script were written to change the format and clean the data.
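The format conversion could look roughly like the sketch below. It is not the project's actual script. It assumes the "::" field separator used by the MovieLens .dat files, and the function name `dat_to_csv` and the sample rows are made up for illustration.

```python
import csv
import io


def dat_to_csv(dat_lines, out_file, delimiter="::"):
    """Write '::'-separated records to out_file as CSV.

    Returns the number of rows written. Blank lines are skipped.
    """
    writer = csv.writer(out_file)
    count = 0
    for line in dat_lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        # csv.writer quotes any field that itself contains a comma,
        # which matters for movie titles.
        writer.writerow(line.split(delimiter))
        count += 1
    return count


# Illustrative sample rows (hypothetical, in UserID::MovieID::Rating::Timestamp order).
sample = ["1::122::5::838985046\n", "\n", "1::185::5::838983525\n"]
buffer = io.StringIO()
rows_written = dat_to_csv(sample, buffer)
```

Using the csv module instead of a plain string replace keeps titles that contain commas in a single column.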

Load the data into your Postgres instance

After cleaning, the data was uploaded to the cluster and to a laptop.

Develop queries to explore your ideas in the data

SQL statements with results are on the cluster: /cluster/home/mmludin08/Big-Data-M

Develop and document the model function you are exploring in the data

For this project my aim was to discover the movie genre timeline. In other words, I wanted to find out during which periods people watch which types of movies. I also tried to look for the pattern
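One way to express a genre timeline is to count ratings per genre per year. The sketch below is an illustration under stated assumptions, not the model actually used in the project: it treats the rating timestamp (seconds since the Unix epoch) as a proxy for when a movie was watched, and it assumes genres are stored as a "|"-separated string per movie. The function name `genre_counts_by_year` and the sample data are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def genre_counts_by_year(ratings, movie_genres):
    """Count ratings per (year, genre).

    ratings: iterable of (movie_id, unix_timestamp) pairs.
    movie_genres: dict mapping movie_id to a '|'-separated genre string.
    """
    counts = Counter()
    for movie_id, timestamp in ratings:
        # Assumption: the rating time approximates the viewing time.
        year = datetime.fromtimestamp(timestamp, tz=timezone.utc).year
        for genre in movie_genres.get(movie_id, "").split("|"):
            if genre:  # ignore movies with no genre information
                counts[(year, genre)] += 1
    return counts


# Hypothetical sample data for illustration only.
movie_genres = {1: "Comedy|Romance", 2: "Action"}
ratings = [(1, 0), (2, 0), (1, 946684800)]  # 946684800 is 2000-01-01 UTC
timeline = genre_counts_by_year(ratings, movie_genres)
```

A movie with several genres contributes one count to each of them, so the per-year totals can exceed the number of ratings.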

Develop a visualization to show the model/patterns in the data

Tech Details

Node: as7
Path to storage space: /scratch/big-data/mobeen

Results

The visualization(s)
The story
