Mobeen-big-data
Revision as of 12:57, 2 December 2011
- Project title: MovieLens Data Sets
Project data set
- This data set contains 10,000,054 ratings and 95,580 tags applied to 10,681 movies by 71,567 users of the online movie recommender service MovieLens.
- Link to data set: http://www.grouplens.org/node/12
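The counts above can be checked directly against the raw ratings file. The sketch below assumes the MovieLens `ratings.dat` layout of one rating per line in the form `UserID::MovieID::Rating::Timestamp`. The function name `summarize_ratings` is hypothetical, and the sample lines are made up for illustration; they are not taken from the data set.

```python
from typing import Iterable


def summarize_ratings(lines: Iterable[str]) -> dict:
    """Count ratings, distinct users and distinct movies in '::'-delimited lines.

    Assumes each non-empty line looks like UserID::MovieID::Rating::Timestamp.
    """
    n_ratings = 0
    users = set()
    movies = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline at end of file
        user_id, movie_id, _rating, _timestamp = line.split("::")
        n_ratings += 1
        users.add(int(user_id))
        movies.add(int(movie_id))
    return {"ratings": n_ratings, "users": len(users), "movies": len(movies)}


if __name__ == "__main__":
    # Hypothetical sample lines in the assumed ratings.dat format.
    sample = [
        "1::122::5::838985046",
        "1::185::5::838983525",
        "2::122::3::868245777",
        "",
    ]
    print(summarize_ratings(sample))  # {'ratings': 3, 'users': 2, 'movies': 2}
```

On the real file, an open file handle can be passed in place of the sample list, since iterating over a text file yields one line at a time.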
Project Tasks
- Identifying and downloading the target data set
  - The downloaded data is stored on the cluster at: /cluster/home/mmludin08/Big-Data-M
- Data cleaning and pre-processing
  - The original data was in .dat format. A Perl script and a Python script were written to convert the format and clean the data.
- Load the data into your Postgres instance
  - After cleaning, the data was uploaded to both the cluster and a laptop machine.
- Develop queries to explore your ideas in the data
  - The SQL statements and their results are on the cluster at: /cluster/home/mmludin08/Big-Data-M
- Develop and document the model function you are exploring in the data
- Develop a visualization to show the model/patterns in the data
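The format-conversion part of the cleaning step can be sketched as follows. This is not the project's Perl or Python script, which is not reproduced here. It is a minimal Python sketch that assumes the `::`-delimited MovieLens `.dat` layout and rewrites each record as a CSV row using the standard `csv` module. The module quotes fields that contain commas, which matters because some movie titles do. The function name `dat_to_csv`, the column names, the "wrong field count means malformed" rule, and the sample lines are all hypothetical.

```python
import csv
import io
from typing import Iterable, List, TextIO


def dat_to_csv(lines: Iterable[str], out: TextIO, header: List[str]) -> int:
    """Convert '::'-delimited .dat lines to CSV rows; return the number of data rows written.

    Blank lines are skipped, and lines whose field count does not match the
    header are treated as malformed and skipped (an assumed cleaning rule).
    """
    writer = csv.writer(out)  # quotes any field containing a comma or quote
    writer.writerow(header)
    written = 0
    for line in lines:
        fields = line.rstrip("\r\n").split("::")
        if fields == [""]:
            continue  # blank line
        if len(fields) != len(header):
            continue  # malformed record: wrong number of fields
        writer.writerow(fields)
        written += 1
    return written


if __name__ == "__main__":
    # Hypothetical sample lines in the assumed MovieID::Title::Genres layout.
    movies_dat = [
        "1::Example Movie (1999)::Comedy",
        "2::Example Film, The (2001)::Drama|Romance",
        "bad line",
    ]
    buf = io.StringIO()
    n = dat_to_csv(movies_dat, buf, ["movie_id", "title", "genres"])
    print(n)  # 2
    print(buf.getvalue())
```

When writing to a real file, open it with `newline=""` as the `csv` documentation recommends. Assuming a `movies` table with matching columns already exists, the resulting file could then be loaded from psql with `\copy movies FROM 'movies.csv' WITH (FORMAT csv, HEADER)`.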
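The project's own SQL statements are kept on the cluster and are not reproduced here. The sketch below is a hypothetical example of the kind of exploratory query the task describes: the average rating per movie, keeping only movies with at least a minimum number of ratings. To keep the example self-contained, it uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for Postgres. The SQL itself is standard. The `ratings` table layout, the `top_movies` function and the sample rows are assumptions.

```python
import sqlite3
from typing import List, Tuple


def top_movies(conn: sqlite3.Connection, min_ratings: int, limit: int) -> List[Tuple[int, float, int]]:
    """Return (movie_id, average rating, rating count), highest average first.

    Movies with fewer than `min_ratings` ratings are excluded, so a movie with
    a single 5-star rating does not top the list.
    """
    query = """
        SELECT movie_id, AVG(rating) AS avg_rating, COUNT(*) AS n_ratings
        FROM ratings
        GROUP BY movie_id
        HAVING COUNT(*) >= ?
        ORDER BY avg_rating DESC, movie_id
        LIMIT ?
    """
    return conn.execute(query, (min_ratings, limit)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ratings (user_id INTEGER, movie_id INTEGER, rating REAL, ts INTEGER)"
    )
    # Hypothetical sample rows: (user_id, movie_id, rating, timestamp)
    conn.executemany(
        "INSERT INTO ratings VALUES (?, ?, ?, ?)",
        [
            (1, 10, 5.0, 0),
            (2, 10, 4.0, 0),
            (1, 20, 3.0, 0),
            (2, 20, 3.5, 0),
            (3, 30, 5.0, 0),  # only one rating, so HAVING filters it out
        ],
    )
    print(top_movies(conn, min_ratings=2, limit=10))  # [(10, 4.5, 2), (20, 3.25, 2)]
```

The `?` placeholders are specific to `sqlite3`. When running the same query against Postgres through a driver such as psycopg2, the placeholders would be `%s` instead.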
Tech Details
- Node: as7
- Path to storage space: /scratch/big-data/mobeen
Results
- The visualization(s)
- The story