Elena-big-data

From Earlham CS Department
Revision as of 21:10, 7 December 2011

  • Title: Stereotypes Through Statistics
  • Dataset used: A Profile of Immigrant Population in the 21st century in OECD Countries
  • Aims and Ideas: With a dataset on immigrant populations, I had the chance to build different population profiles, with the aim of confirming or disproving certain stereotypes about immigrants and about different nations. This includes looking at occupations, countries of birth and labour force status.
  • Complications: Unfortunately, my data did not include any unique identifiers, which made the dataset hard to work with and made it impossible to answer some of the queries I wanted to run. The data also did not cover a range of years, which limited the ways I could explore it. When viewing my results, please keep in mind that the data was collected for the year 2000 and covers only OECD countries.
Project Tasks
  1. Identifying and downloading the target data set

The dataset can be downloaded from here: http://www.oecd.org/document/51/0,3746,en_2649_33931_40644339_1_1_1_1,00.html

  2. Data cleaning and pre-processing:

The data is in CSV format. I had to eliminate a few characters: I removed the ^M (carriage return) characters with dos2unix file1 > file2

  3. Load the data into your Postgres instance:
  4. Develop queries to explore your ideas in the data
  5. Develop and document the model function you are exploring in the data
  6. Develop a visualization to show the model/patterns in the data
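The ^M cleanup in the data cleaning step can also be sketched in Python, for cases where dos2unix is not available. This is only an illustration of the idea, not the command actually used for this project: the function names are hypothetical, and the sketch converts only Windows-style CRLF line endings to LF, leaving every other byte untouched.

```python
from pathlib import Path


def crlf_to_lf(data: bytes) -> bytes:
    """Replace Windows-style CRLF line endings with Unix-style LF."""
    return data.replace(b"\r\n", b"\n")


def convert_file(src: str, dst: str) -> int:
    """Write a cleaned copy of src to dst.

    Returns the number of carriage-return bytes that were removed.
    Both paths are supplied by the caller (hypothetical helper).
    """
    raw = Path(src).read_bytes()
    cleaned = crlf_to_lf(raw)
    Path(dst).write_bytes(cleaned)
    return len(raw) - len(cleaned)
```

Calling convert_file("file1", "file2") would write the cleaned copy to file2 and leave file1 unchanged. Working on bytes rather than text avoids guessing the encoding of the CSV file.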
Tech Details
  • Node: as5
  • Path to storage space: /scratch/big-data/elena
Results
  • The visualization(s)
  • The story