Difference between revisions of "Ivan-big-data"
Jump to navigation
Jump to search
Line 3: | Line 3: | ||
===== Project Tasks ===== | ===== Project Tasks ===== | ||
− | + | ==Identifying and downloading the target data set== | |
Data sets can be founded here: | Data sets can be founded here: | ||
* http://data.un.org/Data.aspx?q=gdp&d=SNAAMA&f=grID%3a101%3bcurrID%3aUSD%3bpcFlag%3a1 | * http://data.un.org/Data.aspx?q=gdp&d=SNAAMA&f=grID%3a101%3bcurrID%3aUSD%3bpcFlag%3a1 |
Revision as of 17:07, 4 December 2011
- Project title: Relationship between Homicide, Education, Abortion, HIV Incidence, Population and GDP for countries around the globe
- Project data set: United Nations DB (UNdata)
Contents
Project Tasks
Identifying and downloading the target data set
Data sets can be founded here:
- http://data.un.org/Data.aspx?q=gdp&d=SNAAMA&f=grID%3a101%3bcurrID%3aUSD%3bpcFlag%3a1
- http://data.un.org/Data.aspx?d=UNAIDS&f=inID%3a32
- http://data.un.org/Data.aspx?q=population&d=PopDiv&f=variableID%3a12
- http://data.un.org/Data.aspx?q=abortion&d=GenderStat&f=inID%3a12
- http://data.un.org/Data.aspx?q=education&d=UNESCO&f=series%3aXGOVEXP
- http://data.un.org/Data.aspx?d=UNODC&f=tableCode%3a1
- Data cleaning and pre-processing
The first obstacle I faced with cleaning and pre-processing was inconsistency in countries naming. For example name China in education and name People's Republic of China in homicide... So when I did full join of country columns I realized that not all of them are in one line (things that are supposed to be in one line). So I changed names and made it unique through all 6 data sets.
- Load the data into your Postgres instance
Data-sets I downloaded were in CSV files.
Here is an example for inserting data-set homicide into my PQSL:
- drop table homicide;
- create TABLE homicide (COUNTRY varchar primary key, YEAR int, RATE float);
- COPY homicide FROM '/home/postgres/HOMICIDE.csv' DELIMITER ';' CSV;
- Develop queries to explore your ideas in the data
- Develop and document the model function you are exploring in the data
- Develop a visualization to show the model/patterns in the data
Tech Details
- Node: as2
- Path to storage space: /scratch/big-data/ivan
Results
- The visualization(s)
- The story