Difference between revisions of "Ivan-big-data"

Revision as of 18:08, 4 December 2011

Project title: Relationship between Homicide, Education, Abortion, HIV Incidence, Population and GDP for countries around the globe
Project data set: United Nations DB (UNdata)

1 Project Tasks
2 Identifying and downloading the target data set
3 Data cleaning and pre-processing
4 Load the data into your Postgres instance
5 Develop queries to explore your ideas in the data
6 Develop and document the model function you are exploring in the data
7 Develop a visualization to show the model/patterns in the data
- 7.1 Tech Details
- 7.2 Results

Project Tasks

Identifying and downloading the target data set

The data sets can be found here:
http://data.un.org/Data.aspx?d=UNODC&f=tableCode%3a1

Data cleaning and pre-processing

The first obstacle I faced during cleaning and pre-processing was inconsistent country naming. For example, the education data set uses the name China, while the homicide data set uses People's Republic of China. When I did a full join on the country columns, I found that rows that should have matched did not end up on the same line. So I renamed the countries so that each one has the same name across all 6 data sets.
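
The renaming step can be sketched as a small alias table applied to the country column before loading. This is a minimal sketch, not the script actually used for the project: the `ALIASES` mapping, the helpers `canonical_name` and `unmatched`, and the sample rows are all hypothetical. Only the China example comes from the text above.

```python
# Hypothetical alias table: variant spelling -> canonical name.
# Only the China entry comes from the text; extend it per data set.
ALIASES = {
    "People's Republic of China": "China",
}


def canonical_name(name: str) -> str:
    """Return the canonical country name for a raw name from any data set."""
    cleaned = name.strip()
    return ALIASES.get(cleaned, cleaned)


def unmatched(names_a, names_b):
    """Return canonical names present in only one of two data sets.

    This mirrors what a full join reveals: rows without a partner.
    """
    a = {canonical_name(n) for n in names_a}
    b = {canonical_name(n) for n in names_b}
    return sorted(a ^ b)


# Made-up sample rows for illustration only.
education = ["China", "France"]
homicide = ["People's Republic of China ", "France", "Brazil"]
print(unmatched(education, homicide))  # prints ['Brazil']
```

Running `unmatched` on each pair of data sets gives a list of names that still need an alias (or that are genuinely missing from one source) before the full join lines up.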

Load the data into your Postgres instance

The data sets I downloaded were CSV files.
Here is an example of loading the homicide data set into my Postgres instance:

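-- Drop any earlier copy so the script can be re-run without errors.
DROP TABLE IF EXISTS homicide;
CREATE TABLE homicide (COUNTRY varchar PRIMARY KEY, YEAR int, RATE float);
-- COPY reads the file on the server, so the Postgres server process must be able to read this path.
COPY homicide FROM '/home/postgres/HOMICIDE.csv' DELIMITER ';' CSV;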
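
Because `COUNTRY` is declared as the primary key, `COPY` aborts the whole load if a country appears twice, so it can help to check the file first. The following is a minimal sketch under assumptions: the helper name `check_rows` and the sample data are hypothetical, and the three-column, semicolon-delimited layout is inferred from the `CREATE TABLE` and `COPY` statements above.

```python
import csv
import io


def check_rows(text: str, delimiter: str = ";"):
    """Return a list of problems found in semicolon-delimited CSV text.

    Checks the assumed layout COUNTRY;YEAR;RATE and flags duplicate
    countries, which would violate the primary key during COPY.
    """
    problems = []
    seen = set()
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    for line_no, row in enumerate(reader, start=1):
        if len(row) != 3:
            problems.append(f"line {line_no}: expected 3 fields, got {len(row)}")
            continue
        country, year, rate = (field.strip() for field in row)
        if country in seen:
            problems.append(f"line {line_no}: duplicate country {country!r}")
        seen.add(country)
        try:
            int(year)
            float(rate)
        except ValueError:
            problems.append(f"line {line_no}: non-numeric year or rate")
    return problems


# Made-up sample data for illustration only.
sample = "China;2008;1.1\nFrance;2008;1.4\nChina;2009;1.0\nBrazil;n/a;22.0\n"
for problem in check_rows(sample):
    print(problem)
```

An empty result means the file matches the assumed layout. If a data set really does hold several years per country, the duplicate check will fire, which suggests the table needs a different key.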

Develop queries to explore your ideas in the data

Develop and document the model function you are exploring in the data

Develop a visualization to show the model/patterns in the data

Tech Details

Node: as2
Path to storage space: /scratch/big-data/ivan

Results

The visualization(s)
The story

@@ Line 12: / Line 12: @@
 * http://data.un.org/Data.aspx?d=UNODC&f=tableCode%3a1
-***Data cleaning and pre-processing
+==Data cleaning and pre-processing==
 The first obstacle I faced with cleaning and pre-processing was inconsistency in countries naming. For example name China in education and name People's Republic of China in homicide... So when I did full join of country columns I realized that not all of them are in one line (things that are supposed to be in one line). So I changed names and made it unique through all 6 data sets.
-***Load the data into your Postgres instance
+==Load the data into your Postgres instance==
 Data-sets I downloaded were in CSV files. <br/>
@@ Line 25: / Line 25: @@
 * COPY homicide FROM '/home/postgres/HOMICIDE.csv' DELIMITER ';' CSV;
-***Develop queries to explore your ideas in the data
+==Develop queries to explore your ideas in the data==
-***Develop and document the model function you are exploring in the data
+==Develop and document the model function you are exploring in the data==
-***Develop a visualization to show the model/patterns in the data
+==Develop a visualization to show the model/patterns in the data==
 ===== Tech Details =====
