Difference between revisions of "Cs430-2015"

From Earlham CS Department

Latest revision as of 07:20, 2 March 2015

CS430 - Database Systems

Why isn't this in a GDrive document? (with a table?)

Big Data Project

The Big Data Project is made up of components, each of which will be built during a future assignment. The first component is curating the Annotated Directory of Data Sets. It has been two years since the directory was reviewed and extended, and an update is well past due. Your first task is to improve each of the existing entries and add additional ones:

  • Last update date and person
  • Sections by topical area

The data sets should be public and freely available, and large under some definition of that word. Query interfaces are web-based tools that allow people to explore one or more data sets, usually with visualizations. Social Explorer and Google's ngrams are good examples of query interfaces.

For the first assignment you should identify and describe two data sets and one query interface. The data set entries can be related to the query interface entry only if they are separately available as stand-alone entities. This is an assignment by template: I've provided one of each type of entry in the documents linked below.

Stake your claim to entries sooner rather than later. The protocol is to check whether something is taken; if not, fill in a title, etc., as a placeholder until you complete the entry.

Be sure to edit by item (rather than section or page) to minimize update conflicts.

  • Annotated Directory of Data Sets
  • Student Projects (2015)