From Earlham CS Department

CS430 - Database Systems

Why isn't this in a GDrive document? (with a table?)

Big Data Project

The Big Data Project is made up of components, each of which will be built during a future assignment. The first component is curating the Annotated Directory of Data Sets. It has been two years since the directory was reviewed and extended, and an update is well past due. Your first task is to improve each of the existing entries and add additional ones:

  • Last update date and person
  • Sections by topical area

The data sets should be public, freely available, and large under some definition of that word. Query interfaces are web-based tools that allow people to explore one or more data sets, usually with visualizations. Social Explorer and Google's ngrams are good examples of query interfaces.

For the first assignment you should identify and describe two data sets and one query interface. The data set entries can be related to the query interface entry only if they are separately available as stand-alone entities. This is an assignment by template: I've provided one of each type of entry in the documents linked below.

Stake your claim to entries sooner rather than later. The protocol is to check whether something is already taken and, if it is not, to fill in a title, etc. as a placeholder until you complete the entry.

Be sure to edit by item (rather than section or page) to minimize update conflicts.