Annotated-directory-big-data

From Earlham CS Department
Revision as of 23:34, 6 October 2011 by Babic91 (talk | contribs) (Another Data Set)

This is an annotated directory of public, freely available "large" data sets. The entries are currently in no particular order.

Google ngrams

  • URL - http://books.google.com/ngrams/datasets
  • Description - The ngram data sets on which Google's ngram viewer is built. A variety of corpora are available, e.g. by language, the "Google Million", and English fiction. Each set lists ngrams together with their frequency and date information.
  • Curator - CharlieP
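
As a rough illustration of working with the ngram files, the sketch below sums the counts for one ngram by year. It assumes each line is tab-separated with the ngram first, the year second, and the match count third; check the layout against the dataset page before relying on it. The function name total_counts_by_year and the sample rows are made up for this example and are not real counts.

```python
import io
from collections import defaultdict


def total_counts_by_year(lines, target):
    """Sum match counts per year for a single ngram.

    Assumption: each line is tab-separated as
    ngram, year, match_count, followed by optional extra columns
    that are ignored here.
    """
    totals = defaultdict(int)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        ngram, year, match_count = fields[0], fields[1], fields[2]
        if ngram == target:
            totals[int(year)] += int(match_count)
    return dict(totals)


# Made-up sample rows, for illustration only.
sample = io.StringIO(
    "big data\t1990\t3\t3\t2\n"
    "big data\t1991\t5\t4\t3\n"
    "small data\t1991\t1\t1\t1\n"
)
result = total_counts_by_year(sample, "big data")
print(result)
```

Because the function reads one line at a time, an open file handle can be passed in place of the sample, so a large download does not need to fit in memory.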

MusicBrainz

World Cubing Association Database

Large Data Sets on AWS

Another Data Set
