Difference between revisions of "Annotated-directory-big-data"

From Earlham CS Department
(Another Data Set)
Line 33:
  * Curator - ibabic09
- ==== Another Data Set ====
+ ==== The AOL Search Data ====
  * URL - http://www.infochimps.com/datasets/aol-search-data/downloads/70079
  * Description - The AOL Search Data is a collection of real query log data that is based on real users. The data set consists of 20M web queries collected from 650k users over three months.

Revision as of 00:39, 7 October 2011

This is an annotated directory of public, freely available, "large" data sets. For now they are in no particular order.

Google ngrams

  • URL - http://books.google.com/ngrams/datasets
  • Description - The ngram databases on which Google's ngram viewer is built. A variety of corpora are available, e.g. by language, the "Google Million", English fiction, etc. Each set contains a list of ngrams, frequency, and date information.
  • Curator - CharlieP

MusicBrainz

World Cubing Association Database

Large Data Sets on AWS

Another Data Set

Another Data Set

The AOL Search Data

Another Data Set