Difference between revisions of "Annotated-directory-big-data"

From Earlham CS Department
(Google ngrams)
Line 3:
  ==== Google ngrams ====
- * URL - http://books.google.com/
+ * URL - http://books.google.com/ngrams/datasets
- * Description - Started in 2002, and based on work that Google co-founders Sergey Brin and Larry Page did as CS graduate students at Stanford, this ambitious project aims to digitize and make available the full contents of the world's books. There is a detailed history of the long-running project and its offshoots at http://books.google.com/intl/en/googlebooks/history.html
+ * Description - The ngram databases on which Google's ngram viewer is built. A variety of corpora are available, e.g. by language, the "Google Million", etc.
  * Curator - CharlieP

Revision as of 10:23, 4 October 2011

This is an annotated directory of public, freely available, "large" data sets. For now they are in no particular order.

Google ngrams

  • URL - http://books.google.com/ngrams/datasets
  • Description - The ngram databases on which Google's ngram viewer is built. A variety of corpora are available, e.g. by language, the "Google Million", etc.
  • Curator - CharlieP

Another Data Set
