Difference between revisions of "Annotated-directory-big-data"

From Earlham CS Department
(CGI 60 Genomes)
(Research and Innovative Technology Administration)
Line 43:
  * Curator - ibabic09
- ==== Research and Innovative Technology Administration ====
+ ==== Freebase ====
- *URL: http://www.rita.dot.gov/
+ *URL: http://wiki.freebase.com/wiki/Data_dumps
- *Description: RITA coordinates the U.S. Department of Transportation's research and education programs. RITA also offers vital transportation statistics and analysis, and supports national efforts to improve education and training in transportation-related fields.
+ *Description: Full data dumps of every fact and assertion in Freebase,an open database of the world's information, covering millions of topics in hundreds of categories.
  *Curator: eosergi10

Revision as of 02:19, 14 October 2011

This is an annotated directory of public, freely available, "large" data sets. For now they are in no particular order.

Google ngrams

  • URL - http://books.google.com/ngrams/datasets
  • Description - The ngram databases on which Google's ngram viewer is built. A variety of corpora are available, e.g. by language, the "Google Million", English fiction, etc. Each set contains a list of ngrams, frequency, and date information.
  • Curator - CharlieP
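As a rough illustration of working with "a list of ngrams, frequency, and date information", the sketch below sums occurrence counts per ngram across years. The tab-separated column layout (ngram, year, match count, page count, volume count), the sample rows, and the helper name `total_matches` are assumptions made for this example; check the layout against the documentation on the dataset page before relying on it.

```python
from collections import defaultdict

# Hypothetical sample rows in an ASSUMED tab-separated layout:
# ngram <TAB> year <TAB> match_count <TAB> page_count <TAB> volume_count
SAMPLE_LINES = [
    "big data\t2005\t12\t10\t8",
    "big data\t2006\t20\t18\t11",
    "data set\t2005\t300\t250\t90",
]


def total_matches(lines):
    """Sum the match count of each ngram over all years."""
    totals = defaultdict(int)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        ngram = fields[0]
        match_count = int(fields[2])  # third column, per the assumed layout
        totals[ngram] += match_count
    return dict(totals)


totals = total_matches(SAMPLE_LINES)
print(totals)
```

Because the function only needs an iterable of lines, an open file handle for a downloaded and decompressed file could be passed in place of the sample list, which avoids loading a large file into memory.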

MusicBrainz

World Cubing Association Database

Large Data Sets on AWS

Starcraft 2 Hit Analysis

Starcraft 2 Combat Analysis

Twitter Users by Location

The AOL Search Data

Freebase

  • URL: http://wiki.freebase.com/wiki/Data_dumps
  • Description: Full data dumps of every fact and assertion in Freebase, an open database of the world's information, covering millions of topics in hundreds of categories.
  • Curator: eosergi10

DBpedia

  • URL: http://blog.dbpedia.org/2011/09/11/dbpedia-37-released-including-15-localized-editions/
  • Description: DBpedia is a community effort to extract structured information from Wikipedia and make it available on the Web. It lets you run sophisticated queries against Wikipedia and link other data sets on the Web to Wikipedia data. The release linked above is based on Wikipedia dumps dating from late July 2011.
  • Curator: eosergi10

IMDB

  • URL: http://www.imdb.com/interfaces
  • Description: All the data used to create IMDB, available from any of the three FTP sites listed under "Plain Text Data Files".
  • Curator: gaschue08