
See also
Other sources for machine learning data:
- SMS spam data http://www.dt.fee.unicamp.br/~tiago/smsspamcollection/
- Financial dataset from Lending Club https://www.lendingclub.com/info/download-data.action
- Research data from Yahoo http://webscope.sandbox.yahoo.com/index.php
- Amazon AWS public dataset http://aws.amazon.com/public-data-sets/
- Labeled visual data from ImageNet http://www.image-net.org
- Census datasets http://www.census.gov
- Compiled YouTube dataset http://netsg.cs.sfu.ca/youtubedata/
- Collected rating data from the MovieLens site http://grouplens.org/datasets/movielens/
- Enron dataset available to the public http://www.cs.cmu.edu/~enron/
- Dataset for the classic book The Elements of Statistical Learning http://statweb.stanford.edu/~tibs/ElemStatLearn/data.html
- IMDB movie dataset http://www.imdb.com/interfaces
- Million Song dataset http://labrosa.ee.columbia.edu/millionsong/
- Dataset for speech and audio http://labrosa.ee.columbia.edu/projects/
- Face recognition data http://www.face-rec.org/databases/
- Social science data http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies
- Bulk arXiv datasets from Cornell University http://arxiv.org/help/bulk_data_s3
- Project Gutenberg datasets http://www.gutenberg.org/wiki/Gutenberg:Offline_Catalogs
- Datasets from World Bank http://data.worldbank.org
- Lexical database from WordNet http://wordnet.princeton.edu
- Collision data from NYPD http://nypd.openscrape.com/#/
- Dataset for congressional roll calls and others http://voteview.com/dwnl.htm
- Large graph datasets from Stanford http://snap.stanford.edu/data/index.html
- Rich set of data from datahub https://datahub.io/dataset
- Yelp's academic dataset https://www.yelp.com/academic_dataset
- Source of data from GitHub https://github.com/caesar0301/awesome-public-datasets
- Dataset archives from Reddit https://www.reddit.com/r/datasets/
There are some specialized datasets (for example, text analytics in Spanish, and gene and IMF data) that might be of some interest to you:
- Datasets from Colombia (in Spanish) http://www.datos.gov.co/frm/buscador/frmBuscador.aspx
- Dataset from cancer studies http://www.broadinstitute.org/cgi-bin/cancer/datasets.cgi
- Research data from Pew http://www.pewinternet.org/datasets/
- Data from the state of Illinois/USA https://data.illinois.gov
- Data from freebase.com http://www.freebase.com
- Datasets from the UN and its associated agencies http://data.un.org
- International Monetary Fund datasets http://www.imf.org/external/data.htm
- UK government data https://data.gov.uk
- Open data from Estonia http://pub.stat.ee/px-web.2001/Dialog/statfile1.asp
- Many ML libraries in R containing data that can be exported as CSV https://www.r-project.org
- Gene expression datasets http://www.ncbi.nlm.nih.gov/geo/