Apache Spark 2.x Machine Learning Cookbook
上QQ阅读APP看书,第一时间看更新

How to do it...

The following is a list of open source data worth exploring if you would like to develop applications in this field:

  • UCI machine learning repository: This is an extensive library with search functionality. At the time of writing, there were more than 350 datasets. You can click on the https://archive.ics.uci.edu/ml/index.html link to see all the datasets or look for a specific set using a simple search (Ctrl + F).
  • Kaggle datasets: You need to create an account, but you can download any sets for learning as well as for competing in machine learning competitions. The https://www.kaggle.com/competitions link provides details for exploring and learning more about Kaggle, and the inner workings of machine learning competitions.
  • MLdata.org: A public site open to all with a repository of datasets for machine learning enthusiasts.
  • Google Trends: You can find statistics on search volume (as a proportion of total search) for any given term since 2004 on http://www.google.com/trends/explore.
  • The CIA World Factbook: The https://www.cia.gov/library/publications/the-world-factbook/ link provides information on the history, population, economy, government, infrastructure, and military of 267 countries.