上QQ阅读APP看书，第一时间看更新

Big data

Big data is a hot topic. When data is regarded too big for traditional databases to be analyzed, one can set up multiple clusters of servers for processing such data. Analyzing data in this context can, for example, refer to searching for something specific, looking for patterns, and calculating statistics.

This data could be obtained from the data collected from web servers (for example, logged visitors' clicks), the output obtained from external sensors at a manufacturer's plant, legacy servers that have been producing log files for many years, and so forth. Data sizes can vary wildly as well, but often, they take up multiple terabytes in total.

Two popular technologies in the big data arena are the following:

Apache Hadoop (provides storage of data and takes care of data distribution to other servers)
Apache Spark (uses Hadoop to stream data and makes it possible to analyze incoming data)

Both Hadoop and Spark are for the most part written in Java. While both offer interfaces to a lot of programming languages and platforms, it will not be a surprise that JVM is one among them.

The functional programming paradigm focuses on creating code that would run safely on multiple CPU cores, so languages that are fully specialized in this style, such as Scala or Clojure, are appropriate candidates to be used with either Spark or Hadoop.