data:image/s3,"s3://crabby-images/06062/060626a6f14f35def807cce00e65e7004bf0abc7" alt="Apache Spark 2.x Machine Learning Cookbook"
How to do it...
- Start a new project in IntelliJ or in an IDE of your choice. Make sure that the necessary JAR files are included.
- Import the necessary packages for vector and matrix manipulation:
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.mllib.linalg.distributed.{IndexedRow, IndexedRowMatrix}
import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, MatrixEntry}
import org.apache.spark.sql.{SparkSession}
import org.apache.spark.mllib.linalg._
import breeze.linalg.{DenseVector => BreezeVector}
import Array._
import org.apache.spark.mllib.linalg.DenseMatrix
import org.apache.spark.mllib.linalg.SparseVector
- Set up the Spark session and application parameters so Spark can run:
val spark = SparkSession
.builder
.master("local[*]")
.appName("myVectorMatrix")
.config("spark.sql.warehouse.dir", ".")
.getOrCreate()
- We create the vectors:
val w1 = Vectors.dense(1,2,3)
val w2 = Vectors.dense(4,-5,6)
- We convert the vectors from the Spark public interface to Breeze library vectors so we can use the rich set of operators Breeze provides for vector manipulation:
val w3 = new BreezeVector(w1.toArray) // w1.asBreeze is private[spark], so copy the values instead
val w4 = new BreezeVector(w2.toArray) // same for w2
println(s"w3 + w4 = ${w3 + w4}")
println(s"w3 - w4 = ${w3 - w4}")
println(s"w3 * w4 = ${w3.dot(w4)}")
- Let's look at the output and understand the results. For an operational understanding of vector addition, subtraction, and multiplication (the dot product), see the How it works... section in this recipe.
The output is as follows:
w3 + w4 = DenseVector(5.0, -3.0, 9.0)
w3 - w4 = DenseVector(-3.0, 7.0, -3.0)
w3 * w4 = 12.0
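To make the three printed results concrete, here is a minimal plain-Scala sketch that recomputes them with ordinary arrays, assuming no Spark or Breeze on the classpath. The object VectorOpsSketch and its add, subtract, and dot helpers are hypothetical names for illustration, not part of either library. Addition and subtraction work element by element, while the dot product multiplies matching elements and sums them into one scalar: 1*4 + 2*(-5) + 3*6 = 12.

```scala
// Hypothetical plain-Scala sketch (no Spark or Breeze): recomputes the
// element-wise sum, element-wise difference, and dot product of w1 and w2.
object VectorOpsSketch {
  // Element-wise sum: (a0 + b0, a1 + b1, ...)
  def add(a: Array[Double], b: Array[Double]): Array[Double] = {
    require(a.length == b.length, "vectors must have the same length")
    a.zip(b).map { case (x, y) => x + y }
  }

  // Element-wise difference: (a0 - b0, a1 - b1, ...)
  def subtract(a: Array[Double], b: Array[Double]): Array[Double] = {
    require(a.length == b.length, "vectors must have the same length")
    a.zip(b).map { case (x, y) => x - y }
  }

  // Dot product: multiply matching elements, then sum to a single scalar
  def dot(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same length")
    a.zip(b).map { case (x, y) => x * y }.sum
  }

  def main(args: Array[String]): Unit = {
    val w1 = Array(1.0, 2.0, 3.0)
    val w2 = Array(4.0, -5.0, 6.0)
    println(s"w1 + w2 = ${add(w1, w2).mkString("(", ", ", ")")}")
    println(s"w1 - w2 = ${subtract(w1, w2).mkString("(", ", ", ")")}")
    println(s"w1 . w2 = ${dot(w1, w2)}")
  }
}
```

The values match the Breeze output in this step; converting to Breeze in the recipe simply provides these operators without writing the loops by hand.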
- Next, we perform a vector operation that mixes a sparse and a dense vector, again converting both to Breeze vectors:
val sv1 = Vectors.sparse(10, Array(0,2,9), Array(5, 3, 13))
val sv2 = Vectors.dense(1,0,1,1,0,0,1,0,0,13)
println(s"sv1 - Sparse Vector = $sv1")
println(s"sv2 - Dense Vector = $sv2")
println(s"sv1 * sv2 = ${new BreezeVector(sv1.toArray).dot(new BreezeVector(sv2.toArray))}")
An alternative is to call asBreeze directly, but asBreeze is declared private[spark] in Spark 2.x (see the Spark source code), so it only compiles from code inside the org.apache.spark package. We recommend the method presented previously:
println("sv1 * sv2 =", sv1.asBreeze.dot(sv2.asBreeze)) // compiles only inside the org.apache.spark package
The output is as follows:
sv1 - Sparse Vector = (10,[0,2,9],[5.0,3.0,13.0])
sv2 - Dense Vector = [1.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,0.0,13.0]
sv1 * sv2 = 177.0
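The Breeze dot product in this step first densifies sv1 with toArray. As a rough illustration of what the sparse (size, indices, values) triple passed to Vectors.sparse encodes, and why a sparse dot product can skip the zeros, here is a plain-Scala sketch with no Spark dependency. The Sparse case class and its toDense and dot methods are hypothetical names, not Spark's or Breeze's API. Only indices 0, 2, and 9 are stored, so the result is 5*1 + 3*1 + 13*13 = 177.

```scala
// Hypothetical plain-Scala sketch (no Spark): a sparse vector stored as
// (size, indices, values), the same triple given to Vectors.sparse above.
object SparseDotSketch {
  final case class Sparse(size: Int, indices: Array[Int], values: Array[Double]) {
    require(indices.length == values.length, "indices and values must align")

    // Expand to a dense array; every index not listed stays 0.0
    def toDense: Array[Double] = {
      val out = new Array[Double](size)
      indices.zip(values).foreach { case (i, v) => out(i) = v }
      out
    }

    // Dot product with a dense array that visits only the stored entries
    def dot(dense: Array[Double]): Double = {
      require(dense.length == size, "vectors must have the same length")
      indices.zip(values).map { case (i, v) => v * dense(i) }.sum
    }
  }

  def main(args: Array[String]): Unit = {
    val sv1 = Sparse(10, Array(0, 2, 9), Array(5.0, 3.0, 13.0))
    val sv2 = Array(1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 13.0)
    println(s"dense form of sv1 = ${sv1.toDense.mkString("[", ",", "]")}")
    println(s"sv1 * sv2 = ${sv1.dot(sv2)}")
  }
}
```

For a 10-element vector the saving is negligible, but the same idea matters when a feature vector has millions of slots and only a handful of nonzero entries.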