
How to do it...
- Start a new project in IntelliJ or in an IDE of your choice. Make sure that the necessary JAR files are included.
- Import the necessary packages for vector and matrix manipulation:
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.mllib.linalg.distributed.{IndexedRow, IndexedRowMatrix}
import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, MatrixEntry}
import org.apache.spark.sql.SparkSession
import org.apache.spark.mllib.linalg._ // Vectors, Matrices, DenseMatrix, SparseVector, ...
import breeze.linalg.{DenseVector => BreezeVector}
import Array._
- Set up the Spark session and application parameters so Spark can run:
val spark = SparkSession
.builder
.master("local[*]")
.appName("myVectorMatrix")
.config("spark.sql.warehouse.dir", ".")
.getOrCreate()
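- Create the sparse matrix and the two dense vectors used in the next steps:
// 3 x 3 matrix in CSC form: numRows, numCols, column pointers, row indices, non-zero values
val sparseMat33 = Matrices.sparse(3, 3, Array(0, 2, 3, 6), Array(0, 2, 1, 0, 1, 2), Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0))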
val denseFeatureVector= Vectors.dense(1,2,1)
val denseVec13 = Vectors.dense(5,3,0)
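The three arrays passed to Matrices.sparse use the compressed sparse column (CSC) layout: the entries of column j sit at positions colPtrs(j) until colPtrs(j + 1) of the row-index and value arrays. As a minimal plain-Scala sketch (no Spark required; the object CscDecodeSketch and its entries method are hypothetical names), decoding sparseMat33's arrays should list the same (row, column) pairs that Spark prints for the matrix:

```scala
object CscDecodeSketch {
  // Expand CSC arrays into (row, col, value) triples, column by column.
  // colPtrs has numCols + 1 entries; the entries of column j live at
  // positions colPtrs(j) until colPtrs(j + 1) of rowIndices and values.
  def entries(colPtrs: Array[Int],
              rowIndices: Array[Int],
              values: Array[Double]): Seq[(Int, Int, Double)] =
    for {
      j <- 0 until colPtrs.length - 1
      k <- colPtrs(j) until colPtrs(j + 1)
    } yield (rowIndices(k), j, values(k))

  def main(args: Array[String]): Unit = {
    // Same arrays as sparseMat33 above
    val triples = entries(Array(0, 2, 3, 6),
                          Array(0, 2, 1, 0, 1, 2),
                          Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0))
    // prints (0,0) 1.0, (2,0) 2.0, (1,1) 3.0, (0,2) 4.0, (1,2) 5.0, (2,2) 6.0, one per line
    triples.foreach { case (i, j, v) => println(s"($i,$j) $v") }
  }
}
```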
- Multiply the matrix by the vector and print the results. Matrix-vector multiplication is an extremely useful operation that recurs in most Spark ML use cases. We use a SparseMatrix here to show that dense and sparse matrices are interchangeable for this operation; only the density (that is, the percentage of non-zero elements) and performance should decide which one to use:
val result0 = sparseMat33.multiply(denseFeatureVector)
println("SparseMat33 =", sparseMat33)
println("denseFeatureVector =", denseFeatureVector)
println("SparseMat33 * DenseFeatureVector =", result0)
The output is as follows:
(SparseMat33 =,3 x 3 CSCMatrix
(0,0) 1.0
(2,0) 2.0
(1,1) 3.0
(0,2) 4.0
(1,2) 5.0
(2,2) 6.0)
(denseFeatureVector =,[1.0,2.0,1.0])
(SparseMat33 * DenseFeatureVector =,[5.0,11.0,8.0])
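To see where [5.0,11.0,8.0] comes from, here is a minimal sketch of a CSC matrix-vector product in plain Scala. It is an illustration under our own naming (CscMultiplySketch is hypothetical), not Spark's implementation; the point is that only the stored non-zeros are visited:

```scala
object CscMultiplySketch {
  // y = A * x for a numRows x numCols matrix A stored in CSC form.
  // Each stored entry A(i, j) adds A(i, j) * x(j) to y(i); zeros are never touched.
  def multiply(numRows: Int,
               colPtrs: Array[Int],
               rowIndices: Array[Int],
               values: Array[Double],
               x: Array[Double]): Array[Double] = {
    require(x.length == colPtrs.length - 1, "x must have one entry per column")
    val y = new Array[Double](numRows)
    for (j <- 0 until colPtrs.length - 1; k <- colPtrs(j) until colPtrs(j + 1))
      y(rowIndices(k)) += values(k) * x(j)
    y
  }

  def main(args: Array[String]): Unit = {
    // sparseMat33 times denseFeatureVector from the steps above
    val y = multiply(3, Array(0, 2, 3, 6), Array(0, 2, 1, 0, 1, 2),
      Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0), Array(1.0, 2.0, 1.0))
    println(y.mkString("[", ",", "]")) // prints [5.0,11.0,8.0]
  }
}
```

Each stored entry costs exactly one multiply-add, so the work grows with the number of non-zeros rather than with numRows x numCols, which is why density is the deciding factor between the two representations.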
- Multiply a DenseMatrix by a DenseVector.
This is provided for completeness and makes the matrix-vector product easy to follow without having to think about sparsity:
// Assumed, reconstructed from the output below: val denseMat1 = Matrices.dense(3, 3, Array(23.0, 11.0, 17.0, 34.3, 33.0, 24.5, 21.3, 22.6, 22.2))
println("denseVec13 =", denseVec13)
println("denseMat1 =", denseMat1)
val result3 = denseMat1.multiply(denseVec13)
println("denseMat1 * denseVec13 =", result3)
The output is as follows:
(denseVec13 =,[5.0,3.0,0.0])
(denseMat1 =,23.0 34.3 21.3
11.0 33.0 22.6
17.0 24.5 22.2)
(denseMat1 * denseVec13 =,[217.89999999999998,154.0,158.5])
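Spark's DenseMatrix stores its values in column-major order by default, so entry (i, j) lives at values(i + j * numRows). The hypothetical DenseMultiplySketch below applies that indexing to a plain array holding denseMat1's values as printed above. The first component is computed in double precision, where 34.3 has no exact binary representation, so it is best compared against 217.9 with a tolerance rather than exact equality:

```scala
object DenseMultiplySketch {
  // y = A * x for a numRows x numCols matrix stored column-major,
  // i.e. A(i, j) = values(i + j * numRows).
  def multiply(numRows: Int, numCols: Int,
               values: Array[Double], x: Array[Double]): Array[Double] = {
    require(values.length == numRows * numCols, "values must hold numRows * numCols entries")
    require(x.length == numCols, "x must have one entry per column")
    val y = new Array[Double](numRows)
    // Walk column by column, matching the storage order
    for (j <- 0 until numCols; i <- 0 until numRows)
      y(i) += values(i + j * numRows) * x(j)
    y
  }

  def main(args: Array[String]): Unit = {
    // denseMat1's printed rows (23.0 34.3 21.3 / 11.0 33.0 22.6 / 17.0 24.5 22.2), column by column
    val a = Array(23.0, 11.0, 17.0, 34.3, 33.0, 24.5, 21.3, 22.6, 22.2)
    println(multiply(3, 3, a, Array(5.0, 3.0, 0.0)).mkString("[", ",", "]"))
  }
}
```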
- Transpose a matrix, that is, swap its rows and columns. It is an important operation that you will use almost daily if you work with Spark ML or data engineering.
Here we demonstrate three steps:
- Transposing a SparseMatrix and examining the resulting matrix via the output:
// Assumed, reconstructed from the output below: val sparseMat1 = Matrices.sparse(3, 2, Array(0, 1, 3), Array(0, 1, 2), Array(11.0, 22.0, 33.0))
println("Original sparseMat1 =", sparseMat1)
val transposedMat1 = sparseMat1.transpose
println("transposedMat1 =", transposedMat1)
The output is as follows:
(Original sparseMat1 =,3 x 2 CSCMatrix
(0,0) 11.0
(1,1) 22.0
(2,1) 33.0)
(transposedMat1 =,2 x 3 CSCMatrix
(0,0) 11.0
(1,1) 22.0
(1,2) 33.0)
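For intuition about what the transposed CSC arrays look like, here is a minimal sketch that materializes the transpose of a CSC matrix by bucketing its entries by row index (a counting-sort pass). The names are hypothetical. As far as we know, Spark's SparseMatrix.transpose avoids this copy and instead flags the matrix as transposed, so treat this as an illustration of the index swap rather than of Spark's internals:

```scala
object CscTransposeSketch {
  final case class Csc(numRows: Int, numCols: Int,
                       colPtrs: Array[Int], rowIndices: Array[Int], values: Array[Double])

  // Build the CSC arrays of A^T: row i of A becomes column i of A^T,
  // and A's old column index becomes the new row index.
  def transpose(a: Csc): Csc = {
    val nnz = a.values.length
    val newColPtrs = new Array[Int](a.numRows + 1)
    // Count the entries in each row of A, then prefix-sum into column pointers
    a.rowIndices.foreach(i => newColPtrs(i + 1) += 1)
    for (i <- 0 until a.numRows) newColPtrs(i + 1) += newColPtrs(i)
    val next = newColPtrs.clone() // next free slot in each new column
    val newRowIndices = new Array[Int](nnz)
    val newValues = new Array[Double](nnz)
    for (j <- 0 until a.numCols; k <- a.colPtrs(j) until a.colPtrs(j + 1)) {
      val i = a.rowIndices(k)
      val dest = next(i)
      newRowIndices(dest) = j
      newValues(dest) = a.values(k)
      next(i) += 1
    }
    Csc(a.numCols, a.numRows, newColPtrs, newRowIndices, newValues)
  }

  def main(args: Array[String]): Unit = {
    // sparseMat1 as printed above: (0,0) 11.0, (1,1) 22.0, (2,1) 33.0
    val t = transpose(Csc(3, 2, Array(0, 1, 3), Array(0, 1, 2), Array(11.0, 22.0, 33.0)))
    println(s"${t.numRows} x ${t.numCols}")    // 2 x 3
    println(t.colPtrs.mkString(","))           // 0,1,2,3
    println(t.rowIndices.mkString(","))        // 0,1,1
    println(t.values.mkString(","))            // 11.0,22.0,33.0
  }
}
```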
- Demonstrating that the transpose of a transpose yields the original matrix:
// Assumed, reconstructed from the output below: val denseMat33 = Matrices.dense(3, 3, Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0))
println("Matrix transposed twice =", denseMat33.transpose.transpose) // we get the original back
The output is as follows:
(Matrix transposed twice =,1.0 4.0 7.0
2.0 5.0 8.0
3.0 6.0 9.0)
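Transposing twice maps entry (i, j) to (j, i) and back again, so the original matrix is recovered for any shape, not only square ones. A minimal sketch with a copying transpose over column-major arrays (DenseTransposeSketch is a hypothetical name) checks this on denseMat33's values 1.0 to 9.0 and on a non-square 2 x 3 matrix:

```scala
object DenseTransposeSketch {
  // Transpose a numRows x numCols column-major array into a new
  // numCols x numRows column-major array: entry (i, j) moves to (j, i).
  def transpose(numRows: Int, numCols: Int, values: Array[Double]): Array[Double] = {
    require(values.length == numRows * numCols, "values must hold numRows * numCols entries")
    val out = new Array[Double](values.length)
    for (j <- 0 until numCols; i <- 0 until numRows)
      out(j + i * numCols) = values(i + j * numRows)
    out
  }

  def main(args: Array[String]): Unit = {
    val m33 = Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0)
    println(transpose(3, 3, transpose(3, 3, m33)).sameElements(m33)) // true

    // Non-square: a 2 x 3 matrix becomes 3 x 2, then 2 x 3 again
    val m23 = Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0)
    println(transpose(3, 2, transpose(2, 3, m23)).sameElements(m23)) // true
  }
}
```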
- Transposing a DenseMatrix and examining the resulting matrix via the output. This makes it easier to see how the row and column indexes are swapped:
val transposedMat2 = denseMat1.transpose
println("Original denseMat1 =", denseMat1)
println("transposedMat2 =", transposedMat2)
The output is as follows:
(Original denseMat1 =,23.0 34.3 21.3
11.0 33.0 22.6
17.0 24.5 22.2)
(transposedMat2 =,23.0 11.0 17.0
34.3 33.0 24.5
21.3 22.6 22.2)
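A transpose does not have to move any data. Spark's DenseMatrix carries an isTransposed flag, and when it is set the values array is read in row-major rather than column-major order. The sketch below mimics that idea with a hypothetical DenseView class (not Spark's API): transpose swaps the dimensions and flips the flag in constant time while sharing the same array. It uses denseMat1's values from the output above:

```scala
object TransposedViewSketch {
  // A minimal dense matrix whose transpose is a view over the same array.
  // Not transposed: column-major, A(i, j) = values(i + j * numRows).
  // Transposed: row-major,        A(i, j) = values(j + i * numCols).
  final class DenseView(val numRows: Int, val numCols: Int,
                        val values: Array[Double], val isTransposed: Boolean = false) {
    def apply(i: Int, j: Int): Double =
      if (isTransposed) values(j + i * numCols) else values(i + j * numRows)

    // O(1): swap the dimensions and flip the flag; the values array is shared
    def transpose: DenseView = new DenseView(numCols, numRows, values, !isTransposed)

    override def toString: String =
      (0 until numRows).map(i => (0 until numCols).map(j => apply(i, j)).mkString(" ")).mkString("\n")
  }

  def main(args: Array[String]): Unit = {
    val m = new DenseView(3, 3, Array(23.0, 11.0, 17.0, 34.3, 33.0, 24.5, 21.3, 22.6, 22.2))
    println(m)           // rows 23.0 34.3 21.3 / 11.0 33.0 22.6 / 17.0 24.5 22.2
    println()
    println(m.transpose) // rows 23.0 11.0 17.0 / 34.3 33.0 24.5 / 21.3 22.6 22.2
  }
}
```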
- Next, look at matrix multiplication in code.
Declare two 2 x 2 dense matrices and multiply them in both orders:
// Matrix multiplication
val dMat1: DenseMatrix = new DenseMatrix(2, 2, Array(1.0, 3.0, 2.0, 4.0))
val dMat2: DenseMatrix = new DenseMatrix(2, 2, Array(2.0, 1.0, 0.0, 2.0))
println("dMat1 =", dMat1)
println("dMat2 =", dMat2)
println("dMat1 * dMat2 =", dMat1.multiply(dMat2)) // A x B
println("dMat2 * dMat1 =", dMat2.multiply(dMat1)) // B x A, not the same as A x B
The output is as follows:
(dMat1 =,1.0 2.0
3.0 4.0)
(dMat2 =,2.0 0.0
1.0 2.0)
(dMat1 * dMat2 =,4.0 4.0
10.0 8.0)
(dMat2 * dMat1 =,2.0 4.0
7.0 10.0)
Note that A x B is not the same as B x A.
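The products above follow C(i, j) = sum over k of A(i, k) * B(k, j). A minimal column-major sketch (DenseMatMulSketch is a hypothetical name, not Spark's code) reproduces both results from dMat1's and dMat2's arrays; printed as flat column-major arrays, A x B is 4.0 10.0 4.0 8.0 and B x A is 2.0 7.0 4.0 10.0:

```scala
object DenseMatMulSketch {
  // C = A * B for column-major arrays: A is m x n, B is n x p, C is m x p,
  // with C(i, j) = sum over k of A(i, k) * B(k, j).
  def multiply(m: Int, n: Int, p: Int, a: Array[Double], b: Array[Double]): Array[Double] = {
    require(a.length == m * n && b.length == n * p, "dimension mismatch")
    val c = new Array[Double](m * p)
    for (j <- 0 until p; k <- 0 until n; i <- 0 until m)
      c(i + j * m) += a(i + k * m) * b(k + j * n)
    c
  }

  def main(args: Array[String]): Unit = {
    val a = Array(1.0, 3.0, 2.0, 4.0) // dMat1 = [[1, 2], [3, 4]], column-major
    val b = Array(2.0, 1.0, 0.0, 2.0) // dMat2 = [[2, 0], [1, 2]], column-major
    println(multiply(2, 2, 2, a, b).mkString(" ")) // 4.0 10.0 4.0 8.0
    println(multiply(2, 2, 2, b, a).mkString(" ")) // 2.0 7.0 4.0 10.0
  }
}
```

The loop order (j, then k, then i) walks down the columns of A and C, which suits the column-major layout; any loop order gives the same mathematical result.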