Machine Learning On Spark: Basic Data Structures (Part 2)

Published: 2024/1/23

Main topics in this section

IndexedRowMatrix

BlockMatrix

1. Using IndexedRowMatrix

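IndexedRowMatrix is, as the name suggests, a RowMatrix whose rows carry an index. Each row is represented by case class IndexedRow(index: Long, vector: Vector), where index is the row's index and vector holds the row's contents. It is used as follows: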

    package cn.ml.datastruct

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.linalg.{Matrix, Vectors}
    import org.apache.spark.mllib.linalg.distributed.{IndexedRow, IndexedRowMatrix, RowMatrix}

    object IndexRowMatrixDemo {
      def main(args: Array[String]): Unit = {
        val sparkConf = new SparkConf()
          .setAppName("IndexRowMatrixDemo")
          .setMaster("spark://sparkmaster:7077")
        val sc = new SparkContext(sparkConf)

        // The first element of each array becomes the IndexedRow index
        // (converted explicitly to Long rather than through an implicit
        // Double => Long conversion); the remaining elements form the vector.
        val rdd1 = sc.parallelize(Array(
          Array(1.0, 2.0, 3.0, 4.0),
          Array(2.0, 3.0, 4.0, 5.0),
          Array(3.0, 4.0, 5.0, 6.0)
        )).map(f => IndexedRow(f.head.toLong, Vectors.dense(f.tail)))

        val indexRowMatrix = new IndexedRowMatrix(rdd1)

        // Compute the Gramian matrix (A^T * A)
        val gramianMatrix: Matrix = indexRowMatrix.computeGramianMatrix()

        // Convert to a RowMatrix (the row indices are dropped)
        val rowMatrix: RowMatrix = indexRowMatrix.toRowMatrix()

        // Other methods, such as computeSVD (singular value decomposition)
        // and multiply (matrix multiplication), are used the same way as
        // on RowMatrix.

        sc.stop()
      }
    }
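To make the Gramian step concrete: for a matrix A the Gramian is A^T A, so entry (i, j) is the dot product of columns i and j. Below is a minimal plain-Scala sketch (no Spark required) that computes it for the three row vectors used in the demo. The names GramianSketch, Mat and gramian are illustrative and are not part of MLlib.

```scala
object GramianSketch {
  // Illustrative alias for a small dense, row-major matrix.
  type Mat = Vector[Vector[Double]]

  // Gramian of A is A^T * A: entry (i, j) is the dot product of columns i and j.
  def gramian(a: Mat): Mat = {
    val nCols = a.head.length
    Vector.tabulate(nCols, nCols) { (i, j) =>
      a.map(row => row(i) * row(j)).sum
    }
  }

  def main(args: Array[String]): Unit = {
    // The vector parts of the three IndexedRow values in the demo.
    val a: Mat = Vector(
      Vector(2.0, 3.0, 4.0),
      Vector(3.0, 4.0, 5.0),
      Vector(4.0, 5.0, 6.0))

    // Prints:
    // 29.0 38.0 47.0
    // 38.0 50.0 62.0
    // 47.0 62.0 77.0
    gramian(a).foreach(r => println(r.mkString(" ")))
  }
}
```

The row indices play no role here: the Gramian only sums products over rows, so the order of the rows and any unused index do not change the result.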

2. Using BlockMatrix

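A block matrix partitions a matrix into several blocks, for example:

[Figure: example matrix P]

It can be divided into four blocks:

[Figure: the four blocks of P]

so that the matrix P takes the following form:

[Figure: P written in terms of its four blocks]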
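As an illustration of such a partition (an assumed example, not necessarily the matrix shown in the original figures), a 4 x 4 matrix P can be split into four 2 x 2 blocks:

```latex
P = \begin{bmatrix}
  1 & 2 & 5 & 6 \\
  3 & 4 & 7 & 8 \\
  9 & 10 & 13 & 14 \\
  11 & 12 & 15 & 16
\end{bmatrix},
\qquad
P_{11} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix},\;
P_{12} = \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix},\;
P_{21} = \begin{bmatrix} 9 & 10 \\ 11 & 12 \end{bmatrix},\;
P_{22} = \begin{bmatrix} 13 & 14 \\ 15 & 16 \end{bmatrix}

P = \begin{bmatrix} P_{11} & P_{12} \\ P_{21} & P_{22} \end{bmatrix}
```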

For more on block matrices, including the transpose and multiplication of block matrices, see https://en.wikipedia.org/wiki/Block_matrix

    package cn.ml.datastruct

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.linalg.distributed.{BlockMatrix, IndexedRow, IndexedRowMatrix}

    object BlockMatrixDemo {
      def main(args: Array[String]): Unit = {
        // Connects to a standalone cluster master. To run locally with
        // 2 threads, use setMaster("local[2]") instead.
        val sparkConf = new SparkConf()
          .setAppName("BlockMatrixDemo")
          .setMaster("spark://sparkmaster:7077")
        val sc = new SparkContext(sparkConf)

        // First element of each array is the row index, the rest is the vector.
        val rdd1 = sc.parallelize(Array(
          Array(1.0, 20.0, 30.0, 40.0),
          Array(2.0, 50.0, 60.0, 70.0),
          Array(3.0, 80.0, 90.0, 100.0)
        )).map(f => IndexedRow(f.head.toLong, Vectors.dense(f.tail)))

        val indexRowMatrix = new IndexedRowMatrix(rdd1)

        // Convert the IndexedRowMatrix to a BlockMatrix, specifying the
        // number of rows and columns per block.
        val blockMatrix: BlockMatrix = indexRowMatrix.toBlockMatrix(2, 2)

        // Print each block on the driver. Output of the original run:
        // Index:(0,0)MatrixContent:2 x 2 CSCMatrix
        // (1,0) 20.0
        // (1,1) 30.0
        // Index:(1,1)MatrixContent:2 x 1 CSCMatrix
        // (0,0) 70.0
        // (1,0) 100.0
        // Index:(1,0)MatrixContent:2 x 2 CSCMatrix
        // (0,0) 50.0
        // (1,0) 80.0
        // (0,1) 60.0
        // (1,1) 90.0
        // Index:(0,1)MatrixContent:2 x 1 CSCMatrix
        // (1,0) 40.0
        // The output shows that each block is stored as a sparse matrix
        // in CSC format.
        blockMatrix.blocks.collect()
          .foreach(f => println("Index:" + f._1 + "MatrixContent:" + f._2))

        // Convert to a local matrix:
        // 0.0   0.0   0.0
        // 20.0  30.0  40.0
        // 50.0  60.0  70.0
        // 80.0  90.0  100.0
        // Row 0 is all zeros because the row indices in the data start at 1,
        // so the matrix has 4 rows and row 0 has no entries. The blocks in
        // the last block column are 2 x 1 because the 3 columns do not
        // divide evenly by the 2 columns per block.
        blockMatrix.toLocalMatrix()

        // Block matrix addition
        blockMatrix.add(blockMatrix)

        // Block matrix multiplication: blockMatrix * blockMatrix^T
        // (T denotes the transpose)
        blockMatrix.multiply(blockMatrix.transpose)

        // Convert to a CoordinateMatrix
        blockMatrix.toCoordinateMatrix()

        // Convert to an IndexedRowMatrix
        blockMatrix.toIndexedRowMatrix()

        // Check that the block matrix is valid
        blockMatrix.validate()

        sc.stop()
      }
    }
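The block layout in the printed output can be reproduced without Spark. The following plain-Scala sketch models the behaviour under two assumptions that match the output above: the row count is taken as the largest row index plus one, and edge blocks are simply smaller when the dimensions do not divide evenly. The names BlockLayoutSketch, Row, numRows, toDense and blocks are hypothetical stand-ins, not MLlib classes or methods.

```scala
object BlockLayoutSketch {
  type Mat = Vector[Vector[Double]]

  // Hypothetical stand-in for IndexedRow: a row index plus the row's values.
  final case class Row(index: Long, values: Vector[Double])

  // Row count is (largest index + 1), so an unused index 0 still counts as a row.
  def numRows(rows: Seq[Row]): Int = (rows.map(_.index).max + 1).toInt

  // Dense copy of the indexed rows; indices with no row stay all zeros.
  def toDense(rows: Seq[Row]): Mat = {
    val nCols = rows.head.values.length
    val byIndex = rows.map(r => r.index -> r.values).toMap
    Vector.tabulate(numRows(rows)) { i =>
      byIndex.getOrElse(i.toLong, Vector.fill(nCols)(0.0))
    }
  }

  // Split into blocks of at most rowsPerBlock x colsPerBlock,
  // keyed by (blockRowIndex, blockColIndex).
  def blocks(m: Mat, rowsPerBlock: Int, colsPerBlock: Int): Map[(Int, Int), Mat] = {
    val nBlockRows = (m.length + rowsPerBlock - 1) / rowsPerBlock
    val nBlockCols = (m.head.length + colsPerBlock - 1) / colsPerBlock
    val keyed = Vector.tabulate(nBlockRows, nBlockCols) { (bi, bj) =>
      val rowSlice = m.slice(bi * rowsPerBlock, (bi + 1) * rowsPerBlock)
      (bi, bj) -> rowSlice.map(_.slice(bj * colsPerBlock, (bj + 1) * colsPerBlock))
    }
    keyed.flatten.toMap
  }

  def main(args: Array[String]): Unit = {
    // Same data as the demo: indices 1 to 3, three values per row.
    val rows = Seq(
      Row(1L, Vector(20.0, 30.0, 40.0)),
      Row(2L, Vector(50.0, 60.0, 70.0)),
      Row(3L, Vector(80.0, 90.0, 100.0)))

    // Prints:
    // (0,0): 2 x 2
    // (0,1): 2 x 1
    // (1,0): 2 x 2
    // (1,1): 2 x 1
    blocks(toDense(rows), 2, 2).toSeq.sortBy(_._1).foreach { case (key, b) =>
      println(s"$key: ${b.length} x ${b.head.length}")
    }
  }
}
```

Starting the row indices at 0 instead of 1 would give a 3 x 3 matrix with no empty row, and the last block row would then hold a single row.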
