scala - How do I convert an ML SparseVector to an MLlib SparseVector?

When I try to create labeled points from the output of a vector transformer, I run into the following problem:

    val realout = output.select("label", "features").rdd.map(row => LabeledPoint(
      row.getAs[Double]("label"),
      row.getAs[org.apache.spark.mllib.linalg.SparseVector]("features")
    ))

The error I get is:

[error] (run-main-0) org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 13.0 failed 1 times, most recent failure: Lost task 0.0 in stage 13.0 (TID 13, localhost): java.lang.ClassCastException: org.apache.spark.ml.linalg.SparseVector cannot be cast to org.apache.spark.mllib.linalg.Vector
[error]     at DataCleaning$$anonfun$1.apply(DataCleaning.scala:107)
[error]     at DataCleaning$$anonfun$1.apply(DataCleaning.scala:105)
[error]     at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
[error]     at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
[error]     at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:213)

I checked the solution given in link1, which explains how to convert vectors in Spark 2.0.0, but when I try it I get the following compile error:

object linalg is not a member of package org.apache.spark.ml

Please help. Thanks!

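Best answer: org.apache.spark.mllib.linalg.SparseVector has a static method named fromML that converts the new linalg type (org.apache.spark.ml.linalg) to the spark.mllib type. You can use it to convert an ML sparse vector to an MLlib sparse vector. Keep in mind that it is a light copy: only the references to the underlying arrays are copied, not the data.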
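To make the "only copies references" remark concrete, here is a minimal sketch, assuming Spark 2.x with spark-mllib on the classpath. The object name FromMLSketch, the helper toMLlib, and the sample values are made up for illustration; only the SparseVector constructor and SparseVector.fromML come from Spark.

```scala
import org.apache.spark.ml.linalg.{SparseVector => MLSparseVector}
import org.apache.spark.mllib.linalg.{SparseVector => MLlibSparseVector}

// Hypothetical helper object for illustration; not part of Spark.
object FromMLSketch {
  // Wraps the size, indices and values of the ML vector in an MLlib vector.
  def toMLlib(v: MLSparseVector): MLlibSparseVector = MLlibSparseVector.fromML(v)

  def main(args: Array[String]): Unit = {
    // Illustrative vector of size 5 with non-zero entries at positions 1 and 3.
    val mlVec = new MLSparseVector(5, Array(1, 3), Array(2.0, 4.0))
    val mllibVec = toMLlib(mlVec)

    println(mllibVec)
    // Reference equality checks: are both vectors backed by the same arrays?
    println(mllibVec.indices eq mlVec.indices)
    println(mllibVec.values eq mlVec.values)
  }
}
```

Because the arrays are shared, a change made to the values array through one vector is visible through the other; copy the arrays first if you need an independent vector.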

You can use it as follows:

    // SparseVector below is org.apache.spark.mllib.linalg.SparseVector
    val realout: RDD[LabeledPoint] = features1.rdd.map(row => LabeledPoint(
      row.getAs[Double]("label"),
      SparseVector.fromML(row.getAs[org.apache.spark.ml.linalg.SparseVector]("features"))))
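If the features column can also contain dense vectors, reading each row as an ML SparseVector would fail for those rows. A possible variant is to read the column as the general org.apache.spark.ml.linalg.Vector and convert it with Vectors.fromML, which accepts both kinds. The sketch below assumes Spark 2.x running in local mode; the object name LabeledPointSketch, the helper toLabeledPoints, and the sample rows are hypothetical.

```scala
import org.apache.spark.ml.linalg.{Vector => MLVector, Vectors => MLVectors}
import org.apache.spark.mllib.linalg.{Vectors => MLlibVectors}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper object for illustration; not part of Spark.
object LabeledPointSketch {
  // Expects a Double column "label" and an ML vector column "features".
  def toLabeledPoints(df: DataFrame): RDD[LabeledPoint] =
    df.select("label", "features").rdd.map { row =>
      LabeledPoint(
        row.getAs[Double]("label"),
        // Vectors.fromML handles sparse and dense ML vectors alike.
        MLlibVectors.fromML(row.getAs[MLVector]("features")))
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("fromML-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data: one sparse and one dense feature vector.
    val df = Seq(
      (1.0, MLVectors.sparse(3, Array(0), Array(1.5))),
      (0.0, MLVectors.dense(0.0, 2.0, 0.0))
    ).toDF("label", "features")

    toLabeledPoints(df).collect().foreach(println)
    spark.stop()
  }
}
```

The result is an RDD[LabeledPoint] built from spark.mllib vectors, so it can be passed to the RDD-based spark.mllib APIs.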

See the Spark documentation: https://spark.apache.org/docs/2.0.1/api/java/org/apache/spark/mllib/linalg/SparseVector.html

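P.S.: The documentation link points to the Java API, while my example code is in Scala. That is not a problem, because Scala interoperates with Java: methods defined in one language can be called from the other.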
