Logistic regression: principles and a Spark example

Back when I was at 乐视网 (LeTV), some teammates on my team used logistic regression in their data-mining work, so I recently spent some spare time reviewing how it actually works. I mainly followed https://www.cnblogs.com/pinard/p/6029432.html, which I found fairly clear.
That post does not derive the K-class case in detail, so I worked through the derivation myself; the steps are fairly simple. (It has been so long since I wrote anything by hand that I can barely hold a pen anymore...) My handwritten notes are in the photo below, and a standard form of the K-class model is sketched right after it for reference.

[Photo IMG_1265.jpg: handwritten derivation of K-class logistic regression]
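Since the photo may be hard to read, here is the textbook K-class (softmax) form for reference. This is the standard formulation, not a transcription of the photo, and the notation (K classes, parameter vectors \theta_k, m samples x^{(i)} with labels y^{(i)}) is introduced here rather than taken from the referenced post:

$$ P(y = k \mid x) = \frac{e^{\theta_k^{T} x}}{\sum_{j=1}^{K} e^{\theta_j^{T} x}}, \qquad k = 1, \dots, K $$

The log-likelihood over the m training samples is

$$ \ell(\theta) = \sum_{i=1}^{m} \Big[ \theta_{y^{(i)}}^{T} x^{(i)} - \log \sum_{j=1}^{K} e^{\theta_j^{T} x^{(i)}} \Big], $$

and differentiating with respect to \theta_k (with 1\{\cdot\} the indicator function) gives

$$ \nabla_{\theta_k} \ell(\theta) = \sum_{i=1}^{m} \Big( 1\{ y^{(i)} = k \} - P(y = k \mid x^{(i)}) \Big)\, x^{(i)}. $$

Setting K = 2 and fixing one of the two parameter vectors to zero recovers the familiar binary sigmoid form; optimizers such as L-BFGS work on the negative of this log-likelihood.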

I then ran the LogisticRegressionWithLBFGSExample program that ships with Spark.

The source code is as follows:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.classification.{LogisticRegressionModel, LogisticRegressionWithLBFGS}
import org.apache.spark.mllib.evaluation.MulticlassMetrics
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.util.MLUtils

object LogisticRegressionWithLBFGSExample {

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("LogisticRegressionWithLBFGSExample")
    val sc = new SparkContext(conf)

    // $example on$
    // Load training data in LIBSVM format.
    val data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")

    // Split data into training (60%) and test (40%).
    val splits = data.randomSplit(Array(0.6, 0.4), seed = 11L)
    val training = splits(0).cache()
    val test = splits(1)

    // Run training algorithm to build the model.
    // setNumClasses(10) selects multinomial logistic regression with up to 10
    // classes; the bundled sample data only uses labels 0 and 1.
    val model = new LogisticRegressionWithLBFGS()
      .setNumClasses(10)
      .run(training)

    // Compute raw scores on the test set.
    val predictionAndLabels = test.map { case LabeledPoint(label, features) =>
      val prediction = model.predict(features)
      (prediction, label)
    }

    // Get evaluation metrics.
    val metrics = new MulticlassMetrics(predictionAndLabels)
    val accuracy = metrics.accuracy

    println(s"Accuracy = $accuracy")

    // Save and load model.
    model.save(sc, "target/tmp/scalaLogisticRegressionWithLBFGSModel")
    val sameModel = LogisticRegressionModel.load(sc,
      "target/tmp/scalaLogisticRegressionWithLBFGSModel")
    // $example off$

    // Extra print so the result is easy to spot in the driver log output.
    println(s"Accuracy = $accuracy")

    sc.stop()
  }
}
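The example only prints the overall accuracy. As a small follow-up sketch of my own (not part of the Spark example), MulticlassMetrics also exposes a confusion matrix and per-label precision/recall, and the reloaded model can be used for prediction directly. ExtraEvaluation below is a hypothetical helper name; it would be called as ExtraEvaluation.report(metrics, test, sameModel) inside main, before sc.stop():

import org.apache.spark.mllib.classification.LogisticRegressionModel
import org.apache.spark.mllib.evaluation.MulticlassMetrics
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Hypothetical helper (my addition, not part of the Spark example): prints a
// few extra evaluation numbers and re-runs prediction with the reloaded model.
object ExtraEvaluation {
  def report(metrics: MulticlassMetrics,
             test: RDD[LabeledPoint],
             reloaded: LogisticRegressionModel): Unit = {
    // Confusion matrix: rows are actual labels, columns are predicted labels.
    println(s"Confusion matrix:\n${metrics.confusionMatrix}")

    // Precision and recall for each label seen in the predictions and test set.
    metrics.labels.foreach { label =>
      println(s"label $label: precision = ${metrics.precision(label)}, " +
        s"recall = ${metrics.recall(label)}")
    }

    // The model loaded back from disk should predict exactly like the one it
    // was saved from; print the first few predictions as a sanity check.
    reloaded.predict(test.map(_.features)).take(5).foreach(println)
  }
}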
