Spark - 直接操作数据源 MySQL

> 如果我们的Mysql服务器性能不咋滴,但是硬盘很够,如何才能做各种复杂的聚合操作?答案就是使用spark的计算能力的,我们可以将mysql数据源接入到spark中。

## 读取

“`

val mysqlDF = spark

  .read

  .format(“jdbc”)

  .option(“driver”,”com.mysql.jdbc.Driver”)

  .option(“url”,”jdbc:mysql://localhost:3306/ttable”)

  .option(“user”,”root”)

  .option(“password”,”root”)

  .option(“dbtable”,”(select * from ttt where userId >1 AND userId < 10) as log”)//条件查询出想要的表

  //.option(“dbtable”,”ttable.ttt”)//整张表

  .option(“fetchsize”,”100″)

  .option(“useSSL”,”false”)

  .load()

“`

分区读取

“`

spark

  .read

  .format(“jdbc”)

  .option(“url”, url)

  .option(“dbtable”, “ttt”)

  .option(“user”, user)

  .option(“password”, password)

  .option(“numPartitions”, 10)

  .option(“partitionColumn”, “userId”)

  .option(“lowerBound”, 1)

  .option(“upperBound”, 10000)

  .load()

“`

实际会生成如下查询语句,(所有分区会一直查询,直到整张表数据查询完为止)

“`

SELECT * FROM ttt WHERE userId >= 1 and userId < 1000

SELECT * FROM ttt WHERE userId >= 1000 and userId < 2000

SELECT * FROM ttt WHERE userId >= 2000 and userId < 3000

“`

## 写入

“`

mysqlDF.createTempView(“log”)

spark

  .sql(“select * from log”)

  .toDF()

  .write

  .mode(SaveMode.Overwrite)

  .format(“jdbc”)

  .option(“driver”,”com.mysql.jdbc.Driver”)

  .option(“url”,”jdbc:mysql://localhost:3306/ttable”)

  .option(“dbtable”,”a”)

  .option(“user”,”root”)

  .option(“password”,”root”)

  .option(“fetchsize”,”100″)

  .option(“useSSL”,”false”)

  .save()

“`

![](https://upload-images.jianshu.io/upload_images/9028759-3c0e86bf567a8fb7.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)

![](https://upload-images.jianshu.io/upload_images/9028759-07315bb8dadcd082.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)

点赞