spark-2.4.2
kudu-1.7.0
Let's try a few things.
1) Manually adding the jar to the classpath
spark-2.4.2-bin-hadoop2.6
+
kudu-spark2_2.11-1.7.0-cdh5.16.1.jar
# bin/spark-shell
scala> val df = spark.read.options(Map("kudu.master" -> "master:7051", "kudu.table" -> "impala::test.tbl_test")).format("kudu").load
java.lang.ClassNotFoundException: Failed to find data source: kudu. Please find packages at http://spark.apache.org/third-party-projects.html
  at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:660)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:194)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
  ... 49 elided
Caused by: java.lang.ClassNotFoundException: kudu.DefaultSource
  at scala.reflect.internal.util.AbstractFileClassLoader.findClass(AbstractFileClassLoader.scala:72)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
  at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$5(DataSource.scala:634)
  at scala.util.Try$.apply(Try.scala:213)
  at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$4(DataSource.scala:634)
  at scala.util.Failure.orElse(Try.scala:224)
  at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:634)
  ... 51 more
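One plausible reading of this failure (an assumption, not verified against the 1.7.0 source): the older connector may not register the short data-source alias "kudu", so Spark treats format("kudu") as a literal class name kudu.DefaultSource. The older style in the Kudu docs spells out the full package instead; a minimal sketch, assuming the connector jar matches the shell's Scala version:

import org.apache.kudu.spark.kudu._

val df = spark.read
  .options(Map("kudu.master" -> "master:7051", "kudu.table" -> "impala::test.tbl_test"))
  .format("org.apache.kudu.spark.kudu")   // fully-qualified data source instead of the "kudu" alias
  .load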
2) Using the official approach (with the Kudu version changed to 1.7.0)
spark-2.4.2-bin-hadoop2.6
# bin/spark-shell --packages org.apache.kudu:kudu-spark2_2.11:1.7.0
Same error as in 1).
3) Using the official approach (unmodified)
spark-2.4.2-bin-hadoop2.6
# bin/spark-shell --packages org.apache.kudu:kudu-spark2_2.11:1.9.0
scala> val df = spark.read.options(Map("kudu.master" -> "master:7051", "kudu.table" -> "impala::test.tbl_test")).format("kudu").load
java.lang.NoClassDefFoundError: scala/Product$class
  at org.apache.kudu.spark.kudu.Upsert$.<init>(OperationType.scala:41)
  at org.apache.kudu.spark.kudu.Upsert$.<clinit>(OperationType.scala)
  at org.apache.kudu.spark.kudu.DefaultSource$$anonfun$getOperationType$2.apply(DefaultSource.scala:217)
  at org.apache.kudu.spark.kudu.DefaultSource$$anonfun$getOperationType$2.apply(DefaultSource.scala:217)
  at scala.Option.getOrElse(Option.scala:138)
  at org.apache.kudu.spark.kudu.DefaultSource.getOperationType(DefaultSource.scala:217)
  at org.apache.kudu.spark.kudu.DefaultSource.createRelation(DefaultSource.scala:104)
  at org.apache.kudu.spark.kudu.DefaultSource.createRelation(DefaultSource.scala:87)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
  at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
  ... 49 elided
Caused by: java.lang.ClassNotFoundException: scala.Product$class
  at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
  ... 61 more
This looks like a Scala version conflict. Checking the Spark download page turned up the following note:
Note that, Spark is pre-built with Scala 2.11 except version 2.4.2, which is pre-built with Scala 2.12.
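That explains the scala/Product$class error: classes compiled against Scala 2.11 reference a synthetic Product$class helper that no longer exists in the 2.12 library. A quick way to check which Scala version a given spark-shell is running (it is also printed in the startup banner):

scala> util.Properties.versionString   // e.g. 2.12.x on the spark-2.4.2 binaries, 2.11.x on 2.4.3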
4) Switching kudu-spark to Scala 2.12
spark-2.4.2-bin-hadoop2.6
# bin/spark-shell --packages org.apache.kudu:kudu-spark2_2.12:1.9.0
::::::::::::::::::::::::::::::::::::::::::::::
::          UNRESOLVED DEPENDENCIES         ::
::::::::::::::::::::::::::::::::::::::::::::::
:: org.apache.kudu#kudu-spark2_2.12;1.9.0: not found
::::::::::::::::::::::::::::::::::::::::::::::
Fine, download 2.4.3 instead (2.4.3 is pre-built with Scala 2.11 again).
5) Using the official approach (continued)
spark-2.4.3-bin-hadoop2.6
# bin/spark-shell --packages org.apache.kudu:kudu-spark2_2.11:1.9.0
scala> val df = spark.read.options(Map("kudu.master" -> "master:7051", "kudu.table" -> "impala::test.tbl_test")).format("kudu").load
df: org.apache.spark.sql.DataFrame = [order_no: string, id: bigint ... 28 more fields]
Now it works.
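From here it is ordinary Spark SQL. A small follow-up sketch (only order_no and id are known from the schema above; any other columns would be assumptions):

scala> df.createOrReplaceTempView("tbl_test")
scala> spark.sql("select order_no, id from tbl_test limit 10").show()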
6) Using the official approach (with the Kudu version changed to 1.7.0)
spark-2.4.3-bin-hadoop2.6
# bin/spark-shell --packages org.apache.kudu:kudu-spark2_2.11:1.7.0
Same error as in 1) and 2).
So it seems that connecting Spark to Kudu this way only works with Scala 2.11 + kudu-spark2_2.11:1.9.0, i.e. a Scala 2.11 build of Spark (such as 2.4.3) together with the 2.11 connector.
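For writing back to Kudu, the connector also provides a KuduContext (covered in the Kudu developing guide below). A rough sketch, assuming the same master and table and that the DataFrame schema matches the Kudu table:

import org.apache.kudu.spark.kudu._

// KuduContext talks to Kudu directly for inserts/upserts/deletes
val kuduContext = new KuduContext("master:7051", spark.sparkContext)

if (kuduContext.tableExists("impala::test.tbl_test")) {
  kuduContext.upsertRows(df, "impala::test.tbl_test")
}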
References:
https://kudu.apache.org/docs/developing.html
http://spark.apache.org/downloads.html