Spark Pitfall Notes https://zhuanlan.zhihu.com/p/60657101
【Pitfall #1】spark-submit fails to submit the jar
Error: spark-submit /bin/spark-class: No such file or directory
Fix: add the SPARK_HOME environment variable to ~/.bash_profile
export SPARK_HOME=/Users/chenjinying/apps/spark-2.4.0-bin-hadoop2.6
export PATH=$PATH:$SPARK_HOME/bin
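Then reload the profile ($ source ~/.bash_profile) or open a new terminal so the variables take effect.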
【Pitfall #2】class not found after submitting a job to the production cluster
Cause: the production cluster does not carry the job's third-party dependencies, so they have to be handed over manually with --jars xxx.jar,xxx.jar,xxx.jar (see the sketch below).
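A sketch of the full command; the class name and all paths are placeholders, not from the original post:
$ spark-submit --class com.example.MyJob --jars /path/to/dep-a.jar,/path/to/dep-b.jar target/my-job-1.0-SNAPSHOT.jar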
【Pitfall #3】Submitting a job locally complains that no main class is set
$ spark-submit target/hello-spark-java-1.0-SNAPSHOT.jar --class HelloJava
SparkException: No main class set in JAR; please specify one with --class
Fix: the jar being executed must come last; change the command to
$ spark-submit --class HelloJava target/hello-spark-java-1.0-SNAPSHOT.jar
【Pitfall #4】Running the Scala main class inside the jar fails
$ spark-submit --class hello.scala.HelloScala target/hello-spark-java-1.0-SNAPSHOT.jar
2019-03-27 11:23:21 WARN SparkSubmit$$anon$2:87 - Failed to load hello.scala.HelloScala.
java.lang.ClassNotFoundException: hello.scala.HelloScala
Analysis: unpacking hello-spark-java-1.0-SNAPSHOT.jar shows it contains no Scala class files at all.
Fix: see 【Pitfall #5】
【Pitfall #5】Running $ mvn clean scala:compile compile package fails
Error: [ERROR] No plugin found for prefix 'scala' in the current project and in the plugin groups
Cause: by default mvn clean package only compiles and packages the Java sources; Scala sources are ignored, so no Scala class files end up in the jar.
Fix: add the maven-scala-plugin to pom.xml
<plugin>
    <groupId>org.scala-tools</groupId>
    <artifactId>maven-scala-plugin</artifactId>
    <version>2.15.2</version>
    <executions>
        <execution>
            <goals>
                <goal>compile</goal>
                <goal>testCompile</goal>
            </goals>
        </execution>
    </executions>
</plugin>
Then re-run $ mvn clean scala:compile compile package; the resulting jar now contains the Scala class files.
【Pitfall #6】kafka.cluster.BrokerEndPoint cannot be cast to kafka.cluster.Broker
Cause: a version problem; spark-streaming-kafka-0-10_2.11 leads to this serialization failure.
Fix: switch to spark-streaming-kafka-0-8_2.11.
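The matching pom.xml change, as a sketch (the 2.1.1 version matches the artifact that appears in the dependency tree under 【Pitfall #9】; adjust to your Spark version):
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka-0-8_2.11</artifactId>
    <version>2.1.1</version>
</dependency>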
【Pitfall #7】JavaStreamingContextFactory cannot be loaded
Cause: this class is gone as of Spark 2.1.1.
Fix: use new Function0<JavaStreamingContext>() instead:
// Requires: import org.apache.spark.api.java.function.Function0;
//           import org.apache.spark.streaming.api.java.JavaStreamingContext;
JavaStreamingContext jssc = JavaStreamingContext.getOrCreate(CHECKPOINT_DIR, new Function0<JavaStreamingContext>() {
    @Override
    public JavaStreamingContext call() throws Exception {
        // createContext() builds a fresh context when no checkpoint data exists yet
        return createContext();
    }
});
【Pitfall #8】How to test checkpointing against a local directory
// on the cluster the checkpoint lives on HDFS:
private static String CHECKPOINT_DIR = "hdfs://myhdfs/checkpoint";
// for a local test, just point it at a local path instead:
private static String CHECKPOINT_DIR = "/Users/chenjinying/tmp/checkpoint";
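One way to avoid editing the constant back and forth (not from the original post, just a sketch) is to take the checkpoint directory from the first program argument and fall back to the local path:
// Hypothetical helper: pick the checkpoint directory from args[0],
// falling back to a local path for testing.
private static String checkpointDir(String[] args) {
    return args.length > 0 ? args[0] : "/Users/chenjinying/tmp/checkpoint";
}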
【Pitfall #9】Caused by: java.lang.ClassNotFoundException: kafka.serializer.StringDecoder
Analysis: it runs fine in IDEA and only fails when packaged and submitted, so it has to be a dependency problem.
The class lives in org.apache.kafka:kafka_2.11:jar:0.8.2.1:compile, which is pulled in by org.apache.spark:spark-streaming-kafka-0-8_2.11:jar:2.1.1:compile.
Check the dependency tree of org.apache.spark:spark-streaming-kafka-0-8_2.11:jar:2.1.1:compile (e.g. with $ mvn dependency:tree):
[INFO] +- org.apache.spark:spark-streaming-kafka-0-8_2.11:jar:2.1.1:compile
[INFO] | \- org.apache.kafka:kafka_2.11:jar:0.8.2.1:compile
[INFO] | +- org.scala-lang.modules:scala-xml_2.11:jar:1.0.2:compile
[INFO] | +- com.yammer.metrics:metrics-core:jar:2.2.0:compile
[INFO] | +- org.scala-lang.modules:scala-parser-combinators_2.11:jar:1.0.2:compile
[INFO] | +- com.101tec:zkclient:jar:0.3:compile
[INFO] | \- org.apache.kafka:kafka-clients:jar:0.8.2.1:compile
Fix: add --jars xxx.jar,xxx.jar when submitting the job; add whichever jar is reported missing (a sketch follows).
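For this particular tree the submit command would look roughly as follows, assuming the listed jars have been copied next to the job; the application jar name is a placeholder:
$ spark-submit --class com.jychan.easycode.recommend.warehouse.job.UserItemCollectJob \
    --jars kafka_2.11-0.8.2.1.jar,kafka-clients-0.8.2.1.jar,metrics-core-2.2.0.jar,zkclient-0.3.jar \
    your-job.jar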
【Pitfall #10】Caused by: java.lang.ClassNotFoundException: com.yammer.metrics.Metrics
Fix: see 【Pitfall #9】 (this class ships in com.yammer.metrics:metrics-core:jar:2.2.0 from the same dependency tree).
【Pitfall #11】UnsupportedClassVersionError: com/jychan/easycode/recommend/warehouse/job/UserItemCollectJob : Unsupported major.minor version 52.0
Cause: the JDK used by the Spark platform does not support JDK 8; class file version 52.0 means the jar was compiled for Java 8.
Fix: build the package targeting JDK 7 (a pom.xml sketch follows).
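A minimal maven-compiler-plugin sketch for that, assuming the job code itself does not rely on Java 8 language features:
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-compiler-plugin</artifactId>
    <configuration>
        <!-- compile for Java 7 so the cluster's older JDK can load the classes -->
        <source>1.7</source>
        <target>1.7</target>
    </configuration>
</plugin>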
【Pitfall #12】Job fails when running on the cluster
19/03/28 14:54:01 ERROR yarn.ApplicationMaster: User class threw exception: org.apache.spark.SparkException: java.nio.channels.ClosedChannelException
java.nio.channels.ClosedChannelException
java.nio.channels.ClosedChannelException
org.apache.spark.SparkException: java.nio.channels.ClosedChannelException
java.nio.channels.ClosedChannelException
java.nio.channels.ClosedChannelException
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$checkErrors$1.apply(KafkaCluster.scala:385)
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$checkErrors$1.apply(KafkaCluster.scala:385)
at scala.util.Either.fold(Either.scala:98)
at org.apache.spark.streaming.kafka.KafkaCluster$.checkErrors(KafkaCluster.scala:384)
at org.apache.spark.streaming.kafka.KafkaUtils$.getFromOffsets(KafkaUtils.scala:222)
at org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:484)
at org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:607)
at org.apache.spark.streaming.kafka.KafkaUtils.createDirectStream(KafkaUtils.scala)
at com.jychan.easycode.recommend.warehouse.job.UserItemCollectJob.createContext(UserItemCollectJob.java:64)
at com.jychan.easycode.recommend.warehouse.job.UserItemCollectJob$1.call(UserItemCollectJob.java:35)
at com.jychan.easycode.recommend.warehouse.job.UserItemCollectJob$1.call(UserItemCollectJob.java:32)
at org.apache.spark.streaming.api.java.JavaStreamingContext$$anonfun$7.apply(JavaStreamingContext.scala:627)
at org.apache.spark.streaming.api.java.JavaStreamingContext$$anonfun$7.apply(JavaStreamingContext.scala:626)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.streaming.StreamingContext$.getOrCreate(StreamingContext.scala:826)
at org.apache.spark.streaming.api.java.JavaStreamingContext$.getOrCreate(JavaStreamingContext.scala:626)
at org.apache.spark.streaming.api.java.JavaStreamingContext.getOrCreate(JavaStreamingContext.scala)
at com.jychan.easycode.recommend.warehouse.job.UserItemCollectJob.main(UserItemCollectJob.java:32)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:638)
Fix: in my case the cause was that Spark could not connect to Kafka, a permissions issue; your cause may of course be different.