Q: Consider increasing spark.rpc.message.maxSize or using broadcast variables for large values.

Problem: Training a Word2Vec model on a YARN cluster and saving it to HDFS fails with the following error:

w2cModel.write.overwrite.save(path)

ERROR datasources.FileFormatWriter: Aborting job null.
org.apache.spark.SparkException: Job aborted due to stage failure:
Serialized task 5829:0 was 354127887 bytes, which exceeds max allowed:
spark.rpc.message.maxSize (134217728 bytes). Consider increasing
spark.rpc.message.maxSize or using broadcast variables for large values.

No error occurs on small datasets, and standalone mode also works fine. Related question:
http://stackoverflow.com/questions/40842736/spark-word2vecmodel-exceeds-max-rpc-size-for-saving
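
For context, a minimal Scala sketch of the training-and-save flow that hits this error; the corpus path, column name, and hyperparameters below are illustrative assumptions, not taken from the original report:

import org.apache.spark.ml.feature.Word2Vec
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("w2v-train").getOrCreate()
import spark.implicits._

// One whitespace-tokenized document per line; the path is hypothetical.
val docs = spark.read.textFile("hdfs:///data/corpus.txt")
  .map(_.split(" ").toSeq)
  .toDF("text")

val w2cModel = new Word2Vec()
  .setInputCol("text")
  .setOutputCol("features")
  .setVectorSize(200)
  .setMinCount(5)
  .fit(docs)

// With a large vocabulary, this save is the step that aborts
// with the RPC message-size error above.
w2cModel.write.overwrite.save("hdfs:///models/w2v")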

Spark's RPC layer limits the size of serialized data it transfers, and this error means a large object is being shipped from the driver to the executors.
Simply increasing the number of partitions did not help, so the next step is to raise spark.rpc.message.maxSize.
spark.rpc.message.maxSize defaults to 128 (the unit is MB), i.e. 131072 KB or 134217728 bytes, which matches the limit quoted in the error.
Setting spark.rpc.message.maxSize to 512 while also increasing the partition count as far as practical fixed the problem:

--conf "spark.rpc.message.maxSize=512"
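
The same setting can also be applied when building the session. A minimal sketch (the app name and partition count are assumptions); note the value is interpreted in MB and must be in place before the SparkContext is created:

import org.apache.spark.sql.SparkSession

// Equivalent to passing --conf "spark.rpc.message.maxSize=512" to spark-submit;
// it only takes effect if set before the SparkContext starts.
val spark = SparkSession.builder()
  .appName("w2v-train")
  .config("spark.rpc.message.maxSize", "512")
  .getOrCreate()

// Raising the partition count keeps each serialized task smaller;
// the path and the value 1000 are arbitrary illustrative choices.
val docs = spark.read.parquet("hdfs:///data/corpus").repartition(1000)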