Spark Streaming: Pitfalls to Avoid

January 28, 2024. Source: 枫华絮语

Operations on one RDD cannot be nested inside operations on another:

Caused by: org.apache.spark.SparkException: RDD transformations and actions can only be 
invoked by the driver, not inside of other transformations; for example, rdd1.map(x => 
rdd2.values.count() * x) is invalid because the values transformation and count action 
cannot be performed inside of the rdd1.map transformation. For more information, see 
SPARK-5063.

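Solution: run the action on one of the RDDs first so that its result is held in memory as an ordinary value, then use that value inside the transformation of the other RDD.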
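
The fix can be sketched as follows, reusing the rdd1 and rdd2 names from the error message. This is a minimal illustration, not code from the original post: the sample data, the local[2] master, and the object name are assumptions made for the example.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object NestedRddFix {
  def main(args: Array[String]): Unit = {
    // local[2] and the sample data below are placeholders for illustration only.
    val conf = new SparkConf().setAppName("nested-rdd-fix").setMaster("local[2]")
    val sc = new SparkContext(conf)

    val rdd1 = sc.parallelize(Seq(1L, 2L, 3L))
    val rdd2 = sc.parallelize(Seq("a" -> 10, "b" -> 20))

    // Invalid: rdd1.map(x => rdd2.values.count() * x) uses rdd2 inside a task.

    // Run the action on the driver first; the result is an ordinary Long.
    val rdd2Count: Long = rdd2.values.count()

    // The closure now captures only a plain value, not an RDD.
    val scaled = rdd1.map(x => rdd2Count * x)

    scaled.collect().foreach(println)
    sc.stop()
  }
}
```

When the second RDD is needed as a lookup table rather than a single number, a common variant is to collect it on the driver (if it is small enough) and share it with sc.broadcast.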

Out-of-memory errors during computation

Cause:
Spark Streaming is still processing one batch when the next batch arrives, so unprocessed batches pile up in memory:

Exception in thread "JobGenerator" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:3236)

Solution: speed up the processing of each batch and adjust the batch interval so that the current batch finishes before the next one arrives.
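
One way to apply this is to choose a batch interval longer than the observed processing time and to let Spark throttle ingestion. The sketch below is an illustration under assumptions, not the original author's setup: the 10-second interval, the rate limit of 1000, and the socket source on localhost:9999 are placeholders, and the master is expected to be supplied by spark-submit.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object BatchIntervalSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("batch-interval-sketch")
      // Let Spark adapt the ingestion rate to the observed processing speed.
      .set("spark.streaming.backpressure.enabled", "true")
      // Upper bound in records per second per receiver; 1000 is a placeholder.
      .set("spark.streaming.receiver.maxRate", "1000")

    // Batch interval; 10 seconds is a placeholder that should exceed
    // the time a typical batch takes to process.
    val ssc = new StreamingContext(conf, Seconds(10))

    // Hypothetical source used only to make the sketch complete.
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The Streaming tab of the Spark UI reports the processing time of each batch, which can be compared against the batch interval when tuning these values.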

Too few or too many resources allocated


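Set executor-cores according to the number of CPU cores on each server.
Set executor-memory according to the memory available on each server.

For example, as spark-submit options:
spark-submit \
  --executor-memory 20G \
  --executor-cores 20 \
  <application-jar>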
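
The same two settings can also be supplied in application code through their configuration keys, spark.executor.memory and spark.executor.cores. The sketch below simply mirrors the values from the command-line example; they are not recommendations, and the object and application names are made up for illustration.

```scala
import org.apache.spark.SparkConf

object ExecutorResourcesSketch {
  // Same values as the command-line example above; tune them to the cluster.
  val conf: SparkConf = new SparkConf()
    .setAppName("executor-resources-sketch")
    .set("spark.executor.memory", "20g") // equivalent to --executor-memory 20G
    .set("spark.executor.cores", "20")   // equivalent to --executor-cores 20
}
```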

    Original author: 枫华絮语
    Original URL: https://segmentfault.com/a/1190000007750206