- Basic environment
vi /etc/hosts
192.168.74.10 host196
192.168.74.29 host197
192.168.74.30 host198
Install the JDK, ZooKeeper, and Hadoop on all three hosts beforehand.
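A quick way to confirm the prerequisites on each host (a sketch; it assumes the JDK, ZooKeeper, and Hadoop binaries are already on PATH, which depends on your profile setup):
ping -c 1 host197   # verify /etc/hosts resolution
java -version       # should report 1.8.0_111
zkServer.sh status  # ZooKeeper must be running for the master HA setup below
hadoop version      # should report 2.8.5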
- Installation steps
tar -zxvf spark-2.3.2-bin-hadoop2.7.tgz -C /opt/
cd /opt/spark-2.3.2-bin-hadoop2.7/
cd conf/
cp spark-env.sh.template spark-env.sh
vi spark-env.sh
export JAVA_HOME=/usr/local/jdk1.8.0_111
export HADOOP_CONF_DIR=/opt/hadoop-2.8.5/etc/hadoop
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=192.168.74.10:2181 -Dspark.deploy.zookeeper.dir=/spark"
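The config above registers only one ZooKeeper node, which leaves that node as a single point of failure for master election. If the ensemble actually spans all three hosts (an assumption, not stated above), listing every member keeps failover working when one ZooKeeper node dies:
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=192.168.74.10:2181,192.168.74.29:2181,192.168.74.30:2181 -Dspark.deploy.zookeeper.dir=/spark"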
cp slaves.template slaves
vi slaves
host196
host197
host198
scp -r spark-2.3.2-bin-hadoop2.7/ host197:/opt/
scp -r spark-2.3.2-bin-hadoop2.7/ host198:/opt/
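Both the scp commands above and sbin/start-all.sh below assume passwordless SSH from host196 to the other nodes. A minimal sketch to set it up, run on host196 as root (the user shown in the transcript below):
ssh-keygen -t rsa          # accept the defaults; creates ~/.ssh/id_rsa
ssh-copy-id root@host197   # appends the public key to the remote authorized_keys
ssh-copy-id root@host198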
- Start and stop the services
- Start
ssh host196
cd /opt/spark-2.3.2-bin-hadoop2.7
sbin/start-all.sh
ssh host197
cd /opt/spark-2.3.2-bin-hadoop2.7
sbin/start-master.sh
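To verify that HA came up, check the daemons with jps and the master web UIs (8080 is the default standalone master UI port):
jps   # host196: Master + Worker; host197: Master + Worker; host198: Worker
# http://host196:8080 should show Status: ALIVE, http://host197:8080 Status: STANDBY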
- Stop
ssh host196
cd /opt/spark-2.3.2-bin-hadoop2.7
sbin/stop-all.sh
ssh host197
cd /opt/spark-2.3.2-bin-hadoop2.7
sbin/stop-master.sh
- Basic test
Test with spark-shell:
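The test reads /opt/hdfs_test/input/words.txt from HDFS, so that file must exist first. One way to create it (the exact content is an assumption, inferred from the word counts in the output below):
echo -e "hello zhangsan\nhello lisi\nhello wangwu" > words.txt
hdfs dfs -mkdir -p /opt/hdfs_test/input
hdfs dfs -put words.txt /opt/hdfs_test/input/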
[root@host198 spark-2.3.2-bin-hadoop2.7]# bin/spark-shell
2018-10-23 10:27:08 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://host198:4040
Spark context available as 'sc' (master = local[*], app id = local-1540261648120).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.2
      /_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_111)
Type in expressions to have them evaluated.
Type :help for more information.
scala> val file=sc.textFile("hdfs://192.168.74.10:9000/opt/hdfs_test/input/words.txt")
file: org.apache.spark.rdd.RDD[String] = hdfs://192.168.74.10:9000/opt/hdfs_test/input/words.txt MapPartitionsRDD[1] at textFile at <console>:24
scala> val rdd = file.flatMap(line => line.split(" ")).map(word => (word,1)).reduceByKey(_+_)
rdd: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[4] at reduceByKey at <console>:25
scala> rdd.collect()
res0: Array[(String, Int)] = Array((zhangsan,1), (wangwu,1), (hello,3), (lisi,1))
scala> rdd.foreach(println)
(zhangsan,1)
(wangwu,1)
(hello,3)
(lisi,1)
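Note that the transcript above ran with master = local[*], so it exercises Spark itself but not the standalone cluster. To repeat the test against the HA masters, pass both of them in the master URL (standard syntax for a multi-master standalone cluster):
bin/spark-shell --master spark://host196:7077,host197:7077
A full job can be submitted the same way, for example the SparkPi example bundled with the 2.3.2 distribution:
bin/spark-submit --master spark://host196:7077,host197:7077 --class org.apache.spark.examples.SparkPi examples/jars/spark-examples_2.11-2.3.2.jar 100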
- FAQ
- References
- Submitting jobs to Spark (https://www.cnblogs.com/zengxiaoliang/p/6508330.html)
- Setting up a Spark cluster (https://blog.csdn.net/h249059945/article/details/82356927)
- Spark basics: the two YARN submit modes (https://blog.csdn.net/zhanglh046/article/details/78360812)