Spark 2.3.2 Getting Started

  1. Basic environment
vi /etc/hosts
192.168.74.10  host196
192.168.74.29  host197
192.168.74.30  host198

Install JDK, ZooKeeper, and Hadoop first.

  2. Installation steps
tar -zxvf spark-2.3.2-bin-hadoop2.7.tgz -C /opt/
cd /opt/spark-2.3.2-bin-hadoop2.7/
cd conf/

cp spark-env.sh.template spark-env.sh
vi spark-env.sh
JAVA_HOME=/usr/local/jdk1.8.0_111
HADOOP_CONF_DIR=/opt/hadoop-2.8.5/etc/hadoop
SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=192.168.74.10:2181 -Dspark.deploy.zookeeper.dir=/spark"
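
With recoveryMode=ZOOKEEPER, a second master can run as a standby and take over if the active master fails. A client application can then list both masters in its master URL so it survives a failover. A minimal Scala sketch (host names taken from /etc/hosts above; 7077 is assumed to be the default standalone master port):

import org.apache.spark.{SparkConf, SparkContext}

// List both masters; Spark connects to whichever one is currently active.
val conf = new SparkConf()
  .setAppName("ha-demo")
  .setMaster("spark://host196:7077,host197:7077")
val sc = new SparkContext(conf)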

cp slaves.template slaves
vi slaves
host196
host197
host198

scp -r spark-2.3.2-bin-hadoop2.7/ host197:/opt/
scp -r spark-2.3.2-bin-hadoop2.7/ host198:/opt/

  3. Starting and stopping the services
  • Start
ssh host196
sbin/start-all.sh        # starts the master on host196 plus a worker on every host listed in conf/slaves

ssh host197
sbin/start-master.sh     # starts the standby master on host197

Open http://host196:8080 to view the master web UI.
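
The standalone master also serves its status as JSON on the same port (the web UI's /json endpoint). A quick check from the Scala REPL, assuming that endpoint is available in this build:

import scala.io.Source

// Fetch the master's status page as JSON; the active master should report "status" : "ALIVE".
val status = Source.fromURL("http://host196:8080/json").mkString
println(status)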

  • Stop
ssh host196
sbin/stop-all.sh         # stops the master and all workers

ssh host197
sbin/stop-master.sh      # stops the standby master

  4. Basic test

Use spark-shell for a quick test. Launched this way it runs in local mode rather than against the cluster (note master = local[*] in the banner below).

[root@host198 spark-2.3.2-bin-hadoop2.7]# bin/spark-shell 
2018-10-23 10:27:08 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://host198:4040
Spark context available as 'sc' (master = local[*], app id = local-1540261648120).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.2
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_111)
Type in expressions to have them evaluated.
Type :help for more information.

scala> val file=sc.textFile("hdfs://192.168.74.10:9000/opt/hdfs_test/input/words.txt")
file: org.apache.spark.rdd.RDD[String] = hdfs://192.168.74.10:9000/opt/hdfs_test/input/words.txt MapPartitionsRDD[1] at textFile at <console>:24

scala> val rdd = file.flatMap(line => line.split(" ")).map(word => (word,1)).reduceByKey(_+_)
rdd: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[4] at reduceByKey at <console>:25

scala> rdd.collect()
res0: Array[(String, Int)] = Array((zhangsan,1), (wangwu,1), (hello,3), (lisi,1))

scala> rdd.foreach(println)
(zhangsan,1)
(wangwu,1)
(hello,3)
(lisi,1)
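
The same word count can also be packaged as a standalone application and submitted to the cluster instead of being typed into the shell. A minimal sketch, reusing the HDFS input path from the session above; the object name and jar name below are only illustrative:

// WordCount.scala
import org.apache.spark.sql.SparkSession

object WordCount {
  def main(args: Array[String]): Unit = {
    // The master URL is supplied by spark-submit, so none is hard-coded here.
    val spark = SparkSession.builder().appName("WordCount").getOrCreate()
    val sc = spark.sparkContext

    val file = sc.textFile("hdfs://192.168.74.10:9000/opt/hdfs_test/input/words.txt")
    val counts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
    counts.collect().foreach(println)

    spark.stop()
  }
}

Built into a jar, it could then be submitted with something like: bin/spark-submit --class WordCount --master spark://host196:7077 wordcount.jar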

  5. FAQ
  6. References
    Original author: 大奇的改变
    Original article: https://www.jianshu.com/p/3d4f11d20b83