Install Scala before installing Spark.
Then download spark-1.6.3-bin-without-hadoop.tgz and extract it to /usr/local/spark-1.6.3-bin-without-hadoop.
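A minimal sketch of the extraction, assuming the tarball sits in the current directory (the archive's top-level directory is already named spark-1.6.3-bin-without-hadoop, so extracting into /usr/local gives the path above):
sudo tar -zxf spark-1.6.3-bin-without-hadoop.tgz -C /usr/local/
ls /usr/local/spark-1.6.3-bin-without-hadoop    # bin, sbin, conf, lib should be present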
Configure the environment variables:
sudo nano /etc/profile
Add the following lines:
export SPARK_HOME=/usr/local/spark-1.6.3-bin-without-hadoop
export PATH=$SPARK_HOME/bin:$PATH
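After saving /etc/profile, the new variables can be loaded into the current shell and checked roughly like this (the expected output follows from the settings above):
source /etc/profile
echo $SPARK_HOME       # expect /usr/local/spark-1.6.3-bin-without-hadoop
which spark-submit     # expect $SPARK_HOME/bin/spark-submit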
Copy the template file in the conf folder and rename the copy to spark-env.sh; the Spark cluster parameters are set in this file:
cp conf/spark-env.sh.template conf/spark-env.sh
Append the following lines at the bottom (a quick sanity check follows after them):
export JAVA_HOME=/usr/local/jdk1.7.0_80
export SCALA_HOME=/usr/local/scala-2.11.11
export SPARK_MASTER_IP=master1
export HADOOP_CONF_DIR=/usr/local/hadoop-2.6.5/etc/hadoop
export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop-2.6.5/bin/hadoop classpath)
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_CORES=10
export SPARK_WORKER_MEMORY=10g
export SPARK_WORKER_INSTANCES=1
export SPARK_EXECUTOR_CORES=5
export SPARK_EXECUTOR_MEMORY=7g
export SPARK_EXECUTOR_INSTANCES=2
export SPARK_DRIVER_MEMORY=4g
export SPARK_WORKER_DIR=/usr/local/spark-1.6.3-bin-without-hadoop/worker_dir
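Because this is the without-hadoop build, Spark gets its Hadoop jars entirely from SPARK_DIST_CLASSPATH, so it is worth confirming that the command substitution above really produces output; creating the worker directory in advance also does no harm (paths as configured above):
/usr/local/hadoop-2.6.5/bin/hadoop classpath    # should print a long colon-separated list of Hadoop jars and directories
mkdir -p /usr/local/spark-1.6.3-bin-without-hadoop/worker_dir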
Create a new file named slaves under conf; it lists the hostnames of the Spark cluster worker nodes (a one-step way to create and distribute it is sketched after the list).
Add the following lines:
master1
master2
slave1
slave2
slave3
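A sketch of creating the file in one step and copying the whole conf directory to the other nodes; the hadoop-sna account comes from the ownership step below, and passwordless SSH between nodes is an assumption:
cat > conf/slaves <<'EOF'
master1
master2
slave1
slave2
slave3
EOF
for h in master2 slave1 slave2 slave3; do
  scp -r conf/ hadoop-sna@$h:/usr/local/spark-1.6.3-bin-without-hadoop/
done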
Change the owner and group of the Spark directory:
sudo chown -R hadoop-sna /usr/local/spark-1.6.3-bin-without-hadoop
sudo chgrp -R hadoop-sna /usr/local/spark-1.6.3-bin-without-hadoop
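Equivalently, the owner and group can be set in a single command and then verified (assuming the hadoop-sna user and group exist on every node):
sudo chown -R hadoop-sna:hadoop-sna /usr/local/spark-1.6.3-bin-without-hadoop
ls -ld /usr/local/spark-1.6.3-bin-without-hadoop    # owner and group should now both be hadoop-sna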
Repeat all of the above steps on every node (the master and all the workers),
then start the master with sbin/start-master.sh and start the workers with sbin/start-slaves.sh.
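A rough way to confirm the cluster came up (8080 is the default port of the standalone master web UI; spark://master1:7077 matches SPARK_MASTER_IP and SPARK_MASTER_PORT configured above):
$SPARK_HOME/sbin/start-master.sh    # run on master1
$SPARK_HOME/sbin/start-slaves.sh    # run on master1; starts a Worker on every host listed in conf/slaves
jps                                 # master1 should show a Master process, every host in conf/slaves a Worker process
spark-shell --master spark://master1:7077    # interactive smoke test; the web UI at http://master1:8080 should list 5 workers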