Hadoop on Docker

Installing Docker

Install

yum install -y epel-release
yum install -y docker-io

Enable Docker at boot

chkconfig docker on

Start Docker

service docker start

Pull the CentOS base image

sudo docker pull insaneworks/centos

Building the Hadoop image

Start a CentOS container named master (hostname master) and open a shell in it

sudo docker run -it -h master --name master insaneworks/centos /bin/bash

Install gcc

yum install -y gcc

Install vim

yum install -y vim

Install lrzsz (rz/sz file transfer)

yum install -y lrzsz

Install SSH

yum -y install openssh-server

yum -y install openssh-clients

Edit the SSH configuration

vim /etc/ssh/sshd_config

Uncomment PermitEmptyPasswords no

Change UsePAM to: UsePAM no

Uncomment PermitRootLogin yes
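The three edits can also be applied non-interactively. A minimal sketch with sed, assuming GNU sed (as on CentOS); `apply_sshd_changes` is a hypothetical helper name, and the demo runs on a temporary file rather than the real /etc/ssh/sshd_config:

```shell
# Hypothetical helper: rewrite the three directives in an sshd_config file.
# Matches each directive whether or not it is commented out ("#UsePAM yes").
apply_sshd_changes() {
  local config="$1"
  sed -i \
    -e 's/^#\{0,1\}[[:space:]]*PermitEmptyPasswords.*/PermitEmptyPasswords no/' \
    -e 's/^#\{0,1\}[[:space:]]*UsePAM .*/UsePAM no/' \
    -e 's/^#\{0,1\}[[:space:]]*PermitRootLogin.*/PermitRootLogin yes/' \
    "$config"
}

# Demo on a throwaway file; inside the container pass /etc/ssh/sshd_config instead.
demo=$(mktemp)
printf '%s\n' '#PermitRootLogin yes' '#PermitEmptyPasswords no' 'UsePAM yes' > "$demo"
apply_sshd_changes "$demo"
cat "$demo"
rm -f "$demo"
```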

Start sshd

service sshd start

Set up passwordless (key-based) SSH login

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
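Appending with `>>` adds a duplicate line every time the step is re-run, and sshd with its default StrictModes setting refuses an authorized_keys file that other users can write to. A small idempotent sketch; `install_pubkey` is a hypothetical helper, and the demo uses a made-up key in a temporary directory:

```shell
# Hypothetical helper: add the key in $1 to the authorized_keys file $2 once,
# and give the .ssh directory and the file the permissions sshd expects.
install_pubkey() {
  local pub="$1" auth="$2"
  mkdir -p "$(dirname "$auth")"
  chmod 700 "$(dirname "$auth")"
  touch "$auth"
  # -x: whole-line match, -F: literal string, -f: read the pattern from the key file
  if ! grep -qxF -f "$pub" "$auth"; then
    cat "$pub" >> "$auth"
  fi
  chmod 600 "$auth"
}

# Demo with a fake key; in the container, pass the .pub file written by
# ssh-keygen and ~/.ssh/authorized_keys.
tmp=$(mktemp -d)
echo 'ssh-rsa AAAAexamplekey root@master' > "$tmp/id_rsa.pub"
install_pubkey "$tmp/id_rsa.pub" "$tmp/.ssh/authorized_keys"
install_pubkey "$tmp/id_rsa.pub" "$tmp/.ssh/authorized_keys"  # second call adds nothing
cat "$tmp/.ssh/authorized_keys"
rm -rf "$tmp"
```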

Test SSH login to master

ssh master

Install Java

Install Java in the Docker container by copying the JDK from the host into the container.
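One way to do the copy from the host, sketched under assumptions: the JDK is a tarball on the host (`jdk-7u75-linux-x64.tar.gz` is a hypothetical file name) that unpacks to jdk1.7.0_75, so it ends up as /usr/java/jdk1.7.0_75 to match the JAVA_HOME used below, and tar is already present in the container (it is installed in the next step). `copy_jdk` and `run` are hypothetical helpers; setting `DRY_RUN=1` only prints the commands:

```shell
# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Hypothetical helper: copy a JDK archive into a container and unpack it under /usr/java.
copy_jdk() {
  local archive="$1" container="${2:-master}"
  run docker exec "$container" mkdir -p /usr/java
  run docker cp "$archive" "$container:/usr/java/"
  run docker exec "$container" tar zxf "/usr/java/$(basename "$archive")" -C /usr/java
}

DRY_RUN=1
copy_jdk jdk-7u75-linux-x64.tar.gz master
```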

Install tar

yum install -y tar

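Download Hadoop

http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.8.2/hadoop-2.8.2.tar.gz

The .tar.gz is the prebuilt binary release: unpack it, adjust the configuration files, and it is ready to use. The -src.tar.gz is the source release and has to be compiled first. The native libraries in a prebuilt release may not match a 64-bit system (older releases shipped 32-bit builds); in that case build them from source. Without working native libraries Hadoop still runs, falling back to its built-in Java implementations.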

Extract

tar zxvf hadoop-2.8.2.tar.gz

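Configure environment variables (add them to /etc/profile, which every node sources later)

export JAVA_HOME=/usr/java/jdk1.7.0_75
export HADOOP_HOME=/home/hadoop/hadoop-2.8.2
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH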
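A sketch of persisting these lines so that the later `source /etc/profile` on each node picks them up; `write_hadoop_profile` is a hypothetical helper, and the demo writes to a temporary file instead of /etc/profile. The quoted heredoc keeps `$JAVA_HOME` and the other references unexpanded until the profile is sourced:

```shell
# Hypothetical helper: append the Java/Hadoop variables to a profile file.
write_hadoop_profile() {
  cat >> "$1" <<'EOF'
export JAVA_HOME=/usr/java/jdk1.7.0_75
export HADOOP_HOME=/home/hadoop/hadoop-2.8.2
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH
EOF
}

# Demo: write to a scratch file and source it in a subshell.
profile=$(mktemp)
write_hadoop_profile "$profile"
( . "$profile"; echo "$HADOOP_HOME"; echo "${PATH%%:*}" )
rm -f "$profile"
```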

Set the variable in hadoop-env.sh and yarn-env.sh (both in $HADOOP_HOME/etc/hadoop)

vim hadoop-env.sh
vim yarn-env.sh

Add:
export JAVA_HOME=/usr/java/jdk1.7.0_75
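Hadoop's start scripts launch daemons over non-interactive SSH sessions, which typically do not read /etc/profile, so JAVA_HOME has to be set in these files as well. A sketch that edits both files in place, assuming GNU sed; `set_java_home` is a hypothetical helper, and the demo uses stand-in files with lines similar to the shipped defaults:

```shell
# Hypothetical helper: set (or uncomment and set) "export JAVA_HOME=..." in an
# env script, appending the line if the file has none.
set_java_home() {
  local file="$1" jdk="$2"
  if grep -q '^#\{0,1\}[[:space:]]*export JAVA_HOME=' "$file"; then
    sed -i "s|^#\{0,1\}[[:space:]]*export JAVA_HOME=.*|export JAVA_HOME=$jdk|" "$file"
  else
    echo "export JAVA_HOME=$jdk" >> "$file"
  fi
}

# Demo with stand-in files; for real use:
#   for f in "$HADOOP_HOME"/etc/hadoop/hadoop-env.sh "$HADOOP_HOME"/etc/hadoop/yarn-env.sh; do
#     set_java_home "$f" /usr/java/jdk1.7.0_75
#   done
dir=$(mktemp -d)
echo 'export JAVA_HOME=${JAVA_HOME}' > "$dir/hadoop-env.sh"
echo '# export JAVA_HOME=/home/y/libexec/jdk1.6.0/' > "$dir/yarn-env.sh"
for f in "$dir/hadoop-env.sh" "$dir/yarn-env.sh"; do
  set_java_home "$f" /usr/java/jdk1.7.0_75
  cat "$f"
done
rm -rf "$dir"
```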

Edit core-site.xml (in $HADOOP_HOME/etc/hadoop)

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/mnt/hadoop-2.8.2/tmp</value>
</property>
</configuration>   
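The `<property>` boilerplate in these *-site.xml files is easy to mistype. A sketch of generating it from name=value pairs; `hadoop_site` is a hypothetical helper, it does not escape XML special characters such as & or <, and 131072 (128 KiB) is assumed as the intended io.file.buffer.size:

```shell
# Hypothetical helper: print a Hadoop configuration file from name=value pairs.
# Each pair is split at the first "=", so values may themselves contain "=".
hadoop_site() {
  local pair
  echo '<?xml version="1.0"?>'
  echo '<configuration>'
  for pair in "$@"; do
    printf '<property>\n<name>%s</name>\n<value>%s</value>\n</property>\n' \
      "${pair%%=*}" "${pair#*=}"
  done
  echo '</configuration>'
}

# The core-site.xml settings; redirect to "$HADOOP_HOME/etc/hadoop/core-site.xml"
# to write the real file.
hadoop_site fs.defaultFS=hdfs://master:9000 \
            io.file.buffer.size=131072 \
            hadoop.tmp.dir=file:/mnt/hadoop-2.8.2/tmp
```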

Edit hdfs-site.xml

<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/mnt/hadoop-2.8.2/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/mnt/hadoop-2.8.2/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>master:9001</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
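The three local paths used in core-site.xml and hdfs-site.xml can be created up front. Hadoop normally creates them itself when the NameNode is formatted or a DataNode starts, so this is optional; it mainly surfaces a path typo or permission problem early. `make_hadoop_dirs` is a hypothetical helper, and the demo uses a temporary base directory:

```shell
# Hypothetical helper: create hadoop.tmp.dir, dfs.namenode.name.dir and
# dfs.datanode.data.dir under one base (default: the /mnt/hadoop-2.8.2 prefix used above).
make_hadoop_dirs() {
  local base="${1:-/mnt/hadoop-2.8.2}"
  mkdir -p "$base/tmp" "$base/dfs/name" "$base/dfs/data"
}

# Demo in a scratch directory; inside the container just run: make_hadoop_dirs
base=$(mktemp -d)
make_hadoop_dirs "$base"
ls "$base" "$base/dfs"
rm -rf "$base"
```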

Edit mapred-site.xml

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
</property>
</configuration>
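In Hadoop 2.x the configuration directory ships mapred-site.xml.template rather than mapred-site.xml, so the file usually has to be created before it can be edited. A sketch (`ensure_mapred_site` is a hypothetical helper) that copies the template only when the real file is missing, so existing edits are never overwritten:

```shell
# Hypothetical helper: create mapred-site.xml from its template if it does not exist yet.
ensure_mapred_site() {
  local conf_dir="$1"
  if [ ! -f "$conf_dir/mapred-site.xml" ] && [ -f "$conf_dir/mapred-site.xml.template" ]; then
    cp "$conf_dir/mapred-site.xml.template" "$conf_dir/mapred-site.xml"
  fi
}

# Demo in a scratch directory; for real use: ensure_mapred_site "$HADOOP_HOME/etc/hadoop"
conf=$(mktemp -d)
echo '<configuration></configuration>' > "$conf/mapred-site.xml.template"
ensure_mapred_site "$conf"
ls "$conf"
rm -rf "$conf"
```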

Edit yarn-site.xml

<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:8088</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>1024</value>
</property>
</configuration>

Add the slave hostnames to the slaves file (etc/hadoop/slaves)

slave1
slave2
slave3
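The slaves file holds one hostname per line, and the names must match the `-h`/`--name` values used when the containers are started and the /etc/hosts entries. A sketch that writes slave1..slaveN; overwriting the file also drops its default localhost entry, so master itself does not run a DataNode. `write_slaves` is a hypothetical helper and the demo writes to a temporary file:

```shell
# Hypothetical helper: write hostnames slave1..slaveN, one per line, to a slaves file.
write_slaves() {
  local file="$1" n="$2" i
  : > "$file"
  for i in $(seq 1 "$n"); do
    echo "slave$i" >> "$file"
  done
}

# For real use: write_slaves "$HADOOP_HOME/etc/hadoop/slaves" 3
f=$(mktemp)
write_slaves "$f" 3
cat "$f"
rm -f "$f"
```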

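Upgrade glibc if needed: if ldd reports that the native library requires a newer glibc (for example, version GLIBC_2.14 not found), build glibc 2.14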

yum install -y wget

wget http://ftp.gnu.org/gnu/glibc/glibc-2.14.tar.gz

tar zxvf glibc-2.14.tar.gz

cd glibc-2.14

mkdir build

cd build

../configure --prefix=/usr/local/glibc-2.14

make

make install

ln -sf /usr/local/glibc-2.14/lib/libc-2.14.so /lib64/libc.so.6

ldd /home/hadoop/hadoop-2.8.2/lib/native/libhadoop.so.1.0.0
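Replacing the system libc is risky, so it can be worth checking first whether the installed glibc is already new enough. A sketch comparing version strings with GNU `sort -V`; `version_ge` and `installed_glibc` are hypothetical helpers, and 2.14 is the version this section builds:

```shell
# Hypothetical helper: succeed if version $1 >= version $2 (uses GNU sort -V).
version_ge() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Hypothetical helper: glibc version taken from the last field of the first line
# of "ldd --version", e.g. "ldd (GNU libc) 2.12" -> 2.12.
installed_glibc() {
  ldd --version 2>/dev/null | head -n1 | awk '{print $NF}'
}

current=$(installed_glibc)
if version_ge "$current" 2.14; then
  echo "glibc $current is new enough, no rebuild needed"
else
  echo "glibc ${current:-unknown} is older than 2.14"
fi
```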

Commit the image

docker commit master hadoop

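Start the Hadoop cluster

Exit the master container (its state is already committed to the hadoop image), then remove it

docker rm master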

sudo docker run -it -p 50070:50070 -p 19888:19888 -p 8088:8088 -h master --name master hadoop /bin/bash

sudo docker run -it -h slave1 --name slave1 hadoop /bin/bash

sudo docker run -it -h slave2 --name slave2 hadoop /bin/bash

sudo docker run -it -h slave3 --name slave3 hadoop /bin/bash
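Starting each container with `-it` ties up one terminal per node. A sketch that starts the same four containers detached (`-itd`, which keeps /bin/bash running in the background) in a loop, with the names and ports above; `start_cluster` and `run` are hypothetical helpers, sudo is omitted, and `DRY_RUN=1` only prints the commands. A shell can be opened later with `docker exec -it slave1 /bin/bash`:

```shell
# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Hypothetical helper: start master plus N slaves from the committed "hadoop" image.
start_cluster() {
  local n="${1:-3}" i
  run docker run -itd -p 50070:50070 -p 19888:19888 -p 8088:8088 \
    -h master --name master hadoop /bin/bash
  for i in $(seq 1 "$n"); do
    run docker run -itd -h "slave$i" --name "slave$i" hadoop /bin/bash
  done
}

DRY_RUN=1
start_cluster 3
```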
    

On each node, run:

source /etc/profile

service sshd start

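Configure /etc/hosts on every node

Look up each container's IP address (on the Docker host):
docker inspect --format='{{.NetworkSettings.IPAddress}}' master

Add one line per node to /etc/hosts (example addresses; use the ones docker inspect reports):
172.42.0.42  master
172.42.0.46  slave1
172.42.0.47  slave2
172.42.0.48  slave3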
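Rather than copying addresses by hand, the entries can be generated on the Docker host from `docker inspect`. A sketch with `hosts_line` and `cluster_hosts` as hypothetical helpers; the output can be appended inside each container, e.g. `cluster_hosts master slave1 slave2 slave3 | docker exec -i slave1 sh -c 'cat >> /etc/hosts'`. A container's address on the default bridge can change after a restart, so the entries may need regenerating:

```shell
# Hypothetical helper: format one /etc/hosts line.
hosts_line() {
  printf '%s  %s\n' "$1" "$2"
}

# Hypothetical helper: look up each container's IP with docker inspect and print hosts lines.
cluster_hosts() {
  local name ip
  for name in "$@"; do
    ip=$(docker inspect --format='{{.NetworkSettings.IPAddress}}' "$name")
    hosts_line "$ip" "$name"
  done
}

# Usage on the Docker host (the containers must be running):
#   cluster_hosts master slave1 slave2 slave3
hosts_line 172.42.0.42 master
```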

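Start Hadoop (on master, from $HADOOP_HOME). Before the very first start, format the NameNode:

./bin/hdfs namenode -format

./sbin/start-all.sh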

Check the running daemons

jps  
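With this layout, jps on master should list NameNode, SecondaryNameNode and ResourceManager, and on each slave DataNode and NodeManager. A sketch that reports which expected daemons are missing from jps output; `missing_daemons` is a hypothetical helper, and the demo feeds it sample text instead of calling jps:

```shell
# Hypothetical helper: print each expected daemon name that does not appear
# as the second column of the given jps output.
missing_daemons() {
  local out="$1" d
  shift
  for d in "$@"; do
    printf '%s\n' "$out" | awk '{print $2}' | grep -qx "$d" || echo "$d"
  done
}

# For real use on master: missing_daemons "$(jps)" NameNode SecondaryNameNode ResourceManager
sample=$(printf '101 NameNode\n102 Jps\n103 ResourceManager')
missing_daemons "$sample" NameNode SecondaryNameNode ResourceManager   # prints SecondaryNameNode
```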

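    Original author: afra
    Original article: https://segmentfault.com/a/1190000011988283
    This article was reposted from the web purely to share knowledge; in case of infringement, please contact the blogger to have it removed.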