Preparation (version compatibility):
Hadoop 2.x is faster and includes features, such as short-circuit reads, which will help improve your HBase random read profile. Hadoop 2.x also includes important bug fixes that will improve your overall HBase experience. HBase 0.98 deprecates use of Hadoop 1.x, and HBase 1.0 will not support Hadoop 1.x.
Use the following legend to interpret this table:
S = supported and tested,
X = not supported,
NT = it should run, but not tested enough.
| HBase-0.92.x | HBase-0.94.x | HBase-0.96.x | HBase-0.98.x[a] | HBase-1.0.x[b] |
Hadoop-0.20.205 | S | X | X | X | X |
Hadoop-0.22.x | S | X | X | X | X |
Hadoop-1.0.0-1.0.2[c] | X | X | X | X | X |
Hadoop-1.0.3+ | S | S | S | X | X |
Hadoop-1.1.x | NT | S | S | X | X |
Hadoop-0.23.x | X | S | NT | X | X |
Hadoop-2.0.x-alpha | X | NT | X | X | X |
Hadoop-2.1.0-beta | X | NT | S | X | X |
Hadoop-2.2.0 | X | NT [d] | S | S | NT |
Hadoop-2.3.x | X | NT | S | S | NT |
Hadoop-2.4.x | X | NT | S | S | S |
Hadoop-2.5.x | X | NT | S | S | S |
For details, see: https://hbase.apache.org/book/configuration.html#hadoop
Hive and Hadoop version compatibility:
6 June, 2014: release 0.13.1 available
This release works with Hadoop 0.20.x, 0.23.x.y, 1.x.y, 2.x.y
21 April, 2014: release 0.13.0 available
This release works with Hadoop 0.20.x, 0.23.x.y, 1.x.y, 2.x.y
15 October, 2013: release 0.12.0 available
This release works with Hadoop 0.20.x, 0.23.x.y, 1.x.y, 2.x.y
15 May, 2013: release 0.11.0 available
This release works with Hadoop 0.20.x, 0.23.x.y, 1.x.y, 2.x.y
March, 2013: HCatalog merges into Hive
For details, see: http://hive.apache.org/downloads.html
Versions used:
Hadoop: hadoop-2.2.0.tar.gz
HBase : hbase-0.98.4-hadoop2-bin.tar.gz
JDK: jdk-7u65-linux-i586.gz
Linux environment: CentOS-6.5-x86_64
Hive: apache-hive-0.13.1-bin.tar.gz
Zookeeper: zookeeper-3.4.6.tar.gz
Five nodes, with roles assigned as follows (Y = runs on this node, - = does not):
Role | IP address | NameNode | DataNode | SecondaryNameNode | ResourceManager | NodeManager | HMaster | HRegionServer | ZooKeeper | Hive |
master | 192.168.1.94 | Y | - | Y | Y | - | - | - | - | - |
slave1 | 192.168.1.105 | - | Y | - | - | Y | Y | Y | - | Y |
slave2 | 192.168.1.95 | - | Y | - | - | Y | - | Y | Y | - |
slave3 | 192.168.1.112 | - | Y | - | - | Y | - | Y | Y | - |
slave4 | 192.168.1.111 | - | Y | - | - | Y | - | Y | Y | - |
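1. Preparing the environment for Hadoop
1.1 On every node, create a user named admin and set its password to "password"
# useradd admin -d /home/admin
Set the password:
# echo "password" | passwd --stdin admin
1.2 On every node, edit /etc/hosts (as root). The configuration files in later sections also refer to nodes as master, slave1 and so on, so those names must resolve as well, for example as aliases on the matching lines.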
192.168.1.94 centos94
192.168.1.105 centos105
192.168.1.95 centos95
192.168.1.112 centos112
192.168.1.111 centos111
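Every host name used in later configuration files (for example master in fs.defaultFS) has to resolve on every node. The following is a minimal sketch of a check, assuming a POSIX shell with awk. The function name missing_hosts and the master alias in the example are illustrative and not part of the original setup.

```shell
#!/bin/sh
# Print every name from the argument list that has no entry in the given
# hosts file. Lines whose first field starts with '#' are skipped.
missing_hosts() {
    hosts_file=$1
    shift
    for name in "$@"; do
        awk -v n="$name" '
            $1 !~ /^#/ {
                for (i = 2; i <= NF; i++) {
                    if ($i ~ /^#/) break    # rest of the line is a comment
                    if ($i == n) found = 1
                }
            }
            END { exit !found }
        ' "$hosts_file" || echo "$name"
    done
}

# Example (the "master" alias is an assumption about this setup):
# missing_hosts /etc/hosts centos94 centos105 centos95 centos112 centos111 master
```

An empty result means every name is present. Any name that is printed still needs a line or an alias in /etc/hosts.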
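1.3 Copy hadoop-2.2.0.tar.gz, hbase-0.98.4-hadoop2-bin.tar.gz and jdk-7u65-linux-i586.gz to /home/admin
Install the JDK (as the admin user)
$ tar -zxvf jdk-7u65-linux-i586.gz
Remove the OpenJDK packages that ship with CentOS
List the installed Java packages:
# rpm -qa | grep java
The output looks like this: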
tzdata-java-2013g-1.el6.noarch
java-1.6.0-openjdk-1.6.0.0-1.66.1.13.0.el6.i686
java-1.7.0-openjdk-1.7.0.45-2.4.3.3.el6.i686
Uninstall:
yum -y remove java java-1.7.0-openjdk-1.7.0.45-2.4.3.3.el6.i686
yum -y remove java java-1.6.0-openjdk-1.6.0.0-1.66.1.13.0.el6.i686
yum -y remove java tzdata-java-2013g-1.el6.noarch
Or:
rpm -e --nodeps java-1.7.0-openjdk-1.7.0.45-2.4.3.3.el6.i686
rpm -e --nodeps java-1.6.0-openjdk-1.6.0.0-1.66.1.13.0.el6.i686
rpm -e --nodeps tzdata-java-2013g-1.el6.noarch
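1.4 Configure environment variables
Edit /etc/profile. JAVA_HOME must be the directory where the JDK was actually unpacked or moved to; use the same path in every file below:
export JAVA_HOME=/usr/local/jdk1.7.0_65
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH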
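1.5 Disable the firewall and SELinux (as root)
service iptables status
service iptables stop
chkconfig iptables off
setenforce 0
vi /etc/selinux/config (set SELINUX=disabled so the change survives a reboot)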
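1.6 Set a static IP address (as root)
vi /etc/sysconfig/network-scripts/ifcfg-eth0
IPADDR=192.168.1.94 (change to match each machine's IP)
GATEWAY=192.168.1.255 (replace with your network's actual gateway; with netmask 255.255.255.0, 192.168.1.255 is the broadcast address and cannot serve as a gateway)
NETMASK=255.255.255.0
vi /etc/sysconfig/network
HOSTNAME=centos94 (change per node; must match the entry in /etc/hosts)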
1.7 Set up passwordless SSH (as the admin user)
(If SSH still asks for a password after these steps are complete, fix the permissions:
chmod 700 .ssh
chmod 600 .ssh/*)
On every node, run:
ssh-keygen -t rsa -P ""
Press Enter at every prompt.
Copy id_rsa.pub from the .ssh directory on centos95/centos105/centos111/centos112 into the .ssh directory on centos94 (run each command on the matching node):
scp id_rsa.pub admin@centos94:/home/admin/.ssh/id_rsa.pub.centos95
scp id_rsa.pub admin@centos94:/home/admin/.ssh/id_rsa.pub.centos105
scp id_rsa.pub admin@centos94:/home/admin/.ssh/id_rsa.pub.centos112
scp id_rsa.pub admin@centos94:/home/admin/.ssh/id_rsa.pub.centos111
On centos94, in the .ssh directory, merge id_rsa.pub, id_rsa.pub.centos95, id_rsa.pub.centos105, id_rsa.pub.centos111 and id_rsa.pub.centos112 into authorized_keys.
Copy the merged authorized_keys to the .ssh directory on each slave node:
scp authorized_keys admin@centos95:/home/admin/.ssh/
scp authorized_keys admin@centos105:/home/admin/.ssh/
scp authorized_keys admin@centos111:/home/admin/.ssh/
scp authorized_keys admin@centos112:/home/admin/.ssh/
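The merge step on centos94 is described above without a command. One way to do it is sketched here, assuming a POSIX shell; the function name merge_pubkeys is illustrative. Note that it rebuilds authorized_keys from the collected id_rsa.pub* files, so any key that exists only in an older authorized_keys is dropped.

```shell
#!/bin/sh
# Merge the local public key and every collected id_rsa.pub.<host> copy
# in the given .ssh directory into authorized_keys, then tighten permissions.
merge_pubkeys() {
    ssh_dir=$1
    # The glob id_rsa.pub* matches id_rsa.pub and id_rsa.pub.centos95, etc.
    # sort -u drops duplicate keys if the function is run more than once.
    cat "$ssh_dir"/id_rsa.pub* | sort -u > "$ssh_dir/authorized_keys.new"
    mv "$ssh_dir/authorized_keys.new" "$ssh_dir/authorized_keys"
    chmod 700 "$ssh_dir"
    chmod 600 "$ssh_dir/authorized_keys"
}

# Example (on centos94, as admin):
# merge_pubkeys /home/admin/.ssh
```

After the merged file has been copied to the slaves, ssh admin@centos95 from centos94 should log in without a password prompt.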
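2. Installing and configuring Hadoop (as the admin user)
(If startup still fails after a successful installation, check which user owns the unpacked hadoop and jdk directories. If the owner is not admin, change it: chown -R admin:admin hadoop-2.2.0)
2.1 Unpack hadoop-2.2.0.tar.gz into /home/admin
Edit /etc/profile: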
export HADOOP_HOME=/home/admin/hadoop-2.2.0
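export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
The configuration files live in $HADOOP_HOME/etc/hadoop. The core-site.xml, yarn-site.xml, hdfs-site.xml and mapred-site.xml files there are empty. You can copy the default files from $HADOOP_HOME/share/doc/hadoop into $HADOOP_HOME/etc/hadoop and edit them from there.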
cd $HADOOP_HOME
cp ./share/doc/hadoop/hadoop-project-dist/hadoop-common/core-default.xml ./etc/hadoop/core-site.xml
cp ./share/doc/hadoop/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml ./etc/hadoop/hdfs-site.xml
cp ./share/doc/hadoop/hadoop-yarn/hadoop-yarn-common/yarn-default.xml ./etc/hadoop/yarn-site.xml
cp ./share/doc/hadoop/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml ./etc/hadoop/mapred-site.xml
Next, adjust the default files as described below; otherwise the cluster will not start.
2.2 Configure hadoop-env.sh
vi /home/admin/hadoop-2.2.0/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/local/jdk1.7.0_65
2.3 Configure core-site.xml
vi /home/admin/hadoop-2.2.0/etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/admin/hadoop-2.2.0/tmp</value>
</property>
</configuration>
2.4 Configure hdfs-site.xml
vi /home/admin/hadoop-2.2.0/etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/admin/hadoop-2.2.0/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/admin/hadoop-2.2.0/dfs/data</value>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>master:50070</value>
<description>The address and the base port where the dfs namenode web ui will listen on.</description>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>master:50090</value>
<description>The secondary namenode http server address and port.</description>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
<description>(Number of copies kept of each block.) Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
</property>
</configuration>
2.5 Configure yarn-site.xml
vi /home/admin/hadoop-2.2.0/etc/hadoop/yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8031</value>
<!-- This property must be set; otherwise the NodeManager nodes are not shown on the 8088 web page. -->
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:8088</value>
</property>
</configuration>
2.6 Configure mapred-site.xml
cp /home/admin/hadoop-2.2.0/etc/hadoop/mapred-site.xml.template /home/admin/hadoop-2.2.0/etc/hadoop/mapred-site.xml
vi /home/admin/hadoop-2.2.0/etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
</property>
</configuration>
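3. Starting the Hadoop cluster
(If the NameNode is ever formatted again, first delete the tmp and dfs directories on every node; otherwise the DataNodes fail to start because the cluster ID they stored no longer matches the NameNode's. Ideally, format the cluster only once.)
3.1 Format the NameNode
bin/hdfs namenode -format
3.2 Start the daemons: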
sbin/start-dfs.sh
sbin/start-yarn.sh
3.3 To view the history of MapReduce jobs, start the jobhistoryserver process separately:
sbin/mr-jobhistory-daemon.sh start historyserver
Note: on later starts, do not run the format command again; just start the daemons.
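To confirm that the expected daemons came up on each node, one option is to compare the output of jps (the JDK process lister, which prints "<pid> <ClassName>" per line) against the role table. This is a minimal sketch assuming a POSIX shell with awk; the function name missing_daemons is illustrative, and the guide itself does not prescribe this check.

```shell
#!/bin/sh
# Read `jps` output ("<pid> <ClassName>" per line) on stdin and print each
# expected daemon name that does not appear in it.
missing_daemons() {
    jps_output=$(cat)
    for daemon in "$@"; do
        printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }' ||
            echo "$daemon"
    done
}

# On master:  jps | missing_daemons NameNode SecondaryNameNode ResourceManager
# On a slave: jps | missing_daemons DataNode NodeManager
```

No output means every listed daemon is running. For any name that is printed, check that daemon's log under $HADOOP_HOME/logs.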
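4. Installing Hive
Hive stores its metadata in an RDBMS. There are three ways to connect to the database:
1. Embedded mode: metadata is kept in the embedded Derby database. This is generally used for unit tests and allows only one session at a time.
2. Multi-user mode: MySQL is installed locally and the metadata is stored in it.
3. Remote mode: the metadata is stored in a remote MySQL database.
4.1 The HWI feature depends on Ant, so install Ant before installing Hive
Download apache-ant-1.9.4-bin.tar.gz
Unpack apache-ant-1.9.4-bin.tar.gz into /opt and rename the directory to ant
Add ANT_HOME and PATH to /etc/profile and reload the file: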
vi /etc/profile
export ANT_HOME=/opt/ant
export PATH=$ANT_HOME/bin:$PATH
source /etc/profile
Check that Ant is installed:
ant -version
4.2 Installing MySQL
Download the MySQL package:
MySQL-server-5.6.20-1.el7.x86_64.rpm
rpm -ivh MySQL-server-5.6.20-1.el7.x86_64.rpm
Initialize MySQL and set the password:
# /usr/bin/mysql_install_db
# service mysql start
# cat /root/.mysql_secret    # show the generated root password
# The random password set for the root user at Wed Dec 11 23:32:50 2014 (local time): qKTaFZnl
# mysql -uroot -pqKTaFZnl
mysql> SET PASSWORD = PASSWORD('123456');    # set the password to 123456
mysql> exit
# mysql -uroot -p123456
Create a new user:
mysql> create user 'admin'@'%' identified by 'password';
Grant privileges to the new user admin so that it can log in both remotely and locally.
Note: the part to the left of @ is the user name. The part to the right is a domain name, an IP address or %, and specifies which hosts may connect to MySQL; % allows any external address.
mysql> grant all privileges on *.* to 'admin'@'%' identified by 'password';
mysql> select user,host,password from mysql.user;
Enable start at boot
# chkconfig mysql on
# chkconfig --list | grep mysql
Check MySQL's default storage engine
mysql> show engines;
+------------+---------+------------------------------------------------------------+--------------+------+------------+
| Engine     | Support | Comment                                                    | Transactions | XA   | Savepoints |
+------------+---------+------------------------------------------------------------+--------------+------+------------+
| MRG_MYISAM | YES     | Collection of identical MyISAM tables                      | NO           | NO   | NO         |
| CSV        | YES     | CSV storage engine                                         | NO           | NO   | NO         |
| MyISAM     | DEFAULT | Default engine as of MySQL 3.23 with great performance     | NO           | NO   | NO         |
| InnoDB     | YES     | Supports transactions, row-level locking, and foreign keys | YES          | YES  | YES        |
| MEMORY     | YES     | Hash based, stored in memory, useful for temporary tables  | NO           | NO   | NO         |
+------------+---------+------------------------------------------------------------+--------------+------+------------+
5 rows in set (0.00 sec)
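The output shows that MySQL's default engine is MyISAM, which does not support transactions.
You can also check it this way:
mysql> show variables like 'storage_engine';
+----------------+--------+
| Variable_name  | Value  |
+----------------+--------+
| storage_engine | MyISAM |
+----------------+--------+
1 row in set (0.00 sec)
Change MySQL's default engine to InnoDB
Stop MySQL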
mysql> exit;
# service mysqld stop
Edit /etc/my.cnf
Add the following under [mysqld]:
default-storage-engine=InnoDB
After the change, my.cnf reads:
[root@bogon etc]# more my.cnf
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
user=mysql
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
default-storage-engine=InnoDB
[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
Start MySQL
# service mysqld start
Starting mysqld: [ OK ]
Check MySQL's default storage engine again
[root@bogon etc]# mysql -u root -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.1.73 Source distribution
Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> show variables like 'storage_engine';
+----------------+--------+
| Variable_name  | Value  |
+----------------+--------+
| storage_engine | InnoDB |
+----------------+--------+
1 row in set (0.00 sec)
Copy the MySQL JDBC driver mysql-connector-java-5.1.12.jar into Hive's lib directory.
Copy tools.jar from the JDK directory into Hive's lib directory.
4.3 Hive version used
apache-hive-0.13.1-bin.tar.gz (for Hive and Hadoop version compatibility, see http://hive.apache.org/downloads.html )
4.4 Download Hive and place it in /home/admin
tar -zxvf apache-hive-0.13.1-bin.tar.gz
4.5 Set environment variables
vi /etc/profile
export HIVE_HOME=/home/admin/apache-hive-0.13.1-bin
export PATH=$HIVE_HOME/bin:$PATH
Configure Hive
(1) Edit /home/admin/apache-hive-0.13.1-bin/conf/hive-env.sh
export JAVA_HOME=/home/admin/jdk1.7.0_65
export HIVE_HOME=/home/admin/apache-hive-0.13.1-bin
export HADOOP_HOME=/home/admin/hadoop-2.2.0
(2) Create hive-site.xml from hive-default.xml.template
cp /home/admin/apache-hive-0.13.1-bin/conf/hive-default.xml.template /home/admin/apache-hive-0.13.1-bin/conf/hive-site.xml
(3) Configure hive-site.xml. The main settings are:
hive.metastore.warehouse.dir: data directory (on HDFS)
hive.exec.scratchdir: temporary file directory (on HDFS)
The default for hive.metastore.warehouse.dir is /user/hive/warehouse
The default for hive.exec.scratchdir is /tmp/hive-${user.name}
cd /home/admin/apache-hive-0.13.1-bin/conf
cp hive-default.xml.template ./hive-site.xml
cp hive-env.sh.template hive-env.sh
cp hive-exec-log4j.properties.template ./hive-exec-log4j.properties
cp hive-log4j.properties.template ./hive-log4j.properties
The finished hive-site.xml:
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://192.168.1.105:3306/hive?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>admin</value>
<description>username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>password</value>
<description>password to use against metastore database</description>
</property>
<property>
<name>hive.metastore.local</name>
<value>true</value>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>hdfs://master:9000/hive/warehouse</value>
</property>
<property>
<name>hive.hwi.listen.host</name>
<value>0.0.0.0</value>
</property>
<property>
<name>hive.hwi.listen.port</name>
<value>9999</value>
</property>
<property>
<name>hive.hwi.war.file</name>
<value>/lib/hive-hwi-0.12.0.war</value>
<!-- Copy hive-hwi-0.12.0.war from the 0.12.0 release into the lib directory of 0.13.1, or build the 0.13.1 war yourself. -->
</property>
</configuration>
4.6 Start Hive
Run /home/admin/apache-hive-0.13.1-bin/bin/hive
4.7 Start the Hive web interface (HWI)
Run /home/admin/apache-hive-0.13.1-bin/bin/hive --service hwi
5. Installing ZooKeeper
Unpack zookeeper-3.4.6.tar.gz into /home/admin/
Add ZOOKEEPER_HOME and PATH to /etc/profile
Configure zoo.cfg
Copy zoo_sample.cfg in the conf directory to zoo.cfg
cp zoo_sample.cfg ./zoo.cfg
vi zoo.cfg
dataDir=/home/admin/zookeeper-3.4.6/data
server.1=centos95:2888:3888
server.2=centos111:2888:3888
server.3=centos112:2888:3888
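Then distribute the configured ZooKeeper directory to /home/admin/zookeeper-3.4.6 on server.1/2/3. On each node, create a file named myid in dataDir (/home/admin/zookeeper-3.4.6/data) that contains that node's number, that is, the number after the '.' in server.1/2/3. The number must be between 1 and 255.
echo 1 > myid (run in the dataDir directory; write 2 and 3 on the other two nodes)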
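Since the myid value must match the server.N line for the same host, it can be derived from zoo.cfg instead of typed by hand on each node. This is a minimal sketch assuming a POSIX shell with sed; the function name myid_for_host is illustrative, and it assumes the host name passed in is spelled exactly as in zoo.cfg.

```shell
#!/bin/sh
# Print the id N from the "server.N=<host>:..." line of a zoo.cfg file
# that names the given host. Prints nothing if the host is not listed.
myid_for_host() {
    cfg=$1
    host=$2
    sed -n "s/^server\.\([0-9][0-9]*\)=$host:.*/\1/p" "$cfg"
}

# Example, run on each ZooKeeper node (assumes `hostname` prints the name
# used in zoo.cfg):
# myid_for_host ~/zookeeper-3.4.6/conf/zoo.cfg "$(hostname)" > ~/zookeeper-3.4.6/data/myid
```

With the zoo.cfg shown above, centos95 gets 1, centos111 gets 2 and centos112 gets 3.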
Start ZooKeeper
Start ZooKeeper on each of server.1/2/3:
$ ~/zookeeper-3.4.6/bin/zkServer.sh start
Test that all three nodes accept connections:
$ ~/zookeeper-3.4.6/bin/zkCli.sh -server centos95:2181
$ ~/zookeeper-3.4.6/bin/zkCli.sh -server centos111:2181
$ ~/zookeeper-3.4.6/bin/zkCli.sh -server centos112:2181
6. Installing HBase
Edit /etc/profile:
HBASE_HOME=/home/admin/hbase-0.98.4-hadoop2
PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin
export JAVA_HOME HADOOP_HOME HBASE_HOME PATH
Unpack the package into /home/admin, then edit conf/hbase-env.sh and add the following at the top:
export JAVA_HOME=/home/admin/jdk1.7.0_65
export HBASE_LOG_DIR=/home/admin/hbase-0.98.4-hadoop2/logs
export HBASE_CLASSPATH=/home/admin/hbase-0.98.4-hadoop2/conf:/home/admin/hadoop-2.2.0/etc/hadoop
export HBASE_MANAGES_ZK=false
Configure ${HBASE_HOME}/conf/hbase-site.xml
<configuration>
<property>
<name>hbase.tmp.dir</name>
<value>/home/admin/var/hbase</value>
</property>
<property >
<name>hbase.rootdir</name>
<value>hdfs://master:9000/hbase</value>
</property>
<property >
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>centos95,centos111,centos112</value>
</property>
<property>
<name>hbase.master.maxclockskew</name>
<value>180000</value>
</property>
</configuration>
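hbase.rootdir has to point at the same NameNode address (host and port) that fs.defaultFS declares in core-site.xml, so it is worth reading both values back and comparing them. This is a minimal sketch assuming a POSIX shell with awk; the function name get_property is illustrative, and the parsing assumes each <name> and <value> sits on a single line, as in the files in this guide.

```shell
#!/bin/sh
# Print the <value> of the named property from a Hadoop-style XML file.
# Assumes <name>...</name> and <value>...</value> each fit on one line.
get_property() {
    awk -v want="$2" '
        /<name>/  { n = $0; sub(/.*<name>/, "", n);  sub(/<\/name>.*/, "", n) }
        /<value>/ { v = $0; sub(/.*<value>/, "", v); sub(/<\/value>.*/, "", v)
                    if (n == want) { print v; exit } }
    ' "$1"
}

# Example:
# get_property "$HADOOP_HOME/etc/hadoop/core-site.xml" fs.defaultFS
# get_property "$HBASE_HOME/conf/hbase-site.xml" hbase.rootdir
```

The second value should begin with the first one, followed by the HBase directory path.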
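Take care with hbase.master.info.bindAddress. Its default value is 0.0.0.0. If you change it to the host name or IP of one particular node and then run start-hbase.sh on a different node, HBase fails to start: start-hbase.sh starts the master service on the node where it is run, and that node cannot bind to an address that belongs to another host, so HMaster does not come up.
Configure the list of slave nodes
The whole cluster is normally started with the start-hbase.sh script. Reading the script shows that it starts the master, ZooKeeper and the regionservers on the target nodes according to the configuration files (with HBASE_MANAGES_ZK=false, as set above, HBase leaves ZooKeeper alone). The list of regionservers is configured in ${HBASE_HOME}/conf/regionservers, one node per line, so add the host name or IP of every regionserver to this file.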
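The regionservers file is plain text with one host per line, so it can be written in a single step. This is a minimal sketch assuming a POSIX shell; the function name write_regionservers is illustrative, and the host list in the example is taken from the HRegionServer column of the role table at the top of this guide.

```shell
#!/bin/sh
# Write one host name per line to the given regionservers file,
# replacing whatever the file contained before.
write_regionservers() {
    out=$1
    shift
    printf '%s\n' "$@" > "$out"
}

# Example:
# write_regionservers "$HBASE_HOME/conf/regionservers" centos105 centos95 centos112 centos111
```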
Start the HBase cluster
Run:
start-hbase.sh
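This command can be run on any node, but note that the node on which it is run automatically becomes the master (unlike ZooKeeper, HBase's configuration files offer no option for designating the master). If you need additional backup masters, start a master separately on other nodes with hbase-daemon.sh start master.
The commands for starting individual services are:
Start a master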
hbase-daemon.sh start master
Start a regionserver
hbase-daemon.sh start regionserver
Once all services have started, open:
http://master:60010
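and check the status of each node. If every node is reachable, HBase is working. If the page cannot be reached or nodes are missing, look through the logs to find the cause.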