Getting Started with Hive

Introduction

Hive is a client-side tool that translates HQL (an SQL-like language) statements into MapReduce jobs, runs them, and returns the results.

The underlying idea is to store the parsing rules (the schema) for files of a given format in the Hadoop file system in MySQL (or another database). With that metadata, Hive knows how to interpret the files in the distributed file system and can query them like database tables.
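
To make this concrete, here is a minimal sketch of what using Hive looks like; the table name access_log and its columns are hypothetical, not part of the original setup:

# A sample HQL query run from the shell; Hive compiles the GROUP BY
# into a MapReduce job (the map phase emits (ip, 1) pairs, the reduce
# phase sums the counts per ip).
/opt/modules/cdh/hive-0.13.1-cdh5.3.6/bin/hive -e "SELECT ip, COUNT(*) AS hits FROM access_log GROUP BY ip;"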

Environment preparation

1. Three CentOS 6.5 machines

Disable the firewall
Install the JDK
Configure hosts (zk1, zk2, zk3)
Set up passwordless SSH, including from each machine to itself (see the sketch below)
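
A minimal sketch of the passwordless SSH step, run on each of the three machines:

# Generate a key pair (accept the defaults), then push the public key
# to every node, including the local machine itself.
ssh-keygen -t rsa
for host in zk1 zk2 zk3; do ssh-copy-id root@$host; done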

2. One MySQL machine (hostname mysql)

Allow remote connections
Grant database privileges (see the sketch below)
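
A hedged sketch of these two steps, assuming the root account and the password 123123 that hive-site.xml uses later on:

# Run on the MySQL host: allow root to connect from any machine and
# grant it full privileges, then reload the grant tables.
mysql -uroot -p -e "GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY '123123'; FLUSH PRIVILEGES;"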

Installing and running Hadoop

1. Configure Hadoop

Extract

mkdir -p /opt/modules/cdh/
tar -zxvf hadoop-2.5.0-cdh5.3.6.tar.gz -C /opt/modules/cdh
cd /opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/etc/hadoop

Edit the configuration files

  • Strip the .template suffix from core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml, hadoop-env.sh, yarn-env.sh and mapred-env.sh (a rename sketch follows the config listings below)
  • Add the JAVA_HOME variable to hadoop-env.sh, yarn-env.sh and mapred-env.sh
  • core-site.xml
<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://zk1:8020</value>
        </property>

        <property>
                <name>hadoop.tmp.dir</name>
                <value>/opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/data</value>
        </property>
</configuration>
  • hdfs-site.xml
<configuration>
        <!-- Replication factor -->
        <property>
                <name>dfs.replication</name>
                <value>3</value>
        </property>

        <!-- Disable permission checking -->
        <property>
                <name>dfs.permissions.enabled</name>
                <value>false</value>
        </property>

        <property>
                <name>dfs.namenode.secondary.http-address</name>
                <value>zk3:50090</value>
        </property>

        <property>
                <name>dfs.namenode.http-address</name>
                <value>zk1:50070</value>
        </property>

        <property>
                <name>dfs.webhdfs.enabled</name>
                <value>true</value>
        </property>
</configuration>
  • yarn-site.xml
<configuration>
        <!-- Site specific YARN configuration properties -->
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>

        <property>
                <name>yarn.resourcemanager.hostname</name>
                <value>zk2</value>
        </property>

        <property>
                <name>yarn.log-aggregation-enable</name>
                <value>true</value>
        </property>

        <property>
                <name>yarn.log-aggregation.retain-seconds</name>
                <value>86400</value>
        </property>

        <!-- Job history server -->
        <property>
                <name>yarn.log.server.url</name>
                <value>http://zk1:19888/jobhistory/logs/</value>
        </property>
</configuration>
  • mapred-site.xml
<configuration>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>

        <property>
                <name>mapreduce.jobhistory.address</name>
                <value>zk1:10020</value>
        </property>

        <property>
                <name>mapreduce.jobhistory.webapp.address</name>
                <value>zk1:19888</value>
        </property>
</configuration>
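
For reference, a sketch of the .template renaming mentioned in the first bullet above, assuming each listed file ships with that suffix in this distribution:

cd /opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/etc/hadoop
# Strip the .template suffix from every template file in one pass.
for f in *.template; do mv "$f" "${f%.template}"; done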

Create the slaves file (in the etc/hadoop/ directory)

vi slaves

Add:

zk1
zk2
zk3

Once configured, scp the directory to the other two hosts

scp -r /opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/etc/hadoop root@zk2:/opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/etc/
scp -r /opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/etc/hadoop root@zk3:/opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/etc/

Format the NameNode on the namenode machine (zk1)

/opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/bin/hdfs namenode -format

2. Start Hadoop

Start the namenode (zk1)

/opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/sbin/hadoop-daemon.sh start namenode

Start the secondarynamenode (zk3)

/opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/sbin/hadoop-daemon.sh start secondarynamenode

Start the datanodes (zk1, zk2, zk3)

/opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/sbin/hadoop-daemon.sh start datanode

Start the resourcemanager (zk2)

/opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/sbin/yarn-daemon.sh start resourcemanager

Start the nodemanagers (zk1, zk2, zk3)

/opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/sbin/yarn-daemon.sh start nodemanager

Start the historyserver (zk1)

/opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/sbin/mr-jobhistory-daemon.sh start historyserver
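
Before opening the web UI, jps gives a quick per-node sanity check:

# Run on each node; the output should list exactly the daemons started
# there, e.g. NameNode, DataNode, NodeManager and JobHistoryServer on zk1.
jps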

To verify that everything started, open http://zk1:50070 in a browser

[Screenshot: the NameNode web UI confirming the cluster is up]

Installing and running Hive

Install Hive

Extract the tarball

tar -zxvf hive-0.13.1-cdh5.3.6.tar.gz -C /opt/modules/cdh/

Edit the configuration files

  • Rename the configuration files
mv hive-default.xml.template hive-site.xml
mv hive-env.sh.template hive-env.sh
mv hive-log4j.properties.template hive-log4j.properties
  • hive-env.sh
JAVA_HOME=/usr/local/jdk
HADOOP_HOME=/opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/
export HIVE_CONF_DIR=/opt/modules/cdh/hive-0.13.1-cdh5.3.6/conf
  • hive-site.xml (modify the existing properties rather than adding new ones)
        <property>
                <name>javax.jdo.option.ConnectionURL</name>
                <value>jdbc:mysql://mysql:3306/metastore?createDatabaseIfNotExist=true</value>
                <description>JDBC connect string for a JDBC metastore</description>
        </property>

        <property>
                <name>javax.jdo.option.ConnectionDriverName</name>
                <value>com.mysql.jdbc.Driver</value>
                <description>Driver class name for a JDBC metastore</description>
        </property>

        <property>
                <name>javax.jdo.option.ConnectionUserName</name>
                <value>root</value>
                <description>username to use against metastore database</description>
        </property>

        <property>
                <name>javax.jdo.option.ConnectionPassword</name>
                <value>123123</value>
                <description>password to use against metastore database</description>
        </property>
  • hive-log4j.properties
hive.log.dir=/opt/modules/cdh/hive-0.13.1-cdh5.3.6/logs
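
log4j may not create this directory on its own, so it is safer to create it up front:

mkdir -p /opt/modules/cdh/hive-0.13.1-cdh5.3.6/logs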

Copy the JDBC driver into the lib directory

cp -a mysql-connector-java-5.1.27-bin.jar /opt/modules/cdh/hive-0.13.1-cdh5.3.6/lib/

Run Hive

/opt/modules/cdh/hive-0.13.1-cdh5.3.6/bin/hive

Test

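A minimal smoke test, assuming a small local file /tmp/test.txt with one word per line (the table name words is made up for illustration):

/opt/modules/cdh/hive-0.13.1-cdh5.3.6/bin/hive -e "
CREATE TABLE IF NOT EXISTS words (word STRING);
LOAD DATA LOCAL INPATH '/tmp/test.txt' INTO TABLE words;
SELECT word, COUNT(*) FROM words GROUP BY word;"

The GROUP BY query should launch a MapReduce job; if it returns counts, Hive, Hadoop and the MySQL metastore are all wired together correctly.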

    Original author: 烈格黑街
    Original article: https://www.jianshu.com/p/447bb466bcd7
    This article was reposted from the web to share knowledge. If it infringes on any rights, please contact the blogger to have it removed.