Installing and Configuring Hadoop on macOS
Today, needing Hadoop for a cloud computing lab (and having a strong interest in cloud computing anyway), I installed it on my Mac.
===
First, a brief introduction to Hadoop:
Hadoop is a distributed computing framework developed by the Apache Software Foundation.
It lets users write distributed programs without understanding the low-level details of distribution, harnessing the power of a cluster for high-speed computation and storage.
Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS). HDFS is highly fault-tolerant and designed to run on low-cost hardware; it provides high-throughput access to application data and suits applications with very large data sets. HDFS relaxes some POSIX requirements so that data in the file system can be read as a stream (streaming access).
The two core pieces of the Hadoop framework are HDFS and MapReduce: HDFS provides storage for massive data sets, while MapReduce provides the computation over them.
With that overview done, let's get straight to it.
1. Environment
My Mac's system version:
macOS Sierra, version 10.12.3
2. Install the JDK
Hadoop jobs run as jar files, so a Java environment is required.
Open Terminal and run:
java -version
My terminal printed the following:
java version "1.8.0_101"
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)
This means the JDK is already installed on my machine. If the command reports that Java is missing, download and install a JDK yourself; I won't walk through that here.
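If you do have a JDK but need its install path for the JAVA_HOME settings used later, macOS ships a small helper that prints it (the path in the comment is just what it prints for my JDK):

```shell
# Print the home directory of the default JDK installed on this Mac
/usr/libexec/java_home
# e.g. /Library/Java/JavaVirtualMachines/jdk1.8.0_101.jdk/Contents/Home
```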
3. SSH setup on macOS
macOS ships with ssh, so there is nothing to install. You can verify with the following commands:
➜ hadoop-1.2.1 which ssh
/usr/bin/ssh
➜ hadoop-1.2.1 which sshd
/usr/sbin/sshd
➜ hadoop-1.2.1 which ssh-keygen
/usr/bin/ssh-keygen
➜ hadoop-1.2.1
Now run ssh localhost. You may hit this error:
ssh: connect to host localhost port 22: Connection refused
The cause is that Remote Login is disabled. Enable it under System Preferences -> Sharing -> Remote Login, then run ssh localhost again:
➜ hadoop-1.2.1 ssh localhost
Password:
Last login: Tue Apr 18 09:45:33 2017 from ::1
You will be prompted for your Mac's password.
There is also a way to set up passwordless SSH login. I haven't studied it in depth, but following the usual recipe should work; try it if you're interested.
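For reference, the standard OpenSSH recipe for passwordless login looks like the sketch below (I haven't verified it on this exact setup): generate a key pair with an empty passphrase and authorize it for local logins.

```shell
# Generate an RSA key pair with an empty passphrase
# (skip this if ~/.ssh/id_rsa already exists)
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
# Append the public key to the keys authorized to log in to this machine
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
```

After this, ssh localhost should log in without prompting for a password, which would also spare you the repeated password prompts when starting Hadoop later.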
4. Hadoop 1.2.1
Download this release:
hadoop-1.2.1.tar.gz 06-Nov-2014 21:22 61M
After downloading, I extracted it into my Documents folder.
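Assuming the archive landed in ~/Downloads, extracting it is a single tar command (adjust the paths to your machine):

```shell
# Unpack the Hadoop release into the Documents folder
tar -xzf ~/Downloads/hadoop-1.2.1.tar.gz -C ~/Documents
# This creates ~/Documents/hadoop-1.2.1
```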
5. 设置环境变量
终端输入vim ~/.bash_profile
这里会问你是否编辑,有个安全提示,按E即可编辑。
在这里添加环境变量如下:
# Hadoop
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_101.jdk/Contents/Home
export JRE_HOME=$JAVA_HOME/jre
export HADOOP_HOME=/Users/Apple/Documents/hadoop-1.2.1
export CLASSPATH=$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export HADOOP_HOME_WARN_SUPPRESS=1
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$HADOOP_HOME/bin:$PATH
Here,
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_101.jdk/Contents/Home
export JRE_HOME=$JAVA_HOME/jre
are the Java environment variables,
export HADOOP_HOME=/Users/Apple/Documents/hadoop-1.2.1
configures the Hadoop home directory, and
export HADOOP_HOME_WARN_SUPPRESS=1
suppresses the warning "Warning: $HADOOP_HOME is deprecated".
Once the variables are in place, go back to the terminal and run:
source ~/.bash_profile
to make them take effect.
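To confirm the variables took effect, you can echo them and ask Hadoop for its version (the expected values in the comments assume the paths above):

```shell
echo $JAVA_HOME    # the JDK home set above
echo $HADOOP_HOME  # /Users/Apple/Documents/hadoop-1.2.1
hadoop version     # should report Hadoop 1.2.1
```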
6. Configure hadoop-env.sh
Go into the hadoop-1.2.1 directory you extracted under Documents, enter the conf folder, and run vim hadoop-env.sh. Edit it as follows:
# The java implementation to use. Required.
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_101.jdk/Contents/Home
#export JAVA_HOME=/usr/lib/j2sdk1.5-sun

# Extra Java CLASSPATH elements. Optional.
# export HADOOP_CLASSPATH=

# The maximum amount of heap to use, in MB. Default is 1000.
export HADOOP_HEAPSIZE=2000

# Extra Java runtime options. Empty by default.
export HADOOP_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"
# export HADOOP_OPTS=-server
7. Next, configure core-site.xml in the conf folder
core-site.xml specifies the NameNode's host and port. (A caveat: hadoop.tmp.dir conventionally takes a local filesystem path, not a URI; with the URI value below, Hadoop treats it as a literal directory name, which is why the format log later shows a storage directory called hdfs:/localhost:9000/dfs/name. It works here, but a plain local path is the cleaner choice.)
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>hdfs://localhost:9000</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:8020</value>
</property>
</configuration>
8. Configure hdfs-site.xml
hdfs-site.xml sets the HDFS block replication factor. Since this runs on a single node, the replication factor here is 1:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
9. Configure mapred-site.xml
mapred-site.xml specifies the JobTracker's host and port:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>hdfs://localhost:9001</value>
</property>
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>2</value>
</property>
<property>
<name>mapred.tasktracker.reduce.tasks.maximum</name>
<value>2</value>
</property>
</configuration>
With configuration done, typing hadoop in the terminal should print the usage text:
➜ ~ hadoop
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
namenode -format format the DFS filesystem
secondarynamenode run the DFS secondary namenode
namenode run the DFS namenode
datanode run a DFS datanode
dfsadmin run a DFS admin client
mradmin run a Map-Reduce admin client
fsck run a DFS filesystem checking utility
fs run a generic filesystem user client
balancer run a cluster balancing utility
oiv apply the offline fsimage viewer to an fsimage
fetchdt fetch a delegation token from the NameNode
jobtracker run the MapReduce job Tracker node
pipes run a Pipes job
tasktracker run a MapReduce task Tracker node
historyserver run job history servers as a standalone daemon
job manipulate MapReduce jobs
queue get information regarding JobQueues
version print the version
jar <jar> run a jar file
distcp <srcurl> <desturl> copy file or directories recursively
distcp2 <srcurl> <desturl> DistCp version 2
archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
classpath prints the class path needed to get the
Hadoop jar and the required libraries
daemonlog get/set the log level for each daemon
or
CLASSNAME run the class named CLASSNAME
Most commands print help when invoked w/o parameters.
➜ ~
This shows the Hadoop executable can be found.
Before running anything, format the NameNode with hadoop namenode -format. The output looks like the following:
➜ ~ hadoop namenode -format
17/04/18 11:16:33 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = AppledeMacBook-Air-2.local/172.19.167.21
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.2.1
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG: java = 1.8.0_101
************************************************************/
17/04/18 11:16:33 INFO util.GSet: Computing capacity for map BlocksMap
17/04/18 11:16:33 INFO util.GSet: VM type = 64-bit
17/04/18 11:16:33 INFO util.GSet: 2.0% max memory = 1864368128
17/04/18 11:16:33 INFO util.GSet: capacity = 2^22 = 4194304 entries
17/04/18 11:16:33 INFO util.GSet: recommended=4194304, actual=4194304
17/04/18 11:16:33 INFO namenode.FSNamesystem: fsOwner=Apple
17/04/18 11:16:33 INFO namenode.FSNamesystem: supergroup=supergroup
17/04/18 11:16:33 INFO namenode.FSNamesystem: isPermissionEnabled=true
17/04/18 11:16:33 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
17/04/18 11:16:33 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
17/04/18 11:16:33 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
17/04/18 11:16:33 INFO namenode.NameNode: Caching file names occuring more than 10 times
17/04/18 11:16:34 INFO common.Storage: Image file hdfs:/localhost:9000/dfs/name/current/fsimage of size 111 bytes saved in 0 seconds.
17/04/18 11:16:34 INFO namenode.FSEditLog: closing edit log: position=4, editlog=hdfs:/localhost:9000/dfs/name/current/edits
17/04/18 11:16:34 INFO namenode.FSEditLog: close success: truncate to 4, editlog=hdfs:/localhost:9000/dfs/name/current/edits
17/04/18 11:16:34 INFO common.Storage: Storage directory hdfs:/localhost:9000/dfs/name has been successfully formatted.
17/04/18 11:16:34 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at AppledeMacBook-Air-2.local/172.19.167.21
************************************************************/
This indicates HDFS was formatted successfully.
Now run start-all.sh to launch the daemons. I had to enter my password three times along the way:
➜ ~ start-all.sh
namenode running as process 61005. Stop it first.
Password:
localhost: starting datanode, logging to /Users/Apple/Documents/hadoop-1.2.1/libexec/../logs/hadoop-Apple-datanode-AppledeMacBook-Air-2.local.out
Password:
localhost: secondarynamenode running as process 61265. Stop it first.
starting jobtracker, logging to /Users/Apple/Documents/hadoop-1.2.1/libexec/../logs/hadoop-Apple-jobtracker-AppledeMacBook-Air-2.local.out
Password:
localhost: starting tasktracker, logging to /Users/Apple/Documents/hadoop-1.2.1/libexec/../logs/hadoop-Apple-tasktracker-AppledeMacBook-Air-2.local.out
Startup looks successful. Verify the running daemons with jps:
➜ ~ jps
61265 SecondaryNameNode
94723 Jps
61005 NameNode
➜ ~
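Note that this jps output shows only the NameNode and SecondaryNameNode; the DataNode, JobTracker, and TaskTracker are missing, which matches the "There are no datanodes in the cluster" message on the web UI below. If that happens to you, hadoop dfsadmin -report gives a quick cluster summary, and the reason a daemon died is usually in its log under $HADOOP_HOME/logs:

```shell
# Report HDFS capacity and datanode status; a healthy single-node
# setup should list one live datanode
hadoop dfsadmin -report
# If a daemon is missing from jps, read its log, e.g.:
# less $HADOOP_HOME/logs/hadoop-Apple-datanode-*.log
```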
Open http://localhost:50070 in your browser and you'll see the Hadoop web UI:
NameNode 'localhost:8020'
Started: Tue Apr 18 10:24:19 CST 2017
Version: 1.2.1, r1503152
Compiled: Mon Jul 22 15:23:09 PDT 2013 by mattf
Upgrades: There are no upgrades in progress.
Browse the filesystem
Namenode Logs
Cluster Summary
1 files and directories, 0 blocks = 1 total. Heap Size is 77.5 MB / 1.74 GB (4%)
Configured Capacity : 0 KB
DFS Used : 0 KB
Non DFS Used : 0 KB
DFS Remaining : 0 KB
DFS Used% : 100 %
DFS Remaining% : 0 %
Live Nodes : 0
Dead Nodes : 0
Decommissioning Nodes : 0
Number of Under-Replicated Blocks : 0
There are no datanodes in the cluster
NameNode Storage:
Storage Directory Type State
hdfs:/localhost:9000/dfs/name IMAGE_AND_EDITS Active
This is Apache Hadoop release 1.2.1
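As a final smoke test once the daemons are up (and assuming a live DataNode), you can run the WordCount example that ships with Hadoop; the jar and file names below match the 1.2.1 release layout, but treat this as an illustrative sketch rather than part of my original run:

```shell
# Upload a sample file into HDFS
hadoop fs -mkdir /input
hadoop fs -put $HADOOP_HOME/README.txt /input
# Run the bundled WordCount example
hadoop jar $HADOOP_HOME/hadoop-examples-1.2.1.jar wordcount /input /output
# Inspect the resulting word counts
hadoop fs -cat /output/part-r-00000
```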
That wraps up the Hadoop installation and configuration. If you have any questions or spot mistakes in this article, discussion and corrections are welcome. This post is original work; if you repost it, please credit the source. Thanks!
Contact: 370555337@qq.com