I. Basic HDFS Commands
1. Creating a directory: -mkdir
[jun@master ~]$ hadoop fs -mkdir /test
[jun@master ~]$ hadoop fs -mkdir /test/input
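In Hadoop 2.x, -mkdir also accepts a -p flag that creates any missing parent directories along the way, so the two commands above could be collapsed into one:

hadoop fs -mkdir -p /test/input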
2. Listing files: -ls
[jun@master ~]$ hadoop fs -ls /
Found 1 items
drwxr-xr-x   - jun supergroup          0 2018-07-22 10:31 /test
[jun@master ~]$ hadoop fs -ls /test
Found 1 items
drwxr-xr-x   - jun supergroup          0 2018-07-22 10:31 /test/input
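To walk an entire directory tree in one call rather than level by level, -ls also takes an -R flag for a recursive listing, for example:

hadoop fs -ls -R /test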
3. Uploading files to HDFS
First, create two files, jun.dat and jun.txt, under /home/jun on the local filesystem.
(1) Use -put to copy a file from the local filesystem to the HDFS cluster
[jun@master ~]$ hadoop fs -put /home/jun/jun.dat /test/input/jun.dat
(2) Use -copyFromLocal to copy a file from the local filesystem to the HDFS cluster (the -f flag overwrites the destination if it already exists)
[jun@master ~]$ hadoop fs -copyFromLocal -f /home/jun/jun.txt /test/input/jun.txt
(3) Verify that both copies succeeded
[jun@master ~]$ hadoop fs -ls /test/input
Found 2 items
-rw-r--r--   1 jun supergroup         22 2018-07-22 10:38 /test/input/jun.dat
-rw-r--r--   1 jun supergroup         22 2018-07-22 10:39 /test/input/jun.txt
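-put and -copyFromLocal are near-synonyms: -copyFromLocal simply restricts the source to the local filesystem. Both accept -f to overwrite an existing destination, so re-uploading jun.dat without deleting it first would look like:

hadoop fs -put -f /home/jun/jun.dat /test/input/jun.dat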
4. Downloading files to the local filesystem
(1) Use -get to copy a file from the HDFS cluster to the local filesystem
[jun@master ~]$ hadoop fs -get /test/input/jun.dat /home/jun/jun1.dat
(2) Use -copyToLocal to copy a file from the HDFS cluster to the local filesystem
[jun@master ~]$ hadoop fs -copyToLocal /test/input/jun.txt /home/jun/jun1.txt
(3) Verify that both copies succeeded
[jun@master ~]$ ls -l /home/jun/
total 16
drwxr-xr-x.  2 jun jun   6 Jul 19 15:14 Desktop
drwxr-xr-x.  2 jun jun   6 Jul 19 15:14 Documents
drwxr-xr-x.  2 jun jun   6 Jul 19 15:14 Downloads
drwxr-xr-x. 10 jun jun 161 Jul 21 19:25 hadoop
drwxrwxr-x.  3 jun jun  17 Jul 20 20:07 hadoopdata
-rw-r--r--.  1 jun jun  22 Jul 22 10:43 jun1.dat
-rw-r--r--.  1 jun jun  22 Jul 22 10:44 jun1.txt
-rw-rw-r--.  1 jun jun  22 Jul 22 10:35 jun.dat
-rw-rw-r--.  1 jun jun  22 Jul 22 10:35 jun.txt
drwxr-xr-x.  2 jun jun   6 Jul 19 15:14 Music
drwxr-xr-x.  2 jun jun   6 Jul 19 15:14 Pictures
drwxr-xr-x.  2 jun jun   6 Jul 19 15:14 Public
drwxr-xr-x.  2 jun jun   6 Jul 20 16:43 Resources
drwxr-xr-x.  2 jun jun   6 Jul 19 15:14 Templates
drwxr-xr-x.  2 jun jun   6 Jul 19 15:14 Videos
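As an extra sanity check beyond the listing, the downloaded copies can be compared byte-for-byte against the originals; identical files produce no diff output:

diff /home/jun/jun.dat /home/jun/jun1.dat
diff /home/jun/jun.txt /home/jun/jun1.txt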
5. Viewing files in the HDFS cluster
[jun@master ~]$ hadoop fs -cat /test/input/jun.txt
This is the txt file.
[jun@master ~]$ hadoop fs -text /test/input/jun.txt
This is the txt file.
[jun@master ~]$ hadoop fs -tail /test/input/jun.txt
This is the txt file.
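All three commands print the same thing here because jun.txt is a short plain-text file. They differ in general: -cat streams the raw bytes, -text additionally decodes common formats (such as gzip-compressed files and SequenceFiles) before printing, and -tail shows only the final kilobyte of a file. For a compressed file, -text is the right tool (data.gz below is a hypothetical example):

hadoop fs -text /test/input/data.gz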
6. Deleting HDFS files
[jun@master ~]$ hadoop fs -rm /test/input/jun.txt
Deleted /test/input/jun.txt
[jun@master ~]$ hadoop fs -ls /test/input
Found 1 items
-rw-r--r--   1 jun supergroup         22 2018-07-22 10:38 /test/input/jun.dat
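-rm by itself removes files only; deleting a directory and everything under it requires the -r flag. Note also that when the HDFS trash feature is enabled (fs.trash.interval > 0 in core-site.xml), -rm moves files into the trash rather than deleting them immediately, and -skipTrash bypasses that. For example, to tear down the whole test tree:

hadoop fs -rm -r /test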
7. Commands can also be executed on a slave node
[jun@slave0 ~]$ hadoop fs -ls /test/input
Found 1 items
-rw-r--r--   1 jun supergroup         22 2018-07-22 10:38 /test/input/jun.dat
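This works because HDFS commands can be issued from any node that carries the cluster's client configuration; the fs commands locate the NameNode through fs.defaultFS in core-site.xml. Which cluster a given node will talk to can be checked with:

hdfs getconf -confKey fs.defaultFS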
II. Running a Program on the Hadoop Cluster
The Hadoop distribution ships with a JAR of MapReduce example programs, including a Java program that estimates the value of pi.
Argument breakdown: pi (the name of the example to run), 10 (the number of map tasks), 10 (the number of samples generated per map task).
[jun@master ~]$ hadoop jar /home/jun/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.4.jar pi 10 10
Number of Maps = 10
Samples per Map = 10
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
18/07/22 10:55:07 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.1.100:18040
18/07/22 10:55:08 INFO input.FileInputFormat: Total input files to process : 10
18/07/22 10:55:08 INFO mapreduce.JobSubmitter: number of splits:10
18/07/22 10:55:09 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1532226440522_0001
18/07/22 10:55:10 INFO impl.YarnClientImpl: Submitted application application_1532226440522_0001
18/07/22 10:55:10 INFO mapreduce.Job: The url to track the job: http://master:18088/proxy/application_1532226440522_0001/
18/07/22 10:55:10 INFO mapreduce.Job: Running job: job_1532226440522_0001
18/07/22 10:55:20 INFO mapreduce.Job: Job job_1532226440522_0001 running in uber mode : false
18/07/22 10:55:20 INFO mapreduce.Job: map 0% reduce 0%
18/07/22 10:56:21 INFO mapreduce.Job: map 10% reduce 0%
18/07/22 10:56:22 INFO mapreduce.Job: map 40% reduce 0%
18/07/22 10:56:23 INFO mapreduce.Job: map 50% reduce 0%
18/07/22 10:56:33 INFO mapreduce.Job: map 100% reduce 0%
18/07/22 10:56:34 INFO mapreduce.Job: map 100% reduce 100%
18/07/22 10:56:36 INFO mapreduce.Job: Job job_1532226440522_0001 completed successfully
18/07/22 10:56:36 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=226
        FILE: Number of bytes written=1738836
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=2590
        HDFS: Number of bytes written=215
        HDFS: Number of read operations=43
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=3
    Job Counters
        Launched map tasks=10
        Launched reduce tasks=1
        Data-local map tasks=10
        Total time spent by all maps in occupied slots (ms)=635509
        Total time spent by all reduces in occupied slots (ms)=10427
        Total time spent by all map tasks (ms)=635509
        Total time spent by all reduce tasks (ms)=10427
        Total vcore-milliseconds taken by all map tasks=635509
        Total vcore-milliseconds taken by all reduce tasks=10427
        Total megabyte-milliseconds taken by all map tasks=650761216
        Total megabyte-milliseconds taken by all reduce tasks=10677248
    Map-Reduce Framework
        Map input records=10
        Map output records=20
        Map output bytes=180
        Map output materialized bytes=280
        Input split bytes=1410
        Combine input records=0
        Combine output records=0
        Reduce input groups=2
        Reduce shuffle bytes=280
        Reduce input records=20
        Reduce output records=0
        Spilled Records=40
        Shuffled Maps =10
        Failed Shuffles=0
        Merged Map outputs=10
        GC time elapsed (ms)=59206
        CPU time spent (ms)=54080
        Physical memory (bytes) snapshot=2953310208
        Virtual memory (bytes) snapshot=23216238592
        Total committed heap usage (bytes)=2048393216
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=1180
    File Output Format Counters
        Bytes Written=97
Job Finished in 88.689 seconds
Estimated value of Pi is 3.20000000000000000000
As the last line of the output shows, the job's estimate of pi is approximately 3.2.
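The example estimates pi by a (quasi-)Monte Carlo method: it generates points in a unit square and computes 4 × (points inside the inscribed circle) / (total points). With 10 maps × 10 samples = 100 points, the estimate can only move in steps of 4/100 = 0.04, and 3.2 corresponds to 80 of the 100 points landing inside the circle; the coarse result reflects the tiny sample size, not a problem with the cluster. Raising the map and sample counts should give a much closer estimate, for example:

hadoop jar /home/jun/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.4.jar pi 16 100000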