【Hadoop】Adding the snappy compression library to a Hadoop cluster

  1. Check whether the Hadoop cluster supports the snappy library (the output below shows that snappy is not installed):
$ hadoop checknative
16/12/06 15:08:39 WARN bzip2.Bzip2Factory: Failed to load/initialize native-bzip2 library system-native, will use pure-Java version
16/12/06 15:08:39 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop:  true /Users/hadoop/hadoop-2.7.2/lib/native/libhadoop.dylib
zlib:    true /usr/lib/libz.1.dylib
snappy:  false
lz4:     true revision:99
bzip2:   false 
openssl: false build does not support openssl.
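To run this check from a script, for example on every node, you can filter the output for the snappy line. The sketch below is a minimal bash example. It assumes the `snappy:  true|false` line format shown above (as printed by Hadoop 2.7.2). The function name `snappy_available` is hypothetical.

```shell
# snappy_available (hypothetical helper): reads `hadoop checknative` output
# on stdin and succeeds only if the "snappy:" line reports true.
snappy_available() {
  grep -Eq '^snappy:[[:space:]]+true'
}

# Usage (log lines are merged in; the anchored pattern ignores them):
#   hadoop checknative 2>&1 | snappy_available && echo "snappy OK"
```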
  2. Install snappy
  • Download and build snappy:
$ wget https://github.com/google/snappy/releases/download/1.1.3/snappy-1.1.3.tar.gz
$ tar xvfz snappy-1.1.3.tar.gz
$ cd snappy-1.1.3/
$ ./configure
$ make
$ sudo make install
  • After installation, the snappy library files are in /usr/local/lib; list them with:
    $ ls -lh /usr/local/lib |grep snappy
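The `ls | grep` check above gives no exit status, so it is awkward to use in automation. The bash sketch below scripts the same check so it can run unattended on each node. It assumes the usual shared-library names (`libsnappy.so*` on Linux, `libsnappy*.dylib` on macOS). The helper name `has_snappy_lib` is hypothetical.

```shell
# has_snappy_lib [DIR] (hypothetical helper): prints every shared snappy
# library found in DIR (default /usr/local/lib) and succeeds if at least one exists.
has_snappy_lib() {
  local dir="${1:-/usr/local/lib}" found=1 f
  for f in "$dir"/libsnappy.so* "$dir"/libsnappy*.dylib; do
    [ -e "$f" ] || continue   # skip glob patterns that matched nothing
    printf '%s\n' "$f"
    found=0
  done
  return "$found"
}

# Usage:
#   has_snappy_lib /usr/local/lib || echo "snappy is not installed"
```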
  3. Rebuild Hadoop's native libraries, then replace the native libraries on every worker machine with the newly built ones.
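Hadoop's BUILDING.txt documents the native build as `mvn package -Pdist,native -DskipTests -Dtar`. It also lists snappy options: `-Drequire.snappy` fails the build if libsnappy is not found, and `-Dsnappy.prefix` points at a non-standard install location. The sketch below assumes the Hadoop 2.7.2 source tree and snappy installed under /usr/local as above. The bash helper `hadoop_native_mvn_args` is hypothetical. It only prints the arguments, so you can inspect them before running Maven.

```shell
# hadoop_native_mvn_args [SNAPPY_PREFIX] (hypothetical helper): prints, one per
# line, Maven arguments for a native build that must link against snappy.
# SNAPPY_PREFIX defaults to /usr/local (where `make install` put snappy).
hadoop_native_mvn_args() {
  local prefix="${1:-/usr/local}"
  printf '%s\n' \
    package \
    -Pdist,native \
    -DskipTests \
    -Dtar \
    -Drequire.snappy \
    "-Dsnappy.prefix=${prefix}"
}

# Usage, from the root of the Hadoop source tree (word splitting is intended):
#   mvn $(hadoop_native_mvn_args /usr/local)
```

The rebuilt libraries typically land under `hadoop-dist/target/hadoop-2.7.2/lib/native/`, and from there they are copied over `$HADOOP_HOME/lib/native/` on each node. Hadoop loads libsnappy at runtime, so every node also needs libsnappy available. You can install it as above, or bundle it with `-Dbundle.snappy`, which requires `-Dsnappy.lib`.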
  4. Check again whether Hadoop now supports snappy:
$ hadoop checknative -a
  5. Compress job output files with snappy:
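import org.apache.hadoop.io.SequenceFile.CompressionType;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

  // job is the org.apache.hadoop.mapreduce.Job configured in the driver
  job.setOutputFormatClass(SequenceFileOutputFormat.class);
  // Enable output compression
  SequenceFileOutputFormat.setCompressOutput(job, true);
  // Use Snappy; other codecs such as Gzip can be chosen instead
  SequenceFileOutputFormat.setOutputCompressorClass(job, SnappyCodec.class);
  // SequenceFileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
  // Compress in blocks rather than per record, which is more efficient
  SequenceFileOutputFormat.setOutputCompressionType(job, CompressionType.BLOCK);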
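To confirm that a job's output was actually written with Snappy, you can inspect the SequenceFile header. The header starts with the magic bytes `SEQ` and, for compressed files, records the codec class name. The bash sketch below is minimal. The function name and the output path in the usage comment are hypothetical.

```shell
# seqfile_uses_snappy (hypothetical helper): reads the start of a SequenceFile
# on stdin and succeeds if the header has the "SEQ" magic and names SnappyCodec.
seqfile_uses_snappy() {
  local header
  # Keep only the first 512 bytes and drop NUL bytes so bash can hold them.
  header="$(head -c 512 | tr -d '\000')"
  case "$header" in
    SEQ*) ;;              # looks like a SequenceFile
    *) return 1 ;;        # not a SequenceFile at all
  esac
  printf '%s' "$header" | grep -aqF 'org.apache.hadoop.io.compress.SnappyCodec'
}

# Usage (hypothetical output path):
#   hadoop fs -cat /user/hadoop/output/part-r-00000 | seqfile_uses_snappy && echo "snappy"
```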
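    Original author: fanlehai
    Original article: https://www.jianshu.com/p/af277e25ad23
    This article is reposted from the web solely to share knowledge; in case of infringement, please contact the blogger to have it removed.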