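- Check whether the Hadoop cluster supports the snappy library (the output below reports snappy: false, so native snappy support is not available yet):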
$ hadoop checknative
16/12/06 15:08:39 WARN bzip2.Bzip2Factory: Failed to load/initialize native-bzip2 library system-native, will use pure-Java version
16/12/06 15:08:39 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop: true /Users/hadoop/hadoop-2.7.2/lib/native/libhadoop.dylib
zlib: true /usr/lib/libz.1.dylib
snappy: false
lz4: true revision:99
bzip2: false
openssl: false build does not support openssl.
- Install snappy:
$ wget https://github.com/google/snappy/releases/download/1.1.3/snappy-1.1.3.tar.gz
$ tar xvfz snappy-1.1.3.tar.gz
$ cd snappy-1.1.3/
$ ./configure
$ make
$ sudo make install
- After installation the snappy libraries are in /usr/local/lib. List the snappy library files:
$ ls -lh /usr/local/lib |grep snappy
- Recompile Hadoop's native lib, then overwrite the lib directory on each working machine with the newly built lib.
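The rebuild step above can be sketched roughly as follows. This is a minimal sketch, not the exact procedure used in these notes: the source tree location (SRC_DIR), the function names, and the assumption that the build prerequisites are already installed are all hypothetical. The Maven options (-Pdist,native, -Drequire.snappy, -Dsnappy.lib) are the ones described in Hadoop's BUILDING.txt.

```shell
# Assumptions (not from the original notes): SRC_DIR is an unpacked Hadoop 2.7.2
# source tree, HADOOP_HOME is the running installation, and the native build
# prerequisites (JDK, Maven, CMake, protobuf 2.5.0) are already installed.
SRC_DIR="${SRC_DIR:-$HOME/hadoop-2.7.2-src}"
HADOOP_HOME="${HADOOP_HOME:-/Users/hadoop/hadoop-2.7.2}"
SNAPPY_LIB="${SNAPPY_LIB:-/usr/local/lib}"

rebuild_native() {
  # -Pdist,native builds the native libraries; -Drequire.snappy makes the build
  # fail if snappy cannot be found; -Dsnappy.lib points at the installed library.
  (cd "$SRC_DIR" && mvn package -Pdist,native -DskipTests -Dtar \
      -Drequire.snappy -Dsnappy.lib="$SNAPPY_LIB")
}

install_native() {
  # Overwrite the installation's native libs with the freshly built ones.
  cp -R "$SRC_DIR"/hadoop-dist/target/hadoop-2.7.2/lib/native/. "$HADOOP_HOME/lib/native/"
}

# Usage: rebuild_native && install_native
```

Repeat install_native (or copy the same directory) on every machine that runs Hadoop tasks, since each node loads its own native libraries.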
- Check whether Hadoop now supports snappy:
$ hadoop checknative -a
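Because checknative -a can exit with a non-zero status when any library is missing (bzip2 and openssl were also unavailable in the first check), it can help to look at the snappy line alone. The helper below is a hypothetical convenience function, not part of Hadoop; it only parses the "snappy:" line of the checknative report.

```shell
# Reads `hadoop checknative` output on stdin and prints the value that follows
# "snappy:" (true or false), or "unknown" if no such line is present.
snappy_status() {
  awk '$1 == "snappy:" { print $2; found = 1 } END { if (!found) print "unknown" }'
}

# Usage: hadoop checknative -a | snappy_status
```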
- Use snappy to compress the job output:
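// Assumes job is an org.apache.hadoop.mapreduce.Job (new MapReduce API)
import org.apache.hadoop.io.SequenceFile.CompressionType;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

job.setOutputFormatClass(SequenceFileOutputFormat.class);
// Enable output compression
SequenceFileOutputFormat.setCompressOutput(job, true);
// Use the Snappy codec; other codecs such as Gzip can be used instead
SequenceFileOutputFormat.setOutputCompressorClass(job, SnappyCodec.class);
// SequenceFileOutputFormat.setOutputCompressorClass(job,
// GzipCodec.class);
// Compress whole blocks of records rather than each record separately, which is more efficient
SequenceFileOutputFormat.setOutputCompressionType(job, CompressionType.BLOCK);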
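The same three settings can also be passed on the command line instead of being hard-coded. This is a sketch under assumptions: the function name and the jar, class, and path names in the usage line are hypothetical, and the -D options only take effect if the driver parses generic options (for example by running through ToolRunner). The property names are the Hadoop 2 equivalents of the calls above.

```shell
# $1 = job jar, $2 = driver class, $3 = input path, $4 = output path
run_with_snappy() {
  hadoop jar "$1" "$2" \
    -Dmapreduce.output.fileoutputformat.compress=true \
    -Dmapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.SnappyCodec \
    -Dmapreduce.output.fileoutputformat.compress.type=BLOCK \
    "$3" "$4"
}

# Usage (hypothetical names): run_with_snappy my-job.jar com.example.MyJob /input /output
# To inspect a compressed SequenceFile afterwards: hadoop fs -text /output/part-r-00000
```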