1. Background
The exception stack traces from the task log are as follows:
2015-12-23 10:44:45,289 INFO [main] org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 71 segments left of total size: 43979223288 bytes
2015-12-23 10:44:45,372 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
2015-12-23 11:08:39,995 INFO [communication thread] org.apache.hadoop.mapred.Task: Communication exception: java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOf(Arrays.java:2271)
at java.lang.StringCoding.safeTrim(StringCoding.java:79)
at java.lang.StringCoding.access$300(StringCoding.java:50)
at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:305)
at java.lang.StringCoding.encode(StringCoding.java:344)
at java.lang.String.getBytes(String.java:916)
at java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:242)
at java.io.File.isDirectory(File.java:843)
at org.apache.hadoop.yarn.util.ProcfsBasedProcessTree.getProcessList(ProcfsBasedProcessTree.java:495)
at org.apache.hadoop.yarn.util.ProcfsBasedProcessTree.updateProcessTree(ProcfsBasedProcessTree.java:210)
at org.apache.hadoop.mapred.Task.updateResourceCounters(Task.java:847)
at org.apache.hadoop.mapred.Task.updateCounters(Task.java:986)
at org.apache.hadoop.mapred.Task.access$500(Task.java:79)
at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:735)
at java.lang.Thread.run(Thread.java:724)
2015-12-23 11:09:39,065 INFO [communication thread] org.apache.hadoop.mapred.Task: Communication exception: java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.io.UnixFileSystem.list(Native Method)
at java.io.File.list(File.java:1116)
at org.apache.hadoop.yarn.util.ProcfsBasedProcessTree.getProcessList(ProcfsBasedProcessTree.java:488)
at org.apache.hadoop.yarn.util.ProcfsBasedProcessTree.updateProcessTree(ProcfsBasedProcessTree.java:210)
at org.apache.hadoop.mapred.Task.updateResourceCounters(Task.java:847)
at org.apache.hadoop.mapred.Task.updateCounters(Task.java:986)
at org.apache.hadoop.mapred.Task.access$500(Task.java:79)
at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:735)
at java.lang.Thread.run(Thread.java:724)
2015-12-23 11:12:00,745 INFO [communication thread] org.apache.hadoop.mapred.Task: Communication exception: java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
at sun.nio.cs.StreamDecoder.<init>(StreamDecoder.java:250)
at sun.nio.cs.StreamDecoder.<init>(StreamDecoder.java:230)
at sun.nio.cs.StreamDecoder.forInputStreamReader(StreamDecoder.java:69)
at java.io.InputStreamReader.<init>(InputStreamReader.java:74)
at java.io.FileReader.<init>(FileReader.java:72)
at org.apache.hadoop.yarn.util.ProcfsBasedProcessTree.constructProcessInfo(ProcfsBasedProcessTree.java:524)
at org.apache.hadoop.yarn.util.ProcfsBasedProcessTree.updateProcessTree(ProcfsBasedProcessTree.java:223)
at org.apache.hadoop.mapred.Task.updateResourceCounters(Task.java:847)
at org.apache.hadoop.mapred.Task.updateCounters(Task.java:986)
at org.apache.hadoop.mapred.Task.access$500(Task.java:79)
at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:735)
at java.lang.Thread.run(Thread.java:724)
The heap summary printed with jmap (e.g. jmap -heap <pid>) is as follows:
Attaching to process ID 92841, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 24.0-b56
using thread-local object allocation.
Parallel GC with 23 thread(s)
Heap Configuration:
MinHeapFreeRatio = 40
MaxHeapFreeRatio = 70
MaxHeapSize = 4294967296 (4096.0MB)
NewSize = 268435456 (256.0MB)
MaxNewSize = 268435456 (256.0MB)
OldSize = 5439488 (5.1875MB)
NewRatio = 2
SurvivorRatio = 6
PermSize = 21757952 (20.75MB)
MaxPermSize = 134217728 (128.0MB)
G1HeapRegionSize = 0 (0.0MB)
Heap Usage:
PS Young Generation
Eden Space:
capacity = 201326592 (192.0MB)
used = 201326592 (192.0MB)
free = 0 (0.0MB)
100.0% used
From Space:
capacity = 33554432 (32.0MB)
used = 0 (0.0MB)
free = 33554432 (32.0MB)
0.0% used
To Space:
capacity = 33554432 (32.0MB)
used = 0 (0.0MB)
free = 33554432 (32.0MB)
0.0% used
PS Old Generation
capacity = 4026531840 (3840.0MB)
used = 4026067544 (3839.55721282959MB)
free = 464296 (0.44278717041015625MB)
99.9884690841039% used
PS Perm Generation
capacity = 32505856 (31.0MB)
used = 28384888 (27.06993865966797MB)
free = 4120968 (3.9300613403320312MB)
87.32238277312248% used
9035 interned Strings occupying 887704 bytes.
2. Root Cause
Oracle's documentation explains "GC overhead limit exceeded" as follows:
The concurrent collector will throw an OutOfMemoryError if too much time is being spent in garbage collection: if more than 98% of the total time is spent in garbage collection and less than 2% of the heap is recovered, an OutOfMemoryError will be thrown. This feature is designed to prevent applications from running for an extended period of time while making little or no progress because the heap is too small. If necessary, this feature can be disabled by adding the option -XX:-UseGCOverheadLimit to the command line.
In short: the JVM starts with -XX:+UseGCOverheadLimit by default, i.e. this feature is enabled. It is essentially a heuristic: if garbage collection consumes more than 98% of total time yet recovers less than 2% of the heap, the JVM assumes an OOM is imminent and terminates the program early. The feature can be turned off with -XX:-UseGCOverheadLimit.
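For reference, a minimal sketch of how these JVM options could be passed to the reduce tasks through the job configuration, assuming the cluster allows overriding mapreduce.reduce.java.opts (the class and job names are illustrative; disabling the overhead limit only changes the failure mode, it does not remove the memory pressure):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import java.io.IOException;

public class JobOptsSketch {
    public static Job buildJob() throws IOException {
        Configuration conf = new Configuration();
        // Hadoop 2.x property for reduce-task JVM options; -Xmx must stay within
        // the container size configured by mapreduce.reduce.memory.mb.
        conf.set("mapreduce.reduce.java.opts", "-Xmx4096m -XX:-UseGCOverheadLimit");
        return Job.getInstance(conf, "trace-aggregation"); // job name is illustrative
    }
}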
The reduce code is as follows:
public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
    List<Span> dataList = new ArrayList<Span>();
    String name = "";
    String url = "";
    Long cost = 0L;
    String ip = "";
    Long sum = 0L;
    for (Text value : values) {
        String data = value.toString();
        sum++;
        try {
            Span span = JSON.parseObject(data, Span.class);
            if (span.getSpanId().equalsIgnoreCase("0")) {
                name = span.getName();
                url = span.getFuncName();
                cost = span.getCost();
                ip = span.getLocal().getHost().replaceAll("/", "");
            }
            dataList.add(span);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
    if (sum > 100) {
        System.out.println(key.toString() + ":" + sum);
    }
    if (name.length() != 0) {
        TraceData trace = new TraceData(name, url, cost, dataList);
        trace.setIp(ip);
        context.write(key, new Text(JSON.toJSONString(trace)));
        trace = null;
    }
    dataList.clear();
}
From the code we can see that each value is toString()-ed, parsed, and accumulated into a list. If a single key has an extremely large number of values, reducer memory becomes very tight. The problem above arose because one key from this business line had more than 30 million values (roughly 16 GB of memory once materialized, far beyond the available heap), which led to frequent GC cycles that could reclaim almost nothing.
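A rough back-of-the-envelope check of that 16 GB figure, assuming each buffered value costs on the order of 550 bytes on the heap (JSON string plus the deserialized Span object and list overhead; the exact size depends on the Span fields): 30,000,000 × ~550 bytes ≈ 16.5 GB, against a 4 GB max heap whose old generation is only 3.84 GB (see the jmap output above).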
3. Solutions
- Because the mapred.job.reduce.memory.mb parameter is marked final on the cluster and cannot be overridden by users, the remaining options are to increase the number of reducers or to optimize the reduce code.
- For the problem above, the first option clearly does not help: all values of a single key still land on one reducer. After discussing with the business team, the reduce logic was changed to cap the number of values processed for a single key (a sketch is given after the partitioner example below).
- If a key genuinely does have that many values, Hadoop partitioning can be used to scatter the data before further processing. A brief introduction to partitioning: the default partitioner in MapReduce is HashPartitioner, which picks the target reducer as reduce = (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. A custom partitioner can also be used to solve the problem: Partitioner is the abstract base class, and writing a custom partitioner means extending it. Example code is shown below:
public static class RandomPartitioner extends Partitioner<Text, Text> {
    @Override
    public int getPartition(Text key, Text value, int numReduceTasks) {
        // Randomize the target reducer so that a single hot key is scattered.
        return ((int) (key.hashCode() * Math.random()) & Integer.MAX_VALUE) % numReduceTasks;
    }
}
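To take effect, the custom partitioner has to be registered on the job. A minimal usage sketch (the wrapper class and reducer count are illustrative):

import org.apache.hadoop.mapreduce.Job;

public class PartitionerSetupSketch {
    public static void configure(Job job) {
        job.setNumReduceTasks(200);                       // illustrative reducer count
        job.setPartitionerClass(RandomPartitioner.class); // scatter instead of HashPartitioner
    }
}

Because the same key is now spread over several reducers, a single reduce() call no longer sees all of that key's values, so a lightweight follow-up job (or a second aggregation pass) is needed to merge the partial TraceData results.

As for the cap mentioned in the second bullet, a minimal sketch of the idea (not the business team's actual change; MAX_SPANS_PER_KEY is an assumed limit, and the spanId "0" handling from the original method is omitted for brevity):

// Replaces the accumulation loop in the reduce() method shown in section 2.
private static final int MAX_SPANS_PER_KEY = 100000; // assumed cap, tune to the heap size

for (Text value : values) {
    sum++;
    if (dataList.size() >= MAX_SPANS_PER_KEY) {
        continue; // keep counting values, but stop buffering them to bound reducer memory
    }
    try {
        dataList.add(JSON.parseObject(value.toString(), Span.class));
    } catch (Exception e) {
        e.printStackTrace();
    }
}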