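JNI global reference table overflow is a common kind of leak in our system. When it occurs, capturing an hprof is a fast and effective way to locate the problem. For a detailed description of the hprof capture feature, see the article "Customizing the debug feature for capturing hprof" (<抓取hprof的debug功能定制>). This article shares some case studies.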
Debug switch
We added a feature for capturing hprof, but it is enabled only when certain properties are set:
(ro.flyme.published != true && persist.sys.cts_state != true) OR (ro.monkey == true)
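The condition above can be read as a small boolean function. The following is only a sketch: the class and method names are hypothetical, and the properties are passed in as a plain map rather than read through the platform's property API, so that the example runs on a desktop JVM. It also assumes that an unset property counts as "not true".

```java
import java.util.Map;

public class HprofDebugSwitch {
    // Treats a property as set only when its value is exactly "true";
    // a missing key counts as false (an assumption of this sketch).
    public static boolean isTrue(Map<String, String> props, String key) {
        return "true".equals(props.get(key));
    }

    // Mirrors: (ro.flyme.published != true && persist.sys.cts_state != true)
    //          OR (ro.monkey == true)
    public static boolean hprofDumpEnabled(Map<String, String> props) {
        boolean unpublishedAndNotCts = !isTrue(props, "ro.flyme.published")
                && !isTrue(props, "persist.sys.cts_state");
        return unpublishedAndNotCts || isTrue(props, "ro.monkey");
    }

    public static void main(String[] args) {
        // Published build without monkey: the dump is disabled.
        System.out.println(hprofDumpEnabled(Map.of("ro.flyme.published", "true")));
        // Published build under monkey: the dump is enabled again.
        System.out.println(hprofDumpEnabled(
                Map.of("ro.flyme.published", "true", "ro.monkey", "true")));
    }
}
```

In other words, the dump is on for unpublished, non-CTS builds, and a monkey run turns it on regardless of the other two properties.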
Quickly locating the problem
On AOSP, the log keyword for this problem is "global reference table overflow". The log looks like the following (on AOSP this error is fatal):
10-23 10:43:22.827 1397 7863 E zygote64: JNI ERROR (app bug): global reference table overflow (max=51200)
10-23 10:43:22.827 1397 7863 E zygote64: global reference table dump:
10-23 10:43:22.827 1397 7863 E zygote64: Last 10 entries (of 51200):
10-23 10:43:22.827 1397 7863 E zygote64: 51199: 0x13dc9508 java.lang.ref.WeakReference (referent is a android.os.BinderProxy)
10-23 10:43:22.827 1397 7863 E zygote64: 51197: 0x13dc8da0 java.lang.ref.WeakReference (referent is a android.os.BinderProxy)
10-23 10:43:22.827 1397 7863 E zygote64: 51196: 0x13d54ff0 java.lang.ref.WeakReference (referent is a android.os.BinderProxy
... ...
10-23 10:43:22.827 1397 7863 E zygote64: Summary:
10-23 10:43:22.827 1397 7863 E zygote64: 47902 of java.lang.ref.WeakReference (47902 unique instances)
10-23 10:43:22.827 1397 7863 E zygote64: 898 of com.android.server.content.ContentService$ObserverNode$ObserverEntry (898 unique instances)
10-23 10:43:22.827 1397 7863 E zygote64: 444 of android.os.RemoteCallbackList$Callback (444 unique instances)
10-23 10:43:22.827 1397 7863 E zygote64: 314 of java.lang.Class (236 unique instances)
... ...
The log above tells us that there are 47902 WeakReference objects, and that the last 10 global references allocated are WeakReferences whose referent is a BinderProxy.
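When many such logs have to be triaged, the Summary rows can be extracted and ranked mechanically. The sketch below is a hypothetical helper, not part of the capture feature: it assumes each row contains "N of <class>" optionally followed by "(M unique instances)", as in the log above, and sorts the rows by count.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GlobalRefSummary {
    // One Summary row: "<count> of <className> (<unique> unique instances)".
    public static final class Entry {
        public final int count;
        public final String className;
        public final int unique; // -1 when the row carries no unique-instance count

        Entry(int count, String className, int unique) {
            this.count = count;
            this.className = className;
            this.unique = unique;
        }
    }

    // The closing parenthesis is optional because pasted logs are sometimes truncated.
    private static final Pattern ROW = Pattern.compile(
            "\\s(\\d+) of (\\S+)(?: \\((\\d+) unique instances\\)?)?");

    // Returns the Summary rows found in the given lines, largest count first.
    public static List<Entry> parse(List<String> logLines) {
        List<Entry> out = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = ROW.matcher(line);
            if (m.find()) {
                int unique = m.group(3) == null ? -1 : Integer.parseInt(m.group(3));
                out.add(new Entry(Integer.parseInt(m.group(1)), m.group(2), unique));
            }
        }
        out.sort((a, b) -> Integer.compare(b.count, a.count));
        return out;
    }

    public static void main(String[] args) {
        List<Entry> rows = parse(Arrays.asList(
                "10-23 10:43:22.827 1397 7863 E zygote64: Summary:",
                "10-23 10:43:22.827 1397 7863 E zygote64: 314 of java.lang.Class (236 unique instances)",
                "10-23 10:43:22.827 1397 7863 E zygote64: 47902 of java.lang.ref.WeakReference (47902 unique instances)"));
        for (Entry e : rows) {
            System.out.println(e.count + " " + e.className);
        }
    }
}
```

The class at the top of the sorted list is the first candidate to look up in the hprof.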
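On Flyme, JNI ERROR was changed to an error-level log, and the process aborts on its own after the hprof is captured. So in addition to the log above, the errormonitor log shows a fatal exception that is an abort native crash. The crash log is as follows: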
10-23 10:43:26.946 21948 21948 F DEBUG : pid: 1397, tid: 21928, name: Dump Heap Threa >>> system_server <<<
10-23 10:43:26.946 21948 21948 F DEBUG : signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
10-23 10:43:26.946 21948 21948 F DEBUG : x0 0000000000000000 x1 00000000000055a8 x2 0000000000000006 x3 0000000000000008
10-23 10:43:26.946 21948 21948 F DEBUG : x4 0000000000000000 x5 0000000000000000 x6 0000000000000000 x7 00000000001fffff
10-23 10:43:26.946 21948 21948 F DEBUG : x8 0000000000000083 x9 0000000010000000 x10 00000072305ff310 x11 29a701d2c38f16c8
10-23 10:43:26.946 21948 21948 F DEBUG : x12 29a701d2c38f16c8 x13 0000000000000000 x14 ffffffffffffffdf x15 00000072f7ffd8a8
10-23 10:43:26.946 21948 21948 F DEBUG : x16 0000006148fa9fa8 x17 00000072f7f7f14c x18 0000007275200080 x19 0000000000000575
10-23 10:43:26.946 21948 21948 F DEBUG : x20 00000000000055a8 x21 0000000000000083 x22 0000000000001eb7 x23 00000072750eaf80
10-23 10:43:26.946 21948 21948 F DEBUG : x24 00000072749d3cc8 x25 0000007230503000 x26 000000000000c800 x27 0000000000000001
10-23 10:43:26.946 21948 21948 F DEBUG : x28 0000000000000000 x29 00000072305ff350 x30 00000072f7f25900
10-23 10:43:26.946 21948 21948 F DEBUG : sp 00000072305ff310 pc 00000072f7f25928 pstate 0000000060000000
10-23 10:43:26.956 21948 21948 F DEBUG :
10-23 10:43:26.956 21948 21948 F DEBUG : backtrace:
10-23 10:43:26.956 21948 21948 F DEBUG : #00 pc 000000000001e928 /system/lib64/libc.so (abort+120)
10-23 10:43:26.956 21948 21948 F DEBUG : #01 pc 0000000000263f50 /system/lib64/libart.so (art::Run(void*)+648)
10-23 10:43:26.956 21948 21948 F DEBUG : #02 pc 0000000000077980 /system/lib64/libc.so (__pthread_start(void*)+36)
10-23 10:43:26.956 21948 21948 F DEBUG : #03 pc 000000000001fd04 /system/lib64/libc.so (__start_thread+68)
Case studies
Case 1: bug#842436
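From the log we know that the leak is of WeakReference objects whose referent is android.os.BinderProxy. Open the hprof in MAT.
On the Histogram page, sort by object count in descending order. Apart from some basic classes, android.os.BinderProxy has as many as 47883 objects, which is highly suspicious.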
p1.png
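Select the class, right-click, and choose List objects -> with incoming references. As in the figure below, click through the objects one by one to see what references these BinderProxy objects.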
P2.png
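Because there are more than 40,000 objects in total, we can only inspect a sample. In the figure above, the first few objects we clicked are all the mRemote of IApplicationThread Proxy objects, so that class is the first suspect. Searching for IApplicationThread Proxy objects in the Dominator Tree gives the figure below: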
P3.png
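Here there are only 83 such objects in total. From the code, each IApplicationThread Proxy corresponds to one android.os.BinderProxy, so together they can account for at most 83 BinderProxy objects. This class is therefore ruled out.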
Continue looking at the other incoming references to BinderProxy:
P4.png
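Here the object 0x18316300 is referenced as the key of a HashMap entry, at index 696. This is highly suspicious: it means there is a HashMap keyed by BinderProxy that holds at least 696 objects. To see what references this HashMap, select the key, right-click, and choose List objects -> with incoming references: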
P5.png
This leads to the mAudioMixMap field of PowerManagerService. Use with outgoing references to inspect the mAudioMixMap object at 0x17e33af8:
P6.png
Combined with its definition:
private HashMap<IBinder, int[]> mAudioMixMap = new HashMap<>();
Conclusion:
mAudioMixMap stores BinderProxy objects as keys and int arrays as values. A leak here left 21396 BinderProxy references in mAudioMixMap unreleased.
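The heap dump shows that the map is the holder, but not why entries were never removed. One common way a binder-keyed map grows without bound is that entries are added per client and removed only on an explicit call that a dead client never makes. The sketch below illustrates that pattern and one usual mitigation under stated assumptions: `Token`, `FakeToken`, and the method names are hypothetical stand-ins, not PowerManagerService code, and `Token.linkToDeath(Runnable)` is a simplification of the real `IBinder.linkToDeath(IBinder.DeathRecipient, int)`.

```java
import java.util.HashMap;
import java.util.Map;

public class BinderKeyedMapSketch {
    // Hypothetical stand-in for android.os.IBinder so the sketch runs on a desktop JVM.
    public interface Token {
        void linkToDeath(Runnable recipient);
    }

    // Hypothetical client token; die() simulates the owning process going away.
    public static final class FakeToken implements Token {
        private Runnable recipient;

        @Override
        public void linkToDeath(Runnable r) {
            recipient = r;
        }

        public void die() {
            if (recipient != null) {
                recipient.run();
            }
        }
    }

    // Same shape as the field in the article: token as key, int array as value.
    private final Map<Token, int[]> map = new HashMap<>();

    // Leak-prone variant: the entry lives until someone calls remove() explicitly.
    public synchronized void putLeaky(Token t, int[] value) {
        map.put(t, value);
    }

    // Mitigated variant: the entry is dropped when the owner dies.
    public synchronized void putTracked(Token t, int[] value) {
        map.put(t, value);
        t.linkToDeath(() -> remove(t));
    }

    public synchronized void remove(Token t) {
        map.remove(t);
    }

    public synchronized int size() {
        return map.size();
    }

    public static void main(String[] args) {
        BinderKeyedMapSketch service = new BinderKeyedMapSketch();
        FakeToken leaky = new FakeToken();
        FakeToken tracked = new FakeToken();
        service.putLeaky(leaky, new int[] {1});
        service.putTracked(tracked, new int[] {2});
        leaky.die();
        tracked.die();
        // Only the leaky entry is still held after both owners died.
        System.out.println(service.size());
    }
}
```

Whether this is what happened to mAudioMixMap has to be confirmed against the PowerManagerService code paths that add to and remove from the map.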
Case 2: bug#837168
Confirm the leak type from the log:
09-16 05:42:21.520 1455 7439 E zygote64: JNI ERROR (app bug): global reference table overflow (max=51200)
09-16 05:42:21.521 1455 7439 E zygote64: global reference table dump:
09-16 05:42:21.521 1455 7439 E zygote64: Last 10 entries (of 51200):
09-16 05:42:21.521 1455 7439 E zygote64: 51199: 0x13d9f610 com.android.server.am.ContentProviderConnection
09-16 05:42:21.521 1455 7439 E zygote64: 51198: 0x13e40ae8 java.lang.ref.WeakReference (referent is a android.os.BinderProxy)
09-16 05:42:21.521 1455 7439 E zygote64: 51197: 0x13dc9a00 com.android.server.am.ContentProviderConnection
09-16 05:42:21.521 1455 7439 E zygote64: 51196: 0x13d9c638 java.lang.ref.WeakReference (referent is a android.os.BinderProxy)
09-16 05:42:21.521 1455 7439 E zygote64: 51195: 0x13e05a38 com.android.server.am.ContentProviderConnection
09-16 05:42:21.521 1455 7439 E zygote64: 51194: 0x13e40638 java.lang.ref.WeakReference (referent is a android.os.BinderProxy)
09-16 05:42:21.521 1455 7439 E zygote64: 51193: 0x13cbfad8 com.android.server.am.ContentProviderConnection
09-16 05:42:21.521 1455 7439 E zygote64: 51192: 0x13d985c0 java.lang.ref.WeakReference (referent is a android.os.BinderProxy)
09-16 05:42:21.521 1455 7439 E zygote64: 51191: 0x13d9b4f0 com.android.server.am.ContentProviderConnection
09-16 05:42:21.521 1455 7439 E zygote64: 51190: 0x13d96d08 java.lang.ref.WeakReference (referent is a android.os.BinderProxy)
09-16 05:42:21.521 1455 7439 E zygote64: Summary:
09-16 05:42:21.521 1455 7439 E zygote64: 29925 of java.lang.ref.WeakReference (29925 unique instances)
09-16 05:42:21.521 1455 7439 E zygote64: 18591 of com.android.server.content.ContentService$ObserverNode$ObserverEntry (18591 unique instances)
09-16 05:42:21.521 1455 7439 E zygote64: 812 of android.os.RemoteCallbackList$Callback (812 unique instances)
09-16 05:42:21.521 1455 7439 E zygote64: 314 of java.lang.Class (236 unique instances
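From the Summary we can see that the leaking types are WeakReference and ObserverEntry. From the last 10 entries we can be fairly sure that the WeakReferences whose referent is a BinderProxy are caused by ObserverEntry. Next we analyze the hprof.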