一次SystemServer ANR 最终导致重启的分析
涉及文件:
1 bugreport
2 BinderTrace
3 WDT
AA :
从Bugreport中找到kill system_server的关键字,如下:
W Watchdog: *** WATCHDOG KILLING SYSTEM PROCESS: Blocked in handler on i/o thread (android.io)
06-27 09:41:10.844 1000 894 1788 W Watchdog: i/o thread stack trace:
06-27 09:41:10.844 1000 894 1788 W Watchdog: at android.os.BinderProxy.transactNative(Native Method)
06-27 09:41:10.844 1000 894 1788 W Watchdog: at android.os.BinderProxy.transact(Binder.java:789)
06-27 09:41:10.844 1000 894 1788 W Watchdog: at com.android.internal.telephony.ITelephony$Stub$Proxy.getTetherApnRequired(ITelephony.java:4099)
06-27 09:41:10.844 1000 894 1788 W Watchdog: at android.telephony.TelephonyManager.getTetherApnRequired(TelephonyManager.java:4968)
06-27 09:41:10.844 1000 894 1788 W Watchdog: at com.android.server.connectivity.tethering.TetheringConfiguration.checkDunRequired(TetheringConfiguration.java:189)
06-27 09:41:10.844 1000 894 1788 W Watchdog: at com.android.server.connectivity.tethering.TetheringConfiguration.<init>(TetheringConfiguration.java:100)
06-27 09:41:10.844 1000 894 1788 W Watchdog: at com.android.server.connectivity.Tethering.updateConfiguration(Tethering.java:263)
06-27 09:41:10.844 1000 894 1788 W Watchdog: at com.android.server.connectivity.Tethering.-wrap13(Unknown Source:0)
06-27 09:41:10.844 1000 894 1788 W Watchdog: at com.android.server.connectivity.Tethering$StateReceiver.onReceive(Tethering.java:809)
06-27 09:41:10.844 1000 894 1788 W Watchdog: at android.app.LoadedApk$ReceiverDispatcher$Args.lambda$-android_app_LoadedApk$ReceiverDispatcher$Args_52497(LoadedApk.java:1320)
06-27 09:41:10.844 1000 894 1788 W Watchdog: at android.app.-$Lambda$aS31cHIhRx41653CMnd4gZqshIQ.$m$7(Unknown Source:4)
06-27 09:41:10.844 1000 894 1788 W Watchdog: at android.app.-$Lambda$aS31cHIhRx41653CMnd4gZqshIQ.run(Unknown Source:39)
06-27 09:41:10.844 1000 894 1788 W Watchdog: at android.os.Handler.handleCallback(Handler.java:790)
06-27 09:41:10.844 1000 894 1788 W Watchdog: at android.os.Handler.dispatchMessage(Handler.java:99)
06-27 09:41:10.844 1000 894 1788 W Watchdog: at android.os.Looper.loop(Looper.java:171)
06-27 09:41:10.844 1000 894 1788 W Watchdog: at android.os.HandlerThread.run(HandlerThread.java:65)
06-27 09:41:10.844 1000 894 1788 W Watchdog: at com.android.server.ServiceThread.run(ServiceThread.java:46)
06-27 09:41:10.844 1000 894 1788 W Watchdog: *** GOODBYE!
查看进程被杀原因:Binder IO block导致,那么接下来需要追查system_server正在跟谁进行binder通讯
BB :
从BinderTraces文件中查看进程894 相关transaction相关信息,如下:
Line 3092: pending transaction 6820434: ffffffe2bb9a6d80 from 894:1080 to 1854:0 code 17 flags 10 pri 0:130 r1 node 12820 size 80:0 data ffffff8026005438
Line 3094: pending transaction 6820815: ffffffe202187a80 from 894:1162 to 1854:0 code 61 flags 10 pri 0:120 r1 node 14448 size 92:0 data ffffff8026005758
Line 3108: pending transaction 6821061: ffffffe262236980 from 894:3197 to 1854:0 code 14 flags 10 pri 0:120 r1 node 12820 size 80:0 data ffffff8026006318
Line 3109: pending transaction 6821064: ffffffe262236700 from 894:1792 to 1854:0 code 6 flags 10 pri 0:120 r1 node 12820 size 100:0 data ffffff80260063c0
Line 3148: pending transaction 6824766: ffffffe2623b3e80 from 894:1547 to 1854:0 code 6 flags 10 pri 0:112 r1 node 12820 size 100:0 data ffffff8026009ff8
Line 3149: pending transaction 6827279: ffffffe24d98fe00 from 894:1582 to 1854:0 code 14 flags 10 pri 0:120 r1 node 12820 size 80:0 data ffffff802600b7d8
Line 8842: outgoing transaction 6820434: ffffffe2bb9a6d80 from 894:1080 to 1854:0 code 17 flags 10 pri 0:130 r1
Line 8848: outgoing transaction 6820815: ffffffe202187a80 from 894:1162 to 1854:0 code 61 flags 10 pri 0:120 r1
Line 8858: outgoing transaction 6824766: ffffffe2623b3e80 from 894:1547 to 1854:0 code 6 flags 10 pri 0:112 r1
Line 8866: outgoing transaction 6827279: ffffffe24d98fe00 from 894:1582 to 1854:0 code 14 flags 10 pri 0:120 r1
Line 8874: outgoing transaction 6821064: ffffffe262236700 from 894:1792 to 1854:0 code 6 flags 10 pri 0:120 r1
Line 8887: outgoing transaction 6821061: ffffffe262236980 from 894:3197 to 1854:0 code 14 flags 10 pri 0:120 r1
Line 11051: incoming transaction 6827914: ffffffe22b82b400 from 894:1497 to 794:1495 code 4 flags 10 pri 1:89 r1 node
可以看到894正在跟1854通讯,
Tips:
pending transaction: 记录当前所有进程和线程 TODO队列的transaction.
outgoing transaction: 当前线程transaction_stack, 由该线程发出的事务;
incoming transaction: 当前线程transaction_stack, 由需要线程接收的事务;
pending transactions: 记录当前进程总的pending事务;
CC :
从WDT文件中查看Binder:1854_相关信息,如下:
"Binder:1854_1" prio=5 tid=8 Blocked
| group="main" sCount=1 dsCount=0 flags=1 obj=0x14801768 self=0x7212c14400
| sysTid=1875 nice=0 cgrp=default sched=0/0 handle=0x72038d74f0
| state=S schedstat=( 13768917056 19184339770 80523 ) utm=843 stm=533 core=0 HZ=100
| stack=0x72037dd000-0x72037df000 stackSize=1005KB
| held mutexes=
at com.qualcomm.qti.internal.telephony.QtiIccPhoneBookInterfaceManager.getAdnRecordsInEf(QtiIccPhoneBookInterfaceManager.java:90)
- waiting to lock <0x02e2601a> (a java.lang.Object) held by thread 40
at com.android.internal.telephony.UiccPhoneBookController.getAdnRecordsInEfForSubscriber(UiccPhoneBookController.java:123)
at com.android.internal.telephony.IccProvider.loadFromEf(IccProvider.java:430)
at com.android.internal.telephony.IccProvider.query(IccProvider.java:104)
at android.content.ContentProvider.query(ContentProvider.java:1105)
at android.content.ContentProvider.query(ContentProvider.java:1197)
at android.content.ContentProvider$Transport.query(ContentProvider.java:247)
at android.content.ContentProviderNative.onTransact(ContentProviderNative.java:102)
at android.os.Binder.execTransact(Binder.java:708)
。。。。。。
。。。。。。
"Binder:1854_7" prio=5 tid=40 Waiting
| group="main" sCount=1 dsCount=0 flags=1 obj=0x14804270 self=0x7212ceb400
| sysTid=2970 nice=0 cgrp=default sched=0/0 handle=0x72004524f0
| state=S schedstat=( 13309285438 19466662160 79302 ) utm=740 stm=590 core=4 HZ=100
| stack=0x7200358000-0x720035a000 stackSize=1005KB
| held mutexes=
at java.lang.Object.wait(Native method)
- waiting on <0x0c573241> (a java.lang.Object)
at com.qualcomm.qti.internal.telephony.uicccontact.QtiSimPhoneBookAdnRecordCache.queryAdnRecord(QtiSimPhoneBookAdnRecordCache.java:149)
at com.qualcomm.qti.internal.telephony.uicccontact.QtiSimPhoneBookAdnRecordCache.requestLoadAllAdnLike(QtiSimPhoneBookAdnRecordCache.java:177)
- locked <0x0c573241> (a java.lang.Object)
at com.qualcomm.qti.internal.telephony.QtiIccPhoneBookInterfaceManager.getAdnRecordsInEf(QtiIccPhoneBookInterfaceManager.java:98)
- locked <0x02e2601a> (a java.lang.Object)
at com.android.internal.telephony.UiccPhoneBookController.getAdnRecordsInEfForSubscriber(UiccPhoneBookController.java:123)
at com.android.internal.telephony.IccProvider.loadFromEf(IccProvider.java:430)
at com.android.internal.telephony.IccProvider.query(IccProvider.java:104)
at android.content.ContentProvider.query(ContentProvider.java:1105)
at android.content.ContentProvider.query(ContentProvider.java:1197)
at android.content.ContentProvider$Transport.query(ContentProvider.java:247)
at android.content.ContentProviderNative.onTransact(ContentProviderNative.java:102)
at android.os.Binder.execTransact(Binder.java:708)
。。。。。。
。。。。。。
"Binder:1854_F" prio=5 tid=63 Blocked
| group="main" sCount=1 dsCount=0 flags=1 obj=0x13f80e10 self=0x720206aa00
| sysTid=7468 nice=0 cgrp=default sched=0/0 handle=0x71fd9eb4f0
| state=S schedstat=( 14860937747 20246234650 84331 ) utm=866 stm=620 core=3 HZ=100
| stack=0x71fd8f1000-0x71fd8f3000 stackSize=1005KB
| held mutexes=
at com.qualcomm.qti.internal.telephony.QtiIccPhoneBookInterfaceManager.getAdnRecordsInEf(QtiIccPhoneBookInterfaceManager.java:90)
- waiting to lock <0x02e2601a> (a java.lang.Object) held by thread 40
at com.android.internal.telephony.UiccPhoneBookController.getAdnRecordsInEfForSubscriber(UiccPhoneBookController.java:123)
at com.android.internal.telephony.IccProvider.loadFromEf(IccProvider.java:430)
at com.android.internal.telephony.IccProvider.query(IccProvider.java:104)
at android.content.ContentProvider.query(ContentProvider.java:1105)
at android.content.ContentProvider.query(ContentProvider.java:1197)
at android.content.ContentProvider$Transport.query(ContentProvider.java:247)
at android.content.ContentProviderNative.onTransact(ContentProviderNative.java:102)
at android.os.Binder.execTransact(Binder.java:708)
可以看到1854的16个binder全部耗尽了,并且都held by thread 40,
可以看到tid=40正在Waitting,通过callStack可以看到卡在QtiSimPhoneBookAdnRecordCache.java:149中
然后根据详情做相关修改即可