实例分析:Phone进程的 binder_thread耗尽导致system ANR并重启

一次SystemServer ANR 最终导致重启的分析

涉及文件:
1 bugreport
2 BinderTrace
3 WDT

AA :
从Bugreport中找到kill system_server的关键字,如下:

W Watchdog: *** WATCHDOG KILLING SYSTEM PROCESS: Blocked in handler on i/o thread (android.io)
06-27 09:41:10.844  1000   894  1788 W Watchdog: i/o thread stack trace:
06-27 09:41:10.844  1000   894  1788 W Watchdog:     at android.os.BinderProxy.transactNative(Native Method)
06-27 09:41:10.844  1000   894  1788 W Watchdog:     at android.os.BinderProxy.transact(Binder.java:789)
06-27 09:41:10.844  1000   894  1788 W Watchdog:     at com.android.internal.telephony.ITelephony$Stub$Proxy.getTetherApnRequired(ITelephony.java:4099)
06-27 09:41:10.844  1000   894  1788 W Watchdog:     at android.telephony.TelephonyManager.getTetherApnRequired(TelephonyManager.java:4968)
06-27 09:41:10.844  1000   894  1788 W Watchdog:     at com.android.server.connectivity.tethering.TetheringConfiguration.checkDunRequired(TetheringConfiguration.java:189)
06-27 09:41:10.844  1000   894  1788 W Watchdog:     at com.android.server.connectivity.tethering.TetheringConfiguration.<init>(TetheringConfiguration.java:100)
06-27 09:41:10.844  1000   894  1788 W Watchdog:     at com.android.server.connectivity.Tethering.updateConfiguration(Tethering.java:263)
06-27 09:41:10.844  1000   894  1788 W Watchdog:     at com.android.server.connectivity.Tethering.-wrap13(Unknown Source:0)
06-27 09:41:10.844  1000   894  1788 W Watchdog:     at com.android.server.connectivity.Tethering$StateReceiver.onReceive(Tethering.java:809)
06-27 09:41:10.844  1000   894  1788 W Watchdog:     at android.app.LoadedApk$ReceiverDispatcher$Args.lambda$-android_app_LoadedApk$ReceiverDispatcher$Args_52497(LoadedApk.java:1320)
06-27 09:41:10.844  1000   894  1788 W Watchdog:     at android.app.-$Lambda$aS31cHIhRx41653CMnd4gZqshIQ.$m$7(Unknown Source:4)
06-27 09:41:10.844  1000   894  1788 W Watchdog:     at android.app.-$Lambda$aS31cHIhRx41653CMnd4gZqshIQ.run(Unknown Source:39)
06-27 09:41:10.844  1000   894  1788 W Watchdog:     at android.os.Handler.handleCallback(Handler.java:790)
06-27 09:41:10.844  1000   894  1788 W Watchdog:     at android.os.Handler.dispatchMessage(Handler.java:99)
06-27 09:41:10.844  1000   894  1788 W Watchdog:     at android.os.Looper.loop(Looper.java:171)
06-27 09:41:10.844  1000   894  1788 W Watchdog:     at android.os.HandlerThread.run(HandlerThread.java:65)
06-27 09:41:10.844  1000   894  1788 W Watchdog:     at com.android.server.ServiceThread.run(ServiceThread.java:46)
06-27 09:41:10.844  1000   894  1788 W Watchdog: *** GOODBYE!

查看进程被杀原因:Binder IO block导致,那么接下来需要追查system_server正在跟谁进行binder通讯

BB :
从BinderTraces文件中查看进程894 相关transaction相关信息,如下:

    Line 3092:   pending transaction 6820434: ffffffe2bb9a6d80 from 894:1080 to 1854:0 code 17 flags 10 pri 0:130 r1 node 12820 size 80:0 data ffffff8026005438
    Line 3094:   pending transaction 6820815: ffffffe202187a80 from 894:1162 to 1854:0 code 61 flags 10 pri 0:120 r1 node 14448 size 92:0 data ffffff8026005758
    Line 3108:   pending transaction 6821061: ffffffe262236980 from 894:3197 to 1854:0 code 14 flags 10 pri 0:120 r1 node 12820 size 80:0 data ffffff8026006318
    Line 3109:   pending transaction 6821064: ffffffe262236700 from 894:1792 to 1854:0 code 6 flags 10 pri 0:120 r1 node 12820 size 100:0 data ffffff80260063c0
    Line 3148:   pending transaction 6824766: ffffffe2623b3e80 from 894:1547 to 1854:0 code 6 flags 10 pri 0:112 r1 node 12820 size 100:0 data ffffff8026009ff8
    Line 3149:   pending transaction 6827279: ffffffe24d98fe00 from 894:1582 to 1854:0 code 14 flags 10 pri 0:120 r1 node 12820 size 80:0 data ffffff802600b7d8
    Line 8842:     outgoing transaction 6820434: ffffffe2bb9a6d80 from 894:1080 to 1854:0 code 17 flags 10 pri 0:130 r1
    Line 8848:     outgoing transaction 6820815: ffffffe202187a80 from 894:1162 to 1854:0 code 61 flags 10 pri 0:120 r1
    Line 8858:     outgoing transaction 6824766: ffffffe2623b3e80 from 894:1547 to 1854:0 code 6 flags 10 pri 0:112 r1
    Line 8866:     outgoing transaction 6827279: ffffffe24d98fe00 from 894:1582 to 1854:0 code 14 flags 10 pri 0:120 r1
    Line 8874:     outgoing transaction 6821064: ffffffe262236700 from 894:1792 to 1854:0 code 6 flags 10 pri 0:120 r1
    Line 8887:     outgoing transaction 6821061: ffffffe262236980 from 894:3197 to 1854:0 code 14 flags 10 pri 0:120 r1
    Line 11051:     incoming transaction 6827914: ffffffe22b82b400 from 894:1497 to 794:1495 code 4 flags 10 pri 1:89 r1 node
    
可以看到894正在跟1854通讯,


Tips:
pending transaction: 记录当前所有进程和线程 TODO队列的transaction.
outgoing transaction: 当前线程transaction_stack, 由该线程发出的事务;
incoming transaction: 当前线程transaction_stack, 由需要线程接收的事务;
pending transactions: 记录当前进程总的pending事务;

CC :
从WDT文件中查看Binder:1854_相关信息,如下:

"Binder:1854_1" prio=5 tid=8 Blocked
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x14801768 self=0x7212c14400
  | sysTid=1875 nice=0 cgrp=default sched=0/0 handle=0x72038d74f0
  | state=S schedstat=( 13768917056 19184339770 80523 ) utm=843 stm=533 core=0 HZ=100
  | stack=0x72037dd000-0x72037df000 stackSize=1005KB
  | held mutexes=
  at com.qualcomm.qti.internal.telephony.QtiIccPhoneBookInterfaceManager.getAdnRecordsInEf(QtiIccPhoneBookInterfaceManager.java:90)
  - waiting to lock <0x02e2601a> (a java.lang.Object) held by thread 40
  at com.android.internal.telephony.UiccPhoneBookController.getAdnRecordsInEfForSubscriber(UiccPhoneBookController.java:123)
  at com.android.internal.telephony.IccProvider.loadFromEf(IccProvider.java:430)
  at com.android.internal.telephony.IccProvider.query(IccProvider.java:104)
  at android.content.ContentProvider.query(ContentProvider.java:1105)
  at android.content.ContentProvider.query(ContentProvider.java:1197)
  at android.content.ContentProvider$Transport.query(ContentProvider.java:247)
  at android.content.ContentProviderNative.onTransact(ContentProviderNative.java:102)
  at android.os.Binder.execTransact(Binder.java:708)
  
  
  。。。。。。
  。。。。。。



"Binder:1854_7" prio=5 tid=40 Waiting
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x14804270 self=0x7212ceb400
  | sysTid=2970 nice=0 cgrp=default sched=0/0 handle=0x72004524f0
  | state=S schedstat=( 13309285438 19466662160 79302 ) utm=740 stm=590 core=4 HZ=100
  | stack=0x7200358000-0x720035a000 stackSize=1005KB
  | held mutexes=
  at java.lang.Object.wait(Native method)
  - waiting on <0x0c573241> (a java.lang.Object)
  at com.qualcomm.qti.internal.telephony.uicccontact.QtiSimPhoneBookAdnRecordCache.queryAdnRecord(QtiSimPhoneBookAdnRecordCache.java:149)
  at com.qualcomm.qti.internal.telephony.uicccontact.QtiSimPhoneBookAdnRecordCache.requestLoadAllAdnLike(QtiSimPhoneBookAdnRecordCache.java:177)
  - locked <0x0c573241> (a java.lang.Object)
  at com.qualcomm.qti.internal.telephony.QtiIccPhoneBookInterfaceManager.getAdnRecordsInEf(QtiIccPhoneBookInterfaceManager.java:98)
  - locked <0x02e2601a> (a java.lang.Object)
  at com.android.internal.telephony.UiccPhoneBookController.getAdnRecordsInEfForSubscriber(UiccPhoneBookController.java:123)
  at com.android.internal.telephony.IccProvider.loadFromEf(IccProvider.java:430)
  at com.android.internal.telephony.IccProvider.query(IccProvider.java:104)
  at android.content.ContentProvider.query(ContentProvider.java:1105)
  at android.content.ContentProvider.query(ContentProvider.java:1197)
  at android.content.ContentProvider$Transport.query(ContentProvider.java:247)
  at android.content.ContentProviderNative.onTransact(ContentProviderNative.java:102)
  at android.os.Binder.execTransact(Binder.java:708)
  
  。。。。。。
  。。。。。。
  
  "Binder:1854_F" prio=5 tid=63 Blocked
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x13f80e10 self=0x720206aa00
  | sysTid=7468 nice=0 cgrp=default sched=0/0 handle=0x71fd9eb4f0
  | state=S schedstat=( 14860937747 20246234650 84331 ) utm=866 stm=620 core=3 HZ=100
  | stack=0x71fd8f1000-0x71fd8f3000 stackSize=1005KB
  | held mutexes=
  at com.qualcomm.qti.internal.telephony.QtiIccPhoneBookInterfaceManager.getAdnRecordsInEf(QtiIccPhoneBookInterfaceManager.java:90)
  - waiting to lock <0x02e2601a> (a java.lang.Object) held by thread 40
  at com.android.internal.telephony.UiccPhoneBookController.getAdnRecordsInEfForSubscriber(UiccPhoneBookController.java:123)
  at com.android.internal.telephony.IccProvider.loadFromEf(IccProvider.java:430)
  at com.android.internal.telephony.IccProvider.query(IccProvider.java:104)
  at android.content.ContentProvider.query(ContentProvider.java:1105)
  at android.content.ContentProvider.query(ContentProvider.java:1197)
  at android.content.ContentProvider$Transport.query(ContentProvider.java:247)
  at android.content.ContentProviderNative.onTransact(ContentProviderNative.java:102)
  at android.os.Binder.execTransact(Binder.java:708)

可以看到1854的16个binder全部耗尽了,并且都held by thread 40,
可以看到tid=40正在Waitting,通过callStack可以看到卡在QtiSimPhoneBookAdnRecordCache.java:149中
然后根据详情做相关修改即可

    原文作者:ThinkToHappy
    原文地址: https://www.jianshu.com/p/f4eb98243c2f
    本文转自网络文章,转载此文章仅为分享知识,如有侵权,请联系博主进行删除。
点赞