一、死機系統(tǒng)簡圖
當用戶對手機進行操作時, 對應的數(shù)據(jù)流將是下面一個概括的流程圖.
HW 如傳感器, 觸摸屏(TP), 物理按鍵(KP)等感知到用戶操作后,觸發(fā)相關(guān)的中斷(ISR) 傳遞給Kernel, Kernel 相關(guān)的driver 對這些ISR 進行處理后,轉(zhuǎn)化成標準的InputEvent.
User Space 的System Server 中的Input System 則持續(xù)監(jiān)聽Kernel 傳遞上來的原始InputEvent, 對其進行進一步的處理后, 變成上層APP 可直接處理的Input Event, 如button 點擊, 長按, 滑動等等.
APP 對相關(guān)的事件進行處理后,請求更新相關(guān)的邏輯界面,而這個則由System Server 中的WMS 等來負責.
相關(guān)的邏輯界面更新后(Z-Window), 則會請求SurfaceFlinger 來產(chǎn)生FrameBuffer 數(shù)據(jù), SurfaceFlinger 則會利用GPU 等來計算生成.
Display System/Driver 則會將FrameBuffer 中的數(shù)據(jù)更新顯示出來, 這樣用戶才能感知到他的操作行為.
二、死機的可能原因
原則上上面流程中,每一步出現(xiàn)問題,都可能引發(fā)死機問題. 大的方面講,可以分成 硬件HW 和 軟件SW 兩個層次, 硬件HW 不在我們的講訴之內(nèi).
1.軟件SW 上,死機的原因可以分成兩種:
邏輯行為異常
邏輯判斷錯誤
邏輯設(shè)計錯誤
邏輯卡頓(block)
死循環(huán) (Deadloop)
死鎖 (Deadlock)
2.從具體的原因上將,可以進一步分成:
(1). Input Driver
無法接收HW 的中ISR,產(chǎn)生原始的InputEvent, 或者產(chǎn)生的InputEvent 異常.
(2). Input System
無法監(jiān)聽Kernel 傳遞上來的原始InputEvent, 或者轉(zhuǎn)換與傳遞異常.
(3). System Logic
無法正常響應Input System 傳遞過來的InputEvent, 或者響應出錯.
(4). WMS/Surfaceflinger 行為異常
WMS/SF 無法正確的對Z-Window 進行疊加轉(zhuǎn)換
(5). Display System3
無法更新Framebuffer 數(shù)據(jù),或者填充的數(shù)據(jù)錯誤
(6). LCM Driver
無法將Framebuffer 數(shù)據(jù)顯示在LCM 上
(7). CPU Scheduler/File System/Memory Manager 等Kernel 支撐性模塊.
系統(tǒng)支撐性模塊無法正常, 導致整個流程都沒法正常進行, 如CPU Scheduler 無法正常執(zhí)行,導致thread 都無法被調(diào)度, memory 耗盡導致系統(tǒng)無法快速申請到memory, file system 卡住導致無法進行讀寫操作等.
3. 對應硬件HW hang, 經(jīng)常見得的情況有:
Power 異常
Clock/26M/32K 失效
CPU Core 無法啟動, 執(zhí)行異常等.
Memory & Memory Controller.
Fail IC.
三、死機問題需要分析哪些數(shù)據(jù)
1. 死機分析數(shù)據(jù)
俗話說, 巧婦難為無米之炊, 死機分析, 同樣需要獲取第一手的資料, 方可分析問題. 那么哪些數(shù)據(jù)可以用來分析死機呢。
大概的講,可以分成空間數(shù)據(jù)和時間數(shù)據(jù)??臻g數(shù)據(jù),即當時現(xiàn)場環(huán)境,如有哪些process 在運行,CPU 的執(zhí)行情況,memory 的利用情況,以及具體的process 的memory 數(shù)據(jù)等。 時間數(shù)據(jù),即行為上的連續(xù)數(shù)據(jù),比如某個Process 在一段時間內(nèi)執(zhí)行了哪些操作,某段時間內(nèi)CPU 利用率的變化等。通常時空都是交融的,對應我們抓取資訊時往往也是。
2. 哪些數(shù)據(jù)可以用來分析.
Backtrace
Backtrace 又分成Java backtrace, Native Backtrace, Kernel Backtrace. 它是分析死機的非常重要的手段,我們可以快速的知道,對應的process/thread 在當時正在執(zhí)行哪些動作,卡住哪里等??梢苑浅V庇^的分析死機現(xiàn)場。
另外還有一類trace 為ftrace/systrace, 除非特別開啟, 以及在特定的流程點上添加, 這類trace 往往不是很詳盡, 但也具有比較好的參考作用.
系統(tǒng)運行環(huán)境
客觀的反應系統(tǒng)的執(zhí)行環(huán)境,通常包括如CPU 利用率,Memory 使用情況, Storage 剩余情況等。這些資料也非常重要,比如可以快速的知道,當時是否有Process 在瘋狂的執(zhí)行,當時是不是處于嚴重的low memory 情況, Storage 是否有耗盡的情況發(fā)生等。
程序執(zhí)行環(huán)境
客觀的反應當時某個程序(Kernel 也可以看成一個程序)的執(zhí)行現(xiàn)場, 此類資訊通常包括如process 的coredump, java heap prof, kernel 的memory dump 等。完整的執(zhí)行環(huán)境,我們可以快速的知道當時具體的變量的值,寄存器值等,可以精細的分析問題。
其他的一些資訊
這些資訊相對來說,比較零散了,如通常的LOG, 一些debug 命令的結(jié)果, JTAG & CVD 的數(shù)據(jù)等。
四 、Backtrace 分析
1. Java Backtrace
從Java Backtrace, 我們可以知道當時Process 的虛擬機執(zhí)行狀態(tài). Java Backtrace 依靠SignalCatcher 來抓取.
Google default: SignalCatcher catchs SIGQUIT(3), and then print the java backtrace to /data/anr/trace.txt
MTK Enhance: SignalCatcher catchs SIGSTKFLT(16), and then print the java backtrace to /data/anr/mtktrace.txt( 僅僅 Android ICS 4.0 <-> Android M 6.0版本)
You can update system properties dalvik.vm.stack-trace-file to Change the address, default is /data/anr/traces.txt
1.1 抓取的方式
在ENG Build 中
adb remount adb shell chmod 0777 data/anr adb shell kill -3 pid adb pull /data/anr
在User Build 中
沒有root 權(quán)限的情況下,只能直接pull 已經(jīng)存在的backtrace.
adb pull /data/anr
你可以嘗試直接使用下面的腳本一次性抓取
adb remount adb shell chmod 0777 data/anr adb shell ps @echo off set processid= set /p processid=Please Input process id: @echo on adb shell kill -3 %processid% @echo off ping -n 8 127.0.0.1>nul @echo on adb pull data/anr/traces.txt trace-%processid%.txt pause
1.2 JavaBacktrace 解析
Android 比較新的版本的java backtrace, 除了直接的thread backtrace 之外, 同時也會把ART 的一些基本狀態(tài)也打印出來, 比較方便觀察ART 的基本狀態(tài), 比如:
----- pid 1051 at 2018-09-29 23:23:52 ----- Cmd line: system_server Build fingerprint: 'XXXXX/A73/A73:8.1.0/O11019/1537977601:user/release-keys' ABI: 'arm64' Build type: optimized Zygote loaded classes=5413 post zygote classes=6049 Intern table: 61702 strong; 12632 weak JNI: CheckJNI is off; globals=10050 (plus 1015 weak) Libraries: /system/lib64/libandroid.so /system/lib64/libandroid_servers.so /system/lib64/libcompiler_rt.so /system/lib64/libdcfdecoderjni.so /system/lib64/libjavacrypto.so /system/lib64/libjnigraphics.so /system/lib64/libmedia_jni.so /system/lib64/libmediatek_exceptionlog.so /system/lib64/libperfframeinfo_jni.so /system/lib64/librutils.so /system/lib64/libsoundpool.so /system/lib64/libwebviewchromium_loader.so /system/lib64/libwifi-service.so /vendor/lib64/libnativecheck-jni.so libjavacore.so libopenjdk.so (17) Heap: 9% free, 80MB/89MB; 1893251 objects Dumping cumulative Gc timings Start Dumping histograms for 12601 iterations for concurrent copying ProcessMarkStack: Sum: 3760.617s 99% C.I. 35.990ms-1218.784ms Avg: 298.438ms Max: 14617.568ms ScanImmuneSpaces: Sum: 216.241s 99% C.I. 2.719ms-110.340ms Avg: 17.160ms Max: 1644.161ms FlipOtherThreads: Sum: 136.969s 99% C.I. 0.517ms-174.411ms Avg: 10.869ms Max: 5585.050ms VisitConcurrentRoots: Sum: 119.937s 99% C.I. 1.928ms-50.934ms Avg: 9.518ms Max: 1356.509ms ClearFromSpace: Sum: 66.775s 99% C.I. 0.168ms-21.473ms Avg: 5.299ms Max: 118.068ms SweepSystemWeaks: Sum: 58.215s 99% C.I. 0.196ms-19.387ms Avg: 4.619ms Max: 342.553ms GrayAllDirtyImmuneObjects: Sum: 57.570s 99% C.I. 0.133ms-123.041ms Avg: 4.568ms Max: 2010.547ms ForwardSoftReferences: Sum: 37.474s 99% C.I. 0.037ms-13.834ms Avg: 2.973ms Max: 118.213ms EnqueueFinalizerReferences: Sum: 30.905s 99% C.I. 0.094ms-22.239ms Avg: 2.452ms Max: 248.969ms MarkingPhase: Sum: 24.277s 99% C.I. 0.257ms-51.155ms Avg: 1.926ms Max: 3743.053ms VisitNonThreadRoots: Sum: 23.507s 99% C.I. 0.065ms-15.260ms Avg: 1.865ms Max: 190.694ms ProcessReferences: Sum: 17.221s 99% C.I. 9.019us-4689.544us Avg: 683.321us Max: 94215us EmptyRBMarkBitStack: Sum: 15.343s 99% C.I. 0.026ms-13.119ms Avg: 1.217ms Max: 145.672ms ThreadListFlip: Sum: 10.730s 99% C.I. 128.387us-25549.046us Avg: 851.524us Max: 1555533us InitializePhase: Sum: 7.689s 99% C.I. 125us-9536.627us Avg: 610.256us Max: 434902us FlipThreadRoots: Sum: 4.346s 99% C.I. 16.387us-13773.217us Avg: 344.930us Max: 241943us RecordFree: Sum: 4.106s 99% C.I. 95us-2474.875us Avg: 325.895us Max: 31656us SweepAllocSpace: Sum: 2.746s 99% C.I. 32.035us-6375.082us Avg: 217.946us Max: 397272us SweepLargeObjects: Sum: 2.611s 99% C.I. 10us-1976.352us Avg: 207.248us Max: 36205us ResumeOtherThreads: Sum: 2.453s 99% C.I. 8.115us-8079.599us Avg: 194.712us Max: 86018us ReclaimPhase: Sum: 2.086s 99% C.I. 9us-4266.285us Avg: 165.567us Max: 155093us ResumeRunnableThreads: Sum: 1.759s 99% C.I. 8.063us-2676.218us Avg: 139.607us Max: 87029us MarkStackAsLive: Sum: 782.752ms 99% C.I. 5us-799.362us Avg: 62.118us Max: 41297us MarkZygoteLargeObjects: Sum: 741.028ms 99% C.I. 12us-899.899us Avg: 58.807us Max: 24679us (Paused)GrayAllNewlyDirtyImmuneObjects: Sum: 651.153ms 99% C.I. 12us-373.754us Avg: 51.674us Max: 14328us ClearRegionSpaceCards: Sum: 412.842ms 99% C.I. 8us-199.633us Avg: 32.762us Max: 16297us (Paused)SetFromSpace: Sum: 357.553ms 99% C.I. 3us-286.340us Avg: 28.374us Max: 14500us SwapBitmaps: Sum: 151.902ms 99% C.I. 4us-99.729us Avg: 12.054us Max: 5447us Sweep: Sum: 110.635ms 99% C.I. 2us-49.924us Avg: 8.779us Max: 1087us (Paused)FlipCallback: Sum: 105.455ms 99% C.I. 2us-99.800us Avg: 8.368us Max: 8433us (Paused)ClearCards: Sum: 89.377ms 99% C.I. 1000ns-199016ns Avg: 295ns Max: 17545000ns UnBindBitmaps: Sum: 17.838ms 99% C.I. 0.250us-49.765us Avg: 1.415us Max: 1516us Done Dumping histograms concurrent copying paused: Sum: 10.999s 99% C.I. 320.432us-63766.026us Avg: 872.888us Max: 1214342us concurrent copying total time: 4607.007s mean time: 365.606ms concurrent copying freed: 4075114399 objects with total size 152GB concurrent copying throughput: 884547/s / 33MB/s Cumulative bytes moved 115136403208 Cumulative objects moved 2160415247 Total time spent in GC: 4607.007s Mean GC size throughput: 32MB/s Mean GC object throughput: 884504 objects/s Total number of allocations 4076810306 Total bytes allocated 145GB Total bytes freed 145GB Free memory 8MB Free memory until GC 8MB Free memory until OOME 431MB Total memory 89MB Max memory 512MB Zygote space size 652KB Total mutator paused time: 10.999s Total time waiting for GC to complete: 71.458s Total GC count: 12601 Total GC time: 4607.007s Total blocking GC count: 3305 Total blocking GC time: 1302.402s Histogram of GC count per 10000 ms: 0:11034,1:7128,2:1807,3:417,4:87,5:21,6:6,7:3,8:2,9:2,11:2,12:2,17:1 Histogram of blocking GC count per 10000 ms: 0:17706,1:2336,2:443,3:26,4:1 Registered native bytes allocated: 57354982 /system/framework/oat/arm64/com.android.location.provider.odex: speed /system/priv-app/FusedLocation/oat/arm64/FusedLocation.odex: speed /system/priv-app/Telecom/oat/arm64/Telecom.odex: speed /system/framework/oat/arm64/mediatek-framework-net.odex: quicken /system/framework/oat/arm64/mediatek-framework-net.odex: quicken /system/framework/oat/arm64/mediatek-framework-net.odex: quicken /system/framework/oat/arm64/mediatek-framework-net.odex: quicken /system/framework/oat/arm64/com.coloros.statistics.odex: quicken /system/priv-app/SettingsProvider/oat/arm64/SettingsProvider.odex: speed /system/framework/oat/arm64/services.odex: speed /system/framework/oat/arm64/ethernet-service.odex: speed /system/framework/oat/arm64/wifi-service.odex: speed /system/framework/oat/arm64/com.android.location.provider.odex: speed /system/framework/oat/arm64/mediatek-services.odex: quicken /system/framework/oat/arm64/mediatek-framework-net.odex: quicken /system/framework/oat/arm64/mediatek-framework-net.odex: quicken /system/framework/oat/arm64/mediatek-framework-net.odex: quicken /system/framework/oat/arm64/mediatek-framework-net.odex: quicken /data/dalvik-cache/arm64/vendor@app@LPPeService@LPPeService.apk@classes.dex: quicken /system/framework/oat/arm64/mediatek-framework-net.odex: quicken /data/dalvik-cache/arm64/vendor@app@NlpService@NlpService.apk@classes.dex: quicken Running non JIT
對于死機情況來說, 我們比較關(guān)注java heap 的memory 使用與GC 情況:
Heap: 9% free, 80MB/89MB; 1893251 objects Free memory 8MB Free memory until GC 8MB Free memory until OOME 431MB Total memory 89MB Max memory 512MB
==》 可以簡單知道是否有java 層的object leaks, 以及觸發(fā)GC 的情況
除去這些之后, 我們再來看java backtrace 的具體資訊.
下面是一小段system server 的java backtrace 的開始
----- pid 682 at 2014-07-30 1853 ----- Cmd line: system_server JNI: CheckJNI is off; workarounds are off; pins=4; globals=1484 (plus 50 weak) DALVIK THREADS: (mutexes: tll=0 tsl=0 tscl=0 ghl=0) <== 只有老版本有. "main" prio=5 tid=1 NATIVE | group="main" sCount=1 dsCount=0 obj=0x4193fde0 self=0x418538f8 | sysTid=682 nice=-2 sched=0/0 cgrp=apps handle=1074835940 | state=S schedstat=( 47858718206 26265263191 44902 ) utm=4074 stm=711 core=0 at android.os.MessageQueue.nativePollOnce(Native Method) at android.os.MessageQueue.next(MessageQueue.java:138) at android.os.Looper.loop(Looper.java:150) at com.android.server.ServerThread.initAndLoop(SystemServer.java:1468) at com.android.server.SystemServer.main(SystemServer.java:1563) at java.lang.reflect.Method.invokeNative(Native Method) at java.lang.reflect.Method.invoke(Method.java:515) at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:829) at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:645) at dalvik.system.NativeStart.main(Native Method)
我們一行一行來解析.
0# 最開始是 -----PID at Time 然后接著 Cmd line: process name
1# the backtrace header: dvm thread :“DALVIK THREADS:”
新版本中有包括 thread 量, 如: "DALVIK THREADS (214):" 可以觀察到是否有java thread leaks.
2# Global DVM mutex value: if 0 unlock, else lock
tll: threadListLock,
tsl: threadSuspendLock,
tscl: threadSuspendCountLock
ghl: gcHeapLock
(mutexes: tll=0 tsl=0 tscl=0 ghl=0)
3# thread name, java thread Priority, [daemon], DVM thread id, DVM thread status.
"main" -> main thread -> activity thread
prio -> java thread priority default is 5, (nice =0, linux thread priority 120), domain is [1,10]
DVM thread id, NOT linux thread id
DVM thread Status:
ZOMBIE, RUNNABLE, TIMED_WAIT, MONITOR, WAIT, INITALIZING,STARTING, NATIVE, VMWAIT, SUSPENDED,UNKNOWN
"main" prio=5 tid=1 NATIVE
4# DVM thread status
group: default is “main”
Compiler,JDWP,Signal Catcher,GC,FinalizerWatchdogDaemon,FinalizerDaemon,ReferenceQueueDaemon are system group
sCount: thread suspend count
dsCount: thread dbg suspend count
obj: thread obj address
Sef: thread point address
group="main" sCount=1 dsCount=0 obj=0x4193fde0 self=0x418538f8
5 Linux thread status
sysTId: linux thread tid
Nice: linux thread nice value
sched: cgroup policy/gourp id
cgrp: c group
handle: handle address
sysTid=682 nice=-2 sched=0/0 cgrp=apps handle=1074835940
6 CPU Sched stat
Schedstat (Run CPU Clock/ns, Wait CPU Clock/ns, Slice times)
utm: utime, user space time(jiffies)
stm: stime, kernel space time(jiffies)
Core now running in cpu.
state=S schedstat=( 47858718206 26265263191 44902 ) utm=4074 stm=711 core=0
五、常見 Java backtrace 舉例
1.ActivityThread 正常狀態(tài)/ActivityThread Normal Case
Message Queue is empty, and thread wait for next message. "main" prio=5 tid=1 NATIVE | group="main" sCount=1 dsCount=0 obj=0x4193fde0 self=0x418538f8 | sysTid=11559 nice=0 sched=0/0 cgrp=apps/bg_non_interactive handle=1074835940 | state=S schedstat=( 2397315020 9177261498 7975 ) utm=100 stm=139 core=1 at android.os.MessageQueue.nativePollOnce(Native Method) at android.os.MessageQueue.next(MessageQueue.java:138) at android.os.Looper.loop(Looper.java:150) at android.app.ActivityThread.main(ActivityThread.java:5299) at java.lang.reflect.Method.invokeNative(Native Method) at java.lang.reflect.Method.invoke(Method.java:515) at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:829) at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:645) at dalvik.system.NativeStart.main(Native Method)
2.Java Backtrace Monitor case
Synchronized Lock: 等待同步鎖時的backtrace. "AnrMonitorThread" prio=5 tid=24 MONITOR | group="main" sCount=1 dsCount=0 obj=0x41fd80c8 self=0x551ac808 | sysTid=711 nice=0 sched=0/0 cgrp=apps handle=1356369328 | state=S schedstat=( 8265377638 4744771625 6892 ) utm=160 stm=666 core=0 at com.android.server.am.ANRManager$AnrDumpMgr.dumpAnrDebugInfoLocked(SourceFile:~832) - waiting to lock <0x42838968> (a com.android.server.am.ANRManager$AnrDumpRecord) held by tid=20 (ActivityManager) at com.android.server.am.ANRManager$AnrDumpMgr.dumpAnrDebugInfo(SourceFile:824) at com.android.server.am.ANRManager$AnrMonitorHandler.handleMessage(SourceFile:220) at android.os.Handler.dispatchMessage(Handler.java:110) at android.os.Looper.loop(Looper.java:193) at android.os.HandlerThread.run(HandlerThread.java:61)
3.執(zhí)行JNI code 未返回,狀態(tài)是native 的情況
"WifiP2pService" prio=5 tid=37 NATIVE | group="main" sCount=1 dsCount=0 obj=0x427a9910 self=0x55f088d8 | sysTid=734 nice=0 sched=0/0 cgrp=apps handle=1443230288 | state=S schedstat=( 91121772 135245305 170 ) utm=7 stm=2 core=1 #00 pc 00032700 /system/lib/libc.so (epoll_wait+12) #01 pc 000105e3 /system/lib/libutils.so (android::pollInner(int)+94) #02 pc 00010811 /system/lib/libutils.so (android::pollOnce(int, int*, int*, void**)+92) #03 pc 0006c96d /system/lib/libandroid_runtime.so (android::pollOnce(_JNIEnv*, int)+22) #04 pc 0001eacc /system/lib/libdvm.so (dvmPlatformInvoke+112) #05 pc 0004fed9 /system/lib/libdvm.so (dvmCallJNIMethod(unsigned int const*, JValue*, Method const*, Thread*)+484) #06 pc 00027ea8 /system/lib/libdvm.so #07 pc 0002f4b0 /system/lib/libdvm.so (dvmMterpStd(Thread*)+76) #08 pc 0002c994 /system/lib/libdvm.so (dvmInterpret(Thread*, Method const*, JValue*)+188) #09 pc 000632a5 /system/lib/libdvm.so (dvmCallMethodV(Thread*, Method const*, Object*, bool, JValue*, std::__va_list)+340) #10 pc 000632c9 /system/lib/libdvm.so (dvmCallMethod(Thread*, Method const*, Object*, JValue*, ...)+20) #11 pc 00057961 /system/lib/libdvm.so #12 pc 0000dd40 /system/lib/libc.so (__thread_entry+72) at android.os.MessageQueue.nativePollOnce(Native Method) at android.os.MessageQueue.next(MessageQueue.java:138) at android.os.Looper.loop(Looper.java:150) at android.os.HandlerThread.run(HandlerThread.java:61)
4. 執(zhí)行object.wait 等待狀態(tài)
"AsyncTask #1" prio=5 tid=33 WAIT | group="main" sCount=1 dsCount=0 obj=0x427a8480 self=0x56036b40 | sysTid=733 nice=10 sched=0/0 cgrp=apps/bg_non_interactive handle=1443076000 | state=S schedstat=( 1941480839 10140523154 4229 ) utm=119 stm=75 core=0 at java.lang.Object.wait(Native Method) - waiting on <0x427a8618> (a java.lang.VMThread) held by tid=33 (AsyncTask #1) at java.lang.Thread.parkFor(Thread.java:1212) at sun.misc.Unsafe.park(Unsafe.java:325) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:157) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2017) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:410) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1035) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1097) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:587) at java.lang.Thread.run(Thread.java:848)
5. Suspend 狀態(tài),通常表明是抓取backtrace 時,當時還正在執(zhí)行java code, 被強制suspend 的情況
"FileObserver" prio=5 tid=23 SUSPENDED | group="main" sCount=1 dsCount=0 obj=0x41fd1dc8 self=0x551abda0 | sysTid=710 nice=0 sched=0/0 cgrp=apps handle=1427817920 | state=S schedstat=( 130152222 399783851 383 ) utm=9 stm=4 core=0 #00 pc 000329f8 /system/lib/libc.so (__futex_syscall3+8) #01 pc 000108cc /system/lib/libc.so (__pthread_cond_timedwait_relative+48) #02 pc 0001092c /system/lib/libc.so (__pthread_cond_timedwait+64) #03 pc 00055a93 /system/lib/libdvm.so #04 pc 0005614d /system/lib/libdvm.so (dvmChangeStatus(Thread*, ThreadStatus)+34) #05 pc 0004ae7f /system/lib/libdvm.so #06 pc 0004e353 /system/lib/libdvm.so #07 pc 000518d5 /system/lib/libandroid_runtime.so #08 pc 0008af9f /system/lib/libandroid_runtime.so #09 pc 0001eacc /system/lib/libdvm.so (dvmPlatformInvoke+112) #10 pc 0004fed9 /system/lib/libdvm.so (dvmCallJNIMethod(unsigned int const*, JValue*, Method const*, Thread*)+484) #11 pc 00027ea8 /system/lib/libdvm.so #12 pc 0002f4b0 /system/lib/libdvm.so (dvmMterpStd(Thread*)+76) #13 pc 0002c994 /system/lib/libdvm.so (dvmInterpret(Thread*, Method const*, JValue*)+188) #14 pc 000632a5 /system/lib/libdvm.so (dvmCallMethodV(Thread*, Method const*, Object*, bool, JValue*, std::__va_list)+340) #15 pc 000632c9 /system/lib/libdvm.so (dvmCallMethod(Thread*, Method const*, Object*, JValue*, ...)+20) #16 pc 00057961 /system/lib/libdvm.so #17 pc 0000dd40 /system/lib/libc.so (__thread_entry+72) at android.os.FileObserver$ObserverThread.observe(Native Method) at android.os.FileObserver$ObserverThread.run(FileObserver.java:88)
六、Native Backtrace
1. Native Backtrace 抓取方式
1.利用debuggerd 抓取
MTK 已經(jīng)制作了一個利用debuggerd 抓取Native backtrace 的tool RTT(Runtime Trace), 對應的執(zhí)行命令是:
rtt built timestamp (Apr 18 2014 1521) USAGE : rtt [-h] -f function -p pid [-t tid] -f funcion : current support functions: bt (Backtrace function) -p pid : pid to trace -t tid : tid to trace -n name : process name to trace -h : help menu
這個rtt 目前只有在eng/userdebug 中可以使用; 另外大家可以使用 google 默認的debuggerd 來抓取:
adb shell debuggerd -b pid
注意的是這些都是要依賴debuggerd 的流程正常, 以及對應抓取process 的debuggerd_signal_handler 行為正常, 否則抓取不到.
2. 添加代碼直接抓取
Google 默認提供了CallStack API, 請參考
system/core/include/libutils/CallStack.h system/core/libutils/CallStack.cpp
可快速打印單個線程的backtrace.
2.解析Native Backtrace
你可以使用GDB, 或者addr2line 等 tool 來解析抓回的Native Backtrace, 從而知道當時正在執(zhí)行的native 代碼.
如addr2line 執(zhí)行
arm-linux-androideabi-addr2line -f -C -e symbols address
七、Kernel Backtrace
1.Kernel Backtrace 抓取方式
1 運行時抓取
AEE/RTT 工具
Proc System
cat proc/pid/task/tid/stack
Sysrq-trigger
adb shell cat proc/kmsg > kmsg.txt adb shell "echo 8 > proc/sys/kernel/printk“ //修改printk loglevel adb shell "echo t > /proc/sysrq-trigger“ //打印所有的backtrace adb shell "echo w > /proc/sysrq-trigger“//打印'-D' status 'D' 的 process
KDB
Long press volume UP and DOWN more then 10s btpDisplay stack for process bta [DRSTCZEUIMA] Display stack all processes btc Backtrace current process on each cpu btt Backtrace process given its struct task add
2.添加代碼直接抓取
#include當前thread: dump_stack(); 其他thread: show_stack(task, NULL);
3. Process/Thread 狀態(tài)
"R (running)", /* 0/
"S (sleeping)", /1/
"D (disk sleep)", /2/
"T (stopped)", /4/
"t (tracing stop)", /8/
"Z (zombie)", /16/
"X (dead)", /32/
"x (dead)", /64/
"K (wakekill)", /128/
"W (waking)", /256 */
通常一般的Process 處于的狀態(tài)都是S (sleeping), 而如果一旦發(fā)現(xiàn)處于如D (disk sleep), T (stopped), Z (zombie) 等就要認真審查.
八、幾種典型的異常情況
1. Deadlock
下面這個case 可以看到PowerManagerService, ActivityManager, WindowManager 相互之間發(fā)生deadlock.
"PowerManagerService" prio=5 tid=25 MONITOR | group="main" sCount=1 dsCount=0 obj=0x42bae270 self=0x6525d5c0 | sysTid=913 nice=0 sched=0/0 cgrp=apps handle=1696964440 | state=S schedstat=( 5088939411 10237027338 34016 ) utm=232 stm=276 core=2 at com.android.server.am.ActivityManagerService.bindService(ActivityManagerService.java:~14079) - waiting to lock <0x42aa0f78> (a com.android.server.am.ActivityManagerService) held by tid=16 (ActivityManager) at android.app.ContextImpl.bindServiceCommon(ContextImpl.java:1665) at android.app.ContextImpl.bindService(ContextImpl.java:1648) at com.android.server.power.PowerManagerService.bindSmartStandByService(PowerManagerService.java:4090) at com.android.server.power.PowerManagerService.handleSmartStandBySettingChangedLocked(PowerManagerService.java:4144) at com.android.server.power.PowerManagerService.access$5600(PowerManagerService.java:102) at com.android.server.power.PowerManagerService$SmartStandBySettingObserver.onChange(PowerManagerService.java:4132) at android.database.ContentObserver$NotificationRunnable.run(ContentObserver.java:181) at android.os.Handler.handleCallback(Handler.java:809) at android.os.Handler.dispatchMessage(Handler.java:102) at android.os.Looper.loop(Looper.java:139) at android.os.HandlerThread.run(HandlerThread.java:58) "ActivityManager" prio=5 tid=16 MONITOR | group="main" sCount=1 dsCount=0 obj=0x42aa0d08 self=0x649166b0 | sysTid=902 nice=-2 sched=0/0 cgrp=apps handle=1687251744 | state=S schedstat=( 39360881460 25703061063 69675 ) utm=1544 stm=2392 core=2 at com.android.server.wm.WindowManagerService.setAppVisibility(WindowManagerService.java:~4783) - waiting to lock <0x42d17590> (a java.util.HashMap) held by tid=12 (WindowManager) at com.android.server.am.ActivityStack.stopActivityLocked(ActivityStack.java:2432) at com.android.server.am.ActivityStackSupervisor.activityIdleInternalLocked(ActivityStackSupervisor.java:2103) at com.android.server.am.ActivityStackSupervisor$ActivityStackSupervisorHandler.activityIdleInternal(ActivityStackSupervisor.java:2914) at com.android.server.am.ActivityStackSupervisor$ActivityStackSupervisorHandler.handleMessage(ActivityStackSupervisor.java:2921) at android.os.Handler.dispatchMessage(Handler.java:110) at android.os.Looper.loop(Looper.java:147) at com.android.server.am.ActivityManagerService$AThread.run(ActivityManagerService.java:2112) "WindowManager" prio=5 tid=12 MONITOR | group="main" sCount=1 dsCount=0 obj=0x42a92550 self=0x6491dd48 | sysTid=898 nice=-4 sched=0/0 cgrp=apps handle=1687201104 | state=S schedstat=( 60734070955 41987172579 219755 ) utm=4659 stm=1414 core=1 at com.android.server.power.PowerManagerService.setScreenBrightnessOverrideFromWindowManagerInternal(PowerManagerService.java:~3207) - waiting to lock <0x42a95140> (a java.lang.Object) held by tid=25 (PowerManagerService) at com.android.server.power.PowerManagerService.setScreenBrightnessOverrideFromWindowManager(PowerManagerService.java:3196) at com.android.server.wm.WindowManagerService.performLayoutAndPlaceSurfacesLockedInner(WindowManagerService.java:9686) at com.android.server.wm.WindowManagerService.performLayoutAndPlaceSurfacesLockedLoop(WindowManagerService.java:8923) at com.android.server.wm.WindowManagerService.performLayoutAndPlaceSurfacesLocked(WindowManagerService.java:8879) at com.android.server.wm.WindowManagerService.access$500(WindowManagerService.java:170) at com.android.server.wm.WindowManagerService$H.handleMessage(WindowManagerService.java:7795) at android.os.Handler.dispatchMessage(Handler.java:110) at android.os.Looper.loop(Looper.java:147) at android.os.HandlerThread.run(HandlerThread.java:58)
2. 執(zhí)行JNI native code 后一去不復返
"main" prio=5 tid=1 NATIVE | group="main" sCount=1 dsCount=0 obj=0x41bb3d98 self=0x41ba2878 | sysTid=814 nice=-2 sched=0/0 cgrp=apps handle=1074389380 | state=D schedstat=( 22048890928 19526803112 32612 ) utm=1670 stm=534 core=0 (native backtrace unavailable) at android.hardware.SystemSensorManager$BaseEventQueue.nativeDisableSensor(Native Method) at android.hardware.SystemSensorManager$BaseEventQueue.disableSensor(SystemSensorManager.java:399) at android.hardware.SystemSensorManager$BaseEventQueue.removeAllSensors(SystemSensorManager.java:325) at android.hardware.SystemSensorManager.unregisterListenerImpl(SystemSensorManager.java:194) at android.hardware.SensorManager.unregisterListener(SensorManager.java:560) at com.android.internal.policy.impl.WindowOrientationListener.disable(WindowOrientationListener.java:139) at com.android.internal.policy.impl.PhoneWindowManager.updateOrientationListenerLp(PhoneWindowManager.java:774) at com.android.internal.policy.impl.PhoneWindowManager.screenTurnedOff(PhoneWindowManager.java:4897) at com.android.server.power.Notifier.sendGoToSleepBroadcast(Notifier.java:518) at com.android.server.power.Notifier.sendNextBroadcast(Notifier.java:434) at com.android.server.power.Notifier.access$500(Notifier.java:63) at com.android.server.power.Notifier$NotifierHandler.handleMessage(Notifier.java:584) at android.os.Handler.dispatchMessage(Handler.java:110) at android.os.Looper.loop(Looper.java:193) at com.android.server.ServerThread.initAndLoop(SystemServer.java:1436) at com.android.server.SystemServer.main(SystemServer.java:1531) at java.lang.reflect.Method.invokeNative(Native Method) at java.lang.reflect.Method.invoke(Method.java:515) at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:824) at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:640) at dalvik.system.NativeStart.main(Native Method)
===>
KERNEL SPACE BACKTRACE, sysTid=814
[] 0xffffffff from [ ] __schedule+0x3fc/0x950 [ ] __schedule+0xc/0x950 from [ ] schedule+0x40/0x80 [ ] schedule+0xc/0x80 from [ ] schedule_preempt_disabled+0x20/0x2c [ ] schedule_preempt_disabled+0xc/0x2c from [ ] mutex_lock_nested+0x1b8/0x560 [ ] mutex_lock_nested+0xc/0x560 from [ ] gsensor_operate+0x1bc/0x2c0 [ ] gsensor_operate+0xc/0x2c0 from [ ] hwmsen_enable+0xa8/0x30c [ ] hwmsen_enable+0xc/0x30c from [ ] hwmsen_unlocked_ioctl+0x2fc/0x528 [ ] hwmsen_unlocked_ioctl+0xc/0x528 from [ ] do_vfs_ioctl+0x94/0x5bc [ ] do_vfs_ioctl+0xc/0x5bc from [ ] sys_ioctl+0x7c/0x8c [ ] sys_ioctl+0xc/0x8c from [ ] ret_fast_syscall+0x0/0x40 [ ] from [ ] 編輯:黃飛
-
傳感器
+關(guān)注
關(guān)注
2550文章
51035瀏覽量
753078 -
cpu
+關(guān)注
關(guān)注
68文章
10854瀏覽量
211583 -
觸摸屏
+關(guān)注
關(guān)注
42文章
2301瀏覽量
116119 -
JAVA
+關(guān)注
關(guān)注
19文章
2966瀏覽量
104702 -
死機
+關(guān)注
關(guān)注
0文章
17瀏覽量
8597
原文標題:八、幾種典型的異常情況
文章出處:【微信號:哆啦安全,微信公眾號:哆啦安全】歡迎添加關(guān)注!文章轉(zhuǎn)載請注明出處。
發(fā)布評論請先 登錄
相關(guān)推薦
評論