Linux Kernel Oops Analysis

Published: 2023/11/30 | linux | 豆豆

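0. Common methods for analyzing Linux kernel exceptions

  • Is the faulting address close to 0? If so, suspect a NULL pointer dereference.
  • Is the faulting address inside an iomem mapping? If so, suspect a bus error on a device access, for example an invalid access caused by a PCI fault.
  • Is the faulting address close to the stack? If it is adjacent, consider whether the stack has been overwritten.
  • Compare the stack traces printed by the different mechanisms (delay reset, NMI watchdog and so on) and check whether the PC is still moving, to decide whether the system is deadlocked.
  • Use SysRq to tell a real hang from an apparent one.
  • Disassemble the kernel to find the C function and statement where the exception occurred, then check whether the open-source community already has a patch for it.

The sections below walk through two exception examples in detail, one on PowerPC and one on MIPS64.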
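The address checks in the list above can be sketched as a small triage helper. This is a minimal Python sketch, not something from the original article: the function name classify_fault_address, the null_window threshold and the iomem_ranges parameter are hypothetical, and the 16 KiB stack size used in the example call is an assumption.

```python
def classify_fault_address(addr, stack_base, stack_size,
                           iomem_ranges=(), null_window=0x1000):
    """Return a list of hints for a faulting address.

    null_window and iomem_ranges are assumptions that the caller must
    supply for the platform at hand; they are not taken from the article.
    """
    hints = []
    if addr < null_window:
        hints.append("near NULL: possible NULL pointer dereference")
    for start, end in iomem_ranges:
        if start <= addr < end:
            hints.append("inside iomem mapping: possible device/bus access fault")
    # Look one stack size on either side of the stack to catch overruns.
    if stack_base - stack_size <= addr < stack_base + 2 * stack_size:
        hints.append("on or next to the kernel stack: possible stack corruption")
    return hints or ["no obvious pattern"]


# Values from the MIPS64 example later in the article:
# BadVA = 0x8, thread_info at 0xc000000595054000 (16 KiB stack assumed).
print(classify_fault_address(0x8, 0xC000000595054000, 0x4000))
```

A result of "no obvious pattern" only means that none of these quick checks matched; the register and call stack analysis is still needed.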

    1. PowerPC small (embedded) system kernel exception analysis

    1.1 Exception output

    Unable to handle kernel paging request for data at address 0x36fef31e
    Faulting instruction address: 0xc0088b8c
    Oops: Kernel access of bad area, sig: 11 [#1]
    PREEMPT SMP NR_CPUS=2
    Modules linked in: ossmod tipc ohci_hcd ehci_hcd cmm uart1655x bcm334 bootflash mtdchar bsp_flash_init boardctrl 85xx_debug util
    NIP: C0088B8C LR: C0088CF8 CTR: 00000000
    REGS: ce283e20 TRAP: 0300 Not tainted (2.6.21.7-EMBSYS-CGEL-3.04.10.P6.F5)
    MSR: 00021000 <ME> CR: 22004222 XER: 00000000
    DAR: 36FEF31E, DSISR: 00800000
    TASK = cffdf180[26] 'events/1' THREAD: ce282000 CPU: 1
    GPR00: 00100100 CE283ED0 CFFDF180 CF528000 C09EA500 EFFEAD20 CF5188A0 00000000
    GPR08: CF5188BC 00200200 36FEF31E D1FD7F9E 22004222 1010DA44 00000290 00000000
    GPR16: 1011C858 100147F4 BF9BC9C4 10100000 00000001 C0460000 C06454CC 00000000
    GPR24: C0640000 CE282000 C0640000 00000005 00000000 00000000 EFFE8EC0 CFFED958
    NIP [C0088B8C] free_block+0xc4/0x16c
    LR [C0088CF8] drain_array+0xc4/0x100
    Pass 2: Checking directory structure
    Pass 3: Checking directory connectivity
    Pass 4: Checking reference counts
    Call Trace:
    [CE283ED0] [C06ABEC0] 0xc06abec0(unreliable)
    [CE283EF0] [C0088CF8] drain_array+0xc4/0x100
    [CE283F10] [C008A70C] cache_reap+0x94/0x13c
    [CE283F30] [C003DA2C] run_workqueue+0xc4/0x198
    [CE283F60] [C003E6D4] worker_thread+0x130/0x154
    [CE283FB0] [C0042E80] kthread+0xd4/0x110
    [CE283FF0] [C0011A70] original_kernel_thread+0x44/0x60

    Instruction dump:
    5400cffe 0f000000 80c4001c 7d1cf214 3c000010 3d200020 80a8001c 60000100
    81660000 61290200 81460004 3906001c <916a0000> 914b0004 90060000 91260004
    ------------[ cut here ]------------
    Badness at c0011e4c [verbose debug info unavailable]
    Call Trace:
    [CE283C50] [C00080BC] show_stack+0x3c/0x1a0 (unreliable)
    [CE283C80] [C018EA28] report_bug+0xb0/0xb8
    [CE283C90] [C000EC94] program_check_exception+0xcc/0x4f8
    [CE283CD0] [C0010BE4] ret_from_except_full+0x0/0x4c
    [CE283D90] [C0640000] 0xc0640000
    [CE283DD0] [C000E61C] die+0x1f0/0x27c
    [CE283E00] [C0014B18] bad_page_fault+0x98/0xe8
    [CE283E10] [C0010A88] handle_page_fault+0x7c/0x80
    [CE283ED0] [C06ABEC0] 0xc06abec0
    [CE283EF0] [C0088CF8] drain_array+0xc4/0x100
    [CE283F10] [C008A70C] cache_reap+0x94/0x13c
    [CE283F30] [C003DA2C] run_workqueue+0xc4/0x198
    [CE283F60] [C003E6D4] worker_thread+0x130/0x154
    [CE283FB0] [C0042E80] kthread+0xd4/0x110
    [CE283FF0] [C0011A70] original_kernel_thread+0x44/0x60

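    1.2 Oops analysis

    Oops: Kernel access of bad area, sig: 11 [#1]

    Exception categories

    Oops: an exception raised by an instruction executing in kernel mode.

    BUG: the kernel detected a logic error (similar to a failed assert); it affects the kernel's subsequent operation.

    WARNING: similar to BUG, but it does not affect the kernel's subsequent operation.

    PANIC: similar to BUG, but the system cannot continue to run; it hangs or reboots.

    SOFTLOCKUP: tasks have not been scheduled for a long time.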

    Signals

    Signal     Code  Default action  Description
    SIGABRT    6     A               Process abort signal.
    SIGALRM    14    T               Alarm clock.
    SIGBUS     10    A               Access to an undefined portion of a memory object.
    SIGCHLD    18    I               Child process terminated, stopped, or continued.
    SIGCONT    25    C               Continue executing, if stopped.
    SIGFPE     8     A               Erroneous arithmetic operation.
    SIGHUP     1     T               Hangup.
    SIGILL     4     A               Illegal instruction.
    SIGINT     2     T               Terminal interrupt signal.
    SIGKILL    9     T               Kill (cannot be caught or ignored).
    SIGPIPE    13    T               Write on a pipe with no one to read it.
    SIGQUIT    3     A               Terminal quit signal.
    SIGSEGV    11    A               Invalid memory reference.
    SIGSTOP    23    S               Stop executing (cannot be caught or ignored).
    SIGTERM    15    T               Termination signal.
    SIGTSTP    24    S               Terminal stop signal.
    SIGTTIN    26    S               Background process attempting read.
    SIGTTOU    27    S               Background process attempting write.
    SIGUSR1    16    T               User-defined signal 1.
    SIGUSR2    17    T               User-defined signal 2.
    SIGPOLL    22    T               Pollable event.
    SIGPROF    29    T               Profiling timer expired.
    SIGSYS     12    A               Bad system call.
    SIGTRAP    5     A               Trace/breakpoint trap.
    SIGURG     21    I               High bandwidth data is available at a socket.
    SIGVTALRM  28    T               Virtual timer expired.
    SIGXCPU    30    A               CPU time limit exceeded.
    SIGXFSZ    31    A               File size limit exceeded.

    The numeric codes listed are the Linux values for MIPS. Some of them differ on PowerPC (SIGBUS is 7 there, for example), but SIGSEGV is 11 on both.

    Default actions:

    T - Abnormal termination of the process. The process is terminated with all the consequences of _exit() except that the status made available to wait() and waitpid() indicates abnormal termination by the specified signal.

    A - Abnormal termination of the process. Additionally, implementation-defined abnormal termination actions, such as creation of a core file, may occur.

    I - Ignore the signal.

    S - Stop the process.

    C - Continue the process, if it is stopped; otherwise, ignore the signal.

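    For the PowerPC e500 core specifically, the mapping between exceptions and signals is as follows:

    [Figure: e500 exception-to-signal mapping; not preserved in this copy]

    So a task accessed an address outside its valid virtual address space, and the kernel raised SIGSEGV (segmentation fault); this is the "sig: 11" in the Oops line.

    Which task was it? The TASK line analyzed in section 1.3 answers this.

    Other fields

    [#1] is die_counter, the number of Oopses that have occurred so far. When several Oopses are printed, start with the first one, because later ones may only be consequences of the first.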
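The fields of the Oops header line can be pulled apart mechanically. The following is a minimal Python sketch, assuming the PowerPC header format shown in this article; parse_oops_header and first_oops are hypothetical helper names, not kernel or library functions.

```python
import re

# Matches lines such as: "Oops: Kernel access of bad area, sig: 11 [#1]"
OOPS_RE = re.compile(
    r"Oops: (?P<reason>.+?), sig: (?P<sig>\d+) \[#(?P<count>\d+)\]"
)


def parse_oops_header(line):
    """Split an Oops header into reason, signal number and die counter."""
    m = OOPS_RE.search(line)
    if m is None:
        return None
    return {
        "reason": m.group("reason"),
        "sig": int(m.group("sig")),
        "count": int(m.group("count")),
    }


def first_oops(lines):
    """Return the parsed header with the lowest die counter, the one to read first."""
    parsed = [p for p in map(parse_oops_header, lines) if p]
    return min(parsed, key=lambda p: p["count"]) if parsed else None


header = parse_oops_header("Oops: Kernel access of bad area, sig: 11 [#1]")
print(header)
```

When a log contains several Oopses, first_oops picks the one whose counter is lowest, which is the one the paragraph above recommends reading first.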

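    1.3 Register analysis

    NIP: C0088B8C LR: C0088CF8 CTR: 00000000

    This line lists three registers. NIP is the next instruction pointer; in an Oops its value is the address of the faulting instruction.

    LR is the link register. It holds the return address written by the most recent branch-and-link instruction, normally the address in the caller to which the current function returns, not the address of the previous instruction.

    CTR is the count register; its value is used by loop (branch-on-count) instructions.

    REGS: ce283e20 TRAP: 0300   Not tainted  (2.6.21.7-EMBSYS-CGEL-3.04.10.P6.F5)

    TRAP: the entry address (vector) of the exception handler. REGS: the base address of the pt_regs structure on the kernel stack. pt_regs holds the minimum state that has to be saved on every kernel entry, such as a system call, interrupt, trap or fault.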

        0x100:   "(System Reset)"
        0x200:   "(Machine Check)"
        0x300:   "(Data Access)"
        0x380:   "(Data SLB Access)"
        0x400:   "(Instruction Access)"
        0x480:   "(Instruction SLB Access)"
        0x500:   "(Hardware Interrupt)"
        0x600:   "(Alignment)"
        0x700:   "(Program Check)"
        0x800:   "(FPU Unavailable)"
        0x900:   "(Decrementer)"
        0xc00:   "(System Call)"
        0xd00:   "(Single Step)"
        0xf00:   "(Performance Monitor)"
        0xf20:   "(Altivec Unavailable)"
        0x1300:  "(Instruction Breakpoint)"

    In this Oops TRAP is 0300, a data access exception. For details see section 5.7, "Interrupt Definitions", of the PowerPC e500 Core Family Reference Manual.
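Looking up the TRAP value is a plain table lookup. A minimal Python sketch follows, using only the vector names listed above; trap_name is a hypothetical helper and the parsing assumes the "REGS: ... TRAP: xxxx" layout shown in this Oops.

```python
# Trap vector names as listed in the article.
TRAP_NAMES = {
    0x100: "System Reset",
    0x200: "Machine Check",
    0x300: "Data Access",
    0x380: "Data SLB Access",
    0x400: "Instruction Access",
    0x480: "Instruction SLB Access",
    0x500: "Hardware Interrupt",
    0x600: "Alignment",
    0x700: "Program Check",
    0x800: "FPU Unavailable",
    0x900: "Decrementer",
    0xC00: "System Call",
    0xD00: "Single Step",
    0xF00: "Performance Monitor",
    0xF20: "Altivec Unavailable",
    0x1300: "Instruction Breakpoint",
}


def trap_name(regs_line):
    """Extract the TRAP field from a REGS line and return (code, name)."""
    fields = regs_line.split()
    code = int(fields[fields.index("TRAP:") + 1], 16)
    return code, TRAP_NAMES.get(code, "unknown")


print(trap_name(
    "REGS: ce283e20 TRAP: 0300 Not tainted (2.6.21.7-EMBSYS-CGEL-3.04.10.P6.F5)"
))
```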

    tainted: the kernel taint state, set through add_taint(). "Not tainted" means that no flag is set. The flags are:

     *  'P' - Proprietary module has been loaded.
     *  'F' - Module has been forcibly loaded.
     *  'S' - SMP with CPUs not designed for SMP.
     *  'R' - User forced a module unload.
     *  'M' - System experienced a machine check exception.
     *  'B' - System has hit bad_page.
     *  'U' - Userspace-defined naughtiness.
     *  'D' - Kernel has oopsed before
     *  'A' - ACPI table overridden.
     *  'W' - Taint on warning.
     *  'C' - modules from drivers/staging are loaded.
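The taint letters can be expanded with a small lookup. This is a minimal Python sketch using only the flags listed above; describe_taint is a hypothetical helper, and the "Tainted: ..." input format is an assumption about how a tainted kernel prints the field (this article only shows the "Not tainted" case).

```python
# Flag descriptions as listed in the article.
TAINT_FLAGS = {
    "P": "Proprietary module has been loaded.",
    "F": "Module has been forcibly loaded.",
    "S": "SMP with CPUs not designed for SMP.",
    "R": "User forced a module unload.",
    "M": "System experienced a machine check exception.",
    "B": "System has hit bad_page.",
    "U": "Userspace-defined naughtiness.",
    "D": "Kernel has oopsed before",
    "A": "ACPI table overridden.",
    "W": "Taint on warning.",
    "C": "modules from drivers/staging are loaded.",
}


def describe_taint(field):
    """Expand a taint field such as 'Tainted: P      D' or 'Not tainted'."""
    if field.strip() == "Not tainted":
        return []
    # Drop the 'Tainted:' label (if present) so its letters are not decoded.
    flags = field.split("Tainted:", 1)[-1]
    return [TAINT_FLAGS[c] for c in flags if c in TAINT_FLAGS]


print(describe_taint("Not tainted"))
print(describe_taint("Tainted: P      D"))
```

Letters that are not in the table are ignored rather than guessed at.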

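    MSR: 00021000 <ME>  CR: 22004222  XER: 00000000

    DAR: 36FEF31E, DSISR: 00800000

    MSR is the machine state register.

    CR is the condition register.

    XER is the integer exception register.

    DAR is the data address register; its value is the address whose access caused the memory fault. On the e500 it is the Data Exception Address Register (DEAR).

    DSISR is the Data Storage Interrupt Status Register, which records the cause of the memory access fault. On the e500 it is the Exception Syndrome Register (ESR). The value 0x00800000 is the store operation bit, which is set when an alignment, data storage or data TLB error exception was caused by a store.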
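DAR and DSISR can be cross-checked against the GPR dump from this Oops to see which register held the bad address. The following is a minimal Python sketch; parse_gprs and summarize_data_fault are hypothetical helpers, and only the single store bit discussed above is decoded.

```python
ESR_ST = 0x00800000  # store operation bit, as described above

GPR_DUMP = """
GPR00: 00100100 CE283ED0 CFFDF180 CF528000 C09EA500 EFFEAD20 CF5188A0 00000000
GPR08: CF5188BC 00200200 36FEF31E D1FD7F9E 22004222 1010DA44 00000290 00000000
GPR16: 1011C858 100147F4 BF9BC9C4 10100000 00000001 C0460000 C06454CC 00000000
GPR24: C0640000 CE282000 C0640000 00000005 00000000 00000000 EFFE8EC0 CFFED958
"""


def parse_gprs(dump):
    """Turn the four GPRxx lines into a list indexed by register number."""
    values = []
    for line in dump.strip().splitlines():
        values.extend(int(tok, 16) for tok in line.split()[1:])
    return values


def summarize_data_fault(dar, dsisr, gprs):
    """Say whether the fault is flagged as a store and which GPRs equal DAR."""
    kind = "store" if dsisr & ESR_ST else "not flagged as a store"
    holders = [i for i, value in enumerate(gprs) if value == dar]
    return kind, holders


gprs = parse_gprs(GPR_DUMP)
print(summarize_data_fault(0x36FEF31E, 0x00800000, gprs))
```

For this Oops the only register equal to DAR is GPR10, so the faulting store most likely used GPR10 as its base address.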

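    TASK = cffdf180[26] 'events/1' THREAD: ce282000 CPU: 1

    cffdf180: the address of the task's task_struct structure.

    26: the process ID.

    events/1: the task name.

    THREAD: the start address of the task's kernel stack.

    CPU: the current CPU.

    So the current task, 'events/1', is the one that hit the SIGSEGV exception.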

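    GPR00: 00100100 CE283ED0 CFFDF180 CF528000 C09EA500 EFFEAD20 CF5188A0 00000000
    GPR08: CF5188BC 00200200 36FEF31E D1FD7F9E 22004222 1010DA44 00000290 00000000
    GPR16: 1011C858 100147F4 BF9BC9C4 10100000 00000001 C0460000 C06454CC 00000000
    GPR24: C0640000 CE282000 C0640000 00000005 00000000 00000000 EFFE8EC0 CFFED958

    The PowerPC ABI assigns the registers the following roles:

    (1) GPR0: volatile. The ABI reserves it from ordinary use. GCC uses it to save the LR register, and Linux on PowerPC uses it to pass the system call number.

    (2) GPR1: dedicated. The ABI uses it to hold the stack pointer (top of stack).

    (3) GPR2: dedicated. The ABI reserves it from ordinary use; Linux on PowerPC uses it to hold the address of the current task's descriptor.

    (4) GPR3-GPR4: volatile. The ABI uses these two registers for function return values or for passing parameters.

    (5) GPR5-GPR10: also volatile. Together with GPR3 and GPR4 they give eight registers for passing function parameters; when a function has more than eight parameters, the rest are passed on the stack.

    (6) GPR11-GPR12: volatile. The ABI reserves them from ordinary use. Linux on PowerPC sometimes keeps temporary variables in them, but GCC does not use these two registers.

    (7) GPR13: dedicated. The ABI defines it as the base pointer of the sdata section. Linux on PowerPC uses it for temporary variables during system initialization. GCC sometimes places frequently used data in the sdata or sbss section according to certain rules. Data in sdata or sbss is accessed through a different mechanism from data in data and bss, and access to sdata is faster.

    (8) GPR14-GPR31: non-volatile. The ABI uses them for local variables, and applications may use them freely.

    In this dump GPR01 (CE283ED0) matches the first frame of the call trace, and GPR02 (CFFDF180) matches the task_struct address on the TASK line.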

    1.4 Call stack analysis

    Call chain

    NIP [C0088B8C] free_block+0xc4/0x16c
    LR [C0088CF8] drain_array+0xc4/0x100
    Call Trace:
    [CE283ED0] [C06ABEC0] 0xc06abec0(unreliable)
    [CE283EF0] [C0088CF8] drain_array+0xc4/0x100
    [CE283F10] [C008A70C] cache_reap+0x94/0x13c
    [CE283F30] [C003DA2C] run_workqueue+0xc4/0x198
    [CE283F60] [C003E6D4] worker_thread+0x130/0x154
    [CE283FB0] [C0042E80] kthread+0xd4/0x110
    [CE283FF0] [C0011A70] original_kernel_thread+0x44/0x60
    Instruction dump:
    5400cffe 0f000000 80c4001c 7d1cf214 3c000010 3d200020 80a8001c 60000100
    81660000 61290200 81460004 3906001c <916a0000> 914b0004 90060000 91260004

    Taking one frame as an example:

    [CE283FB0] [C0042E80] kthread+0xd4/0x110

    CE283FB0: the stack address of the frame.

    C0042E80: the LR value saved on the stack, that is, the function's return address.

    kthread: the function name.

    0xd4/0x110: the offset of that address within the function / the length of the function.
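The frame fields described above can be split with a regular expression, which also gives the start address of the function (saved address minus offset). This is a minimal Python sketch; parse_frame is a hypothetical helper and the pattern assumes the PowerPC trace format shown in this article.

```python
import re

# Matches: "[CE283FB0] [C0042E80] kthread+0xd4/0x110"
FRAME_RE = re.compile(
    r"\[(?P<sp>[0-9A-Fa-f]+)\]\s+\[(?P<lr>[0-9A-Fa-f]+)\]\s+"
    r"(?P<func>\w+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frame(line):
    """Split one call-trace line; return None for frames without a symbol."""
    m = FRAME_RE.search(line)
    if m is None:
        return None  # e.g. "[CE283ED0] [C06ABEC0] 0xc06abec0(unreliable)"
    lr = int(m.group("lr"), 16)
    offset = int(m.group("off"), 16)
    return {
        "sp": int(m.group("sp"), 16),
        "lr": lr,
        "func": m.group("func"),
        "offset": offset,
        "size": int(m.group("size"), 16),
        "func_start": lr - offset,  # where the function begins in vmlinux
    }


frame = parse_frame("[CE283FB0] [C0042E80] kthread+0xd4/0x110")
print(frame["func"], hex(frame["func_start"]))
```

The computed func_start is handy when searching a disassembly listing, since listings are usually organized by function start address.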

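    static void free_block(struct kmem_cache *cachep, void **objpp, int nr_objects, int node)

    The call stack shows that the exception occurred in free_block, called from drain_array. Comparing the free_block prototype with the argument registers GPR3-GPR6 (CF528000 C09EA500 EFFEAD20 CF5188A0), the values in the positions of int nr_objects and int node are clearly implausible, which suggests that the call stack may have been overwritten. This is only a hint: GPR3-GPR6 are volatile, so by the time of the fault, 0xc4 bytes into free_block, they may already have been reused for other values.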

    Instruction words

    Instruction dump:
    5400cffe 0f000000 80c4001c 7d1cf214 3c000010 3d200020 80a8001c 60000100
    81660000 61290200 81460004 3906001c <916a0000> 914b0004 90060000 91260004

    The instruction dump prints the instruction words around NIP. <916a0000> is the instruction word at NIP.
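The load and store words in the dump can be decoded by hand without a disassembler. The following is a minimal Python sketch that handles only two D-form instructions, lwz (primary opcode 32) and stw (primary opcode 36); decode_ppc_d_form is a hypothetical helper, not a substitute for objdump.

```python
# D-form loads/stores only: primary opcode 32 is lwz, 36 is stw.
D_FORM = {32: "lwz", 36: "stw"}


def decode_ppc_d_form(word):
    """Decode a 32-bit lwz/stw word; return (text, base_reg, displacement)."""
    opcode = word >> 26
    if opcode not in D_FORM:
        return None
    rt = (word >> 21) & 0x1F   # source (stw) or target (lwz) register
    ra = (word >> 16) & 0x1F   # base register
    d = word & 0xFFFF          # signed 16-bit displacement
    if d & 0x8000:
        d -= 0x10000
    return f"{D_FORM[opcode]} r{rt},{d}(r{ra})", ra, d


# The second line of the instruction dump, without the <> markers.
dump = "81660000 61290200 81460004 3906001c 916a0000 914b0004 90060000 91260004"
for text in dump.split():
    print(text, decode_ppc_d_form(int(text, 16)))
```

The faulting word 916a0000 decodes to stw r11,0(r10), and GPR10 holds 36FEF31E, the same value as DAR. The neighbouring words load r11 and r10 from offsets 0 and 4 of r6 and later store r0 (00100100) and r9 (00200200) back to those offsets. Those two constants equal the kernel's LIST_POISON1 and LIST_POISON2 values, so the sequence resembles an inlined list_del() working on a list node whose prev pointer was already corrupted. That reading is an inference from the dump and should be confirmed against the disassembly.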

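    Locating the code by disassembly

    objdump -dS vmlinux > /tmp/kernel.s

    Search the listing for the instruction word <916a0000> at the faulting address (c0088b8c) to find the C statement that raised the exception.

    The vmlinux used here must be a kernel image built with debug information, from the same version as the failing kernel.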

    2. MIPS small (embedded) system kernel exception analysis

    2.1 Exception output

    0:Oops[#1]:
    0:Cpu 0
    0:Show thread info from vcpu 0
    0: VCPU   Stack bottom      Task                 Ti at
    0:  0    c000000595057fe0    swapper              c000000595054000
    0:Thread info( c000000595054000 ):
    0:    Process swapper (pid: 1)
    0:  exec_domain ffffffffc0f299b0
    0:  flags 100000
    0:  tp_value 0
    0:  cpu 0
    0:  preempt_count 2
    0:  regs (null)
    0:STACK_END_MAGIC at va( c000000595054068 ): 57AC6E9D( =? 57AC6E9D)
    0:
    0:$ 0   :  0: 0000000000000000  0: 0000000000000000  0: 0000000000000000  0: 0000000000000001  0:
    0:$ 4   :  0: 0000000000000000  0: 0000000000000000  0: ffffffffffffffff  0: 0000000000002976  0:
    0:$ 8   :  0: 0000000000007fff  0: 000000000000000a  0: 5f73746172747570  0: 000000000000006c  0:
    0:$12   :  0: 0000000000000068  0: 000000000000004c  0: ffffffffc10bc384  0: c000000593338000  0:
    0:$16   :  0: 0000000000000000  0: ffffffffc10e42b8  0: ffffffffc10e0000  0: ffffffffc10e0000  0:
    0:$20   :  0: 0000000000000000  0: 0000000000000080  0: 0000000000000080  0: 0000000000000000  0:
    0:$24   :  0: 0000000000000006  0: ffffffffc06501a8  0:                   0:                   0:
    0:$28   :  0: c000000595054000  0: c000000595057c88  0: 0000000000000000  0: ffffffffc087bf40  0:
    0:Hi    : 0000000000000000
    0:Lo    : 0000000000000000
    0:epc   : ffffffffc087c4b4 _bcore_cleanup+0x34/0x190
    0:    Not tainted
    0:ra    : ffffffffc087bf40 _init+0x3e8/0x480
    0:Status: 5400ffe3      0:KX   0:SX   0:UX   0:KERNEL   0:EXL   0:IE   0:
    0:Cause : 00800008
    0:BadVA : 0000000000000008
    0:PrId  : 000c1102 (XLP316   A2  )
    0:<d>Modules linked in:  0:
    0:Process swapper (pid: 1, threadinfo=c000000595054000, task=c000000595053898, tls=0000000000000000)
    0:Stack :  0: ffffffffffffffff  0: ffffffffc10e0000  0: c000000595193240  0: 0000000000000001  0:
             0: ffffffffc104365c  0: ffffffffc087bf40  0: 000001fac104365c  0: ffffffffc087cb30  0:
             0: ffffffffc087c3a8  0: 0000000000000000  0: ffffffffc0f4a778  0: c000000595193000  0:
             0: c000000595193240  0: 0000000000000001  0: ffffffffc10e0000  0: c000000595193240  0:
             0: 0000000000000001  0: ffffffffc104365c  0: 0000000000000000  0: 0000000000000080  0:
             0: 0000000000000080  0: ffffffffc1043c44  0: 00008a17bc300000  0: ffffffffc10e0000  0:
             0: c00000059333dd40  0: 0000000000000000  0: 3800000000000000  0: 0000000000000000  0:
             0: 000000009333dd40  0: ffffffffc1043638  0: 000000005400ffe0  0: ffffffffbfff00fe  0:
             0: ffffffffc1070000  0: ffffffffc1063200  0: 0000000000000001  0: ffffffffc104365c  0:
             0: 0000000000000000  0: 0000000000000080  0: 0000000000000080  0: 0000000000000000  0:
             0: ...  0:
    0:Call Trace: [jiffies: 0xfffff79f]
    0:[<ffffffffc087c4b4>] _bcore_cleanup+0x34/0x190
    0:[<ffffffffc087bf40>] _init+0x3e8/0x480
    0:[<ffffffffc1043c44>] bcmxgs_init_module+0x5e8/0xc00
    0:[<ffffffffc060eebc>] do_one_initcall+0x3c/0x1a0
    0:[<ffffffffc102cc04>] kernel_init+0x220/0x2b8
    0:[<ffffffffc062c730>] kernel_thread_helper+0x10/0x20
    0:
    0:
    Code:  0: ffbf0028   0: 0000802d   0: 663142b8   0:<dc420008>  0: 0040f809   0: 00000000   0: 0202102a   0: 1040001d   0: 00000000
    0:
    0:<4>Disabling lock debugging due to kernel taint

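    2.2 Exception signals

    The relationship between exceptions and signals:

    [Figure: MIPS exception-to-signal mapping; not preserved in this copy]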

    2.3 Thread information analysis

    0:Cpu 0: both zeros are the ID of the current CPU core.

    0:Show thread info from vcpu 0
    0: VCPU   Stack bottom      Task                 Ti at
    0:  0    c000000595057fe0    swapper              c000000595054000

    VCPU: the CPU core.

    Stack bottom: the stack bottom pointer.

    Task: the thread name.

    Ti at: pointer to the thread's thread_info structure.

    0:Thread info( c000000595054000 ):
    0:    Process swapper (pid: 1)
    0:  exec_domain ffffffffc0f299b0
    0:  flags 100000
    0:  tp_value 0
    0:  cpu 0
    0:  preempt_count 2
    0:  regs (null)
    0:STACK_END_MAGIC at va( c000000595054068 ): 57AC6E9D( =? 57AC6E9D)

    Thread info( c000000595054000 ): information about the thread that raised the exception. The fields that follow are members of the thread_info structure. Among them:

    flags: the thread flag bits, listed in the table below. The value 100000 (hex) has only bit 20 set, which is TIF_FIXADE: address errors for this thread are fixed up in software. The flag describes how such errors are handled; it does not by itself mean that one has occurred.

    preempt_count: the preemption counter. When it is 0, the kernel can safely preempt this thread. A non-zero value means that preemption is disabled, for example because the thread holds a lock, so it cannot give up the CPU.

    STACK_END_MAGIC: a magic number stored at the end of the stack, which helps to determine whether the stack has been overwritten. Here the stored value 57AC6E9D matches the expected one.

    #define TIF_SIGPENDING       1   /* signal pending */
    #define TIF_NEED_RESCHED     2   /* rescheduling necessary */
    #define TIF_SYSCALL_AUDIT    3   /* syscall auditing active */
    #define TIF_SECCOMP          4   /* secure computing */
    #define TIF_NOTIFY_RESUME    5   /* callback before returning to user */
    #define TIF_RESTORE_SIGMASK  9   /* restore signal mask in do_signal() */
    #define TIF_USEDFPU          16  /* FPU was used by this task this quantum (SMP) */
    #define TIF_POLLING_NRFLAG   17  /* true if poll_idle() is polling TIF_NEED_RESCHED */
    #define TIF_MEMDIE           18
    #define TIF_FREEZE           19
    #define TIF_FIXADE           20  /* Fix address errors in software */
    #define TIF_LOGADE           21  /* Log address errors to syslog */
    #define TIF_32BIT_REGS       22  /* also implies 16/32 fprs */
    #define TIF_32BIT_ADDR       23  /* 32-bit address space (o32/n32) */
    #define TIF_FPUBOUND         24  /* thread bound to FPU-full CPU set */
    #define TIF_LOAD_WATCH       25  /* If set, load watch registers */
    #define TIF_XKPHYS_MEM_EN    26
    #define TIF_XKPHYS_IO_EN     27
    #define TIF_SYSCALL_TRACE    31  /* syscall trace active */
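The flags value can be decoded bit by bit against the TIF_* definitions above. This is a minimal Python sketch; decode_thread_flags is a hypothetical helper, and it assumes that the flags field in the log is printed in hexadecimal.

```python
# Bit numbers copied from the TIF_* definitions above.
TIF_BITS = {
    1: "TIF_SIGPENDING",
    2: "TIF_NEED_RESCHED",
    3: "TIF_SYSCALL_AUDIT",
    4: "TIF_SECCOMP",
    5: "TIF_NOTIFY_RESUME",
    9: "TIF_RESTORE_SIGMASK",
    16: "TIF_USEDFPU",
    17: "TIF_POLLING_NRFLAG",
    18: "TIF_MEMDIE",
    19: "TIF_FREEZE",
    20: "TIF_FIXADE",
    21: "TIF_LOGADE",
    22: "TIF_32BIT_REGS",
    23: "TIF_32BIT_ADDR",
    24: "TIF_FPUBOUND",
    25: "TIF_LOAD_WATCH",
    26: "TIF_XKPHYS_MEM_EN",
    27: "TIF_XKPHYS_IO_EN",
    31: "TIF_SYSCALL_TRACE",
}


def decode_thread_flags(flags):
    """Return the names of all TIF_* bits set in a thread_info flags word."""
    return [name for bit, name in sorted(TIF_BITS.items()) if flags & (1 << bit)]


# "flags 100000" from the log, read as hexadecimal (an assumption).
print(decode_thread_flags(int("100000", 16)))
```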

    2.4 Register analysis

    0:$ 0   :  0: 0000000000000000  0: 0000000000000000  0: 0000000000000000  0: 0000000000000001  0:
    0:$ 4   :  0: 0000000000000000  0: 0000000000000000  0: ffffffffffffffff  0: 0000000000002976  0:
    0:$ 8   :  0: 0000000000007fff  0: 000000000000000a  0: 5f73746172747570  0: 000000000000006c  0:
    0:$12   :  0: 0000000000000068  0: 000000000000004c  0: ffffffffc10bc384  0: c000000593338000  0:
    0:$16   :  0: 0000000000000000  0: ffffffffc10e42b8  0: ffffffffc10e0000  0: ffffffffc10e0000  0:
    0:$20   :  0: 0000000000000000  0: 0000000000000080  0: 0000000000000080  0: 0000000000000000  0:
    0:$24   :  0: 0000000000000006  0: ffffffffc06501a8  0:                   0:                   0:
    0:$28   :  0: c000000595054000  0: c000000595057c88  0: 0000000000000000  0: ffffffffc087bf40  0:
    0:Hi    : 0000000000000000
    0:Lo    : 0000000000000000
    0:epc   : ffffffffc087c4b4 _bcore_cleanup+0x34/0x190
    0:    Not tainted
    0:ra    : ffffffffc087bf40 _init+0x3e8/0x480
    0:Status: 5400ffe3      0:KX   0:SX   0:UX   0:KERNEL   0:EXL   0:IE   0:
    0:Cause : 00800008
    0:BadVA : 0000000000000008
    0:PrId  : 000c1102 (XLP316   A2  )

    MIPS has four core register sets: GP, COP0, COP1 and COP2.

    Several important COP0 registers are explained below:

    Status: the CP0 status register (cp0_status). EXL indicates that the CPU is at exception level. For details see page 193 of reference 6.7.

    Cause: 00800008. The ExcCode field (bits 6..2) is 2, which denotes a TLB exception on a load or instruction fetch.

    BadVA: the virtual address that caused the exception, for address errors, invalid TLB entries, TLB modified exceptions and so on.
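The Cause and Status values can be decoded with a few shifts and masks. The following is a minimal Python sketch using the standard MIPS64 field positions (ExcCode in Cause bits 6..2; IE, EXL, ERL, KSU, UX, SX and KX in the low byte of Status). decode_cause and decode_status are hypothetical helpers, and only a few exception codes are named.

```python
# A few ExcCode values; anything else is reported as "other".
EXC_CODES = {
    0: "Int (interrupt)",
    1: "Mod (TLB modification)",
    2: "TLBL (TLB exception, load or instruction fetch)",
    3: "TLBS (TLB exception, store)",
    4: "AdEL (address error, load or instruction fetch)",
    5: "AdES (address error, store)",
}

# Single-bit Status flags in the low byte, highest bit first.
STATUS_BITS = [(7, "KX"), (6, "SX"), (5, "UX"), (2, "ERL"), (1, "EXL"), (0, "IE")]
KSU_MODES = {0: "KERNEL", 1: "SUPERVISOR", 2: "USER"}


def decode_cause(cause):
    """Return (ExcCode, description) from a CP0 Cause value."""
    code = (cause >> 2) & 0x1F
    return code, EXC_CODES.get(code, "other")


def decode_status(status):
    """Return (set flag names, operating mode) from a CP0 Status value."""
    names = [name for bit, name in STATUS_BITS if status & (1 << bit)]
    mode = KSU_MODES.get((status >> 3) & 0x3, "RESERVED")
    return names, mode


print(decode_cause(0x00800008))
print(decode_status(0x5400FFE3))
```

For this Oops the decoded Status flags (KX, SX, UX, EXL, IE, kernel mode) agree with the names the kernel printed next to the raw value.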

    2.5 Call stack analysis

    0:Process swapper (pid: 1, threadinfo=c000000595054000, task=c000000595053898, tls=0000000000000000)
    0:Stack :  0: ffffffffffffffff  0: ffffffffc10e0000  0: c000000595193240  0: 0000000000000001  0:
             0: ffffffffc104365c  0: ffffffffc087bf40  0: 000001fac104365c  0: ffffffffc087cb30  0:
             0: ffffffffc087c3a8  0: 0000000000000000  0: ffffffffc0f4a778  0: c000000595193000  0:
             0: c000000595193240  0: 0000000000000001  0: ffffffffc10e0000  0: c000000595193240  0:
             0: 0000000000000001  0: ffffffffc104365c  0: 0000000000000000  0: 0000000000000080  0:
             0: 0000000000000080  0: ffffffffc1043c44  0: 00008a17bc300000  0: ffffffffc10e0000  0:
             0: c00000059333dd40  0: 0000000000000000  0: 3800000000000000  0: 0000000000000000  0:
             0: 000000009333dd40  0: ffffffffc1043638  0: 000000005400ffe0  0: ffffffffbfff00fe  0:
             0: ffffffffc1070000  0: ffffffffc1063200  0: 0000000000000001  0: ffffffffc104365c  0:
             0: 0000000000000000  0: 0000000000000080  0: 0000000000000080  0: 0000000000000000  0:
             0: ...  0:
    0:Call Trace: [jiffies: 0xfffff79f]
    0:[<ffffffffc087c4b4>] _bcore_cleanup+0x34/0x190
    0:[<ffffffffc087bf40>] _init+0x3e8/0x480
    0:[<ffffffffc1043c44>] bcmxgs_init_module+0x5e8/0xc00
    0:[<ffffffffc060eebc>] do_one_initcall+0x3c/0x1a0
    0:[<ffffffffc102cc04>] kernel_init+0x220/0x2b8
    0:[<ffffffffc062c730>] kernel_thread_helper+0x10/0x20
    0:
    0:
    Code:  0: ffbf0028   0: 0000802d   0: 663142b8   0:<dc420008>  0: 0040f809   0: 00000000   0: 0202102a   0: 1040001d   0: 00000000
    0:

    Call Trace: the call stack of the thread that raised the exception.

    Stack: the stack contents of the thread that raised the exception.

    Code: the instruction words around the exception. 0:<dc420008> is the instruction word at epc, whose location is (epc : ffffffffc087c4b4 _bcore_cleanup+0x34/0x190). The exact source line has to be found by disassembly.

    The disassembly procedure is the same as for PowerPC.
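The instruction word at epc can also be decoded by hand. This is a minimal Python sketch covering only four MIPS I-type load/store opcodes (lw 35, sw 43, ld 55, sd 63); decode_mips_load_store is a hypothetical helper, not a replacement for objdump.

```python
# Only the handful of I-type load/store opcodes needed here.
OPCODES = {35: "lw", 43: "sw", 55: "ld", 63: "sd"}


def decode_mips_load_store(word):
    """Decode a 32-bit load/store word; return (text, base_reg, offset)."""
    opcode = word >> 26
    if opcode not in OPCODES:
        return None
    base = (word >> 21) & 0x1F   # base register
    rt = (word >> 16) & 0x1F     # register loaded or stored
    imm = word & 0xFFFF          # signed 16-bit offset
    if imm & 0x8000:
        imm -= 0x10000
    return f"{OPCODES[opcode]} ${rt},{imm}(${base})", base, imm


text, base, offset = decode_mips_load_store(0xDC420008)
regs = {2: 0x0}  # $2 as shown in the register dump of this Oops
print(text, hex(regs[base] + offset))
```

The word dc420008 decodes to ld $2,8($2). With $2 equal to 0 in the register dump, the effective address is 0x8, which matches BadVA.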

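    Analysis of the code shows that the exception was caused by an access to the invalid address BadVA : 0000000000000008. Looking at _bcore_cleanup, the bde pointer has not been initialized at this point and is NULL, so the address of bde->num_devices is exactly 0000000000000008 (NULL plus the member's offset of 8), which causes the exception.

    The offending code is:

    _bcore_cleanup(void)

    {

        for (unit = 0; unit < bde->num_devices(BDE_ALL_DEVICES); unit++)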
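The relationship between a NULL structure pointer and a small BadVA can be illustrated with ctypes. This is a minimal Python sketch: the real layout of the structure that bde points to is not shown in the article, so HypotheticalBde and its first_member field are assumptions chosen only so that num_devices lands at offset 8.

```python
import ctypes


class HypotheticalBde(ctypes.Structure):
    # Layout is an assumption for illustration: one 8-byte member
    # followed by the num_devices function pointer (8 bytes on MIPS64).
    _fields_ = [
        ("first_member", ctypes.c_uint64),
        ("num_devices", ctypes.c_uint64),
    ]


bde = 0  # NULL pointer
fault_address = bde + HypotheticalBde.num_devices.offset
print(hex(fault_address))
```

In general, a fault address that is a small multiple of the pointer size points to a member access through a NULL structure pointer rather than to a direct dereference of NULL.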

    6. References

    6.1   http://en.wikipedia.org/wiki/Unix_signal

    6.2   http://www.powerlinuxchina.net/club/viewthread.php?tid=981

    6.3   PowerPC e500 Application Binary Interface User's Guide

    6.4   PowerPC e500 Core Family Reference Manual

    6.5   MPC8572E PowerQUICC III Integrated Host Processor Family Reference Manual

    6.6   SYSTEM V APPLICATION BINARY INTERFACE – MIPS RISC Processor Supplement

    6.7   XLP 300-/300-Lite-Series-Processor Programmer's Register Reference Guide

    6.8   http://blog.chinaunix.net/uid-16459552-id-3459993.html

    6.9   http://blog.chinaunix.net/uid-16459552-id-3257539.html

    6.10  http://www.linuxspy.info/2249/tainted-kernel/


    --EOF--

    Reposted from: https://www.cnblogs.com/wahaha02/p/5363793.html
