debugging - Watch a variable (memory address) change in the Linux kernel, and print a stack trace when it changes?

Tags: debugging linux-kernel

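I'd like to somehow "watch" a variable (or rather, a memory address) in the Linux kernel (a kernel module/driver, to be exact), and find out what changed it - basically, print out a stack trace whenever the variable changes.

For instance, in the kernel module testjiffy-hr.c listed at the end of this answer, I would like to print out a stack trace each time the runcount variable changes; hopefully that stack trace would then mention testjiffy_timer_function, which is indeed the function that changes that variable.

Now, I know I can use kgdb to connect to a debug Linux kernel running in a virtual machine, and even set breakpoints (so hopefully also watchpoints) - but the problem is that I actually want to debug an ALSA driver, in particular the playback dma_area buffer (where I get some unexpected data), which is highly timing-sensitive; and running a debug kernel would in itself mess up the timings (let alone running it in a virtual machine).

An even bigger problem here is that the playback dma_area pointer exists only during a playback operation (in other words, between the _start and _stop handlers) - so I'd have to record the dma_area address at each _start callback, and then somehow "schedule" it for "watching" during the playback operation.

So I was hoping that there is a way to do something like this directly in the driver code - as in, add some code in the _start callback that records the dma_area pointer, and use it as an argument to a command that initiates a "watch" for changes, with the stack trace being printed from a corresponding callback function. (I'm aware this too will influence timings, but I was hoping it would be "light" enough not to influence the "live" driver operation too much.)

So my question is: does such a technique for debugging in the Linux kernel exist?

If not: is it possible to set up a hardware (or software) interrupt that would react to a change at a particular memory address? Would I then be able to set up an interrupt handler that could print out a stack trace? (Although I think the whole context changes when IRQ handlers run, so the stack trace obtained there might be the wrong one.)

If not: are there any other techniques that would allow me to print a stack trace of the process that changes a value stored at a given memory location in the kernel (hopefully in a live, non-debug kernel)?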

Best answer

Many thanks for the responses by @CosminRatiu and Eugene; thanks to those, I found:

  • debugging - Linux kernel hardware break points - Stack Overflow
  • Hardware Breakpoint (or watchpoint) - The Linux Kernel Archives

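    ... with which I could develop the example I'm posting here: the testhrarr.c kernel module/driver and its Makefile (below). It demonstrates that hardware watchpoint tracing can be achieved in two ways: either using the perf program, which can probe the driver unchanged; or by adding some hardware breakpoint code to the driver (in the example, enclosed by the HWDEBUG_STACK define).

    In essence, debugging the contents of standard atomic variable types like ints (such as the runcount variable) is straightforward, as long as they are defined as global variables in the kernel module, so that they end up showing as global kernel symbols. Because of that, the code below adds testhrarr_ as a prefix to the variables (to avoid naming conflicts). However, debugging the contents of arrays can be a bit trickier, due to the need for dereferencing - so that is what this post demonstrates: debugging the first byte of the testhrarr_arr array. It was done on: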
    $ echo `cat /etc/lsb-release` 
    DISTRIB_ID=Ubuntu DISTRIB_RELEASE=11.04 DISTRIB_CODENAME=natty DISTRIB_DESCRIPTION="Ubuntu 11.04"
    $ uname -a
    Linux mypc 2.6.38-16-generic #67-Ubuntu SMP Thu Sep 6 18:00:43 UTC 2012 i686 i686 i386 GNU/Linux
    $ cat /proc/cpuinfo | grep "model name"
    model name  : Intel(R) Atom(TM) CPU N450   @ 1.66GHz
    model name  : Intel(R) Atom(TM) CPU N450   @ 1.66GHz
    
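    The testhrarr module basically allocates memory for a small array at module init, sets up a timer function, and exposes a /proc/testhrarr_proc file (using the newer proc_create interface). Then, trying to read from the /proc/testhrarr_proc file (say, using cat) will trigger the timer function, which will modify the testhrarr_arr array values and dump messages to /var/log/syslog. We expect testhrarr_arr[0] to be written three times during the operation: once in testhrarr_startup, and twice in testhrarr_timer_function (due to wrapping).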
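The expected count of three writes can be checked with a small user-space sketch that mimics the module's logic without any kernel APIs. The function name `simulate_testhrarr` and its write counter are hypothetical and introduced only for illustration; `ARRSIZE` and `MAXRUNS` mirror the defines in testhrarr.c.

```c
#include <string.h>

#define ARRSIZE 5
#define MAXRUNS (2 * ARRSIZE)

/* Hypothetical user-space model of the module: returns how many times
 * arr[0] is written during one run, and leaves the final values in arr. */
static int simulate_testhrarr(int arr[ARRSIZE])
{
    int writes_to_first = 0;
    int runcount;

    /* kcalloc hands back a zeroed array at module init */
    memset(arr, 0, ARRSIZE * sizeof(int));

    /* testhrarr_startup: resets only the first element */
    arr[0] = 0;
    writes_to_first++;

    /* testhrarr_timer_function: one call per timer period */
    for (runcount = 0; runcount < MAXRUNS; runcount++) {
        int idx = runcount % ARRSIZE;
        arr[idx] += runcount;
        if (idx == 0)
            writes_to_first++; /* happens for runcount 0 and runcount 5 */
    }
    return writes_to_first;
}
```

Calling `simulate_testhrarr` on a five-element array returns 3 and leaves 5, 7, 9, 11, 13 in the array, which agrees with the final array line in the syslog output. Note that a hardware write watchpoint fires on every write access, even when the stored value stays the same (as with adding 0 at runcount 0), which is why the model counts writes rather than value changes.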

    using perf
    After building the module with make, you can load it with:
    sudo insmod ./testhrarr.ko
    

    At that point, /var/log/syslog will contain:
    kernel: [40277.199913] Init testhrarr: 0 ; HZ: 250 ; 1/HZ (ms): 4 ; hrres: 0.000000001
    kernel: [40277.199930]  Addresses: _runcount 0xf84be22c ; _arr 0xf84be2a0 ; _arr[0] 0xed182a80 (0xed182a80) ; _timer_function 0xf84bc1c3 ; my_hrtimer 0xf84be260; my_hrt.f 0xf84be27c
    kernel: [40277.220329] HW Breakpoint for testhrarr_arr write installed (0xf84be2a0)
    

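    Note that just passing testhrarr_arr as the symbol for the hardware watchpoint resolves to the address of that pointer variable itself (0xf84be2a0), not to the address of the first element of the array (0xed182a80)! Because of that, the hardware breakpoint will not trigger - so the behavior is as if the hardware breakpoint code wasn't present at all (which can be achieved by undefining HWDEBUG_STACK)!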
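The difference between the two addresses can be reproduced in plain user-space C. This is only a sketch: `demo_arr` and the helper functions are hypothetical stand-ins for the module's `testhrarr_arr`, and `calloc` stands in for `kcalloc`.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the module's global: a pointer variable
 * whose own storage is separate from the heap block it points to. */
static int *demo_arr;

/* Address of the pointer variable itself (what looking up the
 * variable's name as a symbol yields). */
static uintptr_t addr_of_pointer_variable(void)
{
    return (uintptr_t)&demo_arr;
}

/* Address of the first element (what must be watched in order to
 * catch writes to demo_arr[0]). */
static uintptr_t addr_of_first_element(void)
{
    return (uintptr_t)&demo_arr[0];
}

/* Allocates the zeroed array; returns 0 on success, -1 on failure. */
static int demo_alloc(size_t n)
{
    demo_arr = calloc(n, sizeof(int));
    return demo_arr ? 0 : -1;
}
```

After `demo_alloc(5)`, the two helpers return different values: the first lies in the program's static data, the second in the heap. A watchpoint placed on the first address only sees writes that reassign the pointer, not writes to the array elements.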

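    So, even without a hardware breakpoint set through the kernel module code, we can still use perf to observe writes to a memory address. In perf, we specify both the address we want to watch (here the address of the first element of testhrarr_arr, 0xed182a80) and the process which should be run: here we run bash, so we can execute cat /proc/testhrarr_proc, which triggers the kernel module timer, followed by sleep 0.5, which allows the timer to complete. The -a parameter is also needed, otherwise some events may be missed: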
    $ sudo perf record -a --call-graph --event=mem:0xed182a80:w bash -c 'cat /proc/testhrarr_proc ; sleep 0.5'
    testhrarr proc: startup
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.485 MB perf.data (~21172 samples) ]
    

    At this point, /var/log/syslog will also contain something like:
    
    [40822.114964]  testhrarr_timer_function: testhrarr_runcount 0 
    [40822.114980]  testhrarr jiffies 10130528 ; ret: 1 ; ktnsec: 40822114975062
    [40822.118956]  testhrarr_timer_function: testhrarr_runcount 1 
    [40822.118977]  testhrarr jiffies 10130529 ; ret: 1 ; ktnsec: 40822118973195
    [40822.122940]  testhrarr_timer_function: testhrarr_runcount 2 
    [40822.122956]  testhrarr jiffies 10130530 ; ret: 1 ; ktnsec: 40822122951143
    [40822.126962]  testhrarr_timer_function: testhrarr_runcount 3 
    [40822.126978]  testhrarr jiffies 10130531 ; ret: 1 ; ktnsec: 40822126973583
    [40822.130941]  testhrarr_timer_function: testhrarr_runcount 4 
    [40822.130961]  testhrarr jiffies 10130532 ; ret: 1 ; ktnsec: 40822130955167
    [40822.134940]  testhrarr_timer_function: testhrarr_runcount 5 
    [40822.134962]  testhrarr jiffies 10130533 ; ret: 1 ; ktnsec: 40822134958888
    [40822.138936]  testhrarr_timer_function: testhrarr_runcount 6 
    [40822.138958]  testhrarr jiffies 10130534 ; ret: 1 ; ktnsec: 40822138955693
    [40822.142940]  testhrarr_timer_function: testhrarr_runcount 7 
    [40822.142962]  testhrarr jiffies 10130535 ; ret: 1 ; ktnsec: 40822142959345
    [40822.146936]  testhrarr_timer_function: testhrarr_runcount 8 
    [40822.146957]  testhrarr jiffies 10130536 ; ret: 1 ; ktnsec: 40822146954479
    [40822.150949]  testhrarr_timer_function: testhrarr_runcount 9 
    [40822.150970]  testhrarr jiffies 10130537 ; ret: 1 ; ktnsec: 40822150963438
    [40822.154974]  testhrarr_timer_function: testhrarr_runcount 10 
    [40822.154988] testhrarr [ 5, 7, 9, 11, 13, ]
    

    To read the capture of perf (a file called perf.data) we can use:

    $ sudo perf report --call-graph flat --stdio
    No kallsyms or vmlinux with build-id 5031df4d8668bcc45a7bdb4023909c6f8e2d3d34 was found
    [testhrarr] with build id 5031df4d8668bcc45a7bdb4023909c6f8e2d3d34 not found, continuing without symbols
    Failed to open /bin/cat, continuing without symbols
    Failed to open /usr/lib/libpixman-1.so.0.20.2, continuing without symbols
    Failed to open /usr/lib/xorg/modules/drivers/intel_drv.so, continuing without symbols
    Failed to open /usr/bin/Xorg, continuing without symbols
    # Events: 5  unknown
    #
    # Overhead  Command  Shared Object                                Symbol
    # ........  .......  .............  ....................................
    #
        87.50%     Xorg  [testhrarr]    [k] testhrarr_timer_function
                87.50%
                    testhrarr_timer_function
                    __run_hrtimer
                    hrtimer_interrupt
                    smp_apic_timer_interrupt
                    apic_timer_interrupt
                    0x30185d
                    0x2ed701
                    0x2ed8cc
                    0x2edba0
                    0x9d0386
                    0x8126fc8
                    0x81217a1
                    0x811bdd3
                    0x8070aa7
                    0x806281c
                    __libc_start_main
                    0x8062411
    
         6.25%      cat  [testhrarr]    [k] testhrarr_timer_function
                 6.25%
                    testhrarr_timer_function
                    testhrarr_proc_show
                    seq_read
                    proc_reg_read
                    vfs_read
                    sys_read
                    syscall_call
                    0xaa2416
                    0x8049f4d
                    __libc_start_main
                    0x8049081
    
         3.12%  swapper  [testhrarr]    [k] testhrarr_timer_function
                 3.12%
                    testhrarr_timer_function
                    __run_hrtimer
                    hrtimer_interrupt
                    smp_apic_timer_interrupt
                    apic_timer_interrupt
                    cpuidle_idle_call
                    cpu_idle
                    start_secondary
    
         3.12%      cat  [testhrarr]    [k] 0x356   
                 3.12%
                    0xf84bc356
                    0xf84bc3a7
                    seq_read
                    proc_reg_read
                    vfs_read
                    sys_read
                    syscall_call
                    0xaa2416
                    0x8049f4d
                    __libc_start_main
                    0x8049081
    
    
    
    #
    # (For a higher level overview, try: perf report --sort comm,dso)
    #
    

    So, since we're building the kernel module with debugging on (-g in the Makefile), it is not a problem for perf to find this module's symbols, even if the live kernel is not a debug kernel. So it correctly interprets testhrarr_timer_function as the setter most of the time, although it doesn't report testhrarr_startup (but it does report testhrarr_proc_show which calls it). There are also references to 0xf84bc3a7 and 0xf84bc356 which it couldn't resolve; however, note that the module is loaded at 0xf84bc000:

    $ sudo cat /proc/modules | grep testhr
    testhrarr 13433 0 - Live 0xf84bc000
    

    ... and that entry also starts with ...[k] 0x356; and if we look at the objdump of the kernel module:
    $ objdump -S testhrarr.ko | less
    ...
    00000323 <testhrarr_startup>:
    
    static void testhrarr_startup(void)
    {
    ...
        testhrarr_arr[0] = 0; //just the first element
     34b:   a1 80 00 00 00          mov    0x80,%eax
     350:   c7 00 00 00 00 00       movl   $0x0,(%eax)
        hrtimer_start(&my_hrtimer, ktime_period_ns, HRTIMER_MODE_REL);
     356:   c7 04 24 01 00 00 00    movl   $0x1,(%esp)                     **********
     35d:   8b 15 1c 00 00 00       mov    0x1c,%edx
    ...
    00000375 <testhrarr_proc_show>:
    
    
    static int testhrarr_proc_show(struct seq_file *m, void *v) {
    ...
        seq_printf(m, "testhrarr proc: startup\n");
     38f:   c7 44 24 04 79 00 00    movl   $0x79,0x4(%esp)
     396:   00 
     397:   8b 45 fc                mov    -0x4(%ebp),%eax
     39a:   89 04 24                mov    %eax,(%esp)
     39d:   e8 fc ff ff ff          call   39e <testhrarr_proc_show+0x29>
        testhrarr_startup();
     3a2:   e8 7c ff ff ff          call   323 <testhrarr_startup>
     3a7:   eb 1c                   jmp    3c5 <testhrarr_proc_show+0x50>   **********
      } else {
        seq_printf(m, "testhrarr proc: (is running, %d)\n", testhrarr_runcount);
     3a9:   a1 0c 00 00 00          mov    0xc,%eax
    ...
    

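    ... so 0xf84bc356 (offset 0x356) is the instruction immediately after the write to testhrarr_arr[0] in testhrarr_startup, where the setup of the hrtimer_start call begins (on x86, a data watchpoint fires after the writing instruction has completed, so the reported address is that of the next instruction); and 0xf84bc3a7 (offset 0x3a7) is the return address in the calling testhrarr_proc_show function; which thankfully makes sense. (Note that with different versions of the driver, I've seen _startup show up by name while the timer_function was expressed only as raw addresses; not sure what this is due to.)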
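The address arithmetic used here can be written as a small helper. This is a user-space sketch with hypothetical function names; the load address and size are the values from the /proc/modules line of this particular session.

```c
#include <stdint.h>

/* Offset of a runtime address inside a loaded module, given the module's
 * load address (the last column of its /proc/modules line). The result
 * can be matched against the offsets printed by objdump. */
static uint64_t module_offset(uint64_t runtime_addr, uint64_t load_addr)
{
    return runtime_addr - load_addr;
}

/* Returns 1 if runtime_addr lies within [load_addr, load_addr + size),
 * where size is the module size reported in /proc/modules; 0 otherwise. */
static int address_in_module(uint64_t runtime_addr, uint64_t load_addr,
                             uint64_t size)
{
    return runtime_addr >= load_addr && runtime_addr - load_addr < size;
}
```

With the load address 0xf84bc000, the unresolved addresses 0xf84bc356 and 0xf84bc3a7 map to offsets 0x356 and 0x3a7, which are the two lines marked with asterisks in the objdump listing.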

    One problem with perf, though, is that it gives me a statistical "Overhead" of these functions occurring (not sure what that refers to - probably time spent between entry and exit of a function?) - but what I really want is a sequential log of stack traces. Not sure if perf can be set up for that - but it can definitely be done with kernel module code for hardware breakpoints.

    using kernel module HW breakpoint

    The code inside the HWDEBUG_STACK conditional implements the HW breakpoint setup and handling. As noted, the default symbol in ksym_name (if unspecified) is testhrarr_arr, which doesn't trigger the hardware breakpoint at all. The symbol can be specified on the command line during insmod through the ksym parameter; here we can note that:

    $ sudo rmmod testhrarr    # remove module if still loaded
    $ sudo insmod ./testhrarr.ko ksym=testhrarr_arr[0]
    

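    ... results in HW Breakpoint for testhrarr_arr[0] write installed (0x (null)) in /var/log/syslog - which means that we cannot use symbols with bracket notation for array access; thankfully, the null pointer here simply means that the hardware breakpoint will again not trigger; it doesn't crash the OS completely :)

    However, there is a global variable made to refer to the first element of the testhrarr_arr array, called testhrarr_arr_first - note how this global variable is handled specially in the code: it needs to be dereferenced so that the correct address is obtained. So we do: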
    $ sudo rmmod testhrarr    # remove module if still loaded
    $ sudo insmod ./testhrarr.ko ksym=testhrarr_arr_first
    

    ... and the syslog reports:
    kernel: [43910.509726] Init testhrarr: 0 ; HZ: 250 ; 1/HZ (ms): 4 ; hrres: 0.000000001
    kernel: [43910.509765]  Addresses: _runcount 0xf84be22c ; _arr 0xf84be2a0 ; _arr[0] 0xedf6c5c0 (0xedf6c5c0) ; _timer_function 0xf84bc1c3 ; my_hrtimer 0xf84be260; my_hrt.f 0xf84be27c
    kernel: [43910.538535] HW Breakpoint for testhrarr_arr_first write installed (0xedf6c5c0)
    

    ... and we can see that the hardware breakpoint is set at 0xedf6c5c0, which is the address of testhrarr_arr[0]. Now, if we trigger the driver through the /proc file:
    $ cat /proc/testhrarr_proc 
    testhrarr proc: startup
    

    ... we obtain in syslog:
    kernel: [44069.735695] testhrarr_arr_first value is changed
    [44069.735711] Pid: 29320, comm: cat Not tainted 2.6.38-16-generic #67-Ubuntu
    [44069.735719] Call Trace:
    [44069.735737]  [] ? sample_hbp_handler+0x2d/0x3b [testhrarr]
    [44069.735755]  [] ? __perf_event_overflow+0x90/0x240
    [44069.735768]  [] ? proc_alloc_inode+0x23/0x90
    [44069.735778]  [] ? proc_alloc_inode+0x23/0x90
    [44069.735790]  [] ? perf_swevent_event+0x136/0x140
    [44069.735801]  [] ? perf_bp_event+0x70/0x80
    [44069.735812]  [] ? prep_new_page+0x110/0x1a0
    [44069.735824]  [] ? get_page_from_freelist+0x12e/0x320
    [44069.735836]  [] ? seq_open+0x3d/0xa0
    [44069.735848]  [] ? hw_breakpoint_handler.clone.0+0x102/0x130
    [44069.735861]  [] ? hw_breakpoint_exceptions_notify+0x22/0x30
    [44069.735872]  [] ? notifier_call_chain+0x45/0x60
    [44069.735883]  [] ? atomic_notifier_call_chain+0x22/0x30
    [44069.735894]  [] ? notify_die+0x2d/0x30
    [44069.735904]  [] ? do_debug+0x88/0x180
    [44069.735915]  [] ? debug_stack_correct+0x30/0x38
    [44069.735928]  [] ? testhrarr_startup+0x33/0x52 [testhrarr]
    [44069.735940]  [] ? testhrarr_proc_show+0x32/0x57 [testhrarr]
    [44069.735952]  [] ? seq_read+0x145/0x390
    [44069.735963]  [] ? seq_read+0x0/0x390
    [44069.735973]  [] ? proc_reg_read+0x64/0xa0
    [44069.735985]  [] ? vfs_read+0x9f/0x160
    [44069.735995]  [] ? proc_reg_read+0x0/0xa0
    [44069.736003]  [] ? sys_read+0x42/0x70
    [44069.736013]  [] ? syscall_call+0x7/0xb
    [44069.736019] Dump stack from sample_hbp_handler
    [44069.740132]  testhrarr_timer_function: testhrarr_runcount 0 
    [44069.740146]  testhrarr jiffies 10942435 ; ret: 1 ; ktnsec: 44069740142485
    [44069.740159] testhrarr_arr_first value is changed
    [44069.740169] Pid: 4302, comm: gnome-terminal Not tainted 2.6.38-16-generic #67-Ubuntu
    [44069.740176] Call Trace:
    [44069.740195]  [] ? sample_hbp_handler+0x2d/0x3b [testhrarr]
    [44069.740213]  [] ? __perf_event_overflow+0x90/0x240
    [44069.740227]  [] ? perf_swevent_event+0x136/0x140
    [44069.740239]  [] ? perf_bp_event+0x70/0x80
    [44069.740253]  [] ? sched_clock_local+0xd3/0x1c0
    [44069.740267]  [] ? format_decode+0x323/0x380
    [44069.740280]  [] ? hw_breakpoint_handler.clone.0+0x102/0x130
    [44069.740292]  [] ? hw_breakpoint_exceptions_notify+0x22/0x30
    [44069.740302]  [] ? notifier_call_chain+0x45/0x60
    [44069.740313]  [] ? atomic_notifier_call_chain+0x22/0x30
    [44069.740324]  [] ? notify_die+0x2d/0x30
    [44069.740335]  [] ? do_debug+0x88/0x180
    [44069.740345]  [] ? debug_stack_correct+0x30/0x38
    [44069.740364]  [] ? init_intel_cacheinfo+0x103/0x394
    [44069.740379]  [] ? testhrarr_timer_function+0xed/0x160 [testhrarr]
    [44069.740391]  [] ? __run_hrtimer+0x6f/0x190
    [44069.740404]  [] ? testhrarr_timer_function+0x0/0x160 [testhrarr]
    [44069.740416]  [] ? hrtimer_interrupt+0x108/0x240
    [44069.740430]  [] ? smp_apic_timer_interrupt+0x56/0x8a
    [44069.740441]  [] ? apic_timer_interrupt+0x31/0x38
    [44069.740453]  [] ? _raw_spin_unlock_irqrestore+0x15/0x20
    [44069.740465]  [] ? try_to_del_timer_sync+0x67/0xb0
    [44069.740476]  [] ? del_timer_sync+0x29/0x50
    [44069.740486]  [] ? flush_delayed_work+0x13/0x40
    [44069.740500]  [] ? tty_flush_to_ldisc+0x12/0x20
    [44069.740510]  [] ? n_tty_poll+0x4f/0x190
    [44069.740523]  [] ? tty_poll+0x6d/0x90
    [44069.740531]  [] ? n_tty_poll+0x0/0x190
    [44069.740542]  [] ? do_poll.clone.3+0xd0/0x210
    [44069.740553]  [] ? do_sys_poll+0x134/0x1e0
    [44069.740563]  [] ? __pollwait+0x0/0xd0
    [44069.740572]  [] ? pollwake+0x0/0x60
    ...
    [44069.740742]  [] ? pollwake+0x0/0x60
    [44069.740757]  [] ? rw_verify_area+0x6c/0x130
    [44069.740770]  [] ? ktime_get_ts+0xf8/0x120
    [44069.740781]  [] ? poll_select_set_timeout+0x64/0x70
    [44069.740793]  [] ? sys_poll+0x5a/0xd0
    [44069.740804]  [] ? syscall_call+0x7/0xb
    [44069.740815]  [] ? init_intel_cacheinfo+0x23/0x394
    [44069.740822] Dump stack from sample_hbp_handler
    [44069.744130]  testhrarr_timer_function: testhrarr_runcount 1 
    [44069.744143]  testhrarr jiffies 10942436 ; ret: 1 ; ktnsec: 44069744140055
    [44069.748132]  testhrarr_timer_function: testhrarr_runcount 2 
    [44069.748145]  testhrarr jiffies 10942437 ; ret: 1 ; ktnsec: 44069748141271
    [44069.752131]  testhrarr_timer_function: testhrarr_runcount 3 
    [44069.752145]  testhrarr jiffies 10942438 ; ret: 1 ; ktnsec: 44069752141164
    [44069.756131]  testhrarr_timer_function: testhrarr_runcount 4 
    [44069.756141]  testhrarr jiffies 10942439 ; ret: 1 ; ktnsec: 44069756138318
    [44069.760130]  testhrarr_timer_function: testhrarr_runcount 5 
    [44069.760141]  testhrarr jiffies 10942440 ; ret: 1 ; ktnsec: 44069760138469
    [44069.760154] testhrarr_arr_first value is changed
    [44069.760164] Pid: 4302, comm: gnome-terminal Not tainted 2.6.38-16-generic #67-Ubuntu
    [44069.760170] Call Trace:
    [44069.760187]  [] ? sample_hbp_handler+0x2d/0x3b [testhrarr]
    [44069.760202]  [] ? __perf_event_overflow+0x90/0x240
    [44069.760213]  [] ? perf_swevent_event+0x136/0x140
    [44069.760224]  [] ? perf_bp_event+0x70/0x80
    [44069.760235]  [] ? sched_clock_local+0xd3/0x1c0
    [44069.760247]  [] ? format_decode+0x323/0x380
    [44069.760258]  [] ? hw_breakpoint_handler.clone.0+0x102/0x130
    [44069.760269]  [] ? hw_breakpoint_exceptions_notify+0x22/0x30
    [44069.760279]  [] ? notifier_call_chain+0x45/0x60
    [44069.760289]  [] ? atomic_notifier_call_chain+0x22/0x30
    [44069.760299]  [] ? notify_die+0x2d/0x30
    [44069.760308]  [] ? do_debug+0x88/0x180
    [44069.760318]  [] ? debug_stack_correct+0x30/0x38
    [44069.760334]  [] ? init_intel_cacheinfo+0x103/0x394
    [44069.760345]  [] ? testhrarr_timer_function+0xed/0x160 [testhrarr]
    [44069.760356]  [] ? __run_hrtimer+0x6f/0x190
    [44069.760366]  [] ? send_to_group.clone.1+0xf8/0x150
    [44069.760376]  [] ? testhrarr_timer_function+0x0/0x160 [testhrarr]
    [44069.760387]  [] ? hrtimer_interrupt+0x108/0x240
    [44069.760396]  [] ? fsnotify+0x1a5/0x290
    [44069.760407]  [] ? smp_apic_timer_interrupt+0x56/0x8a
    [44069.760416]  [] ? apic_timer_interrupt+0x31/0x38
    [44069.760428]  [] ? mem_cgroup_resize_limit+0x108/0x1c0
    [44069.760437]  [] ? fput+0x0/0x30
    [44069.760446]  [] ? sys_write+0x67/0x70
    [44069.760455]  [] ? syscall_call+0x7/0xb
    [44069.760464]  [] ? init_intel_cacheinfo+0x23/0x394
    [44069.760470] Dump stack from sample_hbp_handler
    [44069.764134]  testhrarr_timer_function: testhrarr_runcount 6 
    [44069.764147]  testhrarr jiffies 10942441 ; ret: 1 ; ktnsec: 44069764144141
    [44069.768133]  testhrarr_timer_function: testhrarr_runcount 7 
    [44069.768146]  testhrarr jiffies 10942442 ; ret: 1 ; ktnsec: 44069768142976
    [44069.772134]  testhrarr_timer_function: testhrarr_runcount 8 
    [44069.772148]  testhrarr jiffies 10942443 ; ret: 1 ; ktnsec: 44069772144121
    [44069.776132]  testhrarr_timer_function: testhrarr_runcount 9 
    [44069.776145]  testhrarr jiffies 10942444 ; ret: 1 ; ktnsec: 44069776141971
    [44069.780133]  testhrarr_timer_function: testhrarr_runcount 10 
    [44069.780141] testhrarr [ 5, 7, 9, 11, 13, ]

    ... we get a stack trace exactly three times - once during testhrarr_startup, and twice in testhrarr_timer_function: once for runcount==0 and once for runcount==5, as expected.

    Well, hope this helps someone,
    Cheers!


    Makefile

    CONFIG_MODULE_FORCE_UNLOAD=y
    
    # debug build:
    # "CFLAGS was changed ... Fix it to use EXTRA_CFLAGS."
    override EXTRA_CFLAGS+=-g -O0 
    
    obj-m += testhrarr.o
    #testhrarr-objs  := testhrarr.o
    
    all:
        @echo EXTRA_CFLAGS = $(EXTRA_CFLAGS)
        make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules
    
    clean:
        make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean
    

    testhrarr.c
    /*
     * [http://www.tldp.org/LDP/lkmpg/2.6/html/lkmpg.html#AEN189 The Linux Kernel Module Programming Guide]
     * https://stackoverflow.com/questions/16920238/reliability-of-linux-kernel-add-timer-at-resolution-of-one-jiffy/17055867#17055867
     * https://stackoverflow.com/questions/8516021/proc-create-example-for-kernel-module/18924359#18924359
     * http://lxr.free-electrons.com/source/samples/hw_breakpoint/data_breakpoint.c
     */
    
    
    #include <linux/module.h>   /* Needed by all modules */
    #include <linux/kernel.h>   /* Needed for KERN_INFO */
    #include <linux/init.h>     /* Needed for the macros */
    #include <linux/jiffies.h>
    #include <linux/time.h>
    #include <linux/proc_fs.h>  /* /proc entry */
    #include <linux/seq_file.h> /* /proc entry */
    #define ARRSIZE 5
    #define MAXRUNS 2*ARRSIZE
    
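    #include <linux/hrtimer.h>
    #include <linux/slab.h>     /* kcalloc, kfree */
    #include <linux/kallsyms.h> /* kallsyms_lookup_name */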
    
    #define HWDEBUG_STACK 1
    
    #if (HWDEBUG_STACK == 1)
    #include <linux/perf_event.h>
    #include <linux/hw_breakpoint.h>
    
    struct perf_event * __percpu *sample_hbp;
    static char ksym_name[KSYM_NAME_LEN] = "testhrarr_arr";
    module_param_string(ksym, ksym_name, KSYM_NAME_LEN, S_IRUGO);
    MODULE_PARM_DESC(ksym, "Kernel symbol to monitor; this module will report any"
          " write operations on the kernel symbol");
    #endif
    
    static volatile int testhrarr_runcount = 0;
    static volatile int testhrarr_isRunning = 0;
    
    static unsigned long period_ms;
    static unsigned long period_ns;
    static ktime_t ktime_period_ns;
    static struct hrtimer my_hrtimer;
    
    static int* testhrarr_arr;
    static int* testhrarr_arr_first;
    
    static enum hrtimer_restart testhrarr_timer_function(struct hrtimer *timer)
    {
      unsigned long tjnow;
      ktime_t kt_now;
      int ret_overrun;
    
      printk(KERN_INFO
        " %s: testhrarr_runcount %d \n",
        __func__, testhrarr_runcount);
    
      if (testhrarr_runcount < MAXRUNS) {
        tjnow = jiffies;
        kt_now = hrtimer_cb_get_time(&my_hrtimer);
        ret_overrun = hrtimer_forward(&my_hrtimer, kt_now, ktime_period_ns);
        printk(KERN_INFO
          " testhrarr jiffies %lu ; ret: %d ; ktnsec: %lld\n",
          tjnow, ret_overrun, ktime_to_ns(kt_now));
        testhrarr_arr[(testhrarr_runcount % ARRSIZE)] += testhrarr_runcount;
        testhrarr_runcount++;
        return HRTIMER_RESTART;
      }
      else {
        int i;
        testhrarr_isRunning = 0;
        // do not use KERN_DEBUG etc, if printk buffering until newline is desired!
        printk("testhrarr_arr [ ");
        for(i=0; i<ARRSIZE; i++) {
          printk("%d, ", testhrarr_arr[i]);
        }
        printk("]\n");
        return HRTIMER_NORESTART;
      }
    }
    
    static void testhrarr_startup(void)
    {
      if (testhrarr_isRunning == 0) {
        testhrarr_isRunning = 1;
        testhrarr_runcount = 0;
        testhrarr_arr[0] = 0; //just the first element
        hrtimer_start(&my_hrtimer, ktime_period_ns, HRTIMER_MODE_REL);
      }
    }
    
    
    static int testhrarr_proc_show(struct seq_file *m, void *v) {
      if (testhrarr_isRunning == 0) {
        seq_printf(m, "testhrarr proc: startup\n");
        testhrarr_startup();
      } else {
        seq_printf(m, "testhrarr proc: (is running, %d)\n", testhrarr_runcount);
      }
      return 0;
    }
    
    static int testhrarr_proc_open(struct inode *inode, struct  file *file) {
      return single_open(file, testhrarr_proc_show, NULL);
    }
    
    static const struct file_operations testhrarr_proc_fops = {
      .owner = THIS_MODULE,
      .open = testhrarr_proc_open,
      .read = seq_read,
      .llseek = seq_lseek,
      .release = single_release,
    };
    
    
    #if (HWDEBUG_STACK == 1)
    static void sample_hbp_handler(struct perf_event *bp,
                 struct perf_sample_data *data,
                 struct pt_regs *regs)
    {
      printk(KERN_INFO "%s value is changed\n", ksym_name);
      dump_stack();
      printk(KERN_INFO "Dump stack from sample_hbp_handler\n");
    }
    #endif
    
    static int __init testhrarr_init(void)
    {
      struct timespec tp_hr_res;
      #if (HWDEBUG_STACK == 1)
      struct perf_event_attr attr;
      #endif
    
      period_ms = 1000/HZ;
      hrtimer_get_res(CLOCK_MONOTONIC, &tp_hr_res);
      printk(KERN_INFO
        "Init testhrarr: %d ; HZ: %d ; 1/HZ (ms): %ld ; hrres: %lld.%.9ld\n",
                   testhrarr_runcount,      HZ,        period_ms, (long long)tp_hr_res.tv_sec, tp_hr_res.tv_nsec );
    
      testhrarr_arr = (int*)kcalloc(ARRSIZE, sizeof(int), GFP_ATOMIC);
      if (!testhrarr_arr)
        return -ENOMEM; // allocation failed; nothing else to clean up yet
      testhrarr_arr_first = &testhrarr_arr[0];
    
      hrtimer_init(&my_hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
      my_hrtimer.function = &testhrarr_timer_function;
      period_ns = period_ms*( (unsigned long)1E6L );
      ktime_period_ns = ktime_set(0,period_ns);
    
      printk(KERN_INFO
        " Addresses: _runcount 0x%p ; _arr 0x%p ; _arr[0] 0x%p (0x%p) ; _timer_function 0x%p ; my_hrtimer 0x%p; my_hrt.f 0x%p\n",
        &testhrarr_runcount, &testhrarr_arr, &(testhrarr_arr[0]), testhrarr_arr_first, &testhrarr_timer_function, &my_hrtimer, &my_hrtimer.function);
    
    
      proc_create("testhrarr_proc", 0, NULL, &testhrarr_proc_fops);
    
    
      #if (HWDEBUG_STACK == 1)
      hw_breakpoint_init(&attr);
      if (strcmp(ksym_name, "testhrarr_arr_first") == 0) {
        // just for testhrarr_arr_first - interpret the found symbol address
        // as int*, and dereference it to get the "real" address it points to
        // (read it as unsigned long, which is pointer-sized on both 32-bit and 64-bit Linux)
        attr.bp_addr = *((unsigned long*)kallsyms_lookup_name(ksym_name));
      } else {
        // the usual - address is kallsyms_lookup_name result
        attr.bp_addr = kallsyms_lookup_name(ksym_name);
      }
      attr.bp_len = HW_BREAKPOINT_LEN_1;
      attr.bp_type = HW_BREAKPOINT_W ; //| HW_BREAKPOINT_R;
    
      sample_hbp = register_wide_hw_breakpoint(&attr, (perf_overflow_handler_t)sample_hbp_handler);
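      if (IS_ERR((void __force *)sample_hbp)) {
        int ret = PTR_ERR((void __force *)sample_hbp);
        printk(KERN_INFO "Breakpoint registration failed\n");
        // undo the earlier setup, since the exit function does not run when init fails
        remove_proc_entry("testhrarr_proc", NULL);
        kfree(testhrarr_arr);
        return ret;
      }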
    
      // explicit cast needed to show 64-bit bp_addr as 32-bit address
      // https://stackoverflow.com/questions/11796909/how-to-resolve-cast-to-pointer-from-integer-of-different-size-warning-in-c-co/11797103#11797103
      printk(KERN_INFO "HW Breakpoint for %s write installed (0x%p)\n", ksym_name, (void*)(uintptr_t)attr.bp_addr);
      #endif
    
      return 0;
    }
    
    static void __exit testhrarr_exit(void)
    {
      int ret_cancel = 0;
      while( hrtimer_callback_running(&my_hrtimer) ) {
        ret_cancel++;
      }
      if (ret_cancel != 0) {
        printk(KERN_INFO " testhrarr Waited for hrtimer callback to finish (%d)\n", ret_cancel);
      }
      if (hrtimer_active(&my_hrtimer) != 0) {
        ret_cancel = hrtimer_cancel(&my_hrtimer);
        printk(KERN_INFO " testhrarr active hrtimer cancelled: %d (%d)\n", ret_cancel, testhrarr_runcount);
      }
      if (hrtimer_is_queued(&my_hrtimer) != 0) {
        ret_cancel = hrtimer_cancel(&my_hrtimer);
        printk(KERN_INFO " testhrarr queued hrtimer cancelled: %d (%d)\n", ret_cancel, testhrarr_runcount);
      }
      remove_proc_entry("testhrarr_proc", NULL);
      // free the array only after the timer is stopped and the /proc entry is gone,
      // so that neither of them can touch freed memory
      kfree(testhrarr_arr);
      #if (HWDEBUG_STACK == 1)
      unregister_wide_hw_breakpoint(sample_hbp);
      printk(KERN_INFO "HW Breakpoint for %s write uninstalled\n", ksym_name);
      #endif
      printk(KERN_INFO "Exit testhrarr\n");
    }
    
    module_init(testhrarr_init);
    module_exit(testhrarr_exit);
    
    MODULE_LICENSE("GPL");
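The address selection done in testhrarr_init (look the name up, and dereference only for the pointer that refers to the first element) can be mimicked in user space. This is a sketch under stated assumptions: the miniature symbol table, `demo_lookup_name` and `demo_watch_address` are hypothetical stand-ins, since a real kernel resolves names through kallsyms.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the module's globals. */
static int demo_storage[5];
static int *demo_arr = demo_storage;
static int *demo_arr_first = &demo_storage[0];

/* Hypothetical miniature symbol table entry: a name and the address
 * of the variable carrying that name. */
struct demo_symbol {
    const char *name;
    uintptr_t addr;
};

/* Returns the address of the named variable, or 0 if the name is unknown. */
static uintptr_t demo_lookup_name(const char *name)
{
    const struct demo_symbol table[] = {
        { "demo_arr",       (uintptr_t)&demo_arr },
        { "demo_arr_first", (uintptr_t)&demo_arr_first },
    };
    size_t i;

    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
        if (strcmp(table[i].name, name) == 0)
            return table[i].addr;
    return 0; /* e.g. "demo_arr[0]": bracket notation is not a symbol */
}

/* Returns the address to watch for the given name. For demo_arr_first
 * the symbol's address holds a pointer, so the pointer value stored
 * there is read and returned instead of the symbol's own address. */
static uintptr_t demo_watch_address(const char *name)
{
    uintptr_t addr = demo_lookup_name(name);

    if (addr != 0 && strcmp(name, "demo_arr_first") == 0) {
        void *pointee;
        memcpy(&pointee, (const void *)addr, sizeof(pointee));
        return (uintptr_t)pointee;
    }
    return addr;
}
```

In this model, `demo_watch_address("demo_arr[0]")` returns 0 (matching the null address seen in syslog), `demo_watch_address("demo_arr")` returns the address of the pointer variable, and only `demo_watch_address("demo_arr_first")` returns the address of the first array element.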
    

    Original question on Stack Overflow (debugging - Watch a variable (memory address) change in the Linux kernel, and print a stack trace when it changes?): https://stackoverflow.com/questions/19725900/
