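linux - Capturing user-space assembly with ftrace and kprobes (by using virtual address translation)?

Tags: linux debugging linux-kernel ftrace

Apologies for the long post; I could not formulate it in a shorter way. Also, this may be a better fit for Unix & Linux Stack Exchange, but I'll try here first, since there is an ftrace tag.

Anyway - I'd like to observe when the machine instructions of a user program are executed, in the context of a full function_graph capture made with ftrace. One problem is that I need this for an older kernel: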

$ uname -a
Linux mypc 2.6.38-16-generic #67-Ubuntu SMP Thu Sep 6 18:00:43 UTC 2012 i686 i686 i386 GNU/Linux

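...and in this version there is no UPROBES, which, as Uprobes in 3.5 [LWN.net] notes, should be able to do something like this. (As long as I don't have to patch the original kernel, I would be willing to try a kernel module built out of tree, as User-Space Probes (Uprobes) [chunghwan.com] seems to demonstrate; but as far as I can see from 0: Inode based uprobes [LWN.net], 2.6 would probably need a full patch.)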

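However, on this version there is a /sys/kernel/debug/kprobes and a /sys/kernel/debug/tracing/kprobe_events; and Documentation/trace/kprobetrace.txt implies that a kprobe can be set directly on an address, even though I cannot find an example anywhere of how this is used.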
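For reference, Documentation/trace/kprobetrace.txt gives the probe definition syntax as `p[:[GRP/]EVENT] SYMBOL[+offs]|MEMADDR [FETCHARGS]`, written as a single line to kprobe_events. A minimal sketch of formatting such a line for a raw address follows. The helper name `format_kprobe_cmd` and the event name used with it are illustrative only, and the sketch assumes the address is a kernel-text virtual address (for instance one listed in /proc/kallsyms) rather than a user-space or physical one.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not part of the original post): formats a
 * kprobe_events definition of the form "p:EVENT 0xADDR" into buf.
 * Returns the number of characters written, or -1 if buf is too small. */
int format_kprobe_cmd(char *buf, size_t size,
                      const char *event, unsigned long addr)
{
    int n;

    assert(buf != NULL && event != NULL);
    n = snprintf(buf, size, "p:%s 0x%08lx", event, addr);
    if (n < 0 || (size_t)n >= size)
        return -1;   /* output error or truncated output */
    return n;
}
```

The resulting string is what would be echoed into /sys/kernel/debug/tracing/kprobe_events; the event then still has to be enabled before it shows up in the trace.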

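In any case, I'm still not sure which addresses to use. As a small example, say I want to trace the start of the main function of the wtest.c program (included below). I can do this to compile it and obtain a machine-instruction assembly listing: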
$ gcc -g -O0 wtest.c -o wtest
$ objdump -S wtest | less
...
08048474 <main>:
int main(void) {
 8048474:       55                      push   %ebp
 8048475:       89 e5                   mov    %esp,%ebp
 8048477:       83 e4 f0                and    $0xfffffff0,%esp
 804847a:       83 ec 30                sub    $0x30,%esp
 804847d:       65 a1 14 00 00 00       mov    %gs:0x14,%eax
 8048483:       89 44 24 2c             mov    %eax,0x2c(%esp)
 8048487:       31 c0                   xor    %eax,%eax
  char filename[] = "/tmp/wtest.txt";
...
  return 0;
 804850a:       b8 00 00 00 00          mov    $0x0,%eax
}
...

I will set up ftrace logging through this script:
sudo bash -c '
KDBGPATH="/sys/kernel/debug/tracing"
echo function_graph > $KDBGPATH/current_tracer
echo funcgraph-abstime > $KDBGPATH/trace_options
echo funcgraph-proc > $KDBGPATH/trace_options
echo 0 > $KDBGPATH/tracing_on
echo > $KDBGPATH/trace
echo 1 > $KDBGPATH/tracing_on ; ./wtest ; echo 0 > $KDBGPATH/tracing_on
cat $KDBGPATH/trace > wtest.ftrace
'

You can see a portion of the resulting (otherwise complex) ftrace log in debugging - Observing a hard-disk write in kernel space (with drivers/modules) - Unix & Linux Stack Exchange (which is where I got the example from).

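Basically, I'd like a printout in the ftrace log whenever the first instructions of main (say, the instructions at 0x8048474, 0x8048475, 0x8048477, 0x804847a, 0x804847d, 0x8048483 and 0x8048487) are executed by (any) CPU. The problem is that, as far as I understand from Anatomy of a Program in Memory : Gustavo Duarte, these addresses are virtual addresses, as seen from the perspective of the process itself (and, I gather, the same perspective is shown by /proc/PID/maps)... and apparently, for a kprobe_event I'd need a physical address?

So, my idea is: if I can find the physical addresses corresponding to the virtual addresses of the program disassembly (say, by coding a kernel module which accepts a pid and an address, and returns the physical address via procfs), I could set up those addresses as a sort of "tracepoints" via /sys/kernel/debug/tracing/kprobe_events in the script above, and hopefully get them into the ftrace log. Could this work, in principle?

One problem with this, which I found in Linux(ubuntu), C language: Virtual to Physical Address Translation - Stack Overflow: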

In user code, you can't know the physical address corresponding to a virtual address. This is information is simply not exported outside the kernel. It could even change at any time, especially if the kernel decides to swap out part of your process's memory.
...
Pass the virtual address to the kernel using systemcall/procfs and use vmalloc_to_pfn. Return the Physical address through procfs/registers.
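
As an aside to the quoted claim: kernels since 2.6.25 do expose per-page translation data to user space through /proc/PID/pagemap (described in Documentation/vm/pagemap.txt), one 64-bit entry per virtual page, with bit 63 meaning "page present" and bits 0-54 holding the page frame number. A sketch of decoding such an entry follows. The function names are illustrative, and recent kernels hide the frame number from unprivileged readers, so whether this is usable on a given system is an assumption to verify.

```c
#include <stdint.h>

#define PAGEMAP_PRESENT  (1ULL << 63)        /* bit 63: page is present in RAM */
#define PAGEMAP_PFN_MASK ((1ULL << 55) - 1)  /* bits 0-54: page frame number  */

/* Returns 1 if the pagemap entry describes a page present in RAM. */
int pagemap_present(uint64_t entry)
{
    return (entry & PAGEMAP_PRESENT) != 0;
}

/* Returns the page frame number; only meaningful when the page is present. */
uint64_t pagemap_pfn(uint64_t entry)
{
    return entry & PAGEMAP_PFN_MASK;
}

/* Byte offset inside /proc/PID/pagemap of the entry covering vaddr. */
uint64_t pagemap_offset(uint64_t vaddr, uint64_t page_size)
{
    return (vaddr / page_size) * sizeof(uint64_t);
}
```

The entry itself would be read as 8 bytes at `pagemap_offset(vaddr, page_size)` from the pagemap file of the target process; the translation can still change at any time, exactly as the quote warns.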



However, vmalloc_to_pfn doesn't seem to be trivial either:

x86 64 - vmalloc_to_pfn returns 32 bit address on Linux 32 system. Why does it chop off higher bits of PAE physical address? - Stack Overflow

VA: 0xf8ab87fc PA using vmalloc_to_pfn: 0x36f7f7fc. But I'm actually expecting: 0x136f7f7fc.
...
The physical address falls between 4 to 5 GB. But I can't get the exact physical address, I only get the chopped off 32-bit address. Is there another way to get true physical address?
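
One plausible reading of the quoted result (an assumption, since the code behind it is not shown here): on a 32-bit PAE kernel `unsigned long` is 32 bits wide, so shifting a page frame number left by PAGE_SHIFT in that type drops the bits above 4 GB. Widening to 64 bits before the shift keeps them. The sketch below reproduces the two values from the quote in user-space C; the function names are illustrative.

```c
#include <stdint.h>

#define PAGE_SHIFT 12  /* 4 KiB pages, as on i386 */

/* Mimics 32-bit arithmetic: the shifted value wraps at 32 bits. */
uint32_t phys_truncated(uint32_t pfn, uint32_t offset)
{
    return (uint32_t)(pfn << PAGE_SHIFT) + offset;
}

/* Widens to 64 bits first, so frames above 4 GB survive the shift. */
uint64_t phys_full(uint32_t pfn, uint32_t offset)
{
    return ((uint64_t)pfn << PAGE_SHIFT) + offset;
}
```

With page frame number 0x136f7f and in-page offset 0x7fc, the first function yields the "chopped" 0x36f7f7fc and the second the expected 0x136f7f7fc.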



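So, I'm not sure how reliably I could extract physical addresses so that kprobes could trace them, especially since "it could even change at any time". But here I would hope that, since the program is small and trivial, there is a reasonable chance that it is not swapped out while being traced, allowing a proper capture to be obtained. (So even if I have to run the debug script above multiple times, as long as I can hope for a "proper" capture once in 10 (or even 100) runs, I'd be OK with it.)

Note that I'd like the output to go through ftrace, so that the timestamps are expressed in the same domain (see Reliable Linux kernel timestamps (or adjustment thereof) with both usbmon and ftrace? - Stack Overflow for an illustration of a problem with timestamps). Thus, even if I could come up with a gdb script to run and trace the program from user space (while an ftrace capture is obtained at the same time), I'd like to avoid that, as the overhead from gdb itself would show up in the ftrace log.

So, in summary:
  • Is the approach of obtaining (possibly through a separate kernel module) physical addresses from virtual addresses (taken from the disassembly of an executable), so that they can trigger a kprobe_event logged by ftrace, worth pursuing? If so, are there any examples of kernel modules that can be used for this address translation?
  • Could I use a kernel module to "register" a callback/handler function that runs when a particular memory address is executed? Then I could simply call trace_printk in that function to get an ftrace log entry (or, even without that, the handler function name itself should show up in the ftrace log), and it doesn't seem like there would be too much overhead from that...

  • Actually, in a 2007 posting, Jim Keniston - utrace-based uprobes: systemtap mailing list, there is an 11. Uprobes Example (added to Documentation/uprobes.txt), which seems to be exactly that: a kernel module registering a handler function. Unfortunately, it uses linux/uprobes.h, and I only have kprobes.h in my /usr/src/linux-headers-2.6.38-16/include/linux/. Also, on my system, even systemtap complains that CONFIG_UTRACE is not enabled (see this comment)... So if there is any other approach I could use to obtain the debug trace I want, without having to recompile the kernel to get uprobes, it would be great to know...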
    wtest.c :
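    #include <stdio.h>
    #include <fcntl.h>     // open, O_CREAT, O_RDWR
    #include <sys/stat.h>  // mode_t, S_IRUSR and the other permission bits
    #include <unistd.h>    // write, close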
    
    int main(void) {
      char filename[] = "/tmp/wtest.txt";
      char buffer[] = "abcd";
      int fd;
      mode_t perms = S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH;
    
      fd = open(filename, O_RDWR|O_CREAT, perms);
      write(fd,buffer,4);
      close(fd);
    
      return 0;
    }
    

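    Best answer

    Obviously, this would be much easier with the built-in uprobes on kernels 3.5+; but given that uprobes for my kernel 2.6.38 is a very deep-going patch (which I couldn't really isolate in a separate kernel module, so as to avoid patching the kernel), here is what I can note for a standalone module on 2.6.38. (Since I'm still unsure about many things, I would still like to see an answer that corrects any misunderstandings in this post.)

    I think I got somewhere, but not with kprobes. I'm not sure, but it seems I managed to get the right physical addresses; however, the kprobes documentation is specific that when using "@ADDR : Fetch memory at ADDR (ADDR should be in kernel)", the address should be a kernel one; and the physical addresses I get are below the kernel boundary of 0xc0000000 (but then, 0xc0000000 usually belongs to the virtual memory layout?).

    So I used a hardware breakpoint instead - the module is below, but caveat emptor: it behaves randomly, and occasionally can cause a kernel oops! By compiling the module, and running in bash: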

    $ sudo bash -c 'KDBGPATH="/sys/kernel/debug/tracing" ;
    echo function_graph > $KDBGPATH/current_tracer ; echo funcgraph-abstime > $KDBGPATH/trace_options
    echo funcgraph-proc > $KDBGPATH/trace_options ; echo 8192 > $KDBGPATH/buffer_size_kb ;
    echo 0 > $KDBGPATH/tracing_on ; echo > $KDBGPATH/trace'
    $ sudo insmod ./callmodule.ko && sleep 0.1 && sudo rmmod callmodule && \
    tail -n25 /var/log/syslog | tee log.txt && \
    sudo cat /sys/kernel/debug/tracing/trace >> log.txt
    

    ...I get a log. I want to trace the first two instructions of main() of wtest, which for me are:
    $ objdump -S wtest/wtest | grep -A3 'int main'
    int main(void) {
     8048474:   55                      push   %ebp
     8048475:   89 e5                   mov    %esp,%ebp
     8048477:   83 e4 f0                and    $0xfffffff0,%esp
    

    ...at virtual addresses 0x08048474 and 0x08048475. In the syslog output, I could get, say:
    ...
    [ 1106.383011] callmodule: parent task a: f40a9940 c: kworker/u:1 p: [14] s: stopped
    [ 1106.383017] callmodule: - wtest [9404]
    [ 1106.383023] callmodule: Trying to walk page table; addr task 0xEAE90CA0 ->mm ->start_code: 0x08048000 ->end_code: 0x080485F4
    [ 1106.383029] callmodule: walk_ 0x8048000 callmodule: Valid pgd : Valid pud: Valid pmd: page frame struct is @ f63e5d80; *virtual (page_address) @   (null) (is_vmalloc_addr 0 virt_addr_valid 0 virt_to_phys 0x40000000) page_to_pfn 639ec page_to_phys 0x639ec000
    [ 1106.383049] callmodule: walk_ 0x80483c0 callmodule: Valid pgd : Valid pud: Valid pmd: page frame struct is @ f63e5d80; *virtual (page_address) @   (null) (is_vmalloc_addr 0 virt_addr_valid 0 virt_to_phys 0x40000000) page_to_pfn 639ec page_to_phys 0x639ec000
    [ 1106.383067] callmodule: walk_ 0x8048474 callmodule: Valid pgd : Valid pud: Valid pmd: page frame struct is @ f63e5d80; *virtual (page_address) @   (null) (is_vmalloc_addr 0 virt_addr_valid 0 virt_to_phys 0x40000000) page_to_pfn 639ec page_to_phys 0x639ec000
    [ 1106.383083] callmodule: physaddr : (0x080483c0 ->) 0x639ec3c0 : (0x08048474 ->) 0x639ec474
    [ 1106.383106] callmodule: 0x08048474 id [3]
    [ 1106.383113] callmodule: 0x08048475 id [4]
    [ 1106.383118] callmodule: (( 0x08048000 is_vmalloc_addr 0 virt_addr_valid 0 ))
    [ 1106.383130] callmodule: cont pid task a: eae90ca0 c: wtest p: [9404] s: runnable
    [ 1106.383147] initcall callmodule_init+0x0/0x1000 [callmodule] returned with preemption imbalance
    [ 1106.518074] callmodule: < exit
    

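    ...meaning that it mapped the virtual address 0x08048474 to the physical address 0x639ec474. However, the physical address is not used for hardware breakpoints - there we can supply a virtual address directly to register_user_hw_breakpoint; but we also need to supply the task_struct of the process. With that, I can get something like this in the ftrace output: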
    ...
      597.907256 |   1)   wtest-5339   |               |  handle_mm_fault() {
    ...
      597.907310 |   1)   wtest-5339   | + 35.627 us   |      }
      597.907311 |   1)   wtest-5339   | + 46.245 us   |    }
      597.907312 |   1)   wtest-5339   | + 56.143 us   |  }
      597.907313 |   1)   wtest-5339   |   1.039 us    |  up_read();
      597.907317 |   1)   wtest-5339   |   1.285 us    |  native_get_debugreg();
      597.907319 |   1)   wtest-5339   |   1.075 us    |  native_set_debugreg();
      597.907322 |   1)   wtest-5339   |   1.129 us    |  native_get_debugreg();
      597.907324 |   1)   wtest-5339   |   1.189 us    |  native_set_debugreg();
      597.907329 |   1)   wtest-5339   |               |  () {
      597.907333 |   1)   wtest-5339   |               |  /* callmodule: hwbp hit: id [3] */
      597.907334 |   1)   wtest-5339   |   5.567 us    |  }
      597.907336 |   1)   wtest-5339   |   1.123 us    |  native_set_debugreg();
      597.907339 |   1)   wtest-5339   |   1.130 us    |  native_get_debugreg();
      597.907341 |   1)   wtest-5339   |   1.075 us    |  native_set_debugreg();
      597.907343 |   1)   wtest-5339   |   1.075 us    |  native_get_debugreg();
      597.907345 |   1)   wtest-5339   |   1.081 us    |  native_set_debugreg();
      597.907348 |   1)   wtest-5339   |               |  () {
      597.907350 |   1)   wtest-5339   |               |  /* callmodule: hwbp hit: id [4] */
      597.907351 |   1)   wtest-5339   |   3.033 us    |  }
      597.907352 |   1)   wtest-5339   |   1.105 us    |  native_set_debugreg();
      597.907358 |   1)   wtest-5339   |   1.315 us    |  down_read_trylock();
      597.907360 |   1)   wtest-5339   |   1.123 us    |  _cond_resched();
      597.907362 |   1)   wtest-5339   |   1.027 us    |  find_vma();
      597.907364 |   1)   wtest-5339   |               |  handle_mm_fault() {
    ...
    

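    ...where the traces corresponding to the assembly are marked by the breakpoint id. Thankfully, they come right after one another, as expected; however, ftrace has also captured some debug-register calls in between. Anyway, this is what I wanted to see.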
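
Since the point of routing everything through ftrace is a shared timestamp domain, the spacing between the two hits can be read directly off the funcgraph-abstime column. A small user-space sketch of that arithmetic follows. The helper name is illustrative, and it assumes the timestamp is the first field on the line, printed as seconds followed by exactly six fractional digits, as in the excerpt above.

```c
#include <stdio.h>

/* Parses the leading "SECONDS.MICROSECONDS" field of a function_graph
 * line (funcgraph-abstime enabled) and returns it in microseconds.
 * Returns -1 if the line does not start with such a field. */
long long ftrace_abstime_us(const char *line)
{
    long long sec, usec;

    if (sscanf(line, " %lld.%6lld", &sec, &usec) != 2)
        return -1;
    return sec * 1000000LL + usec;
}
```

Applied to the two "hwbp hit" lines (597.907333 and 597.907350), the difference comes out as 17 microseconds, most of which is presumably the traced debug-register handling in between.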

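    Here are some notes about the module:
  • Most of the module is from Execute/invoke user-space program, and get its pid, from a kernel module, where a user process is started and its pid obtained
  • Since we have to get to the task_struct in order to get to the pid, here I save both (which is kind of redundant)
  • Where function symbols are not exported: if the symbol is in kallsyms, I use a function pointer to its address; otherwise the needed functions are copied from the kernel source
  • I didn't know how to start a user-space process in a stopped state, so after spawning I issue a SIGSTOP (which on its own seems kind of unreliable at that point), and set the state to __TASK_STOPPED.
  • I may still get the status "runnable" sometimes where I don't expect it - however, if the init exits early with an error, I've noticed wtest hanging in the process list long after it would have terminated naturally, so I guess that works.
  • To get absolute/physical addresses, I used Walking page tables of a process in Linux to get to the page corresponding to a virtual address, and then, digging through the kernel sources, I found page_to_phys() to get to the address (internally via the page frame number); LDD3 ch.15 helps with understanding the relationship between pfn and physical address.
  • Since here I expect to have a physical address, I don't use PAGE_SHIFT, but calculate offsets directly from objdump's assembly output - I am not 100% sure this is correct, though.
  • Note (see also How to get a struct page from any address in the Linux kernel) that the module output says the virtual address 0x08048000 is neither is_vmalloc_addr nor virt_addr_valid; I guess this should tell me that one could have used neither vmalloc_to_pfn() nor virt_to_page() to get to its physical address!?
  • Setting up kprobes for ftrace from kernel space is kind of tricky (it needs functions copied)
  • Trying to set a kprobe on the physical addresses I get (e.g. 0x639ec474) always results in "Could not insert probe(-22)"
  • Just to see whether the format is parsed, I'm trying with the kallsyms address of the tracing_on() function (0xc10bcf60) below; that seems to work - because it raises a fatal "BUG: scheduling while atomic" (apparently, we're not meant to set breakpoints in module_init?). The bug is fatal, because it makes the kprobes directory disappear from the ftrace debug directory
  • Just creating the kprobe does not make it appear in the ftrace log - it also needs to be enabled; the necessary code for enabling is there, but I've never tried it, because of the previous bug
  • Finally, the breakpoint setup is from Watch a variable (memory address) change in Linux kernel, and print stack trace when it changes?
  • I've never seen an example of setting an executable hardware breakpoint; it kept failing for me until I searched through the kernel source and found that for HW_BREAKPOINT_X, attr.bp_len needs to be set to sizeof(long)
  • If I try to printk the attr variable(s) - from _init or from the handler - something gets seriously messed up, and whatever variable I try to print next, I get the value 0x5 (or 0x48) for it (?!)
  • Since I'm trying to use a single handler function for both breakpoints, the only reliable piece of information that survives from _init to the handler, and can tell the two apart, seems to be bp->id
  • These ids are auto-assigned, and it seems they are not reclaimed if you unregister the breakpoints (I do not unregister them, to avoid extra ftrace printouts).

  • As far as the randomness goes, I think it is because the process is not started in a stopped state; by the time it gets stopped, it ends up in a different state (or, quite possibly, I'm missing some locking somewhere). Anyway, you can also expect this in syslog: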
    [ 1661.815114] callmodule: Trying to walk page table; addr task 0xEAF68CA0 ->mm ->start_code: 0x08048000 ->end_code: 0x080485F4
    [ 1661.815319] callmodule: walk_ 0x8048000 callmodule: Valid pgd : Valid pud: Valid pmd: page frame struct is @ f5772000; *virtual (page_address) @ c0000000 (is_vmalloc_addr 0 virt_addr_valid 1 virt_to_phys 0x0) page_to_pfn 0 page_to_phys 0x0
    [ 1661.815837] callmodule: walk_ 0x80483c0 callmodule: Valid pgd : Valid pud: Valid pmd: page frame struct is @ f5772000; *virtual (page_address) @ c0000000 (is_vmalloc_addr 0 virt_addr_valid 1 virt_to_phys 0x0) page_to_pfn 0 page_to_phys 0x0
    [ 1661.816846] callmodule: walk_ 0x8048474 callmodule: Valid pgd : Valid pud: Valid pmd: page frame struct is @ f5772000; *virtual (page_address) @ c0000000 (is_vmalloc_addr 0 virt_addr_valid 1 virt_to_phys 0x0) page_to_pfn 0 page_to_phys 0x0
    

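    ...that is, even with a proper task pointer (judging by start_code), only 0x0 is obtained as the physical address. Sometimes you get the same outcome, but with start_code: 0x00000000 ->end_code: 0x00000000. And sometimes, a task_struct cannot be obtained, even though the pid can: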
    [  833.380417] callmodule:c: pid 7663
    [  833.380424] callmodule: everything all right; pid 7663 (7663)
    [  833.380430] callmodule: p is NULL - exiting
    [  833.516160] callmodule: < exit
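
On the offset question raised in the notes above: adding `taddr - start_code` to the result of `page_to_phys()` only gives the right answer while the address lies in the same page that was walked (true here, since 0x08048474 sits in the first page after start_code 0x08048000). The more general form combines the physical base of the page with the low PAGE_SHIFT bits of the virtual address. A user-space sketch of that arithmetic, with illustrative function names and 4 KiB pages assumed:

```c
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Physical address = physical base of the page that maps vaddr
 * plus the offset of vaddr inside that page. */
uint64_t phys_from_page(uint64_t page_phys, unsigned long vaddr)
{
    return page_phys + (vaddr & (PAGE_SIZE - 1));
}

/* The variant used in the module: offset measured from start_code.
 * It matches phys_from_page only for addresses in the first code page. */
uint64_t phys_from_start_code(uint64_t page_phys, unsigned long vaddr,
                              unsigned long start_code)
{
    return page_phys + (vaddr - start_code);
}
```

For the values in the syslog excerpt both functions give 0x639ec474 for main; for an address one page further into the code they would differ by 0x1000, because the second variant carries the page index into what should be an in-page offset.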
    

    Well, hopefully someone will comment and clarify some of the behavior of this module :) Hope this helps someone,
    Cheers!
    Makefile :
    EXTRA_CFLAGS=-g -O0
    obj-m += callmodule.o
    all:
      make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules
    clean:
      make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean
    
    callmodule.c :

    #include <linux/module.h>
    #include <linux/slab.h> //kzalloc
    #include <linux/syscalls.h> // SIGCHLD, ... sys_wait4, ...
    #include <linux/kallsyms.h> // kallsyms_lookup, print_symbol
    #include <linux/highmem.h> // ‘kmap_atomic’ (via pte_offset_map)
    #include <asm/io.h> // page_to_phys (arch/x86/include/asm/io.h)
    
    struct subprocess_infoB; // forward declare
    // global variable - to avoid intervening too much in the return of call_usermodehelperB:
    static int callmodule_pid;
    static struct subprocess_infoB* callmodule_infoB;
    #define TRY_USE_KPROBES 0 // 1 // enable/disable kprobes usage code
    #include <linux/kprobes.h> // enable_kprobe
    // for hardware breakpoint:
    #include <linux/perf_event.h>
    #include <linux/hw_breakpoint.h>
    
    // define a modified struct (with extra fields) here:
    struct subprocess_infoB {
      struct work_struct work;
      struct completion *complete;
      char *path;
      char **argv;
      char **envp;
      int wait; //enum umh_wait wait;
      int retval;
      int (*init)(struct subprocess_info *info);
      void (*cleanup)(struct subprocess_info *info);
      void *data;
      pid_t pid;
      struct task_struct *task;
      unsigned long long last_page_physaddr;
    };
    
    struct subprocess_infoB *call_usermodehelper_setupB(char *path, char **argv,
                              char **envp, gfp_t gfp_mask);
    
    static inline int
    call_usermodehelper_fnsB(char *path, char **argv, char **envp,
                int wait, //enum umh_wait wait,
                int (*init)(struct subprocess_info *info),
                void (*cleanup)(struct subprocess_info *), void *data)
    {
      struct subprocess_info *info;
      struct subprocess_infoB *infoB;
      gfp_t gfp_mask = (wait == UMH_NO_WAIT) ? GFP_ATOMIC : GFP_KERNEL;
      int ret;
    
      populate_rootfs_wait();
    
      infoB = call_usermodehelper_setupB(path, argv, envp, gfp_mask);
      // check the allocation before dereferencing infoB
      if (infoB == NULL)
          return -ENOMEM;
    
      printk(KBUILD_MODNAME ":a: pid %d\n", infoB->pid);
      info = (struct subprocess_info *) infoB;
    
      call_usermodehelper_setfns(info, init, cleanup, data);
      printk(KBUILD_MODNAME ":b: pid %d\n", infoB->pid);
    
      // this must be called first, before infoB->pid is populated (by __call_usermodehelperB):
      ret = call_usermodehelper_exec(info, wait);
    
      // assign global pid (and infoB) here, so rest of the code has it:
      callmodule_pid = infoB->pid;
      callmodule_infoB = infoB;    
      printk(KBUILD_MODNAME ":c: pid %d\n", callmodule_pid);
    
      return ret;
    }
    
    static inline int
    call_usermodehelperB(char *path, char **argv, char **envp, int wait) //enum umh_wait wait)
    {
      return call_usermodehelper_fnsB(path, argv, envp, wait,
                         NULL, NULL, NULL);
    }
    
    static void __call_usermodehelperB(struct work_struct *work)
    {
      struct subprocess_infoB *sub_infoB =
          container_of(work, struct subprocess_infoB, work);
      int wait = sub_infoB->wait; // enum umh_wait wait = sub_info->wait;
      pid_t pid;
      struct subprocess_info *sub_info;
      // hack - declare function pointers
      int (*ptrwait_for_helper)(void *data);
      int (*ptr____call_usermodehelper)(void *data);
      // assign function pointers to verbatim addresses as obtained from /proc/kallsyms
      int killret;
      struct task_struct *spawned_task;
      ptrwait_for_helper = (void *)0xc1065b60;
      ptr____call_usermodehelper = (void *)0xc1065ed0;
    
      sub_info = (struct subprocess_info *)sub_infoB;
    
      if (wait == UMH_WAIT_PROC)
          pid = kernel_thread((*ptrwait_for_helper), sub_info, //(wait_for_helper, sub_info,
                      CLONE_FS | CLONE_FILES | SIGCHLD);
      else
          pid = kernel_thread((*ptr____call_usermodehelper), sub_info, //(____call_usermodehelper, sub_info,
                      CLONE_VFORK | SIGCHLD);
    
      spawned_task = pid_task(find_vpid(pid), PIDTYPE_PID);
    
      // stop/suspend/pause task
      killret = kill_pid(find_vpid(pid), SIGSTOP, 1); 
      if (spawned_task!=NULL) {
        // does this stop the process really?
        spawned_task->state = __TASK_STOPPED;
        printk(KBUILD_MODNAME ": : exst %d exco %d exsi %d diex %d inex %d inio %d\n", spawned_task->exit_state, spawned_task->exit_code, spawned_task->exit_signal, spawned_task->did_exec, spawned_task->in_execve, spawned_task->in_iowait);
      }
      printk(KBUILD_MODNAME ": : (kr: %d)\n", killret);
      printk(KBUILD_MODNAME ": : pid %d (%p) (%s)\n", pid, spawned_task,
        (spawned_task!=NULL)?((spawned_task->state==-1)?"unrunnable":((spawned_task->state==0)?"runnable":"stopped")):"null" );
      // grab and save the pid (and task_struct) here:
      sub_infoB->pid = pid;
      sub_infoB->task = spawned_task;
        switch (wait) {
        case UMH_NO_WAIT:
            call_usermodehelper_freeinfo(sub_info);
            break;
        case UMH_WAIT_PROC:
            if (pid > 0)
                break;
            /* FALLTHROUGH */
        case UMH_WAIT_EXEC:
            if (pid < 0)
                sub_info->retval = pid;
            complete(sub_info->complete);
        }
    }
    
    struct subprocess_infoB *call_usermodehelper_setupB(char *path, char **argv,
                              char **envp, gfp_t gfp_mask)
    {
        struct subprocess_infoB *sub_infoB;
        sub_infoB = kzalloc(sizeof(struct subprocess_infoB), gfp_mask);
        if (!sub_infoB)
            goto out;
    
        INIT_WORK(&sub_infoB->work, __call_usermodehelperB);
        sub_infoB->path = path;
        sub_infoB->argv = argv;
        sub_infoB->envp = envp;
      out:
        return sub_infoB;
    }
    
    #if TRY_USE_KPROBES
    // copy from /kernel/trace/trace_probe.c (is unexported)
    int traceprobe_command(const char *buf, int (*createfn)(int, char **))
    {
      char **argv;
      int argc, ret;
    
      argc = 0;
      ret = 0;
      argv = argv_split(GFP_KERNEL, buf, &argc);
      if (!argv)
        return -ENOMEM;
    
      if (argc)
        ret = createfn(argc, argv);
    
      argv_free(argv);
    
      return ret;
    }
    
    // copy from kernel/trace/trace_kprobe.c?v=2.6.38 (is unexported)
    #define TP_FLAG_TRACE   1
    #define TP_FLAG_PROFILE 2
    typedef void (*fetch_func_t)(struct pt_regs *, void *, void *);
    struct fetch_param {
      fetch_func_t    fn;
      void *data;
    };
    typedef int (*print_type_func_t)(struct trace_seq *, const char *, void *, void *);
    enum {
      FETCH_MTD_reg = 0,
      FETCH_MTD_stack,
      FETCH_MTD_retval,
      FETCH_MTD_memory,
      FETCH_MTD_symbol,
      FETCH_MTD_deref,
      FETCH_MTD_END,
    };
    // Fetch type information table * /
    struct fetch_type {
      const char      *name;          /* Name of type */
      size_t          size;           /* Byte size of type */
      int             is_signed;      /* Signed flag */
      print_type_func_t       print;  /* Print functions */
      const char      *fmt;           /* Fromat string */
      const char      *fmttype;       /* Name in format file */
      // Fetch functions * /
      fetch_func_t    fetch[FETCH_MTD_END];
    };
    struct probe_arg {
      struct fetch_param      fetch;
      struct fetch_param      fetch_size;
      unsigned int            offset; /* Offset from argument entry */
      const char              *name;  /* Name of this argument */
      const char              *comm;  /* Command of this argument */
      const struct fetch_type *type;  /* Type of this argument */
    };
    struct trace_probe {
      struct list_head        list;
      struct kretprobe        rp;     /* Use rp.kp for kprobe use */
      unsigned long           nhit;
      unsigned int            flags;  /* For TP_FLAG_* */
      const char              *symbol;        /* symbol name */
      struct ftrace_event_class       class;
      struct ftrace_event_call        call;
      ssize_t                 size;           /* trace entry size */
      unsigned int            nr_args;
      struct probe_arg        args[];
    };
    static  int probe_is_return(struct trace_probe *tp)
    {
      return tp->rp.handler != NULL;
    }
    static int probe_event_enable(struct ftrace_event_call *call)
    {
      struct trace_probe *tp = (struct trace_probe *)call->data;
    
      tp->flags |= TP_FLAG_TRACE;
      if (probe_is_return(tp))
        return enable_kretprobe(&tp->rp);
      else
        return enable_kprobe(&tp->rp.kp);
    }
    #define KPROBE_EVENT_SYSTEM "kprobes"
    #endif // TRY_USE_KPROBES
    
    // <<<<<<<<<<<<<<<<<<<<<<
    
    static struct page *walk_page_table(unsigned long addr, struct task_struct *intask)
    {
      pgd_t *pgd;
      pte_t *ptep, pte;
      pud_t *pud;
      pmd_t *pmd;
    
      struct page *page = NULL;
      struct mm_struct *mm = intask->mm;
    
      callmodule_infoB->last_page_physaddr = 0ULL; // reset here, in case of early exit
    
      printk(KBUILD_MODNAME ": walk_ 0x%lx ", addr);
    
      pgd = pgd_offset(mm, addr);
      if (pgd_none(*pgd) || pgd_bad(*pgd))
        goto out;
      printk(KBUILD_MODNAME ": Valid pgd ");
    
      pud = pud_offset(pgd, addr);
      if (pud_none(*pud) || pud_bad(*pud))
        goto out;
      printk( ": Valid pud");
    
      pmd = pmd_offset(pud, addr);
      if (pmd_none(*pmd) || pmd_bad(*pmd))
        goto out;
      printk( ": Valid pmd");
    
      ptep = pte_offset_map(pmd, addr);
      if (!ptep)
        goto out;
      pte = *ptep;
      pte_unmap(ptep); // pairs with pte_offset_map; pte was copied by value above
    
      // NOTE: pte_present(pte) is not checked here; a not-present pte (page not
      // yet faulted in, or swapped out) is a likely cause of the pfn 0 results
      page = pte_page(pte);
      if (page) {
        callmodule_infoB->last_page_physaddr = (unsigned long long)page_to_phys(page);
        printk( ": page frame struct is @ %p; *virtual (page_address) @ %p (is_vmalloc_addr %d virt_addr_valid %d virt_to_phys 0x%llx) page_to_pfn %lx page_to_phys 0x%llx", page, page_address(page), is_vmalloc_addr((void*)page_address(page)), virt_addr_valid(page_address(page)), (unsigned long long)virt_to_phys(page_address(page)), page_to_pfn(page), callmodule_infoB->last_page_physaddr);
      }
    
    
    out:
      printk("\n");
      return page;
    }
    
    static void sample_hbp_handler(struct perf_event *bp,
                 struct perf_sample_data *data,
                 struct pt_regs *regs)
    {
      trace_printk(KBUILD_MODNAME ": hwbp hit: id [%llu]\n", bp->id );
      //~ unregister_hw_breakpoint(bp);
    }
    
    // ----------------------
    
    static int __init callmodule_init(void)
    {
      int ret = 0;
      char userprog[] = "/path/to/wtest";
      char *argv[] = {userprog, "2", NULL };
      char *envp[] = {"HOME=/", "PATH=/sbin:/usr/sbin:/bin:/usr/bin", NULL };
      struct task_struct *p;
      struct task_struct *par;
      struct task_struct *pc;
      struct list_head *children_list_head;
      struct list_head *cchildren_list_head;
      char *state_str;
      unsigned long offset, taddr;
      int (*ptr_create_trace_probe)(int argc, char **argv); 
      struct trace_probe* (*ptr_find_probe_event)(const char *event, const char *group);
      //int (*ptr_probe_event_enable)(struct ftrace_event_call *call); // not exported, copy
      #if TRY_USE_KPROBES
      char trcmd[256] = "";
      struct trace_probe *tp;
      #endif //TRY_USE_KPROBES
      struct perf_event *sample_hbp, *sample_hbpb;
      struct perf_event_attr attr, attrb;
    
      printk(KBUILD_MODNAME ": > init %s\n", userprog);
    
      ptr_create_trace_probe = (void *)0xc10d5120;
      ptr_find_probe_event = (void *)0xc10d41e0;
      print_symbol(KBUILD_MODNAME ": symbol @ 0xc1065b60 is %s\n", 0xc1065b60); // shows wait_for_helper+0x0/0xb0
      print_symbol(KBUILD_MODNAME ": symbol @ 0xc1065ed0 is %s\n", 0xc1065ed0); // shows ____call_usermodehelper+0x0/0x90
      print_symbol(KBUILD_MODNAME ": symbol @ 0xc10d5120 is %s\n", 0xc10d5120); // shows create_trace_probe+0x0/0x590
      ret = call_usermodehelperB(userprog, argv, envp, UMH_WAIT_EXEC); 
      if (ret != 0)
          printk(KBUILD_MODNAME ": error in call to usermodehelper: %i\n", ret);
      else
          printk(KBUILD_MODNAME ": everything all right; pid %d (%d)\n", callmodule_pid, callmodule_infoB->pid);
      tracing_on(); // earlier, so trace_printk of handler is caught!
      // find the task:
      rcu_read_lock();
      p = pid_task(find_vpid(callmodule_pid), PIDTYPE_PID);
      rcu_read_unlock();
      if (p == NULL) {
        printk(KBUILD_MODNAME ": p is NULL - exiting\n");
        return 0;
      }
      state_str = (p->state==-1)?"unrunnable":((p->state==0)?"runnable":"stopped");
      printk(KBUILD_MODNAME ": pid task a: %p c: %s p: [%d] s: %s\n",
        p, p->comm, p->pid, state_str);
      // find parent task:
      par = p->parent;
      if (par == NULL) {
        printk(KBUILD_MODNAME ": par is NULL - exiting\n");
        return 0;
      }
      state_str = (par->state==-1)?"unrunnable":((par->state==0)?"runnable":"stopped");
      printk(KBUILD_MODNAME ": parent task a: %p c: %s p: [%d] s: %s\n",
        par, par->comm, par->pid, state_str);
    
      // iterate through parent's (and our task's) child processes:
      rcu_read_lock(); // read_lock(&tasklist_lock);
      list_for_each(children_list_head, &par->children){
        p = list_entry(children_list_head, struct task_struct, sibling);
        printk(KBUILD_MODNAME ": - %s [%d] \n", p->comm, p->pid);
        if (p->pid == callmodule_pid) {
          list_for_each(cchildren_list_head, &p->children){
            pc = list_entry(cchildren_list_head, struct task_struct, sibling);
            printk(KBUILD_MODNAME ": - - %s [%d] \n", pc->comm, pc->pid);
          }
        }
      }
      rcu_read_unlock(); //~ read_unlock(&tasklist_lock);
    
      // NOTE: here p == callmodule_infoB->task !!
      printk(KBUILD_MODNAME ": Trying to walk page table; addr task 0x%X ->mm ->start_code: 0x%08lX ->end_code: 0x%08lX \n", (unsigned int) callmodule_infoB->task, callmodule_infoB->task->mm->start_code, callmodule_infoB->task->mm->end_code);
      walk_page_table(0x08048000, callmodule_infoB->task);
      // 080483c0 is start of .text; 08048474 start of main; for objdump -S wtest
      walk_page_table(0x080483c0, callmodule_infoB->task);
      walk_page_table(0x08048474, callmodule_infoB->task);
    
      if (callmodule_infoB->last_page_physaddr != 0ULL) {
        printk(KBUILD_MODNAME ": physaddr ");
        taddr = 0x080483c0; // .text
        offset = taddr - callmodule_infoB->task->mm->start_code;
        printk(": (0x%08lx ->) 0x%08llx ", taddr, callmodule_infoB->last_page_physaddr+offset);
        taddr = 0x08048474; // main
        offset = taddr - callmodule_infoB->task->mm->start_code;
        printk(": (0x%08lx ->) 0x%08llx ", taddr, callmodule_infoB->last_page_physaddr+offset);
        printk("\n");
    
        #if TRY_USE_KPROBES // can't use this here (BUG: scheduling while atomic, if probe inserts)
        //~ sprintf(trcmd, "p:myprobe 0x%08llx", callmodule_infoB->last_page_physaddr+offset);
        // try symbol for c10bcf60 - tracing_on
        sprintf(trcmd, "p:myprobe 0x%08llx", (unsigned long long)0xc10bcf60);
        ret = traceprobe_command(trcmd, ptr_create_trace_probe); //create_trace_probe);
        printk("%s -- ret: %d\n", trcmd, ret);
        // try find probe and enable it (compiles, but untested):
        tp = ptr_find_probe_event("myprobe", KPROBE_EVENT_SYSTEM);
        if (tp != NULL) probe_event_enable(&tp->call);
        #endif //TRY_USE_KPROBES
      }
    
      hw_breakpoint_init(&attr);
      attr.bp_len = sizeof(long); //HW_BREAKPOINT_LEN_1;
      attr.bp_type = HW_BREAKPOINT_X ;
      attr.bp_addr = 0x08048474; // main
      sample_hbp = register_user_hw_breakpoint(&attr, (perf_overflow_handler_t)sample_hbp_handler, p);
      // check for failure first: an ERR_PTR must not be dereferenced for ->id
      if (IS_ERR((void __force *)sample_hbp)) {
        printk(KBUILD_MODNAME ": Breakpoint registration failed (%ld)\n", PTR_ERR((void __force *)sample_hbp));
        //~ return PTR_ERR((void __force *)sample_hbp);
      } else {
        printk(KBUILD_MODNAME ": 0x08048474 id [%llu]\n", sample_hbp->id);
      }
    
      hw_breakpoint_init(&attrb);
      attrb.bp_len = sizeof(long);
      attrb.bp_type = HW_BREAKPOINT_X ;
      attrb.bp_addr = 0x08048475; // first instruction after main
      sample_hbpb = register_user_hw_breakpoint(&attrb, (perf_overflow_handler_t)sample_hbp_handler, p);
      // check for failure first: an ERR_PTR must not be dereferenced for ->id
      if (IS_ERR((void __force *)sample_hbpb)) {
        printk(KBUILD_MODNAME ": Breakpoint registration failed (%ld)\n", PTR_ERR((void __force *)sample_hbpb));
        //~ return PTR_ERR((void __force *)sample_hbpb);
      } else {
        printk(KBUILD_MODNAME ": 0x08048475 id [%llu]\n", sample_hbpb->id);
      }
    
      printk(KBUILD_MODNAME ": (( 0x08048000 is_vmalloc_addr %d virt_addr_valid %d ))\n", is_vmalloc_addr((void*)0x08048000), virt_addr_valid(0x08048000));
    
      kill_pid(find_vpid(callmodule_pid), SIGCONT, 1); // resume/continue/restart task
      state_str = (p->state==-1)?"unrunnable":((p->state==0)?"runnable":"stopped");
      printk(KBUILD_MODNAME ": cont pid task a: %p c: %s p: [%d] s: %s\n",
        p, p->comm, p->pid, state_str);
    
      return 0;
    }
    
    static void __exit callmodule_exit(void)
    {
      tracing_off(); //corresponds to the user space /sys/kernel/debug/tracing/tracing_on file
      printk(KBUILD_MODNAME ": < exit\n");
    }
    
    module_init(callmodule_init);
    module_exit(callmodule_exit);
    MODULE_LICENSE("GPL");
    

    Regarding linux - Capturing user-space assembly with ftrace and kprobes (by using virtual address translation)?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/21629045/
