c++ - How does the compiler handle a[i] when a is an array? What if a is a pointer?

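The c-faq tells me that the compiler does different things to handle a[i] depending on whether a is an array or a pointer. Here is the example from the c-faq: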

char a[] = "hello";
char *p = "world";
Given the declarations above, when the compiler sees the expression a[3], it emits code to start at the location ``a'', move three past it, and fetch the character there. When it sees the expression p[3], it emits code to start at the location ``p'', fetch the pointer value there, add three to the pointer, and finally fetch the character pointed to.

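But I was told that when dealing with a[i], the compiler tends to convert a (which is an array) into a pointer to the array. So I want to look at the assembly code to find out which is right.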

EDIT:

Here is the source of that statement, again from the c-faq. Note this sentence:

an expression of the form a[i] causes the array to decay into a pointer, following the rule above, and then to be subscripted just as would be a pointer variable in the expression p[i] (although the eventual memory accesses will be different, "

I am quite confused by this: since a has already decayed to a pointer, why does he say that "the memory accesses will be different"?

Here is my code:

// array.cpp
#include <cstdio>
using namespace std;

int main()
{
    char a[6] = "hello";
    const char *p = "world";  // const: a string literal must not be modified in C++
    printf("%c\n", a[3]);
    printf("%c\n", p[3]);
}

Here is part of the assembly code I got using g++ -S array.cpp:

    .file   "array.cpp" 
    .section    .rodata
.LC0:
    .string "world"
.LC1:
    .string "%c\n"
    .text
.globl main
    .type   main, @function
main:
.LFB2:
    leal    4(%esp), %ecx
.LCFI0:
    andl    $-16, %esp
    pushl   -4(%ecx)
.LCFI1:
    pushl   %ebp
.LCFI2:
    movl    %esp, %ebp
.LCFI3:
    pushl   %ecx
.LCFI4:
    subl    $36, %esp
.LCFI5:
    movl    $1819043176, -14(%ebp)
    movw    $111, -10(%ebp)
    movl    $.LC0, -8(%ebp)
    movzbl  -11(%ebp), %eax
    movsbl  %al,%eax
    movl    %eax, 4(%esp)
    movl    $.LC1, (%esp)
    call    printf
    movl    -8(%ebp), %eax
    addl    $3, %eax
    movzbl  (%eax), %eax
    movsbl  %al,%eax
    movl    %eax, 4(%esp)
    movl    $.LC1, (%esp)
    call    printf
    movl    $0, %eax
    addl    $36, %esp
    popl    %ecx
    popl    %ebp
    leal    -4(%ecx), %esp
    ret

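I cannot work out the mechanism of a[3] and p[3] from the code above. For example:

Where is "hello" initialized?
What does $1819043176 mean? Is it perhaps the memory address of "hello" (the address of a)?
I am sure that "-11(%ebp)" means a[3], but why?
In "movl -8(%ebp), %eax", the content of pointer p is stored in EAX, right? So does $.LC0 mean the content of pointer p?
What does "movsbl %al,%eax" mean?
Also, note these 3 lines of code: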
movl $1819043176, -14(%ebp)
movw $111, -10(%ebp)
movl $.LC0, -8(%ebp)

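The last one uses "movl", so why did it not overwrite the content of -10(%ebp)? (I know the answer now :) Addresses are incremental, so "movl $.LC0, -8(%ebp)" only overwrites {-8, -7, -6, -5}(%ebp).)

Sorry, I am totally confused by the mechanism as well as by the assembly code...

Thank you very much for your help.

Best answer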

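a is an array of chars (not a pointer; it only decays to a pointer to its first element when it is used in an expression such as a[3]). p is a pointer to char which, in this case, happens to point at a string literal.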
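A minimal sketch of that distinction, in C++ like the question's code. The helper names below are made up for illustration and do not come from the question or the c-faq. The idea is that a names the storage itself, while p is a separate object that holds an address.

```cpp
#include <cassert>

// Hypothetical helper: for an array, the address of the first element is the
// address of the array object itself, so nothing has to be loaded from memory
// before indexing.
bool array_starts_at_its_own_address() {
    char a[6] = "hello";
    return static_cast<void *>(&a) == static_cast<void *>(&a[0]);
}

// Hypothetical helper: a pointer is a separate object. Its own address (&p)
// differs from the address it holds (p), so p[3] must first load p's value.
bool pointer_is_separate_object() {
    const char *p = "world";
    return static_cast<const void *>(&p) != static_cast<const void *>(p);
}

// sizeof sees the difference as well: the array occupies 6 bytes ("hello"
// plus the terminating '\0'), independent of the size of a char*.
static_assert(sizeof(char[6]) == 6, "array of 6 chars occupies 6 bytes");

// Small usage example: both properties should hold on a conforming compiler.
void check_array_vs_pointer() {
    assert(array_starts_at_its_own_address());
    assert(pointer_is_separate_object());
}
```

This is probably what the c-faq means by different memory accesses: after decay the subscript arithmetic is the same, but only the pointer case has to read a stored address first.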

movl    $1819043176, -14(%ebp)
movw    $111, -10(%ebp)

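These initialize the local "hello" on the stack (which is why it is referenced through ebp). Since "hello" plus its terminating '\0' is more than 4 bytes, two instructions are needed: 1819043176 is 0x6C6C6568, the little-endian bytes 'h', 'e', 'l', 'l', and 111 is 'o', stored as a 16-bit word so that its high byte supplies the '\0'.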
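To check the two immediates by hand, here is a small sketch that replays both stores into a byte buffer. The function names and the 6-byte buffer standing in for the stack slots are assumptions made for illustration; only the two constants and their offsets come from the listing.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: store the low n bytes of value at dst, least
// significant byte first, which is how x86 (little-endian) lays out the
// immediates of movl/movw in memory. Shifts keep it host-independent.
void store_little_endian(unsigned char *dst, std::uint32_t value, int n) {
    assert(n >= 1 && n <= 4);
    for (int i = 0; i < n; ++i) {
        dst[i] = static_cast<unsigned char>((value >> (8 * i)) & 0xFF);
    }
}

// Replays the two stores from the listing into a 6-byte buffer that stands in
// for the stack slots -14(%ebp) .. -9(%ebp), and returns the resulting string.
std::string replay_hello_stores() {
    unsigned char frame[6] = {};
    store_little_endian(frame + 0, 1819043176u, 4);  // movl $1819043176, -14(%ebp)
    store_little_endian(frame + 4, 111u, 2);         // movw $111, -10(%ebp)
    return std::string(reinterpret_cast<const char *>(frame));
}
```

Index 3 of this buffer corresponds to -14 + 3 = -11(%ebp), which matches the address the listing reads a[3] from.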

movzbl  -11(%ebp), %eax
movsbl  %al,%eax

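These reference a[3]. The array starts at -14(%ebp), so a[3] lives at -14 + 3 = -11(%ebp). movzbl loads that byte into eax with the upper bits zeroed, and movsbl then sign-extends al to 32 bits, because the char is promoted to int when it is passed to printf. The two steps are simply unoptimized code generation, not a restriction on accessing memory through ebp.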

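movl -8(%ebp), %eax does indeed reference the p pointer: it loads the address stored in p into eax. addl $3, %eax then advances it by three, and movzbl (%eax), %eax fetches the character it now points to.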
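The two access paths can be written out step by step in C++. This is only a sketch: the function names are hypothetical, and passing the address of p is just a way to make the extra load visible in source form.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper mirroring the three instructions used for p[3]:
//   movl -8(%ebp), %eax   load the pointer value stored in p
//   addl $3, %eax         add the index
//   movzbl (%eax), %eax   fetch the byte at the resulting address
char fetch_through_pointer(const char *const *p_slot, std::size_t index) {
    assert(p_slot != nullptr && *p_slot != nullptr);
    const char *address = *p_slot;  // load the value of p from its own storage
    address += index;               // pointer arithmetic: p + index
    return *address;                // dereference: *(p + index)
}

// Hypothetical helper for the array case: the base address is the array's own
// storage, so there is no pointer load, only base + index and a fetch.
char fetch_from_array(const char (&a)[6], std::size_t index) {
    assert(index < 6);
    return *(a + index);            // a decays to &a[0]; same as a[index]
}
```

With const char *p = "world"; and char a[6] = "hello";, calling fetch_through_pointer(&p, 3) and fetch_from_array(a, 3) both yield 'l', but only the first has to read a stored address before indexing.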

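.LC0 is a label for a "relative memory" location, here the string "world" in .rodata: once the program is loaded into memory, it is assigned a fixed address. $.LC0 is that address, so movl $.LC0, -8(%ebp) stores it into p.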

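movsbl %al,%eax means "move with sign extension, byte to long": it copies the byte in al into eax and fills the upper 24 bits with copies of that byte's sign bit. al is the lowest byte of the register eax. movzbl does the same widening but fills the upper bits with zeros.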
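As a rough illustration of the two widening instructions, here is a C++ sketch. The function names are made up for this example, and the arithmetic form is chosen so that it does not rely on how a particular compiler converts out-of-range values.

```cpp
#include <cstdint>

// Rough C++ equivalent of movzbl: widen a byte to 32 bits, filling the upper
// 24 bits with zeros.
std::int32_t zero_extend_byte(std::uint8_t byte) {
    return static_cast<std::int32_t>(byte);
}

// Rough C++ equivalent of movsbl: widen a byte to 32 bits, treating bit 7 as
// the sign bit. Values 0x80..0xFF map to -128..-1.
std::int32_t sign_extend_byte(std::uint8_t byte) {
    const std::int32_t wide = static_cast<std::int32_t>(byte);
    return (byte >= 0x80) ? wide - 256 : wide;
}
```

For 'l' (0x6C) both functions return 108, which is why the extra movsbl changes nothing for "hello". For a byte such as 0xF0 they differ: 240 when zero-extended, -16 when sign-extended.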

Regarding "c++ - How does the compiler handle a[i] when a is an array? What if a is a pointer?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/2073079/
